Computer Laboratory

Large-Scale Data Processing and Optimisation (2020-2021 Michaelmas Term)

LSDPO - R244


Open Source Project Study

Candidates for Open Source Project Study

The list is not exhaustive. If you would like to study a project that is not on the list, please discuss it with me first. The purpose of this assignment is to understand the architecture, algorithms, and systems proposed in a project by running an actual prototype, and then to present and explain to others how the prototype runs, along with any additional work you have done, including your own applications and the setup process. This hands-on experience will give you a better understanding of the project. Each of these open source projects is accompanied by published papers, and running the prototype should let you examine the aspects of the paper that interest you. Some projects are rather large and may require an extensive environment and considerable time; make sure you are able to complete the assignment.

Suggested projects are in red colour font.

  1. Ciel

  2. Apache Hadoop

  3. DryadLINQ

  4. MapReduce Online

  5. Naiad: data-parallel dataflow computation (also available as a Rust version)

  6. Apache Giraph: Graph processing based on BSP

  7. Spark: Fast Cluster Computing

  8. X-Stream

  9. Storm

  10. GraphX

  11. TensorFlow

  12. Chaos

  13. PyTorch

  14. CNTK

  15. Ray

  16. RLgraph

  17. BoTorch

  18. BOAT

  19. Saber

  20. Snorkel / FlyingSquid

  21. Park

  22. Pyro

  23. Apache Flink