Computer Laboratory

Large-Scale Data Processing and Optimisation (2017-2018 Michaelmas Term)

LSDPO - R244

Open Source Project Study

 Candidates for Open Source Project Study

The list is not exhaustive. If you would like to study a project that is not on the list, please discuss it with me. The purpose of this assignment is to understand the proposed architecture, algorithms, and systems by running an actual prototype, and then to present and explain to others how the prototype runs, how you set it up, and any additional work you have done, including your own applications. This experience will give you a better understanding of the project. Each of these Open Source Projects comes with a set of published papers, and running the prototype lets you examine the aspects of the papers that interest you. Some projects are rather large and may require an extensive environment and considerable time; make sure you will be able to complete this assignment.

Suggested projects are in red colour font.

  1. Ciel http://github.com/mrry/skywriting, http://www.cl.cam.ac.uk/netos/ciel/

  2. Apache Hadoop http://hadoop.apache.org/

  3. DryadLINQ http://research.microsoft.com/en-us/projects/dryadlinq/

  4. MapReduce Online http://code.google.com/p/hop/

  5. STREAM http://infolab.stanford.edu/stream/

  6. TelegraphCQ http://telegraph.cs.berkeley.edu/telegraphcq/v0.2/

  7. DSN http://db.cs.berkeley.edu/dsn/

  8. Naiad: data-parallel dataflow computation http://research.microsoft.com/en-us/projects/naiad/, and https://github.com/frankmcsherry/timely-dataflow (Rust version)

  9. Apache Giraph: Graph processing based on BSP http://incubator.apache.org/giraph/

  10. Spark: Fast Cluster Computing http://spark-project.org/

  11. GPS: A Graph Processing System http://infolab.stanford.edu/gps/

  12. GraphLab/PowerGraph: Graph Processing http://graphlab.org/

  13. Cloudera Impala: https://github.com/cloudera/impala

  14. Medusa: http://gc.codehum.com/p/medusa-gpu/

  15. GraphChi https://github.com/GraphChi

  16. X-Stream: http://labos.epfl.ch/x-stream

  17. Storm: http://storm-project.net/

  18. GraphX: https://github.com/amplab/graphx

  19. DeepDive: http://deepdive.stanford.edu/

  20. TensorFlow: https://www.tensorflow.org/

  21. TensorForce: https://github.com/reinforceio/tensorforce

  22. Chaos: https://github.com/epfl-labos/chaos

  23. PyTorch: http://pytorch.org/

  24. CNTK: https://docs.microsoft.com/en-us/cognitive-toolkit/  https://github.com/Microsoft/CNTK

  25. Kubernetes: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/