Computer Laboratory

Course pages 2014–15

Data Centric Systems and Networking

Principal lecturer: Dr Eiko Yoneki
Taken by: MPhil ACS, Part III
Code: R212
Hours: 16 (eight 2-hour seminar sessions, combining lectures and a reading club)
Class limit: 17 students
Prerequisites: Undergraduate network architectures and operating systems courses

Aims

This module provides an introduction to data centric systems and networking, where data is treated as a token in both programming flow and networking, and examines the impact of this approach on computer system architecture. Large-scale distributed applications with big data processing will continue to grow in importance and become a pervasive aspect of the lives of millions of users. Supporting the design and implementation of robust, secure, and heterogeneous large-scale distributed systems is therefore essential.

Syllabus

This course offers various perspectives on data centric systems and networking, including content-based routing, data-flow programming, stream processing, and large-scale graph data processing, providing a solid basis for work on the next generation of distributed systems and communication paradigms.

The module consists of 8 sessions, 5 of which cover specific aspects of data-centric systems and networking research. Each of these sessions discusses 2-3 papers and is led by the assigned students. One session is a hands-on tutorial on MapReduce using data flow programming with Amazon EC2. The first session advises on how to read and review a paper and gives a brief introduction to the different perspectives in data-centric systems. The last session is dedicated to the students' presentations of their open-source project studies. One guest lecture is planned, covering current research on stream processing systems.

  1. Introduction to Data Centric Systems and Networking 
  2. Content-Centric Networking (CCN) and Content Distribution Networks (CDN) 
  3. Programming in Data Centric Environment
  4. MapReduce Hands-on Tutorial using CIEL with Amazon EC2   
  5. Stream Data Processing and Data/Query Model  
  6. Large-scale Graph Structured Data: Network, Storage, and Parallel Processing 
  7. Network holds Data in Delay Tolerant Networks  
  8. Presentation of Open Source Project Study

Objectives

On completion of this module, students should:

  • Understand key concepts of data centric approaches in future networking and systems.
  • Obtain a clear understanding of building distributed systems using data centric programming and large-scale data processing.

Coursework

Reading Club:

  • The reading club will involve 1-3 papers every week. At each session, around 2-3 papers are selected on the given topic, and students present their reviews.
  • A hands-on tutorial session on MapReduce parallel computing using CIEL data flow programming with Amazon EC2, including writing an application that processes streaming Twitter data.

Reports:

The following three reports are required. Each may extend a reading club assignment or address a different topic within the scope of data centric systems and networking.

  1. Review report on a full-length paper (max 1800 words)
    • Describe the contribution of the paper in depth with criticisms
    • Crystallise the significant novelty in contrast to other related work
    • Suggest future work
  2. Survey report on a sub-topic in data centric networking (max 2000 words)
    • Pick up to 5 papers as core papers in the survey scope
    • Read the above and expand reading through related work
    • Synthesise the overall view and write your own survey paper
  3. Project study and exploration of a prototype (max 2500 words)
    • What is the significance of the project in the research domain?
    • Compare with similar and succeeding projects
    • Demonstrate the project by exploring its prototype

Reports 1 and 2 should be handed in by the end of the 5th and 7th weeks of the course (in either order). Report 3 should be handed in by the end of the Michaelmas Term.

Assessment

The final grade for the course will be provided as a percentage, and the assessment will consist of two parts:

  1. 20%: for reading club (participation, presentation)
  2. 80%: for the three reports:
    • 20%: Intensive review report
    • 25%: Survey report
    • 35%: Project study

Recommended reading

[1] Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N. & G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.
[2] Jacobson, V., Smetters, D.K., Thornton, J.D., Plass, M.F., Briggs, N.H., & R.L. Braynard: Networking named content, CoNEXT, 2009.
[3] Bhatotia, P., Wieder, A., Rodrigues, R., Acar, U.A., Pasquini, R.: Incoop: MapReduce for incremental computation, ACM SOCC, 2011.
[4] Hong, S., Chafi, H., Sedlar, E., Olukotun, K.: Green-Marl: A DSL for Easy and Efficient Graph Analysis, ASPLOS, 2012.
[5] E. Zeitler and T. Risch: Massive scale-out of expensive continuous queries, VLDB, 2011.
[6] A. Kyrola and G. Blelloch: GraphChi: Large-scale graph computation on just a PC, OSDI, 2012.
[7] D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, M. Abadi: Naiad: A Timely Dataflow System, SOSP, 2013. 
[8] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin: PowerGraph: Distributed graph-parallel computation on natural graphs, OSDI, 2012.

A complete list can be found on the course material web page. See also the 2013-2014 course material web page: http://www.cl.cam.ac.uk/~ey204/teaching/ACS/R212_2013_2014.