Course pages 2013–14
Data Centric Systems and Networking
Principal lecturer: Dr Eiko Yoneki
Taken by: MPhil ACS, Part III
Code: R212
Hours: 16 (eight 2-hour seminar sessions, combining lectures and a reading club)
Class limit: 17 students
Prerequisites: An undergraduate network architectures course
Aims
This module provides an introduction to data centric systems and networking, where data is a token in programming flow and networking, and examines the impact of this approach on computer system architecture. Large-scale distributed applications with big data processing will continue to grow in importance and become a pervasive aspect of the lives of millions of users. Supporting the design and implementation of robust, secure, and heterogeneous large-scale distributed systems is therefore essential.
Syllabus
This course provides various perspectives on data centric systems and networking, including content-based routing, data-flow programming, stream processing, and large-scale graph data processing, thus providing a solid basis to work on the next generation of distributed systems and communication paradigms.
The module consists of eight sessions, five of which cover specific aspects of data-centric systems and networking research. Each of these sessions discusses 2-3 papers, led by the assigned students. One session is a hands-on tutorial on MapReduce using data flow programming with Amazon EC2. The first session gives advice on how to read and review a paper, together with a brief introduction to the different perspectives in data-centric systems. The last session is dedicated to the students' presentations of their open-source project studies. One guest lecture is planned, covering current research on stream processing systems.
- Introduction to Data Centric Systems and Networking
- Content-Centric Networking (CCN) and Content Distribution Networks (CDN)
- Programming in Data Centric Environment
- MapReduce Hands-on Tutorial using CIEL with Amazon EC2
- Stream Data Processing and Data/Query Model
- Large-scale Graph Structured Data: Network, Storage, and Parallel Processing
- Network holds Data in Delay Tolerant Networks
- Presentation of Open Source Project Study
Objectives
On completion of this module, students should:
- Understand key concepts of data centric approaches in future networking and systems.
- Obtain a clear understanding of building distributed systems using data centric programming and large-scale data processing.
Coursework
Reading Club:
- The reading club will involve 1-3 papers every week. At each session, around 2-3 papers are selected under the given topic, and the students present their review work.
- Hands-on tutorial session on MapReduce parallel computing using CIEL data flow programming with Amazon EC2, including writing an application that processes streaming Twitter data.
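As a rough illustration of the map and reduce pattern covered in the tutorial, the following sketch counts hashtags in a handful of tweets. It is plain Python running on a single machine and does not use the CIEL or Amazon EC2 APIs; the function names and the sample tweets are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_tweet(tweet):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    for word in tweet.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each hashtag."""
    totals = defaultdict(int)
    for tag, count in pairs:
        totals[tag] += count
    return dict(totals)


def count_hashtags(tweets):
    """Run the map step over every tweet, then reduce all emitted pairs."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return reduce_counts(pairs)


# Hypothetical sample input standing in for a stream of tweets.
sample = ["Learning #MapReduce today", "#mapreduce and #EC2", "no tags here"]
print(count_hashtags(sample))
```

In a distributed setting the map step would run in parallel over partitions of the input and the pairs would be grouped by key before reduction; here both steps run sequentially to keep the sketch short.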
Reports:
The following three reports are required. Each may extend a reading club assignment or address a different topic within the scope of data centric systems and networking.
1. Review report on a full-length paper (max 1800 words)
   - Describe the contribution of the paper in depth, with criticisms
   - Crystallise the significant novelty in contrast to other related work
   - Suggestions for future work
2. Survey report on a sub-topic in data centric networking (max 2000 words)
   - Pick up to 5 papers as core papers in the survey scope
   - Read the above and expand reading through related work
   - Form an overall view and write your own survey paper
3. Project study and exploration of a prototype (max 2500 words)
   - What is the significance of the project in the research domain?
   - Compare with similar and succeeding projects
   - Demonstrate the project by exploring its prototype
Reports 1 and 2 should be handed in by the end of the 5th and 7th weeks of the course (in either order). Report 3 should be handed in by the end of the Lent term.
Assessment
The final grade for the course will be provided as a percentage, and the assessment will consist of two parts:
- 30% for reading club (participation, presentation)
- 70% for the three reports:
  - 20%: Intensive review report
  - 20%: Survey report
  - 30%: Project study
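The weighting above can be read as a simple weighted sum. The following is a minimal sketch in Python; the function name, the dictionary keys, and the assumption that every component is marked as a percentage out of 100 are illustrative and not part of the official marking scheme.

```python
# Weights taken from the assessment breakdown (fractions of the final grade).
WEIGHTS = {
    "reading_club": 0.30,
    "review_report": 0.20,
    "survey_report": 0.20,
    "project_study": 0.30,
}


def final_grade(marks):
    """Combine component marks (assumed percentages, 0-100) into a final percentage."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks, for illustration only.
example = {
    "reading_club": 60,
    "review_report": 70,
    "survey_report": 80,
    "project_study": 50,
}
print(final_grade(example))
```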
Recommended reading
[1] Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N. & Czajkowski, G.: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.
[2] Jacobson, V., Smetters, D.K., Thornton, J.D., Plass, M.F., Briggs, N.H. & Braynard, R.L.: Networking named content, CoNEXT, 2009.
[3] Murray, D., Schwarzkopf, M., Smowton, C., Smith, S., Madhavapeddy, A. & Hand, S.: CIEL: a universal execution engine for distributed data-flow computing, NSDI, 2011.
[4] Bhatotia, P., Wieder, A., Rodrigues, R., Acar, U.A. & Pasquini, R.: Incoop: MapReduce for incremental computation, ACM SOCC, 2011.
[5] Hong, S., Chafi, H., Sedlar, E. & Olukotun, K.: Green-Marl: A DSL for Easy and Efficient Graph Analysis, ASPLOS, 2012.
A complete list can be found on the course material web page. See also the 2012-2013 course material web page: http://www.cl.cam.ac.uk/~ey204/teaching/ACS/R202_2012_2013.