Data Centric Systems and Networking (2013-2014 Lent Term)
This module introduces data centric systems and networking, where data is a communication token in networking, and examines its impact on computer system architecture. Large-scale distributed applications with big data processing will continue to grow in importance and become a pervasive aspect of the lives of millions of users. Supporting the design and implementation of robust, secure, and heterogeneous large-scale distributed systems is essential. This course provides various perspectives on data centric systems and networking, including content-based routing, data-flow programming, stream processing, and large-scale graph data processing, thus providing a solid basis to work on the next generation of communication paradigms and system design. On completion of this module, the students should:
The module consists of 8 sessions, of which 6 focus on a specific aspect of data centric networking and systems research. Each session discusses 2-3 papers, led by the assigned students. Each student will present about 2 paper reviews during the course. The first session advises how to read/review a paper and gives a brief introduction to different perspectives in data centric networking. The last session is dedicated to the students' presentations of their open source project studies. One hands-on session on data-flow programming and one guest lecture are planned (subject to change), covering inspiring current research in the data centric networking and systems domain.
Schedule and Reading List
We’ll meet in SW01 every Tuesday (from January 21 to March 11) in 2014. The time slot is 9:00-11:00 on Tuesday except February 18 and 25.
2014/01/21 Session 1: Introduction to Data Centric Systems and Networking
2014/01/28 Session 2: Programming in Data Centric Environment
Ilias Giechaskiel (slides)
1. Yuan Yu, Michael Isard, D. Fetterly, M. Budiu, U. Erlingsson, P.K. Gunda, J. Currey: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, OSDI, 2008.
4. J. Dean, S. Ghemawat: MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004.
Niko Stahl (slides)
5. Derek Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy and Steven Hand: Ciel: a universal execution engine for distributed data-flow computing, NSDI 2011.
6.1. Frank McSherry, Rebecca Isaacs, Michael Isard, and Derek G. Murray, Composable Incremental and Iterative Data-Parallel Computation with Naiad, no. MSR-TR-2012-105, 2012.
6.2. D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, M. Abadi: Naiad: A Timely Dataflow System, SOSP, 2013.
7. P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquini: Incoop: MapReduce for incremental computation, ACM SOCC, 2011.
8. Dionysios Logothetis, Christopher Olston, Benjamin Reed, Kevin Webb and Kenneth Yocum: Stateful Bulk Processing for Incremental Analytics, SOCC, 2010.
2014/02/04 Session 3: Processing Models of Large-Scale Graph Data
1. J. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, P. Rodriguez: The Little Engine(s) That Could: Scaling Online Social Networks, SIGCOMM, 2010.
Valentin Dalibard (slides)
2. G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.
3. U. Kang, C. E. Tsourakakis, C. Faloutsos: peta-scale graph mining system - Implementation
Haikal Pribadi (slides)
4. Z. Qian, X. Chen, N. Kang, M. Chen, Y. Yu, T. Moscibroda, Z. Zhang: MadLINQ: large-scale distributed matrix computation for the cloud, EuroSys, 2012.
Will Sewell (slides)
5. S. Hong, H. Chafi, E. Sedlar, K. Olukotun: Green-Marl: A DSL for Easy and Efficient Graph Analysis, ASPLOS, 2012.
6. Dimitrios Prountzos, Roman Manevich, Keshav Pingali: Elixir: A System for Synthesizing Concurrent Graph Programs, OOPSLA, 2012.
7. D. Nguyen, A. Lenharth, K. Pingali: A Lightweight Infrastructure for Graph Analytics, SOSP 2013.
Gustaf Helgesson (slides)
8. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin: graph-parallel computation on natural
9. J. Shun and G. Blelloch: Ligra: A Lightweight Graph Processing Framework for Shared Memory, PPoPP, 2013.
2014/02/11 Session 4: MapReduce Hands-on Tutorial with Amazon EC2
2014/02/21 Session 5: Graph Data Processing in Resource Limited Environment
1. W. Han, S. Lee, K. Park, J. Lee, M. Kim, J. Kim, H. Yu: TurboGraph: A Fast Parallel Graph Engine Handling
Niko Stahl (slides)
2. A. Kyrola and G. Blelloch: GraphChi: Large-scale graph computation on just a PC, OSDI, 2012.
3. A. Roy, I. Mihailovic, W. Zwaenepoel: X-Stream: Edge-Centric Graph Processing using Streaming Partitions, SOSP, 2013.
Ilias Giechaskiel (slides)
4. X. Hu, Y. Tao, C. Chung: Massive Graph Triangulation, SIGMOD, 2013.
5. W. Xie, G. Wang, D. Bindel, A. Demers, J. Gehrke: Fast Iterative Graph Computation with Block Updates, VLDB, 2014.
Will Sewell (slides)
6. J. Zhong, B. He: Medusa: Simplified Graph Processing on GPUs, IEEE TPDS, 2013.
7. A. Gharaibeh, E. Santos-Neto, L. Costa, M. Ripeanu: Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems, IEEE TPC, 2014.
2014/02/28 Session 6: Stream Data Processing and Data/Query Model
1. V. Gulisano, R. Jimenez-Peris, M. Patiño-Martinez, P. Valduriez: StreamCloud: A Large Scale Data Streaming System, ICDCS, 2010.
2. V. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, I. Fault-Tolerant Streaming Computation at Scale,
6. B. Gedik, H. Andrade, K. Wu, P. Yu, and M. Doo: SPADE: The System S Declarative Stream Processing Engine, SIGMOD, 2008.
8. Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, Enhong Chen: Kineograph: Taking the Pulse of a Fast-Changing and Connected World, EuroSys, 2012.
2014/03/04 Session 7: Data Centric Networking
1.2.1. V. Jacobson, D. Smetters, J. Thornton, M. Plass, N. Briggs, R. Braynard: Networking Named Content, CoNEXT, 2009.
Will Sewell (slides)
1.2.2. V. Jacobson, D. Smetters, J. Thornton, M. Plass, N. Briggs, R. Braynard: Networking Named Content, CACM, January, 2012.
1.3. A. Ghodsi, T. Koponen, B. Raghavan, S. Shenker, A. Singla, and J. Wilcox: Information-Centric Networking: Seeing the Forest for the Trees, HotNets, 2011.
1.4. P. Jokela, A. Zahemszky, C. E. Rothenberg, S. Arianfar, and P. Nikander: LIPSIN: Line Speed Publish/Subscribe Inter-networking, SIGCOMM, 2009.
1.5. George Xylomenos, Xenofon Vasilakos, Christos Tsilopoulos, Vasilios A. Siris, and George C. Polyzos: Caching and Mobility Support in a Publish-Subscribe Internet Architecture, IEEE Communication, Vol 50, Issue 7, 2012.
1.6. Md. F. Bari, S. Chowdhury, R. Ahmed, R. Boutaba, and B. Mathieu: A Survey of Naming and Routing in Information-Centric Networks, IEEE Communication, Vol 50, Issue 7, 2012.
2. A. Carzaniga, A.L. Wolf: Forwarding in a content-based network, SIGCOMM, 2003.
2.2.1. S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker: A scalable content addressable network, SIGCOMM, 2001.
2014/03/11 Session 8: Presentation of Open Source Project Study
11:45-12:00 Wrap-up Discussion (slides)
Coursework 1 (Reading Club)
The reading club will require you to read between 1 and 3 papers every week. You need to fill out a review_log (MS word format, format) prior to each session and email it to me by 12:00 noon on Monday. The minimum requirement is one review_log per session, but you can read as many papers as you want and fill out a review_log for each paper you read.
Coursework 2 (Reports)
The following three reports are required. Each may extend a reading assignment from the reading club or cover a different topic within the scope of data centric networking.
Reports 1 and 2 should be handed in by the end of the 5th week (Feb 21, 2014 - 12:00 noon) and the 7th week (March 14, 2014 - 12:00 noon) of the course (not in any particular order). Report 3 should be handed in by the end of the Lent term (April 4, 2014 - 12:00 noon).
The final grade for the course will be provided as a letter grade or percentage and the assessment will consist of two parts:
Open Source Projects
See the candidates of Open Source Projects in data centric networking. The list is not exhaustive. If you choose a project that is not on the list, please discuss it with me. The purpose of this assignment is to understand the proposed architecture, algorithms, and systems by running an actual prototype, and to present and explain to others how the prototype runs, its setup process, and any additional work you have done, including your own applications. This experience will give you a better understanding of the project. These Open Source Projects come with a set of published papers, and you should be able to explore the aspects of the papers that interest you by running the prototype. Some projects are rather large and may require an extensive environment and considerable time; make sure you are able to complete this assignment.
How to Read/Review a Paper
The following papers give guidance on how to read/review a paper.
Further supplement: see ‘how to read/review a paper’ section in Advanced Topics in Computer Systems by Steven Hand
Presentations should be about 20-25 minutes long and should cover the following aspects.
The following document aids in presenting a review.
How to write a survey paper
A survey paper provides readers with a comprehensive and organized exposition of existing work. It must expose the relevant details of the surveyed area, but it is important to keep a consistent level of detail and to avoid simply listing the different works. A good survey paper should therefore summarize recent research results in a novel way that integrates and adds understanding to work in the field. For example, you can classify the existing literature in your own way, develop a perspective on the area, and evaluate trends. Thus, after defining the scope of your survey, 1) classify and organize the trends, 2) critically evaluate the approaches (pros/cons), and 3) add your own analysis or explanation (e.g. table, figure). Adding references and pointers to further in-depth information is also important (summary from Rich Wolski’s note).
Papers for OS Principles (Distributed Storage and Deterministic Parallelism)
3. F. Chang et al: BigTable: A Distributed Storage System for Structured Data, USENIX OSDI 2006.
4. G. DeCandia et al: Dynamo: Amazon's Highly Available Key-value Store, ACM SOSP 2007.
6. A. Aviram, et al: Efficient System-Enforced Deterministic Parallelism, USENIX OSDI 2010.
7. T. Liu et al: Dthreads: Efficient and Deterministic Multithreading, ACM SOSP 2011.
Please email firstname.lastname@example.org to submit coursework or to ask any questions.