Large-Scale Data Processing and Optimisation (2018-2019 Michaelmas Term)

Overview

This module provides an introduction to large-scale data processing and optimisation, and to their impact on computer system architecture. Large-scale distributed applications with high-volume data processing, such as the training of machine learning models, will grow ever more in importance. Supporting the design and implementation of robust, secure, and heterogeneous large-scale distributed systems is essential. In distributed systems with a large and complex parameter space, tuning and optimising the system is becoming an important and complex task, which also depends on the characteristics of the input data and the algorithms used in the applications. Algorithm designers are often unaware of the constraints imposed by systems and of the best way to take these into account when designing algorithms for massive volumes of data. On the other hand, computer systems often miss advances in algorithm design that can be used to cut down processing time and scale up systems in terms of the size of the problem they can address. Integrating machine learning approaches for system optimisation will also be explored in this course. On completion of this module, the students should:

Understand key concepts of scalable data processing approaches in future computer systems.
Obtain a clear understanding of building distributed systems using data-centric programming and large-scale data processing.
Understand the large and complex parameter space in computer system optimisation and the applicability of machine learning approaches.

Module Structure

The module consists of 8 sessions, with 5 sessions on specific aspects of large-scale data processing research. Each session discusses 3-4 papers, led by the assigned students. One session is a hands-on tutorial on MapReduce using data flow programming and/or Deep Neural Networks using Google TensorFlow. The first session gives advice on how to read/review a paper, together with a brief introduction to different perspectives in large-scale data processing and optimisation. The last session is dedicated to student presentations of open-source project studies. One guest lecture is planned, covering inspiring current research on stream processing systems.

Schedule and Reading List

We’ll meet in SW01 every Wednesday (from October 10 to November 28) in 2018. The time slot is 11:00-13:00.

2018/10/10 Session 1: Introduction to Large-Scale Data Processing and Optimisation

Introduction to R244 (Slides)
- Assignment details
- Guidance on how to read/review/present a paper
- Guidance on the Open Source Project
Overview of technologies for Big Data Processing (Slides)

2018/10/17 Session 2: Data flow programming: Map/Reduce to TensorFlow

Data flow programming, Cluster Computing

1. Yuan Yu, Michael Isard, D. Fetterly, M. Budiu, U. Erlingsson, P.K. Gunda, J. Currey:
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, OSDI, 2008.

Shyam Tailor (slides)
*2. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, I. Stoica:
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI, 2012.

3. Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, Russell Sears:
BOOM Analytics: Exploring Data-Centric, Declarative Programming for the Cloud, EuroSys, 2010.

*4. J. Dean, S. Ghemawat: MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004.

Tejas Kannan (slides)
5. Derek Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy and Steven Hand:
CIEL: A Universal Execution Engine for Distributed Data-Flow Computing, NSDI, 2011.

*6. Naiad

Frank McSherry's Talk on Differential Dataflow is here.

6.1. Frank McSherry, Rebecca Isaacs, Michael Isard, and Derek G. Murray,
Composable Incremental and Iterative Data-Parallel Computation with Naiad, no. MSR-TR-2012-105, 2012.

Indigo Orton (slides)
6.2. D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, M. Abadi: Naiad: A Timely Dataflow System, SOSP, 2013.

7. P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquini: Incoop: MapReduce for incremental computation, ACM SOCC, 2011.

Aaron Solomon (slides)
*8. M. Abadi et al.: TensorFlow: A System for Large-Scale Machine Learning, OSDI, 2016.

M. Abadi et al.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Preliminary White Paper, 2015.

9. M. Looks et al.: Deep Learning with Dynamic Computation Graphs, ICLR, 2017.

*10. M. Abadi, M. Isard and D. Murray: A Computational Model for TensorFlow - An Introduction, MAPL, 2017.

*11. Y. Yu et al.: Dynamic Control Flow in Large-Scale Machine Learning, EuroSys, 2018.

Devin Taylor (slides)
*12. R. Nishihara, P. Moritz, et al.: Ray: A Distributed Framework for Emerging AI Applications, OSDI, 2018.

2018/10/24 Session 3: Large-scale graph data processing: Processing models

Scalable distributed processing of graph structured data, processing model, and programming model

Vikash Singh (slides)
*1. G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski:
Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.

4. J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin: PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, OSDI, 2012.

2. Z. Qian, X. Chen, N. Kang, M. Chen, Y. Yu, T. Moscibroda, Z. Zhang: MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud, EuroSys, 2012.

3. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. Hellerstein: Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, VLDB, 2012.

Dmitry Kazhdan (slides)
*5. J. Shun and G. Blelloch: Ligra: A Lightweight Graph Processing Framework for Shared Memory, PPoPP, 2013.

6. J. Gonzalez, R. Xin, A. Dave, D. Crankshaw, M. Franklin, I. Stoica: GraphX: Graph Processing in a Distributed Dataflow Framework, OSDI, 2014.

7. B. Shao, H. Wang, Y. Li: Trinity: A Distributed Graph Engine on a Memory Cloud, SIGMOD, 2013.

8. A. Kyrola and G. Blelloch: GraphChi: Large-Scale Graph Computation on Just a PC, OSDI, 2012.

Marek Strelec (slides)
*9. A. Roy, I. Mihailovic, W. Zwaenepoel: X-Stream: Edge-Centric Graph Processing using Streaming Partitions, SOSP, 2013.

10. A. Roy, L. Bindschaedler, J. Malicevic and W. Zwaenepoel: Chaos: Scale-out Graph Processing from Secondary Storage, SOSP, 2015.

11. F. McSherry, M. Isard and D. Murray: Scalability! But at what COST?, HotOS, 2015.

12. X. Hu, Y. Tao, C. Chung: Massive Graph Triangulation, SIGMOD, 2013.

13. W. Xie, G. Wang, D. Bindel, A. Demers, J. Gehrke: Fast Iterative Graph Computation with Block Updates, VLDB, 2014.

Michael Tang (slides)
*14. S. Hong, H. Chafi, E. Sedlar, K. Olukotun: Green-Marl: A DSL for Easy and Efficient Graph Analysis, ASPLOS, 2012.

15. D. Prountzos, R. Manevich, K. Pingali: Elixir: A System for Synthesizing Concurrent Graph Programs, OOPSLA, 2012.

16. D. Nguyen, A. Lenharth, K. Pingali: A Lightweight Infrastructure for Graph Analytics, SOSP 2013.

17. D. Merrill, M. Garland, A. Grimshaw: Scalable GPU Graph Traversal, PPoPP, 2012.

Sami Alabed (slides)
18. A. Gharaibeh, E. Santos-Neto, L. Costa, M. Ripeanu: Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems, IEEE TPC, 2014.

19. H. Dai, Z. Kozareva, B. Dai, A. Smola and L. Song: Learning Steady-States of Iterative Algorithms over Graphs, ICML, 2018.

21. K. Nilakant, V. Dalibard, A. Roy, and E. Yoneki: PrefEdge: SSD Prefetcher for Large-Scale Graph Traversal. ACM International Systems and Storage Conference (SYSTOR), 2014.

2018/10/31 Session 4: Stream Data Processing and Data/Query Model

Data and continuous queries in stream data processing

Guest lecture: Alexandros Koliousis (Imperial College London)

Title: The design of a hybrid stream processing system for heterogeneous servers (slides)

Abstract: In the era of big data and machine learning, many data-intensive applications exhibit requirements (for instance, supporting high throughput with sub-second latency) that cannot be satisfied by traditional batch processing models. Current stream processing systems, such as Spark Streaming or Apache Flink, exploit the aggregated throughput of many processing servers to do so. Meanwhile, however, modern servers have become heterogeneous, often combining multi-core CPU with many-core GPU processors. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not well supported by current systems. For a system to exploit a heterogeneous server, it must execute streaming queries with sufficient data parallelism to fully utilise all available processors, and decide how to use each in the most effective way. It must do this while respecting the window semantics of streaming queries. This talk describes the design of Saber, a hybrid high-performance relational stream processing system for CPU and GPU processors. Saber executes window-based streaming queries in a data-parallel fashion using all available CPU and GPU cores. Instead of statically assigning query operators to heterogeneous processors, Saber employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. Our experimental comparison against state-of-the-art engines shows that Saber increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with both small and large window sizes. Using the Yahoo Streaming Benchmark we also show that Saber processes 79 million tuples per second with 8 CPU cores on a single node, outperforming Flink (by 3x) and Spark Streaming (by 7x) cluster-based deployments with 40 CPU cores.
But even these results with Saber are not satisfactory, as there is still a large performance gap between handwritten code and current operator implementations targeting a particular processor architecture. Our latest work focuses on improving CPU operator implementations by exploiting superscalar execution and SIMD parallelism.

Reading Club:

1. T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, S. Whittle:
MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB, 2013.

2. M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, I. Stoica: Discretized Streams: Fault-Tolerant Streaming Computation at Scale, SOSP, 2013.

3. R. Fernandez, M. Migliavacca, E. Kalyvianaki, P. Pietzuch: Making State Explicit for Imperative Big Data Processing, USENIX ATC, 2014.

4. D. Abadi, Y. Ahmad, M. Balazinska et al.: The Design of the Borealis Stream Processing Engine, CIDR, 2005.

5. S. Babu, J. Widom: Continuous Queries over Data Streams, SIGMOD Record 30(3), 2001.

6. B. Gedik, H. Andrade, K. Wu, P. Yu, and M. Doo: SPADE: The System S Declarative Stream Processing Engine, SIGMOD, 2008.

7. T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. Fernandez-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, S. Whittle:
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, VLDB, 2015.

8. R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, E. Chen:
Kineograph: Taking the Pulse of a Fast-Changing and Connected World, EuroSys, 2012.

Cristian Bodnar (slides)
*9. A. Floratou et al.: Dhalion: self-regulating stream processing in Heron, VLDB, 2017.

*10. T. Li, Z. Xu, J. Tang and Y. Wang: Model-Free Control for Distributed Stream Data Processing using Deep Reinforcement Learning, VLDB, 2018.

Tejas Kannan (slides)
*11. D. O’Keeffe, T. Salonidis, and P. Pietzuch: Frontier: Resilient Edge Processing for the Internet of Things, VLDB, 2018.

2018/11/07 Session 5: Map/Reduce and Deep Neural Network using TensorFlow Hands-on Tutorial

This tutorial session will be in SW02 and will start at 11:15 (until 13:15).
Data Flow Programming Tutorial: Hands-on tutorial session of distributed Map/Reduce using TensorFlow. (Tutor: Michael Schaarschmidt, Assistant: Chris Richardson).
http://www.cambridgeplus.net/tutorials/2018/
You should be familiar with Python.
If you want to work using your laptop (Linux), bring it with you!
Past years' tutorial sessions using Naiad and CIEL:
http://www.cambridgeplus.net/tutorials/2015/NaiadTutorial_2015/index.html
http://www.cambridgeplus.net/tutorials/2014/CIEL-DCN/

2018/11/14 Session 6: Machine Learning for Optimisation of Computer Systems

Exploring machine learning for optimisation in computer systems

1. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, A. Ng.:
Large scale distributed deep networks. NIPS, 2012.

2. G. Venkatesh et al.: Accelerating Deep Convolutional Networks using low-precision and sparsity, ICASSP, 2017.

3. V. Mnih et al.: Asynchronous Methods for Deep Reinforcement Learning, ICML, 2016.

Indigo Orton (slides)
*4. V. Dalibard, M. Schaarschmidt, and E. Yoneki: BOAT: Building Auto-Tuners with Structured Bayesian Optimization, WWW, 2017.

5. J. Ansel et al.: OpenTuner: An Extensible Framework for Program Autotuning, PACT, 2014.

*6. B. Bodin, L. Nardi, MZ Zia et al.: Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding, PACT, 2016.

Sami Alabed (slides)
*7. J. Ansel et al.:
PetaBricks: A Language and Compiler for Algorithmic Choice, PLDI, 2009.

8. V. Mnih et al.: Playing Atari with Deep Reinforcement Learning, NIPS, 2013.

9. J. Snoek, H. Larochelle, and R. Adams: Practical Bayesian Optimization of Machine Learning Algorithms, NIPS, 2012.

10. B. Teabe et al.: Application-specific quantum for multi-core platform scheduler, EuroSys, 2016.

11. G. Tesauro et al.: A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, ICAC, 2006.

12. A. Valadarsky et al.: A Machine Learning Approach to Routing, arXiv, 2017.

14. K. Arulkumaran et al.: A Brief Survey of Deep Reinforcement Learning, IEEE Signal Processing Magazine, 2017.

15. F. Hutter et al.: Algorithm Runtime Prediction: Methods & Evaluation, Artificial Intelligence (Elsevier), 2014.

*16. A. Mirhoseini, A. Goldie, H. Pham, B. Steiner, Q. Le and J. Dean: A Hierarchical Model for Device Placement, ICLR, 2018.

Michael Tang (slides)
*17. T. Kraska, A. Beutel, E. Chi, and J. Dean: The Case for Learned Index Structures, SIGMOD, 2018.

19. H. Liu, K. Simonyan, and Y. Yang: DARTS: Differentiable Architecture Search, arXiv, 2018.

Devin Taylor (slides)
*20. M. Jaderberg, V. Dalibard, S. Osindero, W.M. Czarnecki: Population based training of neural networks, arXiv, 2017.

*21. S. Palkar, J. Thomas, A. Shanbhag, D. Narayanan, H. Pirk, M. Schwarzkopf, S. Amarasinghe, and M. Zaharia:
Weld: A Common Runtime for High Performance Data Analytics, CIDR, 2017.

22. S. Palkar, J. Thomas, D. Narayanan, P. Thaker, R. Palamuttam, P. Negi, A. Shanbhag, M. Schwarzkopf, H. Pirk, S. Amarasinghe, S. Madden, M. Zaharia:
Evaluating End-to-End Optimization for Data Analytics Applications in Weld, VLDB, 2018.

23. H. Dai, E. Khalil, Y. Zhang, B. Dilkina, L. Song: Learning Combinatorial Optimization Algorithms over Graphs, NIPS, 2017.

Vikash Singh (slides)
*24. E. Liang et al.: RLlib: Abstractions for Distributed Reinforcement Learning, ICML, 2018.

*25. M. Schaarschmidt, S. Mika, K. Fricke, E. Yoneki: RLgraph: Flexible Computation Graphs for Deep Reinforcement Learning, ArXiv, 2018.

*26. A. Klein, S. Falkner, S. Bartels, P. Hennig, F. Hutter: Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, AISTATS, 2017.

*27. D. Kingma, J. Ba: Adam: A Method for Stochastic Optimization, ICLR, 2015.

2018/11/21 Session 7: Task scheduling, Performance, and Resource Optimisation

Optimisation examples in computer systems (e.g. scheduling, resource allocation...)

Shyam Tailor (slides)
*1. A. Mirhoseini et al.: Device Placement Optimization with Reinforcement Learning, ICML, 2017.

2. F. Yang et al.: LFTF: A Framework for Efficient Tensor Analytics at Scale, VLDB, 2017.

3. Y. You et al.: Scaling Deep Learning on GPU and Knights Landing clusters, SC, 2017.

4. I. Gog, M. Schwarzkopf, A. Gleave, R. Watson, S. Hand: Firmament: fast, centralized cluster scheduling at scale, OSDI, 2016.

Dmitry Kazhdan (slides)
*5. O. Alipourfard et al.: CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics, NSDI, 2017.

6. C. Delimitrou et al.: Quasar: Resource-Efficient and QoS-Aware Cluster Management, ASPLOS, 2014.

7. H. Mao et al.: Neural Adaptive Video Streaming with Pensieve, SIGCOMM, 2017.

8. S. Venkataraman et al.: Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics, NSDI, 2016.

9. N. Roy et al.: Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting, CLOUD, 2011.

10. K. LaCurts et al.: Cicada: Introducing Predictive Guarantees for Cloud Networks, HotCloud, 2014.

11. M. Carvalho et al.: Long-term SLOs for reclaimed cloud computing resources, SOCC, 2014.

12. H. Hoffmann et al.: Dynamic Knobs for Responsive Power-Aware Computing, ASPLOS, 2011.

13. N.J. Yadwadkar, B. Hariharan, J. Gonzalez and R. Katz: Faster Jobs in Distributed Data Processing using Multi-Task Learning, SDM, 2015.

14. X. Dutreilh et al.: Using Reinforcement Learning for Autonomic Resource Allocation in Clouds: Towards a Fully Automated Workflow, ICAS, 2011.

15. J. Eastep et al.: Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures, ICAC, 2011.

16. H. Hoffmann et al.: SEEC: A Framework for Self-aware Management of Multicore Resources, MIT Technical Report, 2011.

17. E. Ipek et al.: Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, ISCA, 2008.

18. Y. Kang et al.: Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge, ASPLOS, 2017.

19. S. Teerapittayanon et al.: Distributed Deep Neural Networks over the Cloud, the Edge and End Devices, ICDCS, 2017.

20. B. Zoph et al.: Learning Transferable Architectures for Scalable Image Recognition, arXiv, 2017.

21. D. Golovin et al.: Google Vizier: A Service for Black-Box Optimization, KDD, 2017.

22. D. Baylor et al.: TFX: A TensorFlow-Based Production-Scale Machine Learning Platform, KDD, 2017.

23. H. Mao et al.: Resource Management with Deep Reinforcement Learning, HotNets, 2016.

24. M. Raghu et al.: On the Expressive Power of Deep Neural Networks, PMLR, 2017.

25. D. Van Aken et al.: Automatic Database Management System Tuning Through Large-scale Machine Learning, SIGMOD, 2017.

26. A. Pavlo et al.: Self-Driving Database Management Systems, CIDR, 2017.

Cristian Bodnar (slides)
*29. Z. Jia, M. Zaharia, and A. Aiken: Beyond Data and Model Parallelism for Deep Neural Networks, arXiv, 2018.

*30. L. Espeholt et al.: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, ICML, 2018.

Aaron Solomon (slides)
*31. T. Chen, T. Moreau, Z. Jiang, L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy:
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, OSDI, 2018.

32. T. Chen, T. Moreau, Z. Jiang, L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy:
TVM: End-to-End Compilation Stack for Deep Learning, SysML, 2017.

Marek Strelec (slides)
*33. A. Ratner, S. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré: Snorkel: Rapid Training Data Creation with Weak Supervision, VLDB, 2017.

34. A. Ratner, B. Hancock, J. Dunnmon, R. Goldman, and C. Ré: Snorkel MeTaL: Weak Supervision for Multi-Task Learning, DEEM, 2018.

2018/11/28 Session 8: Presentation of Open Source Project Study

Start @11:00 in SW01
Presentation of Open Source Project Study by all (~7-8 minutes of presentation plus Q&A for each presentation)

11:00 Aaron Solomon (Naiad) Parallel Graph Genome Assembly in Naiad (slides)
11:10 Chi Ian Tang (Snorkel) Object Detection in Snorkel (slides)
11:20 Cristian Bodnar (TensorFlow +(PyTorch)) NeuroEvolution of Augmented Topologies in TensorFlow (slides)
11:30 Devin Taylor (PyTorch) Investigating scalability of recurrent network using dynamic batching in PyTorch (slides)
11:40 Dmitry Kazhdan (Snorkel) Multi-Modal Training Data Creation with Snorkel (slides)
11:50 Indigo Orton (RLGraph) Assessing RLGraph prototyping capability (slides)
12:00 Marek Strelec (Ray) Fast decoding in neural machine translation with Ray (slides)
12:10 Sami Alabed (Ray+Snorkel) Weak Supervised Learning on Ray using Snorkel (slides)
12:20 Shyam Tailor (Snorkel) Using Snorkel to Generate Human Activity Data Labels (slides)
12:30 Tejas Kannan (Apache Spark) Benchmarking Apache Spark against Elasticsearch on Partial-Matching Textual Queries (slides)
12:40 Vikash Singh (Keras (TF, Theano, CNTK)) Analysing Keras Performance Using TensorFlow, Theano, and CNTK backends (slides)

12:50 Wrap-up Discussion

Coursework 1 (Reading Club)

The reading club will require you to read between 1 and 3 papers every week. You need to fill out a simple review_log (MS Word format, text format) prior to each session and email it to me by the end of Sunday. The minimum requirement is one review_log per session, but you can read as many papers as you want and fill in a review_log for each paper you read. The review_log is not marked but is assessed on a 'tick' basis.

At each session, 3-4 papers are selected under the session topic, and if you are assigned to present your review work, please prepare slides for a 15-20 minute presentation. Your presented material should also be emailed by the following day Wednesday. You will present your review work approximately twice during the course. Papers fall into the following two types, and you can focus on the specified aspects when reviewing each.

Full length papers
- What is the significant contribution?
- What is the difference from the existing works?
Short length papers
- What is the novel idea?
- What is required to complete the work?

Coursework 2 (Reports)

The following three reports are required. They can be extended from your reading assignments for the reading club, or can cover different material within the scope of data centric networking.

Review report on a full-length paper (max 1800 words)
- Describe the contribution of paper in depth with criticism
- Crystallise the significant novelty in contrast to the other related work
- Suggestion for future work
Survey report on sub-topic in data centric networking (~1800 - max 2000 words)
- Pick up to 5 papers as core papers in your survey scope
- Read the above and expand your reading through related work
- Consolidate your view and write it up as your survey paper
- See how to write a survey paper
Project study and exploration of a prototype (max 2500 words)
- See Open Source Projects
- What is the significance of the project in the research domain?
- Compare with the similar and succeeding projects
- Demonstrate the project by exploring its prototype
- Please email your project selection (MS word format or text format <150 words) by 17:00 on November 1, 2018.
- Project presentation on November 28, 2018.

Report 1 should be handed in by the end of the 5th week (November 9, 2018 - 16:00) and report 2 by the 7th week (November 30, 2018 - 16:00). Report 3 should be handed in by the end of the Michaelmas term (January 16, 2019 - 16:00 - but if you can finish by the 20th of December, that will be good!).

Assessment

The final grade for the course will be provided as a letter grade or percentage and the assessment will consist of two parts:

25%: for reading club (Participation, Presentation + tick of review_log and hands-on tutorial)
- 10%: Presentation
- 15%: Participation
75%: for the three reports
- 15%: Intensive review report
- 25%: Survey report
- 35%: Project study
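
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted percentage from per-component marks. It assumes each component is marked on a 0-100 scale and combined linearly, which this page does not state; the function name and the example marks are hypothetical.

```python
# Weights (percent of the final grade) taken from the assessment breakdown.
WEIGHTS = {
    "presentation": 10,
    "participation": 15,
    "review_report": 15,
    "survey_report": 25,
    "project_study": 35,
}


def final_percentage(marks):
    """Combine per-component marks (assumed 0-100 each) into one percentage."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    # Linear weighting is an assumption made for illustration.
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks, for illustration only.
example_marks = {
    "presentation": 70,
    "participation": 80,
    "review_report": 60,
    "survey_report": 70,
    "project_study": 75,
}

if __name__ == "__main__":
    print(final_percentage(example_marks))
```

The project study carries the largest single weight, so under this assumed scheme it moves the final figure more than any other component.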

Open Source Projects

See the candidate Open Source Projects in data centric networking. The list is not exhaustive. If you choose a project that is not on the list, please discuss it with me. The purpose of this assignment is to understand the prototype of the proposed architecture, algorithms, and systems by running an actual prototype, and to present and explain to other people how the prototype runs, the setup process of the prototype, and any additional work you have done, including your own applications. This experience will give you a better understanding of the project. These Open Source Projects come with a set of published papers, and you should be able to explore the aspects of the papers that interest you by running the prototype. Some projects are rather large and may require an extensive environment and time; make sure you are able to complete this assignment.

How to Read/Review a Paper

The following papers give guidance on how to read/review a paper.

S. Keshav: How to Read a Paper, ACM SIGCOMM Computer Communication Review, Volume 37, Number 3, p. 83, July 2007.
T. Roscoe: Writing Reviews for Systems Conferences, 2007.
Simon Peyton-Jones: How to write a great paper and give a great talk about it, Microsoft Research Cambridge.
David A. Patterson: How to Have a Bad Career in Research/Academia, 2001.

Further supplement: see ‘how to read/review a paper’ section in Advanced Topics in Computer Systems by Steven Hand.

Presentations

Presentations should be about 15-20 minutes long, where you need to cover the following aspects.

What are the background and the problem domain of the paper? What is the motivation of the presented work? How does it differ from existing work? What is the novel idea? How did the paper change (or fail to change) research in the community?
What is the significant contribution? How did the authors tackle the problem? Did the authors obtain the expected results from their experiments?
What do you think of the paper, and why? What is the takeaway message for you (and for the research community)? What is required to complete the work?

The following document aids in presenting a review.

Perdita Stevens (University of Edinburgh): How to [read, present, review] a research paper.

How to write a survey paper

A survey paper provides the readers with an exposition of existing work that is comprehensive and organized. It must expose the relevant details of the area being surveyed, but it is important to keep a consistent level of detail and to avoid simply listing the different works. Thus a good survey paper should summarise recent research results in a novel way that integrates and adds understanding to work in the field. For example, you can classify the existing literature in your own way, develop a perspective on the area, and evaluate trends. Thus, after defining the scope of your survey, 1) classify and organize the trends, 2) critically evaluate the approaches (pros/cons), and 3) add your own analysis or explanation (e.g. table, figure). Adding references and pointers to further in-depth information is also important (summary from Rich Wolski’s note).

Papers for OS Principles (Distributed Storage and Deterministic Parallelism)

The following papers will help you to understand distributed storage and parallelism.

Systems Research and System Design

1. B. Lampson: Hints for Computer Systems Design (Revised), ACM OSR 1983.

Distributed Storage

2. S. Ghemawat, H. Gobioff, and S. Leung: The Google File System, ACM SOSP 2003.
3. F. Chang et al: BigTable: A Distributed Storage System for Structured Data, USENIX OSDI 2006.
4. G. DeCandia et al: Dynamo: Amazon's Highly Available Key-value Store, ACM SOSP 2007.

Deterministic Parallelism

5. J. Devietti et al: DMP: Deterministic Shared Memory Multiprocessing, ACM ASPLOS 2009.
6. A. Aviram et al: Efficient System-Enforced Deterministic Parallelism, USENIX OSDI 2010.
7. T. Liu et al: Dthreads: Efficient Deterministic Multithreading, ACM SOSP 2011.

Contact Email

Please email eiko.yoneki@cl.cam.ac.uk to submit coursework or with any questions.

Computer Laboratory