Large-Scale Data Processing and Optimisation (2025-2026 Michaelmas Term)

Overview

This module provides an introduction to large-scale data processing, optimisation, and their impact on computer system architecture. Large-scale distributed applications with high-volume data processing, such as machine learning training, will grow ever more important. Supporting the design and implementation of robust, secure, and heterogeneous large-scale distributed systems is essential. For distributed systems with a large and complex parameter space, tuning and optimising the system is becoming an important and complex task, one that must also account for the characteristics of the input data and the algorithms used in the applications. Algorithm designers are often unaware of the constraints imposed by systems and of the best way to take them into account when designing algorithms for massive volumes of data. Conversely, computer systems often miss advances in algorithm design that could cut processing time and scale up the size of problem a system can address. This course will also explore integrating machine learning approaches (e.g. Bayesian Optimisation, Reinforcement Learning) into system optimisation.

Recent computer systems face the massive task of enabling heavy data processing for fast training and inference, e.g. for Large Language Models (LLMs). This demands more sophisticated system architectures, advanced hardware, and compilers capable of fast tensor operations. Complex optimisation plays a crucial role here and is fundamental at various stages of development.

On completion of this module, the students should:

Understand key concepts of scalable data processing approaches in future computer systems
Obtain a clear understanding of building distributed systems using data-centric programming and large-scale data processing
Understand the large and complex parameter space in computer system optimisation and the applicability of Machine Learning approaches

Module Structure

The module consists of 8 sessions, 5 of which cover specific aspects of large-scale data processing research. Each of these sessions discusses 4-5 papers, presented by the assigned students. One session is a hands-on tutorial on dataflow programming fundamentals. The first session gives advice on how to read and review a paper, together with a brief introduction to different perspectives in large-scale data processing and optimisation. The last session is dedicated to student presentations of open source project studies.

Schedule and Reading List

All sessions will be held in the classroom (FW26). We will meet every Wednesday from October 15 to December 3, 2025, from 10:00 to 12:00.

2025/10/15 Session 1: Introduction to Large-Scale Data Processing and Optimisation

Introduction to R244 (slides: Course Guide)
- Assignment details
- Guidance on how to read/review/present a paper
- Guidance on the Open Source Mini Project
Overview of technologies for large-scale data processing and optimisation (slides: Overview)

2025/10/22 Session 2: Data flow programming

Dataflow programming and cluster computing are essential for large-scale data processing. In ML, dataflow programming holds the key to natural, modular, streamlined ML specification that integrates pre- and post-processing and covers typical ML needs.
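To make the dataflow idea concrete, here is a toy sketch using only the Python standard library: a computation is declared as a graph of pure functions, and an executor runs each node once all of its inputs are available. The `Dataflow` class and its `node`/`run` methods are hypothetical and are not the API of DryadLINQ, Naiad, TensorFlow or any other system listed below; real systems add data partitioning, distribution across machines, and fault tolerance.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


class Dataflow:
    """Hypothetical toy dataflow graph: each node is a pure function of its upstream outputs."""

    def __init__(self) -> None:
        self.ops: Dict[str, Callable[..., Any]] = {}
        self.deps: Dict[str, List[str]] = {}

    def node(self, name: str, fn: Callable[..., Any], *deps: str) -> None:
        # Declare a node; nothing executes yet (the graph is built before it runs).
        self.ops[name] = fn
        self.deps[name] = list(deps)

    def run(self, **feeds: Any) -> Dict[str, Any]:
        values: Dict[str, Any] = dict(feeds)
        # static_order() yields every node only after all of its dependencies.
        for name in TopologicalSorter(self.deps).static_order():
            if name in values:
                continue  # a source fed in by the caller
            values[name] = self.ops[name](*(values[d] for d in self.deps[name]))
        return values


# Pre-processing, a stand-in "model" step and post-processing as one graph.
g = Dataflow()
g.node("clean", lambda xs: [x for x in xs if x is not None], "raw")
g.node("scaled", lambda xs: [x / max(xs) for x in xs], "clean")
g.node("total", sum, "scaled")
out = g.run(raw=[2, None, 4, 8])
print(out["scaled"], out["total"])
```

Because the whole graph is declared before it runs, an executor is free to schedule independent nodes in parallel or place them on different machines; exploiting that freedom at scale is what the systems in this session are about.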

1. Yuan Yu, Michael Isard, D. Fetterly, M. Budiu, U. Erlingsson, P.K. Gunda, J. Currey:
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, OSDI, 2008.

2. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, I. Stoica:
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI, 2012.

3. J. Gjengset, M. Schwarzkopf, J. Behrens, L. T. Araujo, M. Ek, E. Kohler, M. F. Kaashoek and R. Morris:
Noria: dynamic, partially-stateful data-flow for high-performance web applications, OSDI, 2018.

4. J. Dean, S. Ghemawat: MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004.

Deniz Alkan (slides)
5. Derek Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy and Steven Hand:
Ciel: a universal execution engine for distributed data-flow computing, NSDI 2011.

6. Naiad (see also Frank McSherry's talk on Differential Dataflow).

6.1. Frank McSherry, Rebecca Isaacs, Michael Isard, and Derek G. Murray,
Composable Incremental and Iterative Data-Parallel Computation with Naiad, no. MSR-TR-2012-105, 2012.

Florian Klein (slides)
6.2. D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, M. Abadi: Naiad: A Timely Dataflow System, SOSP, 2013.

7. P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquini: Incoop: MapReduce for incremental computation, ACM SOCC, 2011.

8. M. Abadi et al.: TensorFlow: A system for large-scale machine learning, OSDI, 2016.

Woon Yee Ng (slides)

8.1. M. Abadi et al.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Preliminary White Paper, 2015.

8.2. M. Abadi, M. Isard and D. Murray: A Computational Model for TensorFlow - An Introduction, MAPL, 2017.

9. M. Looks et al.: Deep Learning with Dynamic Computation Graphs, ICLR, 2017.

10. Y. Yu et al.: Dynamic Control Flow in Large-Scale Machine Learning, EuroSys, 2017.

11. R. Nishihara, P. Moritz, et al.: Ray: A Distributed Framework for Emerging AI Applications, OSDI, 2018.

12. M. Schaarschmidt, S. Mika, K. Fricke, E. Yoneki: RLgraph: Flexible Computation Graphs for Deep Reinforcement Learning, SysML, 2019.

14. S. Li, Y. Zhao, R. Varma, et al.: PyTorch Distributed: Experiences on Accelerating Data Parallel Training, VLDB, 2020.

15. T. Lévai, F. Németh, and G. Rétvári: Batchy: Batch-scheduling Data Flow Graphs with Service-level Objectives, NSDI, 2020.

Yavuz Ferhatosmanoglu (slides)

16. P. Barham, et al.: Pathways: Asynchronous Distributed Dataflow for ML, MLSys, 2022.

17. L. Zheng, et al.: Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning, OSDI, 2022.

Ognen Pendarovski (slides)

18. H. Zhu, et al.: MSRL: Distributed Reinforcement Learning with Dataflow Fragments, USENIX ATC, 2023.

2025/10/29 Session 3: Data Flow Programming Tutorial

This tutorial session will be in FW26. You will use your own laptop.

Data Flow Programming Tutorial: a hands-on tutorial session on distributed dataflow programming.
Tutor: Sami Alabed (Google DeepMind; email: sami.alabed@gmail.com); assistant: Youhe Jiang (email: yj367@cam.ac.uk).

You should be familiar with Python.
Linux is preferable, but any OS on your laptop should be fine!

Past years' tutorial sessions:
2024: Data Flow Programming Tutorial
2021: Dataflow programming using TensorFlow
2020: Dataflow programming using TensorFlow
2019: Dataflow programming using TensorFlow
2015: Naiad Tutorial
2014: Dataflow programming using CIEL

2025/11/05 Session 4: Large-scale graph data processing and Search Space Optimisation

Scalable distributed processing of graph-structured data: processing models and programming models.

Search Space Optimisation and optimisation methodologies.
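The vertex-centric model popularised by Pregel (paper 1 below) can be simulated in a single process: in each superstep, every vertex that received messages updates its state and sends messages along its out-edges, and a barrier separates one superstep from the next. The sketch below computes single-source shortest paths, one of the example applications in the Pregel paper. The function name and graph representation are hypothetical, and the sketch ignores partitioning, combiners and fault tolerance; it also assumes every vertex appears as a key of the graph.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # vertex -> [(neighbour, edge weight)]


def pregel_sssp(graph: Graph, source: int) -> Dict[int, float]:
    """Toy, single-process simulation of vertex-centric shortest paths."""
    dist = {v: float("inf") for v in graph}
    inbox: Dict[int, List[float]] = {source: [0.0]}  # messages for the next superstep
    while inbox:  # halt when no vertex received a message
        outbox: Dict[int, List[float]] = {}
        for v, msgs in inbox.items():  # only vertices with messages are active
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                for u, w in graph[v]:
                    outbox.setdefault(u, []).append(best + w)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return dist


g = {0: [(1, 4.0), (2, 1.0)], 1: [(3, 1.0)], 2: [(1, 2.0)], 3: [], 4: [(0, 1.0)]}
print(pregel_sssp(g, 0))
```

A vertex that improves its distance in a later superstep (vertex 1 here, first reached with cost 4 and later with cost 3) simply sends new messages, so the computation converges without any global coordination beyond the superstep barrier.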

1. G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.

2. Z. Qian, X. Chen, N. Kang, M. Chen, Y. Yu, T. Moscibroda, Z. Zhang: MadLINQ: large-scale distributed matrix computation for the cloud, EuroSys, 2012.

3. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. Hellerstein: Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, VLDB, 2012.

Shreyas Ravishankar (slides)
4. J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin: PowerGraph: distributed graph-parallel computation on natural graphs, OSDI, 2012.

Ruquaiya Shuaibu (slides)
5. J. Shun and G. Blelloch: Ligra: A Lightweight Graph Processing Framework for Shared Memory, PPoPP, 2013.

6. J. Gonzalez, R. Xin, A. Dave, D. Crankshaw, M. Franklin, I. Stoica: GraphX: Graph Processing in a Distributed Dataflow Framework, OSDI, 2014.

8. A. Kyrola and G. Blelloch: GraphChi: Large-scale graph computation on just a PC, OSDI, 2012.

Jan Pytel (slides)
9. A. Roy, I. Mihailovic, W. Zwaenepoel: X-Stream: Edge-Centric Graph Processing using Streaming Partitions, SOSP, 2013.

10. A. Roy, L. Bindschaedler, J. Malicevic and W. Zwaenepoel: Chaos: Scale-out Graph Processing from Secondary Storage, SOSP, 2015.

11. F. McSherry, M. Isard and D. Murray: Scalability! But at what COST?, HotOS, 2015.

14. S. Hong, H. Chafi, E. Sedlar, and K. Olukotun: Green-Marl: A DSL for Easy and Efficient Graph Analysis, ASPLOS, 2012.

15. D. Prountzos, R. Manevich, K. Pingali: Elixir: A System for Synthesizing Concurrent Graph Programs, OOPSLA, 2012.

19. Z. Jia, Y. Kwon, G. Shipman, P. McCormick, M. Erez, A. Aiken: A Distributed Multi-GPU System for Fast Graph Processing, VLDB, 2018.

21. K. Nilakant, V. Dalibard, A. Roy, and E. Yoneki: PrefEdge: SSD Prefetcher for Large-Scale Graph Traversal. ACM International Systems and Storage Conference (SYSTOR), 2014.

22. L. Bindschaedler, J. Malicevic, N. Schiper, A. Goel, W. Zwaenepoel: Rock you like a Hurricane: taming skew in large scale analytics, EuroSys, 2018.

24. W. L. Hamilton, et al.: Inductive Representation Learning on Large Graphs, NIPS, 2017.

Carl Seifert (slides)

27. J. Ansel et al.: OpenTuner: an extensible framework for program autotuning, PACT, 2014.

Syed Hasan (slides)

28. J. Ansel et al.: PetaBricks: A language and compiler for algorithmic choice, PLDI, 2009.

29. O. Alipourfard et al.: CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics, NSDI, 2017.

2025/11/12 Session 5: Probabilistic Programming

Guest lecture: Brooks Paige (University College London) @11:00.

Title: Deep generative models with latent variables (slides)

Abstract: In this lecture I will give an introduction to auto-encoding variational Bayes, a flexible framework for defining and training deep generative models from data. These models define a probabilistic model as a “decoder” or generator from a latent representation space, while simultaneously training an “encoder” which approximates the Bayesian posterior over representations given data. I will give a brief introduction to probabilistic generative models more broadly, describe more specifically how (and why) to train these particular deep generative models, and (time permitting) briefly show two applications: (i) using the continuous latent representation space for optimisation; and (ii) semi-supervised learning, where we have a large set of unlabelled data but only a small number of labels and would like to fit a predictive model.
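For reference, the objective behind auto-encoding variational Bayes is usually written as below; the notation is ours and may differ from the lecture slides. Here $p_\theta(x \mid z)$ is the decoder, $p(z)$ the prior over the latent representation, and $q_\phi(z \mid x)$ the encoder approximating the posterior. Training maximises this evidence lower bound (ELBO) jointly in $\theta$ and $\phi$; the reparameterisation on the right, shown for the common Gaussian-encoder case, lets gradients pass through the sampling step.

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I).
```

The gap between the two sides of the inequality is $\mathrm{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x))$, so maximising the bound also pushes the encoder towards the true posterior over representations.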

Bio: Brooks Paige has been an Associate Professor in Machine Learning in the AI Centre, Department of Computer Science, at UCL since 2019. Prior to joining UCL he was a Turing Research Fellow, from 2016 to 2019, at the University of Cambridge and the Alan Turing Institute. He holds a D.Phil in Engineering Science from the University of Oxford (2016), an M.A. in Statistics from Columbia University (2013), and a B.A. in Mathematics from Amherst College (2006). His research interests include Bayesian inference, deep generative models, AI for drug design, and explainable machine learning.

Reading Club @10:00.

1. E. Bingham et al.: Pyro: Deep Universal Probabilistic Programming, Journal of Machine Learning Research, 2019.

2. D. Tran et al.: Edward: A library for probabilistic modeling, inference, and criticism, arXiv, 2017.

4. F. Wood, J. van de Meent, V. Mansinghka: A new approach to probabilistic programming inference, AISTATS, 2014.

5. B. Paige and F. Wood: A compilation target for probabilistic programming languages, ICML, 2014.

6. J. Ai et al.: HackPPL: a universal probabilistic programming language, MAPL, 2019.

Andreas Pletschko (slides)
7. V. Dalibard, M. Schaarschmidt, and E. Yoneki: BOAT: Building Auto-Tuners with Structured Bayesian Optimization, WWW, 2017.

9. W. Neiswanger et al.: ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language, arXiv, 2019.

11. M. Balandat et al.: BoTorch: Bayesian Optimization in PyTorch, arXiv, 2020.

Juntong Deng (slides)

14. J. Shao, et al.: Tensor Program Optimization with Probabilistic Programs, NeurIPS, 2022.

16. A. Maraval et al.: Sample-Efficient Optimisation with Probabilistic Transformer Surrogates, arXiv, 2022.

17. J. Snoek, H. Larochelle, and R. Adams: Practical Bayesian Optimization of Machine Learning Algorithms, NIPS, 2012.

18. A. Klein, S. Falkner, S. Bartels, P. Hennig, F. Hutter: Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, AISTATS, 2017.

19. R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. Gonzalez, I. Stoica: Tune: A Research Platform for Distributed Model Selection and Training, ICML, 2018.

Florian Klein (slides)

21. Z. Wang, C. Gehring, P. Kohli, and S. Jegelka: Batched Large-scale Bayesian Optimization in High-dimensional Spaces, AISTATS, 2018.

22. Grosnit et al.: High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning, arXiv, 2021.

25. C. Lin, J. Miano, and E. Dyer: Bayesian optimization for modular black-box systems with switching costs, UAI, 2021.

26. E. H. Lee, D. Eriksson, V. Perrone and M. Seeger: A Nonmyopic Approach to Cost-Constrained Bayesian Optimization, UAI, 2021.

27. G. Claret, et al.: Bayesian Inference using Data Flow Analysis, ESEC/FSE, 2013.

30. A. Lew, et al.: Probabilistic Programming with Stochastic Probabilities, PLDI, 2023.

2025/11/19 Session 6: Optimisations in Computer Systems

The many faces of optimisation in computer systems with ML (e.g. AutoTuners, Automatic Parallelism, Databases, Device Placement/Scheduling).
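At its simplest, an autotuner treats the system as a black box: it proposes a configuration, measures its cost, and keeps the best one found within a budget. The sketch below uses plain random search, a common baseline, as a stand-in for the Bayesian optimisation and reinforcement learning methods in the papers below. The knobs (`threads`, `batch`) and the `toy_latency` cost model are hypothetical; a real tuner would run a benchmark for every proposal.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]


def random_search(space: Dict[str, List[int]], cost: Callable[[Config], float],
                  budget: int, seed: int = 0) -> Tuple[Optional[Config], float]:
    """Sample `budget` configurations uniformly and return the cheapest one seen."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for _ in range(budget):
        cfg = {knob: rng.choice(values) for knob, values in space.items()}
        c = cost(cfg)  # in a real system this is an expensive measurement
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost


def toy_latency(cfg: Config) -> float:
    # Hypothetical cost surface with its minimum at threads=8, batch=64.
    return (cfg["threads"] - 8) ** 2 + abs(cfg["batch"] - 64) / 16


space = {"threads": [1, 2, 4, 8, 16], "batch": [16, 32, 64, 128]}
cfg, latency = random_search(space, toy_latency, budget=50)
print(cfg, latency)
```

Bayesian optimisation (as in BOAT, CherryPick or SMAC3 from the reading lists) replaces the uniform sampling with a surrogate model that chooses the next configuration to measure, which matters when each measurement takes minutes rather than microseconds.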

Deniz Alkan (slides)
1.1. A. Mirhoseini et al.: Device Placement Optimization with Reinforcement Learning, ICML, 2017.
1.2. A. Mirhoseini, A. Goldie, H. Pham, B. Steiner, Q. Le and J. Dean: A Hierarchical Model for Device Placement, ICLR, 2018.

2.1. A. Mirhoseini, A. Goldie, et al.: A graph placement methodology for fast chip design, Nature, 2021.
2.2. A. Mirhoseini, A. Goldie, et al.: Chip Placement with Deep Reinforcement Learning, ISPD, 2020.

3. R. Addanki, S. B. Venkatakrishnan, S. Gupta, H. Mao, M. Alizadeh: Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning, arXiv, 2019.

4. R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul: Neo: A Learned Query Optimizer, VLDB, 2019.

6. A. Pavlo et al.: Self-Driving Database Management Systems, CIDR, 2017.

7. H. Mao et al.: Park: An Open Platform for Learning-Augmented Computer Systems, OpenReview, 2019.

8. A. Floratou et al.: Dhalion: self-regulating stream processing in Heron, VLDB, 2017.

9. G. Li, X. Zhou, S. Li, and B. Gao: QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning, VLDB, 2019.

10. D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang: Automatic database management system tuning through large-scale machine learning, SIGMOD, 2017.

12. D. Van Aken, D. Yang, S. Brillard, A. Fiorino, B. Zhang, C. Bilien, and A. Pavlo: An Inquiry into Machine Learning-based Automatic Configuration Tuning Services on Real-World Database Management Systems, VLDB, 2021.

13. Y. Zhao et al.: TOD: GPU-accelerated Outlier Detection via Tensor Operations, VLDB, 2022.

16. T. Kraska, M. Alizadeh, A. Beutel, E. H. Chi, J. Ding, A. Kristo, G. Leclerc, S. Madden, H. Mao, and V. Nathan: SageDB: A learned database system, CIDR, 2019.

Ognen Pendarovski (slides)
19. D. Ha and J. Schmidhuber: World Models, arXiv, 2018 (https://worldmodels.github.io).

20. R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, and T. Kraska: Bao: Learning to Steer Query Optimizers, VLDB, 2020.

21. E. Liang, Z. Wu, M. Luo, S. Mika, J. Gonzalez, and I. Stoica: RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem, NeurIPS, 2021.

32. S. Cereda, S. Valladares, P. Cremonesi and S. Doni: CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions, VLDB, 2021.

Jan Pytel (slides)
33. Z. Jia, et al.: Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization, OSDI, 2022.

Woon Yee Ng (slides)

34. J. Xing, et al.: Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance, MLSys, 2022.

35. M. Lindauer, et al.: SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization, Journal of Machine Learning Research, 2021.

37. M. Wagenländer, et al.: Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections, SOSP, 2024.

39. X. Miao, et al.: FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning, arXiv, 2024.

41. Y. Mei, et al.: Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs, arXiv, 2024.

42. J. Juravsky, et al.: Hydragen: High-Throughput LLM Inference with Shared Prefixes, arXiv, 2024.

43. H. Zhang, et al.: LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference, ISCA, 2024.

45. J. Cheng, et al.: A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats, arXiv, 2023.

46. Y. Zhong, et al.: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving, OSDI, 2024.

47. W. Kwon, et al.: Efficient Memory Management for Large Language Model Serving with PagedAttention, SOSP, 2023.

Shreyas Ravishankar (slides)

48. Y. Jiang, et al.: ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments, MLSys, 2025.

2025/11/26 Session 7: Optimisations in ML Compiler

ML Compiler, SuperOptimisation, High-dimensional Parameter Space, Phase Ordering Problem, Reinforcement Learning.

1. M. Trofin et al.: MLGO: a Machine Learning Guided Compiler Optimizations Framework, arXiv, 2021.

Yavuz Ferhatosmanoglu (slides)

2. G. He, S. Parker, E. Yoneki: X-RLflow: Graph Reinforcement Learning for Neural Network Subgraphs Transformation, MLSys, 2023.

3. L. Ma et al.: Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks, OSDI, 2020.

Ruquaiya Shuaibu (slides)

4. L. Zheng et al.: EinNet: Optimizing Tensor Programs with Derivation-Based Transformations, OSDI, 2023.

5. S. Nakandala et al.: A Tensor Compiler for Unified Machine Learning Prediction Serving, OSDI, 2020.

6. Y. Ding et al.: Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs, ASPLOS, 2023.

7. T. Chen et al.: Learning to Optimize Tensor Programs, NIPS, 2018.

8. T. Chen, T. Moreau, Z. Jiang, L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy: TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, OSDI, 2018.

9. T. Chen, T. Moreau, Z. Jiang, L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy: TVM: End-to-End Compilation Stack for Deep Learning, SysML, 2017.

Andreas Pletschko (slides)
11. Z. Jia, O. Padon, J. Thomas, T. Warszawski, M. Zaharia, A. Aiken: TASO: Optimizing Deep Learning Computation with Automated Generation of Graph Substitutions, SOSP, 2019.

12. H. Wang, J. Zhai, M. Gao, Z. Ma, S. Tang, L. Zheng, Y. Li, K. Rong, Y. Chen, and Z. Jia: PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections, OSDI, 2021.

13. A. Qiao, S. K. Choe, S. Subramanya, W. Neiswanger, Q. Ho, H. Zhang, G. R. Ganger, E. Xing: Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, OSDI, 2021.

Syed Hasan (slides)
14. Y. Yang, et al.: Equality Saturation for Tensor Graph Superoptimization, MLSys, 2021.

15. L. Zheng, et al.: TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers, NeurIPS, 2021.

17. R. Senanayake, et al.: A sparse iteration space transformation framework for sparse tensor algebra, OOPSLA, 2020.

Juntong Deng (slides)

18. L. Zheng, et al.: Ansor: Generating High-Performance Tensor Programs for Deep Learning, OSDI, 2020.

20. S. Zheng, et al.: FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System, ASPLOS, 2020.

24. A. Paliwal et al.: REGAL: Transfer Learning For Fast Optimization of Computation Graphs, arXiv, 2019.

Carl Seifert (slides)

25. F. Kjolstad, et al.: The tensor algebra compiler, OOPSLA, 2017.

26. C. Cummins, et al.: Meta Large Language Model Compiler: Foundation Models of Compiler Optimization, Meta, 2024.

27. J. Magalhães, et al.: C2taco: Lifting tensor code to taco, GPCE, 2023.

28. C. Hvarfner, et al.: Vanilla Bayesian Optimization Performs Great in High Dimensions, PMLR, 2024.

29. D. Eriksson, et al.: Scalable Global Optimization via Local Bayesian Optimization, NeurIPS, 2019.

30. Y. Wu, et al.: Swift: Compiled Inference for Probabilistic Programming Languages, IJCAI, 2016.

Optimisation in Computer Systems: Additional Reference Papers.

See the additional papers on optimisation in computer systems for extended reading.

2025/12/03 Session 8: Presentation of Open Source Project Study

Presentation of Mini Project (work in progress, plan).

Andreas Pletschko (timely-dataflow/differential-dataflow)   Real-time Flight Data Processing with timely-dataflow (slides)

Carl Seifert (Timely Dataflow / iceoryx2)   Using iceoryx2 for low-latency worker communication in Timely Dataflow (slides)

Deniz Alkan (Pyro)   Optimising Latent Space Representations for Variational Autoencoders in Pyro (slides)

Florian Klein (Timely Dataflow / Naiad)   Arrow-Backed Streaming Dataframes in Timely Dataflow (slides)

Jan Pytel (X-Stream)   Evaluating X-Stream: Edge-centric Graph Processing using Streaming Partitions (slides)

Juntong Deng (vLLM)   Memory operations tracing in LLM inference (slides)

Ognen Pendarovski (vLLM)   Memory Optimizations and Multi-LoRA serving prototype (slides)

Ruquaiya Shuaibu (Apache Flink)   Real-time Air Quality Anomaly Detection using Apache Flink (slides)

Syed Hasan (BoTorch)   An evaluation of BoTorch for Structured Bayesian Optimisation and Human-in-the-loop Approaches (slides)

Shreyas Ravishankar (JAX/PyTorch)   Analysing Compilation and Parallelisation Strategies in JAX and PyTorch (slides)

Woon Yee Ng (vLLM)   vLLM Multi-Objective Bayesian Optimization (slides)

Yavuz Ferhatosmanoglu (Apache Flink / vLLM)   Streaming Architectures for Low-Latency LLM Serving Using Apache Flink and vLLM (slides)

Wrap-up!



Coursework 1 (Reading Club)

The reading club requires you to read between 1 and 3 papers every week. You need to fill out a simple review_log (MS Word or text format) before each session and email it to me by noon on the day before (Tuesday). The minimum requirement is one review_log per session, but you can read as many papers as you want and fill out a review_log for each paper you read. The review_log is not marked but 'ticked'.

At each session, 3-4 papers are selected under the session topic. If you are assigned to present your review work, please prepare 15-20 minutes of slides. Your presentation material should also be emailed by the following day (Thursday). You will present your review work approximately twice during the course. Papers fall into the following two types, and you can focus on the specified aspects when reviewing each paper.

Full length papers

What is the significant contribution?

What is the difference from the existing works?

Short length papers

What is the novel idea?

What is required to complete the work?

Coursework 2 (Reports)

The following three reports are required; each may be extended from a reading club assignment or cover a different paper within the scope of data-centric networking.

Review report on a full-length paper (max 1800 words)

Describe the contribution of the paper in depth, with criticism

Crystallise the significant novelty in contrast to other related work

Suggestions for future work

Survey report on a sub-topic in data-centric networking (~1800 - max 2000 words)

Pick up to 5 papers as core papers in your survey scope

Read the above and expand your reading through related work

Develop your own view and write it up as your survey paper

See 'how to write a survey paper' in Assessment section.

Project study and exploration of a prototype (max 2500 words)

See Open Source Projects

What is the significance of the project in the research domain?

Compare it with similar and succeeding projects

Demonstrate the project by exploring its prototype

Please email your project selection (MS Word or text format, <150 words) by 16:00 on November 21, 2025.

Project presentation on December 3, 2025.

Report 1 should be handed in by 16:00 on November 14, 2025, report 2 by 16:00 on December 12, 2025, and report 3 by 16:00 on January 20, 2026 (try to finish the mini project by the end of 2025!).

Assessment

The final grade for the course will be provided as a letter grade or percentage and the assessment will consist of two parts:

25%: for reading club (Participation, Presentation + tick of review_log and hands-on tutorial)

10%: Presentation

15%: Participation

75%: for the three reports

15%: Intensive review report

25%: Survey report

35%: Project study
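As a worked example of the weighting, assuming each component is marked out of 100 and the components are combined linearly (an assumption; the combination rule is not spelled out above), the final percentage is the weighted sum of the five component marks. The function name and the marks below are invented for illustration.

```python
# Component weights in percent, taken from the breakdown above.
WEIGHTS = {
    "presentation": 10,
    "participation": 15,
    "review_report": 15,
    "survey_report": 25,
    "project_study": 35,
}
assert sum(WEIGHTS.values()) == 100  # 25% reading club + 75% reports


def final_mark(marks: dict) -> float:
    """Hypothetical weighted average of component marks, each out of 100."""
    return sum(WEIGHTS[k] * marks[k] for k in WEIGHTS) / 100


# Invented marks for one student.
example = {"presentation": 80, "participation": 60, "review_report": 70,
           "survey_report": 65, "project_study": 75}
print(final_mark(example))  # 70.0
```

The project study alone carries more weight (35%) than the whole reading club component (25%), so a strong mini project matters most.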

Open Source Projects

See the candidate Open Source Projects in data-centric networking. The list is not exhaustive; if you choose a project that is not on the list, please discuss it with me. The purpose of this assignment is to understand the proposed architecture, algorithms, and systems by running an actual prototype, and to present and explain to others how the prototype runs, including its setup process and any additional work you have done, such as your own applications. This experience will give you a better understanding of the project. These Open Source Projects come with a set of published papers, and you should be able to examine your interests in the papers by running the prototype. Some projects are rather large and may require an extensive environment and considerable time; make sure you are able to complete this assignment.

How to Read/Review a Paper

The following papers give guidance on how to read and review a paper.

S. Keshav: How to Read a Paper, ACM SIGCOMM Computer Communication Review, Volume 37, Number 3, July 2007.

T. Roscoe: Writing Reviews for Systems Conferences, 2007.

Simon Peyton-Jones: How to write a great paper and give a great talk about it, Microsoft Research Cambridge.

David A. Patterson: How to Have a Bad Career in Research/Academia, 2001.

Further supplement: see ‘how to read/review a paper’ section in Advanced Topics in Computer Systems by Steven Hand.

Presentations

Presentations should be about 15-20 minutes long and should cover the following aspects.

What are the background and the problem domain of the paper? What is the motivation of the presented work? What is the difference from the existing works? What is the novel idea? Did the paper change research in the community, and if so, how?

What is the significant contribution? How did the authors tackle the problem? Did the authors obtain the expected results?

How do you like the paper and why? What is the takeaway message to you (and to research community)? What is required to complete the work?

The following document aids in presenting a review.

Perdita Stevens (University of Edinburgh): How to [read, present, review] a research paper.

How to write a survey paper

A survey paper provides readers with a comprehensive and organised exposition of existing work. It must present the relevant details of the surveyed area, but it is important to keep a consistent level of detail and to avoid simply listing the different works. A good survey paper therefore summarises recent research results in a novel way that integrates and adds understanding to work in the field. For example, you can classify the existing literature in your own way, develop a perspective on the area, and evaluate trends. After defining the scope of your survey, 1) classify and organise the trends, 2) critically evaluate the approaches (pros/cons), and 3) add your own analysis or explanation (e.g. a table or figure). Adding references and pointers to further in-depth information is also important (summary from Rich Wolski's note).

Papers for OS Principles (Distributed Storage and Deterministic Parallelism)

The following papers will help you understand distributed storage and parallelism.

Systems Research and System Design

1. B. Lampson: Hints for Computer Systems Design (Revised), ACM OSR 1983.

Distributed Storage

2. S. Ghemawat, H. Gobioff, and S. Leung: The Google File System, ACM SOSP 2003.
3. F. Chang et al: BigTable: A Distributed Storage System for Structured Data, USENIX OSDI 2006.
4. G. DeCandia et al: Dynamo: Amazon's Highly Available Key-value Store, ACM SOSP 2007.

Deterministic Parallelism

5. J. Devietti et al: DMP: Deterministic Shared Memory Multiprocessing, ACM ASPLOS 2009.
6. A. Aviram, et al: Efficient System-Enforced Deterministic Parallelism, USENIX OSDI 2010.
7. T. Liu et al: Dthreads: Efficient and Deterministic Multithreading, ACM SOSP 2011.

Contact Email

Please email eiko.yoneki@cl.cam.ac.uk with any questions.

Computer Laboratory
