Large-Scale Data Processing and Optimisation (2024-2025 Michaelmas Term)
|
OverviewThis module provides an introduction to large-scale data
processing, optimisation, and the impact on computer system's architecture.
Large-scale distributed applications with high volume data processing such as
training of machine learning will grow ever more in importance. Supporting the
design and implementation of robust, secure, and heterogeneous large-scale
distributed systems is essential. To deal with distributed systems with a large
and complex parameter space, tuning and optimising computer systems is becoming
an important and complex task, which also deals with the characteristics of
input data and algorithms used in the applications. Algorithm designers are
often unaware of the constraints imposed by systems and the best way to consider
these when designing algorithms with massive volume of data. On the other hand,
computer systems often miss advances in algorithm design that can be used to cut
down processing time and scale up systems in terms of the size of the problem
they can address. Integrating machine learning approaches (e.g. Bayesian
Optimisation, Reinforcement Learning) for system optimisation will also be
explored in this course.
Module Structure
Schedule and Reading ListAll the sessions will be in the class room (FW26). We will meet every Wednesday (from October 16 to December 4) in 2024. The time slot is 10:00-12:00.1 2024/10/16 Session 1: Introduction to Large-Scale Data Processing and Optimisation
2024/10/23 Session 2: Data flow programming
1. Yuan Yu, Michael Isard, D. Fetterly, M. Budiu, U.
Erlingsson, P.K. Gunda, J. Currey:
2. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma,
M. McCauley, M. Franklin, S. Shenker, I. Stoica:
Noria: dynamic, partially-stateful data-flowfor high-performance web applications, OSDI 2018. 4. J. Dean, S. Ghemawat: MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004.
Andy Zhou
(slides)
6. Naiad
Frank McSherry's Talk on Differential Dataflow is
here.
Luca Choteborsky
(slides)
6.3. F. McSherry, A. Lattuada, M. Schwarzkopf, T. Roscoe: Shared Arrangements: practical inter-query sharingfor streaming dataflows, VLDB, 2020. 7. P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquini: Incoop: MapReduce for incremental computation, ACM SOCC, 2011.
Gabriel Mahler
(slides)
8.1 M. Abadi et al.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Preliminary White Paper, 2015. 8.2. M. Abadi, M. Isard and D.
Murray: A Computational
Model for TensorFlow - An Introduction, MAPL, 2017. 9. M. Looks et al.: Deep Learning with Dynamic Computation Graphs, ICLR, 2017.
Chris Tomy
(slides)
13. T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. Fernandez-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, S. Whittle: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, VLDB, 2015. 14. S. Li, Y. Zhao, R. Varma, et. al: PyTorch Distributed: Experiences on Accelerating Data Parallel Training, VLDB, 2020. 15. T. Lévai, F. Németh, and G. Rétvári: Batchy Batch scheduling Data Flow Graphs with Service level Objective, NSDI, 2020. Andrzej Szablewski (slides)16 . P. Barham, et al.: Pathways: Asynchronous Distributed Dataflow for ML, MLSys, 2022. 17 . L. Zheng, et al.: Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning, OSDI, 2022. Mark Li (slides)18 . H. Zhu, et al.: MSRL: Distributed Reinforcement Learning with Dataflow Fragments, USENIX_ATC, 2023. 2024/10/30 Session 3: Large-scale graph data processing
Hetong Shen
(slides)
2. Z. Qian, X. Chen, N. Kang, M. Chen, Y. Yu, T. Moscibroda, Z.Zhang: MadLINQ: large-scale distributed matrix computation for the cloud, EuroSys, 2012. 3. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. Hellerstein: Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, VLDB, 2012.
Edmund Goodman
(slides)
Timi Adeniran
(slides)
6. J. Gonzalez, R. Xin, A. Dave, D. Crankshaw, M. Franklin, I. Stoica: GraphX: Graph Processing in a Distributed Dataflow Framework, OSDI, 2014. 7. B. Shao, H. Wang, Y. Li: Trinity: A Distributed Graph Engine on a Memory Cloud, SIGMOD, 2013. 8. A. Kyrola and G. Blelloch: Graphchi: Large-scale graph computation on just a PC, OSDI, 2012.
Aryan Shah
(slides)
10. A. Roy, L. Bindschaedler, J. Malicevic and W. Zwaenepoel: Chaos: Scale-out Graph Processing from Secondary Storage , SOSP, 2015. Jakub Bachurski (slides)11. F. McSherry, M. Isard and D. Murray: Scalability! But at what COST? , HOTOS, 2015. 12. X. Hu, Y. Tao, C. Chung: Massive Graph Triangulation, SIGMOD, 2013. 13. W. Xie, G. Wang, D.Bindel, A. Demers, J. Gehrke: Fast Iterative Graph Computation with Block Updates, VLDB, 2014. 14. S. Hong, H. Chafi, E. Sedlar, f K.Olukotun: Green-Marl: A DSL for Easy and Efficient Graph Analysis, ASPLOS, 2012. 15. D. Prountzos, R. Manevich, K. Pingali: Elixir: A System for Synthesizing Concurrent Graph Programs, OOPSLA, 2012. 16. D. Nguyen, A. Lenharth, K. Pingali: A Lightweight Infrastructure for Graph Analytics, SOSP 2013. 17. D. Merrill, M. Garland, A. Grimshaw: Scalable GPU Graph Traversal, PPoPP, 2012. 18. A. Gharaibeh, E. Santos-Neto, L. Costa, M. Ripeanu: Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems, IEEE TPC, 2014.
Asbjorn Thede Lorenzen
(slides)
20. H. Dai, Z. Kozareva, B. Dai,
A. Smola and L. Song:
Learning Steady-States of Iterative Algorithms over Graphs, ICML, 2018.
Sid Nagappan (slides)
2024/11/06 Session 4: Data Flow Programming Tutorial
2024/11/13 Session 5: Probabilistic Programming
Guest lecture: Hong Ge (Department of Engineering, University of Cambridge) @11:00. Title: Probabilistic Programming and Inference (slides) Abstract: Probabilistic programming combines two fundamental ideas in statistics, machine learning, and computer science: inference and programming. It aims to eliminate the mental overhead of performing Bayesian inference in modelling tasks. It encourages practitioners to focus more on models than low-level details about inference and computation, allowing them to develop more accurate models for their data and foster more reproducible science. I will present a gentle introduction to modern, general-purpose probabilistic programming. I will show two systems, Turing and JuliaBUGS, with examples. I will conclude with some challenges when adopting probabilistic programming. Bio: Hong Ge is a Group Leader in the Machine learning group in the Engineering department and a Research Lead at the Alan Turing Institute. He completed his PhD from the same group in 2015, advised by Zoubin Ghahramani. He was a Junior Research Fellow at Darwin College. His research interests are general-purpose inference algorithms in probabilistic programming and novel learning theories for modern neural networks. Hong created the Turing probabilistic programming language. Hong is a co-organiser of NeurIPS 2013 with Zoubin Ghahramani and Max Welling. Reading Club @10:00. 1. E. Bingham et al.: Pyro: Deep Universal Probabilistic Programming, Journal of Machine Learning Research, 2019. 2. D. Tran et al.: Edward: A library for probabilistic modeling, inference, and criticism, arXiv, 2017. 4. F. Wood, J. van de Meent, V. Mansinghka: A new approach to probabilistic programming inference, AISTATS, 2014. 5. B. Paige and F. Wood: A compilation target for probabilistic programming languages, ICML, 2014. 6. J. Ai et al.: HackPPL: a universal probabilistic programming language, MAPL, 2019.
Xavier Chen
(slides)
8. Ge, H., Xu, K. and Ghahramani, Z.: Turing: A language for flexible probabilistic inference, AISTATS, 2018. 9. W. Neiswanger et al.: ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language, Arxiv, 2019. 11. M. Balandat et al.: BOTORCH: Bayesian Optimization in PyTorch, Arxiv 2020. 12. D. Tran, M. D. Hoffman, D. Moore, C. Suter, S. Vasudevan, A. Radul, M. Johnson, and R. A. Saurous: Simple, Distributed, and Accelerated Probabilistic Programming, NeurIPS, 2018. 13. G. Baydin et al.: Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale, SC, 2018.
Andrew Krapivin
(slides)
14 . J. Shao, et al.: Tensor Program Optimization with Probabilistic Programs, NeurIPS, 2022. 15 . X. Wan, et al.: Bayesian Generational Population-Based Training, ALOE, ICLR, 2022. 16. Maraval, A. et al.: Sample-Efficient Optimisation with Probabilistic Transformer Surrogates, ArXiv, 2022. 17. J. Snoek, H. Larochelle, and R. Adams: Practical Bayesian Optimization of Machine Learning Algorithms, NIPS, 2012.
19. R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. Gonzalez, I. Stoica: Tune: A Research Platform for Distributed Model Selection and Training, ICML, 2018. 20. Z. Wang, C. Li, S. Jegelka, and P. Kohli: Batched High-dimensional Bayesian Optimization via Structural Kernel Learning, PLMR, 2017. Luca Choteborsky (slides)21. Z. Wang, C. Gehring, P. Kohli, and S. Jegelka: Batched Large-scale Bayesian Optimization in High-dimensional Spaces, AISTATS, 2018. 22. Grosnit et al.: High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning, arxiv, 2021. 23. E. Siivola, J. Gonzalez, A. Paleyes, A. Vehtari: Good practices for Bayesian Optimization of high dimensional structured spaces, arxiv, 2021. 24. J. Grosse, C. Zhang, and P. Hennig: Probabilistic DAG Search, UAI, 2021. 25. C. Lin, J. Miano, and E. Dyer: Bayesian optimization for modular black-box systems with switching costs, UAI, 2021. 26. E. H. Lee, D. Eriksson, V. Perrone and M. Seeger: A Nonmyopic Approach to Cost-Constrained Bayesian Optimization, UAI, 2021. 27. G. Claret, et al.: Bayesian Inference using Data Flow Analysis, ESEC/FSE, 2013. 28. J. Martinelli, et al.: Learning Relevant Contextual Variables Within Bayesian Optimization, arXiv, 2024. 29. A. Zhao, et al.: Exploring High-dimensional Search Space via Voronoi Graph Traversing, UAI, 2021. 2024/11/20 Session 6: Optimisations in Computer Systems
Mark Li
(slides)
2.1 A. Mirhoseini, A. Goldie, et al.: A graph placement methodology for fast chip design, Nature, 2021. 2.2 A. Mirhoseini, A. Goldie, et al.: Chip Placement with Deep Reinforcement Learning, ISPD, 2020. Asbjorn Thede Lorenzen (slides)3. R. Addanki, S. B. Venkatakrishnan, S. Gupta, H. Mao, M. Alizadeh: Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning, arXiv, 2019. 4. R. Marcus, P. Negi, Parimarjan, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul: Neo: A Learned Query Optimizer, VLDB, 2019. 5. D. Aken et al.: Automatic Database Management System Tuning Through Large-scale Machine Learning, SIGMOD, 2017.
6. A. Pavlo et al.:
Self-Driving Database
Management Systems, CIDR, 2017. 8. A. Floratou et al.: Dhalion: self-regulating stream processing in Heron, VLDB, 2017. 9. G. Li, X. Zhou, S. Li, and B. Gao: Qtune: RL for DB query optimisation, VLDB, 2019.10. D Van Aken, A Pavlo, GJ Gordon, and B Zhang: Automatic database management system tuning through large-scale machine learning, SIGMOD, 2017. 11. A. Pavlo, M. Butrovich, L. Ma, P. Menon, W. Shen Lim, D. Van Aken, W. Zhang: Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation, VLDB, 2021.
Aryan Shah
(slides)
13. Zhao, Y. et al.: TOD: GPU-accelerated Outlier Detection via Tensor Operations, VLDB, 2022. 14. Chowdhery, A. et al.: PaLM: Scaling Language Modeling with Pathways, ArXiv, 2022.
16. Kraska, T., Alizadeh, M., Beutel, A., Chi, E.H., Ding, J., Kristo, A., Leclerc, G., Madden, S., Mao, H. and Nathan, V.: SageDB: A learned database system, CIDR, 2019. 17. Ma, L., Ding, B., Das, S. and Swaminathan, A.: Active Learning for ML Enhanced Database Systems, SIGMOD, 2020. 18. A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann: SOSD: A Benchmark for Learned Indexes, NeurIPS Workshop on ML for Systems, 2019.
Edmund Goodman
(slides)
21 . E. Liang, Z. Wu, M. Luo, S. Mika, J. Gonzalez, and I. Stoica: RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem, NeurIPS, 2021. 22 . X. Zhang, et al.: Restune: Resource oriented tuning boosted by meta-learning for cloud databases, SIGMOD, 2021. 23 . j. Ding, et al.: Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads, ArXiv, 2020. 24 . Ng and S. Russell: Algorithms for inverse reinforcement learning, ICML 2000. 25. L. Espeholt et al.:
IMPALA: Scalable
Distributed Deep-RL with Importance Weighted Actor-Learner Architectures,
ICML, 2018. 26. T. Li, Z. Xu, J. Tang and Y. Wang: Model-Free Control for Distributed Stream Data Processing using Deep Reinforcement Learning, VLDB, 2018. 28. J. Ansel et al. Opentuner: an extensible framework for program autotuning. PACT, 2014. 29. J. Ansel et al.: Petabricks: A language and compiler for algorithmic choice. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, 2009. Chris Tomy (slides)30. O. Alipourfard et al.: CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics, NSDI, 2017. 31. Z. Jia, M. Zaharia, and A. Aiken: Beyond Data and Model Parallelism for Deep Neural Networks, SYSML, 2019. 32. S. Cereda, S. Valladares, P. Cremonesi and S. Doni: CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions, VLDB, 2021.
Andrzej Szablewski
(slides)
34. J. Xing, et al.: Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance, MLSYS, 2022. 35. M. Lindauer, et al.: SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization, Journal of Machine Learning Research, JMLR, 2021. 36. X. Miao, et al.: SpotServe: Serving Generative Large Language Models on Preemptible Instances , ASPLOS, 2024. 37. M. Wagenländer, et al.: Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections, SOSP, 2024. 38. J. Duan, et al.: Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances, JMLR, 2021. 39. X. Miao, et al.: FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning, arXiv, 2024. 40. Y. Sheng, et al.: S-LoRA: Serving Thousands of Concurrent LoRA Adapters, arXiv, 2024. 41. Y. Mei, et al.: Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs, arXiv, 2024. 42. J. Juravsky, et al.: Hydragen: High-Throughput LLM Inference with Shared Prefixes, arXiv, 2024. 43. H. Zhang, et al.: LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference, ISCA, 2024. 44. L. Chen, et al.: Punica: Multi-Tenant LoRA Serving, arXiv, 2024. 43. H. Zhang, et al.: LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference, arXiv, 2023. 44. B. Jeon, et al.: GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism, arXiv, 2024. 45. J. Cheng, et al.: A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats, arXiv, 2023. 2024/11/27 Session 7: Optimisations in ML Compiler
1. Trofin, M. et al.: MLGO: a Machine Learning Guided Compiler Optimizations Framework, ArXiv, 2021. Andrew Krapivin (slides)2. He, G., Parker, S., Yoneki, E.: X-RLflow: Graph Reinforcement Learning for Neural Network Subgraphs Transformation, MLSys, 2023. 3. Ma, L. et al.: Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks, OSDI, 2020. Sid Nagappan (slides)4. Zheng, L. et al.: EinNet: Optimizing Tensor Programs with Derivation-Based Transformations, OSDI, 2023. 5. Nakandala, S. et al.: A Tensor Compiler for Unified Machine Learning Prediction Serving, OSDI, 2020. 6. Ding, Y. et al.: Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs, ASPLOS, 2023.
8. T. Chen, T. Moreau, Z. Jiang,
L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C.
Guestrin, and A. Krishnamurthy:
TVM: An Automated End-to-End
Optimizing Compiler for Deep Learning, OSDI, 2018. 9. T. Chen, T. Moreau, Z. Jiang,
L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C.
Guestrin, and A. Krishnamurthy:
TVM: End-to-End Compilation
Stack for Deep Learning, SysML, 2017.
Andy Zhou
(slides)
13. A. Qiao, S. K. Choe, S. Subramanya, W. Neiswanger, Q. Ho, H. Zhang, G. R. Ganger, E. Xing: Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, OSDI, 2021.
Hetong Shen
(slides)
15. L. Zheng, et al.: TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers, NeurIPS, 2021. 16. C. Nandi, et al.: Rewrite Rule Inference Using Equality Saturation, OOPSLA, 2021. 17. R. Senanayake, et al.: A sparse iteration space transformation framework for sparse tensor algebra, OOPSLA, 2020. Xavier Chen (slides)18. L. Zheng, et al.: Ansor : Generating High-Performance Tensor Programs for Deep Learning, OSDI, 2020. 19. M. Li, et al.: AdaTune: Adaptive Tensor Program Compilation Made Efficient, NeurIPS, 2020. 20. S. Zheng, et al.: FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System, ASPLOS, 2020. 21. J. Turner, et al.: Neural Architecture Search as Program Transformation Exploration, ASPLOS, 2021. 22. Y. Zhou, et al.: Transferable Graph Optimizers for ML Compilers, NEURIPS, 2020. 23. X. Chen, et al.: Learning to Perform Local Rewriting for Combinatorial Optimization, NeurIPS, 2019. 24. A. Paliwal et al.: REGAL: Transfer Learning For Fast Optimization of Computation Graphs, arxiv, 2019. Jakub Bachurski (slides)25. F. Kjolstad, et al.: The tensor algebra compiler, OOPSLA, 2017. 26. C. Cummins, et al.: Meta Large Language Model Compiler: Foundation Models of Compiler Optimization, Meta, 2024. 27. J. Magalhães, et al.: C2taco: Lifting tensor code to taco, GPCE, 2023. 28. C. Hvarfner, et al.: Vanilla Bayesian Optimization Performs Great in High Dimensions, PMLR, 2024. 29. D. Eriksson, et al.: Scalable Global Optimization via Local Bayesian Optimization, NeurIPS, 2019. Optimisation in Computer System: Additional Reference Papaers.
2024/12/04 Session 8: Presentation of Open Source Project Study
Wrap-up Discussion
Coursework 1 (Reading Club)The reading club will require you to read between 1
and 3 papers every week. You need to fill out simple review_log
(MS word format,
text
format) prior to each session and email me by a day before - Tuesday (noon). The minimum requirement of review_log is one per session, but
you can read as many as you want and fill the review_log for each paper you
read. review_log is not marked but 'tick'.
Coursework 2 (Reports)The following three reports are required, which could be extended from the reading assignment of the reading club or a different one within the scope of data centric networking.
The report 1 should be handed in by November 15, 2024 - 16:00 and the report 2 by December 13, 2024 - 16:00 . The report 3 by January 21, 2025 - 16:00 - (Try to finish the mini project by the end of 2024!). AssessmentThe final grade for the course will be provided as a letter grade or percentage and the assessment will consist of two parts:
Open Source ProjectsSee the candidates of Open Source Projects in data centric networking. The list is not exhausted. If you take anything other than the one in the list, please discuss with me. The purpose of this assignment is to understand the prototype of the proposed architecture, algorithms, and systems through running an actual prototype and present/explain to the other people how the prototype runs, any additional work you have done including your own applications and setup process of the prototype. This experience will give you better understanding of the project. These Open Source Projects come with a set of published papers and you should be able to examine your interests in the paper through running the prototype. Some projects are rather large and may require extensive environment and time; make sure you are able to complete this assignment. How to Read/Review a PaperThe following papers aid how to read/review a paper.
Further supplement: see ‘how to read/review a paper’ section in Advanced Topics in Computer Systems by Steven Hand. PresentationsPresentations should be about 15-20 minutes long, where you need to cover the following aspects.
The following document aids in presenting a review.
How to write a survey paperA survey paper provides the readers with an exposition of existing work that is comprehensive and organized. It must expose relevant details associated in the surveying area, but it is important to keep a consistent level of details and to avoid simply listing the different works. Thus a good survey paper should demonstrate a summary of recent research results in a novel way that integrates and adds understanding to work in the field. For example, you can take an approach by classifying the existing literature in your own way; develop a perspective on the area, and evaluate trends. Thus, after defining the scope of your survey, 1) classify and organize the trend, 2) critical evaluation of approaches (pros/cons), and 3) add your analysis or explanation (e.g. table, figure). Also adding reference and pointer to further in-depth information is important (summary from Rich Wolski’s note). Papers for OS Principles (Distributed Storage and Deterministic Parallelism)
3. F. Chang et al: BigTable: A Distributed Storage System for Structured Data, USENIX OSDI 2006. 4. G. DeCandia et al: Dynamo: Amazon's Highly Available Key-value Store, ACM SOSP 2007.
6. A. Aviram, et al: Efficient System-Enforced Determistic Parallelism, USENIX OSDI 2010. 7. T. Liu et al: Dthreads: Efficient and Determistic Multithreading, ACM SOSP 2011. Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk for the question. |