Computer Laboratory

MPhil, Part III, and Part II Project Suggestions (2023-2024)

Contact

Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below. Asterisks (*) indicate highly recommended projects.

1.   Tensor expression superoptimisation via deep reinforcement learning ***  

Contact: Eiko Yoneki

AlphaDev [1][2] shows that an RL agent can discover faster sorting algorithms by playing an "assembly game". Among recent advances in machine learning compilers, EinNet [3][4] shows how to discover faster tensor programs via rule-based, derivation-driven transformations. We are interested in investigating how RL can discover faster tensor programs through transformations at the tensor expression level. You would implement a graph-based RL agent that plays a "tensor game" to discover faster tensor programs. Alternatively, we have an internal RL-driven graph transformation system [5], which you could leverage and apply to tensor-expression-level transformation. To get started, build and benchmark EinNet to see how its rule-based transformation works on a simple DNN (e.g. self-attention); then replace its search algorithm with an RL algorithm.
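
As a concrete starting point, the search can be framed as an episodic RL environment whose actions are rewrite rules applied to a tensor expression. The sketch below is a minimal, hypothetical "tensor game" skeleton; the expression encoding, rewrite rules and cost function are placeholders, not EinNet's actual API.

```python
# Minimal sketch of a "tensor game" environment for rewrite-based superoptimisation.
# The expression encoding, rewrite rules and cost model below are illustrative
# placeholders, NOT EinNet's real API.
import random

REWRITE_RULES = [
    ("fuse_matmul_add", lambda e: e.replace("matmul+add", "fused_gemm")),
    ("cancel_transpose", lambda e: e.replace("T(T(x))", "x")),
]

def estimated_cost(expr: str) -> float:
    # Placeholder cost model: in practice this would be a measured or
    # predicted runtime of the tensor program.
    return float(len(expr))

class TensorGameEnv:
    """Episodic environment: state = current expression, action = rewrite rule index."""

    def __init__(self, expr: str, max_steps: int = 10):
        self.init_expr = expr
        self.max_steps = max_steps

    def reset(self):
        self.expr = self.init_expr
        self.steps = 0
        self.best_cost = estimated_cost(self.expr)
        return self.expr

    def step(self, action: int):
        self.steps += 1
        _, rule = REWRITE_RULES[action]
        new_expr = rule(self.expr)
        new_cost = estimated_cost(new_expr)
        # Reward: improvement over the best cost seen so far in this episode.
        reward = self.best_cost - new_cost
        self.best_cost = min(self.best_cost, new_cost)
        self.expr = new_expr
        done = self.steps >= self.max_steps
        return self.expr, reward, done, {}

# Random-policy rollout, standing in for a trained graph RL agent.
env = TensorGameEnv("T(T(x)) -> matmul+add")
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step(random.randrange(len(REWRITE_RULES)))
```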

References:
[1] Faster sorting algorithms discovered using deep reinforcement learning, Nature, 2023.
[2] https://github.com/google-deepmind/alphadev.
[3] EINNET: Optimizing Tensor Programs with Derivation-Based Transformations. OSDI, 2023.
[4] https://github.com/InfiniTensor/InfiniTensor.
[5] https://github.com/ucamrl/xrlflow/tree/main.

2.   Bayesian Optimisation for tensor code generation ***  

Contact: Eiko Yoneki

Desired Skills: Strong interest in tensor program optimisation; some knowledge of or interest in Bayesian optimisation; Python, with some knowledge of C++

Tensor programs run on massively parallel hardware. When generating tensor code (also called auto-scheduling), the TVM compiler [1] needs to search over a large number of parameters. A state-of-the-art auto-scheduler is Ansor [2], which applies rule-based mutation to generate tensor code templates and then fine-tunes those templates via evolutionary search. Bayesian optimisation (BayesOpt) [3] is a potentially more sample-efficient way to search over tensor code templates than evolutionary search. In this project, TVM is first set up for benchmarking with ResNet and BERT on CPU and GPU (possibly a few different types of GPU). The same benchmarking should then be repeated with NVIDIA's CUTLASS library [8]. Afterwards, explore BayesOpt for high-performance tensor code generation and benchmark black-box algorithms for tensor code generation. The main interface for tensor code generation in TVM will be MetaSchedule [6], which provides a fairly simple Python interface to various search strategies [7]. The project has a particular interest in code generation for tensor cores, which recent generations of GPUs (since the Turing microarchitecture) provide as domain-specific units to massively accelerate tensor programs.
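
To illustrate the search loop (independently of TVM's actual MetaSchedule API), the sketch below runs Bayesian optimisation with a Gaussian-process surrogate and expected improvement over a toy tile-size space; `measure_latency` is a placeholder for compiling and timing a candidate schedule.

```python
# Sketch of a Bayesian optimisation loop over a toy tensor-code search space.
# measure_latency() is a stand-in for compiling and timing a candidate schedule
# (e.g. via TVM MetaSchedule); it is NOT TVM's actual API.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

TILE_CHOICES = [1, 2, 4, 8, 16, 32, 64]   # candidate tile sizes for two loop axes
CANDIDATES = np.array([[i, j] for i in TILE_CHOICES for j in TILE_CHOICES], dtype=float)

def measure_latency(cfg):
    # Placeholder objective: pretend latency is minimised around tiles (16, 8).
    return (np.log2(cfg[0]) - 4) ** 2 + (np.log2(cfg[1]) - 3) ** 2 + 1.0

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Initial random measurements.
rng = np.random.default_rng(0)
X = CANDIDATES[rng.choice(len(CANDIDATES), size=5, replace=False)]
y = np.array([measure_latency(x) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(np.log2(X), y)
    mu, sigma = gp.predict(np.log2(CANDIDATES), return_std=True)
    ei = expected_improvement(mu, sigma, y.min())
    x_next = CANDIDATES[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, measure_latency(x_next))

print("best config:", X[np.argmin(y)], "latency:", y.min())
```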

References:
[1] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning https://www.usenix.org/system/files/osdi18-chen.pdf.
[2] Ansor: Generating High-Performance Tensor Programs for Deep Learning https://www.usenix.org/system/files/osdi20-zheng.pdf.
[3] A Tutorial on Bayesian Optimization https://arxiv.org/abs/1807.02811.
[4] HEBO Pushing The Limits of Sample-Efficient Hyperparameter Optimisation https://arxiv.org/abs/2012.03826.
[5] Are Random Decompositions all we need in High Dimensional Bayesian Optimisation? https://arxiv.org/abs/2301.12844.
[6] Tensor Program Optimization with Probabilistic Programs https://arxiv.org/abs/2205.13603.
[7] https://github.com/apache/tvm/blob/4267fbf6a173cd742acb293fab4f77693dc4b887/python/tvm/meta_schedule/search_strategy/search_strategy.py#L238.
[8] NVIDIA CUTLASS. https://github.com/NVIDIA/cutlass.

3.   Inferring reward function via Inverse Reinforcement Learning **  

Contact: Eiko Yoneki

Desired Skills: Inverse Reinforcement Learning, LLVM Optimisation

Reinforcement Learning (RL) has achieved impressive results recently, from playing video games to solving robotic control problems. However, in many applications it is challenging to design a reward function that robustly describes the desired task. Inverse Reinforcement Learning (IRL) [1], which evolved from imitation learning, can be formulated as the problem of inferring the reward function of an agent, given its policy or observed behaviour. Recent work has focused on the robustness of IRL in unknown environments [2,3]. IRL is suitable for environments where it is hard to define the reward function or expensive to obtain the real-time reward [4]. System-level RL tuning problems [6] are such cases: reward functions are typically defined in terms of actual execution time, which is expensive to measure. One of the most popular open-source IRL methods is Generative Adversarial Imitation Learning (GAIL) [5]. GAIL combines a Generative Adversarial Network (GAN) with IRL and alternately updates the agent's reward and policy until reaching an optimum. One challenge is that IRL needs expert data to update the agent's reward function and policy; unlike for video games, building a virtual environment for system-level optimisation problems to collect such data is sometimes impossible, although existing underlying optimisers may help here. A promising direction is therefore to use GAIL to infer reward functions within such expensive environments. In this project you would use GAIL to explore reward functions within our existing project on optimising the LLVM pass list with reinforcement learning [7].
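
As a rough illustration of the GAIL machinery, the PyTorch sketch below shows the discriminator update and the surrogate reward it induces; the state/action dimensions and the batches are hypothetical stand-ins for (program features, chosen pass) pairs collected from an existing optimiser.

```python
# Minimal sketch of the GAIL discriminator and surrogate reward (PyTorch).
# State/action dimensions and the expert buffer are hypothetical stand-ins for
# (program features, chosen LLVM pass) pairs from an existing optimiser.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 64, 16   # e.g. program features and one-hot pass id

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # logit of "expert"

disc = Discriminator()
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_s, expert_a, policy_s, policy_a):
    """One GAIL discriminator update: expert pairs labelled 1, policy pairs 0."""
    logits_e = disc(expert_s, expert_a)
    logits_p = disc(policy_s, policy_a)
    loss = bce(logits_e, torch.ones_like(logits_e)) + bce(logits_p, torch.zeros_like(logits_p))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def gail_reward(state, action):
    """Surrogate reward for the policy: -log(1 - D(s, a))."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - d + 1e-8)

# Usage with random stand-in batches. The policy itself would be trained with
# any on-policy RL algorithm (e.g. PPO), using gail_reward instead of real rewards.
s_e, a_e = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
s_p, a_p = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
discriminator_step(s_e, a_e, s_p, a_p)
print(gail_reward(s_p, a_p).shape)
```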

References:
[1] Ng, Andrew Y., and Stuart Russell: Algorithms for inverse reinforcement learning. ICML 2000.
[2] Lindner et al.: Active Exploration for Inverse Reinforcement Learning. ICML workshop 2022.
[3] Fu et al.: Learning robust rewards with adversarial inverse reinforcement learning. ICLR 2018.
[4] Arora, Saurabh and Prashant Doshi: A survey of inverse reinforcement learning: Challenges, methods and progress.
[5] Ho J., Ermon S.: Generative adversarial imitation learning.
[6] Zhang, Ji, et al.: An end-to-end automatic cloud database tuning system using deep reinforcement learning.
[7] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.

4.   Advancing Computer Systems Optimisation through Model-Based Reinforcement Learning **  

Contact: Eiko Yoneki

Desired Skills: Reinforcement Learning, LLVM Optimisation

Reinforcement Learning (RL) is gaining interest as a generic optimisation and control method in data management tasks such as resource management, scheduling, database tuning, and stream processing. However, its application is hindered by sample inefficiency and long decision evaluation times. Model-Based Reinforcement Learning (MBRL), employing techniques such as probabilistic ensemble models [1], World Models [2] and Dyna-style planning [4], addresses these issues by learning a model of the environment. This project tackles LLVM compiler optimisation through selection of the pass list. Pass list selection strongly affects code performance and size, and MBRL's capability to model these interactions is invaluable. MBRL also faces the challenge of generalising across different programs, which is essential given the expensive training of RL agents. You would use MBRL for intelligent pass list selection in LLVM and work on enhancing generalisation across programs, investigating various MBRL techniques and evaluating their effectiveness in modelling the compiler environment and improving transfer learning. You would also evaluate a world-model-based approach against a model-free approach using our existing software infrastructure. The previous project [3] can be used as a starting point. Evaluating the project with the SPEC 2017 benchmark suite (https://www.spec.org/cpu2017/) would be ideal.
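
A minimal sketch of one MBRL ingredient, a probabilistic ensemble dynamics model in the spirit of [1], is given below; the dimensions and data are hypothetical stand-ins for compiler-environment transitions.

```python
# Sketch of a probabilistic ensemble dynamics model in the spirit of PETS [1]:
# each ensemble member predicts a Gaussian over the next state and is trained
# with negative log-likelihood. Dimensions and data are hypothetical stand-ins
# for (program state, applied pass) -> next program state transitions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, ENSEMBLE = 32, 8, 5

class GaussianDynamics(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.SiLU(),
                                  nn.Linear(128, 2 * STATE_DIM))

    def forward(self, s, a):
        mu, log_var = self.body(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        return mu, log_var.clamp(-10, 2)

models = [GaussianDynamics() for _ in range(ENSEMBLE)]
opts = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]

def train_step(s, a, s_next):
    """One NLL step per ensemble member (each could also use its own bootstrap sample)."""
    for m, opt in zip(models, opts):
        mu, log_var = m(s, a)
        nll = (0.5 * ((s_next - mu) ** 2 / log_var.exp() + log_var)).mean()
        opt.zero_grad(); nll.backward(); opt.step()

def predict(s, a):
    """Ensemble prediction: mean of member means; disagreement as epistemic uncertainty."""
    with torch.no_grad():
        mus = torch.stack([m(s, a)[0] for m in models])
    return mus.mean(0), mus.std(0)

s, a, s_next = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM), torch.randn(64, STATE_DIM)
train_step(s, a, s_next)
mean, epistemic_std = predict(s, a)
```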

References:
[1] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. 2018. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. NIPS'18. https://dl.acm.org/doi/10.5555/3327345.3327385.
[2] World Models. https://arxiv.org/abs/1803.10122.
[3] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.
[4] Sutton, Richard S., et al. Dyna-style planning with linear function approximation and prioritized sweeping. https://arxiv.org/pdf/1206.3285.pdf.

5.   Enabling fast and efficient hyper-parameter tuning with search space partitions for large models **  

Contact: Eiko Yoneki

Desired Skills: Reinforcement Learning, Gaussian Processes, Bayesian Optimisation, Parallel Computing

Hyperparameter tuning of large-scale deep learning models, such as ResNet-18, VGG-19, ResNet-50, and ResNet-152, is a critical but computationally intensive task due to the large search space. This project proposes a hyperparameter tuning algorithm that integrates Monte Carlo Tree Search (MCTS) and Bayesian Optimisation (BO). MCTS, known for its efficiency in large, high-dimensional spaces, will be employed to partition the hyperparameter space into smaller subspaces, which inherently enables parallelism. MCTS's probabilistic, heuristic-driven approach makes it lightweight and well suited to black-box optimisation [1,3]. BO, on the other hand, is particularly efficient in smaller parameter spaces, making it an ideal choice for searching within the partitions created by MCTS. By combining MCTS's strength in high-dimensional spaces with BO's efficiency in lower-dimensional ones, this approach aims to significantly enhance the speed and quality of hyperparameter tuning. Suggested benchmarks are the CIFAR-10 and CIFAR-100 datasets, with comparison against existing methods in terms of computational efficiency, accuracy improvements, and resource allocation strategies.
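
A deliberately simplified sketch of the partition-then-search idea is shown below: fixed axis-aligned regions chosen by a UCB rule stand in for the learned MCTS partitions of [1,3], and random sampling stands in for the per-region BO; the objective is a toy placeholder for a full training run.

```python
# Deliberately simplified sketch of "partition the space, then search inside a
# region": fixed axis-aligned regions selected by UCB stand in for the learned
# partitions of LaMCTS [1,3], and random sampling stands in for per-region BO.
# The objective is a toy stand-in for (learning rate, weight decay) -> val error.
import math
import random

def validation_error(lr, wd):
    # Hypothetical objective; in practice this is a full training run.
    return (math.log10(lr) + 2.5) ** 2 + (math.log10(wd) + 4.0) ** 2

# Partition the (log-lr, log-wd) box into four quadrants.
REGIONS = [((-5, -2.5), (-6, -3)), ((-5, -2.5), (-3, 0)),
           ((-2.5, 0), (-6, -3)), ((-2.5, 0), (-3, 0))]
counts = [0] * len(REGIONS)
sums = [0.0] * len(REGIONS)

def ucb(i, total, c=0.5):
    if counts[i] == 0:
        return float("inf")
    mean_reward = -(sums[i] / counts[i])      # lower error = higher reward
    return mean_reward + c * math.sqrt(math.log(total + 1) / counts[i])

best = (None, float("inf"))
for t in range(60):
    i = max(range(len(REGIONS)), key=lambda i: ucb(i, t))
    (lo_lr, hi_lr), (lo_wd, hi_wd) = REGIONS[i]
    lr = 10 ** random.uniform(lo_lr, hi_lr)
    wd = 10 ** random.uniform(lo_wd, hi_wd)
    err = validation_error(lr, wd)
    counts[i] += 1; sums[i] += err
    if err < best[1]:
        best = ((lr, wd), err)

print("best hyperparameters:", best)
```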

References:
[1] Wang L, et al. Learning search space partition for black-box optimization using monte carlo tree search. https://proceedings.neurips.cc/paper/2020/file/e2ce14e81dba66dbff9cbc35ecfdb704-Paper.pdf.
[2] Paul et al.: Fast efficient hyperparameter tuning for policy gradients. NIPS 2019. https://arxiv.org/abs/1902.06583.
[3] Facebook Research. LaMCTS: Large-scale Parallel Architecture Search Using Monte Carlo Tree Search. https://github.com/facebookresearch/LaMCTS.
[4] X. Dong and Y. Yang. Nas-bench-201: Extending the scope of reproducible neural architecture search. 2020. https://openreview.net/pdf?id=HJxyZkBKDr.
[5] Zhang, Baohe, et al.: On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning http://proceedings.mlr.press/v130/zhang21n/zhang21n.pdf.

6.   Harnessing Structured Action Spaces for Efficient RL-based System Optimisation*  

Contact: Eiko Yoneki

Desired Skills: Reinforcement Learning, Database Systems, Machine Learning, Parameter Tuning

Reinforcement Learning (RL) is increasingly applied to system optimisation, but handling large, combinatorial action spaces remains a challenge. This project aims to address this by developing methods to compress and structure action spaces, thereby improving RL efficiency and performance in system optimisation tasks. The central idea is to leverage hierarchical and structured action spaces, enabling RL agents to make decisions at varying granularities and to converge rapidly. Additionally, integrating action pruning and attention mechanisms can further focus the learning process on relevant actions. To validate the developed methods, the project will include evaluations across diverse system optimisation tasks, such as database tuning (a high-dimensional action space) [1] and index advising (a large discrete action space) [2]. It is not limited to these examples, as the methodologies are intended to generalise across system optimisation domains. The anticipated outcomes are algorithms and techniques that significantly improve the efficiency and effectiveness of RL in system optimisation by intelligently compressing and structuring the action space, paving the way for more robust and scalable RL applications in computer systems. Many previous works and methods can be leveraged here [3,4,5].
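
As one example of structuring the action space, the sketch below implements an action-branching network in the spirit of [4]: a shared torso with one head per knob, so a combinatorial action space of size prod(bins) needs only sum(bins) outputs. The state and knob layout are hypothetical.

```python
# Sketch of an action-branching Q-network in the spirit of [4]: a shared torso
# with one small head ("branch") per action dimension. The state/knob layout is
# a hypothetical stand-in for database knob tuning.
import torch
import torch.nn as nn

STATE_DIM = 40
KNOB_BINS = [8, 8, 4, 4, 16]   # e.g. discretised values for 5 database knobs

class BranchingQNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                                   nn.Linear(256, 256), nn.ReLU())
        self.branches = nn.ModuleList(nn.Linear(256, bins) for bins in KNOB_BINS)

    def forward(self, state):
        h = self.torso(state)
        return [branch(h) for branch in self.branches]   # per-knob Q-values

    def act(self, state):
        # Greedy action: argmax independently per branch.
        return [q.argmax(dim=-1) for q in self.forward(state)]

net = BranchingQNet()
state = torch.randn(2, STATE_DIM)
action = net.act(state)            # list of 5 tensors, one discrete choice per knob
print([a.tolist() for a in action])
```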

References:
[1] Zhang, Ji, et al.: An end-to-end automatic cloud database tuning system using deep reinforcement learning. https://dl.acm.org/doi/pdf/10.1145/3299869.3300085.
[2] Kossmann, Jan, Alexander Kastius, and Rainer Schlosser: SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning. EDBT. 2022. https://openproceedings.org/2022/conf/edbt/paper-37.pdf.
[3] Dulac-Arnold, Gabriel, et al.: Deep reinforcement learning in large discrete action spaces. https://arxiv.org/abs/1512.07679.
[4] Tavakoli, Arash, Fabio Pardo, and Petar Kormushev: Action branching architectures for deep reinforcement learning. AAAI 2018. https://ojs.aaai.org/index.php/AAAI/article/view/11798.
[5] Chandak, Yash, et al.: Learning action representations for reinforcement learning. PMLR, 2019. https://proceedings.mlr.press/v97/chandak19a.html.

7.   Learning from Demonstration to Guide Query Rewrite*  

Contact: Eiko Yoneki

Desired Skills: Machine Learning, Database Systems, Query Optimisation of SQL Database

This project aims to use Learning from Demonstration (LfD) to augment the query rewriting process within database query optimizers, with a specific emphasis on Monte Carlo Tree Search (MCTS) as the backbone algorithm for intelligent rewriting strategies [1]. LfD is employed as a mechanism to integrate knowledge from existing query optimizers [3]. Traditionally, optimizers rely on heuristic or rule-based strategies, which may not be optimal for complex queries and evolving datasets. Through LfD, this project will extract and learn from the successful query rewriting patterns and strategies of existing optimizers, taking into account both expert-crafted rewrites and rules discovered by WeTune [2], an automated system for the discovery and verification of query rewrite rules. MCTS will be the core algorithm powering the model's decision-making process; it is renowned for its effectiveness in large, high-dimensional search spaces, which makes it ideal for navigating the space of possible query rewrites. Through the integration of LfD and MCTS, the project aims to develop an adaptive query rewriting system that is both powerful and highly flexible. By learning from evolving data and query patterns, this system aims to substantially improve query performance by applying intelligent rewriting strategies informed by LfD and guided by MCTS. A very good starting point could be an early-stage open-source project [4].
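
The sketch below illustrates how a demonstration-derived prior can bias MCTS (PUCT-style) over rewrite-rule sequences; the rules, cost model and prior counts are all hypothetical placeholders, not the encoding used in [1] or [4].

```python
# Compact sketch of MCTS over rewrite-rule sequences with a demonstration-derived
# prior (PUCT-style): rules seen more often in expert rewrites get a larger
# exploration bonus. The "plan", rules, cost model and prior counts are placeholders.
import math

RULES = ["push_down_filter", "merge_projections", "remove_redundant_join"]
# Prior from demonstrations: how often each rule appears in expert rewrite traces.
DEMO_COUNTS = {"push_down_filter": 50, "merge_projections": 30, "remove_redundant_join": 20}
PRIOR = {r: c / sum(DEMO_COUNTS.values()) for r, c in DEMO_COUNTS.items()}

def apply_rule(plan, rule):
    return plan + [rule]                     # placeholder "rewritten plan"

def plan_cost(plan):
    # Placeholder cost model; in practice an estimated or measured query cost.
    return 10.0 - 2.0 * plan.count("push_down_filter") - 1.0 * plan.count("merge_projections")

class Node:
    def __init__(self, plan):
        self.plan, self.children, self.N, self.W = plan, {}, 0, 0.0

def puct_select(node, c=1.0):
    def score(rule):
        child = node.children[rule]
        q = child.W / child.N if child.N else 0.0
        return q + c * PRIOR[rule] * math.sqrt(node.N + 1) / (1 + child.N)
    return max(node.children, key=score)

def search(root, depth=3, iters=200):
    for _ in range(iters):
        node, path = root, [root]
        for _ in range(depth):
            if not node.children:
                node.children = {r: Node(apply_rule(node.plan, r)) for r in RULES}
            node = node.children[puct_select(node)]
            path.append(node)
        reward = -plan_cost(node.plan)       # lower cost => higher reward
        for n in path:
            n.N += 1
            n.W += reward
    # Return the most-visited first rewrite step.
    return max(root.children, key=lambda r: root.children[r].N)

print("first rewrite chosen:", search(Node([])))
```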

References:
[1] Xuanhe Zhou, Guoliang Li, Chengliang Chai, and Jianhua Feng. 2021. A learned query rewrite system using Monte Carlo tree search. VLDB 2021. https://doi.org/10.14778/3485450.3485456.
[2] Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, and Jinyang Li. 2022. WeTune: Automatic Discovery and Verification of Query Rewrite Rules. SIGMOD 2022. https://doi.org/10.1145/3514221.3526125.
[3] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. SIGMOD 2021. https://doi.org/10.1145/3448016.3452838.
[4] https://github.com/zhouxh19/LearnedRewrite.

9.   Structured Bayesian optimisation with probabilistic programs *  

Contact: Eiko Yoneki

Bayesian optimisation (e.g. [1]) offers a data-efficient approach to global optimisation of black-box functions by building a model of the unknown function, including our uncertainty about the values we have yet to observe. [2] introduces a general framework for structured Bayesian optimisation, where a probabilistic programming language provides a flexible means of defining surrogate models that can incorporate prior knowledge of structure in the underlying problem. In [3], we further integrated the structure of the performance model as a Directed Acyclic Graph (DAG) with BoTorch, where experts can encode their understanding of the system as probabilistic models connected by edges denoting dependencies. The DAG model incorporates parametric models defined in Pyro or PyTorch. The goal of this project is to extend [3] with the ability to deal with high-dimensional parameter spaces, as well as to add multi-objective functionality, which requires extending the design of the DAG.
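
As a rough illustration of a DAG-structured surrogate (independent of the actual BoTorch/Pyro integration in [3]), the sketch below chains two small parametric models: an intermediate node predicting a sub-metric from the configuration, and a final node predicting the objective from that sub-metric and the configuration; the metrics and dimensions are hypothetical.

```python
# Minimal sketch of a DAG-structured surrogate: an intermediate node predicts an
# observable sub-metric (e.g. shuffle size) from the configuration, and a final
# node predicts the objective (e.g. job runtime) from that sub-metric plus the
# configuration. Both nodes are small parametric models trained jointly; the
# metrics and dimensions are hypothetical, and this is not the BoTorch/Pyro code of [3].
import torch
import torch.nn as nn

CONF_DIM = 6   # e.g. number of Spark configuration parameters

class DagSurrogate(nn.Module):
    def __init__(self):
        super().__init__()
        self.shuffle_node = nn.Sequential(nn.Linear(CONF_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
        self.runtime_node = nn.Sequential(nn.Linear(CONF_DIM + 1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, conf):
        shuffle = self.shuffle_node(conf)                    # intermediate DAG node
        runtime = self.runtime_node(torch.cat([conf, shuffle], dim=-1))
        return shuffle, runtime

model = DagSurrogate()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(conf, shuffle_obs, runtime_obs):
    # Each node is supervised by its own observation, which is what lets the
    # expert-provided DAG structure constrain the surrogate.
    shuffle_pred, runtime_pred = model(conf)
    loss = nn.functional.mse_loss(shuffle_pred, shuffle_obs) + \
           nn.functional.mse_loss(runtime_pred, runtime_obs)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

conf = torch.randn(16, CONF_DIM)
train_step(conf, torch.randn(16, 1), torch.randn(16, 1))
```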

[1] Practical Bayesian Optimization of Machine Learning Algorithms. https://papers.nips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.
[2] BOAT: Building Auto-Tuners with Structured Bayesian Optimization https://dl.acm.org/doi/10.1145/3038912.3052662.
[3] Auto-tuning Spark with Bayesian optimisation. https://www.cl.cam.ac.uk/~ey204/pubs/MPHIL_P3/2021_Ross.pdf.

10.   Bayesian optimisation tuning with transfer Learning from simulation to databases *  

Contact: Eiko Yoneki

Databases come with a large number of tunable configuration parameters [1]. We have successfully used Bayesian optimisation to tune varying numbers of these parameters. However, a common issue when tuning them is that a single experiment takes a long time to evaluate. One way to deal with the long evaluation time is to leverage other data sources. The goal of this project is to study the use of fast-to-execute simulators to speed up the optimiser's convergence. FoundationDB [2] is a popular recent database that was built simulation-first and provides first-class support for performance and simulation testing at roughly a tenth of the actual execution time [3]. This project proposes multi-fidelity Bayesian optimisation [4] as a way to integrate the simulator into the optimisation loop. This framework enables transfer learning from a cheap, low-accuracy model (the simulator) to an expensive, high-accuracy model (FoundationDB itself).
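
A simple way to prototype this is to pool simulator and database observations in one surrogate, with the fidelity level as an extra input dimension; the sketch below does this with a Gaussian process, as a simplification of [4], using placeholder objective functions.

```python
# Sketch of a simple multi-fidelity surrogate: observations from the cheap
# simulator and the expensive database are pooled, with the fidelity level
# added as an extra GP input dimension. This is a simplification of [4];
# the objective functions below are hypothetical placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def simulator_throughput(knob):        # cheap, biased low-fidelity estimate
    return -(knob - 0.6) ** 2 + 0.05 * np.sin(10 * knob)

def database_throughput(knob):         # expensive, high-fidelity measurement
    return -(knob - 0.65) ** 2

rng = np.random.default_rng(1)
knobs_lo = rng.uniform(0, 1, 30)                    # many cheap simulator runs
knobs_hi = rng.uniform(0, 1, 4)                     # few real database runs
X = np.concatenate([np.column_stack([knobs_lo, np.zeros_like(knobs_lo)]),
                    np.column_stack([knobs_hi, np.ones_like(knobs_hi)])])
y = np.concatenate([simulator_throughput(knobs_lo), database_throughput(knobs_hi)])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

# Predict the high-fidelity response over a grid and pick the next knob to try
# on the real database (here simply the predicted maximiser).
grid = np.linspace(0, 1, 101)
mu, std = gp.predict(np.column_stack([grid, np.ones_like(grid)]), return_std=True)
print("suggested knob for next database run:", grid[np.argmax(mu)])
```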

[1] https://github.com/apple/foundationdb/blob/master/flow/Knobs.cpp.
[2] https://www.foundationdb.org.
[3] https://apple.github.io/foundationdb/testing.html.
[4] Song, J., Chen, Y. and Yue, Y.: A general framework for multi-fidelity Bayesian optimization with Gaussian processes. PMLR, 2019.

11.   Explainable Reinforcement Learning for Device Placement *  

Contact: Eiko Yoneki

Task scheduling over a finite number of resources in order to maximise resource efficiency is a traditionally difficult problem, usually approached with hand-crafted heuristics. Previous work used Hierarchical Reinforcement Learning [1] to find optimal placements of TensorFlow graph operations. This project proposes investigating contextual multi-armed bandits [2] with Bayesian learning (possibly Bayesian neural networks) to allow injecting expert knowledge into the system. Using Bayesian models to define a hierarchy of the system allows it to converge faster and makes it easier to understand. For this project, Park [3] provides a simulation-based environment for testing task scheduling on a TensorFlow graph. TensorFlow Probability provides a probabilistic programming language that makes it possible to define Bayesian models that interact with deep learning models.
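
As an illustration of the bandit formulation, the sketch below runs Thompson sampling with a Bayesian linear model per device (arm), where the context is a feature vector of the operation to be placed; the features and rewards are hypothetical placeholders.

```python
# Sketch of a contextual bandit with Thompson sampling via Bayesian linear
# regression: each device (arm) keeps a Gaussian posterior over the weights that
# map an operation's context features to its expected reward (e.g. negative
# runtime). Features and rewards are hypothetical placeholders.
import numpy as np

N_DEVICES, CTX_DIM, NOISE = 3, 8, 0.1

class BayesLinArm:
    def __init__(self):
        self.A = np.eye(CTX_DIM)          # posterior precision
        self.b = np.zeros(CTX_DIM)        # precision-weighted mean

    def sample_weights(self, rng):
        cov = np.linalg.inv(self.A) * NOISE
        return rng.multivariate_normal(np.linalg.solve(self.A, self.b), cov)

    def update(self, ctx, reward):
        self.A += np.outer(ctx, ctx) / NOISE
        self.b += ctx * reward / NOISE

arms = [BayesLinArm() for _ in range(N_DEVICES)]
rng = np.random.default_rng(0)
true_w = rng.normal(size=(N_DEVICES, CTX_DIM))     # hidden "true" device behaviour

for t in range(500):
    ctx = rng.normal(size=CTX_DIM)                 # features of the op to place
    # Thompson sampling: sample weights per arm, place the op on the best draw.
    device = int(np.argmax([arm.sample_weights(rng) @ ctx for arm in arms]))
    reward = true_w[device] @ ctx + rng.normal(scale=NOISE)   # e.g. negative runtime
    arms[device].update(ctx, reward)
```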

[1] https://ai.google/research/pubs/pub46646.
[2] https://arxiv.org/pdf/1510.00757.
[3] https://github.com/park-project/park.

12.   Hierarchical Device Placement from Probabilistic Perspective  

Contact: Eiko Yoneki

This project tackles hierarchical device placement from a probabilistic perspective, seeking further improvement over the original work by Google [1]. In [1], two networks are used: a grouper and a placer. The grouper learns valid groupings of TensorFlow operations, and the placer learns optimal placements across multiple devices. The idea in this project is to replace the grouper network with a Variational Autoencoder (VAE) that generates valid grouping representations in its latent space, with Bayesian Optimisation (BO) searching that latent space, similar to the approach of [2]. The resulting grouping would be fed into another BO that learns the optimal placement, similar to the approach described in our BOAT project [3] for learning the placement of distributed SGD. Furthermore, you could make the second placer learn multiple objectives rather than just runtime improvement: by learning more correlated tasks it should be expected to converge faster.
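
A minimal PyTorch sketch of the grouping VAE is given below: it learns a continuous latent space over (operation, group) assignments that BO could then search, with the decoder mapping candidate latent points back to groupings; the encoding and dimensions are hypothetical.

```python
# Minimal VAE sketch (PyTorch) for learning a continuous latent space of
# operation groupings; BO would then search this latent space and the decoder
# would map candidate points back to groupings, in the spirit of [2].
# Input encoding and dimensions are hypothetical.
import torch
import torch.nn as nn

N_OPS, N_GROUPS, LATENT = 128, 16, 8   # ops in the graph, groups, latent size

class GroupingVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(N_OPS * N_GROUPS, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * LATENT))
        self.dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                                 nn.Linear(256, N_OPS * N_GROUPS))

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()     # reparameterisation
        logits = self.dec(z).view(-1, N_OPS, N_GROUPS)
        return logits, mu, log_var

def loss_fn(logits, target_groups, mu, log_var):
    recon = nn.functional.cross_entropy(logits.flatten(0, 1), target_groups.flatten())
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
    return recon + kl

vae = GroupingVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
groups = torch.randint(0, N_GROUPS, (32, N_OPS))                   # stand-in groupings
x = nn.functional.one_hot(groups, N_GROUPS).float().flatten(1)
logits, mu, log_var = vae(x)
loss = loss_fn(logits, groups, mu, log_var)
opt.zero_grad(); loss.backward(); opt.step()
```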

[1] A. Mirhoseini et al.: A Hierarchical Model for Device Placement, ICLR, 2018. https://openreview.net/pdf?id=Hkc-TeZ0W.
[2] Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules.
[3] BOAT: Building Auto-Tuners with Structured Bayesian Optimization https://dl.acm.org/doi/10.1145/3038912.3052662.

13.   Generalizable compiler (query planner) optimization using Equality Saturation with Reinforcement Learning **  

Contact: Eiko Yoneki

Omelette [1] and Aurora [2] use reinforcement learning to guide the application of rules in Equality Saturation. A pressing challenge is generalisation. While the proposed methodologies can in theory generalise across unseen expressions (e.g. query rewrite problems), expecting an RL agent to solve NP-hard combinatorial problems across sparse problem instances (such as the space of possible SQL query e-graphs) is not realistic. To improve generalisation, we propose an investigation into population-based reinforcement learning (e.g. Poppy [3]), which comes with theoretical guarantees for solving combinatorial optimisation problems more efficiently.

[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation.
[2] George-Octavian Barbulescu: An Inquiry into Database Query Optimisation with Equality Saturation and Reinforcement Learning.
[3] Population-Based Reinforcement Learning for Combinatorial Optimization.

14.   Learned Extraction for Equality Saturation *  

Contact: Eiko Yoneki

One of the core practical limitations of applying Equality Saturation to rewriting query plans is the latency overhead. In the database domain, Aurora [1] shows that most of the planning latency comes from extracting the optimal expression from the e-graph, which is achieved by formulating query plan extraction as an integer linear program (ILP). To reduce planning latency, a promising idea is to learn the e-graph extraction with reinforcement learning. However, a naive application of RL to extracting queries (or other kinds of expressions) from the set of all possible e-graphs is likely to fail due to the enormous search space. To address this, we propose an investigation into guiding the ILP with RL (e.g. learning to branch [2]) to shrink the search space.
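
For concreteness, the sketch below formulates extraction over a tiny, made-up e-graph as an ILP with PuLP: one binary variable per e-node, the root e-class must select a node, and selecting a node forces its child e-classes to select nodes too. It is illustrative only, not Aurora's actual encoding.

```python
# Toy sketch of e-graph extraction as an ILP (using PuLP). The tiny e-graph
# below is a made-up example, not Aurora's encoding.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

# e-class -> list of (node_name, cost, child_eclasses)
EGRAPH = {
    "root": [("join(A,B)", 5.0, ["A", "B"]), ("hashjoin(A,B)", 3.0, ["A", "B"])],
    "A":    [("scan_a", 1.0, []), ("index_scan_a", 0.5, [])],
    "B":    [("scan_b", 1.0, [])],
}
ROOT = "root"

prob = LpProblem("egraph_extraction", LpMinimize)
x = {(c, n): LpVariable(f"x_{c}_{i}", cat=LpBinary)
     for c, nodes in EGRAPH.items() for i, (n, _, _) in enumerate(nodes)}

# Objective: total cost of selected e-nodes.
prob += lpSum(cost * x[(c, n)] for c, nodes in EGRAPH.items() for (n, cost, _) in nodes)

# Exactly one node must be selected in the root e-class.
prob += lpSum(x[(ROOT, n)] for (n, _, _) in EGRAPH[ROOT]) == 1

# If a node is selected, each of its child e-classes must select at least one node.
for c, nodes in EGRAPH.items():
    for (n, _, children) in nodes:
        for child in children:
            prob += lpSum(x[(child, m)] for (m, _, _) in EGRAPH[child]) >= x[(c, n)]

prob.solve()
chosen = [n for (c, n), var in x.items() if value(var) == 1]
print("extracted plan nodes:", chosen)
```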

[1] George-Octavian Barbulescu: An Inquiry into Database Query Optimisation with Equality Saturation and Reinforcement Learning.
[2] Learning to Branch.

15.   Cost modelling for tensor programs **  

Contact: Eiko Yoneki

Cost models provide cheap estimates of the performance of tensor programs without actually executing them, so they are at the heart of accelerating tensor program generation. For example, TVM [1] uses XGBoost [2] to estimate tensor program runtime. More recently, advanced DNN-based cost models have been proposed for more accurate estimation, such as GNNs [3] and transformers [4], and cost modelling can even be cross-hardware [5]. Distributed DNN training likewise needs careful cost modelling to provide cheap estimates of the throughput of parallelisation strategies, as in [6] and [7]; however, distributed cost models are mostly analytical estimates. This project aims to investigate and benchmark state-of-the-art cost models. In particular, we want to understand how good these cost models are and, if possible, replace the analytical distributed cost models with a DNN-based one.
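
To make the benchmarking concrete, the sketch below trains an XGBoost regressor from schedule features to measured runtime, in the style of TVM's learned cost model [1,2]; the features and runtimes are random placeholders.

```python
# Sketch of a learned cost model in the style of TVM's XGBoost-based model [1,2]:
# hand-crafted features of a candidate schedule are regressed onto measured
# runtime. The feature extraction and data here are random placeholders; in TVM
# the features come from the low-level loop structure of each candidate.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
N_CANDIDATES, N_FEATURES = 2000, 20

X = rng.uniform(size=(N_CANDIDATES, N_FEATURES))           # schedule features
true_w = rng.normal(size=N_FEATURES)
y = X @ true_w + 0.1 * rng.normal(size=N_CANDIDATES)       # "measured" runtimes

model = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X[:1500], y[:1500])

# Rank unseen candidates by predicted runtime without executing them.
pred = model.predict(X[1500:])
best = np.argsort(pred)[:10]
print("indices of the 10 schedules predicted fastest:", 1500 + best)
```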

[1] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.
[2] Learning to Optimize Tensor Programs.
[3] A Graph Neural Network-Based Performance Model for Deep Learning Applications.

Contact Email

Please email eiko.yoneki@cl.cam.ac.uk with any questions.