Computer Laboratory

MPhil, Part III, and Part II Project Suggestions (2022-2023)

Contact

Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below. * indicates high recommendation!

1.   Compiler Optimisation (or Database Query Planner Optimisation) using Equality Saturation with Reinforcement Learning ***  

Contact: Eiko Yoneki

Equality Saturation (ES) [2] has been used for compiler optimisation, e.g. to efficiently explore a large space of rewrite rules and thereby search through the program space. However, ES has limitations. Omelette [1] investigated a limitation of the ES algorithm by identifying structures that lead to non-terminating states: the algorithm has no notion of when to stop exploring the problem space and instead consumes the entire computation budget, and significant hand-crafted rules are needed to escape such states. Omelette uses reinforcement learning (RL) with graph neural networks (GNNs) to learn how to apply ES efficiently: an on-policy agent with a GNN learns to navigate the sequence of rewrite rules. The agent allows ES to become general-purpose and to adapt to any new environment simply by interacting with it, rather than relying on hand-crafted rules. Omelette is built on top of Egg [3], which has two phases: 1) building the e-graph, and 2) extracting subgraphs from the built e-graph. An e-graph is a graph-like data structure that compactly represents many equivalent expressions simultaneously. Omelette currently applies RL to the first phase, and our group is working on applying RL to the second phase. In this project, you would work on a case study in compiler optimisation (e.g. [4], [5]) (or database query planner optimisation) with the above RL-optimised ES, specifically exploring its scalability.
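
The two-phase structure (grow the e-graph by applying rewrites, then extract a cheaper equivalent term) can be prototyped at a toy scale before touching Egg or Omelette. The Python sketch below is such a toy, not the project's codebase: it skips the e-graph entirely and applies rewrites destructively, but it shows the decision problem an RL agent faces, namely choosing which rewrite to apply next or when to stop, with cost reduction as the reward. The rules, cost function and random policy are illustrative placeholders.

    # Toy rewrite environment: expressions are nested tuples, e.g. ("*", ("+", "x", 0), 1).
    import random

    def is_binop(e, op):
        return isinstance(e, tuple) and len(e) == 3 and e[0] == op

    RULES = [
        ("add-zero", lambda e: e[1] if is_binop(e, "+") and e[2] == 0 else None),
        ("mul-one",  lambda e: e[1] if is_binop(e, "*") and e[2] == 1 else None),
    ]

    def cost(e):
        """Number of nodes in the expression tree (proxy for program cost)."""
        return 1 if not isinstance(e, tuple) else 1 + sum(cost(c) for c in e[1:])

    def apply_rule(rule, e):
        """Apply `rule` once, to the first matching subterm (top-down)."""
        new = rule(e)
        if new is not None:
            return new
        if isinstance(e, tuple):
            for i, c in enumerate(e[1:], start=1):
                rewritten = apply_rule(rule, c)
                if rewritten is not c:
                    return e[:i] + (rewritten,) + e[i + 1:]
        return e

    class RewriteEnv:
        """Gym-like environment: actions are rule indices, plus STOP = len(RULES)."""
        def __init__(self, expr, budget=20):
            self.expr, self.budget = expr, budget

        def step(self, action):
            if action == len(RULES) or self.budget == 0:       # agent decides to stop
                return self.expr, 0.0, True
            before = cost(self.expr)
            self.expr = apply_rule(RULES[action][1], self.expr)
            self.budget -= 1
            reward = before - cost(self.expr)                  # cost reduction as reward
            return self.expr, reward, False

    env = RewriteEnv(("*", ("+", "x", 0), 1))
    done, total = False, 0.0
    while not done:
        _, r, done = env.step(random.randrange(len(RULES) + 1))   # random policy placeholder
        total += r
    print("total cost reduction:", total)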

[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation.
[2] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality saturation: A new approach to optimization. Logical Methods in Computer Science.
[3] egg project: https://egraphs-good.github.io.
[4] Equality Saturation for Tensor Graph Superoptimization: https://arxiv.org/abs/2101.01332.
[5] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning: https://www.cl.cam.ac.uk/~ey204/pubs/MPHIL_P3/2022_Yilin.pdf.

2.   ML-compiler: Joint optimisation of tensor graph and GPU kernels***  

Contact: Eiko Yoneki

TASO [2] optimises the tensor graph structure, and its backend uses cuDNN to execute tensor operators [1]. A recent hardware library, Cutlass, allows users to customise their own kernels, almost as if they could configure the hardware itself. This brings the opportunity of jointly optimising the tensor graph and the GPU kernels. The project aims to replace TASO's backend from cuDNN [3] with Cutlass [4]. Ideal candidates should have some experience writing C++ code and preferably know the basics of CUDA programming. Candidates can get started by benchmarking common tensor operators, such as MatMul and Conv2D, and comparing the performance of different ML compilers (see the sketch below). It is a great fit for those who want a deeper understanding of ML compilers.
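
A possible first benchmarking step, assuming PyTorch and a CUDA-capable GPU are available (matrix size, dtype and iteration counts are arbitrary): time a large MatMul and report throughput. The same harness can then be pointed at Conv2D, at cuDNN/cuBLAS-backed frameworks, and at hand-written Cutlass kernels for comparison.

    import time
    import torch

    def bench_matmul(n=4096, iters=50, dtype=torch.float16):
        a = torch.randn(n, n, device="cuda", dtype=dtype)
        b = torch.randn(n, n, device="cuda", dtype=dtype)
        for _ in range(5):                      # warm-up so lazy initialisation is excluded
            torch.matmul(a, b)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            torch.matmul(a, b)
        torch.cuda.synchronize()                # wait for all queued kernels to finish
        elapsed = (time.perf_counter() - start) / iters
        tflops = 2 * n ** 3 / elapsed / 1e12    # 2*n^3 FLOPs per square MatMul
        print(f"{n}x{n} {dtype}: {elapsed * 1e3:.2f} ms/iter, {tflops:.1f} TFLOP/s")

    if __name__ == "__main__":
        bench_matmul()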

[1] End-to-end comparison.
[2] Z. Jia, et al.: TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions, SOSP, 2019.
[3] CuDNN: NVIDIA cuDNN Installation Guide
[4] Cutlass: CUDA C++ template abstractions

3.   Model-based Reinforcement Learning in Computer Systems **  

Contact: Eiko Yoneki

Reinforcement learning (RL) is gaining interest as a generic optimisation and control method for data management tasks such as resource management/scheduling, database tuning, and stream processing. A central challenge in applying RL to such problems is the long decision evaluation time: a single step in a real system may take multiple minutes, whereas simulators for standard benchmark tasks (e.g. Atari games) can execute thousands of steps per second. One way of improving sample efficiency is model-based RL [1], wherein a model of the environment is learned and subsequently used to plan and evaluate decisions. A 'world model' approach [2] first learns a compressed representation of the state, then a recurrent mixture density network that predicts future state/reward distributions over compressed states, which can be used in place of a simulator. In this project, the student will evaluate a world-model-based approach against a model-free approach using existing software infrastructure. The world model is algorithm-agnostic, and you can further investigate other algorithms such as Rainbow [4], which the Model-Based Reinforcement Learning for Atari paper used for its comparisons; PPO or C51 could also yield better sample complexity and/or maximum performance. Example applications include stream processing scheduling, static graph rewriting (e.g. TensorFlow graphs, LLVM IR), and various database administration tasks. The aim of the project is to evaluate different representations (e.g. graph neural networks) and planning modes to gain novel insights into the costs and benefits of model-based deep RL for this domain. The suggested project is LLVM compiler optimisation via selection of the pass list, which requires dealing with the combinatorial problem shown in our previous work on LLVM optimisation; the previous project [3] can be used as a starting point. Evaluating the project with the SPEC CPU 2017 benchmark suite (https://www.spec.org/cpu2017/) would be ideal.
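
To make the idea concrete, the sketch below strips the world-model approach down to its essence under assumed interfaces: a learned dynamics model (here a plain MLP rather than the VAE-plus-MDN-RNN of [1, 2]) predicts next state and reward, and is queried cheaply to screen candidate actions before a single action is tried on the slow real system. The state dimension, action count and the untrained model are placeholders.

    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS = 32, 8

    class DynamicsModel(nn.Module):
        """Predicts (next_state, reward) from a state and a one-hot action."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + N_ACTIONS, 128), nn.ReLU(),
                nn.Linear(128, STATE_DIM + 1))

        def forward(self, state, action_onehot):
            out = self.net(torch.cat([state, action_onehot], dim=-1))
            return out[..., :STATE_DIM], out[..., STATE_DIM]   # next_state, reward

    def plan_one_step(model, state):
        """Evaluate every action in the learned model; return the most promising one."""
        actions = torch.eye(N_ACTIONS)
        states = state.expand(N_ACTIONS, -1)
        with torch.no_grad():
            _, predicted_reward = model(states, actions)
        return int(predicted_reward.argmax())

    model = DynamicsModel()                 # would be trained on logged real transitions
    state = torch.zeros(1, STATE_DIM)       # placeholder current system state
    best_action = plan_one_step(model, state)
    print("action to try on the real system:", best_action)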

[1] World Models. https://arxiv.org/abs/1803.10122.
[2] https://worldmodels.github.io.
[3] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.
[4] Hessel, M. et al. Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI, 2018. https://arxiv.org/pdf/1710.02298.pdf.

4.   Hyperparameter optimisation for DRL via Monte Carlo Tree Search **  

Contact: Eiko Yoneki

Hyperparameter optimisation (HPO) is important for machine learning models to obtain high performance [2]. Simpler approaches include grid search and random search, but they suffer from a lack of guidance, while learned models like Bayesian Optimisation (BO) require high-quality data points and fail as the parameter dimension increases. Monte Carlo Tree Search (MCTS) is a lightweight model suited to black-box optimisation over large, high-dimensional spaces. It is a probabilistic, heuristic-driven search algorithm that combines classic tree search with the machine learning principles of reinforcement learning. The MCTS approach has been explored in architecture search problems [1, 3]. Hyperparameters such as batch size, discount factor, learning rate, and regularisation parameters are still chosen manually in many cases. Deep reinforcement learning models like DQN and DDPG are sensitive to these common hyperparameter settings, which include both continuous and discrete types and span a high-dimensional search space.

This project explores MCTS to search for the best hyperparameter settings for sensitive RL models, especially model-based RL [5]. The case study will be extensions of the Omelette project [6] and the LLVM optimisation work [7] from our group. The basic idea is to use MCTS to partition the search space and then apply a light local optimisation model such as BO to tune the hyperparameters used in the LLVM optimisation or Omelette; [1] is a good starting point.
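
A condensed illustration of the search-space-partitioning idea from [1], with a made-up objective standing in for an expensive RL training run: a tree recursively splits the hyperparameter box, UCB decides which region to descend into, a cheap sampler (a stand-in for a local BO model) proposes a point inside the chosen region, and promising regions are refined further.

    import math, random

    def objective(lr, gamma):      # hypothetical stand-in for an expensive RL training run
        return -((math.log10(lr) + 3) ** 2) - 100 * (gamma - 0.99) ** 2

    class Node:
        def __init__(self, box, parent=None):          # box = {param: (lo, hi)}
            self.box, self.parent = box, parent
            self.children, self.scores = [], []

        def ucb(self, total_visits, c=0.3):
            if not self.scores:
                return float("inf")
            mean = sum(self.scores) / len(self.scores)
            return mean + c * math.sqrt(math.log(total_visits + 1) / len(self.scores))

        def split(self):                               # halve the widest dimension
            name, (lo, hi) = max(self.box.items(), key=lambda kv: kv[1][1] - kv[1][0])
            mid = (lo + hi) / 2
            self.children = [Node(dict(self.box, **{name: rng}), self)
                             for rng in ((lo, mid), (mid, hi))]

    root = Node({"log_lr": (-6.0, -1.0), "gamma": (0.9, 0.999)})
    best = (float("-inf"), None)
    for t in range(100):
        node = root
        while node.children:                           # 1. select a region by UCB
            node = max(node.children, key=lambda n: n.ucb(t))
        point = {k: random.uniform(lo, hi) for k, (lo, hi) in node.box.items()}
        score = objective(10 ** point["log_lr"], point["gamma"])   # 2. propose + evaluate
        if score > best[0]:
            best = (score, point)
        n = node
        while n is not None:                           # 3. back-propagate the score
            n.scores.append(score)
            n = n.parent
        if len(node.scores) >= 10:                     # 4. refine the promising region
            node.split()
    print("best score and configuration found:", best)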

[1] Wang L, et al. Learning search space partition for black-box optimization using monte carlo tree search. https://arxiv.org/abs/2007.00708.
[2] Parker-Holder, Jack, et al. "Automated reinforcement learning: A survey and open problems. https://arxiv.org/pdf/2201.03916.pdf.
[3] https://github.com/facebookresearch/LaMCTS.
[4] Paul et al.: Fast efficient hyperparameter tuning for policy gradients. NIPS 2019. https://arxiv.org/abs/1902.06583.
[5] Zhang, Baohe, et al.: On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning. https://arxiv.org/abs/2102.13651.
[6] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation.
[7] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.

5.   Monte Carlo Tree Search for Equality Saturation **  

Contact: Eiko Yoneki

The Omelette [1] project uses a PPO agent to build the e-graph for equality saturation, such that the resulting e-graph is better optimised. In this project, we want to explore the use of MCTS to build the e-graph. The MCTS [2] approach should be more efficient than model-free methods like PPO-based RL, especially for large e-graphs. We aim to study how well MCTS scales as the e-graph grows and to compare it with Omelette's PPO agent. If time permits, we will apply the MCTS-based e-graph builder to real-world ML compilers, e.g. Tensat [3].
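
In isolation, the MCTS ingredient that replaces the PPO policy is the selection rule. The snippet below shows a standard UCT selection over candidate rewrites; `node`, `children`, `visits` and `value` are assumed attributes of whatever tree representation the project settles on, with the value being, for example, the eventual cost reduction of the extracted term.

    import math

    def uct_select(node, c=1.4):
        """Pick the child (rewrite) maximising mean value + exploration bonus."""
        parent_visits = sum(child.visits for child in node.children.values())
        def score(child):
            if child.visits == 0:
                return float("inf")                 # always try unexplored rewrites once
            return child.value / child.visits + c * math.sqrt(
                math.log(parent_visits) / child.visits)
        return max(node.children.items(), key=lambda kv: score(kv[1]))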

[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation.
[2] MCTS: http://www.incompleteideas.net/609%20dropbox/other%20readings%20and%20resources/MCTS-survey.pdf.
[3] Equality Saturation for Tensor Graph Superoptimization: https://arxiv.org/abs/2101.01332.

6.   Inferring reward function via Inverse Reinforcement Learning **  

Contact: Eiko Yoneki

Reinforcement Learning (RL) has recently achieved impressive results, from playing video games to solving robotic control problems. However, in many applications it is challenging to design a reward function that robustly describes the desired task. Inverse Reinforcement Learning (IRL) [1], which evolved from imitation learning, can be formulated as the problem of inferring the reward function of an agent given its policy or observed behaviour. Recent work has focused on the robustness of IRL in unknown environments [2, 3]. IRL is suitable for environments where it is hard to define the reward function or expensive to obtain the reward in real time [4]. System-level RL tuning problems [6] are such a case, as people typically define reward functions based on actual execution time. One of the most popular open-source IRL methods is Generative Adversarial Imitation Learning (GAIL) [5]. GAIL combines Generative Adversarial Networks (GANs) and IRL, alternately updating the agent's reward and policy until reaching an optimum. One challenge is that IRL needs expert data to update the agent's reward function and policy; however, unlike for video games, building a virtual environment for system-level optimisation problems to collect such data is sometimes impossible, although underlying optimisers may help here. Thus, a promising direction is to use GAIL to infer reward functions within such expensive environments. In this project you would explore the reward function using GAIL in our existing project on optimising the LLVM pass list using reinforcement learning [7].
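
A minimal sketch of the GAIL reward signal, assuming PyTorch (dimensions, batches and the source of expert demonstrations are placeholders, and the policy update itself is omitted): a discriminator is trained to distinguish expert (state, action) pairs from the agent's, and its output is converted into a surrogate reward that replaces a hand-written one.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 16, 4

    disc = nn.Sequential(
        nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
        nn.Linear(64, 1))                      # logit: "how expert-like is this pair?"
    opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
    bce = nn.BCEWithLogitsLoss()

    def discriminator_step(expert_sa, policy_sa):
        """One update: label expert pairs 1, policy pairs 0."""
        logits = disc(torch.cat([expert_sa, policy_sa]))
        labels = torch.cat([torch.ones(len(expert_sa), 1),
                            torch.zeros(len(policy_sa), 1)])
        loss = bce(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    def gail_reward(policy_sa):
        """Surrogate reward -log(1 - D(s, a)) used in place of a hand-written one."""
        with torch.no_grad():
            d = torch.sigmoid(disc(policy_sa))
        return -torch.log(1.0 - d + 1e-8)

    expert_sa = torch.randn(32, STATE_DIM + ACTION_DIM)   # stand-in for expert demonstrations
    policy_sa = torch.randn(32, STATE_DIM + ACTION_DIM)   # stand-in for agent rollouts
    discriminator_step(expert_sa, policy_sa)
    print(gail_reward(policy_sa).mean())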

[1] Ng, Andrew Y., and Stuart Russell: Algorithms for inverse reinforcement learning. ICML 2000.
[2] Lindner et al.: Active Exploration for Inverse Reinforcement Learning. ICML workshop 2022.
[3] Fu et al.: Learning robust rewards with adversarial inverse reinforcement learning. ICLR 2018.
[4] Arora, Saurabh, and Prashant Doshi: A survey of inverse reinforcement learning: Challenges, methods and progress.
[5] Ho J, Ermon S. Generative adversarial imitation learning.
[6] Zhang, Ji, et al.: An end-to-end automatic cloud database tuning system using deep reinforcement learning.
[7] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.

7.   Structured Bayesian optimisation with probabilistic programs **  

Contact: Eiko Yoneki

Bayesian optimisation (e.g. [1]) offers a data-efficient approach to the global optimisation of black-box functions by building a model of the unknown function, including our uncertainty about values we have yet to observe. [2] introduces a general framework for structured Bayesian optimisation, in which a probabilistic programming language provides a flexible means of defining surrogate models that can incorporate prior knowledge of structure in the underlying problem. In [3], we further integrated the structure of the performance model as a Directed Acyclic Graph (DAG) with BoTorch, where experts encode their understanding of the system as probabilistic models connected by edges denoting dependencies. The DAG model incorporates parametric models defined in Pyro or PyTorch. The goal of this project is to further extend [3] with the ability to deal with high-dimensional parameter spaces, as well as adding multi-objective functionality, which requires extending the design of the DAG.
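
For illustration, and assuming Pyro, a DAG-structured surrogate of the kind used in [2, 3] might look like the sketch below: the expert decomposes end-to-end latency into intermediate quantities with simple parametric sub-models, rather than placing one opaque GP over all parameters. All variable names and functional forms are invented for the example; inference (e.g. SVI or MCMC) over this model would then provide the posterior surrogate queried by the BO loop.

    import torch
    import pyro
    import pyro.distributions as dist

    def performance_model(buffer_mb, parallelism, observed_latency=None):
        # Edge 1: buffer size -> cache hit rate (expert's parametric guess)
        k = pyro.sample("k", dist.LogNormal(0.0, 1.0))
        hit_rate = 1.0 - torch.exp(-buffer_mb / (100.0 * k))

        # Edge 2: (hit rate, parallelism) -> latency, with unknown coefficients
        miss_cost = pyro.sample("miss_cost", dist.LogNormal(2.0, 1.0))
        base = pyro.sample("base", dist.LogNormal(1.0, 1.0))
        mean_latency = base + miss_cost * (1.0 - hit_rate) / parallelism

        # Observation node: measured end-to-end latency of one benchmark run
        noise = pyro.sample("noise", dist.LogNormal(0.0, 0.5))
        return pyro.sample("latency", dist.Normal(mean_latency, noise),
                           obs=observed_latency)

    # One (hypothetical) observed configuration and its measured latency
    performance_model(torch.tensor(256.0), torch.tensor(4.0),
                      observed_latency=torch.tensor(12.3))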

[1] Practical Bayesian Optimization of Machine Learning Algorithms. https://papers.nips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.
[2] BOAT: Building Auto-Tuners with Structured Bayesian Optimization https://dl.acm.org/doi/10.1145/3038912.3052662.
[3] Auto-tuning Spark with Bayesian optimisation. https://www.cl.cam.ac.uk/~ey204/pubs/MPHIL_P3/2021_Ross.pdf.

8.   Bayesian optimisation tuning with transfer Learning from simulation to databases **  

Contact: Eiko Yoneki

Databases come with a large number of tunable configuration parameters [1]. We have successfully used Bayesian optimisation to tune varying numbers of parameters. However, a common issue when tuning these parameters is that a single experiment takes a long time to evaluate. One way to deal with long evaluation times is to leverage other data sources. The goal of this project is to study how fast-to-execute simulators can aid the convergence speed of the optimiser. FoundationDB [2] is a popular recent database that was built as a simulator first and provides first-class support for performance and simulation testing at roughly 1/10th of the actual execution time [3]. This project proposes multi-fidelity Bayesian optimisation [4] as the method for integrating the simulator into the optimisation loop. This framework enables transfer learning from a low-accuracy, cheap model (the simulator) to a high-accuracy, expensive model (FoundationDB).
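
The cost trade-off that multi-fidelity BO formalises can be caricatured as follows (a real implementation would model both fidelities jointly, e.g. with a multi-task GP as in [4], rather than rank-and-verify; all functions and costs below are hypothetical stand-ins): spend most of the budget on the cheap simulator, then pay for full benchmarks only on the simulator's top candidates.

    import random

    def real_latency(config):             # stand-in for a full FoundationDB benchmark
        return (config["cache_mb"] - 512) ** 2 / 1e4 + 5 * config["log_level"]

    def simulator_latency(config):        # ~1/10th cost, biased/noisy estimate
        return real_latency(config) * random.uniform(0.8, 1.2)

    SIM_COST, REAL_COST, budget = 1.0, 10.0, 100.0
    candidates = [{"cache_mb": random.randint(64, 2048),
                   "log_level": random.randint(0, 3)} for _ in range(200)]

    # Spend most of the budget on the cheap fidelity...
    scored = []
    while budget > REAL_COST * 3 and candidates:
        cfg = candidates.pop()
        scored.append((simulator_latency(cfg), cfg))
        budget -= SIM_COST

    # ...then verify only the simulator's top picks on the expensive fidelity.
    best = None
    for _, cfg in sorted(scored, key=lambda t: t[0])[:3]:
        latency = real_latency(cfg)
        budget -= REAL_COST
        if best is None or latency < best[0]:
            best = (latency, cfg)
    print("best verified configuration:", best)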

[1] https://github.com/apple/foundationdb/blob/master/flow/Knobs.cpp.
[2] https://www.foundationdb.org.
[3] https://apple.github.io/foundationdb/testing.html.
[4] Song, J., Chen, Y. and Yue, Y., 2019, April. A general framework for multi-fidelity Bayesian optimization with gaussian processes. PMLR.

9.   Explainable Reinforcement Learning for Device Placement *  

Contact: Eiko Yoneki

Task scheduling over a finite set of resources in order to maximise resource efficiency is a traditionally difficult problem, usually approximated with heuristics. Previous work used Hierarchical Reinforcement Learning [1] to find optimal placements for TensorFlow graph operations. This project proposes to investigate contextual multi-armed bandits [2] with Bayesian learning (possibly Bayesian neural networks) to allow injecting expert knowledge into the system. Using Bayesian models to define a hierarchy over the system allows it to converge faster and makes it easier to understand. For this project, Park [3] provides a simulation-based environment to test task scheduling on a TensorFlow graph, and TensorFlow Probability provides a probabilistic programming language that makes it possible to define Bayesian models that interact with deep learning models.
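
A minimal sketch of the proposed direction, using Thompson sampling over a Bayesian linear reward model per device ("arm"): the context features, number of devices and reward signal are placeholders for what Park and the TensorFlow graph would actually supply, and TensorFlow Probability or a Bayesian neural network could replace the closed-form linear model.

    import numpy as np

    N_DEVICES, CTX_DIM = 4, 8

    class BayesLinearArm:
        """Bayesian linear regression with a Gaussian prior; the posterior is closed form."""
        def __init__(self, dim, noise=1.0):
            self.A = np.eye(dim)            # posterior precision
            self.b = np.zeros(dim)          # precision-weighted mean
            self.noise = noise

        def sample_reward(self, ctx):
            cov = np.linalg.inv(self.A)
            theta = np.random.multivariate_normal(cov @ self.b, self.noise * cov)
            return ctx @ theta              # Thompson sample of the expected reward

        def update(self, ctx, reward):
            self.A += np.outer(ctx, ctx) / self.noise
            self.b += ctx * reward / self.noise

    arms = [BayesLinearArm(CTX_DIM) for _ in range(N_DEVICES)]
    for step in range(100):
        ctx = np.random.randn(CTX_DIM)                      # features of the op to place
        device = int(np.argmax([a.sample_reward(ctx) for a in arms]))
        reward = -np.random.rand()                          # placeholder, e.g. negative runtime
        arms[device].update(ctx, reward)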

[1] https://ai.google/research/pubs/pub46646.
[2] https://arxiv.org/pdf/1510.00757.
[3] https://github.com/park-project/park.

10.   Hierarchical Device Placement from a Probabilistic Perspective  

Contact: Eiko Yoneki

This project tackles hierarchical device placement from a probabilistic perspective, seeking further improvement over the original work by Google [1]. In [1], two networks are used: a grouper, which learns valid groupings of TensorFlow operations, and a placer, which learns the optimal placement over multiple devices. The idea in this project is to replace the grouper network with a Variational Autoencoder (VAE), using Bayesian Optimisation (BO) in the latent space to generate valid grouping representations, a similar approach to [2]. The output would be fed into another BO that learns the optimal placement, similar to the approach described in our BOAT project [3] for learning the placement of distributed SGD. Furthermore, you could have the second placer learn multiple objectives rather than just runtime improvement: by making it learn more correlated tasks, it should be expected to converge faster.
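
A skeleton of the proposed first stage, with made-up dimensions: a VAE maps a per-operation grouping assignment to a continuous latent code, and BO would then search over that latent space and decode candidate points back into groupings, mirroring the molecule-design approach of [2]. Training of the VAE and the BO loop itself are omitted.

    import torch
    import torch.nn as nn

    N_OPS, N_GROUPS, LATENT = 128, 16, 8    # ops in the graph, groups, latent size
    IN = N_OPS * N_GROUPS                   # input: one-hot group assignment per op

    class GroupingVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(IN, 256), nn.ReLU(), nn.Linear(256, 2 * LATENT))
            self.dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, IN))

        def encode(self, x):
            mu, log_var = self.enc(x).chunk(2, dim=-1)
            return mu, log_var

        def decode(self, z):
            logits = self.dec(z).view(-1, N_OPS, N_GROUPS)
            return logits.argmax(-1)        # group index chosen for each op

        def forward(self, x):
            mu, log_var = self.encode(x)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterise
            return self.dec(z), mu, log_var # reconstruction + terms for the ELBO loss

    vae = GroupingVAE()                     # would be trained on valid groupings (not shown)
    z = torch.randn(1, LATENT)              # a point proposed by BO in the latent space
    print(vae.decode(z).shape)              # -> torch.Size([1, 128]): one group per op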

[1] A. Mirhoseini et al.: A Hierarchical Model for Device Placement, ICLR, 2018. https://openreview.net/pdf?id=Hkc-TeZ0W.
[2] Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules.
[3] BOAT: Building Auto-Tuners with Structured Bayesian Optimization https://dl.acm.org/doi/10.1145/3038912.3052662.

Contact Email

Please email eiko.yoneki@cl.cam.ac.uk with any questions.