MPhil, Part III, and Part II Project Suggestions (2022-2023)
|
Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below. * indicates high recommendation! 1. Compiler Optimisation (or Database Query Planner Optimisation) using Equality Saturation with Reinforcement Learning *** Contact:
Eiko Yoneki
Equality Saturation (ES) [2] has been used for Compiler Optimisation, e.g. efficient exploration of a large space of rewrite rules, to effectively search through the program-space.
However ES has limitations, and Omelette [1] investigated the limitation ES algorithm by identifying structures the algorithm may encounter that lead to non-termination states and require significant hand-crafted rules to get out of these states, where algorithms do not have a notion of knowing when to stop exploring the problem space and rather consume all of the computation budget. Omelette utilised a reinforcement learning (RL) and graph neural networks (GNN) to efficiently learn how to apply an ES and the agent uses an on-policy algorithm and GNN to learn to navigate the sequence of rewritten rules. The agent allows ES to become general-purpose and adapt to any new environment by simply interacting with the environment rather than using hand-crafted rules.
Omelette is built on top of Egg [3], which has 2 phases, 1) building the egraph, and 2) extract subgraphs from built egraph. E-graph is a graph-like data structure that can represent many rewrite rules simultaneously. Omelette is currently applied RL on the first phase, and our group is working on applying RL onto the second phase.
In this project, you would work on the case study in Compiler Optimisation (e.g. [4], [5]) (or Database Query Planner Optimisation) with the above optimised ES with RL specifically exploring its scalability.
[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation. 2. ML-compiler: Joint optimisation of tensor graph and GPU kernels*** Contact:
Eiko Yoneki
TASO [2] optimises tensor graph structure, and its backend uses CuDNN to execute tensor Operators [1]. A recent hardware library, Cutlass, allows users to customise their own kernels, as if users can configure their hardware. This brings the opportunities of jointly optimisation of tensor graph and GPU kernels.
The project aims to replace TASO's backend from CuDNN [3] to Cutlass [4] . Ideal candidates should have some knowledge writing C++ level codes and preferably know the basics of CUDA programming. The candidates can get started by benchmarking common tensor operators, such as MatMul, Conv2D etc, and compare the performance between different ML compilers. It is a great fit for those who want to know more and deeper about ML compilers.
[1] End-to-end comparison. 3. Model-based Reinforcement Learning in Computer Systems ** Contact:
Eiko Yoneki
Reinforcement learning (RL) is gaining interest as a generic optimisation and control method in data management tasks such as resource management/scheduling, database tuning, or stream processing. A central challenge in applying RL to such problems are long decision evaluation times. A single step in a real system may take multiple minutes, as opposed to simulators for standard benchmark tasks (e.g. Atari games), which can execute thousands of steps per second.
One way of improving sample efficiency is model-based RL [1], wherein a model of the environment is learned and subsequently used to plan and evaluate decisions. A 'world-model' approach [2] first learns a compressed version of a state representation, then a recurrent mixture density network to predict future state / reward distributions of compressed states which can be used in place of a simulator. In this project, the student will evaluate a world-model based approach against a model free approach using software infrastructure. The world model is algorithm agnostic and you can further investigate other algorithms such as the Rainbow comparison [4], which was used Model-Based Reinforcement Learning for Atari paper for their comparisons. You can explore PPO or C51, which could yield better sample complexity and/or max performance.
Example applications include stream processing scheduling, static graph rewriting (e.g. TensorFlow, LLVM IRs), and various database administration tasks. The aim of the project is to evaluate different representations (e.g. graph neural networks) and planning modes to gain novel insights into cost/benefits of model-based deep RL for this domain The suggestive project is selection of LLVM compiler optimisation on selection of the pass list, which requires dealing with the combinatorial problem shown in our previous work on LLVM optimisation. The previous project work [3] can be used as a starting point. Evaluating the project with SPEC 2017 (https://www.spec.org/cpu2017/). benchmarking will be ideal.
[1] World Models. https://arxiv.org/abs/1803.10122. 4. Hyperparameter optimisation for DRL via Monte Carlo Tree Search ** Contact:
Eiko Yoneki
Hyperparameter optimisation (HPO) is important for machine learning models to obtain high performance [2]. The simpler approaches include grid search, and random search, however they suffer from the lack of guidance. Learned models like Bayesian Optimisation (BO) require high-quality data points and fails when the parameters' dimension increases. Monte Carlo Tree Search (MCTS) is a light model which fits the black-box optimisation within large and high-dimensional space. It is a probabilistic and heuristic-driven search algorithm that combines the classic tree search implementations alongside machine learning principles of reinforcement learning. MCTS approach has been explored in architecture search problems [1, 3]. Hyper parameters such as batch size, discount factor, learning rate, and regularisation parameters, are manually chosen in many cases. Deep Reinforcement Learning models like DQN and DDPG are sensitive to these common hyperparameter settings, where these hyperparameters include both continuous and discrete types, and the search space is also high-dimensional.
[1] Wang L, et al. Learning search space partition for black-box optimization using monte carlo tree search. https://arxiv.org/abs/2007.00708. 5. Monte Carlo Tree Search for Equality Saturation ** Contact:
Eiko Yoneki
The Omelette[1] project uses a PPO agent to build the E-graph for Equality saturation, such that the E-graph is more optimized. In this project, we want to explore the use of MCTS to build the E-graph. MCTS[2] approach should be more efficient than model-free method like PPO based RL, especially for large E-graph. We aim to study how well MCTS can scale as the E-graph grows and compare with Omelette's PPO agent. If time permits, we will apply MCTS based E-graph builder to real-world ML compilers, e.g. Tensat.
[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation. 6. Inferring reward function via Inverse Reinforcement Learning ** Contact:
Eiko Yoneki
Reinforcement Learning (RL) has achieved impressive results recently, from playing video games to solving robotic control problems. However, it is challenging in many applications to design a reward function that robustly describes the desired task. First proposed by [1] in 1998, Inverse Reinforcement Learning (IRL), which evolved from imitation learning, can be formulated as a problem of inferring the reward function of an agent, given its policy or observed behaviour. Recent works focused more on the robustness of IRL in unknown environments [2,3].
IRL is suitable for environments where it's hard to define the reward function, or it's expensive to obtain the real-time reward [4]. System-level RL tuning problems [6] are in such cases when people typically define reward functions based on the actual execution time.
One of the most popular and open-source IRL methods is Generative Adversarial Imitation Learning (GAIL) [5]. GAIL combines Generative Adversarial Network (GAN) and IRL and alternatively updates the agent's reward and policy until reaching optimum. One challenge here is that IRL needs expert data to update the agent's reward functions and policy. However, unlike video games, building a virtual environment for system-level optimisation problems to collect such data is sometimes impossible. Some underlying optimizers may help here. Thus, one good exploration could be using GAIL to infer reward functions within such expensive environments.
In this project you would explore the reward function using GAIL in our existing project on the optimisation of LLVM Pass List using Reinforcement Learning [7].
[1] Ng, Andrew Y., and Stuart Russell: Algorithms for inverse reinforcement learning. ICML 2000. 7. Structured Bayesian optimisation with probabilistic programs ** Contact:
Eiko Yoneki
Bayesian optimisation (e.g., [1]) offers a data-efficient approach to global optimisation of black-box functions by building a model of the unknown function, including our uncertainty about its values we have yet to observe. [2] introduces a general framework for performing structured Bayesian optimisation, where a probabilistic programming language provides a flexible means for defining surrogate models that can incorporate prior knowledge of structure in the underlying problem. In [3], we have further integrated the structure of the performance model as Directed Acyclic Graph (DAG) with BoTorch, where the experts can encode their understanding of the system in probabilistic models connected by edges denoting dependencies. The DAG model incorporates the parametric models defined in Pyro or PyTorch.
The goal of this project is to further extend [3] with the ability to deal with high dimension parameter space, as well as adding multi-objective functionality, where you need to extend the design of DAG.
[1] Practical Bayesian Optimization of Machine Learning Algorithms. https://papers.nips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html. 8. Bayesian optimisation tuning with transfer Learning from simulation to databases ** Contact:
Eiko Yoneki
Databases come with a large number of tunable configurations [1]. We successfully used Bayesian optimisation in tuning varying numbers of parameters. However, a common issue when tuning these parameters is that a single experiment takes a long time to evaluate. One way to deal with the long evaluation time is to leverage other data sources. This project goal is to study using fast-to-execute simulators to aid the convergence speed of the optimiser. FoundationDB [2] is a recent popular database built as a simulator first and provides first-class support to performance and simulation testing at 1/10th the actual execution time [3]. This project proposes using Multi-fidelity [4] as an optimisation method to integrate the simulator in the Bayesian optimisation loop. This framework enables transfer learning from a low-accuracy-cheap model (simulator) to a high-accuracy-expensive model (FoundationDB).
[1] https://github.com/apple/foundationdb/blob/master/flow/Knobs.cpp. 9. Explainable Reinforcement Learning for Device Placement * Contact:
Eiko Yoneki
Task scheduling over a finite number of resources in order to maximise resources efficiency is a traditionally difficult problem with heuristics made to approximate it. Previous work used Hierarchical Reinforcement Learning [1] to find optimal task scheduling for TensorFlow graph operations.
This project proposes to investigate Contextual Multi-Armed bandit [2] with Bayesian learning (possible Bayesian neural networks) to allow injecting expert knowledge in the system. Using Bayesian models to define a hierarchy of the system allows it to converge faster and be easier to understand.
For this project, Park [3] provides a simulation-based environment to test task scheduling on a TensorFlow graph. TensorFlow Probability [2] provides a probabilistic programming language that makes it possible to define Bayesian models that interacts with deep learning models.
[1] https://ai.google/research/pubs/pub46646. 10. Hierarchical Device Placement from Probabilistic Perspective Contact:
Eiko Yoneki
This project tackles the hierarchical device placement from probabilistic perspective to seek further improvement to the original work by Google [1]. In [1], two networks, one a grouper and one a placer are used. The grouper learns valid grouping of TensorFlow operations and the placer learns optimal placement on multiple devices. The idea in this project is to replace the grouper network with a Variational Autoencoder (VAE) network that in the latent space and using Bayesian Optimisation (BO) to generate valid grouping representation, similar approach to [2]. The output will be fed into another BO that learns the optimal placement, which is a similar approach described in our BOAT project 3] learning the placement for distributed SGD. Furthermore you can add the second placer learning multiple objectives rather than just the runtime improvement: by making it learn more correlated tasks it should be expected to be converged faster.
[1] A. Mirhoseini et al.: A Hierarchical Model for Device Placement, LCLR, 2018. https://openreview.net/pdf?id=Hkc-TeZ0W. Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk for any question. |