MPhil, Part III, and Part II Project Suggestions (2023-2024)
|
Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below. * indicates high recommendation! 1. Tensor expression superoptimisation via deep reinforcement learning *** Contact:
Eiko Yoneki
AlphaDev [1][2] shows RL agent can discover faster sorting algorithm via playing the assembly game. In recent advances of machine learning compiler, EinNet [3] [4] shows how to discover faster tensor programs via rule-based transformation. We are interested in investigating how RL works in discovering faster tensor programs at tensor expression level transformation. You should implement a graph RL agent to play in the tensor game to discover faster tensor programs. Alternatively, we have an internal RL-driven graph transformation system, which you can leverage and apply to the tensor expression level transformation.
To get started, build and benchmark EinNet to see how it works with rule-based transformation on a simple DNN (e.g. self-attention). Then, replace their search algorithm with RL algorithm.
References:
[1] Faster sorting algorithms discovered using deep reinforcement learning, Nature, 2023. 2. Bayesian Optimisation for tensor code generation *** Contact:
Eiko Yoneki
Desired Skills: Strong interest in tensor program optimisation, Some knowledge/interest in Bayesian optimisation, Python, with some knowledge in C++
Tensor codes are run on massively parallel hardware. When generating tensor code (also called auto-scheduling), TVM Compiler [1] needs to search through a number of parameters. A State-of-the-art auto-scheduler is Ansor [2], which applies rule-based mutation to generate tensor code templates and then fine-tunes those templates via Evolutionary Search. Bayesian Optimisation (BayesOpt) [3] is a better approach to efficiently search the tensor code templates than Evolutionary Search.
In this project, at first, TVM is set up for benchmarking with ResNet and Bert using CPU and GPU (possible a few different types of GPU). Next same benchmarking with Compiler Cutlass by NVIDIA should be experimented.
Afterwards, exploring using BayesOpt for high-performance tensor code generation, and benchmark black-box algorithms for tensor code generation. The main interface for tensor code generation in TVM will be through MetaScheduler [6], which provides a fairly simple Python interface for various search methodologies [7]. The project has a particular interest in tensor code generation for tensor cores, which are equipped by recent generation of GPUs (since the Turing micro-architectures) as a domain-specific architectures to massively accelerate tensor programs.
References:
[1] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning https://www.usenix.org/system/files/osdi18-chen.pdf. 3. Inferring reward function via Inverse Reinforcement Learning ** Contact:
Eiko Yoneki
Desired Skills: Inverse Reinforcement Learning, LLVM Optimisation
Reinforcement Learning (RL) has achieved impressive results recently, from playing video games to solving robotic control problems. However, it is challenging in many applications to design a reward function that robustly describes the desired task. Inverse Reinforcement Learning (IRL) [1], which evolved from imitation learning, can be formulated as a problem of inferring the reward function of an agent, given its policy or observed behaviour. Recent works focused more on the robustness of IRL in unknown environments [2,3]. IRL is suitable for environments where it's hard to define the reward function, or it's expensive to obtain the real-time reward [4]. System-level RL tuning problems [6] are in such cases when people typically define reward functions based on the actual execution time. One of the most popular and open-source IRL methods is Generative Adversarial Imitation Learning (GAIL) [5]. GAIL combines Generative Adversarial Network (GAN) and IRL and alternatively updates the agent's reward and policy until reaching optimum. One challenge here is that IRL needs expert data to update the agent's reward functions and policy. However, unlike video games, building a virtual environment for system-level optimisation problems to collect such data is sometimes impossible. Some underlying optimisers may help here. Thus, one good exploration could be using GAIL to infer reward functions within such expensive environments. In this project you would explore the reward function using GAIL in our existing project on the optimisation of LLVM Pass List using Reinforcement Learning [7].
References:
[1] Ng, Andrew Y., and Stuart Russell: Algorithms for inverse reinforcement learning. ICML 2000. 4. Advancing Computer Systems Optimisation through Model-Based Reinforcement Learning ** Contact:
Eiko Yoneki
Desired Skills: Reinforcement Learning, LLVM Optimisation
Reinforcement Learning (RL) is gaining interest as a generic optimisation and control method in data management tasks such as resource management, scheduling, database tuning, or stream processing. However, its application is hindered by sample inefficiency and extensive decision evaluation times. Model-Based Reinforcement Learning (MBRL), employing techniques like Probabilistic Ensemble-based models [1], World Models [2] and Dyna-style planning [4] etc. addresses these issues by learning environment models.
This project tackles the LLVM compiler optimisation on selection of the pass list. This is critical as pass list selection affects code performance and size, and MBRL's capability to model these interactions is invaluable. Additionally, MBRL challenges on enhancing generalisation ability across different programs, essential due to the expensive training of RL agents.
You would use MBRL for intelligent pass list selection in LLVM and work on enhancing generalisation across programs. They will investigate various MBRL techniques and evaluate their effectiveness in modeling the compiler environment and improving transfer learning.
You would also evaluate a world-model based approach against a model free approach using software infrastructure. The previous project work [3] can be used as a starting point. Evaluating the project with SPEC 2017 (https://www.spec.org/cpu2017/). benchmarking will be ideal.
References:
[1] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. 2018. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. NIPS'18. https://dl.acm.org/doi/10.5555/3327345.3327385. 5. Enabling fast and efficient hyper-parameter tuning with search space partitions for large models ** Contact:
Eiko Yoneki
Desired Skills: Reinforcement Learning, Gaussian Processes, Bayesian Optimisation, Parallel Computing
Hyperparameter tuning of large-scale deep learning models, such as ResNet-18, VGG-19, ResNet-50, and ResNet-152, is a critical but computationally intensive task due to the large search space. This project proposes a novel hyperparameter tuning algorithm that integrates Monte Carlo Tree Search (MCTS) and Bayesian Optimisation (BO). MCTS, known for its efficiency in large and high-dimensional spaces, will be employed to partition the hyperparameter space into smaller subspaces, which inherently enables parallelism.
MCTS's probabilistic and heuristic-driven approach makes it lightweight and well-suited for black-box optimisation [1, 3]. On the other hand, BO is particularly efficient in smaller parameter spaces, making it an ideal choice for searching within the partitions created by MCTS. By combining MCTS's efficiency in high-dimensional spaces with BO's efficiency in lower-dimensional spaces, this approach aims to significantly enhance the speed and performance of hyperparameter tuning. Some suggested benchmarks could be Cifar10 and Cifar100 datasets [7] and compared against existing methods in terms of computational efficiency, accuracy improvements, and resource allocation strategies.
References:
[1] Wang L, et al. Learning search space partition for black-box optimization using monte carlo tree search. https://proceedings.neurips.cc/paper/2020/file/e2ce14e81dba66dbff9cbc35ecfdb704-Paper.pdf. 6. Harnessing Structured Action Spaces for Efficient RL-based System Optimisation* Contact:
Eiko Yoneki
Desired Skills: Reinforcement Learning, Database Systems, Machine Learning, Parameter Tuning
Reinforcement Learning (RL) is increasingly applied to system optimisation, but handling large and combinatorial action spaces remains a challenge. This project aims to address this by developing methods to compress and structure action spaces, thereby enhancing RL efficiency and performance in system optimisation tasks.
The central idea is to leverage hierarchical and structured action spaces, enabling RL agents to make decisions at varying granularities and foster rapid convergence. Additionally, the integration of action pruning and attention mechanisms can further focus the learning process on relevant actions.
To validate the developed methods, the project will include evaluations across diverse system optimisation tasks, such as database tuning (high-dimensional action space) [1] and index advising (large discrete action space) [2], as examples. This is not limited to these examples, as the methodologies are intended to be generalisable across various system optimisation domains.
The anticipated outcomes include innovative algorithms and techniques that significantly improve the efficiency and effectiveness of RL in system optimisation by intelligently compressing and structuring the action space. This project paves the way for more robust and scalable RL applications in computer systems. Many previous works and methods can be widely used here [3,4,5]
References:
[1] Zhang, Ji, et al.: An end-to-end automatic cloud database tuning system using deep reinforcement learning. https://dl.acm.org/doi/pdf/10.1145/3299869.3300085. 7. Learning from Demonstration to Guide Query Rewrite* Contact:
Eiko Yoneki
Desired Skills: Machine Learning, Database Systems, Query Optimisation of SQL Database
This This project aims to utilise Learning from Demonstration (LfD) to augment the query rewriting process within database query optimizers, with a specific emphasis on using Monte Carlo Tree Search (MCTS) as the backbone algorithm for intelligent rewriting strategies [1].
LfD is employed as a mechanism to integrate knowledge from existing query optimizers [3]. Traditionally, optimizers rely on heuristic or rule-based strategies, which may not be optimal for complex queries and evolving datasets. Through LfD, this project will extract and learn from the successful query rewriting patterns and strategies in existing optimizers, taking into account both expert-crafted rewrites and rules discovered by WeTune [2], an automated system for the discovery and verification of query rewrite rules.
MCTS will be the core algorithm powering the model's decision-making process. MCTS is renowned for its effectiveness in handling large and high-dimensional search spaces, which makes it ideal for navigating through the space of possible query rewrites. Through the integration of LfD and MCTS, the project aims to develop an adaptive query rewriting system that is not only powerful but also highly flexible.
By learning from evolving data and query patterns, this system aims to substantially improve query performance by applying intelligent rewriting strategies informed by LfD and guided by MCTS. A very good start point could an early-stage open-source project [4].
References:
[1] Xuanhe Zhou, Guoliang Li, Chengliang Chai, and Jianhua Feng. 2021. A learned query rewrite system using Monte Carlo tree search. VLDB, 2021.fjointf https://doi.org/10.14778/3485450.3485456. 9. Structured Bayesian optimisation with probabilistic programs * Contact:
Eiko Yoneki
Bayesian optimisation (e.g., [1]) offers a data-efficient approach to global optimisation of black-box functions by building a model of the unknown function, including our uncertainty about its values we have yet to observe. [2] introduces a general framework for performing structured Bayesian optimisation, where a probabilistic programming language provides a flexible means for defining surrogate models that can incorporate prior knowledge of structure in the underlying problem. In [3], we have further integrated the structure of the performance model as Directed Acyclic Graph (DAG) with BoTorch, where the experts can encode their understanding of the system in probabilistic models connected by edges denoting dependencies. The DAG model incorporates the parametric models defined in Pyro or PyTorch.
The goal of this project is to further extend [3] with the ability to deal with high dimension parameter space, as well as adding multi-objective functionality, where you need to extend the design of DAG.
[1] Practical Bayesian Optimization of Machine Learning Algorithms. https://papers.nips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html. 10. Bayesian optimisation tuning with transfer Learning from simulation to databases * Contact:
Eiko Yoneki
Databases come with a large number of tunable configurations [1]. We successfully used Bayesian optimisation in tuning varying numbers of parameters. However, a common issue when tuning these parameters is that a single experiment takes a long time to evaluate. One way to deal with the long evaluation time is to leverage other data sources. This project goal is to study using fast-to-execute simulators to aid the convergence speed of the optimiser. FoundationDB [2] is a recent popular database built as a simulator first and provides first-class support to performance and simulation testing at 1/10th the actual execution time [3]. This project proposes using Multi-fidelity [4] as an optimisation method to integrate the simulator in the Bayesian optimisation loop. This framework enables transfer learning from a low-accuracy-cheap model (simulator) to a high-accuracy-expensive model (FoundationDB).
[1] https://github.com/apple/foundationdb/blob/master/flow/Knobs.cpp. 11. Explainable Reinforcement Learning for Device Placement * Contact:
Eiko Yoneki
Task scheduling over a finite number of resources in order to maximise resources efficiency is a traditionally difficult problem with heuristics made to approximate it. Previous work used Hierarchical Reinforcement Learning [1] to find optimal task scheduling for TensorFlow graph operations.
This project proposes to investigate Contextual Multi-Armed bandit [2] with Bayesian learning (possible Bayesian neural networks) to allow injecting expert knowledge in the system. Using Bayesian models to define a hierarchy of the system allows it to converge faster and be easier to understand.
For this project, Park [3] provides a simulation-based environment to test task scheduling on a TensorFlow graph. TensorFlow Probability [2] provides a probabilistic programming language that makes it possible to define Bayesian models that interacts with deep learning models.
[1] https://ai.google/research/pubs/pub46646. 12. Hierarchical Device Placement from Probabilistic Perspective Contact:
Eiko Yoneki
This project tackles the hierarchical device placement from probabilistic perspective to seek further improvement to the original work by Google [1]. In [1], two networks, one a grouper and one a placer are used. The grouper learns valid grouping of TensorFlow operations and the placer learns optimal placement on multiple devices. The idea in this project is to replace the grouper network with a Variational Autoencoder (VAE) network that in the latent space and using Bayesian Optimisation (BO) to generate valid grouping representation, similar approach to [2]. The output will be fed into another BO that learns the optimal placement, which is a similar approach described in our BOAT project 3] learning the placement for distributed SGD. Furthermore you can add the second placer learning multiple objectives rather than just the runtime improvement: by making it learn more correlated tasks it should be expected to be converged faster.
[1] A. Mirhoseini et al.: A Hierarchical Model for Device Placement, LCLR, 2018. https://openreview.net/pdf?id=Hkc-TeZ0W. 13. Generalizable compiler (query planner) optimization using Equality Saturation with Reinforcement Learning ** Contact:
Eiko Yoneki
Omelette [1] and Aurora [2] use reinforcement learning to guide the application of rules in Equality Saturation. A pressing challenge is generalisation.
While the proposed methodologies can generalise across unseen expressions (e.g. query rewrite problems) in theory, expecting an RL agent to solve NP-hard combinatorial problems across sparse problem instances (such as the space of possible SQL query e-graphs) is not realistic.
To improve generalisation, we propose an investigation into population-based Reinforcement
Learning (PBT, example: Poppy [3]) which comes with theoretical guarantees for solving combinatorial optimisation problems more efficiently.
[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation. 14. Learned Extraction for Equality Saturation * Contact:
Eiko Yoneki
One of the core practical limitations of applying Equality Saturation for rewriting query plans is the latency overhead. In the database domain,
Aurora [1] shows that most of the planning latency is linked to extracting the optimal expression from the e-graph. This is achieved by formulating the query plan extraction as an integer
linear program (ILP). To minimise planning latency, a promising idea is to learn the e-graph extraction with reinforcement learning. However, a naive application of RL for extracting queries
(or other types of expressions) from the set of all possible e-graphs is set for failure due to the enormous search space.
To solve this, we propose an investigation into guiding ILP with RL (e.g. learning to branch [2]), to minimise the search space.
[1] George-Octavian Barbulescu: An Inquiry into Database Query Optimisation
with Equality Saturation and Reinforcement Learning. 15. Cost modelling for tensor programs ** Contact:
Eiko Yoneki
Cost models provide cheap estimates to get the performance of tensor programs without actual execution, so it is at the heart of accelerating tenor programs generation. For example, TVM [1] uses XGboost [2] to estimate tensor program runtime. More recently, advanced DNN-based cost models are proposed to perform more accurate estimation such as GNN [3] and transformers [4]. Cost modelling can be even cross-hardware [5]. On the other hand, distributed DNN training also needs delicate cost modelling to provide cheap estimation of the throughput of parallelization strategies, such as [6] and [7]. However, distributed cost modelling is mostly mathematical estimation.
This project aims to investigate and benchmark the state-of-the-art cost modelling. In particular, we want to understand how good those cost models are, and if possible, we want to replace the distributed cost modelling with a DNN-based cost model.
[1] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk for any question. |