Computer Laboratory

MPhil, Part III, and Part II Project Suggestions (2024-2025)

Contact

Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any of the projects below. An asterisk (*) indicates a highly recommended project!

1.     Better sharding strategy search with deep reinforcement learning

 

Contact: Eiko Yoneki

 

Deep learning recommendation models (DLRMs) are among the most important applications of deep learning. A key challenge in DLRMs is sharding the embedding tables across multiple devices, which involves both column-wise and row-wise sharding. NeuroShard [1] uses a DNN as a cost model to guide the search for a sharding strategy, combining beam search and greedy search to find one. We would like to see whether deep reinforcement learning (RL) can find a better sharding strategy. Your solution should be compared and benchmarked against [1][2].
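
As a starting point, the sketch below (one possible framing we assume, not NeuroShard's API) casts sharding as a sequential RL environment: tables are placed one at a time and the episode reward is the negative cost predicted by a learned cost model such as [1]'s; predict_cost is a hypothetical stand-in.

import numpy as np

class ShardingEnv:
    def __init__(self, table_sizes, n_devices, predict_cost):
        self.table_sizes = table_sizes          # bytes per embedding table
        self.n_devices = n_devices
        self.predict_cost = predict_cost        # placement -> latency estimate
        self.reset()

    def reset(self):
        self.placement = [-1] * len(self.table_sizes)  # -1 = unplaced
        self.t = 0
        return self._obs()

    def _obs(self):
        # Observation: index of the table being placed + per-device load.
        load = np.zeros(self.n_devices)
        for tbl, dev in enumerate(self.placement):
            if dev >= 0:
                load[dev] += self.table_sizes[tbl]
        return np.concatenate(([self.t], load))

    def step(self, action):
        # action: device index for the current table
        self.placement[self.t] = action
        self.t += 1
        done = self.t == len(self.table_sizes)
        # Dense shaping is possible; here the reward arrives at episode end.
        reward = -self.predict_cost(self.placement) if done else 0.0
        return self._obs(), reward, done, {}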

 

Desirable skills: Python, deep reinforcement learning

 

References:

[1] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

[2] AutoShard: Automated Embedding Table Sharding for Recommender Systems

 

2.     Enhancing Generalization through Task Vector Fusion for Query Optimizer *

 

Contact: Eiko Yoneki

 

Recent advances in integrating Reinforcement Learning (RL) with database systems, notably in areas like query plan optimization [3], underscore the growing interest in exploiting RL's strengths in scheduling and in solving complex combinatorial optimization problems. This project aims to enhance our previous RL-based query optimizer, Aurora [2], which uses equality saturation and query graph transformation to achieve efficient query execution. Like other RL models, Aurora struggles to adapt to evolving database workloads and requires retraining when conditions change. To address this challenge, we propose a novel approach that dynamically adapts to new workloads with minimal retraining. By leveraging task vectors [1], parameter sets specific to individual workloads extracted from previously trained models, we can perform arithmetic operations to predict optimal parameter adjustments for new tasks. This method allows existing optimizers to maintain high performance amid changing demands, improving both efficiency and usability.
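
To make the task-vector idea concrete, here is a minimal sketch of task arithmetic in the spirit of [1]; the checkpoint names are illustrative. A task vector is the element-wise difference between fine-tuned and pre-trained weights, and vectors from previously seen workloads can be combined to warm-start the optimizer on a new one.

import torch

def task_vector(pretrained, finetuned):
    # tau = theta_finetuned - theta_pretrained, per parameter tensor
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained, vectors, weights):
    # theta_new = theta_pre + sum_i w_i * tau_i
    return {k: pretrained[k] + sum(w * v[k] for w, v in zip(weights, vectors))
            for k in pretrained}

# Usage sketch: combine vectors from two previously seen workloads.
# base  = torch.load("aurora_pretrained.pt")          # hypothetical checkpoints
# tau_a = task_vector(base, torch.load("aurora_wl_a.pt"))
# tau_b = task_vector(base, torch.load("aurora_wl_b.pt"))
# warm  = apply_task_vectors(base, [tau_a, tau_b], weights=[0.6, 0.4])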

 

Desirable skills: Reinforcement Learning, Equality Saturation, Database, Query Optimization

 

References:

[1] Ilharco G, Ribeiro M T, Wortsman M, et al. Editing models with task arithmetic[J]. arXiv preprint arXiv:2212.04089, 2022.

[2] Bărbulescu, George-Octavian, et al. "Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond." arXiv preprint arXiv:2407.12794 (2024).

[3] Kraska T, Alizadeh M, Beutel A, et al. SageDB: A learned database system. CIDR 2019.

 

3.     Multi-model Serving on Heterogeneous Clusters*

 

Contact: Eiko Yoneki

 

Serving multiple LLMs (such as GPT, Llama, OPT, Falcon etc.) efficiently in heterogeneous clusters presents unique challenges due to varying computational power, communication bandwidth, memory bandwidth, and memory limits across different types of GPUs [1]. The project aims to extend the capabilities of multi-model serving [2] to heterogeneous clusters effectively. The initial phase involves setting up a benchmarking suite to evaluate different model-serving frameworks like vLLM [3] and DistServe [4] on various cluster configurations. Subsequently, the project will focus on designing a custom LLM serving framework that leverages dynamic resource allocation to optimize for throughput and latency across heterogeneous environments. This involves developing algorithms for intelligent job scheduling and resource management that consider the unique characteristics of each cluster node. The goal is to enhance the efficiency and scalability of serving multiple models in diverse computing environments, which is critical for applications in areas like autonomous driving and real-time data analytics.
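
For a flavour of the scheduling decisions involved, the sketch below shows a toy greedy placement of model replicas onto heterogeneous GPUs under memory limits, using an illustrative roofline-style latency estimate; a real system would profile these costs per GPU type.

def place_models(models, gpus):
    """models: list of (name, mem_gb, tflop_count); gpus: list of dicts with
    'mem_free', 'tflops', 'bandwidth'. Returns {model_name: gpu_index}."""
    placement = {}
    for name, mem, flops in sorted(models, key=lambda m: -m[2]):
        best, best_cost = None, float("inf")
        for i, g in enumerate(gpus):
            if g["mem_free"] < mem:
                continue
            # Toy roofline-style latency estimate: compute + memory terms.
            cost = flops / g["tflops"] + mem / g["bandwidth"]
            if cost < best_cost:
                best, best_cost = i, cost
        if best is None:
            raise RuntimeError(f"no GPU fits {name}")
        gpus[best]["mem_free"] -= mem
        placement[name] = best
    return placement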

 

Desirable skills: Proficiency in Python/CUDA/C++ programming, understanding of Large Language Model (LLM) inference and serving, and experience with cluster management and resource allocation

 

References:

[1] Jiang, Youhe, et al. "HexGen: Generative Inference of Large Language Model over Heterogeneous Environment." Forty-first International Conference on Machine Learning.

[2] Duan, Jiangfei, et al. "MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving." Forty-first International Conference on Machine Learning.

[3] Kwon, Woosuk, et al. "Efficient memory management for large language model serving with paged attention." Proceedings of the 29th Symposium on Operating Systems Principles. 2023.

[4] Zhong, Yinmin, et al. "{DistServe}: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving." 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 2024.

 

4.     Automated Workload Compression for Index Selection in Scaling Workloads*

 

Contact: Eiko Yoneki

 

Scaling workloads involve numerous queries arriving within a fixed time interval in a relational database management system [1,2]. Workload compression aims to select a representative subset of queries, reducing redundant information and optimizing query performance [1,2]. This significantly decreases tuning time, particularly in index selection, where a compressed subset can generate beneficial index candidates with minimal performance loss [3,4]. Existing compression techniques rely on strong heuristics [1,2], often involving phases like column and query selection. Since the compression process can be viewed as a sequential search and scheduling problem, methods like dynamic programming or Monte Carlo Tree Search (MCTS) [5] can be applied to automate and maximize compression efficiency. This project seeks to develop a cost-efficient compression approach that minimizes performance loss relative to the original workload, optimizing indexing for downstream execution.
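
The sketch below illustrates one way (our assumption, not ISUM's or Wred's algorithm) to frame compression as tree search: a UCT loop grows query subsets up to a budget, scored by a hypothetical representativeness utility standing in for an index-tuning-aware score.

import math, random

def mcts_compress(all_queries, budget, representativeness, iters=500):
    # UCT over growing query subsets; assumes budget <= len(all_queries).
    root = {"subset": frozenset(), "children": {}, "N": 0, "W": 0.0}
    best_score, best_subset = -float("inf"), None

    def ucb(parent, child, c=1.4):
        return (child["W"] / (child["N"] + 1e-9)
                + c * math.sqrt(math.log(parent["N"] + 1) / (child["N"] + 1e-9)))

    for _ in range(iters):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while (len(node["subset"]) < budget and node["children"]
               and len(node["children"]) == len(set(all_queries) - node["subset"])):
            node = max(node["children"].values(), key=lambda ch: ucb(path[-1], ch))
            path.append(node)
        # Expansion: try one query not yet added below this node.
        untried = list(set(all_queries) - node["subset"] - set(node["children"]))
        if untried and len(node["subset"]) < budget:
            q = random.choice(untried)
            child = {"subset": node["subset"] | {q}, "children": {}, "N": 0, "W": 0.0}
            node["children"][q] = child
            node = child
            path.append(child)
        # Rollout: complete the subset randomly and score it.
        picks = set(node["subset"])
        while len(picks) < budget:
            picks.add(random.choice(all_queries))
        reward = representativeness(picks)
        if reward > best_score:
            best_score, best_subset = reward, frozenset(picks)
        # Backpropagation.
        for n in path:
            n["N"] += 1
            n["W"] += reward
    return best_subset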

 

Desirable skills: Indexing, Database Optimization, Monte Carlo Tree Search

 

References:

[1] Siddiqui, Tarique, et al. "ISUM: Efficiently compressing large and complex workloads for scalable index tuning." Proceedings of the 2022 International Conference on Management of Data. 2022.

[2] Brucato, Matteo, et al. "Wred: Workload Reduction for Scalable Index Tuning." Proceedings of the ACM on Management of Data 2.1 (2024): 1-26.

[3] Kossmann, Jan, Alexander Kastius, and Rainer Schlosser. "SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning." EDBT. Vol. 2. 2022.

[4] Wang, Taiyi, and Eiko Yoneki. "IA2: Leveraging Instance-Aware Index Advisor with Reinforcement Learning for Diverse Workloads." Proceedings of the 4th Workshop on Machine Learning and Systems. 2024.

[5] Chaslot, Guillaume Maurice Jean-Bernard Chaslot. "Monte-carlo tree search." (2010).

 

5.     Predictive Long-Term Indexing with RL-Enhanced Control*

 

Contact: Eiko Yoneki

 

Traditional learned index tuners, such as AirIndex [1] and learned LSM-tree tuners [2], primarily optimize for the current workload, neglecting the impact of these tuning efforts on future workloads. This approach can lead to inefficiencies when new data arrives, as the system lacks foresight regarding future workload patterns. To address this, the project proposes serializing the indexing process at a higher level. An agent, potentially equipped with RL, would control when and whether the local index tuner should act as new workloads arrive or data updates happen. By testing this on streaming data with various distributions, such as SOSD data [4] or skewed datasets [3], or using real data traces from sources like Microsoft or Snowflake, the project aims to develop a more predictive and efficient indexing system that better accommodates incoming workloads over time, improving long-term performance.
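
A minimal sketch of the proposed meta-controller follows; agent, tuner and run_batch are hypothetical hooks into an underlying system such as AirIndex [1], so this shows the shape of the control loop rather than a concrete implementation.

RETUNE, SKIP = 0, 1

def summarise(batch, index):
    # Hypothetical featuriser: cheap statistics describing workload drift.
    return (len(batch), getattr(index, "depth", 0))

def controller_step(agent, state, batch, index, tuner, run_batch, tune_cost):
    action = agent.act(state)                    # e.g. epsilon-greedy policy
    if action == RETUNE:
        index = tuner.rebuild(index, batch)      # expensive local tuning call
    latency = run_batch(batch, index)            # observed execution latency
    # Reward trades tuning cost against the latency of the incoming batch.
    reward = -latency - (tune_cost if action == RETUNE else 0.0)
    next_state = summarise(batch, index)
    agent.observe(state, action, reward, next_state)  # RL update hook
    return next_state, index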

 

Desirable skills: Reinforcement Learning, Indexing, Predictive Control, Learned Index

 

References:

[1] Chockchowwat, Supawit, Wenjie Liu, and Yongjoo Park. "Airindex: versatile index tuning through data and storage." Proceedings of the ACM on Management of Data 1.3 (2023): 1-26.

[2] Mo, Dingheng, et al. "Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads." Proceedings of the ACM on Management of Data 1.3 (2023): 1-25.

[3] Ding, Jialin, et al. "Tsunami: A learned multi-dimensional index for correlated data and skewed workloads." arXiv preprint arXiv:2006.13282 (2020).

[4] Kipf, Andreas, et al. "SOSD: A benchmark for learned indexes." arXiv preprint arXiv:1911.13014 (2019).

 

6.     Structured Bayesian Optimisation in BoTorch **

 

Contact: Eiko Yoneki

 

Optimising system performance is challenging: evaluating a configuration is computationally expensive and time-consuming, and the search space of tunable variables is vast. Traditional techniques like Bayesian Optimisation (BO) [2] often take a long time to converge and struggle with high-dimensional parameter spaces. However, incorporating structural information into surrogate models can significantly accelerate BO's convergence.

 

This project explores DagBO [1][3], an open-source extension of BO developed by our group, which allows for user-definable surrogate models based on directed acyclic graphs (DAGs). The project will investigate DagBO's performance in tuning Spark benchmarks compared to traditional BO. Potential extensions of the project include distributed computing and the development of sub-models within DagBO.
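
For orientation, below is a minimal vanilla-BO loop in BoTorch, the baseline that DagBO extends with DAG-structured surrogates; run_spark_benchmark is a hypothetical placeholder for the negated Spark job runtime over four normalised knobs.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def run_spark_benchmark(x):
    # Hypothetical objective: negated job runtime (BoTorch maximises).
    return -(x - 0.3).pow(2).sum(dim=-1, keepdim=True)

bounds = torch.stack([torch.zeros(4), torch.ones(4)]).double()
X = torch.rand(5, 4, dtype=torch.double)
Y = run_spark_benchmark(X)

for _ in range(20):
    gp = SingleTaskGP(X, Y)                        # plain GP surrogate
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    acq = ExpectedImprovement(gp, best_f=Y.max())
    cand, _ = optimize_acqf(acq, bounds=bounds, q=1,
                            num_restarts=5, raw_samples=64)
    X = torch.cat([X, cand])
    Y = torch.cat([Y, run_spark_benchmark(cand)])

print("best knobs:", X[Y.argmax()])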

 

Desirable skills: Computer Systems, Bayesian Optimisation, Spark

 

References:

[1] Ross Tooley: Auto-tuning Spark with Bayesian optimisation.

[2] Jonas B. Mockus. The Bayesian approach to global optimization. Freie Univ., Fachbereich Mathematik, 1984.

[3] https://github.com/Tyv217/dagbo/

 

7.   Bayesian Optimisation for tensor code generation **   

Contact: Eiko Yoneki

Tensor codes are run on massively parallel hardware. When generating tensor code (also called auto-scheduling), the TVM compiler [1] needs to search through many parameters. A state-of-the-art auto-scheduler is Ansor [2], which applies rule-based mutation to generate tensor code templates and then fine-tunes those templates via evolutionary search. We think Bayesian Optimisation (BayesOpt) [3] is a better approach than evolutionary search for efficiently searching the space of tensor code templates.

First, TVM is set up for benchmarking with ResNet and BERT on CPU and GPU (possibly a few different types of GPU). Next, the same benchmarking should be carried out with NVIDIA's compiler Cutlass [8].

Afterwards, explore using BayesOpt for high-performance tensor code generation, and benchmark black-box algorithms [4][5] for this task. The main interface for tensor code generation in TVM will be the MetaScheduler [6], which provides a fairly simple Python interface for various search methodologies [7]. We also have a particular interest in tensor code generation for tensor cores, which recent generations of GPUs (since the Turing microarchitecture) provide as a domain-specific architecture to massively accelerate tensor programs.

The project can take advantage of a former student's work [9][10], focusing on performance improvement on GPU, multi-objective BO, and scalability.
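
For intuition, the sketch below shows a generic GP-based black-box search over schedule parameters (tile sizes, unroll factors); it is not the MetaScheduler API, and measure_latency is a hypothetical hook that compiles and times one candidate on the target.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_schedule_search(measure_latency, n_init=8, n_iter=40, n_cand=256):
    space = [(1, 64), (1, 64), (1, 8)]            # tile_x, tile_y, unroll
    rng = np.random.default_rng(0)
    sample = lambda n: np.column_stack(
        [rng.integers(lo, hi + 1, n) for lo, hi in space])
    X = sample(n_init)
    y = np.array([measure_latency(x) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = sample(n_cand)
        mu, sd = gp.predict(cand, return_std=True)
        x = cand[np.argmin(mu - 1.0 * sd)]        # LCB: we minimise latency
        X = np.vstack([X, x])
        y = np.append(y, measure_latency(x))
    return X[np.argmin(y)]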

Desired Skills: Strong interest in tensor program optimisation, Some knowledge/interest in Bayesian optimisation, Python, with some knowledge in C++

References:
[1] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning https://www.usenix.org/system/files/osdi18-chen.pdf.
[2] Ansor: Generating High-Performance Tensor Programs for Deep Learning https://www.usenix.org/system/files/osdi20-zheng.pdf.
[3] A Tutorial on Bayesian Optimization https://arxiv.org/abs/1807.02811.
[4] HEBO Pushing The Limits of Sample-Efficient Hyperparameter Optimisation https://arxiv.org/abs/2012.03826.
[5] Are Random Decompositions all we need in High Dimensional Bayesian Optimisation? https://arxiv.org/abs/2301.12844.
[6] Tensor Program Optimization with Probabilistic Programs https://arxiv.org/abs/2205.13603.
[7] https://github.com/apache/tvm/blob/4267fbf6a173cd742acb293fab4f77693dc4b887/python/tvm/meta_schedule/search_strategy/search_strategy.py#L238.
[8] NVIDIA Compiler https://github.com/NVIDIA/cutlass.

[9] https://github.com/hgl71964/unity-tvm

[10] Discovering Performant Tensor Programs with Bayesian Optimization

 

8.   Probabilistic Inference within Reinforcement Learning *

Contact: Eiko Yoneki

Incorporating probabilistic inference within RL, treating decision-making problems as probabilistic inference tasks, is an interesting research direction to tackle. Unlike classical RL, which requires explicit reward functions, the probabilistic assumptions of the model can be left implicit. This enables a fundamentally new way to design the reward function via model selection and allows existing probabilistic modelling techniques to be applied to RL problems.

The project includes:

-        Design of Probabilistic Models for RL Tasks: Investigate how different probabilistic models (e.g., hidden Markov models, Bayesian neural networks) can be adapted or developed for RL tasks.

-        Reward Function Implication and Learning: Develop methods for deriving reward functions from probabilistic models, possibly through techniques like expectation-maximisation or variational inference.

-        Evaluation and Application: Test the proposed approach on benchmark RL problems, comparing its performance to classical RL approaches (a worked example of the inference view follows this list).
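
As a worked example of the inference view, the toy below runs soft value iteration in a small random MDP: replacing the Bellman max with a log-sum-exp corresponds to treating optimality as an observed random variable, and the resulting policy is a softmax over soft Q-values. All arrays are toy values.

import numpy as np

nS, nA, gamma = 4, 2, 0.9
P = np.random.default_rng(0).dirichlet(np.ones(nS), size=(nS, nA))  # P[s,a,s']
R = np.random.default_rng(1).uniform(0, 1, size=(nS, nA))

V = np.zeros(nS)
for _ in range(200):
    Q = R + gamma * P @ V                 # soft Bellman backup, Q[s,a]
    V = np.log(np.exp(Q).sum(axis=1))     # log-sum-exp instead of max
policy = np.exp(Q - V[:, None])           # softmax policy pi(a|s)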

Desirable skills: Reinforcement Learning, Probabilistic Modelling, Reward Function

 

References:

[1] Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.

[2] Ghavamzadeh, M., Mannor, S., Pineau, J., & Tamar, A. (2015). Bayesian Reinforcement Learning: A Survey. Foundations and Trends in Machine Learning, 8(5-6), 359-483.



9.   Tensor expression superoptimisation via deep reinforcement learning  

Contact: Eiko Yoneki

AlphaDev [1][2] shows that an RL agent can discover faster sorting algorithms by playing an assembly game. Among recent advances in machine learning compilers, EinNet [3][4] shows how to discover faster tensor programs via rule-based transformations. We are interested in investigating how RL can discover faster tensor programs through transformations at the tensor expression level. You should implement a graph RL agent that plays the tensor game to discover faster tensor programs. Alternatively, we have an internal RL-driven graph transformation system [5], which you can leverage and apply to tensor-expression-level transformation. To get started, build and benchmark EinNet to see how it works with rule-based transformation on a simple DNN (e.g. self-attention). Then, replace its search algorithm with an RL algorithm.
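
A minimal sketch of the "tensor game" environment is given below; this is our framing, and apply_rule and estimate_runtime are hypothetical hooks into a system such as EinNet [3][4]. Actions apply one rewrite rule at one position, and the reward is the estimated runtime reduction.

class TensorGame:
    def __init__(self, expr, rules, apply_rule, estimate_runtime, horizon=20):
        self.init_expr, self.rules = expr, rules
        self.apply_rule, self.cost = apply_rule, estimate_runtime
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.expr, self.steps = self.init_expr, 0
        self.base = self.cost(self.expr)
        return self.expr

    def step(self, action):                 # action = (rule_idx, position)
        rule_idx, pos = action
        new_expr = self.apply_rule(self.expr, self.rules[rule_idx], pos)
        # Reward: incremental runtime reduction relative to the baseline.
        reward = (self.cost(self.expr) - self.cost(new_expr)) / self.base
        self.expr, self.steps = new_expr, self.steps + 1
        return self.expr, reward, self.steps >= self.horizon, {}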

References:
[1] Faster sorting algorithms discovered using deep reinforcement learning, Nature, 2023.
[2] https://github.com/google-deepmind/alphadev.
[3] EINNET: Optimizing Tensor Programs with Derivation-Based Transformations. OSDI, 2023.
[4] https://github.com/InfiniTensor/InfiniTensor.
[5] https://github.com/ucamrl/xrlflow/tree/main.

10.   Inferring reward function via Inverse Reinforcement Learning

Contact: Eiko Yoneki

Reinforcement Learning (RL) has achieved impressive results recently, from playing video games to solving robotic control problems. However, in many applications it is challenging to design a reward function that robustly describes the desired task. Inverse Reinforcement Learning (IRL) [1], which evolved from imitation learning, can be formulated as the problem of inferring the reward function of an agent given its policy or observed behaviour. Recent work has focused more on the robustness of IRL in unknown environments [2,3]. IRL is suitable for environments where the reward function is hard to define or the real-time reward is expensive to obtain [4]. System-level RL tuning problems [6] are such cases: people typically define reward functions based on actual execution time. One of the most popular open-source IRL methods is Generative Adversarial Imitation Learning (GAIL) [5]. GAIL combines a Generative Adversarial Network (GAN) with IRL and alternately updates the agent's reward and policy until reaching an optimum. One challenge here is that IRL needs expert data to update the agent's reward function and policy; however, unlike for video games, building a virtual environment for system-level optimisation problems to collect such data is sometimes impossible. Some underlying optimisers may help here. Thus, one good direction is to use GAIL to infer reward functions within such expensive environments. In this project you would explore inferring the reward function with GAIL in our existing project on optimising the LLVM pass list using Reinforcement Learning [7].
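
The core GAIL [5] loop can be sketched in a few lines (dimensions and the policy-update side are illustrative): a discriminator learns to separate expert from agent state-action pairs, and its output is recycled as the agent's reward, so no hand-written runtime-based reward is needed.

import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(16 + 4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def gail_reward(sa):
    # r(s,a) = -log(1 - D(s,a)): high when D mistakes the agent for the expert.
    with torch.no_grad():
        return -torch.log(1 - torch.sigmoid(disc(sa)) + 1e-8)

def update_discriminator(expert_sa, agent_sa):
    logits = torch.cat([disc(expert_sa), disc(agent_sa)])
    labels = torch.cat([torch.ones(len(expert_sa), 1),
                        torch.zeros(len(agent_sa), 1)])
    loss = bce(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()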

Desired Skills: Inverse Reinforcement Learning, LLVM Optimisation

References:

[1] Ng, Andrew Y., and Stuart Russell: Algorithms for inverse reinforcement learning. ICML 2000.
[2] Lindner et al.: Active Exploration for Inverse Reinforcement Learning. ICML workshop 2022.
[3] Fu et al.: Learning robust rewards with adversarial inverse reinforcement learning. ICLR 2018.
[4] Arora, Saurabh and Prashant Doshi: A survey of inverse reinforcement learning: Challenges, methods and progress.
[5] Ho J., Ermon S.: Generative adversarial imitation learning.
[6] Zhang, Ji, et al.: An end-to-end automatic cloud database tuning system using deep reinforcement learning.
[7] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.

11.   Advancing Computer Systems Optimisation through Model-Based Reinforcement Learning 

Contact: Eiko Yoneki

Reinforcement Learning (RL) is gaining interest as a generic optimisation and control method in data management tasks such as resource management, scheduling, database tuning, or stream processing. However, its application is hindered by sample inefficiency and extensive decision evaluation times. Model-Based Reinforcement Learning (MBRL), employing techniques like probabilistic-ensemble models [1], World Models [2], and Dyna-style planning [4], addresses these issues by learning models of the environment. This project tackles LLVM compiler optimisation, specifically the selection of the pass list. This is critical as pass-list selection affects code performance and size, and MBRL's capability to model these interactions is invaluable. An additional challenge for MBRL is enhancing generalisation across different programs, which is essential given the expensive training of RL agents. You would use MBRL for intelligent pass-list selection in LLVM and work on enhancing generalisation across programs, investigating various MBRL techniques and evaluating their effectiveness in modelling the compiler environment and improving transfer learning. You would also evaluate a world-model-based approach against a model-free approach using our software infrastructure. The previous project work [3] can be used as a starting point. Evaluating the project with SPEC 2017 benchmarks (https://www.spec.org/cpu2017/) would be ideal.
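
As a sketch of the model side, below is a probabilistic-ensemble dynamics model in the style of PETS [1]: each member predicts a Gaussian over the next state, and epistemic uncertainty comes from ensemble disagreement. The state/action dimensions are illustrative placeholders for an encoded LLVM pass-selection environment.

import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.SiLU(),
                                 nn.Linear(hidden, 2 * s_dim))
    def forward(self, s, a):
        mu, log_var = self.net(torch.cat([s, a], -1)).chunk(2, -1)
        return mu, log_var.clamp(-8, 4)

ensemble = [GaussianDynamics(32, 8) for _ in range(5)]

def nll_loss(model, s, a, s_next):
    mu, log_var = model(s, a)
    # Gaussian negative log-likelihood per member, trained on bootstrapped data.
    return (((s_next - mu) ** 2) * (-log_var).exp() + log_var).mean()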

Desired Skills: Reinforcement Learning, LLVM Optimisation

References:
[1] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. 2018. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. NIPS'18. https://dl.acm.org/doi/10.5555/3327345.3327385.
[2] World Models. https://arxiv.org/abs/1803.10122.
[3] Yilin Sun: Optimizing LLVM Pass List using Reinforcement Learning.
[4] Sutton, Richard S., et al. Dyna-style planning with linear function approximation and prioritized sweeping. https://arxiv.org/pdf/1206.3285.pdf.

12.   Enabling fast and efficient hyper-parameter tuning with search space partitions for large models

Contact: Eiko Yoneki

Hyperparameter tuning of large-scale deep learning models, such as ResNet-18, VGG-19, ResNet-50, and ResNet-152, is a critical but computationally intensive task due to the large search space. This project proposes a novel hyperparameter tuning algorithm that integrates Monte Carlo Tree Search (MCTS) and Bayesian Optimisation (BO). MCTS, known for its efficiency in large and high-dimensional spaces, will be employed to partition the hyperparameter space into smaller subspaces, which inherently enables parallelism. MCTS's probabilistic and heuristic-driven approach makes it lightweight and well-suited for black-box optimisation [1, 3]. BO, on the other hand, is particularly efficient in smaller parameter spaces, making it an ideal choice for searching within the partitions created by MCTS. By combining MCTS's efficiency in high-dimensional spaces with BO's efficiency in lower-dimensional spaces, this approach aims to significantly enhance the speed and performance of hyperparameter tuning. Suggested benchmarks include the CIFAR-10 and CIFAR-100 datasets, with comparison against existing methods in terms of computational efficiency, accuracy improvements, and resource allocation strategies.
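
The toy below sketches the partition-then-search idea in the spirit of LaMCTS [1, 3], not its implementation: subspaces obtained by axis-aligned splits are ranked by a UCB score over the samples they contain, and a local optimiser (random sampling in this toy, BO in the full project) proposes points in the chosen subspace.

import math, random

def ucb(values, total, c=0.5):
    if not values:
        return float("inf")
    return (sum(values) / len(values)
            + c * math.sqrt(math.log(total + 1) / len(values)))

def partition_search(f, dim, n_iter=200, depth=3):
    samples = []                                   # (x, y) pairs; maximise y
    for _ in range(n_iter):
        box = [(0.0, 1.0)] * dim
        for _ in range(depth):                     # walk down the partition
            best = None
            for axis in range(dim):
                for half in (0, 1):                # split each axis in two
                    lo, hi = box[axis]
                    mid = (lo + hi) / 2
                    sub = list(box)
                    sub[axis] = (lo, mid) if half == 0 else (mid, hi)
                    inside = [y for x, y in samples
                              if all(l <= v <= h for v, (l, h) in zip(x, sub))]
                    score = ucb(inside, len(samples))
                    if best is None or score > best[0]:
                        best = (score, sub)
            box = best[1]
        x = [random.uniform(l, h) for l, h in box]  # BO would propose here
        samples.append((x, f(x)))
    return max(samples, key=lambda p: p[1])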

Desired Skills: Reinforcement Learning, Gaussian Processes, Bayesian Optimisation, Parallel Computing

References:
[1] Wang L, et al. Learning search space partition for black-box optimization using monte carlo tree search. https://proceedings.neurips.cc/paper/2020/file/e2ce14e81dba66dbff9cbc35ecfdb704-Paper.pdf.
[2] Paul et al.: Fast efficient hyperparameter tuning for policy gradients. NIPS 2019. https://arxiv.org/abs/1902.06583.
[3] Facebook Research. LaMCTS: Large-scale Parallel Architecture Search Using Monte Carlo Tree Search. https://github.com/facebookresearch/LaMCTS.
[4] X. Dong and Y. Yang. Nas-bench-201: Extending the scope of reproducible neural architecture search. 2020. https://openreview.net/pdf?id=HJxyZkBKDr.
[5] Zhang, Baohe, et al.: On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning http://proceedings.mlr.press/v130/zhang21n/zhang21n.pdf.

13.   Harnessing Structured Action Spaces for Efficient RL-based System Optimisation  

Contact: Eiko Yoneki

Desired Skills: Reinforcement Learning, Database Systems, Machine Learning, Parameter Tuning

Reinforcement Learning (RL) is increasingly applied to system optimisation, but handling large and combinatorial action spaces remains a challenge. This project aims to address this by developing methods to compress and structure action spaces, thereby enhancing RL efficiency and performance in system optimisation tasks. The central idea is to leverage hierarchical and structured action spaces, enabling RL agents to make decisions at varying granularities and foster rapid convergence. Additionally, the integration of action pruning and attention mechanisms can further focus the learning process on relevant actions. To validate the developed methods, the project will include evaluations across diverse system optimisation tasks, such as database tuning (high-dimensional action space) [1] and index advising (large discrete action space) [2], as examples. It is not limited to these examples, as the methodologies are intended to generalise across various system optimisation domains. The anticipated outcomes include innovative algorithms and techniques that significantly improve the efficiency and effectiveness of RL in system optimisation by intelligently compressing and structuring the action space. This project paves the way for more robust and scalable RL applications in computer systems. Many previous works and methods can be applied here [3,4,5].
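
As one concrete structuring technique, the sketch below shows an action-branching Q-network in the spirit of [4]: instead of one output over the combinatorial product of knob settings, one small head per knob shrinks the output size from the product of the knob cardinalities to their sum. Dimensions are illustrative.

import torch
import torch.nn as nn

class BranchingQNet(nn.Module):
    def __init__(self, state_dim, branch_sizes, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in branch_sizes)

    def forward(self, s):
        z = self.trunk(s)
        return [head(z) for head in self.heads]    # one Q-vector per knob

net = BranchingQNet(state_dim=20, branch_sizes=[10, 10, 4])  # e.g. 3 DB knobs
qs = net(torch.randn(1, 20))
action = [q.argmax(dim=-1).item() for q in qs]     # one sub-action per branch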

References:
[1] Zhang, Ji, et al.: An end-to-end automatic cloud database tuning system using deep reinforcement learning. https://dl.acm.org/doi/pdf/10.1145/3299869.3300085.
[2] Kossmann, Jan, Alexander Kastius, and Rainer Schlosser: SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning. EDBT. 2022. https://openproceedings.org/2022/conf/edbt/paper-37.pdf.
[3] Dulac-Arnold, Gabriel, et al.: Deep reinforcement learning in large discrete action spaces. https://arxiv.org/abs/1512.07679.
[4] Tavakoli, Arash, Fabio Pardo, and Petar Kormushev: Action branching architectures for deep reinforcement learning. AAAI 2018. https://ojs.aaai.org/index.php/AAAI/article/view/11798.
[5] Chandak, Yash, et al. "Learning action representations for reinforcement learning." PMLR, 2019. https://proceedings.mlr.press/v97/chandak19a.html.

14.   Learning from Demonstration to Guide Query Rewrite  

Contact: Eiko Yoneki

Desired Skills: Machine Learning, Database Systems, Query Optimisation of SQL Database

This project aims to utilise Learning from Demonstration (LfD) to augment the query rewriting process within database query optimisers, with a specific emphasis on using Monte Carlo Tree Search (MCTS) as the backbone algorithm for intelligent rewriting strategies [1]. LfD is employed as a mechanism to integrate knowledge from existing query optimisers [3]. Traditionally, optimisers rely on heuristic or rule-based strategies, which may not be optimal for complex queries and evolving datasets. Through LfD, this project will extract and learn from the successful query rewriting patterns and strategies in existing optimisers, taking into account both expert-crafted rewrites and rules discovered by WeTune [2], an automated system for the discovery and verification of query rewrite rules. MCTS will be the core algorithm powering the model's decision-making process. MCTS is renowned for its effectiveness in handling large and high-dimensional search spaces, which makes it ideal for navigating the space of possible query rewrites. Through the integration of LfD and MCTS, the project aims to develop an adaptive query rewriting system that is not only powerful but also highly flexible. By learning from evolving data and query patterns, this system aims to substantially improve query performance by applying intelligent rewriting strategies informed by LfD and guided by MCTS. A very good starting point could be the early-stage open-source project [4].

References:
[1] Xuanhe Zhou, Guoliang Li, Chengliang Chai, and Jianhua Feng. 2021. A learned query rewrite system using Monte Carlo tree search. VLDB 2021. https://doi.org/10.14778/3485450.3485456.
[2] Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, and Jinyang Li. 2022. WeTune: Automatic Discovery and Verification of Query Rewrite Rules. SIGMOD 2022. https://doi.org/10.1145/3514221.3526125.
[3] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. SIGMOD 2021. https://doi.org/10.1145/3448016.3452838.
[4] https://github.com/zhouxh19/LearnedRewrite.

15.   ML-compiler: Joint optimisation of tensor graph and GPU kernels   

Contact: Eiko Yoneki

TASO [2] optimises the tensor graph structure, and its backend uses cuDNN [3] to execute tensor operators [1]. A recent hardware library, Cutlass [4], allows users to customise their own kernels, as if they could configure the hardware. This creates the opportunity for joint optimisation of the tensor graph and GPU kernels. The project aims to replace TASO's backend from cuDNN [3] with Cutlass [4]. Ideal candidates should have some experience writing C++-level code and preferably know the basics of CUDA programming. Candidates can get started by benchmarking common tensor operators, such as MatMul and Conv2D, and comparing performance between different ML compilers. It is a great fit for those who want to learn more deeply about ML compilers.
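
A minimal starting point for the suggested benchmarking: time MatMul and Conv2D through PyTorch's cuDNN-backed kernels to get a baseline against Cutlass-based implementations. Sizes are illustrative.

import torch, time

def bench(fn, iters=50):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3   # ms per call

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
x = torch.randn(32, 64, 56, 56, device="cuda", dtype=torch.float16)
w = torch.randn(64, 64, 3, 3, device="cuda", dtype=torch.float16)

print("MatMul:", bench(lambda: a @ b), "ms")
print("Conv2D:", bench(lambda: torch.nn.functional.conv2d(x, w, padding=1)), "ms")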

[1] End-to-end comparison.
[2] Z. Jia, et al.: TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions, SOSP, 2019.
[3] CuDNN: NVIDIA cuDNN Installation Guide
[4] Cutlass: CUDA C++ template abstractions

16.   Bayesian optimisation tuning with transfer Learning from simulation to databases   

Contact: Eiko Yoneki

Databases come with a large number of tunable configuration parameters [1]. We have successfully used Bayesian optimisation to tune varying numbers of parameters. However, a common issue when tuning these parameters is that a single experiment takes a long time to evaluate. One way to deal with the long evaluation time is to leverage other data sources. The goal of this project is to study using fast-to-execute simulators to aid the convergence speed of the optimiser. FoundationDB [2] is a recent popular database built as a simulator first, providing first-class support for performance and simulation testing at 1/10th the actual execution time [3]. This project proposes using multi-fidelity Bayesian optimisation [4] to integrate the simulator into the optimisation loop. This framework enables transfer learning from a cheap, low-accuracy model (the simulator) to an expensive, high-accuracy model (FoundationDB itself).
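
A minimal two-fidelity sketch (our simplification of [4], not its algorithm): the fidelity level is appended as an input feature so a single GP jointly models the cheap simulator and the expensive real system; both objectives here are placeholders.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulator(x):   return -np.sum((x - 0.4) ** 2) + 0.05 * np.random.randn()
def real_system(x): return -np.sum((x - 0.5) ** 2)

rng = np.random.default_rng(0)
X, y = [], []
for i in range(60):
    fid = 1 if i % 10 == 9 else 0              # 1 real run per 9 simulations
    x = rng.uniform(0, 1, size=3)
    f = real_system(x) if fid else simulator(x)
    X.append(np.append(x, fid)); y.append(f)

gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X), np.array(y))
# Predict at the *high* fidelity to choose the next real-system configuration.
cand = np.column_stack([rng.uniform(0, 1, (256, 3)), np.ones(256)])
mu, sd = gp.predict(cand, return_std=True)
best = cand[np.argmax(mu + 0.5 * sd)][:3]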

[1] https://github.com/apple/foundationdb/blob/master/flow/Knobs.cpp.
[2] https://www.foundationdb.org.
[3] https://apple.github.io/foundationdb/testing.html.
[4] Song, J., Chen, Y. and Yue, Y., 2019. A general framework for multi-fidelity Bayesian optimization with Gaussian processes. PMLR.

17.   Explainable Reinforcement Learning for Device Placement   

Contact: Eiko Yoneki

Task scheduling over a finite number of resources to maximise resource efficiency is a traditionally difficult problem, usually approximated with heuristics. Previous work used Hierarchical Reinforcement Learning [1] to find optimal task schedules for TensorFlow graph operations. This project proposes to investigate contextual multi-armed bandits [2] with Bayesian learning (possibly Bayesian neural networks) to allow injecting expert knowledge into the system. Using Bayesian models to define a hierarchy of the system allows it to converge faster and be easier to understand. For this project, Park [3] provides a simulation-based environment to test task scheduling on a TensorFlow graph. TensorFlow Probability provides a probabilistic programming language that makes it possible to define Bayesian models that interact with deep learning models.
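
The sketch below shows a contextual bandit with Thompson sampling via Bayesian linear regression, one model per candidate device; expert knowledge enters through the prior, and the dimensions are illustrative of a scheduling context.

import numpy as np

class BayesLinArm:
    def __init__(self, d, prior_prec=1.0):
        self.A = prior_prec * np.eye(d)    # posterior precision (prior encodes expert belief)
        self.b = np.zeros(d)
    def sample_reward(self, ctx):
        cov = np.linalg.inv(self.A)
        theta = np.random.multivariate_normal(cov @ self.b, cov)
        return ctx @ theta                 # Thompson sample of expected reward
    def update(self, ctx, reward):
        self.A += np.outer(ctx, ctx)
        self.b += reward * ctx

arms = [BayesLinArm(d=8) for _ in range(4)]    # e.g. 4 candidate devices
def choose(ctx):
    return max(range(len(arms)), key=lambda i: arms[i].sample_reward(ctx))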

[1] https://ai.google/research/pubs/pub46646.
[2] https://arxiv.org/pdf/1510.00757.
[3] https://github.com/park-project/park.

18.   Hierarchical Device Placement from Probabilistic Perspective  

Contact: Eiko Yoneki

This project tackles hierarchical device placement from a probabilistic perspective, seeking to improve on the original work by Google [1]. In [1], two networks are used: a grouper and a placer. The grouper learns valid groupings of TensorFlow operations, and the placer learns optimal placements on multiple devices. The idea in this project is to replace the grouper network with a Variational Autoencoder (VAE) and use Bayesian Optimisation (BO) in its latent space to generate valid grouping representations, similar to the approach of [2]. The output will be fed into another BO that learns the optimal placement, similar to the approach described in our BOAT project [3] for learning the placement of distributed SGD. Furthermore, you can extend the second placer to learn multiple objectives rather than just runtime improvement: by making it learn more correlated tasks, it should be expected to converge faster.

[1] A. Mirhoseini et al.: A Hierarchical Model for Device Placement, ICLR, 2018. https://openreview.net/pdf?id=Hkc-TeZ0W.
[2] Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules.
[3] BOAT: Building Auto-Tuners with Structured Bayesian Optimization https://dl.acm.org/doi/10.1145/3038912.3052662.

19.   Generalisable compiler (query planner) optimisation using Equality Saturation with Reinforcement Learning   

Contact: Eiko Yoneki

Omelette [1] and Aurora [2] use reinforcement learning to guide the application of rules in Equality Saturation. A pressing challenge is generalisation. While the proposed methodologies can in theory generalise across unseen expressions (e.g. query rewrite problems), expecting an RL agent to solve NP-hard combinatorial problems across sparse problem instances (such as the space of possible SQL query e-graphs) is not realistic. To improve generalisation, we propose an investigation into population-based Reinforcement Learning (e.g. Poppy [3]), which comes with theoretical guarantees for solving combinatorial optimisation problems more efficiently.

[1] Zak Sigh: Omelette: Deep Reinforcement Learning for Equality Saturation.
[2] George-Octavian Barbulescu: An Inquiry into Database Query Optimisation with Equality Saturation and Reinforcement Learning.
[3] Population-Based Reinforcement Learning for Combinatorial Optimization.

20.   Learned Extraction for Equality Saturation   

Contact: Eiko Yoneki

One of the core practical limitations of applying Equality Saturation to rewriting query plans is the latency overhead. In the database domain, Aurora [1] shows that most of the planning latency is linked to extracting the optimal expression from the e-graph. This is achieved by formulating query plan extraction as an integer linear program (ILP). To minimise planning latency, a promising idea is to learn e-graph extraction with reinforcement learning. However, a naive application of RL to extracting queries (or other types of expressions) from the set of all possible e-graphs is bound to fail due to the enormous search space. To address this, we propose an investigation into guiding the ILP with RL (e.g. learning to branch [2]) to minimise the search space.
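
For concreteness, below is a toy ILP extraction in PuLP over a tiny hand-written e-graph. It follows the standard pick-one-node-per-needed-e-class formulation but omits cycle-breaking constraints for brevity, so it is a sketch rather than Aurora's implementation; the e-graph and costs are illustrative.

import pulp

# eclass -> list of (enode_name, cost, child_eclasses)
egraph = {
    "root": [("join_ab", 5, ["A", "B"]), ("join_ba", 4, ["B", "A"])],
    "A": [("scan_a", 1, [])],
    "B": [("scan_b", 1, []), ("idx_b", 0.5, [])],
}

prob = pulp.LpProblem("extract", pulp.LpMinimize)
pick = {n: pulp.LpVariable(n, cat="Binary")
        for nodes in egraph.values() for n, _, _ in nodes}
# Objective: total cost of the selected e-nodes.
prob += pulp.lpSum(c * pick[n] for nodes in egraph.values() for n, c, _ in nodes)

# Exactly one e-node represents the root e-class.
prob += pulp.lpSum(pick[n] for n, _, _ in egraph["root"]) == 1
# If an e-node is picked, each of its child e-classes must also be represented.
for nodes in egraph.values():
    for n, _, children in nodes:
        for cls in children:
            prob += pulp.lpSum(pick[m] for m, _, _ in egraph[cls]) >= pick[n]

prob.solve()
print([n for n, v in pick.items() if v.value() == 1])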

[1] George-Octavian Barbulescu: An Inquiry into Database Query Optimisation with Equality Saturation and Reinforcement Learning.
[2] Learning to Branch.

21.   Cost modelling for tensor programs  

Contact: Eiko Yoneki

Cost models provide cheap estimates of the performance of tensor programs without actual execution, so they are at the heart of accelerating tensor program generation. For example, TVM [1] uses XGBoost [2] to estimate tensor program runtime. More recently, advanced DNN-based cost models have been proposed to perform more accurate estimation, such as GNNs [3] and transformers [4]. Cost modelling can even be cross-hardware [5]. On the other hand, distributed DNN training also needs delicate cost modelling to cheaply estimate the throughput of parallelisation strategies, such as [6] and [7]. However, distributed cost modelling is mostly mathematical estimation. This project aims to investigate and benchmark state-of-the-art cost models. In particular, we want to understand how good those cost models are and, if possible, replace the distributed cost modelling with a DNN-based cost model.
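
The sketch below shows the shape of a learned cost model in the style of AutoTVM's XGBoost model [2]: featurise candidate programs and regress measured runtime so the search loop can rank candidates without executing them. The feature extraction and data are illustrative placeholders.

import numpy as np
from xgboost import XGBRegressor

def featurise(program):
    # Placeholder: real systems extract loop structure, touched bytes,
    # vectorisation/unroll annotations, etc.
    return np.array([program["flops"], program["bytes"], program["tile"]])

train_programs = [{"flops": f, "bytes": b, "tile": t}
                  for f in (1e9, 2e9) for b in (1e6, 4e6) for t in (8, 32)]
X = np.stack([featurise(p) for p in train_programs])
# Toy "measured" runtimes; in practice these come from on-device profiling.
y = np.array([p["flops"] / 1e9 + 0.2 * p["bytes"] / 1e6 for p in train_programs])

model = XGBRegressor(n_estimators=200, max_depth=6).fit(X, y)
ranked = sorted(train_programs, key=lambda p: model.predict(featurise(p)[None])[0])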

[1] TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.
[2] AutoTVM: Learning to Optimize Tensor Programs.
[3] A Graph Neural Network-Based Performance Model for Deep Learning Applications.

Contact Email

Please email eiko.yoneki@cl.cam.ac.uk with any questions.