Reinforcement Learning
Principal lecturer: Dr Rika Antonova
Taken by: MPhil ACS, Part III
Code: R171
Term: Lent
Hours: 16 (8 x two-hour seminars)
Format: In-person lectures
Prerequisites: Multivariable calculus, linear algebra, probability, machine learning (intro)
Aims
The aim of this module is to introduce students to the foundations of reinforcement learning (RL), discuss state-of-the-art RL methods, incentivise students to produce critical analyses of recent methods, and prompt students to propose novel solutions that address the shortcomings of existing methods. Judged by the headlines, RL has seen unprecedented success in recent years. However, the vast majority of RL methods still have shortcomings that might not be apparent at first glance. The course therefore aims to inspire students by communicating the promising aspects of RL, while ensuring that they develop the ability to critically analyse the limitations of current methods.
Objectives
Students will learn the following technical concepts:
- fundamental RL terminology and mathematical formalism; a brief history of RL and its connection to neuroscience and biological systems
- RL methods for discrete action spaces, e.g. deep Q-learning and large-scale Monte Carlo Tree Search
- methods for exploration, modelling uncertainty, and partial observability for RL
- modern policy gradient and actor-critic methods
- concepts needed to construct model-based RL and Model Predictive Control methods
- approaches to make RL data-efficient and ways to enable simulation-to-reality transfer
- examples of fine-tuning foundation models and large language models (LLMs) with human feedback; safe RL concepts; examples of using RL for safety validation
- examples of using RL for scientific discovery
Students will also gain experience with analysing RL methods to uncover their strengths and shortcomings, as well as proposing extensions to improve performance.
Finally, students will gain skills needed to create and deliver a successful presentation in a format similar to that of conference presentations.
With all of the above, students who take part in this module will be well-prepared to start conducting research in the field of reinforcement learning.
Syllabus
Topic 1: Introduction and Fundamentals
- Overview of RL: foundational ideas, history, and books; connection to neuroscience and biological systems; recent industrial applications and research demonstrations
- Mathematical fundamentals: Markov decision processes, Bellman equations, policy and value iteration, temporal difference learning
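To make the fundamentals concrete, here is a minimal value-iteration sketch for a toy tabular MDP; the randomly generated transition tensor and reward matrix are illustrative assumptions, not course material.

```python
import numpy as np

# Toy tabular MDP (hypothetical placeholder data):
# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # each P[s, a] sums to 1 over s'
R = rng.uniform(-1.0, 1.0, size=(S, A))

V = np.zeros(S)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:     # convergence check
        break
    V = V_new

policy = Q.argmax(axis=1)                    # greedy policy w.r.t. converged values
print("V* =", V, "policy =", policy)
```

Policy iteration and temporal difference learning replace the full Bellman backup with explicit policy-evaluation steps or with sampled transitions, respectively.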
Topic 2: RL in Discrete Action Spaces
- Q-learning, function approximation, and deep Q-learning; nonstationarity in RL and its implications for deep learning; example applications (video games, starting with Atari); a tabular Q-learning sketch follows this topic
- Monte Carlo Tree Search; example applications (AlphaGo)
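As a reference point for the first bullet above, here is a minimal tabular Q-learning sketch on a hypothetical five-state chain (the environment is an illustrative assumption); deep Q-learning replaces the table with a neural network trained on the same update target.

```python
import numpy as np

# Minimal tabular Q-learning on a hypothetical 5-state chain:
# actions 0/1 move left/right; reaching the right end yields reward 1.
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.ones((n_states, n_actions))   # optimistic init encourages exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reached_goal = (s2 == n_states - 1)
    return s2, float(reached_goal), reached_goal

for episode in range(500):
    s = 0
    for t in range(100):                     # cap episode length
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
        if done:
            break

print(np.round(Q, 2))   # the greedy policy should move right everywhere
```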
Topic 3: Exploration, Uncertainty, Partial Observability
- Multi-armed bandits, Bayesian optimisation, regret analysis
- Partially observable Markov decision process; belief, memory, and sequence modelling (probabilistic methods, recurrent networks, transformers)
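The belief state mentioned above can be computed explicitly in the discrete case: after each action and observation, the agent updates a distribution over hidden states with Bayes' rule. A minimal sketch, with hypothetical transition and observation tensors:

```python
import numpy as np

# Discrete POMDP belief update (Bayes filter), with placeholder tensors:
# T[a, s, s'] = P(s' | s, a),  O[a, s', o] = P(o | s', a).
def belief_update(b, a, o, T, O):
    # Predict: push the belief through the transition model.
    b_pred = T[a].T @ b                  # b_pred[s'] = sum_s P(s'|s,a) b[s]
    # Correct: weight by the likelihood of the received observation.
    b_new = O[a][:, o] * b_pred
    return b_new / b_new.sum()           # renormalise to a distribution

# Tiny 2-state example (hypothetical numbers): one action, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=1, T=T, O=O)
print(b)   # belief shifts toward the state that better explains o=1
```

Recurrent networks and transformers replace this exact Bayes filter with a learned summary of the action-observation history.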
Topic 4: Policy Gradient and Actor-critic Methods for Continuous Action Spaces
- Importance sampling, policy gradient theorem, actor-critic methods (SPG, DDPG)
- Proximal policy optimisation; example applications
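The importance-sampling ratio from the first bullet reappears directly in proximal policy optimisation's clipped surrogate objective. A minimal PyTorch sketch, assuming log-probabilities and advantage estimates have already been computed elsewhere:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimised).

    logp_new: log pi_theta(a|s) under the current policy (requires grad)
    logp_old: log pi_theta_old(a|s) from the data-collecting policy
    advantages: estimated advantages A(s, a)
    """
    ratio = torch.exp(logp_new - logp_old)            # importance weight
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic (min) of unclipped and clipped objectives, negated as a loss.
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Illustrative usage with dummy tensors:
logp_new = torch.randn(32, requires_grad=True)
loss = ppo_clip_loss(logp_new, logp_new.detach() + 0.1, torch.randn(32))
loss.backward()
```

The clipping removes the incentive to move the policy far from the data-collecting policy, which is what makes large-batch updates stable in practice.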
Topic 5: Model-based RL and Model Predictive Control
- Learning dynamics models (graph networks, stochastic processes, diffusion models, physics-based models, ensembles); planning with learned models
- Model predictive control; example applications (real-time control)
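Planning with a learned model can be previewed with the simplest MPC variant, random shooting: sample candidate action sequences, roll each out through the model, execute only the first action of the best sequence, then re-plan. A sketch in which `model` and `cost` are hypothetical stand-ins:

```python
import numpy as np

def mpc_random_shooting(state, model, cost, horizon=10, n_candidates=256,
                        action_dim=2, rng=np.random.default_rng()):
    """Return the first action of the lowest-cost sampled action sequence.

    model(s, a) -> next state (a learned dynamics model; assumed here)
    cost(s, a)  -> scalar step cost (task-specific; assumed here)
    """
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            total += cost(s, a)
            s = model(s, a)          # roll out through the learned model
        if total < best_cost:
            best_cost, best_action = total, actions[0]
    return best_action               # execute, observe, then re-plan

# Illustrative usage with toy stand-ins for the model and cost:
model = lambda s, a: s + 0.1 * a
cost = lambda s, a: float(s @ s + 0.01 * (a @ a))
a0 = mpc_random_shooting(np.ones(2), model, cost)
```

Practical variants replace the uniform sampling with smarter distributions (e.g. the cross-entropy method) and amortise planning for real-time control.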
Topic 6: Data-efficient RL and Simulation-to-reality Transfer
- Data-efficient learning with probabilistic methods from real data (e.g. policy search in robotics), real-to-sim inference and differentiable simulation, data-efficient simulation-to-reality transfer (a domain-randomisation sketch follows this topic)
- RL for physical systems (successful examples in locomotion, open problems in contact-rich manipulation, applications to logistics, energy, and transport systems); examples of RL for healthcare.
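One standard recipe for simulation-to-reality transfer (not named explicitly above, but widely used) is domain randomisation: train or select a policy across many perturbed versions of the simulator so that it remains robust under the real system's unknown parameters. A self-contained sketch on a toy point mass, where the dynamics and parameter ranges are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy_gain, mass, friction, obs_noise, steps=50):
    """Toy 1-D point mass pushed toward the origin (illustrative dynamics)."""
    x, v, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        obs = x + rng.normal(0.0, obs_noise)        # noisy observation
        force = -policy_gain * obs                  # simple linear policy
        v += (force - friction * v) / mass * 0.05   # semi-implicit Euler, dt=0.05
        x += v * 0.05
        cost += x * x
    return cost

# Domain randomisation: evaluate each candidate gain across randomised
# physics, then keep the gain that is robust on average.
gains = np.linspace(0.5, 10.0, 20)
avg_cost = [np.mean([rollout(g,
                             mass=rng.uniform(0.8, 1.2),
                             friction=rng.uniform(0.1, 0.5),
                             obs_noise=rng.uniform(0.0, 0.05))
                     for _ in range(32)]) for g in gains]
print("robust gain:", gains[int(np.argmin(avg_cost))])
```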
Topic 7: RL with Human Feedback; Safe RL and RL for Validation
- Fine-tuning large language models (LLMs) and other foundation models with human feedback (TRLX, RL4LMs; a lightweight overview of RLHF); a reward-model loss sketch follows this topic
- A review of safe RL, with an example: optimising commercial HVAC systems using policy improvement with constraints; improving safety using RL for validation, with examples in autonomous driving, autonomous flight, and aircraft collision avoidance
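A core component of fine-tuning with human feedback is a reward model trained on pairwise preferences, typically with a Bradley-Terry logistic loss on reward differences. A minimal PyTorch sketch in which the linear reward model is an illustrative stand-in:

```python
import torch

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss: push r_chosen above r_rejected.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Illustrative usage: a stand-in reward model scoring feature vectors.
reward_model = torch.nn.Linear(16, 1)
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = preference_loss(reward_model(chosen).squeeze(-1),
                       reward_model(rejected).squeeze(-1))
loss.backward()   # the fitted reward model then provides the RL signal
```

The fitted reward model is then used as the reward function for a policy-gradient method (commonly PPO) when fine-tuning the language model itself.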
Topic 8: RL for Scientific Discovery; Student Presentations
- Examples of RL for molecular design and drug discovery, active learning for synthesising new materials, RL for nuclear fusion experiments
- Student presentations (based on essays and mini-projects) for other topics in RL, e.g. multi-agent RL, hierarchical RL, RL for hyperparameter optimisation and NN architecture search, RL for multi-task transfer, lifelong RL, RL in biological systems, etc.
Assessment
The assessment for this module consists of:
- Essay and mini-project (60%)
Students will choose a concrete RL method from the literature, then complete a two-part essay:
  - Essay Part 1 (1500 words, 20%): introduce a formal description of the method and explain how the method extended the state of the art at the time of its publication
  - Essay Part 2 or a mini-project (2500 words, 40%): provide a critical analysis of the chosen RL method
- Presentation of the essay and mini-project results (15%)
Students will be expected to create and record a 15-minute video presentation of the analysis described in their essay and mini-project.
- Short test of RL theory fundamentals (15%)
An in-class, closed-book test (~15 minutes) of the understanding of RL fundamentals.
- Participation in seminar discussions (10%)
Students are expected to take an active part in the seminars by asking clarifying questions and mentioning related work.
Recommended Reading
Books
[S&B] Reinforcement Learning: An Introduction (second print edition). Richard S. Sutton, Andrew G. Barto. [Available from the book’s website as a free PDF updated in 2022]
[CZ] Algorithms for Reinforcement Learning. Csaba Szepesvari. [Available from the book’s website as a free PDF updated in 2019]
[MK] Algorithms for Decision Making. Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray. [Available from the book’s website as a free PDF updated in 2023]
[DB] Reinforcement Learning and Optimal Control. Dimitri Bertsekas. [Available from the book’s website as a free PDF updated in 2023]
Presentation guidelines
[KN] Ten simple rules for effective presentation slides. Kristen M. Naegle. PLoS Computational Biology 17, no. 12 (2021).