Reinforcement Learning
Principal lecturer: Dr Rika Antonova
Taken by: MPhil ACS, Part III
Code: R171
Term: Lent
Hours: 16 (8 x two-hour seminars)
Format: In-person lectures
Prerequisites: Multivariable calculus, linear algebra, probability, machine learning (intro)
Aims
The aim of this module is to introduce students to the foundations of reinforcement learning (RL), discuss state-of-the-art RL methods, incentivise students to produce critical analyses of recent methods, and prompt students to propose novel solutions that address the shortcomings of existing methods. Judged by the headlines, RL has seen unprecedented success in recent years. However, the vast majority of RL methods still have shortcomings that might not be apparent at first glance. The course therefore aims to inspire students by communicating the promising aspects of RL, while ensuring that they develop the ability to critically analyse the limitations of current methods.
Objectives
Students will learn the following technical concepts:
- fundamental RL terminology and mathematical formalism; a brief history of RL and its connection to neuroscience and biological systems
- RL methods for discrete action spaces, e.g. deep Q-learning and large-scale Monte Carlo Tree Search
- methods for exploration, modelling uncertainty, and partial observability for RL
- modern policy gradient and actor-critic methods
- concepts needed to construct model-based RL and Model Predictive Control methods
- approaches to make RL data-efficient and ways to enable simulation-to-reality transfer
- examples of fine-tuning foundation models and large language models (LLMs) with human feedback; safe RL concepts; examples of using RL for safety validation
- examples of using RL for scientific discovery
Students will also gain experience with analysing RL methods to uncover their strengths and shortcomings, as well as proposing extensions to improve performance.
Finally, students will gain skills needed to create and deliver a successful presentation in a format similar to that of conference presentations.
With all of the above, students who take part in this module will be well-prepared to start conducting research in the field of reinforcement learning.
Syllabus
Topic 1: Introduction and Fundamentals
- Overview of RL: foundational ideas, history, and books; connection to neuroscience and biological systems; recent industrial applications and research demonstrations
- Mathematical fundamentals: Markov decision processes, Bellman equations, policy and value iteration, temporal difference learning
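To make the fundamentals concrete, here is a minimal value-iteration sketch for a toy tabular MDP; the randomly generated transition tensor and reward matrix are illustrative assumptions, not course material.

```python
import numpy as np

# Toy tabular MDP (hypothetical placeholder data):
# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # each P[s, a] sums to 1 over s'
R = rng.uniform(-1.0, 1.0, size=(S, A))

V = np.zeros(S)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:     # convergence check
        break
    V = V_new

policy = Q.argmax(axis=1)                    # greedy policy w.r.t. converged values
print("V* =", V, "policy =", policy)
```

Policy iteration and temporal difference learning replace the full Bellman backup with explicit policy-evaluation steps or with sampled transitions, respectively.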
Topic 2: RL in Discrete Action Spaces
- Q-learning, function approximation, and deep Q-learning; nonstationarity in RL and its implications for deep learning; example applications (video games, starting with Atari); a tabular Q-learning sketch follows this topic
- Monte Carlo Tree Search; example applications (AlphaGo)
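As a reference point for the first bullet above, here is a minimal tabular Q-learning sketch on a hypothetical five-state chain (the environment is an illustrative assumption); deep Q-learning replaces the table with a neural network trained on the same update target.

```python
import numpy as np

# Minimal tabular Q-learning on a hypothetical 5-state chain:
# actions 0/1 move left/right; reaching the right end yields reward 1.
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.ones((n_states, n_actions))   # optimistic init encourages exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reached_goal = (s2 == n_states - 1)
    return s2, float(reached_goal), reached_goal

for episode in range(500):
    s = 0
    for t in range(100):                     # cap episode length
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
        if done:
            break

print(np.round(Q, 2))   # the greedy policy should move right everywhere
```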
Topic 3: Exploration, Uncertainty, Partial Observability
- Multi-armed bandits, Bayesian optimisation, regret analysis
- Partially observable Markov decision process; belief, memory, and sequence modelling (probabilistic methods, recurrent networks, transformers)
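The belief state mentioned above can be computed explicitly in the discrete case: after each action and observation, the agent updates a distribution over hidden states with Bayes' rule. A minimal sketch, with hypothetical transition and observation tensors:

```python
import numpy as np

# Discrete POMDP belief update (Bayes filter), with placeholder tensors:
# T[a, s, s'] = P(s' | s, a),  O[a, s', o] = P(o | s', a).
def belief_update(b, a, o, T, O):
    # Predict: push the belief through the transition model.
    b_pred = T[a].T @ b                  # b_pred[s'] = sum_s P(s'|s,a) b[s]
    # Correct: weight by the likelihood of the received observation.
    b_new = O[a][:, o] * b_pred
    return b_new / b_new.sum()           # renormalise to a distribution

# Tiny 2-state example (hypothetical numbers): one action, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=1, T=T, O=O)
print(b)   # belief shifts toward the state that better explains o=1
```

Recurrent networks and transformers replace this exact Bayes filter with a learned summary of the action-observation history.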
Topic 4: Policy Gradient and Actor-critic Methods for Continuous Action Spaces
- Importance sampling, policy gradient theorem, actor-critic methods (SPG, DDPG)
- Proximal policy optimisation; example applications
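The importance-sampling ratio from the first bullet reappears directly in proximal policy optimisation's clipped surrogate objective. A minimal PyTorch sketch, assuming log-probabilities and advantage estimates have already been computed elsewhere:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimised).

    logp_new: log pi_theta(a|s) under the current policy (requires grad)
    logp_old: log pi_theta_old(a|s) from the data-collecting policy
    advantages: estimated advantages A(s, a)
    """
    ratio = torch.exp(logp_new - logp_old)            # importance weight
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic (min) of unclipped and clipped objectives, negated as a loss.
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Illustrative usage with dummy tensors:
logp_new = torch.randn(32, requires_grad=True)
loss = ppo_clip_loss(logp_new, logp_new.detach() + 0.1, torch.randn(32))
loss.backward()
```

The clipping removes the incentive to move the policy far from the data-collecting policy, which is what makes large-batch updates stable in practice.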
Topic 5: Model-based RL and Model Predictive Control
- Learning dynamics models (graph networks, stochastic processes, diffusion models, physics-based models, ensembles); planning with learned models
- Model predictive control; example applications (real-time control)
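Planning with a learned model can be previewed with the simplest MPC variant, random shooting: sample candidate action sequences, roll each out through the model, execute only the first action of the best sequence, then re-plan. A sketch in which `model` and `cost` are hypothetical stand-ins:

```python
import numpy as np

def mpc_random_shooting(state, model, cost, horizon=10, n_candidates=256,
                        action_dim=2, rng=np.random.default_rng()):
    """Return the first action of the lowest-cost sampled action sequence.

    model(s, a) -> next state (a learned dynamics model; assumed here)
    cost(s, a)  -> scalar step cost (task-specific; assumed here)
    """
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            total += cost(s, a)
            s = model(s, a)          # roll out through the learned model
        if total < best_cost:
            best_cost, best_action = total, actions[0]
    return best_action               # execute, observe, then re-plan

# Illustrative usage with toy stand-ins for the model and cost:
model = lambda s, a: s + 0.1 * a
cost = lambda s, a: float(s @ s + 0.01 * (a @ a))
a0 = mpc_random_shooting(np.ones(2), model, cost)
```

Practical variants replace the uniform sampling with smarter distributions (e.g. the cross-entropy method) and amortise planning for real-time control.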
Topic 6: Data-efficient RL and Simulation-to-reality Transfer
- Data-efficient learning with probabilistic methods from real data (e.g. policy search in robotics), real-to-sim inference and differentiable simulation, data-efficient simulation-to-reality transfer (a domain-randomisation sketch follows this topic)
- RL for physical systems (successful examples in locomotion, open problems in contact-rich manipulation, applications to logistics, energy, and transport systems); examples of RL for healthcare.
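One standard recipe for simulation-to-reality transfer (not named explicitly above, but widely used) is domain randomisation: train or select a policy across many perturbed versions of the simulator so that it remains robust under the real system's unknown parameters. A self-contained sketch on a toy point mass, where the dynamics and parameter ranges are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy_gain, mass, friction, obs_noise, steps=50):
    """Toy 1-D point mass pushed toward the origin (illustrative dynamics)."""
    x, v, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        obs = x + rng.normal(0.0, obs_noise)        # noisy observation
        force = -policy_gain * obs                  # simple linear policy
        v += (force - friction * v) / mass * 0.05   # semi-implicit Euler, dt=0.05
        x += v * 0.05
        cost += x * x
    return cost

# Domain randomisation: evaluate each candidate gain across randomised
# physics, then keep the gain that is robust on average.
gains = np.linspace(0.5, 10.0, 20)
avg_cost = [np.mean([rollout(g,
                             mass=rng.uniform(0.8, 1.2),
                             friction=rng.uniform(0.1, 0.5),
                             obs_noise=rng.uniform(0.0, 0.05))
                     for _ in range(32)]) for g in gains]
print("robust gain:", gains[int(np.argmin(avg_cost))])
```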
Topic 7: RL with Human Feedback; Safe RL and RL for Validation
- Fine-tuning large language models (LLMs) and other foundation models with human feedback (TRLX, RL4LMs; a lightweight overview of RLHF); a reward-model loss sketch follows this topic
- A review of safe RL, with an example: optimising commercial HVAC systems using policy improvement with constraints; improving safety using RL for validation, with examples in autonomous driving, autonomous flight, and aircraft collision avoidance
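A core component of fine-tuning with human feedback is a reward model trained on pairwise preferences, typically with a Bradley-Terry logistic loss on reward differences. A minimal PyTorch sketch in which the linear reward model is an illustrative stand-in:

```python
import torch

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss: push r_chosen above r_rejected.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Illustrative usage: a stand-in reward model scoring feature vectors.
reward_model = torch.nn.Linear(16, 1)
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = preference_loss(reward_model(chosen).squeeze(-1),
                       reward_model(rejected).squeeze(-1))
loss.backward()   # the fitted reward model then provides the RL signal
```

The fitted reward model is then used as the reward function for a policy-gradient method (commonly PPO) when fine-tuning the language model itself.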
Topic 8: RL for Scientific Discovery; Student Presentations
- Examples of RL for molecular design and drug discovery, active learning for synthesising new materials, RL for nuclear fusion experiments
- Student presentations (based on essays and mini-projects) for other topics in RL, e.g. multi-agent RL, hierarchical RL, RL for hyperparameter optimisation and NN architecture search, RL for multi-task transfer, lifelong RL, RL in biological systems, etc.
Assessment
The assessment for this module consists of:
- Essay and mini-project (60%)
Students will choose a concrete RL method from the literature, then complete a two-part essay:
  - Essay Part 1 (1500 words, 20%): introduce a formal description of the method and explain how the method extended the state of the art at the time of its publication
  - Essay Part 2 or a mini-project (2500 words, 40%): provide a critical analysis of the chosen RL method
- Presentation of the essay and mini-project results (15%)
Students will be expected to create and record a 15-minute video presentation of the analysis described in their essay and mini-project.
- Short test of RL theory fundamentals (15%)
An in-class, closed-book test (~15 minutes) of the understanding of RL fundamentals.
- Participation in seminar discussions (10%)
Students are expected to take an active part in the seminars by asking clarifying questions and mentioning related work.
Recommended Reading
Books
[S&B] Reinforcement Learning: An Introduction (second print edition). Richard S. Sutton, Andrew G. Barto. [Available from the book’s website as a free PDF updated in 2022]
[CZ] Algorithms for Reinforcement Learning. Csaba Szepesvari. [Available from the book’s website as a free PDF updated in 2019]
[MK] Algorithms for Decision Making. Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray. [Available from the book’s website as a free PDF updated in 2023]
[DB] Reinforcement Learning and Optimal Control. Dimitri Bertsekas. [Available from the book’s website as a free PDF updated in 2023]
Presentation guidelines
[KN] Ten simple rules for effective presentation slides. Kristen M. Naegle. PLoS Computational Biology 17, no. 12 (2021).