Department of Computer Science and Technology

Course pages 2025–26 (working draft)

Reinforcement Learning

Principal lecturer: Dr Rika Antonova
Taken by: MPhil ACS, Part III
Code: L171
Term: Michaelmas
Hours: 16 (8 x two-hour lectures)
Format: In-person lectures
Prerequisites: Multivariable calculus, linear algebra, probability, machine learning

Aims

The aim of this module is to present state-of-the-art reinforcement learning (RL) methods, to motivate students to understand RL theory, and to help them develop the skills needed to implement deep RL methods.


RL has seen unprecedented success in recent years. However, the majority of RL methods still require intricate skills and insight to apply successfully. The goal of this module is to communicate the promising aspects of RL while also ensuring that students understand the limitations of current RL methods.

The assessment will consist of a theory test (in-class, closed-book), a test of the students’ understanding of how to code RL methods (in-class, closed-book), a mini-project, and a participation component (all described in the ‘Assessment’ section below).

Objectives

Students will learn the following technical concepts:

  • fundamental RL terminology and mathematical formalism; a brief history of RL and its connection to neuroscience and biological systems
  • RL methods for discrete action spaces, e.g. deep Q-learning and large-scale Monte Carlo Tree Search
  • methods for exploration, modelling uncertainty, and partial observability for RL
  • modern policy gradient and actor-critic methods
  • concepts needed to construct model-based RL and Model Predictive Control methods
  • approaches to make RL data-efficient and ways to enable simulation-to-reality transfer
  • examples of fine-tuning foundation models and large language models (LLMs) with human feedback; safe RL concepts; examples of using RL for safety validation
  • examples of using RL for scientific discovery

Students will also gain practical experience with implementing and analysing RL methods to uncover their strengths and shortcomings, and with proposing novel extensions to improve the performance of existing RL methods in a short mini-project.

Syllabus

Topic 1: Introduction and Fundamentals

  • Overview of RL: foundational ideas, history, and books; connection to neuroscience and biological systems; recent industrial applications and research demonstrations
  • Mathematical fundamentals: Markov decision processes, Bellman equations, policy and value iteration, temporal difference learning (see the value-iteration sketch after this list)
  • Short intro to RL libraries and environments
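
As a concrete illustration of the mathematical fundamentals above, here is a minimal value-iteration sketch in Python. The tabular MDP representation (`P[s][a]` as a list of `(prob, next_state, reward)` tuples) is an illustrative assumption, not part of the course materials:

```python
# Value iteration on a small tabular MDP -- a minimal sketch.
# Assumes P[s][a] = [(prob, next_state, reward), ...] (illustrative format).
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values converge."""
    V = np.zeros(n_states)
    while True:
        V_new = np.empty(n_states)
        for s in range(n_states):
            # Q(s, a) = sum over s' of p(s'|s,a) * (r + gamma * V(s'))
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            V_new[s] = max(q)  # Bellman optimality: V(s) = max_a Q(s, a)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

Policy iteration alternates full policy evaluation and greedy improvement instead of the single combined backup shown here.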

Topic 2: RL in Discrete Action Spaces

  • Q-learning, function approximation, and deep Q-learning; nonstationarity in RL and its implications for deep learning; example applications (video games, with Atari as the initial example); a tabular Q-learning sketch follows this list
  • Monte Carlo Tree Search; example applications (AlphaGo)
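
As a bridge between the tabular theory and deep Q-learning, below is a minimal tabular Q-learning sketch with epsilon-greedy exploration. It assumes a Gymnasium-style environment with integer observations and discrete actions; deep Q-learning replaces the table with a neural network (plus replay buffers and target networks to cope with nonstationarity), but the TD update has the same form:

```python
# Tabular Q-learning with epsilon-greedy exploration -- a minimal sketch.
# Assumes a Gymnasium-style env: reset() -> (obs, info),
# step(a) -> (obs, reward, terminated, truncated, info), with integer obs.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # TD target bootstraps from the best next action; zero at terminals.
            target = r + (0.0 if terminated else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```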

Topic 3: Policy Gradient and Actor-critic Methods for Continuous Action Spaces

  • Policy gradient theorem, actor-critic methods (SPG, DDPG); a REINFORCE-style sketch follows this list
  • Proximal policy optimisation; example applications
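
To make the policy gradient theorem concrete, here is a minimal REINFORCE-style loss in PyTorch (a sketch, not the lecture code, and the episode format is an illustrative assumption); actor-critic methods and PPO replace the Monte Carlo returns below with learned value estimates and a clipped surrogate objective:

```python
# Monte Carlo policy gradient (REINFORCE) loss -- a minimal sketch.
# Assumes one completed episode: a list of log pi(a_t | s_t) tensors and
# a list of scalar rewards (illustrative episode format).
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    # Discounted returns-to-go: G_t = r_t + gamma * G_{t+1}.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.tensor(returns)
    # Normalising returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Negative sign: optimisers minimise, but the policy gradient ascends
    # E[sum_t G_t * log pi(a_t | s_t)].
    return -(torch.stack(log_probs) * returns).sum()
```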

Topic 4: Exploration, Uncertainty, Data-efficient RL and Simulation-to-reality Transfer

  • Multi-armed bandits, Bayesian optimisation, regret analysis (a UCB1 sketch follows this list)
  • Data-efficient learning from real data (e.g. policy search in robotics), real-to-sim inference and differentiable simulation, data-efficient simulation-to-reality transfer
  • RL for physical systems (successful examples in locomotion, open problems in contact-rich robot manipulation)
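
As a small worked example of the explore/exploit trade-off underlying the bandit material, here is a UCB1 sketch; `pull(arm)` is a hypothetical stand-in for the environment and is assumed to return stochastic rewards in [0, 1]:

```python
# UCB1 for a multi-armed bandit -- a minimal sketch.
# `pull(arm)` is a hypothetical environment callback returning rewards in [0, 1].
import math

def ucb1(pull, n_arms, horizon):
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise the estimates
        else:
            # Pick the arm maximising empirical mean + exploration bonus.
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running-mean update
    return means, counts
```

The bonus term shrinks as an arm is pulled more often, which is what yields the logarithmic regret bounds discussed in the regret-analysis material.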

Topic 5: Partial Observability, Memory, and Sequence Modelling

  • Introduction to sequence modelling with transformers and RL with transformer-based methods
  • Partially observable Markov decision processes; probabilistic methods for belief and memory modelling (a belief-update sketch follows this list)
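
Below is a minimal discrete Bayes-filter sketch for the POMDP belief update, assuming tabular models with illustrative array shapes `T[s, a, s']` (transitions) and `O[s', a, o]` (observations):

```python
# Discrete POMDP belief update (Bayes filter) -- a minimal sketch.
# Assumes tabular models: T[s, a, s'] transition probabilities and
# O[s', a, o] observation probabilities (shapes are illustrative).
import numpy as np

def belief_update(b, a, o, T, O):
    """b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    predicted = T[:, a, :].T @ b      # predict: sum_s T[s, a, s'] * b[s]
    updated = O[:, a, o] * predicted  # correct: weight by observation likelihood
    return updated / updated.sum()    # normalise back to a distribution
```

Transformer- and RNN-based methods can be viewed as learning such a belief (memory) state implicitly from observation histories.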

Topic 6: Model-based RL and Model Predictive Control; Residual RL

  • Learning dynamics models from data; model-based RL with learned models
  • Model predictive control; residual RL (model-based or model-free); a random-shooting MPC sketch follows this list
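
To illustrate how a learned model can be used for control, here is a minimal random-shooting MPC sketch; `model(s, a)` (a learned one-step dynamics model) and `reward(s, a)` are illustrative placeholders:

```python
# Random-shooting model predictive control -- a minimal sketch.
# `model(s, a) -> next_state` and `reward(s, a) -> float` are
# illustrative placeholders for a learned dynamics model and reward.
import numpy as np

def mpc_action(state, model, reward, action_dim,
               horizon=10, n_samples=200, rng=None):
    """Sample random action sequences, roll them out through the model,
    and return the first action of the best-scoring sequence."""
    rng = rng if rng is not None else np.random.default_rng()
    best_return, best_first = -np.inf, None
    for _ in range(n_samples):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in seq:
            total += reward(s, a)
            s = model(s, a)  # one-step prediction with the learned model
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first  # execute one action, then replan (receding horizon)
```

Residual RL instead learns a corrective policy on top of such a base controller (model-based or model-free).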

Topic 7: RL with Human Feedback; Safe RL and RL for Validation

  • Fine-tuning large language models (LLMs) and other foundation models with human feedback (TRLX, RL4LMs, a lightweight overview of RLHF)
  • A review of safe RL, with an example: optimising commercial HVAC systems using policy improvement with constraints; improving safety using RL for validation, with examples in autonomous driving, autonomous flight, and aircraft collision avoidance
  • Examples of RL for molecular design and drug discovery, active learning for synthesising new materials, RL for theorem proving, and RL for nuclear fusion experiments

Topic 8: Student Presentations (+ an invited talk)

  • Students will give short mini-project presentations: a figure with an algorithm overview, and at least two plots with the main results as they will appear in the mini-project report. (The report is due several days after the last class meeting, but the figures should be the same as in this in-class presentation.)
  • One of the invited talks can be scheduled for this class meeting as well.

Assessment

The module will be assessed through a test of RL theory (20%), a test of coding skills for RL methods (20%), a mini-project (50%), and participation (10%). Students will also be required to complete two brief practical exercises in preparation for the coding test. These will not be marked; instead, they will serve as checkpoints to ensure students are preparing for the test. Similarly, there will be a brief (unmarked) in-class theory quiz to check that students are preparing well for the theory test. We will discuss the unmarked coding exercises and quiz answers during class, with opportunities for students to earn participation points.

  • Theory test (20%): To assess understanding of RL fundamentals, students will take a theory test (30 minutes, in-class, closed-book).
  • Coding test (20%): Students will be required to demonstrate understanding of the implementation of foundational and state-of-the-art RL methods (30 minutes, in-class, closed-book).
  • Mini-project (50%): This mini-project will focus on improving RL under resource constraints. It will build on the take-home exercises, but for the project, the students will be challenged to push beyond the state of the art: maintain high reward performance while minimising memory and compute resources. Students can apply innovations they learn from lectures and assigned readings, seek further insights from other sources, and even show research potential by creating novel RL algorithms.
    Students will work in pairs, with one partner focusing on minimising memory (space) and the other on minimising compute (time). Hence, the project will facilitate a clear separation of individual contributions while still encouraging teamwork. The results will be ranked based on the memory and compute resources used by the proposed methods for the given reward/success-rate targets.
    Students will need to submit a short report that describes the proposed method and shows key plots that present the results. Each pair should submit one report (maximum 2000 words total; maximum 1000 words per student), at least one algorithm summary figure, and at least two figures with main results (one showing memory (space) usage and the other compute (time) usage improvements compared to the baseline). Each paragraph and plot should be marked with the name of the student who contributed, so that the contributions of each student can be clearly separated for marking.
  • Participation (10%): Students will be expected to attend class sessions in person, contribute at least one substantial explanation or comment to the in-class discussions of the unmarked take-home coding exercises, and briefly present the results of their mini-project. This component will help to ensure lively and active class meetings, where students can practise articulating their knowledge.

Recommended Reading

Books

[S&B] Reinforcement Learning: An Introduction (second print edition). Richard S. Sutton, Andrew G. Barto. [Available from the book’s website as a free PDF updated in 2022]

[CZ] Algorithms for Reinforcement Learning. Csaba Szepesvari. [Available from the book’s website as a free PDF updated in 2019]

[MK] Algorithms for Decision Making. Mykel J. Kochenderfer, Tim A. Wheeler, Kyle H. Wray. [Available from the book’s website as a free PDF updated in 2023]

[DB] Reinforcement Learning and Optimal Control. Dimitri Bertsekas. [Available from the book’s website as a free PDF updated in 2023]