# Theory of Deep Learning

**Principal lecturers:** Dr Ferenc Huszar, Dr Challenger Mishra

**Taken by:** MPhil ACS, Part III

**Code:** R252

**Term:** Michaelmas

**Hours:** 16 (8x2 hour reading group sessions)

**Class limit:** max. 16 students

**Prerequisites:** A strong background in calculus, probability theory and linear algebra, familiarity with differential equations, optimization and information theory. Students need to have taken an introductory machine learning module such as Machine Learning and Bayesian Inference, Deep Neural Networks, or similar.


## Aims

The objective of this course is to expose you to one of the most active contemporary research directions within machine learning: the theory of deep learning (DL). While the first wave of modern DL has focussed on empirical breakthroughs and ever more complex techniques, attention is now shifting to building a solid mathematical understanding of why these techniques work so well in the first place. The purpose of this course is to review this recent progress through a mixture of reading group sessions and invited talks by leading researchers in the topic, and to prepare you to embark on a PhD in modern deep learning research. Compared to typical, non-mathematical courses on deep learning, this advanced module will appeal to those who have strong foundations in mathematics and theoretical computer science. In a way, this course is our answer to the question “What should the world’s best computer science students know about deep learning in 2021?”

## Learning Outcomes

This course should prepare the best students to start a PhD in the theory and mathematics of deep learning, and to start formulating their own hypotheses in this space. You will be introduced to a range of empirical and mathematical tools developed in recent years for the study of deep learning behaviour, and you will build an awareness of the main open questions and current lines of attack. At the end of the course you will:

- be able to explain why classical learning theory is insufficient to describe the phenomenon of generalization in DL
- be able to design and interpret empirical studies aimed at understanding generalization
- be able to explain the role of overparameterization, and use deep linear models as a tool to study the implicit regularization of gradient-based learning
- be able to state PAC-Bayes and information-theoretic bounds, and apply them to DL
- be able to explain the connection between Gaussian processes and neural networks, and study learning dynamics in the neural tangent kernel (NTK) regime
- be able to formulate your own hypotheses about DL and choose tools to prove/test them
- be able to leverage your deeper theoretical understanding to produce more robust, rigorous and reproducible solutions to practical machine learning problems

## Syllabus

Each week we'll have two or three student-led presentations about a research paper chosen from a reading list. Occasionally, we'll include invited guest lectures by top researchers in the field. The reading list follows the weekly breakdown below:

Week 1: Introduction to the topic

Week 2: Empirical Studies of Deep Learning Phenomena

Week 3: Interpolation Regime and “Double Descent” Phenomena

Week 4: Implicit Regularization in Deep Linear Models

Week 5: Approximation Theory

Week 6: Networks in the Infinite Width Limit

Week 7: PAC-Bayes and Information Theoretic Bounds for SGD

Week 8: Discussion and Coursework Spotlight Session

## Assessment

Students will be assessed on the following basis:

- 20% for presentation/content contributed to the module: each student will have an opportunity to present one of the recommended papers during Weeks 1-7 (30-minute slot + 10 minutes of Q&A). For the presentation, students should aim to communicate the core ideas behind the paper, and clearly present the results, conclusions, and future directions. Where possible, students are encouraged to comment on how the work fits into broader research goals.
- 10% for active participation (regular attendance and contribution to discussions during the Q&A sessions).
- 70% for a coursework report, with a word limit of 4000. The report is either (1) an original research proposal/report with a hypothesis, a review of related literature, and ideally preliminary findings, or (2) a reproduction and ideally extension of an existing relevant paper.

Coursework reports are marked in line with general ACS guidelines. Reports receiving top marks will have demonstrable research value (an original research idea, an extension of existing work, or a thorough reproduction effort that is valuable to the research community). Some projects will be suggested during the first weeks of the course, although students are encouraged to come up with their own ideas. Students may be required to participate in group projects, with groups of size 2-3 (the class groups will be separated). For any given project, individual contributions will be noted for assessment through a viva component.

## Relationship with related modules

This course can be considered as an advanced follow-up to the Part IIB course on Deep Neural Networks. That course introduces some high level concepts that this course significantly expands on.

This module complements L48: Machine Learning in the Physical World and L46: Principles of Machine Learning Systems, which focus on applications and hardware/systems aspects of ML, respectively.

## Recommended reading

- PNAS Colloquium on the Science of Deep Learning
- Mathematics for Machine Learning book by Marc Deisenroth, Aldo Faisal and Cheng Soon Ong
- Probabilistic Machine Learning: An Introduction book by Kevin Murphy
- Matus Telgarsky's lecture notes on deep learning theory

These are in addition to the papers which will be discussed in the lectures.

## Further Information

Due to infectious respiratory diseases, the method of teaching for this module may be adjusted to cater for physical distancing and students who are working remotely. Unless otherwise advised, this module will be taught in person.