Theory of Deep Learning

Principal lecturers: Dr Ferenc Huszar, Dr Challenger Mishra
Taken by: MPhil ACS, Part III
Code: R252
Term: Lent
Hours: 16 (8x2 hour reading group sessions)
Class limit: max. 28 students
Prerequisites: A strong background in calculus, probability theory and linear algebra, and familiarity with differential equations, optimization and information theory, are required. Students need to have studied Deep Neural Networks; familiarity with deep learning terminology (e.g. architectures, benchmark problems) and with deep learning frameworks (PyTorch, JAX or similar) is assumed.

Aims

The objective of this course is to expose you to one of the most active contemporary research directions within machine learning: the theory of deep learning (DL). While the first wave of modern DL focussed on empirical breakthroughs and ever more complex techniques, attention is now shifting to building a solid mathematical understanding of why these techniques work so well in the first place. The purpose of this course is to review this recent progress through a mixture of reading group sessions and invited talks by leading researchers in the field, and to prepare you to embark on a PhD in modern deep learning research. Compared to typical, non-mathematical courses on deep learning, this advanced module will appeal to those who have strong foundations in mathematics and theoretical computer science. In a way, this course is our answer to the question “What should the world’s best computer science students know about deep learning in 2023?”

Learning Outcomes

This course should prepare the best students to start a PhD in the theory and mathematics of deep learning, and to start formulating their own hypotheses in this space. You will be introduced to a range of empirical and mathematical tools developed in recent years for the study of deep learning behaviour, and you will build an awareness of the main open questions and current lines of attack. At the end of the course you will:

be able to explain why classical learning theory is insufficient to describe the phenomenon of generalization in DL
be able to design and interpret empirical studies aimed at understanding generalization
be able to explain the role of overparameterization, and to use deep linear models to study the implicit regularisation of gradient-based learning
be able to state PAC-Bayes and information-theoretic bounds, and apply them to DL
be able to explain the connection between Gaussian processes and neural networks, and to study learning dynamics in the neural tangent kernel (NTK) regime
be able to formulate your own hypotheses about DL and choose tools to prove/test them
be able to leverage your deeper theoretical understanding to produce more robust, rigorous and reproducible solutions to practical machine learning problems.

Syllabus

Each week we'll have two to four student-led presentations about a research paper chosen from a reading list. The reading list loosely follows the weekly breakdown below (but we adapt it each year, as this is an active research area):

Week 1: Introduction to the topic
Week 2: Empirical Studies of Deep Learning Phenomena
Week 3: Interpolation Regime and “Double Descent” Phenomena
Week 4: Implicit Regularization in Deep Linear Models
Week 5: Approximation Theory
Week 6: Networks in the Infinite Width Limit
Week 7: PAC-Bayes and Information Theoretic Bounds for SGD
Week 8: Discussion and Coursework Spotlight Session

Assessment

Students will be assessed on the following basis:

20% for presentation/content contributed to the module: Each student will have an opportunity to present one of the recommended papers during Weeks 1-7 (30-minute slot including Q&A). For the presentation, students should aim to communicate the core ideas behind the paper, and clearly present the results, conclusions, and future directions. Where possible, students are encouraged to comment on how the work fits into broader research goals.
10% for active participation (regular attendance and contribution to discussions during the Q&A sessions).
70% for a group project report, with a limit of 4000 words. The report is either (1) an original research proposal/report with a hypothesis, a review of related literature, and ideally preliminary findings, or (2) a reproduction, and ideally an extension, of an existing relevant paper.
Coursework reports are marked in line with general ACS guidelines. Reports receiving top marks will have demonstrable research value (they contain an original research idea, an extension of existing work, or a thorough reproduction effort which is valuable to the research community). Some projects will be suggested during the first weeks of the course, although students are encouraged to come up with their own ideas. Students may be required to participate in group projects, with groups of size 2-3 (the class groups will be separated). For any given project, individual contributions will be noted for assessment through a viva component.

Relationship with related modules

This course can be considered an advanced follow-up to the Part IIB course on Deep Neural Networks. That course introduces some high-level concepts that this course expands on significantly.

This module complements L48: Machine Learning in the Physical World and L46: Principles of Machine Learning Systems, which focus on applications and hardware/systems aspects of ML respectively.
