Department of Computer Science and Technology

Undergraduate

Course pages 2021–22

Advanced Data Science

Principal lecturers: Prof Neil Lawrence, Dr Carl Henrik Ek
Taken by: Part II CST 75%
Term: Michaelmas
Hours: 16
Format: In-person lectures
Prerequisites: Data Science; Machine Learning and Real-world Data; NST Mathematics

Aims

Data science, through machine learning, has become an essential building block for work across a wide range
of disciplines. While this success has led to growing interest in, and development of, new methods, it has
also created a range of different narratives to describe the field. The first aim of this course is to provide
a statistical narrative that uncovers the principles underlying machine learning. The second aim is to provide
the key building blocks for building statistical models of data and for performing inference by combining
data and model through computation. While our aim is to make data science mathematically principled, an
important skill for a data scientist is intuition, which is best built through practice. To that end, we will
implement each method we present from scratch using modern tool-chains such as PyTorch, and address the
challenges one faces when applying machine learning to real data.

Lectures

Part I The first part of the course will focus on building statistical models. We will cover the material
through examples of real-world applications and the challenges we face when working with data in the
wild.
Part II The second part of the course will focus on the theoretical foundations of learning. We will cover
concepts such as the "No Free Lunch Theorem", the "Bias-Variance Trade-off", "Empirical Risk Minimisation"
and "Generalisation". We will illustrate these concepts using the models introduced in Part I.
Part III In the third part of the course we will focus on approximate inference. We will cover stochastic
approximations such as Markov chain Monte Carlo and deterministic approaches such as variational Bayes.
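As a small illustration of the stochastic approximations named in Part III, the following is a minimal sketch of random-walk Metropolis, one of the simplest Markov chain Monte Carlo methods. It is written in plain Python rather than PyTorch so that it has no dependencies. The target density (a standard normal), the step size and the names `log_target` and `metropolis` are illustrative assumptions, not material from the course.

```python
import math
import random
from typing import Callable, List


def log_target(x: float) -> float:
    """Unnormalised log-density of a standard normal N(0, 1).

    Illustrative target only; MCMC needs the density only up to a constant.
    """
    return -0.5 * x * x


def metropolis(
    log_p: Callable[[float], float],
    n_samples: int,
    step: float = 1.0,
    x0: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    x, lp = x0, log_p(x0)
    samples: List[float] = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)). The proposal is
        # symmetric, so the proposal densities cancel in the acceptance ratio.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        # On rejection the current state is recorded again.
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis(log_target, n_samples=20000, step=1.0)
    kept = draws[1000:]  # discard an initial burn-in period
    mean = sum(kept) / len(kept)
    var = sum((d - mean) ** 2 for d in kept) / len(kept)
    print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

After discarding burn-in, the sample mean and variance should be close to 0 and 1, the moments of the assumed target. Because consecutive samples are correlated, more draws are needed for a given accuracy than with independent sampling.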

Practicals

To be confirmed shortly

Objectives

By the end of the course students should be able to:

  • understand the principles that allow us to learn from data
  • formulate a statistical model of data
  • understand how to perform statistical inference

Recommended reading

  • Simon Rogers et al. (2016). A First Course in Machine Learning, Second Edition. Chapman and Hall/CRC.
  • Shai Shalev-Shwartz et al. (2014). Understanding Machine Learning: From Theory to Algorithms. New York, NY, USA: Cambridge University Press.
  • Christopher M. Bishop (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc.

Assessment

Practical There will be four practicals contributing 20% to the final module mark. These will be ‘ticked’
rather than graded: that is, for each assignment, 100% of the mark is awarded for satisfactory completion
and 0% for inadequate work or failure to submit. Data science is a topic where there is rarely a single
correct answer; what matters is that an informed attempt has been made to address the questions, and
that the answers can be supported and motivated.
Report The course will conclude with an individual report covering the material in the course through
practical exercises working with real data. This report will make up 80% of the mark for the course.
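The weighting above can be made concrete with a small calculation. This is a hypothetical sketch: it assumes the four practicals are weighted equally (5% each) and that the report is marked out of 100. The page itself only states the 20%/80% split, and `final_mark` is an illustrative name.

```python
from typing import Sequence


def final_mark(ticks: Sequence[bool], report_percent: float) -> float:
    """Combine four ticked practicals (20% in total) with the report (80%).

    Assumptions (not stated on the course page): the practicals carry equal
    weight of 5% each, and the report is marked as a percentage (0-100).
    """
    if len(ticks) != 4:
        raise ValueError("expected exactly four practicals")
    if not 0.0 <= report_percent <= 100.0:
        raise ValueError("report mark must be a percentage between 0 and 100")
    # A ticked practical earns its full 5%; an unticked one earns nothing.
    practical_part = sum(5.0 for ticked in ticks if ticked)
    # Scale the report mark to its 80% share of the module mark.
    report_part = report_percent * 80.0 / 100.0
    return practical_part + report_part
```

For example, three ticked practicals and a report mark of 70 give 15 + 56 = 71. Because each practical is all-or-nothing, a missed submission costs a fixed 5 marks under this assumption, regardless of the report.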