Data Science

Principal lecturer: Dr Damon Wischik
Taken by: Part IB CST
Term: Michaelmas
Hours: 16 (16 lectures)
Format: In-person lectures
Suggested hours of supervisions: 4
Prerequisites: Mathematics for Natural Sciences
This course is a prerequisite for: Advanced Data Science, Computer Systems Modelling, Machine Learning and Bayesian Inference, Natural Language Processing, Quantum Computing
Exam: Paper 6, Questions 5 and 6

Aims

This course introduces fundamental tools for describing and reasoning about data. There are two themes: designing probability models to describe systems; and drawing conclusions based on data generated by such systems.

Lectures

Specifying and fitting probability models. Random variables. Maximum likelihood estimation. Generative and supervised models. Goodness of fit.
Feature spaces. Vector spaces, bases, inner products, projection. Linear models. Model fitting as projection. Design of features.
Handling probability models. Handling pdf and cdf. Bayes’s rule. Monte Carlo estimation. Empirical distribution.
Inference. Bayesianism. Frequentist confidence intervals, hypothesis testing. Bootstrap resampling.
Random processes. Markov chains. Stationarity and drift analysis. Processes with memory. Learning a random process.
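To illustrate the "Markov chains" and "Stationarity" topics above, here is a short sketch. It is not taken from the course materials. It approximates the stationary distribution pi, which satisfies pi = pi P, of a finite discrete-time Markov chain by repeatedly applying the transition matrix P to a starting distribution (power iteration). The 3-state matrix and the function names `step` and `stationary` are hypothetical, chosen only for this example.

```python
# Illustrative sketch (hypothetical example, not course code): approximate the
# stationary distribution of a finite discrete-time Markov chain by power iteration.

def step(pi, P):
    """One step of the chain's distribution: the row vector pi times the matrix P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P from the uniform distribution until successive iterates agree."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Made-up transition matrix: each row is the distribution of the next state.
P = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]
pi = stationary(P)
print([round(x, 4) for x in pi])  # prints [0.6, 0.3, 0.1]
```

Power iteration converges here because this chain is irreducible and aperiodic. The exact answer follows from solving pi = pi P with the entries of pi summing to 1. For a chain without those properties, iteration from a single start may fail to settle, so solving the linear system directly is the safer route.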

Objectives

At the end of the course students should

be able to formulate basic probabilistic models, including discrete time Markov chains and linear models
be familiar with common random variables and their uses, and with the use of empirical distributions rather than formulae
understand different types of inference about noisy data, including model fitting, hypothesis testing, and making predictions
understand the fundamental properties of inner product spaces and orthonormal systems, and their application to modelling
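The link between orthonormal systems and modelling (see also "Model fitting as projection" in the lectures list) can be sketched as follows. This is an illustration, not material from the course. It fits a linear model y ≈ a + b·x by building an orthonormal basis for the span of the feature vectors (the constant vector and x) with Gram-Schmidt, then projecting y onto that span. The data values and helper names (`dot`, `gram_schmidt`, `project`) are hypothetical.

```python
# Illustrative sketch (hypothetical data and helpers): least-squares fitting of
# y ≈ a + b*x viewed as orthogonal projection onto the span of the features.
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]  # remove the component along e
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def project(y, basis):
    """Orthogonal projection of y onto span(basis): the sum of <y, e> e."""
    proj = [0.0] * len(y)
    for e in basis:
        c = dot(y, e)
        proj = [p + c * ei for p, ei in zip(proj, e)]
    return proj

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 7.0]
features = [[1.0] * len(x), x]  # feature vectors: intercept and x
fitted = project(y, gram_schmidt(features))
residual = [yi - fi for yi, fi in zip(y, fitted)]

print([round(f, 4) for f in fitted])  # prints [0.9, 2.8, 4.7, 6.6], i.e. a = 0.9, b = 1.9
# The residual is orthogonal to every feature (the least-squares normal equations).
print(max(abs(dot(residual, f)) for f in features))  # zero up to rounding error
```

The fitted values match ordinary least squares because the projection minimises the squared distance from y to the feature span. Gram-Schmidt is used here only because it makes the orthonormal system explicit. In practice a numerical least-squares routine would normally be used instead.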
