Department of Computer Science and Technology

Course pages 2022–23

Data Science

Principal lecturer: Dr Damon Wischik
Taken by: Part IB CST
Term: Michaelmas
Hours: 16 (16 lectures)
Format: In-person lectures
Suggested hours of supervisions: 4
Prerequisites: Mathematics for Natural Sciences
This course is a prerequisite for: Advanced Data Science, Computer Systems Modelling, Machine Learning and Bayesian Inference, Natural Language Processing, Quantum Computing
Exam: Paper 6 Question 5, 6

Aims

This course introduces fundamental tools for describing and reasoning about data. There are two themes: describing the behaviour of random systems; and making inferences based on data generated by such systems. The course will survey a wide range of models and tools, and it will emphasize how to design a model and what sorts of questions one might ask about it.

Lectures

  • Likelihood. Random variables. Random samples. Maximum likelihood estimation, likelihood profile.
  • Random variables. Rules for expectation and variance. Generating random variables. Empirical distribution. Monte Carlo estimation; law of large numbers. Central limit theorem.
  • Inference. Estimation, confidence intervals, hypothesis testing, prediction. Bootstrap. Bayesianism. Logistic regression, natural parameters.
  • Feature spaces. Vector spaces, bases, inner products, projection. Model fitting as projection. Linear modelling. Choice of features.
  • Random processes. Markov chains. Stationarity and convergence. Drift models. Examples, including estimation and memory.
  • Probabilistic modelling. Independence; joint distributions. Descriptive, discriminative, and causal models. Latent variable models. Random fields.
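
As an informal illustration of the first two lecture topics above (not part of the course materials): maximum likelihood estimation can be carried out by numerically maximising a log-likelihood, and Monte Carlo estimation replaces a formula with an average over a random sample, justified by the law of large numbers. The sketch below assumes Python with numpy and scipy; these library choices are illustrative, not a statement about how the course is taught.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)

    # A random sample from an exponential distribution with rate 0.5.
    data = rng.exponential(scale=2.0, size=1000)

    # Maximum likelihood estimation: minimise the negative log-likelihood
    # of the rate parameter numerically (here the MLE also has a closed
    # form, 1 / sample mean, which makes a useful check).
    def neg_log_lik(rate):
        return -(len(data) * np.log(rate) - rate * data.sum())

    mle = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded").x
    print("MLE of rate:", mle, " closed form:", 1 / data.mean())

    # Monte Carlo estimation: estimate P(X > 5) from the empirical
    # distribution rather than from a formula.
    print("P(X > 5) ~", np.mean(data > 5))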

Objectives

At the end of the course students should

  • be able to formulate basic probabilistic models, including discrete time Markov chains and linear models
  • be familiar with common random variables and their uses, and with the use of empirical distributions rather than formulae
  • be able to use expectation and conditional expectation, limit theorems, equilibrium distributions
  • understand different types of inference about noisy data, including model fitting, hypothesis testing, and making predictions
  • understand the fundamental properties of inner product spaces and orthonormal systems, and their application to model representation
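
To illustrate the last objective above (again an informal sketch, not course material, assuming numpy): least-squares fitting of a linear model can be viewed as orthogonal projection of the data vector onto the subspace spanned by the chosen feature vectors, so the residual is orthogonal to every feature.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic data: a quadratic signal plus noise.
    x = np.linspace(0, 1, 50)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, size=50)

    # Choice of features: each column of X is one feature vector, so the
    # fitted model lives in the subspace spanned by these columns.
    X = np.column_stack([np.ones_like(x), x, x**2])

    # Least-squares fitting = orthogonal projection of y onto that subspace.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta

    print("coefficients:", beta)
    # The residual is orthogonal to every feature vector (inner products ~ 0).
    print("residual . features:", X.T @ (y - y_hat))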

Recommended reading

  • F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä and L.E. Meester (2005). A modern introduction to probability and statistics: understanding why and how. Springer.

  • S.M. Ross (2002). Probability models for computer science. Harcourt / Academic Press.

  • M. Mitzenmacher and E. Upfal (2005). Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press.