Course pages 2019–20

Foundations of Data Science

Principal lecturer: Dr Damon Wischik
Taken by: Part IB CST 50%, Part IB CST 75%
Past exam questions

No. of lectures and practical classes: 12 + 4
Suggested hours of supervisions: 3
Prerequisite courses: either Mathematics for Natural Sciences, or the equivalent from the Maths Tripos
This course is a prerequisite for: Part II Machine Learning and Bayesian Inference, Information Retrieval, Quantum Computing, and Natural Language Processing.

Aims

This course introduces fundamental tools for describing and reasoning about data. There are two themes: describing the behaviour of random systems, and making inferences based on data generated by such systems. The course surveys a wide range of models and tools, emphasizing how to design a model and what sorts of questions one might ask about it.

Lectures

Likelihood. Random variables. Random samples. Maximum likelihood estimation, likelihood profile.
Random variables. Rules for expectation and variance. Generating random variables. Empirical distribution. Monte Carlo estimation; law of large numbers. Central limit theorem.
Inference. Estimation, confidence intervals, hypothesis testing, prediction. Bootstrap. Bayesianism. Logistic regression, natural parameters.
Feature spaces. Vector spaces, bases, inner products, projection. Model fitting as projection. Linear modelling. Choice of features.
Random processes. Markov chains. Stationarity and convergence. Drift models. Examples, including estimation and memory.
Probabilistic modelling. Independence; joint distributions. Descriptive, discriminative, and causal models. Latent variable models. Random fields.

Objectives

At the end of the course students should

be able to formulate basic probabilistic models, including discrete-time Markov chains and linear models
be familiar with common random variables and their uses, and with the use of empirical distributions rather than formulae
be able to use expectation and conditional expectation, limit theorems, and equilibrium distributions
understand different types of inference about noisy data, including model fitting, hypothesis testing, and making predictions
understand the fundamental properties of inner product spaces and orthonormal systems, and their application to model representation

Department of Computer Science and Technology
