# Department of Computer Science and Technology

Course pages 2019–20

# Foundations of Data Science

Principal lecturer: Dr Damon Wischik
Taken by: Part IB CST 50%, Part IB CST 75%

No. of lectures and practical classes: 12 + 4
Suggested hours of supervisions: 3
Prerequisite courses: either Mathematics for Natural Sciences, or the equivalent from the Maths Tripos
This course is a prerequisite for: Part II Machine Learning and Bayesian Inference, Information Retrieval, Quantum Computing, Natural Language Processing.

## Aims

This course introduces fundamental tools for describing and reasoning about data. There are two themes: describing the behaviour of random systems, and making inferences based on data generated by such systems. The course will survey a wide range of models and tools, and it will emphasize how to design a model and what sorts of questions one might ask about it.

## Lectures

• Likelihood. Random variables. Random samples. Maximum likelihood estimation, likelihood profile.

• Random variables. Rules for expectation and variance. Generating random variables. Empirical distribution. Monte Carlo estimation; law of large numbers. Central limit theorem.

• Inference. Estimation, confidence intervals, hypothesis testing, prediction. Bootstrap. Bayesianism. Logistic regression, natural parameters.

• Feature spaces. Vector spaces, bases, inner products, projection. Model fitting as projection. Linear modelling. Choice of features.

• Random processes. Markov chains. Stationarity and convergence. Drift models. Examples, including estimation and memory.

• Probabilistic modelling. Independence; joint distributions. Descriptive, discriminative, and causal models. Latent variable models. Random fields.
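
As an illustration of two of the topics listed above, Monte Carlo estimation and the bootstrap, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names `monte_carlo_mean` and `bootstrap_ci`, the choice of an exponential distribution, the sample sizes, and the percentile form of the bootstrap interval are all assumptions made for the example.

```python
import random
import statistics

def monte_carlo_mean(sampler, n, rng):
    """Estimate E[X] by averaging n independent draws from sampler(rng)."""
    return sum(sampler(rng) for _ in range(n)) / n

def bootstrap_ci(data, stat, n_resamples, rng, level=0.95):
    """Percentile bootstrap interval for stat(data).

    Each resample draws len(data) points from data with replacement,
    i.e. it samples from the empirical distribution.
    """
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lo_index = int((1 - level) / 2 * n_resamples)
    hi_index = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_index], estimates[hi_index]

# Hypothetical setup: X ~ Exponential(rate=2), so the true mean is 0.5.
rng = random.Random(0)  # fixed seed so the run is repeatable

# Law of large numbers: the average of many draws approaches E[X].
estimate = monte_carlo_mean(lambda r: r.expovariate(2.0), 100_000, rng)

# Bootstrap: quantify uncertainty in the sample mean of a small dataset.
data = [rng.expovariate(2.0) for _ in range(500)]
lo, hi = bootstrap_ci(data, statistics.fmean, 1000, rng)

print(f"Monte Carlo estimate of E[X]: {estimate:.4f}")
print(f"95% bootstrap interval for the mean: ({lo:.4f}, {hi:.4f})")
```

The Monte Carlo estimate relies on the law of large numbers, while the bootstrap interval treats the empirical distribution of the 500 observations as a stand-in for the unknown true distribution.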

## Objectives

At the end of the course, students should:

• be able to formulate basic probabilistic models, including discrete-time Markov chains and linear models

• be familiar with common random variables and their uses, and with the use of empirical distributions rather than formulae

• be able to use expectation and conditional expectation, limit theorems, and equilibrium distributions

• understand different types of inference about noisy data, including model fitting, hypothesis testing, and making predictions

• understand the fundamental properties of inner product spaces and orthonormal systems, and their application to model representation
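
To make the last objective concrete, the following sketch shows one way to view fitting a linear model as projection onto the span of the feature vectors, using an orthonormal basis built by Gram-Schmidt. It is an illustrative assumption rather than the course's own code: the helper names `dot`, `gram_schmidt`, and `project` and the small dataset are invented for the example, and the features are assumed to be linearly independent.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal basis of their span."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            # Remove the component of w along the basis vector e.
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def project(y, basis):
    """Project y onto the span of an orthonormal basis."""
    fitted = [0.0] * len(y)
    for e in basis:
        c = dot(y, e)
        fitted = [f + c * ei for f, ei in zip(fitted, e)]
    return fitted

# Hypothetical data: fit y ~ a + b*x using the features "constant" and "x".
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.0]
features = [[1.0] * len(x), x]

fitted = project(y, gram_schmidt(features))
residual = [yi - fi for yi, fi in zip(y, fitted)]

# The residual of a least-squares fit is orthogonal to every feature vector.
print([round(dot(residual, f), 9) for f in features])
```

The fitted values are the point in the feature space closest to `y`, which is why the residual has zero inner product with each feature.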