Department of Computer Science and Technology

Course pages 2022–23 (working draft)

Data Science

Principal lecturer: Dr Damon Wischik
Taken by: Part IB CST
Hours: 16 (16 lectures)
Format: Video lectures and in-person Q&A sessions
Suggested hours of supervisions: 4
Prerequisites: Mathematics for Natural Sciences
This course is a prerequisite for: Advanced Data Science, Computer Systems Modelling, Machine Learning and Bayesian Inference, Natural Language Processing, Quantum Computing

Aims

This course introduces fundamental tools for describing and reasoning about data. There are two themes: describing the behaviour of random systems, and making inferences based on data generated by such systems. The course surveys a wide range of models and tools, and emphasizes how to design a model and what sorts of questions one might ask about it.

Lectures

  • Likelihood. Random variables. Random samples. Maximum likelihood estimation, likelihood profile.
  • Random variables. Rules for expectation and variance. Generating random variables. Empirical distribution. Monte Carlo estimation; law of large numbers. Central limit theorem.
  • Inference. Estimation, confidence intervals, hypothesis testing, prediction. Bootstrap. Bayesianism. Logistic regression, natural parameters.
  • Feature spaces. Vector spaces, bases, inner products, projection. Model fitting as projection. Linear modelling. Choice of features.
  • Random processes. Markov chains. Stationarity and convergence. Drift models. Examples, including estimation and memory.
  • Probabilistic modelling. Independence; joint distributions. Descriptive, discriminative, and causal models. Latent variable models. Random fields.

Objectives

At the end of the course students should

  • be able to formulate basic probabilistic models, including discrete-time Markov chains and linear models
  • be familiar with common random variables and their uses, and with the use of empirical distributions rather than formulae
  • be able to use expectation and conditional expectation, limit theorems, and equilibrium distributions
  • understand different types of inference about noisy data, including model fitting, hypothesis testing, and making predictions
  • understand the fundamental properties of inner product spaces and orthonormal systems, and their application to model representation

Recommended reading

* F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester (2005). A modern introduction to probability and statistics: understanding why and how. Springer.

S.M. Ross (2002). Probability models for computer science. Harcourt / Academic Press.

M. Mitzenmacher and E. Upfal (2005). Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press.