
Department of Computer Science and Technology

Part IB CST


Course pages 2024–25 (working draft)

Data Science

Principal lecturer: Dr Damon Wischik
Taken by: Part IB CST
Term: Michaelmas
Hours: 16 (16 lectures)
Format: In-person lectures
Suggested hours of supervisions: 4
Prerequisites: Mathematics for Natural Sciences
This course is a prerequisite for: Advanced Data Science, Computer Systems Modelling, Machine Learning and Bayesian Inference, Natural Language Processing, Quantum Computing

Aims

This course introduces fundamental tools for describing and reasoning about data. There are two themes: designing probability models to describe systems, and drawing conclusions from data generated by such systems.

Lectures

  • Specifying and fitting probability models. Random variables. Maximum likelihood estimation. Generative and supervised models. Goodness of fit.
  • Feature spaces. Vector spaces, bases, inner products, projection. Linear models. Model fitting as projection. Design of features.
  • Handling probability models. Handling pdf and cdf. Bayes’s rule. Monte Carlo estimation. Empirical distribution.
  • Inference. Bayesianism. Frequentist confidence intervals, hypothesis testing. Bootstrap resampling.
  • Random processes. Markov chains. Stationarity and drift analysis. Processes with memory. Learning a random process.
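Two of the ideas listed above, maximum likelihood estimation and bootstrap resampling from the empirical distribution, fit together in a few lines. The sketch below assumes an exponential model and synthetic data with a known rate of 2.0; the function names `mle_exponential_rate` and `bootstrap_ci` are hypothetical helpers written for illustration, not course material.

```python
import random

def mle_exponential_rate(xs):
    """Maximum likelihood estimate of the rate for i.i.d. Exponential data.

    The log likelihood n*log(rate) - rate*sum(xs) is maximised
    at rate = n / sum(xs), i.e. 1 / sample mean.
    """
    return len(xs) / sum(xs)

def bootstrap_ci(xs, estimator, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(xs)."""
    rng = random.Random(seed)
    n = len(xs)
    # Each resample draws n points with replacement from the data,
    # i.e. from the empirical distribution, and re-applies the estimator.
    estimates = sorted(
        estimator([rng.choice(xs) for _ in range(n)]) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Synthetic data from a known model (true rate 2.0), so the fit can be checked.
rng = random.Random(42)
data = [rng.expovariate(2.0) for _ in range(500)]
rate_hat = mle_exponential_rate(data)
low, high = bootstrap_ci(data, mle_exponential_rate)
print(f"MLE rate = {rate_hat:.3f}, 95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

The percentile bootstrap is only one of several ways to build a bootstrap interval; it is used here because it needs nothing beyond resampling the data and sorting the resulting estimates.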

Objectives

At the end of the course students should

  • be able to formulate basic probabilistic models, including discrete-time Markov chains and linear models
  • be familiar with common random variables and their uses, and with the use of empirical distributions rather than formulae
  • understand different types of inference about noisy data, including model fitting, hypothesis testing, and making predictions
  • understand the fundamental properties of inner product spaces and orthonormal systems, and their application to modelling
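The link between linear models and orthonormal systems can be made concrete: fitting a linear model by least squares amounts to orthogonally projecting the response vector onto the span of the feature vectors. The sketch below assumes a model y ≈ a + b·x with made-up data, builds an orthonormal basis by Gram-Schmidt, and projects onto it; the helpers `dot`, `gram_schmidt` and `project` are hypothetical names for illustration. In practice a numerical library routine such as NumPy's `numpy.linalg.lstsq` would normally be used instead.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        # Remove the components of v along the basis vectors found so far.
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip vectors already (numerically) in the span
            basis.append([wi / norm for wi in w])
    return basis

def project(y, basis):
    """Orthogonal projection of y onto the span of an orthonormal basis."""
    p = [0.0] * len(y)
    for e in basis:
        c = dot(y, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

# Linear model y ≈ a + b*x: the feature vectors are the constant vector and x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
features = [[1.0] * len(x), x]
y_hat = project(y, gram_schmidt(features))
residual = [yi - pi for yi, pi in zip(y, y_hat)]
# The residual is orthogonal to every feature vector (up to rounding error).
print([dot(residual, f) for f in features])
```

For this data the fitted values coincide with the ordinary least-squares line y = 1.04 + 1.99x, which is what the orthogonality of the residual to each feature vector guarantees.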

Recommended reading

* F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester (2005). A modern introduction to probability and statistics: understanding why and how. Springer.

S.M. Ross (2002). Probability models for computer science. Harcourt / Academic Press.

M. Mitzenmacher and E. Upfal (2005). Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press.