Department of Computer Science and Technology

Course pages 2018–19

Machine Learning and Bayesian Inference

Principal lecturer: Dr Sean Holden
Taken by: Part II CST 50%, Part II CST 75%

No. of lectures: 12
Suggested hours of supervisions: 3
Prerequisite courses: Artificial Intelligence, Foundations of Data Science, Discrete Mathematics and Probability, Linear Algebra and Calculus from the NST Mathematics course.


Aims

The Part IB course Artificial Intelligence introduced simple neural networks for supervised learning, and logic-based methods for knowledge representation and reasoning. This course has two aims. First, to provide a rigorous introduction to machine learning, moving beyond the supervised case and ultimately presenting state-of-the-art methods. Second, to provide an introduction to the wider area of probabilistic methods for representing and reasoning with knowledge.


Lectures

  • Introduction to learning and inference. Supervised, unsupervised, semi-supervised and reinforcement learning. Bayesian inference in general. What the naive Bayes method actually does. Review of backpropagation. Other kinds of learning and inference. [1 lecture]
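
    The "what naive Bayes actually does" point can be previewed in a short sketch: the method scores each class by its prior times a product of per-feature likelihoods, under the (naive) assumption that features are conditionally independent given the class. All numbers below are invented for illustration.

    ```python
    # Minimal naive Bayes sketch for binary features; priors and
    # per-feature likelihoods are made-up illustrative values.

    def naive_bayes_posterior(x, priors, likelihoods):
        """Return normalised P(class | x) for a binary feature vector x."""
        scores = {}
        for c, prior in priors.items():
            score = prior
            for i, xi in enumerate(x):
                p = likelihoods[c][i]              # P(feature_i = 1 | class c)
                score *= p if xi == 1 else (1 - p)
            scores[c] = score
        total = sum(scores.values())               # normalise over classes
        return {c: s / total for c, s in scores.items()}

    priors = {"spam": 0.4, "ham": 0.6}             # illustrative class priors
    likelihoods = {"spam": [0.8, 0.7], "ham": [0.1, 0.3]}

    posterior = naive_bayes_posterior([1, 1], priors, likelihoods)
    ```

    Note that the normalising constant never needs the full joint distribution: only the per-class scores are compared.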

  • How to classify optimally. Treating learning probabilistically. Bayesian decision theory and Bayes optimal classification. Likelihood functions and priors. Bayes' theorem as applied to supervised learning. The maximum likelihood and maximum a posteriori hypotheses. What does this teach us about the backpropagation algorithm? [2 lectures]
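
    As a preview of the maximum likelihood versus maximum a posteriori distinction, the following sketch estimates a coin's bias both ways; the Beta(2, 2) prior and the head/tail counts are illustrative only.

    ```python
    # ML vs MAP estimation of a coin's bias theta, with a Beta(a, b) prior.

    def ml_estimate(heads, tails):
        # Maximises the likelihood P(data | theta) alone.
        return heads / (heads + tails)

    def map_estimate(heads, tails, a, b):
        # Maximises P(theta | data), proportional to likelihood times prior;
        # the Beta prior acts like (a - 1) extra heads and (b - 1) extra tails.
        return (heads + a - 1) / (heads + tails + a + b - 2)

    ml = ml_estimate(8, 2)           # 8/10 = 0.8
    mp = map_estimate(8, 2, 2, 2)    # 9/12 = 0.75, pulled toward the prior mean 0.5
    ```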

  • Linear classifiers I. Supervised learning via error minimization. Iteratively reweighted least squares. The maximum margin classifier. [1 lecture]
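
    Iteratively reweighted least squares is Newton's method applied to the logistic-regression log-likelihood; the curvature term reweights each point by p(1 − p). A minimal one-parameter sketch on invented, deliberately non-separable data:

    ```python
    import math

    # IRLS (Newton) sketch for one-parameter logistic regression
    # p(y = 1 | x) = sigmoid(w * x). Data are invented for illustration.

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def irls_1d(xs, ys, iterations=10):
        w = 0.0
        for _ in range(iterations):
            grad = sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys))
            # The "reweighting": each point contributes curvature p(1 - p) x^2.
            hess = sum(sigmoid(w * x) * (1 - sigmoid(w * x)) * x * x for x in xs)
            w += grad / hess
        return w

    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ys = [0, 0, 1, 0, 1, 1]          # one noisy label on each side
    w = irls_1d(xs, ys)              # converges to a finite positive weight
    ```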

  • Support vector machines (SVMs). The kernel trick. Problem formulation. Constrained optimization and the dual problem. SVM algorithm. [2 lectures]
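
    The kernel trick replaces inner products in a high-dimensional feature space with a cheap kernel evaluation in the original space. A sketch for the quadratic kernel k(x, z) = (x · z)², whose feature map can be written down explicitly in two dimensions:

    ```python
    import math

    # Verify that the quadratic kernel equals an inner product in an
    # explicit 3-D feature space, without an SVM in sight.

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def poly_kernel(x, z):
        return dot(x, z) ** 2

    def phi(x):
        # Explicit feature map for the 2-D quadratic kernel.
        x1, x2 = x
        return [x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2]

    x, z = [1.0, 2.0], [3.0, 0.5]
    k_implicit = poly_kernel(x, z)     # computed in the original 2-D space
    k_explicit = dot(phi(x), phi(z))   # same value, via the feature space
    ```

    An SVM's dual formulation touches the data only through such inner products, which is why the substitution works.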

  • Practical issues. Hyperparameters. Measuring performance. Cross-validation. Experimental methods. [1 lecture]
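
    Cross-validation in sketch form: partition the data into k folds, hold each fold out in turn, train on the rest, and average the k test scores. The `majority_baseline` scorer below is a placeholder for a real learner, and the labels are invented.

    ```python
    # k-fold cross-validation skeleton; train_and_score stands in for any
    # learner/metric pair.

    def k_fold_indices(n, k):
        return [list(range(i, n, k)) for i in range(k)]

    def cross_validate(data, k, train_and_score):
        folds = k_fold_indices(len(data), k)
        scores = []
        for held_out in folds:
            train = [data[j] for f in folds if f is not held_out for j in f]
            test = [data[j] for j in held_out]
            scores.append(train_and_score(train, test))
        return sum(scores) / k

    # Placeholder "learner": predict the majority training label.
    def majority_baseline(train, test):
        majority = max(set(train), key=train.count)
        return sum(1 for y in test if y == majority) / len(test)

    labels = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
    score = cross_validate(labels, 5, majority_baseline)
    ```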

  • Linear classifiers II. The Bayesian approach to neural networks. [1 lecture]

  • Unsupervised learning I. The k-means algorithm. Clustering as a maximum likelihood problem. [1 lecture]
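
    The k-means algorithm alternates two steps: assign each point to its nearest centre, then move each centre to the mean of its assigned points. A one-dimensional sketch with invented data and starting centres:

    ```python
    # Minimal 1-D k-means; data and initial centres are illustrative.

    def k_means(points, centres, iterations=10):
        for _ in range(iterations):
            # Assignment step: each point goes to its nearest centre.
            clusters = [[] for _ in centres]
            for p in points:
                nearest = min(range(len(centres)),
                              key=lambda i: (p - centres[i]) ** 2)
                clusters[nearest].append(p)
            # Update step: move each centre to the mean of its cluster.
            centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        return centres

    points = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
    centres = k_means(points, centres=[0.0, 5.0])   # two well-separated blobs
    ```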

  • Unsupervised learning II. The EM algorithm and its application to clustering. [1 lecture]
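
    A deliberately simplified EM sketch for a mixture of two 1-D Gaussians with unit variance and equal mixing proportions, updating only the means (the full algorithm also re-estimates variances and mixing weights): the E-step computes each point's responsibility, the M-step re-estimates the means as responsibility-weighted averages. Data and initial means are illustrative.

    ```python
    import math

    def em_two_means(points, m1, m2, iterations=20):
        for _ in range(iterations):
            # E-step: responsibility of component 1 for each point.
            r1 = []
            for x in points:
                p1 = math.exp(-0.5 * (x - m1) ** 2)
                p2 = math.exp(-0.5 * (x - m2) ** 2)
                r1.append(p1 / (p1 + p2))
            # M-step: responsibility-weighted means.
            m1 = sum(r * x for r, x in zip(r1, points)) / sum(r1)
            m2 = (sum((1 - r) * x for r, x in zip(r1, points))
                  / sum(1 - r for r in r1))
        return m1, m2

    points = [0.0, 0.4, -0.4, 4.0, 4.4, 3.6]       # two clusters near 0 and 4
    m1, m2 = em_two_means(points, m1=1.0, m2=3.0)
    ```

    Unlike k-means, the assignments here are soft: every point contributes a little to both means.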

  • Bayesian networks I. Representing uncertain knowledge using Bayesian networks. Conditional independence. Exact inference in Bayesian networks. [1 lecture]
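
    Exact inference can be previewed with enumeration in a small rain/sprinkler/wet-grass network; the structure is the familiar textbook example, but the CPT numbers below are chosen purely for illustration.

    ```python
    # Exact inference by enumeration: sum the joint over hidden variables.

    P_R = 0.2                                        # P(Rain)
    P_S = {True: 0.01, False: 0.40}                  # P(Sprinkler | Rain)
    P_W = {(True, True): 0.99, (True, False): 0.90,  # P(Wet | Sprinkler, Rain)
           (False, True): 0.90, (False, False): 0.0}

    def joint(r, s, w):
        pr = P_R if r else 1 - P_R
        ps = P_S[r] if s else 1 - P_S[r]
        pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
        return pr * ps * pw

    # P(Rain | Wet): marginalise the hidden Sprinkler variable, then normalise.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
    p_rain_given_wet = num / den
    ```

    Observing wet grass raises the probability of rain above its prior, as expected; enumeration is exponential in the number of hidden variables, which motivates the cleverer exact algorithms covered in the lecture.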

  • Bayesian networks II. Markov random fields. Approximate inference. Markov chain Monte Carlo methods. [1 lecture]
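
    The flavour of Markov chain Monte Carlo can be seen in the Metropolis algorithm, one of the simplest such methods: a random walk whose accept/reject rule makes its samples follow a target density known only up to a constant. The target, seed, and step parameters below are illustrative.

    ```python
    import math
    import random

    def metropolis(target, x0, steps, proposal_sd=1.0):
        samples, x = [], x0
        for _ in range(steps):
            candidate = x + random.gauss(0, proposal_sd)
            # Accept with probability min(1, target ratio); note that any
            # normalising constant cancels in the ratio.
            if random.random() < min(1.0, target(candidate) / target(x)):
                x = candidate
            samples.append(x)
        return samples

    def target(x):
        return math.exp(-0.5 * (x - 3.0) ** 2)   # unnormalised N(3, 1)

    random.seed(0)
    samples = metropolis(target, x0=0.0, steps=20000)
    mean = sum(samples) / len(samples)           # should approach 3
    ```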


Objectives

At the end of this course students should:

  • Understand how learning and inference can be captured within a probabilistic framework, and know how probability theory can be applied in practice as a means of handling uncertainty in AI systems.

  • Understand several algorithms for machine learning and apply those methods in practice with proper regard for good experimental practice.

Recommended reading

If you are going to buy a single book for this course, we recommend:

Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer.

The course text for Artificial Intelligence I:

Russell, S. & Norvig, P. (2010). Artificial Intelligence: A Modern Approach. Prentice Hall (3rd ed.).

covers some relevant material but often in insufficient detail. Similarly:

Mitchell, T.M. (1997). Machine Learning. McGraw-Hill.

gives a gentle introduction to some of the course material, but only an introduction. Recently, a few newer books have appeared that cover much of the relevant ground well. For example:

Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge University Press.
Flach, P. (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press.
Murphy, K.P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.