Course pages 2017–18
Machine Learning and Real-world Data
Principal lecturers: Prof Ann Copestake, Dr Helen Yannakoudakis, Dr Paula Buttery
Taken by: Part IA CST 75%, Part IB CST 50%
No. of lectures and practical classes: 16
Suggested hours of supervisions: 4
Prerequisite courses: NST Mathematics
Aims
This course introduces students to machine learning algorithms as used in real-world applications, and to the experimental methodology needed for statistical analysis of large-scale data from unpredictable processes. Students will complete three extended practicals:
- Statistical classification: determining movie review sentiment using Naive Bayes (7 sessions);
- Sequence analysis: Hidden Markov Modelling and its application to a biological task, predicting protein interactions with a cell membrane (4 sessions);
- Analysis of social networks, including detection of cliques and central nodes (5 sessions).
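To give a flavour of the first practical, the following is a minimal sketch of a Naive Bayes sentiment classifier with add-one smoothing. It is not the course's reference implementation: the function names, the POS/NEG labels and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Estimate log priors and add-one smoothed log likelihoods.

    docs is a list of (tokens, label) pairs (a hypothetical input format).
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)

    # Vocabulary is every word seen in any training document.
    vocab = {w for counts in word_counts.values() for w in counts}

    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_likelihood = {}
    for c, counts in word_counts.items():
        # Add-one smoothing: pretend every vocabulary word was seen once more.
        total = sum(counts.values()) + len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return the label with the highest posterior log score."""
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy data, invented for this sketch.
toy_reviews = [
    ("a great and moving film".split(), "POS"),
    ("great acting great plot".split(), "POS"),
    ("a dull and boring film".split(), "NEG"),
    ("boring plot dull acting".split(), "NEG"),
]
log_prior, log_likelihood = train_naive_bayes(toy_reviews)
prediction = classify("great film".split(), log_prior, log_likelihood)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters once reviews are longer than a few words.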
Syllabus
- Topic One: Statistical Classification [7 sessions].
Introduction to sentiment classification.
Naive Bayes parameter estimation.
Statistical laws of language.
Statistical tests for classification tasks.
Cross-validation and test sets.
Uncertainty and human agreement.
- Topic Two: Sequence Analysis [4 sessions].
Hidden Markov Models (HMM) and HMM training.
The Viterbi algorithm.
Using an HMM in a biological application.
- Topic Three: Social Networks [5 sessions].
Properties of networks: degree, diameter.
Betweenness Centrality.
Clustering using betweenness centrality.
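As an illustration of the network properties listed under Topic Three, the sketch below computes node degree and network diameter for a small undirected graph using breadth-first search. The adjacency-set representation, the function names and the toy network are assumptions for illustration, and the diameter function assumes the graph is connected.

```python
from collections import deque


def degrees(graph):
    """Degree of each node: the number of neighbours it has."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(graph):
    """Longest shortest path between any two nodes (assumes a connected graph)."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)


# Toy undirected network, invented for this sketch: a triangle a-b-c
# with a tail c-d-e attached.
toy_network = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
node_degrees = degrees(toy_network)
network_diameter = diameter(toy_network)
```

Running one breadth-first search per node is adequate for small unweighted graphs; very large networks would typically need sampling or an approximation.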
Objectives
By the end of the course students should be able to:
- understand and program two simple supervised machine learning algorithms;
- use these algorithms in statistically valid experiments, including the design of baselines, evaluation metrics, statistical testing of results, and provision against overtraining;
- visualise the connectivity and centrality in large networks;
- use clustering (i.e., a type of unsupervised machine learning) for detection of cliques in unstructured networks.
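The syllabus also names the Viterbi algorithm for Hidden Markov Models. The following is a minimal sketch of Viterbi decoding in log space, not the course's own code: the two hidden states "A" and "B", the observation symbols "x" and "y", and all the probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable hidden state sequence for the observations."""
    # trellis[t][s]: best log probability of any path ending in state s at time t.
    trellis = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    backpointers = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            best_prev = max(states, key=lambda p: trellis[-1][p] + log_trans[p][s])
            scores[s] = trellis[-1][best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            pointers[s] = best_prev
        trellis.append(scores)
        backpointers.append(pointers)

    # Follow the backpointers from the best final state.
    path = [max(states, key=lambda s: trellis[-1][s])]
    for pointers in reversed(backpointers[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical two-state model: states tend to persist, and each state
# strongly prefers one observation symbol.
states = ["A", "B"]
log_start = {"A": math.log(0.5), "B": math.log(0.5)}
log_trans = to_log({"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}})
log_emit = to_log({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}})

best_path = viterbi(["x", "x", "y", "y", "y"], states, log_start, log_trans, log_emit)
```

In a supervised setting, the transition and emission tables would be estimated by counting over labelled sequences rather than written by hand as they are here.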
Recommended reading
Jurafsky, D. and Martin, J. (2008). Speech and language processing. Prentice Hall.
Easley, D. and Kleinberg, J. (2010). Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press.