Course pages 2016–17

Machine Learning and Real-world Data

Principal lecturers: Dr Simone Teufel, Prof Ann Copestake
Taken by: Part IA CST 75%

No. of lectures and practical classes: 16
Suggested hours of supervisions: 4
Prerequisite courses: NST Mathematics

Aims

This course introduces students to machine learning algorithms as used in real-world applications, and to the experimental methodology needed for the statistical processing of large-scale, unpredictable phenomena such as language, social networks or genetic data. Students will complete three extended practicals:

Statistical classification: determining the sentiment of a movie review using Naive Bayes (7 sessions)
Sequence analysis: detecting proteins in genetic data using Hidden Markov Models (4 sessions)
Network analysis: analysing a social network, including detection of cliques and central nodes (5 sessions)

Syllabus

Topic One: Statistical Classification [7 sessions].
Introduction to Sentiment Classification.
Naive Bayes Parameter Estimation.
Statistical Laws of Language.
Smoothing and Statistical Tests.
Overtraining.
Uncertainty and Human Agreement.
Topic Two: Sequence Analysis [4 sessions].
Simple HMM Parameter Estimation.
The Viterbi Algorithm.
Random Baselines and Evaluation Metrics.
Application to Protein Detection Data.
Topic Three: Network Analysis [5 sessions].
Degree, Diameter, Visualisation.
Random Networks and Small World Property.
Betweenness Centrality.
Clique Finding.
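
As an illustration of the first syllabus topic, Naive Bayes parameter estimation with smoothing could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the course's own practical code: the function names (train_naive_bayes, classify), the add-one (Laplace) smoothing choice, and the four-review toy data set are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Estimate log priors and add-one-smoothed log likelihoods.

    docs is a list of (tokens, label) pairs. All names here are
    illustrative assumptions, not taken from the course materials.
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Class prior: fraction of training documents carrying each label.
    log_priors = {c: math.log(n / len(docs)) for c, n in label_counts.items()}

    # Add-one smoothing: every vocabulary word gets one extra count per
    # class, so words unseen in a class never receive zero probability.
    log_likelihoods = {}
    for c in label_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def classify(tokens, log_priors, log_likelihoods, vocab):
    """Return the label with the highest log posterior score.

    Words outside the training vocabulary are ignored.
    """
    scores = {
        c: log_priors[c]
        + sum(log_likelihoods[c][w] for w in tokens if w in vocab)
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Hypothetical toy training set of tokenised "reviews".
TRAIN = [
    ("great fun film".split(), "POS"),
    ("great acting".split(), "POS"),
    ("boring dull film".split(), "NEG"),
    ("dull plot".split(), "NEG"),
]

if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    print(classify("great film".split(), *model))
    print(classify("dull boring".split(), *model))
```

Scores are summed in log space rather than multiplied as raw probabilities, which avoids numerical underflow on longer documents.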

Objectives

By the end of the course students should be able to:

understand and program two simple supervised machine learning algorithms;
use these algorithms in statistically valid experiments, including the design of baselines, evaluation metrics, statistical testing of results, and provision against overtraining;
visualise and interpret examples of statistical laws of language;
visualise the connectivity and centrality in large networks;
use clustering (i.e., a type of unsupervised machine learning) for detection of cliques in unstructured networks.

Computer Laboratory
