Machine Learning and Real-world Data

Principal lecturer: Dr Fermin Moscoso del Prado Martin
Additional lecturer: Prof Paula Buttery
Taken by: Part IA CST
Term: Lent
Hours: 16
Format: In-person lectures
Suggested hours of supervisions: 4
This course is a prerequisite for: Advanced Data Science, Natural Language Processing
Exam: Paper 3 Questions 7, 8, 9

Aims

This course introduces students to machine learning algorithms as used in real-world applications, and to the experimental methodology needed to perform statistical analysis of large-scale data from unpredictable processes. Students will complete three extended practicals:

Statistical classification: determining movie review sentiment using Naive Bayes;
Sequence analysis: Hidden Markov Modelling and its application to a task from biology (predicting protein interactions with a cell membrane);
Social network analysis: detection of cliques and central nodes.
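
To give a flavour of the first practical, the following is a minimal sketch of Naive Bayes sentiment classification with add-one smoothing. It is not taken from the course materials: the function names, the POS/NEG labels and the toy reviews are illustrative assumptions, and the real practical uses a proper review dataset.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """docs is a list of (tokens, label) pairs; returns log priors and log likelihoods."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}

    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_likelihood = {}
    for c, counts in word_counts.items():
        # Add-one smoothing: every vocabulary word gets one extra count per class.
        total = sum(counts.values()) + len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped (an assumption of this sketch).
        known = (log_likelihood[c][w] for w in tokens if w in log_likelihood[c])
        scores[c] = log_prior[c] + sum(known)
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
toy_reviews = [
    ("a great and moving film".split(), "POS"),
    ("great cast great script".split(), "POS"),
    ("a dull and boring film".split(), "NEG"),
    ("boring plot dull script".split(), "NEG"),
]
log_prior, log_likelihood = train_naive_bayes(toy_reviews)
prediction = classify("a great script".split(), log_prior, log_likelihood)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing prevents a single unseen word from forcing a class probability to zero.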

Syllabus

Topic One: Statistical Classification
Introduction to sentiment classification.
Naive Bayes parameter estimation.
Statistical laws of language.
Statistical tests for classification tasks.
Cross-validation and test sets.
Uncertainty and human agreement.

Topic Two: Sequence Analysis
Hidden Markov Models (HMM) and HMM training.
The Viterbi algorithm.
Using an HMM in a biological application.

Topic Three: Social Networks
Properties of networks: Degree, Diameter.
Betweenness Centrality.
Clustering using betweenness centrality.
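
As an illustration of the Viterbi algorithm listed under Topic Two, the sketch below decodes the most probable hidden state sequence of a small HMM in log space. The fair/loaded die model, its probabilities and all names are hypothetical choices made for this example, not the course's data, and the code assumes every probability it looks up is non-zero.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for the observations."""
    # best[t][s] is the log probability of the best path ending in state s at time t.
    first = observations[0]
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Choose the predecessor that maximises the path score into s.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical model: a fair die "F" and a loaded die "L" that favours six.
states = ["F", "L"]
start_p = {"F": 0.5, "L": 0.5}
trans_p = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit_p = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}
decoded = viterbi([1, 3, 2, 6, 6, 6, 6, 6], states, start_p, trans_p, emit_p)
```

The dynamic programme keeps only the best-scoring path into each state at each time step, so its cost grows linearly with the sequence length rather than exponentially.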

Objectives

By the end of the course students should be able to:

understand and program two simple supervised machine learning algorithms;
use these algorithms in statistically valid experiments, including the design of baselines, evaluation metrics, statistical testing of results, and provision against overtraining;
visualise the connectivity and centrality in large networks;
use clustering (i.e., a type of unsupervised machine learning) for detection of cliques in unstructured networks.
