Course pages 2016–17

Machine Learning and Real-world Data

Errata

Session 3 (Statistical Laws of Language), slide 11, title Vocabulary size:

In the original slides, the relationship between u_n (the number of unique items in the vocabulary) and n (the text size) was wrongly described as `exponential'. That word has been removed, since the formula (with 0 < b < 1) fully specifies the type of relationship.
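
To see why `exponential' was the wrong word, here is a minimal sketch in Python. It assumes the slide's formula has the usual Heaps' law form u_n = k * n^b, which this page does not restate, so treat that form as an assumption. The function names and the values of k and b are hypothetical, chosen only for illustration, and are not taken from the course materials.

```python
import math


def heaps_vocabulary_size(n, k, b):
    """Predicted number of unique items u_n = k * n**b (assumed Heaps' law form)."""
    if not 0 < b < 1:
        raise ValueError("b must satisfy 0 < b < 1")
    return k * n ** b


def fit_heaps(ns, us):
    """Estimate (k, b) by least squares on log(u) = log(k) + b * log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(u) for u in us]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    k = math.exp(mean_y - b * mean_x)
    return k, b


# Hypothetical parameters, for illustration only.
K_EXAMPLE, B_EXAMPLE = 10.0, 0.5

sizes = [100, 10_000, 1_000_000]
predicted = [heaps_vocabulary_size(n, K_EXAMPLE, B_EXAMPLE) for n in sizes]

# Fitting the synthetic points should recover the parameters used to make them.
k_hat, b_hat = fit_heaps(sizes, predicted)
```

Under this assumed form the relationship is a power law: it is a straight line with slope b on a log-log plot, which is why a linear fit on the logarithms recovers the parameters. An exponential relationship would instead be a straight line on a plot of log(u_n) against n.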

Other changes (May 27)

Course Outline

The main teaching for the course is being done via Moodle. However, various material is also available via this site. PDF versions of all notes are available here, for ease of printing. The notes and slides will be added incrementally. Some material is in draft form, as noted.

The course has 16 sessions. Each session will have a short introductory lecture, followed by a two-hour practical session. For most of these practical sessions, there is a `task', but there are 4 sessions which are purely for catch-up.

Students are expected to attend the practical sessions in the Intel lab, but may skip the catch-up sessions if they have completed the work and obtained all their ticks up to that point.

The schedule may change in response to events, but any such changes will be announced on the Moodle forum.

Topic One: Statistical classification (7 sessions)

Session 1: Introduction to sentiment classification
Tick 1 corresponds to this session
Session 2: Naive Bayes Classifier
Tick 2 corresponds to this session
Session 3: Statistical laws of language
Tick 3 corresponds to Session 3 plus Session 4
Session 4: Statistical testing
Tick 3 corresponds to Session 3 plus Session 4
Session 5: Overtraining and cross-validation
Tick 4 corresponds to Session 5 plus Session 6
Session 6: Uncertainty and human agreement
Tick 4 corresponds to Session 5 plus Session 6
Session 7: Catch up 1: Research in Sentiment Detection
- Slides

Topic Two: Hidden Markov Models (4 sessions)

Session 8: Training the HMM
Tick 5 corresponds to this session
Session 9: Viterbi algorithm
Tick 6 corresponds to Session 9 and Session 10
Session 10: HMMs in a biological application
Tick 6 corresponds to Session 9 and Session 10
Session 11: Catch up 2: Protein structure prediction
- Slides

Topic Three: Social Networks (4 sessions)

Session 12: Properties of Networks
Tick 7 corresponds to this session
Session 13: Betweenness
Tick 8 corresponds to Session 13 and Session 14
Session 14: Clustering
Tick 8 corresponds to Session 13 and Session 14
Session 15: Catch up 3: Ethics and machine learning
- Slides
Final Catch up session: No lecture

Computer Laboratory