Course pages 2016–17
Machine Learning and Real-world Data
Errata
Session 3 (Statistical Laws of Language), slide 11, titled `Vocabulary size': In the original slides, the relationship between u_n (the number of unique items in the vocabulary) and n (the text size) was wrongly described as `exponential'. This term has been removed, as the formula (with 0 < b < 1) fully specifies the type of relationship.
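For illustration only: the formula in question is presumably the one usually known as Heaps' law, commonly written u_n = k * n^b. That form, the function name and the constants k = 10 and b = 0.5 below are assumptions made for this sketch and are not taken from the slides. With 0 < b < 1 the growth is a sublinear power law rather than exponential, which can be checked numerically:

```python
def heaps_vocabulary_size(n, k=10.0, b=0.5):
    """Estimate the number of unique items u_n in a text of n tokens.

    Assumes the power-law form u_n = k * n**b with 0 < b < 1.
    The default values of k and b are hypothetical, chosen for illustration.
    """
    if n < 0:
        raise ValueError("text size n must be non-negative")
    if not 0 < b < 1:
        raise ValueError("exponent b must satisfy 0 < b < 1")
    return k * n ** b


# Each tenfold increase in text size multiplies the estimate by 10**b,
# which is less than 10: the vocabulary grows more slowly than the text.
sizes = [10**3, 10**4, 10**5, 10**6]
estimates = [heaps_vocabulary_size(n) for n in sizes]
for n, u in zip(sizes, estimates):
    print(f"n = {n:>9,}   estimated u_n = {u:,.0f}")
```

In practice k and b would be estimated from a corpus, for example by fitting a straight line to log u_n against log n.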
Other changes (May 27)
- Notes on Brandes
- Notes on Brandes (pdf)
- Question set contributed by Ekaterina Kochmar (pdf)
- Answers contributed by Ekaterina Kochmar (pdf)
- Exam style question on HMMs (pdf)
- Answers to exam style question on HMMs (pdf)
Course Outline
The main teaching for the course is being done via Moodle. However, various material is also available via this site. PDF versions of all notes are available here, for ease of printing. The notes and slides will be added incrementally. Some material is in draft form, as noted.
The course has 16 sessions. Each session will have a short introductory lecture, followed by a two-hour practical session. For most of these practical sessions, there is a `task', but there are four sessions that are purely for catch-up.
Students are expected to attend the practical sessions in the Intel lab, but may skip the catch-up sessions if they have completed the work and obtained all their ticks up to that point.
The schedule may change in response to events, but any such changes will be announced on the Moodle forum.
Topic One: Statistical classification (7 sessions)
- Session 1: Introduction to sentiment classification
Tick 1 corresponds to this session
- Session 2: Naive Bayes Classifier
Tick 2 corresponds to this session
- Session 3: Statistical laws of language
Tick 3 corresponds to Session 3 plus Session 4
- Session 4: Statistical testing
Tick 3 corresponds to Session 3 plus Session 4
- Session 5: Overtraining and cross-validation
Tick 4 corresponds to Session 5 plus Session 6
- Session 6: Uncertainty and human agreement
Tick 4 corresponds to Session 5 plus Session 6
- Session 7: Catch up 1: Research in Sentiment Detection
Topic Two: Hidden Markov Models (4 sessions)
- Session 8: Training the HMM
Tick 5 corresponds to this session
- Session 9: Viterbi algorithm
Tick 6 corresponds to Session 9 and Session 10
- Session 10: HMMs in a biological application
Tick 6 corresponds to Session 9 and Session 10
- Session 11: Catch up 2: Protein structure prediction
Topic Three: Social Networks (4 sessions)
- Session 12: Properties of Networks
Tick 7 corresponds to this session
- Session 13: Betweenness
Tick 8 corresponds to Session 13 and Session 14
- Session 14: Clustering
Tick 8 corresponds to Session 13 and Session 14
- Session 15: Catch up 3: Ethics and machine learning
Final Catch up session: No lecture