Machine Learning and Real-world Data
Course Outline
Teaching for this course is delivered mainly via Moodle, but various material is also available on this site. PDF versions of all notes are provided here for ease of printing.
The course has 16 sessions. Each session consists of a short introductory lecture followed by a two-hour practical demonstration session in the Intel Lab. Most practical sessions are devoted to a task (with an associated tick), but 4 sessions are purely for catching up. Here are some general instructions on how to perform the tasks.
Students who cannot attend the demonstrated sessions (for instance due to self-isolation, sickness or another valid reason) should ask their DoS to email [Javascript required] to get permission for online ticking. Please don't leave it too late to arrange this.
Topic One: Statistical classification (7 sessions)
- Session 1: Introduction to sentiment classification
  - Slides
  - Introduction to the course (also available in PDF)
- Session 2: Naive Bayes Classifier
  - Slides
  - Notes on Naive Bayes (also available in PDF)
- Session 3: Statistical laws of language
- Session 4: Statistical testing
  - Slides
  - Notes on significance testing (also available in PDF)
- Session 5: Overtraining and cross-validation
  - Slides
  - ML methodology: training, development and evaluation datasets (also available in PDF)
- Session 6: Uncertainty and human agreement
- Session 7: Catch up 1: Quick introduction to some other classifiers
Topic Two: Hidden Markov Models (4 sessions)
- Session 8: Training the HMM
- Session 9: Viterbi algorithm
- Session 10: HMMs in a biological application
- Session 11: Catch up 2: Protein structure prediction
Soft Ticking Deadlines
- Ticks 1, 2 and 3: Friday 4/2
- Ticks 4, 5 and 6: Friday 25/2
- Ticks 7, 8 and 9: Monday 14/3 (last session)
- Ticks 10, 11 and 12: CANCELLED DUE TO STRIKE (still available on Moodle)
Examinability of material for MLRD
All examinable material can be found in the slides, the ticks or the additional notes. This material is all linked from the Moodle site, and the slides, practical notes and additional notes are also available from this page. Material for each session is made available shortly before that session.
NOTE: Due to strike action, Topic Three (Social Networks) will not be lectured and thus will not be examinable.
In some cases, students are asked to read additional material, such as parts of the Easley and Kleinberg textbook. Where this reading covers examinable concepts, it is explicitly noted on the slides.
Of course, you may also need to read further in order to fully understand the material presented. If you think this applies to you and are unsure what to read, ask your supervisor for help.
The starred ticks are not required for the exams. However, looking at the material for the starred ticks may well help you understand the course more thoroughly.
Material introduced in catch up session lectures is not examinable.
For fairness, the lecturers will not answer any individual questions about whether material is or is not examinable.
Supervision Questions
There are 4 question sets here.