Course pages 2018–19
Machine Learning and Real-world Data
Examinability of material for MLRD
All material which is examinable can be found in the slides, in the ticks or in the additional notes. This material is all linked from the Moodle site. The slides, practical notes and additional notes are also available from this page. Materials will become available shortly before each session.
In some cases, students are asked to read additional material, such as some parts of the Easley and Kleinberg textbook. This is explicitly noted on the slides where the concepts are examinable.
Of course, it may also be necessary for students to read additional material in order to fully understand the material presented: you should ask your supervisor for help if you think this applies to you and are uncertain about what to read.
It is not necessary to do the starred ticks for the exams. However, looking at the material for the starred ticks may well help you understand material more thoroughly.
Material introduced in the lectures of the catch-up sessions is not examinable.
For fairness, the lecturers will not answer any individual questions about whether material is or is not examinable.
Supervision Questions
There are 4 question sets here.
Course Outline
The main teaching for the course is being done via Moodle. However, various material is also available via this site. PDF versions of all notes are available here, for ease of printing.
The course has 16 sessions. Each session will have a short introductory lecture, followed by a two-hour practical session. For most of these practical sessions, there is a 'task' (with associated tick), but there are 4 sessions which are purely for catch-up.
Students are expected to attend the practical sessions in the Intel lab, but may skip the catch-up sessions if they have completed the work and obtained all their ticks up to that point.
The schedule may change in response to events, but any such changes will be announced on the Moodle forum.
Topic One: Statistical classification (7 sessions)
- Session 1: Introduction to sentiment classification
  - Slides
  - Practical notes
  - Practical notes (pdf)
  - Introduction to the course
  - Introduction to the course (pdf)
  - ML methodology: training, development and evaluation datasets
  - ML methodology: training, development and evaluation datasets (pdf)
- Session 2: Naive Bayes Classifier
- Session 3: Statistical laws of language
- Session 4: Statistical testing
  - Slides
  - Practical notes
  - Practical notes (pdf)
  - Notes on significance testing
  - Notes on significance testing (pdf)
- Session 5: Overtraining and cross-validation
- Session 6: Uncertainty and human agreement
- Session 7: Catch up 1: Quick introduction to some other classifiers
Topic Two: Hidden Markov Models (4 sessions)
- Session 8: Training the HMM
- Session 9: Viterbi algorithm
- Session 10: HMMs in a biological application
- Session 11: Catch up 2: Protein structure prediction
Topic Three: Social Networks (4 sessions)
- Session 12: Properties of Networks
- Session 13: Betweenness
- Session 14: Clustering