Course pages 2018–19

Machine Learning and Real-world Data

Examinability of material for MLRD

All material which is examinable can be found in the slides, in the ticks or in the additional notes. This material is all linked from the Moodle site. The slides, practical notes and additional notes are also available from this page. Materials will become available shortly before each session.

In some cases, students are asked to read additional material, such as some parts of the Easley and Kleinberg textbook. This is explicitly noted on the slides where the concepts are examinable.

Of course, it may also be necessary for students to read additional material in order to fully understand the material presented: you should ask your supervisor for help if you think this applies to you and are uncertain about what to read.

It is not necessary to do the starred ticks for the exams. However, looking at the material for the starred ticks may well help you understand the course material more thoroughly.

Material introduced in catch up session lectures is not examinable.

For fairness, the lecturers will not answer any individual questions about whether material is or is not examinable.

Supervision Questions

There are 4 question sets here.

Course Outline

The main teaching for the course is done via Moodle, but various materials are also available on this site. PDF versions of all notes are available here for ease of printing.

The course has 16 sessions. Each session will have a short introductory lecture, followed by a two-hour practical session. For most of these practical sessions, there is a 'task' (with an associated tick), but there are 4 sessions which are purely for catch up.

Students are expected to attend the practical sessions in the Intel lab, but may skip the catch up sessions if they have completed the work and obtained all their ticks up to that point.

The schedule may change in response to events, but any such changes will be announced on the Moodle forum.

Topic One: Statistical classification (7 sessions)

Session 1: Introduction to sentiment classification
Session 2: Naive Bayes Classifier
Session 3: Statistical laws of language
Session 4: Statistical testing
Session 5: Overtraining and cross-validation
Session 6: Uncertainty and human agreement
Session 7: Catch up 1: Quick introduction to some other classifiers
- Slides

Topic Two: Hidden Markov Models (4 sessions)

Session 8: Training the HMM
Session 9: Viterbi algorithm
Session 10: HMMs in a biological application
Session 11: Catch up 2: Protein structure prediction
- Slides

Department of Computer Science and Technology