Computer Laboratory

Course pages 2017–18

Machine Learning for Language Processing

Principal lecturers: Prof Ted Briscoe, Prof Ann Copestake
Taken by: MPhil ACS, Part III
Code: L101
Hours: 16 (8 lectures + 8 seminar sessions)
Class limit: 16 students
Prerequisites: L90 Overview of Natural Language Processing or similar, L95 Introduction to Natural Language Syntax and Parsing


This module aims to provide an introduction to machine learning with specific application to tasks such as document classification, spam email filtering, language modelling, part-of-speech tagging, and named entity and event recognition for textual information extraction. We will cover supervised, weakly supervised and unsupervised approaches using generative and discriminative classifiers based on graphical models, including (hidden) Markov models and CRFs, and clustering / dimensionality-reduction methods, such as latent Dirichlet allocation and neural word embeddings.


  • Classification by machine learning: classification vs. prediction, types of classifier, generative vs. discriminative models, supervised training. [2 lectures, Prof Copestake]
  • Document topic classification: bag-of-words representation, evaluation measures, feature selection, model comparison. [2 seminars, Prof Briscoe]
  • Graphical models: Markov models, hidden Markov models, maximum entropy. [2 lectures, Prof Copestake]
  • Spam email filtering: task, adaptive training, semi-structured documents, N-gram language models, evaluation. [1 seminar, Prof Briscoe]
  • Named entity recognition: HMMs vs. MaxEnt & non-sequential classification, inherent vs. contextual features, feature dependence, partially labelled data, evaluation. [1 seminar, Prof Briscoe]
  • Perceptron classifiers: kernel 'trick', types of kernel. [1 lecture, Prof Copestake]
  • Relation extraction: sequence vs. tree kernel approaches, evaluation. [1 seminar, Prof Briscoe]
  • Weakly supervised and unsupervised methods: singular value decomposition, latent Dirichlet allocation, neural methods. [3 lectures, Prof Copestake]
  • Document topic models, Unsupervised PoS tagging, and Word Embeddings. [3 seminars, Prof Briscoe]


On completion of this module, students should:

  • understand the issues involved in applying machine learning approaches to a range of language processing applications;
  • understand the theory underlying a number of machine learning approaches that have been applied to language processing, including: graphical models, perceptrons, and neural and dimensionality-reduction methods;
  • understand some applications and specific tasks including: document topic classification and clustering, spam filtering, PoS tagging, named entity recognition, event extraction, language modelling, and word embeddings.


Students will be expected to undertake reading for assigned lectures and seminars. Each student will give a 20-minute presentation of one paper.


  • Students will receive one tick worth 5% for attendance at seminar sessions, reading of assigned material, and satisfactory contribution during seminars.
  • Students will receive a second tick worth 5% for a satisfactory presentation of an assigned paper.
  • Students will undertake a small project to be agreed with the lecturers and write a project report of not more than 5000 words. The report will be due around the beginning of the Lent Term (see academic calendar for precise date), will be assessed by the lecturers, and will account for 90% of the module marks.

Recommended reading

Bishop, C. (2006). Pattern recognition and machine learning. Springer. (Chaps: 1, 2, 4-9, 13).

Jurafsky, D. & Martin, J. (2008). Speech and language processing. Prentice Hall (2nd ed.). (Chaps: 4-6, 22).

Manning, C., Raghavan, P. & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. (Chaps: 12-18).