Computer Laboratory

Course pages 2011–12

Machine Learning for Language Processing

Principal lecturers: Prof Ted Briscoe, Dr Mark Gales
Taken by: MPhil ACS, Part III
Code: L101
Hours: 16 (8 lectures + 8 seminar sessions)
Prerequisites: None, but the modules L100 Introduction to Natural Language Processing and L106 Spoken Language Processing are useful


Aims

This module aims to provide an introduction to machine learning with specific application to tasks such as document topic classification, spam email filtering, and named entity and event recognition for textual information extraction. We will cover supervised, weakly-supervised and unsupervised approaches using generative and discriminative classifiers based on graphical models, including hidden Markov models, Gaussian mixture models and conditional random fields.
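As a flavour of the generative approach applied to one of the module's example tasks, the following is a minimal sketch of a multinomial naive Bayes bag-of-words spam classifier with add-one smoothing. The toy corpus and all identifiers are illustrative, not course material:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training corpus: (label, document text) pairs.
train = [
    ("spam", "win money now win prize"),
    ("spam", "free money offer now"),
    ("ham",  "meeting agenda for monday"),
    ("ham",  "lunch meeting on monday"),
]

# Class priors and per-class word counts (the bag-of-words representation).
class_counts = Counter(label for label, _ in train)
word_counts = defaultdict(Counter)
vocab = set()
for label, text in train:
    words = text.split()
    word_counts[label].update(words)
    vocab.update(words)

def classify(text):
    """Return the class maximising log P(c) + sum_w log P(w|c),
    with add-one (Laplace) smoothing over the vocabulary."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) /
                              (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("free prize money"))   # spam
print(classify("monday meeting"))     # ham
```

The model is generative in the sense discussed in the lectures: it models the joint distribution P(c, w1..wn) and classifies by Bayes' rule, in contrast to discriminative models such as conditional random fields, which model P(c | w1..wn) directly.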


Syllabus

  • Classification by machine learning: classification vs. prediction, types of classifier, generative vs. discriminative models, supervised training. [2 lectures, Dr Gales]
  • Document topic classification: bag-of-words representation, evaluation measures, feature selection, model comparison. [2 seminars, Prof Briscoe]
  • Graphical models: Markov models, hidden Markov models, Gaussian mixture models, conditional random fields, expectation maximisation, variational inference. [4 lectures, Dr Gales]
  • Spam email filtering: task, adaptive training, semi-structured documents, N-gram language models, evaluation. [1 seminar, Prof Briscoe]
  • Named entity recognition: HMMs vs. CRFs vs. parsing & non-sequential classification, inherent vs. contextual features, feature dependence, partially vs. actively labelled data, evaluation. [2 seminars, Prof Briscoe]
  • Support vector machines: maximum margin classifiers, kernel “trick”, types of kernel. [1 lecture, Dr Gales]
  • Relation extraction: sequence vs. tree vs. graph kernel approaches, evaluation. [2 seminars, Prof Briscoe]
  • Clustering: factor analysis, singular value decomposition, principal component analysis. [1 lecture, Dr Gales]
  • Document topic clustering and term clustering: latent semantic indexing/analysis, incremental semantic analysis, evaluation. [1 seminar, Prof Briscoe]
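The latent semantic indexing/analysis idea from the final seminar topic can be sketched as a truncated singular value decomposition of a term-document count matrix, with documents then compared in the reduced latent space. All data below are hypothetical toy values, and NumPy is assumed:

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# Documents 0-1 are "maritime", documents 2-3 are "political".
terms = ["ship", "boat", "ocean", "vote", "election"]
A = np.array([
    [1, 1, 0, 0],   # ship
    [0, 1, 0, 0],   # boat
    [1, 1, 1, 0],   # ocean
    [0, 0, 1, 1],   # vote
    [0, 0, 0, 1],   # election
], dtype=float)

# Factor A ~ U_k S_k V_k^T and keep the top-k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

def cos(a, b):
    """Cosine similarity between two latent document vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The two maritime documents should be closer to each other in the
# latent space than a maritime document is to a political one.
print(cos(doc_vecs[0], doc_vecs[1]) > cos(doc_vecs[0], doc_vecs[3]))
```

The same decomposition underlies the clustering lecture's material on singular value decomposition and principal component analysis: truncating to k dimensions projects both terms and documents onto the k directions of greatest variance.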


Objectives

On completion of this module, students should:

  • understand the issues involved in applying machine learning approaches to a range of language processing applications;
  • understand the theory underlying a number of machine learning approaches that have been applied to language processing, including: graphical models, Gaussian mixture models, conditional random fields, support vector machines;
  • understand applications of machine learning to specific tasks including: document topic classification and clustering, spam filtering, named entity recognition.


Coursework

Students will be expected to undertake reading for assigned lectures and seminars. Each student will give a 20–30 minute presentation of one paper during a seminar.

Practical work



  • Students will receive one tick worth 5% for attendance at seminar sessions, reading of assigned material, and satisfactory contribution during seminars.
  • Students will receive a second tick worth 5% for a satisfactory presentation of an assigned paper.
  • Students will write an in-depth essay of not more than 5000 words on a topic agreed with the lecturers. The essay will be due at the beginning of the Easter Term, will be assessed by one of the lecturers, and will account for 90% of the module marks.

Recommended reading

Bishop, C. (2006). Pattern recognition and machine learning. Springer. (Chaps: 1, 2, 7–10, 12).
Jurafsky, D. & Martin, J. (2008). Speech and language processing. Prentice Hall (2nd ed.). (Chaps: 4, 6, 22).
Manning, C., Raghavan, P. & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. (Chaps: 12–18).