Department of Computer Science and Technology

Course pages 2020–21

Machine Learning for Language Processing

Principal lecturers: Dr Andreas Vlachos, Dr Ryan Cotterell
Taken by: MPhil ACS, Part III
Code: L101
Hours: 16 (16 × 1-hour lectures)
Class limit: 16 students
Prerequisites: L90 Overview of Natural Language Processing or similar, L95 Introduction to Natural Language Syntax and Parsing


This module aims to provide an introduction to machine learning with specific application to language processing tasks such as document classification, spam email filtering, language modelling, part-of-speech tagging, and named entity and event recognition for textual information extraction. We will cover supervised, weakly supervised and unsupervised approaches, using generative and discriminative classifiers based on graphical models, including (hidden) Markov models and conditional random fields (CRFs), as well as clustering and dimensionality-reduction methods such as latent Dirichlet allocation, and neural network architectures.


  • introduction to machine learning for natural language processing
  • classification
    • perceptron/online learning
    • naive Bayes, logistic regression, generative vs discriminative
    • feed-forward neural networks
    • optimization fundamentals
  • structured prediction
    • language models, sequence tagging
    • locally vs globally normalized models
    • constituency parsing
    • dependency parsing
    • neural models
    • decoding strategies
  • sequence-to-sequence
    • recurrent neural networks
    • encoder-decoder
    • weighted finite state transducers
  • applications
    • information extraction
    • dialogue agents


On completion of this module, students should:

  • understand the issues involved in applying machine learning approaches to a range of language processing applications;
  • understand the theory underlying a number of machine learning approaches that have been applied to language processing, including: graphical models, perceptrons, and neural and dimensionality-reduction methods;
  • understand some applications and specific tasks, including document topic classification and clustering, spam filtering, PoS tagging, named entity recognition, event extraction, language modelling and word embeddings.


The coursework will be a class project with a report of no more than 5000 words.


  • Students will receive one tick worth 5% for attendance at lecture sessions, reading of assigned material, and satisfactory contribution during lectures.
  • Students will undertake a small project to be agreed with the lecturers and write a project report of not more than 5000 words. The report will be due at the beginning of the Lent term, will be assessed by the lecturers, and will account for 95% of the module marks.

Recommended reading

The web version of Jurafsky and Martin, Speech and Language Processing.

Class limit

This module has a class limit of 16.