Computer Laboratory

Course pages 2012–13

Statistical Machine Translation

Principal lecturers: Dr Stephen Clark, Dr Bill Byrne
Taken by: MPhil ACS, Part III
Code: L102
Hours: 16 (10 lectures + 6 practical sessions)
Prerequisites: L100 Introduction to Natural Language Processing


This module provides an in-depth introduction to statistical machine translation (MT), the dominant approach to large-scale, robust translation across many language pairs (and the approach currently used by Google).


  • Overview [2 lectures]: Translation as an economic, political, and cultural activity. Machine translation as a problem in natural language processing. Syntax and morphology in translation. Translation memories; example-based and rule-based MT. Interlingua.
  • Alignment: automatic translations in text [2 lectures]: Parallel texts and their role in building translation systems and measuring translation quality. Document and sentence alignment: models and algorithms. Word and phrase alignment: models and algorithms. Techniques for automatic measurement of alignment quality. Web crawling for parallel text.
  • Weighted finite-state transducers (WFSTs): algorithms for natural language processing and MT [2 lectures]
  • SMT systems [4 lectures]: Extraction of translation rules from parallel text. Phrase-based, Hiero, and syntax-based MT. Techniques for automatic measurement of translation quality. Minimum error rate training. Language models for SMT: simple back-off, MapReduce. MT system combination. Practical issues in SMT: truecasing; source text pre-processing; handling morphology; system building procedure.

All lectures will be given by Dr Clark or Dr Byrne.


On completion of this module, students should understand:

  • the role of parallel text in MT;
  • how alignment models can be estimated from parallel text;
  • how alignment models capture divergent language properties such as word order;
  • the use of WFSTs in translation and some other basic NLP tasks;
  • the extraction of translation rules from parallel text;
  • various phrase-based translation architectures, including Hiero;
  • parameter optimization procedures for SMT;
  • the role of language models in SMT;
  • the evaluation of SMT systems using automatic metrics;
  • system combination techniques for SMT.

Practical work

There will be two substantial practical exercises associated with this module.

  • Practical 1: 2 sessions. Parallel text, alignment models and WFSTs.
  • Practical 2: 4 sessions. SMT system construction and evaluation.

Assessment

  • A written report on the practical work, worth 35% of the final mark.
  • A final take-home test covering all the material, worth 65% of the final mark. Questions are set and marked by Dr Clark and Dr Byrne.

Recommended reading

Jurafsky, D. & Martin, J. (2008). Speech and language processing (2nd ed.). Prentice Hall. Chapter 25, Machine translation.
Brown, P.F., Della Pietra, S.A., Della Pietra, V.J. & Mercer, R.L. (1993). The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–311.
Chiang, D. (2007). Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.