Computer Laboratory

Course pages 2012–13

Introduction to Natural Language Processing

Principal lecturers: Prof Ann Copestake, Dr Stephen Clark
Taken by: MPhil ACS, Part III
Code: L100
Hours: 16
Prerequisites: None


This module aims to provide a brief introduction to linguistics for computer scientists and then to cover some of the core tasks in natural language processing (NLP), with an emphasis on statistical techniques suitable for extracting meaning from large bodies of text.


  • Linguistics for NLP – morphology, syntax, semantics and pragmatics, with illustrative applications mostly to English [6 lectures, AAC]
  • Finite-State/Markovian Techniques – lemmatisation, part-of-speech (PoS) tagging, phrase chunking and named entity recognition (NER) [4 lectures, SC]
  • Parsing – grammars, treebanks, representations and evaluation, statistical parse ranking [4 lectures, SC]
  • Interpretation – compositional semantics and entailment, pragmatic inference [2 lectures, AAC]


On completion of this module students should:

  • understand the basic properties of human languages and be familiar with descriptive and theoretical frameworks for handling these properties;
  • understand the design of tools for basic NLP tasks such as tagging and (partial) parsing, and be able to apply them to text and evaluate their performance;
  • understand some of the basic principles of the representation of linguistic meaning and of interpretative inference.

Practical work

  • Practical Week 6: (6 hours) PoS tag, chunk, and/or perform NER on a designated text with one or more provided tools. Evaluate the performance of the tools quantitatively and qualitatively.
  • Practical Week 8: (6 hours) Parse a designated set of sentences with one or more provided tools, to yield representations of their grammatical relations, phrase structure and/or logical forms. Evaluate the performance of the tools quantitatively and qualitatively.
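
As an illustration of the quantitative evaluation asked for in both practicals, the following is a minimal sketch in Python. It is not one of the provided tools, and the function names, tag labels and span format are assumptions chosen for the example. It computes token-level accuracy for PoS tagging and exact-match precision, recall and F1 for labelled spans such as chunks or named entities.

```python
from typing import List, Set, Tuple

# Hypothetical span format: (start token index, end token index, label).
Span = Tuple[int, int, str]


def tagging_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a predicted span counts only if its
    boundaries and label both agree with a gold span."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only; the tags and spans are made up for the example.
gold_tags = ["DT", "NN", "VBZ", "JJ"]
pred_tags = ["DT", "NN", "VBZ", "NN"]
accuracy = tagging_accuracy(gold_tags, pred_tags)

gold_spans = {(0, 2, "PER"), (5, 6, "LOC")}
pred_spans = {(0, 2, "PER"), (5, 6, "ORG")}
p, r, f = precision_recall_f1(gold_spans, pred_spans)
```

Exact-match scoring is only one possible choice. A qualitative evaluation would additionally look at the kinds of error made, for example whether a span was found with the wrong label or with the wrong boundaries.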


Assessment

  • There will be four ticked, short, take-home assignments on linguistic analysis during weeks 1–3. Each assignment is worth 5% of the final mark.
  • There will be an assessed practical report based on the practicals described above. The report will consist of a description of the work done, of not more than 5000 words, and will contribute 80% of the final mark.

The ticked assignments and practical reports will be set and marked by Professor Copestake and Dr Clark.

Recommended reading

Jurafsky, D. & Martin, J. (2008). Speech and language processing (2nd ed.). Prentice-Hall.


L100 Introduction to Natural Language Processing cannot be taken in conjunction with R02 Network Architectures in 2012–13.