Computer Laboratory

Course pages 2016–17 (still under preparation!)

Overview of Natural Language Processing

Principal lecturers: Dr Ekaterina Shutova, Dr Simone Teufel
Taken by: MPhil ACS, Part III
Code: L90
Hours: 16 (12 lectures and two 2-hour practical sessions)
Prerequisites: Courses such as CST Part II Regular Languages and Finite Automata, Probability, Logic and Proof, and Artificial Intelligence

Aims

This course introduces the fundamental techniques of natural language processing. It aims to explain the potential and the main limitations of these techniques. It also introduces some current research issues, and discusses and evaluates some current and potential applications. Students will also be introduced to practical experimentation in natural language processing.

Syllabus

  • Introduction. Brief history of NLP research, current applications, components of NLP systems.
  • Finite-state techniques. Inflectional and derivational morphology, finite-state automata in NLP, finite-state transducers.
  • Prediction and part-of-speech tagging. Corpora, simple N-grams, word prediction, stochastic tagging, evaluating system performance.
  • Context-free grammars and parsing. Generative grammar, context-free grammars, parsing with context-free grammars, weights and probabilities. Limitations of context-free grammars. Dependencies.
  • Lexical semantics. Semantic relations, WordNet, word senses, word sense disambiguation.
  • Distributional semantics 1. Representing lexical meaning with distributions. Similarity metrics.
  • Distributional semantics 2. Generalisation and clustering. Selectional preference induction. Multimodal semantics.
  • Compositional semantics. Compositional semantics with FOPL and lambda calculus. Compositional distributional semantics. Inference and robust entailment.
  • Discourse processing. Anaphora resolution, discourse relations.
  • Language generation and regeneration. Components of a generation system. Summarisation.
  • Applications. Examples of practical applications of NLP techniques.
  • Recent trends in NLP research.
  • Practical on sentiment analysis. Students will build a sentiment analysis system which will be trained and evaluated on supplied data. The system will be built from existing components, but students will be expected to compare approaches and to implement their own feature extraction to support this.
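
As a rough illustration of the kind of system the practical describes, the sketch below trains and evaluates a small Naive Bayes sentiment classifier over bag-of-words features. It is not the course's supplied code or data: the function names (`extract_features`, `train`, `classify`, `accuracy`), the choice of Naive Bayes with add-one smoothing, and the toy sentences are all assumptions made for this example. The feature extractor is deliberately isolated so that alternative features could be swapped in and compared.

```python
import math
from collections import Counter


def extract_features(text):
    """Hypothetical feature extractor: lowercased whitespace tokens."""
    return text.lower().split()


def train(examples):
    """Collect label and per-label token counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in extract_features(text):
            counts[token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        # Add-one smoothing: every vocabulary item gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in extract_features(text):
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(classify(model, text) == label for text, label in examples)
    return correct / len(examples)


# Made-up toy data for illustration only; not the data supplied in the course.
TRAIN = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great script", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible acting and a boring script", "neg"),
]
TEST = [
    ("a wonderful script", "pos"),
    ("dull and terrible", "neg"),
]

model = train(TRAIN)
print("accuracy on toy test set:", accuracy(model, TEST))
```

Comparing approaches would then amount to replacing `extract_features` (for example, with bigrams or negation-aware tokens) and re-running `accuracy` on held-out data.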

Objectives

On completion of this module, students should:

  • be able to discuss the current and likely future performance of several NLP applications;
  • be able to describe briefly a fundamental technique for processing language for several subtasks, such as morphological processing, parsing and word sense disambiguation;
  • understand how these techniques draw on and relate to other areas of computer science;
  • understand the basic principles of designing and running an NLP experiment.

Coursework

Write a 4,000-word report including results from a sentiment analysis experiment based on the practical component of the course.

Practical work

Build and evaluate a sentiment analysis system.

Assessment

Assessment will be based on the 4,000-word practical report.

Recommended reading

Jurafsky, D. and Martin, J. (2008). Speech and language processing. Prentice Hall (specific chapter references will be provided in the lecture notes).

Although the lectures don't assume any exposure to linguistics, the course will be easier to follow if students have some understanding of basic linguistic concepts. The following may be useful for this: The Internet Grammar of English