Department of Computer Science and Technology

Masters

 

Course pages 2022–23

Overview of Natural Language Processing

Principal lecturers: Prof Simone Teufel, Dr Andrew Caines, Dr Weiwei Sun
Taken by: MPhil ACS, Part III
Code: L90
Term: Michaelmas
Hours: 18 (12 lectures and 3 × 2-hour practical sessions)
Format: In-person lectures
Class limit: max. 15 students
Prerequisites: None beyond the topics covered in an undergraduate CS degree.
This course is a prerequisite for: L95: Introduction to Natural Language Syntax and Parsing (if you haven't already taken an NLP course)
Moodle, timetable

Aims

This course introduces the fundamental techniques of natural language processing. It aims to explain the potential and the main limitations of these techniques. Some current research issues are introduced and some current and potential applications discussed and evaluated. Students will also be introduced to practical experimentation in natural language processing.

Lectures

  • Overview. Brief history of NLP research, some current applications, components of NLP systems.
  • Morphology and Finite State Techniques. Morphology in different languages, importance of morphological analysis in NLP, finite-state techniques in NLP.
  • Part-of-Speech Tagging and Log-Linear Models. Lexical categories, word tagging, corpora and annotations, empirical evaluation.
  • Phrase Structure and Structure Prediction. Phrase structures, structured prediction, context-free grammars, weights and probabilities. Some limitations of context-free grammars.
  • Dependency Parsing. Dependency structure, grammar-free parsing, incremental processing. 
  • Gradient Descent and Neural Nets. Parameter optimisation by gradient descent. Non-linear functions with neural network layers. Log-linear model as softmax layer. Current findings of Neural NLP.
  • Word representations. Representing words with vectors, count-based and prediction-based approaches, similarity metrics.
  • Recurrent Neural Networks. Modelling sequences, parameter sharing in recurrent neural networks, neural language models, word prediction.
  • Compositional Semantics. Logical representations, compositional semantics, lambda calculus, inference and robust entailment.
  • Lexical Semantics. Semantic relations, WordNet, word senses.
  • Discourse. Discourse relations, anaphora resolution, summarisation.
  • Natural Language Generation. Challenges of natural language generation (NLG), tasks in NLG, surface realisation.
  • Practical and assignments. Students will build a natural language processing system which will be trained and evaluated on supplied data. The system will be built from existing components, but students will be expected to compare approaches and some programming will be required for this. Several assignments will be set during the practicals for assessment.

Objectives

By the end of the course students should:

  • be able to discuss the current and likely future performance of several NLP applications;
  • be able to describe briefly a fundamental technique for processing language for several subtasks, such as morphological processing, parsing and word sense disambiguation;
  • understand how these techniques draw on and relate to other areas of computer science.

Assessment - Part II Students

  • Assignment 1 - 10% of marks
  • Assignment 2 - 25% of marks
  • Assignment 3 - 65% of marks

Recommended reading

* Jurafsky, D. and Martin, J. (2008) Speech and language processing. Prentice Hall.

Coursework

Submit work for 3 assignments as part of practical sessions on information extraction.

These include an annotation exercise, a feature-based classifier along with code repository and documentation, and a 4,000-word report on results and analysis from an extended information extraction experiment.

Practical work

Students will build a natural language processing system which will be trained and evaluated on supplied data. The system will be built from existing components, but students will be expected to compare approaches and some programming will be required for this.

Assessment - Part III and MPhil Students

Assessment will be based on the practicals:

  • First practical exercise (10%, ticked)
  • Second practical exercise (25%, code repository or notebook with documentation)
  • Final report (65%, 4,000 words, excluding references)

Further Information

Although the lectures don't assume any exposure to linguistics, the course will be easier to follow if students have some understanding of basic linguistic concepts. The following may be useful for this: The Internet Grammar of English

Because of infectious respiratory diseases, the teaching method for this module may be adjusted to accommodate physical distancing and students who are working remotely. Unless otherwise advised, this module will be taught in person.

  • Current Cambridge undergraduate students who are continuing onto Part III or the MPhil in Advanced Computer Science may only take this module if they did NOT take it as a Unit of Assessment in Part II.
  • The class limit is 15 MPhil / Part III students, with the practical assessed by the Department of Computer Science and Technology.
  • Students from other departments may attend the lectures for this module if space allows. However, students wanting to take it for credit will need to make arrangements for assessment within their own department.

This module is shared with Part II of the Computer Science Tripos. Assessment will be adjusted for the two groups of students to be at an appropriate level for whichever course the student is enrolled on. Further information about assessment and practicals will follow at the first lecture.