
Department of Computer Science and Technology

Course pages 2023–24

Natural Language Processing

Principal lecturer: Dr Weiwei Sun
Additional lecturer: Dr Michael Schlichtkrull
Taken by: Part II CST
Code: NLP
Term: Michaelmas
Hours: 15 (12 lectures + 3 practical classes)
Format: In-person lectures
Class limit: max. 30 students
Prerequisites: Artificial Intelligence, Data Science, Formal Models of Language, Foundations of Computer Science, Machine Learning and Real-world Data

Aims

This course introduces the fundamental techniques of natural language processing. It aims to explain the potential and the main limitations of these techniques. Some current research issues are introduced, and some current and potential applications are discussed and evaluated. Students will also be introduced to practical experimentation in natural language processing.

Lectures

  • Overview. Brief history of NLP research, some current applications, components of NLP systems.
  • Morphology and Finite State Techniques. Morphology in different languages, the importance of morphological analysis in NLP, and finite-state techniques (sketch 1 below).
  • Part-of-Speech Tagging and Log-Linear Models. Lexical categories, word tagging, corpora and annotations, empirical evaluation (sketch 2 below).
  • Phrase Structure and Structured Prediction. Phrase structures, structured prediction, context-free grammars, weights and probabilities; some limitations of context-free grammars (sketch 3 below).
  • Dependency Parsing. Dependency structure, grammar-free parsing, incremental processing (sketch 4 below).
  • Gradient Descent and Neural Nets. Parameter optimisation by gradient descent, non-linear functions with neural network layers, the log-linear model as a softmax layer, and current findings in neural NLP (sketch 5 below).
  • Word Representations. Representing words with vectors, count-based and prediction-based approaches, similarity metrics (sketch 6 below).
  • Recurrent Neural Networks. Modelling sequences, parameter sharing in recurrent neural networks, neural language models, word prediction (sketch 7 below).
  • Compositional Semantics. Logical representations, compositional semantics, lambda calculus, inference and robust entailment (sketch 8 below).
  • Lexical Semantics. Semantic relations, WordNet, word senses.
  • Discourse. Discourse relations, anaphora resolution, summarisation.
  • Natural Language Generation. Challenges of natural language generation (NLG), tasks in NLG, surface realisation.
  • Practicals and assignments. Students will build a natural language processing system, which will be trained and evaluated on supplied data. The system will be built from existing components, but students will be expected to compare approaches, and some programming will be required. Several assignments will be set during the practicals for assessment.
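
Sketch 1: a minimal illustration of finite-state morphological analysis in Python (the stem lexicon and tag strings are invented for the example, not taken from the course). A trie over stems plays the role of the machine's states, with an optional final "s" arc yielding a plural analysis.

    STEMS = ["cat", "dog"]                  # toy lexicon

    # Build the trie: each node maps a character to the next state;
    # the "" key marks a final state and stores the stem.
    trie = {}
    for stem in STEMS:
        node = trie
        for ch in stem:
            node = node.setdefault(ch, {})
        node[""] = stem

    def analyse(word):
        """Follow transitions character by character; collect analyses."""
        analyses, node = [], trie
        for i, ch in enumerate(word):
            if "" in node and word[i:] == "s":
                analyses.append(node[""] + "+N+PL")   # take the plural arc
            if ch not in node:
                return analyses
            node = node[ch]
        if "" in node:
            analyses.append(node[""] + "+N+SG")
        return analyses

    print(analyse("cats"))  # ['cat+N+PL']
    print(analyse("cat"))   # ['cat+N+SG']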
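
Sketch 2: a log-linear model for part-of-speech tagging, reduced to its core (the features, weights and two-tag set are made up). Each tag's score is a sum of weights for the active features, and a softmax turns scores into a probability distribution.

    import math

    WEIGHTS = {("suffix=ed", "VBD"): 2.0,
               ("suffix=ed", "NN"): -1.0,
               ("lower=walked", "VBD"): 1.5}
    TAGS = ["NN", "VBD"]

    def features(word):
        feats = ["lower=" + word.lower()]
        if word.endswith("ed"):
            feats.append("suffix=ed")
        return feats

    def p_tag(word):
        scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in features(word))
                  for t in TAGS}
        z = sum(math.exp(s) for s in scores.values())   # normaliser
        return {t: math.exp(s) / z for t, s in scores.items()}

    print(p_tag("walked"))   # VBD takes almost all of the probability mass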
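
Sketch 3: CKY recognition with a toy context-free grammar in Chomsky normal form (the grammar and sentence are invented). chart[i][j] holds every nonterminal that can derive words[i:j].

    from itertools import product

    UNARY = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
    BINARY = {("NP", "VP"): "S", ("V", "NP"): "VP"}

    def cky(words):
        n = len(words)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(UNARY.get(w, ()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):                   # split point
                    for left, right in product(chart[i][k], chart[k][j]):
                        if (left, right) in BINARY:
                            chart[i][j].add(BINARY[(left, right)])
        return "S" in chart[0][n]

    print(cky("she eats fish".split()))   # True

Replacing the sets of nonterminals with dictionaries of weights turns this recogniser into a probabilistic (Viterbi) parser.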
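
Sketch 4: arc-standard transition-based dependency parsing, driven here by a hand-written action sequence rather than a trained classifier (the sentence and actions are invented). The parser processes the sentence incrementally, one action at a time.

    def parse(words, actions):
        stack, buffer, arcs = [], list(range(len(words))), []
        for action in actions:
            if action == "SHIFT":
                stack.append(buffer.pop(0))
            elif action == "LEFT-ARC":      # second-top depends on top
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))
            elif action == "RIGHT-ARC":     # top depends on second-top
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return [(words[h], words[d]) for h, d in arcs]

    words = ["she", "eats", "fish"]
    actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
    print(parse(words, actions))   # [('eats', 'she'), ('eats', 'fish')]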
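
Sketch 5: parameter optimisation by gradient descent, on the smallest possible model y = w·x with mean squared error (the data points are invented).

    xs = [1.0, 2.0, 3.0]
    ys = [2.1, 3.9, 6.2]                      # roughly y = 2x

    w, lr = 0.0, 0.01                         # initial weight, learning rate
    for _ in range(500):
        # d/dw of (1/n) * sum((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
        grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad                        # step against the gradient
    print(round(w, 3))                        # close to 2.0

A neural network training loop has the same shape, with non-linear layers stacked on top and the gradients computed automatically.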
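
Sketch 6: count-based word vectors and cosine similarity (the corpus is a made-up toy). Each word is represented by the counts of its neighbours within a one-word window; words that occur in similar contexts end up with similar vectors.

    import math
    from collections import Counter

    corpus = "the cat sat on the mat the dog sat on the rug".split()

    vectors = {w: Counter() for w in set(corpus)}
    for i, w in enumerate(corpus):
        for j in (i - 1, i + 1):               # +/-1 context window
            if 0 <= j < len(corpus):
                vectors[w][corpus[j]] += 1

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u)
        norm = lambda c: math.sqrt(sum(x * x for x in c.values()))
        return dot / (norm(u) * norm(v))

    print(cosine(vectors["cat"], vectors["dog"]))   # 1.0: identical contexts
    print(cosine(vectors["cat"], vectors["mat"]))   # ~0.71: less similar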
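
Sketch 7: the forward pass of a minimal Elman-style recurrent network (sizes and random inputs are arbitrary). The same weight matrices are applied at every time step, which is the parameter sharing that lets one network handle sequences of any length; a language model would add a softmax over the vocabulary on top of each hidden state.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h = 4, 3
    W_xh = rng.normal(size=(d_h, d_in)) * 0.1   # input -> hidden
    W_hh = rng.normal(size=(d_h, d_h)) * 0.1    # hidden -> hidden
    b = np.zeros(d_h)

    h = np.zeros(d_h)                           # initial hidden state
    for x in rng.normal(size=(5, d_in)):        # five toy "word embeddings"
        h = np.tanh(W_xh @ x + W_hh @ h + b)    # same parameters every step
        print(h)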
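
Sketch 8: compositional semantics with Python lambdas standing in for the lambda calculus (the predicates and domain are toy inventions). The meaning of "every dog barks" is assembled by function application, mirroring the syntactic structure.

    DOGS = {"fido", "rex"}
    BARKS = {"fido", "rex", "tom"}
    DOMAIN = DOGS | BARKS

    dog = lambda x: x in DOGS                    # λx. dog(x)
    barks = lambda x: x in BARKS                 # λx. bark(x)
    every = lambda p: lambda q: all(q(x) for x in DOMAIN if p(x))
    # every = λP. λQ. ∀x. P(x) -> Q(x), over the toy domain

    every_dog = every(dog)        # the meaning of the NP "every dog"
    print(every_dog(barks))       # True: both dogs are in BARKS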

Objectives

By the end of the course students should:

  • be able to discuss the current and likely future performance of several NLP applications;
  • be able to briefly describe fundamental techniques for several language-processing subtasks, such as morphological processing, parsing and word sense disambiguation;
  • understand how these techniques draw on and relate to other areas of computer science.

Recommended reading

Jurafsky, D. & Martin, J. (2023). Speech and language processing. Prentice Hall (3rd ed. draft, online).

Assessment - Part II Students

  • Assignment 1 - 10% of marks
  • Assignment 2 - 25% of marks
  • Assignment 3 - 65% of marks