Department of Computer Science and Technology

Course pages 2019–20

Unit: Natural Language Processing

This course is only taken by Part II 75% students.

Lecturers: Professor S. Teufel and Dr P. Buttery

No. of lectures and practical classes: 12+3

Prerequisite courses: Machine Learning and Real-World Data, Formal Models of Language, Foundations of Data Science, Artificial Intelligence.

Capacity: 30

Aims

This course introduces the fundamental techniques of natural language processing. It aims to explain the potential and the main limitations of these techniques. Some current research issues are introduced, and some current and potential applications are discussed and evaluated. Students will also be introduced to practical experimentation in natural language processing.

Lectures

  • Introduction. Brief history of NLP research, some current applications, components of NLP systems.

  • Finite-state techniques. Inflectional and derivational morphology, finite-state automata in NLP, finite-state transducers.

  • Prediction and part-of-speech tagging. Corpora, simple N-grams, word prediction, stochastic tagging, evaluating system performance.

  • Context-free grammars and parsing. Generative grammar, context-free grammars, parsing with context-free grammars, weights and probabilities. Some limitations of context-free grammars.

  • Dependency structures. English as an outlier. Universal dependencies. Introduction to dependency parsing.

  • Compositional semantics. Logical representations. Compositional semantics and lambda calculus. Inference and robust entailment. Negation.

  • Lexical semantics. Semantic relations, WordNet, word senses.

  • Distributional semantics. Representing lexical meaning with distributions. Similarity metrics.

  • Distributional semantics and deep learning. Embeddings. Grounding. Multimodal systems and visual question answering.

  • Discourse processing. Anaphora resolution, summarization.

  • Language generation and regeneration. Components of a generation system. Generation of referring expressions.

  • Recent NLP research. Some recent NLP research.

  • Practical on sentiment analysis. Students will build a sentiment analysis system which will be trained and evaluated on supplied data. The system will be built from existing components, but students will be expected to compare approaches and some programming will be required for this.
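As a rough illustration of the kind of system the practical describes (trained and evaluated on labelled data), the sketch below implements a bag-of-words Naive Bayes classifier with add-one smoothing. The choice of Naive Bayes, the function names `train`, `classify` and `accuracy`, and the toy data are all assumptions made for illustration; the components and data actually supplied for the practical are not specified here.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-label word counts from (tokens, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log posterior under Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (tokens, label) pairs classified correctly."""
    correct = sum(classify(tokens, model) == label for tokens, label in docs)
    return correct / len(docs)


# Toy data invented for this sketch; not the data supplied in the practical.
TRAIN = [
    ("a wonderful and moving film".split(), "pos"),
    ("great acting and a great script".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("terrible acting and a boring script".split(), "neg"),
]
TEST = [
    ("a great film".split(), "pos"),
    ("boring and terrible".split(), "neg"),
]

model = train(TRAIN)
print(accuracy(model, TEST))
```

Comparing approaches, as the practical asks, could then amount to swapping `classify` for another method and reporting `accuracy` on the same held-out documents.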

Objectives

By the end of the course students should:

  • be able to discuss the current and likely future performance of several NLP applications;

  • be able to describe briefly a fundamental technique for each of several language-processing subtasks, such as morphological processing, parsing and word sense disambiguation;

  • understand how these techniques draw on and relate to other areas of computer science.

Recommended reading

* Jurafsky, D. & Martin, J. (2008) Speech and language processing. Prentice Hall.