Course pages 2011–12

Word Meaning and Discourse Understanding

Principal lecturer: Dr Simone Teufel
Taken by: MPhil ACS, Part III
Code: L113
Hours: 16 (8 × two-hour lecture sessions)
Prerequisites: None, but should be taken in parallel with Introduction to NLP

Aims

This module provides an introduction to NLP research centred around lexical semantics (i.e., aspects of the meaning of words and relations between word meanings) and discourse processing (i.e., the means by which larger pieces of text are structured). Relevant phenomena are described, and algorithms for determining meaning and detecting text structure are presented. The module includes applications of lexical semantics and discourse structure, including dialogue modelling and summarisation.

Syllabus

8 two-hour lecture sessions:

Session 1: Background to lexical semantics and word senses
What is word meaning, and what are word senses? What does a lexicographer do? Psycholinguistic background, lexical relations, linguistic tests.
Session 2: Word Sense Disambiguation, Coherence and Lexical Chains
Supervised and unsupervised methods of determining the sense of a word. How can a computer learn when "bass" is a fish, and when it is a musical instrument? How does a piece of text "hang together" lexically? How can we segment a piece of text according to the topics it discusses?
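To make the "bass" question concrete, here is a minimal Python sketch of the simplified Lesk algorithm. It is a dictionary-based baseline rather than one of the supervised or unsupervised methods named above: it picks the sense whose gloss shares the most content words with the surrounding context. The sense labels, glosses and stopword list are hypothetical, written for illustration only.

```python
# Simplified Lesk: choose the sense whose gloss shares the most content words
# with the context. The glosses below are hypothetical, hand-written for this
# illustration; a real system would take them from a dictionary or sense inventory.
SENSES = {
    "bass_fish": "freshwater or sea fish caught by anglers in a lake or river",
    "bass_music": "low pitched musical instrument or voice played in a band",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "in", "or", "by", "of", "on", "to", "and",
             "he", "she", "his", "her", "was", "is"}


def content_words(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(context, senses=SENSES):
    """Return the sense label whose gloss overlaps most with the context."""
    ctx = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:  # ties go to the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("He caught a huge bass in the river."))  # bass_fish
print(simplified_lesk("She played bass in a jazz band."))      # bass_music
```

Overlap counting of this kind fails as soon as the context and the gloss express the same idea with different words, which is one motivation for learning sense distinctions from corpora instead.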
Session 3: Distributional Semantics and Semantic Spaces
How words can be represented "by the company they keep". The vector space model and dimensionality reduction models such as Latent Semantic Indexing (LSI) and topic models, with applications to Information Retrieval.
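A minimal Python sketch of the "company they keep" idea, assuming a hypothetical six-sentence toy corpus and a context window of two words on either side: each word is represented by counts of its neighbours, and the similarity of two words is the cosine of the angle between their count vectors. No weighting or dimensionality reduction is applied here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, one sentence per string.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fell on the market",
    "the market rallied as stocks rose",
]


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vecs = cooccurrence_vectors(CORPUS)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # 0.923
print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # 0.0
```

Here "cat" and "dog" share the contexts "the", "drinks" and "chases", while "cat" and "stocks" share none. The raw counts are dominated by frequent function words such as "the", which is why practical models usually reweight counts (for example with PMI or tf-idf) before comparing vectors.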
Session 4: Verb Classes and Clustering
Frame Semantics, Semantic Role Labelling. In which respects can verb meanings be similar to each other (e.g., purchase, buy, sell, lend)? How can we represent these similarities?
Session 5: Figurative Language and Antonymy/Sentiment
What are metaphors, metonymies and similes? How can a machine recognise and interpret figurative language? What does it mean for a piece of text to display negative or positive sentiment, and how could it be automatically recognised?
Session 6: Anaphora Resolution and Coreference Resolution
"When your child does not drink milk, try heating/distracting it": how do we know what "it" refers to in each case, and how could a computer know? Three different anaphora resolution models.
Session 7: Rhetorical Relations and AI-related Discourse Processing
Knowledge-based models of discourse structure: Rhetorical Structure Theory, Grosz and Sidner's theory.
Session 8: Scientific Discourse Structure and Summarisation (Applications 2)
Rhetorical Structure and argumentation in science. How can discourse models help in summarisation?

Objectives

On completion of this module, students should:

understand aspects of word meaning such as synonymy, similarity, word senses, and aspects of discourse structure such as co-reference, anaphora, and rhetorical relations
understand how these phenomena relate to the rest of the field of natural language processing
have gained intuition about these phenomena through experimentation with corpora of everyday language
understand automatic methods for representing important aspects of word meaning and for resolving lexical semantic ambiguities
understand what a model of discourse structure is, which linguistic aspects it could be based on, and which algorithms could be used to recognise it
have a deep enough understanding of the models to be able to identify further literature about them (e.g., for a research project), and subsequently reimplement them from descriptions in the research literature
appreciate the role of lexical semantics and discourse structure for practical NLP applications

Coursework

Three sets of ticked coursework (after Sessions 1, 4 and 6), which may include some corpus-based work and basic programming. Exercises will be checked and ticked by Simone Teufel. Together they contribute 30% of the assessment.

Assessment

One extended take-home test, due at the beginning of Lent term, contributing 70% of the assessment.

Computer Laboratory