Course pages 2012–13
Word Meaning and Discourse Understanding
This module provides an introduction to NLP research centred on lexical semantics (i.e., aspects of the meaning of words and relations between word meanings) and discourse processing (i.e., the means by which larger pieces of text are structured). Relevant phenomena are described, and algorithms for determining meaning and detecting text structure are presented. The module also covers applications of lexical semantics and discourse structure, including dialogue modelling and summarisation.
8 two-hour lecture sessions:
- Session 1: Background to lexical semantics and word senses
What is word meaning, and what are word senses? What does a lexicographer do? Psycholinguistic background, lexical relations, linguistic tests.
- Session 2: Word Sense Disambiguation, Coherence and Lexical Chains
Supervised and unsupervised methods of determining the sense of a word. How can a computer learn when "bass" is a fish, and when it is a musical instrument? How does a piece of text "hang together" lexically? How can we segment a piece of text according to the topics it discusses?
- Session 3: Distributional Semantics and Semantic Spaces
How words can be represented "by the company they keep". The vector space model, and dimensionality reduction models. LSI, Topic models, application to Information Retrieval.
- Session 4: Verb Classes and clustering
Frame Semantics, Semantic Role Labelling. In what respects can verb meanings be similar to each other (e.g., purchase, buy, sell, lend)? How can we represent these similarities?
- Session 5: Figurative Language and Antonymy/Sentiment
What are metaphors, metonymies and similes? How can a machine recognise and interpret figurative language? What does it mean for a piece of text to display negative or positive sentiment, and how could it be automatically recognised?
- Session 6: Anaphora Resolution and Coreference Resolution
"When your child does not drink milk, try heating/distracting it": how do we know what "it" refers to in each case? How could a computer work this out? Three different anaphora resolution models.
- Session 7: Rhetorical Relations and AI-related Discourse Processing
Knowledge-based models of discourse structure: Rhetorical Structure Theory and Grosz and Sidner's theory.
- Session 8: Scientific Discourse Structure and Summarisation (Applications 2)
Rhetorical Structure and argumentation in science. How can discourse models help in summarisation?
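As a rough illustration of the Session 3 idea that words can be represented "by the company they keep", the sketch below builds co-occurrence count vectors from a tiny corpus and compares them with cosine similarity. The corpus, the window size of 2, and the function names `context_vectors` and `cosine` are all invented for this example; it is not course material, and the models covered in the lectures (LSI, topic models) go well beyond raw counts.

```python
from collections import Counter
from math import sqrt

# Toy corpus invented for illustration only.
CORPUS = [
    "the fisherman caught a large bass in the river",
    "the fisherman caught a large trout in the river",
    "she played the bass guitar in the band",
    "he played the electric guitar in the band",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(CORPUS)
print("trout vs bass:  ", cosine(vectors["trout"], vectors["bass"]))
print("trout vs guitar:", cosine(vectors["trout"], vectors["guitar"]))
```

On this toy corpus "trout" comes out closer to "bass" than to "guitar", because "trout" and "bass" share the fishing context. Note that "bass" receives a single vector that mixes its fish and instrument uses, which is one reason word senses (Sessions 1 and 2) need separate treatment. A realistic system would also use a far larger corpus, down-weight function words, and typically apply dimensionality reduction.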
On completion of this module, students should:
- understand aspects of word meaning such as synonymy, similarity, word senses, and aspects of discourse structure such as coreference, anaphora, and rhetorical relations
- understand how these phenomena relate to the rest of the field of natural language processing
- have gained intuition about these phenomena by experimentation with corpora of everyday language
- understand automatic methods for representing important aspects of word meaning and for solving lexical semantic-type ambiguities
- understand what a model of discourse structure is, which linguistic aspects it could be based on, and which algorithms could be used to recognise it
- have a deep enough understanding of the models to be able to identify further literature about them (e.g., for a research project), and subsequently reimplement them from descriptions in the research literature
- appreciate the role of lexical semantics and discourse structure for practical NLP applications
Assessment is by coursework as follows:
- One tick-assessed homework (as "practice", in week 2), to count for 20% of the final mark. A student who passes receives 100% for this component.
- One fully assessed homework in week 5, to count for 20% of the final mark, to be marked from 0-100 percent.
- One take-home homework, to count for 60% of the final mark, to be marked from 0-100 percent. This homework will be set in the last week of class; students will submit in the first week of the following term (i.e. students will be given 4-5 weeks to complete it).
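The weighting above can be worked through as a short calculation. This is only a sketch: the function name `final_mark` and the example marks are hypothetical, and treating a failed tick as 0% is an assumption, since the scheme above states only what happens when the tick is passed.

```python
# Component weights in percent, as stated in the assessment scheme above.
WEIGHTS = {"tick": 20, "homework": 20, "take_home": 60}


def final_mark(tick_passed, homework, take_home):
    """Combine the three coursework components into a final percentage.

    ASSUMPTION: a failed tick counts as 0%; the source only defines the pass case.
    `homework` and `take_home` are marks on the 0-100 scale.
    """
    tick = 100 if tick_passed else 0
    total = (
        WEIGHTS["tick"] * tick
        + WEIGHTS["homework"] * homework
        + WEIGHTS["take_home"] * take_home
    )
    return total / 100


# Hypothetical student: tick passed, 70 on the week 5 homework, 65 on the take-home.
print(final_mark(True, 70, 65))  # → 73.0
```

Because the take-home homework carries 60% of the weight, it dominates the final mark: in this hypothetical case it contributes 39 of the 73 points.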
Cruse, A. (2000), Meaning in Language. Oxford University Press. Chapters 5-9, 11.
Jurafsky, D. and Martin, J. H. (2008), Speech and Language Processing, 2nd edition. Prentice Hall. Chapters 19-21.
Additional list of papers (up to 4 per 2-hour session), which will be made available on the web before the start of the module.
L113 Word Meaning and Discourse Understanding cannot be taken in conjunction with P31 Low Power Embedded Systems Programming in 2012-13.