Computer Laboratory

Course pages 2014–15

Discourse Processing

Principal lecturer: Dr Simone Teufel
Taken by: MPhil ACS, Part III
Code: R216
Hours: 16
Prerequisites: L90 Overview of Natural Language Processing and L95 Introduction to Natural Language

Aims

This module introduces language-related phenomena that occur above the sentence level, i.e. those that determine how a text "holds together". It looks at current NLP approaches that recognise such phenomena automatically. Special attention is given to the various types of evaluation. Students read the papers assigned for each session, try out manual discourse annotation, present on one topic of their choice, and are assessed on the basis of a final essay.

Syllabus

The sessions will be seminars rather than lectures: students will be expected to do the assigned reading before the session and come prepared to discuss the material. Most sessions will consist of some general introduction followed by an in-depth examination of some particular piece of research described in one or more papers. Some of these examinations will be given by students.

Seminar Sessions:

  • Session 1: Lexical Coherence. Concept and applications. Topic Segmentation. Lexical Chains.

  • Session 2: Anaphora Resolution. "When your child does not want to drink its milk, try heating/distracting it" -- How do we know what "it" refers to in each case? How could a computer?

  • Session 3: Annotation Methodology. Human Subjectivity. Experimental Design. Agreement metrics suitable for segmentation, classification, degree judgements.

  • Session 4: Discourse Coherence and Entity-based Coherence. Grosz/Sidner. Centering. Entity-based Coherence.

  • Session 5: Information Structure. What is "known" and "new" in a text? Why does it matter? Linguistic realisation. Data-driven methods.

  • Session 6: Rhetorical Relations. Dialogue act determination. RST.

  • Session 7: Deeper Discourse Structure. Plot Units. Human-oriented Memory Retention. Lehnert; Kintsch/van Dijk.

  • Session 8: Discourse-based Applications. Various downstream applications that can profit from discourse analyses, for instance summarisation. Argumentative Zoning -- analysis of scientific text. Legal text analysis.
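
To give a flavour of the lexical coherence methods in Session 1, the following is a minimal sketch of topic segmentation by lexical cohesion, loosely in the spirit of approaches such as TextTiling. It is an illustration under simplifying assumptions, not the method of any assigned paper: each sentence is treated as a bag of lowercase words, no stop-word removal or stemming is applied, and the function names (`bag_of_words`, `gap_similarities`, `lowest_gap`) are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase alphabetic tokens in a sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def gap_similarities(sentences):
    """Similarity across each gap between adjacent sentences."""
    bags = [bag_of_words(s) for s in sentences]
    return [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def lowest_gap(sentences):
    """Index i of the gap (between sentence i and i+1) with least word overlap."""
    sims = gap_similarities(sentences)
    return min(range(len(sims)), key=sims.__getitem__)


sentences = [
    "the cat chased the mouse",
    "the mouse hid from the cat",
    "stock prices fell sharply",
    "prices of stock rose again",
]
# The gap with the lowest similarity is a candidate topic boundary.
boundary = lowest_gap(sentences)
```

A realistic system would compare larger blocks of text, smooth the similarity curve, and choose boundaries by depth of the dips rather than by a single minimum.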

Objectives

On completion of this module, students should:

  • understand the phenomena of anaphora and coreference, and several current NLP approaches to resolving them
  • be able to describe lexically based concepts of coherence and their implementations, e.g. for topic segmentation
  • be able to describe the ideas behind various deeper discourse analyses that rely on continuity between objects or events, or on rhetorical relations
  • be able to manually analyse a text according to the discourse methods introduced in this course, and to statistically measure their agreement with fellow students
  • understand which applications can use the output of a discourse processor to improve system results
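
As an illustration of what statistically measuring agreement can involve for a classification-style annotation task, here is a minimal sketch of Cohen's kappa for two annotators. The choice of metric is an assumption made for illustration; the module may use other agreement measures (for example for segmentation or degree judgements). The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement
    and P_e is the agreement expected by chance, estimated from each
    annotator's own label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two annotators mark four noun phrases as "new" or "known".
annotator_1 = ["new", "known", "new", "known"]
annotator_2 = ["new", "known", "known", "known"]
kappa = cohen_kappa(annotator_1, annotator_2)  # P_o = 0.75, P_e = 0.5
```

In this example the annotators agree on three of four items, but half of that agreement would be expected by chance, so kappa is 0.5 rather than 0.75.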

Coursework

  • Reading about 2-3 papers per week
  • 3-4 short pieces of annotation work (plus voluntary agreement study for some of these)
  • One 20-minute presentation (mandatory, but not marked)

Assessment

An assessed essay (4000 words) on a topic to be negotiated with the lecturer (typically related to the topic chosen for the presentation). Each student will work on a different topic.

Recommended reading

A reading list for each session will be published closer to the start of this module, but interested students can gain an overview of the topics treated here by reading the following chapter in a standard textbook on NLP.

  • Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.