Computer Laboratory

Course pages 2013–14

Discourse and Text Summarisation

Principal lecturer: Dr Simone Teufel
Taken by: MPhil ACS, Part III
Code: L115
Hours: 16 (8 × two-hour lecture sessions)
Prerequisites: L100 Introduction to Natural Language Processing

Aims

This module introduces text summarisation research, in particular discourse-oriented text summarisation. There are four lectures on the topic, after which students implement and evaluate their own discourse-based text summarisation system in four practical sessions, which are demonstrated by students. The discourse processing methods needed for the summarisation system are introduced, and special attention is given to the various types of evaluation.

Syllabus

Taught component:

  • Session 1: Overview of summarisation techniques. Summarisation evaluation.
  • Session 2: Lexical Coherence. Topic Segmentation. Information Status. Grosz/Sidner.
  • Session 3: Anaphora Resolution and Coreference Resolution. Entity-based coherence. Centering. "When your child does not drink milk, try heating/distracting it" — how do we know what "it" refers to in each case? How could a computer?
  • Session 4: Rhetorical Relations. Dialogue act determination. Plot Units. Kintsch/van Dijk.

Practical component:

  • Session 5: Build an extractive summariser based on frequency statistics.
  • Session 6: Build a simple trained anaphora resolution module.
  • Session 7: Include a recogniser for expletive anaphora. Evaluate.
  • Session 8: Combine previous implementations into an anaphora-based summariser. There are several options that students can choose from. Run the final evaluation.
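
As a rough illustration of the kind of baseline Session 5 refers to, the sketch below scores each sentence by the average document frequency of its content words and extracts the top-scoring sentences. It is a minimal sketch, not the course's reference solution: the function names, the tiny stopword list, the regex-based sentence splitter, and the length-normalised scoring are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "are"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarise(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average frequency of the sentence's content words,
        # so that long sentences are not favoured merely for their length.
        tokens = tokenize(sentence)
        return sum(freqs[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
    print(summarise(doc, n_sentences=2))
```

Restoring document order after ranking keeps the extract readable. An anaphora-based variant, as in Session 8, could for instance resolve pronouns before counting, so that mentions of the same entity contribute to the same frequency.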

Objectives

On completion of this module, students should:

  • understand the phenomena of anaphora and coreference and be able to implement a simple resolution method;
  • be able to describe the ideas behind various discourse-based methods of text summarisation, that is, methods that rely on continuity between objects or events;
  • understand how text summarisers can be evaluated and have some practical experience in doing so;
  • have implemented a prototype of a text summariser applying some of the ideas in this course.

Coursework

None.

Practical work

Implement a simple anaphora-based summariser, in four demonstrated sessions.

Assessment

An assessed report on the implementation and evaluation of the student's own anaphora-based summariser.

Recommended reading

  • Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.
  • Mani, I. (2001), Automatic Summarization, John Benjamins Publishing.