Computer Laboratory

Course pages 2015–16

Discourse Processing

Principal lecturer: Dr Simone Teufel
Taken by: MPhil ACS, Part III
Code: R216
Hours: 16
Prerequisites: L90 Overview of Natural Language Processing and L95 Introduction to Natural Language

Aims

This module introduces language-related phenomena that occur above the sentence level, i.e. those that determine how a text "holds together". It looks at current NLP approaches to recognising such phenomena automatically. Special attention is given to the various types of evaluation. Students read the papers assigned for each session, try out discourse annotation by hand, present one topic of their choice, and are assessed on the basis of a final essay.

Syllabus

The sessions will be seminars rather than lectures: students will be expected to do the assigned reading before the session and come prepared to discuss the material. Most sessions will consist of some general introduction followed by an in-depth examination of some particular piece of research described in one or more papers. Some of these examinations will be given by students.

Seminar topics:

  • Lexical Coherence. Concept and applications. Topic Segmentation. Lexical Chains.

  • Anaphora Resolution. "When your child does not want to drink its milk, try heating/distracting it" -- How do we know what "it" refers to in each case? How could a computer?

  • Discourse Coherence and Entity-based Coherence. Information Structure. What is "known" and "new" in a text? Linguistic realisation. Centering. Entity-based Coherence.

  • Rhetorical Relations. Dialogue act determination. RST.

  • Annotation Methodology. Human Subjectivity. Experimental Design. Agreement metrics suitable for segmentation, classification, degree judgements.

  • Discourse Structure in Scientific Writing. Argumentative Zoning -- analysis of scientific text. Connection with citation analysis. Citation block determination.

  • Deeper Discourse Structure. Plot Units. Human-oriented Memory retention. Lehnert; Kintsch/van Dijk. Summarisation and discourse structure.
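
As a rough illustration of the first topic (lexical coherence applied to topic segmentation), the sketch below scores each gap between sentences by the word overlap of the text on either side; a low score suggests a topic shift. It is a simplified, assumption-laden toy in the spirit of lexical-cohesion segmenters, not the method of any assigned paper. The function names, the whitespace tokenisation and the `window` parameter are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, window=2):
    """Lexical similarity across each gap between adjacent sentences.

    For the gap before sentence i, compare the `window` sentences
    ending at i-1 with the `window` sentences starting at i.
    Tokenisation is naive lower-cased whitespace splitting (an assumption).
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def most_likely_boundary(sentences, window=2):
    """Index of the sentence that follows the least cohesive gap."""
    scores = gap_similarities(sentences, window)
    return scores.index(min(scores)) + 1


if __name__ == "__main__":
    text = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "stock markets fell sharply today",
        "markets and stock prices fell",
    ]
    print(gap_similarities(text))
    print(most_likely_boundary(text))
```

Real systems typically add stemming, stop-word removal and smoothing of the gap scores, and place boundaries at several local minima rather than only at the single lowest one.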

Objectives

On completion of this module, students should:

  • understand the phenomena of anaphora and coreference, and several current NLP approaches to resolving them
  • be able to describe lexically based concepts of coherence and their implementations, e.g. in order to perform topic segmentation
  • be able to describe the ideas behind various deeper discourse analyses that rely on continuity between objects or events, or on rhetorical relations
  • be able to manually analyse a text according to the discourse methods introduced in this course, and to statistically measure their agreement with fellow students
  • understand which applications can use the output of a discourse processor to improve system results
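
As one example of measuring agreement statistically on a classification-style annotation task, the sketch below computes Cohen's kappa for two annotators: kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed proportion of items labelled identically and P_e is the agreement expected by chance from each annotator's label distribution. The function name and the category labels are hypothetical, and kappa is only one of several metrics; segmentation and degree judgements generally call for different measures.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    labels_a and labels_b are equal-length sequences of category labels,
    aligned item by item.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' relative frequencies, summed over categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout;
        # kappa is undefined, so treat it as perfect agreement (an assumption).
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels from two annotators for four text units.
    annotator_1 = ["X", "Y", "Y", "Z"]
    annotator_2 = ["X", "Y", "Z", "Z"]
    print(cohen_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three of four items (P_o = 0.75), chance agreement is 5/16, and kappa is therefore 7/11, or roughly 0.64.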

Coursework

  • Reading about 2-3 papers per week
  • 3-4 short pieces of annotation work (plus voluntary agreement study for some of these)
  • One 20-minute presentation (mandatory, but not marked)

Assessment

An assessed essay (4000 words) on a topic to be negotiated with the lecturer (typically related to the topic chosen for the presentation). Each student will work on a different topic.

Recommended reading

This is a literature-based seminar, so the course materials give detailed reading for each session.

A short (but not full-coverage) overview of the topics treated here can be found in the following chapter in a standard textbook on NLP:

  • Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.