Computer Laboratory

Course pages 2015–16

Discourse Processing

Reading List

Session 1: Introduction and Overview

Core Reading:

  • Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.

Session 2: Topic Segmentation

Core Reading:

Deep Reading:

Session 3: Anaphora Resolution

Core Reading:

Deep Reading:

Session 4: Centering and Entity Coherence

Core Reading:

Deep Reading:

Session 5: Rhetorical Structure Theory

Core Reading:

Deep Reading:

Session 6: Annotation Methodology

Core Reading:

Deep Reading:

Session 7: Argumentative Zoning

Core Reading:

Deep Reading:

Session 8: Summarisation and Discourse Structure

Core Reading:

Deep Reading:

Annotation Task

Consider the following two texts:

Please do the following tasks with these texts:

WEEK 2 : Segment each text into topics, and give each segment a name (its topic).

Your annotations:

Text 1:

Text 2:

WEEK 3 : Perform co-reference resolution on one of the texts. For each entity in the text (generally each noun phrase, with some non-NPs slipping in), decide whether it co-refers with another entity in the text. Express the resulting equivalence classes in a way you find intuitive, e.g., by giving each class of co-referring entities a number.
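One possible way to record the equivalence classes, as a minimal sketch: number the mentions, note each pairwise co-reference decision, and merge the pairs into numbered classes with a union-find structure. The mention strings and links below are invented placeholders, not taken from the course texts, and the function name is hypothetical.

```python
def coreference_classes(mentions, links):
    """Return one class number per mention, given pairwise co-reference links.

    mentions: list of mention strings, in text order (placeholder data here).
    links:    list of (i, j) index pairs judged to co-refer.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the class representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the earliest mention as the representative of the merged class.
            parent[max(ra, rb)] = min(ra, rb)

    class_ids = {}
    labels = []
    for i in range(len(mentions)):
        root = find(i)
        if root not in class_ids:
            class_ids[root] = len(class_ids) + 1  # classes numbered from 1
        labels.append(class_ids[root])
    return labels


# Invented example: "he" co-refers with "the driver", "it" with "a red car".
mentions = ["the driver", "a red car", "he", "it", "the road"]
links = [(0, 2), (1, 3)]
labels = coreference_classes(mentions, links)
for label, mention in zip(labels, mentions):
    print(label, mention)
```

Mentions that co-refer with nothing else end up in a singleton class, so every entity still receives a number.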

Your annotations:

Text 1:

Text 2:

WEEK 4 : We now want to perform Centering Theory processing on your annotations from week 3. What are the forward-looking centers of each sentence? What is its backward-looking center? Which entities are repeated at all across the boundary between two adjacent sentences?
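The three questions can be sketched in code under some loudly stated assumptions: each sentence is given as a list of co-reference class numbers (as in week 3), already ordered by salience, and the backward-looking center of a sentence is taken to be the highest-ranked forward-looking center of the previous sentence that is realised again. The salience ordering itself (for example, subject before object) is an assumption of this sketch; the example data is invented.

```python
def centering(sentences):
    """Compute Cf, Cb and repeated entities for a sequence of sentences.

    sentences: list of lists of entity ids, each list ordered by assumed salience.
    """
    results = []
    prev_cf = []
    for cf in sentences:
        # Cb: highest-ranked entity of the previous Cf that recurs in this sentence.
        cb = next((e for e in prev_cf if e in cf), None)
        # Entities repeated across the boundary with the previous sentence.
        shared = [e for e in cf if e in prev_cf]
        results.append({"Cf": cf, "Cb": cb, "shared": shared})
        prev_cf = cf
    return results


# Invented example with entity classes 1-4.
sentences = [[1, 2], [1, 3], [3, 2], [4]]
analysis = centering(sentences)
for number, row in enumerate(analysis, start=1):
    print(number, row)
```

The first sentence has no backward-looking center because nothing precedes it, and a sentence that shares no entity with its predecessor likewise gets `None`.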

Your annotations:

Text 1 (Fire):

Text 2 (Cars):

WEEK 5 : Perform an RST analysis of several sentences of your chosen text (preferably from the top, so that we get a lot of shared annotations). Using the suggested RST annotation tool (or working on paper and scanning your analysis), please perform the following tasks, which are involved in any RST analysis:

  • Segment the text into EDUs (elementary discourse units), i.e., smallest units which perform a rhetorical function (typically one clause, but could sometimes be an NP or other part of the sentence). Deciding what an EDU is can be the hardest part of an RST analysis.
  • Determine the relationship of each EDU to its surrounding context, either other EDUs or parts of the RST tree you are currently building.
  • Connect every level of the RST tree up so that you entirely cover the text.
  • Note that RST tree building corresponds to hierarchical clustering of a text.
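As a rough illustration of the steps above, an RST analysis can be stored as a tree whose leaves are EDUs and whose internal nodes carry a relation label. The class and function names, the relation labels and the three-EDU example are all hypothetical choices for this sketch, not part of the suggested annotation tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    relation: str = ""        # relation joining the children; empty for a leaf
    children: List["Node"] = field(default_factory=list)
    edu: int = -1             # EDU index for a leaf, -1 for an internal node
    nuclearity: str = "N"     # "N" (nucleus) or "S" (satellite) relative to the parent


def leaves(node):
    """Return the EDU indices under a node, in left-to-right order."""
    if not node.children:
        return [node.edu]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def covers_text(root, n_edus):
    """True if the tree spans EDUs 0..n_edus-1 exactly once, in text order."""
    return leaves(root) == list(range(n_edus))


# Invented analysis of three EDUs: EDUs 1 and 2 are grouped first,
# then that span is attached to EDU 0.
tree = Node("Elaboration", [
    Node(edu=0),
    Node("Cause", [Node(edu=1, nuclearity="S"), Node(edu=2)], nuclearity="S"),
])
print(leaves(tree), covers_text(tree, 3))
```

The coverage check mirrors the requirement that the finished tree connects every level and spans the whole analysed passage.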

Your annotations:

Text 1 (Fire):

Text 2 (Cars):

WEEK 7 : Please perform an AZ analysis of the following text:

9912016.A.xml.

You should acquaint yourself with the seven AZ categories by reading the relevant parts of Teufel/Moens (2002), and then assign each sentence one of the seven labels. A good way to represent your annotation is to list each sentence ID with your chosen label, one per line, in a file. It is not necessary to annotate the entire text, but please all start from the top of the file so that we have enough overlapping text. Please send your annotations to me as soon as you can, so that we have at least three annotations on which others can calculate kappa (Fleiss or Cohen, as you like) and alpha (with your chosen distance metric). Show the intermediate steps in your calculation, i.e., don't just feed the annotations through an existing implementation.
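To make the intermediate steps concrete, here is a minimal sketch of Fleiss' kappa for three annotators that returns every intermediate quantity rather than only the final score. It is meant as a way to check a hand calculation, not as a substitute for one. The four-sentence annotation table is invented, and the function name and result keys are hypothetical.

```python
from collections import Counter


def fleiss_kappa(annotations):
    """Fleiss' kappa with intermediate values.

    annotations: one list per sentence, holding one label per annotator.
    Every sentence must have the same number of annotators (at least two).
    """
    n_items = len(annotations)
    n_raters = len(annotations[0])
    counts = [Counter(item) for item in annotations]

    # Step 1: per-sentence agreement P_i = (sum_j n_ij^2 - n) / (n (n - 1)).
    p_items = [
        (sum(c * c for c in count.values()) - n_raters) / (n_raters * (n_raters - 1))
        for count in counts
    ]
    # Step 2: observed agreement is the mean of the P_i.
    p_bar = sum(p_items) / n_items

    # Step 3: proportion of all assignments that went to each category.
    totals = Counter()
    for count in counts:
        totals.update(count)
    p_cats = {label: total / (n_items * n_raters) for label, total in totals.items()}

    # Step 4: chance agreement P_e = sum_j p_j^2 (kappa is undefined if P_e == 1).
    p_e = sum(p * p for p in p_cats.values())

    kappa = (p_bar - p_e) / (1 - p_e)
    return {"P_i": p_items, "P_bar": p_bar, "p_j": p_cats, "P_e": p_e, "kappa": kappa}


# Invented labels for four sentences from three annotators.
annotations = [
    ["AIM", "AIM", "AIM"],
    ["OWN", "OWN", "BACKGROUND"],
    ["OWN", "OWN", "OWN"],
    ["BACKGROUND", "BACKGROUND", "OWN"],
]
result = fleiss_kappa(annotations)
for name, value in result.items():
    print(name, value)
```

For this toy table the observed agreement is 2/3 and the chance agreement is 0.375, giving a kappa of 7/15, or about 0.47.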

Your annotations: