Course pages 2016–17
Discourse Processing
Reading List
Session 1: Introduction and Overview
Core Reading:
- Steven Pinker, The Sense of Style (2014). Chapter 5: Arcs of Coherence.
- Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.
Session 2: Topic Segmentation
Core reading:
- Discourse Segmentation of Multi-Party Conversation by Michel Galley, Kathleen McKeown, Eric Fosler-Lussier and Hongyan Jing (ACL 2003)
- TextTiling: Segmenting text into multi-paragraph subtopic passages. Hearst (1997), Computational Linguistics
Deep reading:
- Text Segmentation with Topic Models by Martin Riedl and Chris Biemann, JLCL, 2012.
- Steyvers, Griffiths (2007): Probabilistic Topic Models; in: Handbook of Latent Semantic Analysis.
Session 3: Anaphora Resolution
Core Reading:
- A statistical approach to anaphora resolution, Ge, Hale, Charniak (1998), Workshop on Very Large Corpora
- Exploring Lexicalized Features for Coreference Resolution, Björkelund, Nugues (2011), CoNLL.
Deep Reading:
- Stanford's Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task, Lee et al. (2011), CoNLL.
- An Algorithm for Pronominal Anaphora Resolution, Lappin, Leass (1994), Computational Linguistics
Session 4: Centering and Entity Coherence
Core Reading:
- Modeling local coherence: An entity-based approach. R Barzilay, M Lapata, Computational Linguistics. 2008.
- Centering: A framework for modeling the local coherence of discourse. BJ Grosz, S Weinstein, AK Joshi, Computational linguistics. 1995.
Deep Reading:
- Supplementing Entity Coherence with Local Rhetorical Relations for Information Ordering. Nikiforos Karamanis. Journal of Logic Language and Information, 2007.
- Using entity-based features to model coherence in student essays. Burstein et al. (2010), HLT-10.
Session 5: Rhetorical Structure Theory
Core Reading:
- Rhetorical structure theory: Toward a functional theory of text organization. WC Mann, SA Thompson (1988). Text: Interdisciplinary Journal for the Study of Discourse.
- HILDA: a discourse parser using support vector machine classification. H Hernault, H Prendinger (2010), Dialogue and Discourse.
Deep Reading:
- The rhetorical parsing of natural language texts. Daniel Marcu. ACL/EACL-97.
- Disambiguating Rhetorical Structure. Manfred Stede. Research on Language and Computation (2008).
Session 6: Annotation Methodology
Core Reading:
- Assessing agreement on classification tasks: the kappa statistic. Carletta (1996), Computational Linguistics
- Chapter 8.1, The Structure of Scientific Articles, Teufel (2010), CSLI publications.
Deep Reading:
- A critique and improvement of an evaluation metric for text segmentation. Pevzner, Hearst (2002), Computational Linguistics
Session 7: Argumentative Zoning
Core Reading:
- Summarizing scientific articles: experiments with relevance and rhetorical status. S Teufel, M Moens (2002), Computational linguistics.
Deep Reading:
- Towards discipline-independent argumentative zoning: evidence from chemistry and computational linguistics. S Teufel, A Siddharthan, C Batchelor. EMNLP-2009
- Citation Block Determination using Textual Coherence, Kaplan, Tokunaga, Teufel. Journal of Information Processing, 2016.
Session 8: Summarisation and Discourse Structure
Core Reading:
- Toward a model of text comprehension and production. W Kintsch, TA Van Dijk. Psychological review, 1978
- A summariser based on human memory limitations and lexical competition. Y Fang, S Teufel (2014), EACL.
Deep Reading:
- Causal reasoning in the comprehension of simple narrative texts. CR Fletcher, CP Bloom. Journal of Memory and Language, 1988.
Annotation Task
Consider the following two texts:
- Text 1: A splint, a spark
- Text 2: The motor car
Please do the following tasks with these texts:
WEEK 2 : Segment the texts into topics; give each segment a name (its topic). Please send me your analyses by email, and I will put them up on this website.
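If you want to sanity-check your manual segmentation, the lexical-cohesion intuition behind TextTiling (Hearst 1997, Session 2) can be sketched in a few lines: score each gap between adjacent sentence blocks by vocabulary overlap, and treat low-similarity valleys as candidate boundaries. This is a toy illustration (hand-made sentences, naive whitespace tokenisation), not the full algorithm with smoothing and depth scores.

```python
def block_similarity(block_a, block_b):
    """Cosine-like overlap between the vocabularies of two sentence blocks."""
    words_a = set(w.lower() for s in block_a for w in s.split())
    words_b = set(w.lower() for s in block_b for w in s.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / (len(words_a) ** 0.5 * len(words_b) ** 0.5)

def gap_scores(sentences, k=2):
    """Similarity at each gap i (between sentences i-1 and i), using k-sentence blocks."""
    scores = []
    for i in range(1, len(sentences)):
        left = sentences[max(0, i - k):i]
        right = sentences[i:i + k]
        scores.append((i, block_similarity(left, right)))
    return scores

# Hypothetical mini-text with an obvious topic break after sentence 2.
sents = [
    "Matches need a splint and a chemical head.",
    "The splint is cut from soft wood.",
    "Cars run on petrol engines.",
    "The motor car changed city transport.",
]
boundary = min(gap_scores(sents), key=lambda t: t[1])[0]  # gap with lowest cohesion
```

On this toy input, the lowest-similarity gap falls between the match sentences and the car sentences, i.e. at the topic boundary.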
WEEK 3 : Perform anaphora resolution on text 2. Steps: a) identify each anaphor; b) indicate which other referring expression it corefers with.
Out of all possible anaphors, please concentrate on only two types: a) pronouns, b) definite noun phrases (noun phrases starting with "the").
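As a point of comparison for your manual annotation, here is a deliberately naive resolver in the spirit of the salience-based approaches from Session 3 (e.g. Lappin & Leass 1994): pick the most recent preceding mention that agrees in number. The mentions and their features are supplied by hand; this is an illustrative sketch, not a usable system.

```python
# Number feature of a few pronouns (toy inventory).
PRONOUN_NUMBER = {"it": "sg", "he": "sg", "she": "sg", "they": "pl"}

def resolve(pronoun, preceding_mentions):
    """Return the most recent preceding mention agreeing in number.

    preceding_mentions: list of (surface, number) pairs, oldest first.
    """
    number = PRONOUN_NUMBER[pronoun.lower()]
    for surface, num in reversed(preceding_mentions):
        if num == number:
            return surface
    return None

# Hypothetical mentions preceding the pronoun "it".
mentions = [("the motor car", "sg"), ("its wheels", "pl")]
antecedent = resolve("it", mentions)  # skips the plural mention, lands on the car
```

Recency plus agreement already resolves many pronouns correctly; the readings show what the additional salience factors buy you.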
WEEK 4 : Simulate the Centering algorithm on the first six consecutive sentences of the "invention of matches" text. You can use a "standard" definition of anaphora resolution: resolve all pronouns and definite noun phrases. Then construct forward-looking and backward-looking centers for each sentence, and decide which type of shift was performed.
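The transition classification step can be sketched as follows, assuming you have already built the ranked Cf lists by hand (e.g. ranked by grammatical role) from your anaphora-resolved sentences. The example utterances and entity names are invented for illustration.

```python
def backward_center(cf_prev, cf_curr):
    """Cb(U_n): the highest-ranked element of Cf(U_{n-1}) realised in U_n."""
    for entity in cf_prev:
        if entity in cf_curr:
            return entity
    return None

def transition(cf_prev, cb_prev, cf_curr):
    """Classify the transition into U_n (Grosz, Joshi & Weinstein 1995)."""
    cb = backward_center(cf_prev, cf_curr)
    cp = cf_curr[0] if cf_curr else None  # preferred center: top of Cf(U_n)
    if cb is None:
        return cb, "NO-CB"
    if cb_prev is None or cb == cb_prev:
        return cb, "CONTINUE" if cb == cp else "RETAIN"
    return cb, "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"

# U1: "John struck a match."   U2: "It flared brightly."
cf1 = ["John", "match"]       # Cf(U1), subject ranked first
cf2 = ["match"]               # Cf(U2): "it" resolved to the match
cb2, t2 = transition(cf1, None, cf2)
```

Here Cb(U2) is the match, which is also the preferred center of U2, so the transition is a CONTINUE.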
WEEK 5: Provide an RST analysis of the first six consecutive sentences of "motor car". (It is probably easiest to write on paper and scan it; alternatively, here are instructions for how to run an RST-tree drawing tool that previous years' students used.)
WEEK 8: Perform a Kintsch and van Dijk-style proposition tree manipulation for the first five sentences of "invention of fire lighting". To do so, you first have to create propositions. Please use your own best guess of what a proposition could look like, using Kintsch and van Dijk's propositions in the article as your guide. You then have to draw trees representing which argument overlap is the best fit for each proposition. As there is no title, you can choose a usable proposition from the first sentence at random (verbal propositions often work best). You proceed sentence by sentence, working off the propositions in the order you created them. When all propositions for one sentence have been worked off, you apply the Leading-Edge forgetting rule and move on to the next sentence. The "summary" of the text consists of the propositions that were retained over the most memory cycles.
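The memory-cycle bookkeeping can be sketched as follows. This is a heavily simplified toy: the propositions are hand-written tuples, and the Leading-Edge rule is approximated by keeping only the most recently attached propositions in the buffer, so it should guide, not replace, your hand simulation.

```python
BUFFER_SIZE = 2  # working-memory capacity (K&vD's parameter s, toy value)

def simulate(sentences):
    """sentences: one list of proposition tuples per sentence.

    Returns, for each proposition, the number of memory cycles it spent
    in the buffer (its retention count).
    """
    buffer, cycles = [], {}
    for props in sentences:                 # one memory cycle per sentence
        buffer.extend(props)                # new propositions enter working memory
        for p in buffer:
            cycles[p] = cycles.get(p, 0) + 1
        buffer = buffer[-BUFFER_SIZE:]      # crude stand-in for Leading-Edge forgetting
    return cycles

# Hypothetical propositions for two sentences.
sents = [
    [("strike", "John", "match"), ("flare", "match")],
    [("light", "match", "fire")],
]
retention = simulate(sents)
# The propositions with the highest retention counts form the "summary".
```

Propositions carried over between cycles accumulate higher counts than ones introduced late, which is exactly why the retained propositions read as a summary.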
Your annotations:
Week 2 (Topic Segmentation):
- pjc211
- lfmh2: Text 1 and Text 2
Week 3 (Pronoun Resolution):
Week 4 (Centering):
Week 5 (RST):