Course pages 2016–17
Discourse Processing
Reading List
Session 1: Introduction and Overview
Core Reading:
- Steven Pinker, The Sense of Style (2014). Chapter 5: Arcs of Coherence.
- Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.
Session 2: Topic Segmentation
Core reading:
- Discourse Segmentation of Multi-Party Conversation by Michel Galley, Kathleen McKeown, Eric Fosler-Lussier and Hongyan Jing (ACL 2003)
- TextTiling: Segmenting text into multi-paragraph subtopic passages. Hearst (1997), Computational Linguistics
Deep reading:
- Text Segmentation with Topic Models by Martin Riedl and Chris Biemann, JLCL, 2012.
- Steyvers, Griffiths (2007): Probabilistic Topic Models; in: Handbook of Latent Semantic Analysis.
Session 3: Anaphora Resolution
Core Reading:
- A statistical approach to anaphora resolution, Ge, Hale, Charniak (1998), Workshop on Very Large Corpora
- Exploring Lexicalized Features for Coreference Resolution, Björkelund, Nugues (2011), CoNLL.
Deep Reading:
- Stanford's Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task, Lee et al. (2011), CoNLL.
- An Algorithm for Pronominal Anaphora Resolution, Lappin, Leass (1994), Computational Linguistics
Session 4: Centering and Entity Coherence
Core Reading:
- Modeling local coherence: An entity-based approach. R Barzilay, M Lapata, Computational Linguistics. 2008.
- Centering: A framework for modeling the local coherence of discourse. BJ Grosz, S Weinstein, AK Joshi, Computational linguistics. 1995.
Deep Reading:
- Supplementing Entity Coherence with Local Rhetorical Relations for Information Ordering. Nikiforos Karamanis. Journal of Logic Language and Information, 2007.
- Using entity-based features to model coherence in student essays. Burstein et al. (2010), HLT-10.
Session 5: Rhetorical Structure Theory
Core Reading:
- Rhetorical structure theory: Toward a functional theory of text organization. WC Mann, SA Thompson (1988). Text: Interdisciplinary Journal for the Study of Discourse.
- HILDA: a discourse parser using support vector machine classification. H Hernault, H Prendinger (2010), Dialogue and Discourse.
Deep Reading:
- The rhetorical parsing of natural language texts. Daniel Marcu. ACL/EACL-97.
- Disambiguating Rhetorical Structure. Manfred Stede. Research on Language and Computation (2008).
Session 6: Annotation Methodology
Core Reading:
- Assessing agreement on classification tasks: the kappa statistic. Carletta (1996), Computational Linguistics
- Chapter 8.1, The Structure of Scientific Articles, Teufel (2010), CSLI publications.
Deep Reading:
- A critique and improvement of an evaluation metric for text segmentation. Pevzner, Hearst (2002), Computational Linguistics
Session 7: Argumentative Zoning
Core Reading:
- Summarizing scientific articles: experiments with relevance and rhetorical status. S Teufel, M Moens (2002), Computational linguistics.
Deep Reading:
- Towards discipline-independent argumentative zoning: evidence from chemistry and computational linguistics. S Teufel, A Siddharthan, C Batchelor. EMNLP-2009
- Citation Block Determination using Textual Coherence, Kaplan, Tokunaga, Teufel. Journal of Information Processing, 2016.
Session 8: Summarisation and Discourse Structure
Core Reading:
- Toward a model of text comprehension and production. W Kintsch, TA Van Dijk. Psychological review, 1978
- A summariser based on human memory limitations and lexical competition. Y Fang, S Teufel (2014), EACL.
Deep Reading:
- Causal reasoning in the comprehension of simple narrative texts. CR Fletcher, CP Bloom. Journal of Memory and Language, 1988.
Annotation Task
Consider the following two texts:
- Text 1: A splint, a spark
- Text 2: The motor car
Please do the following tasks with these texts:
WEEK 2 : Segment the texts into topics; give each segment a name (its topic). Please send me your analyses by email, and I will put them up on this website.
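If you want to sanity-check your manual segmentation, the lexical-cohesion intuition behind TextTiling (Hearst 1997, Session 2) can be sketched in a few lines: score each gap between adjacent sentence blocks by vocabulary overlap, and treat low-similarity valleys as candidate boundaries. This is a toy illustration (hand-made sentences, naive whitespace tokenisation), not the full algorithm with smoothing and depth scores.

```python
def block_similarity(block_a, block_b):
    """Cosine-like overlap between the vocabularies of two sentence blocks."""
    words_a = set(w.lower() for s in block_a for w in s.split())
    words_b = set(w.lower() for s in block_b for w in s.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / (len(words_a) ** 0.5 * len(words_b) ** 0.5)

def gap_scores(sentences, k=2):
    """Similarity at each gap i (between sentences i-1 and i), using k-sentence blocks."""
    scores = []
    for i in range(1, len(sentences)):
        left = sentences[max(0, i - k):i]
        right = sentences[i:i + k]
        scores.append((i, block_similarity(left, right)))
    return scores

# Hypothetical mini-text with an obvious topic break after sentence 2.
sents = [
    "Matches need a splint and a chemical head.",
    "The splint is cut from soft wood.",
    "Cars run on petrol engines.",
    "The motor car changed city transport.",
]
boundary = min(gap_scores(sents), key=lambda t: t[1])[0]  # gap with lowest cohesion
```

On this toy input, the lowest-similarity gap falls between the match sentences and the car sentences, i.e. at the topic boundary.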
WEEK 3 : Perform anaphora resolution on text 2. Steps: a) identify each anaphor; b) indicate which other referring expression it corefers with.
Out of all possible anaphors, please concentrate on only two types: a) pronouns, b) definite noun phrases (noun phrases starting with "the").
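As a point of comparison for your manual annotation, here is a deliberately naive resolver in the spirit of the salience-based approaches from Session 3 (e.g. Lappin & Leass 1994): pick the most recent preceding mention that agrees in number. The mentions and their features are supplied by hand; this is an illustrative sketch, not a usable system.

```python
# Number feature of a few pronouns (toy inventory).
PRONOUN_NUMBER = {"it": "sg", "he": "sg", "she": "sg", "they": "pl"}

def resolve(pronoun, preceding_mentions):
    """Return the most recent preceding mention agreeing in number.

    preceding_mentions: list of (surface, number) pairs, oldest first.
    """
    number = PRONOUN_NUMBER[pronoun.lower()]
    for surface, num in reversed(preceding_mentions):
        if num == number:
            return surface
    return None

# Hypothetical mentions preceding the pronoun "it".
mentions = [("the motor car", "sg"), ("its wheels", "pl")]
antecedent = resolve("it", mentions)  # skips the plural mention, lands on the car
```

Recency plus agreement already resolves many pronouns correctly; the readings show what the additional salience factors buy you.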
WEEK 4 : Simulate the Centering algorithm on the first six consecutive sentences of the "invention of matches" text. You can use a "standard" definition of anaphora resolution: resolve all pronouns and definite noun phrases. Then construct forward-looking and backward-looking centers for each sentence, and decide which type of shift was performed.
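The transition classification step can be sketched as follows, assuming you have already built the ranked Cf lists by hand (e.g. ranked by grammatical role) from your anaphora-resolved sentences. The example utterances and entity names are invented for illustration.

```python
def backward_center(cf_prev, cf_curr):
    """Cb(U_n): the highest-ranked element of Cf(U_{n-1}) realised in U_n."""
    for entity in cf_prev:
        if entity in cf_curr:
            return entity
    return None

def transition(cf_prev, cb_prev, cf_curr):
    """Classify the transition into U_n (Grosz, Joshi & Weinstein 1995)."""
    cb = backward_center(cf_prev, cf_curr)
    cp = cf_curr[0] if cf_curr else None  # preferred center: top of Cf(U_n)
    if cb is None:
        return cb, "NO-CB"
    if cb_prev is None or cb == cb_prev:
        return cb, "CONTINUE" if cb == cp else "RETAIN"
    return cb, "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"

# U1: "John struck a match."   U2: "It flared brightly."
cf1 = ["John", "match"]       # Cf(U1), subject ranked first
cf2 = ["match"]               # Cf(U2): "it" resolved to the match
cb2, t2 = transition(cf1, None, cf2)
```

Here Cb(U2) is the match, which is also the preferred center of U2, so the transition is a CONTINUE.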
WEEK 5: Provide an RST analysis of the first six consecutive sentences of "motor car". (It is probably easiest to write on paper and scan it; alternatively, here are instructions for how to run an RST-tree drawing tool that previous years' students used.)
WEEK 8: Perform a Kintsch and van Dijk-style proposition tree manipulation for the first five sentences of "invention of fire lighting". To do so, you first have to create propositions. Please use your own best guess of what a proposition could look like, using Kintsch and van Dijk's propositions in the article as your guide. You then have to draw trees representing which argument overlap is the best fit for each proposition. As there is no title, you can choose a usable proposition from the first sentence at random (verbal propositions often work best). You proceed sentence by sentence, working off the propositions in the order you created them. When all propositions for one sentence have been worked off, you apply the Leading-Edge forgetting rule and move on to the next sentence. The "summary" of the text consists of the propositions that were retained over the most memory cycles.
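The memory-cycle bookkeeping can be sketched as follows. This is a heavily simplified toy: the propositions are hand-written tuples, and the Leading-Edge rule is approximated by keeping only the most recently attached propositions in the buffer, so it should guide, not replace, your hand simulation.

```python
BUFFER_SIZE = 2  # working-memory capacity (K&vD's parameter s, toy value)

def simulate(sentences):
    """sentences: one list of proposition tuples per sentence.

    Returns, for each proposition, the number of memory cycles it spent
    in the buffer (its retention count).
    """
    buffer, cycles = [], {}
    for props in sentences:                 # one memory cycle per sentence
        buffer.extend(props)                # new propositions enter working memory
        for p in buffer:
            cycles[p] = cycles.get(p, 0) + 1
        buffer = buffer[-BUFFER_SIZE:]      # crude stand-in for Leading-Edge forgetting
    return cycles

# Hypothetical propositions for two sentences.
sents = [
    [("strike", "John", "match"), ("flare", "match")],
    [("light", "match", "fire")],
]
retention = simulate(sents)
# The propositions with the highest retention counts form the "summary".
```

Propositions carried over between cycles accumulate higher counts than ones introduced late, which is exactly why the retained propositions read as a summary.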
Your annotations:
Week 2 (Topic Segmentation):
- pjc211
- lfmh2: Text 1 and Text 2
Week 3 (Pronoun Resolution):
Week 4 (Centering):
Week 5 (RST):