Discourse Processing
Reading List
Session 1: Introduction and Overview
Core Reading:
- Jurafsky and Martin (2008), Speech and Language Processing, 2nd Edition, chapter 21.
Session 2: Topic Segmentation
Core Reading:
- Text Segmentation with Topic Models. Martin Riedl, Chris Biemann (2012), JLCL.
Deep Reading:
- TextTiling: Segmenting text into multi-paragraph subtopic passages. Hearst (1997), Computational Linguistics.
- Probabilistic Topic Models. Steyvers, Griffiths (2007), in: Handbook of Latent Semantic Analysis.
Session 3: Anaphora Resolution
Core Reading:
- A statistical approach to anaphora resolution. Ge, Hale, Charniak (1998), Workshop on Very Large Corpora.
- Exploring Lexicalized Features for Coreference Resolution. Björkelund, Nugues (2011), CoNLL.
Deep Reading:
- Stanford's Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task. Lee et al. (2011), CoNLL.
- An Algorithm for Pronominal Anaphora Resolution. Lappin, Leass (1994), Computational Linguistics.
Session 4: Centering and Entity Coherence
Core Reading:
- Modeling local coherence: An entity-based approach. R Barzilay, M Lapata, Computational Linguistics. 2008.
- Centering: A framework for modeling the local coherence of discourse. BJ Grosz, S Weinstein, AK Joshi, Computational Linguistics, 1995.
Deep Reading:
- Supplementing Entity Coherence with Local Rhetorical Relations for Information Ordering. Nikiforos Karamanis. Journal of Logic Language and Information, 2007.
- Using entity-based features to model coherence in student essays. Burstein et al. (2010), HLT-10.
Session 5: Rhetorical Structure Theory
Core Reading:
- Rhetorical structure theory: Toward a functional theory of text organization. WC Mann, SA Thompson (1987), Text: Interdisciplinary Journal for the Study of Discourse.
- HILDA: a discourse parser using support vector machine classification. H Hernault, H Prendinger (2010), Dialogue and Discourse.
Deep Reading:
- The rhetorical parsing of natural language texts. Daniel Marcu. ACL-98.
- Disambiguating Rhetorical Structure. Manfred Stede (2008), Research on Language and Computation.
Session 6: Annotation Methodology
Core Reading:
- Assessing agreement on classification tasks: the kappa statistic. Carletta (1996), Computational Linguistics.
- Chapter 8.1, The Structure of Scientific Articles, Teufel (2010), CSLI publications.
Deep Reading:
- A critique and improvement of an evaluation metric for text segmentation. Pevzner, Hearst (2002), Computational Linguistics.
- Inter-coder agreement for computational linguistics. R Artstein, M Poesio (2008), Computational Linguistics.
Session 7: Argumentative Zoning
Core Reading:
- Summarizing scientific articles: experiments with relevance and rhetorical status. S Teufel, M Moens (2002), Computational Linguistics.
Deep Reading:
- Towards discipline-independent argumentative zoning: evidence from chemistry and computational linguistics. S Teufel, A Siddharthan, C Batchelor. EMNLP-2009
- Citation Block Determination using Textual Coherence, Kaplan, Tokunaga, Teufel. Journal of Information Processing, 2016.
Session 8: Summarisation and Discourse Structure
Core Reading:
- Toward a model of text comprehension and production. W Kintsch, TA Van Dijk (1978), Psychological Review.
- A summariser based on human memory limitations and lexical competition. Y Fang, S Teufel (2014), EACL.
Deep Reading:
- Causal reasoning in the comprehension of simple narrative texts. CR Fletcher, CP Bloom (1988), Journal of Memory and Language.
Annotation Task
Consider the following two texts:
- Text 1: A splint, a spark
- Text 2: The motor car
Please do the following tasks with these texts:
WEEK 2 : Segment each text into topics; give each segment a name (its topic).
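The annotation itself is done by hand, but as a point of reference, here is a minimal Python sketch of the lexical-cohesion idea behind Hearst's TextTiling: each gap between sentences is scored by comparing the vocabulary on either side, and low-cohesion gaps suggest topic boundaries. The example sentences are invented, not quotes from the two texts.

```python
# Minimal sketch of the lexical-cohesion intuition behind TextTiling (Hearst 1997).
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cohesion_scores(sentences, window=2):
    """Score each gap between sentences by comparing the word counts of the
    `window` sentences on either side; low scores suggest topic boundaries."""
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for i in range(1, len(bags)):
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        scores.append(cosine(left, right))
    return scores

# Invented example sentences, for illustration only.
sents = ["Fire needs oxygen and fuel .",
         "A spark can start the reaction .",
         "The motor car changed cities .",
         "Roads were built for cars ."]
print(cohesion_scores(sents))
```

Local minima in the returned scores are candidate topic boundaries; the full TextTiling algorithm adds token-sequence blocks, smoothing and depth scoring on top of this basic comparison.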
Your annotations:
Text 1:
Text 2:
WEEK 3 : Perform co-reference resolution on one of the texts; for each entity in the text (generally each noun phrase, with some non-NPs slipping in), decide whether it co-refers with another entity in the text. Express this equivalence-class relationship in a way you find intuitive, e.g., by giving each class of co-referring entities a number.
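If it helps to sanity-check your numbering, here is one possible (purely illustrative) way to record pairwise co-reference judgements and let a small union-find collapse them into numbered equivalence classes. The mention strings below are made up, not taken from the texts.

```python
# Record pairwise co-reference links and derive numbered equivalence classes.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

# Hypothetical mentions; the spans and links are invented for illustration.
mentions = ["the motor car", "it", "the vehicle", "the driver", "he"]
uf = UnionFind()
uf.union("the motor car", "it")
uf.union("it", "the vehicle")
uf.union("the driver", "he")

# Print each mention with the number of its co-reference class.
classes = {}
for m in mentions:
    root = uf.find(m)
    classes.setdefault(root, len(classes) + 1)
    print(f"{m!r} -> class {classes[root]}")
```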
Your annotations:
Text 1:
Text 2:
WEEK 4 : We now want to perform Centering Theory processing on your annotations from week 3. What is the forward-looking center? The backward-looking center? Which entities are repeated across a boundary between two sentences?
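A minimal sketch of the bookkeeping this question asks about, assuming you already have per-sentence entity lists from the week-3 annotation, ordered roughly by salience (subject > object > other): Cf is the sentence's own ordered entity list, Cb is the highest-ranked entity of the previous sentence that is realised again, and the repeated entities are those crossing the sentence boundary. The entity lists in the example are invented.

```python
# Centering bookkeeping over per-sentence entity lists (week-3 output assumed).

def centers(sentences_entities):
    """For each sentence, report Cf (its own ordered entity list), Cb (the
    highest-ranked entity of the previous sentence realised again here), and
    the entities repeated across the sentence boundary."""
    prev_cf = []
    for i, cf in enumerate(sentences_entities):
        cb = next((e for e in prev_cf if e in cf), None)
        repeated = [e for e in cf if e in prev_cf]
        print(f"S{i}: Cf={cf}  Cb={cb}  repeated across boundary={repeated}")
        prev_cf = cf

# Hypothetical entity lists for the opening of Text 1; illustration only.
centers([
    ["fire", "humans"],
    ["fire", "warmth"],
    ["a spark", "fire"],
])
```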
Your annotations:
Text 1 (Fire):
Text 2 (Cars):
WEEK 5 : Perform an RST analysis of several sentences of your chosen text (preferably from the top, so that we get a lot of shared annotations). Using the suggested RST annotation tool (or doing it on paper and scanning the result), please perform the following tasks involved in any RST analysis:
- Segment the text into EDUs (elementary discourse units), i.e., the smallest units that perform a rhetorical function (typically one clause, but sometimes an NP or another part of the sentence). Deciding what counts as an EDU can be the hardest part of an RST analysis.
- Determine the relationship of each EDU to its surrounding context, either other EDUs or parts of the RST tree you are currently building.
- Connect every level of the RST tree so that the whole text is covered.
- Note that RST tree building corresponds to hierarchical clustering of a text; a small illustrative sketch follows this list.
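A small sketch of how such a tree could be represented and built bottom-up, mirroring hierarchical clustering: first join the most closely related EDUs, then attach the result higher in the tree. The EDU texts and relation labels are hypothetical, not a gold-standard analysis of Text 1.

```python
# A toy representation of an RST tree: each node has a relation, a nucleus
# (the more central part) and optionally a satellite (the supporting part).
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class RSTNode:
    relation: str                                        # rhetorical relation at this node
    nucleus: Union["RSTNode", str]                       # an EDU string or a subtree
    satellite: Optional[Union["RSTNode", str]] = None    # supporting EDU/subtree, if any

# Hypothetical EDUs (illustration only, not the actual text):
edu1 = "A splint is struck,"
edu2 = "and a spark appears."
edu3 = "Soon the fire spreads."

# Bottom-up construction, like hierarchical clustering of the EDUs:
inner = RSTNode(relation="sequence", nucleus=edu1, satellite=edu2)
tree = RSTNode(relation="cause", nucleus=edu3, satellite=inner)
print(tree)
```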
Your annotations:
Text 1 (Fire):
Text 2 (Cars):
WEEK 7 : Please perform an AZ analysis of the following text:
You should acquaint yourself with the 7 AZ categories by reading the relevant parts of Teufel and Moens (2002), and then assign one of the seven labels to each sentence. A good way to represent your annotation is simply to list the sentence ID with your chosen label, one sentence per line, in a file. It is not necessary to do the entire text, but please all start from the top of this file so that we have enough overlapping text. Please send your annotations to me as soon as you can, so that we have at least three annotations on which others can perform kappa (Fleiss or Cohen, as you like) and alpha (with your chosen distance metric) calculations. Show the intermediate steps in your calculation, i.e., don't just feed the data through an existing implementation.
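For the agreement calculation, here is a worked Python sketch of Cohen's kappa for two annotators that exposes the intermediate quantities the task asks for (observed agreement P_o and expected agreement P_e). The label sequences below are invented for illustration; Fleiss' kappa and alpha follow the same pattern with different expected-agreement terms.

```python
# Worked Cohen's kappa with intermediate steps; the annotations are invented.
from collections import Counter

# Hypothetical sentence-by-sentence AZ labels from two annotators.
ann1 = ["AIM", "OWN", "OWN", "OTH", "CTR", "OWN", "BAS", "OWN"]
ann2 = ["AIM", "OWN", "OTH", "OTH", "CTR", "OWN", "OWN", "OWN"]
n = len(ann1)

# Observed agreement P_o: proportion of sentences with identical labels.
p_o = sum(a == b for a, b in zip(ann1, ann2)) / n

# Expected agreement P_e: sum over categories of the product of each
# annotator's marginal label probabilities.
c1, c2 = Counter(ann1), Counter(ann2)
p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))

kappa = (p_o - p_e) / (1 - p_e)
print(f"P_o = {p_o:.3f}, P_e = {p_e:.3f}, kappa = {kappa:.3f}")
```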
Your annotations: