CitRAZ: Rhetorical Citation Maps and Domain-Independent Argumentative Zoning
Dan Tidhar (until May 2006)
Associated Researchers and PhD students
This is a First Grant project funded by EPSRC, grant no. GR/S27832/01. Runtime 02/2004--10/2006.
The objective of Component A is to demonstrate that AZ as an
intermediate discourse analysis is feasible and useful for longer
texts. If Argumentative Zoning is to be applied to longer texts with
variant structure, different writing phenomena and possibly different
rhetorical devices must be accounted for. Part of this component has
resulted in an analysis and transformation of the ACL anthology
(Hollingsworth et al., 2005), particularly the part of the anthology
covering the journal Computational Linguistics. Cf. also our
recent related work with IR applications of ACL citations (Ritchie et
- Component A: extend Argumentative Zoning, a method of discourse analysis, to a wider range of texts, i.e. journals.
- Component B: use rhetorical information to guide the choice of citation
information to be shown to a user, leading to better citation indexes (which we call citation maps). In particular, find, for each citation in a text,
the sentence which most succinctly states its relation to the current
paper (Is it criticised? Does it provide the basis for the current
- Component C: extend Argumentative Zoning to a different domain, bioinformatics.
The objective of Component B is to investigate the use of citation
material selected using guidance from AZ in order to create more
valuable document surrogates. The improvement of Citation Maps over
automatic citation indexers (such as Google Scholar and CiteSeer) is
the distinction between contrastive statements,
including direct comparisons and criticisms, and
continuative statements, where the cited work is declared to be part of
the current paper's solution. For this work, automatic citation
classification sits at the core of CitRAZ's objectives. Our
published results in this work package (Teufel et al., 2006a, b) have
contributed a workable and consistent annotation scheme for citation
classification, which can be summarised by the following table:
|Weak||Weakness of cited approach|
|CoCoGM||Contrast/Comparison in Goals or
|CoCo-||Author's work is stated to be superior to
|CoCoR0||Contrast/Comparison in Results (neutral)|
|CoCoXY||Contrast between 2 cited methods|
|PBas||Author uses cited work as basis or starting point|
|PUse||Author uses tools/algorithms/data/definitions|
|PModi||Author adapts or modifies tools/algorithms/data|
|PMot||This citation is positive about approach used or problem addressed (used to motivate work in current paper)|
|PSim||Author's work and cited work are similar|
|PSup||Author's work and cited work are compatible/provide support for each other|
|Neut||Neutral description of cited work, or not enough textual evidence for above categories, or unlisted/unknown citation function|
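
As a rough illustration of how the labels in the table above might be handled in software, the following minimal Python sketch groups them into families by label prefix and counts how often each family occurs in a list of annotations. The grouping by prefix, the function names `label_family` and `family_counts`, and the sample input are assumptions made for this example only; they are not taken from the CitRAZ classifier or the published annotation scheme.

```python
from collections import Counter

# Labels exactly as they appear in the table above.
CITATION_LABELS = (
    "Weak", "CoCoGM", "CoCo-", "CoCoR0", "CoCoXY",
    "PBas", "PUse", "PModi", "PMot", "PSim", "PSup", "Neut",
)


def label_family(label: str) -> str:
    """Map a label to a coarse family based on its prefix.

    Assumption for illustration: families are derived purely from the
    label spelling ("CoCo..." -> "CoCo", "P..." -> "P"), while "Weak"
    and "Neut" each form their own family.
    """
    if label not in CITATION_LABELS:
        raise ValueError(f"unknown citation function label: {label!r}")
    if label.startswith("CoCo"):
        return "CoCo"
    if label in ("Weak", "Neut"):
        return label
    return "P"


def family_counts(labels):
    """Count how many annotated citations fall into each family."""
    return Counter(label_family(label) for label in labels)


if __name__ == "__main__":
    # Hypothetical annotations for five citations in one paper.
    sample = ["PUse", "CoCo-", "Neut", "PBas", "Weak"]
    for family, count in sorted(family_counts(sample).items()):
        print(family, count)
```

A summary of this kind could, for instance, indicate whether a paper's citations are mostly contrastive or mostly build on earlier work, which is the distinction the citation maps are meant to expose.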
Whereas Component A ports Argumentative Zoning to a new text type,
Component C concerns the move to a different scientific domain, namely
bioinformatics. We have chosen bioinformatics texts as we expect meta-discourse in the
life sciences to be maximally different from computational
linguistics. Meta-discourse can be expected to differ across scientific domains,
due to differences in writing styles and conventions. In previous work in the medical domain (Teufel et al., 2001), we observed that there seems to
be less overall meta-discourse in this domain, but also less
variation. This component has resulted in research on meta-discourse discovery (Abdalla and Teufel, 2006).
SciXML is an XML vocabulary describing the structure
of scientific papers. It was developed in conjunction with the CitRAZ
project. It is available on request by email to the PI.
- S. Teufel, A. Siddharthan, D. Tidhar. 2006a. Automatic classification of citation function. In: Proceedings of EMNLP-06, Sydney, Australia.
- S. Teufel, A. Siddharthan, D. Tidhar. 2006b. An annotation scheme for citation function. In: Proceedings of Sigdial-06, Sydney, Australia.
- R. Abdalla, S. Teufel. 2006. A bootstrapping approach to unsupervised detection of cue phrase variants. In: Proceedings of ACL/COLING 2006.
- A. Ritchie, S. Teufel, S. Robertson. 2006a. Creating a test collection for IR experiments with citations. In: Proceedings of HLT/NAACL, New York.
- A. Ritchie, S. Teufel, S. Robertson. 2006b. How to find better index terms through citations. In: Proceedings of the Workshop "Can Computational Linguistics Improve Information Retrieval?", at ACL/COLING-2006, Sydney, Australia.
- B. Hollingsworth, I. Lewin, D. Tidhar. 2005. Retrieving Hierarchical Text Structure from Typeset Scientific Articles - a Prerequisite for E-Science Text Mining. In: Proceedings of the 4th UK E-Science All Hands Meeting, Nottingham, pages 267-273.
- S. Teufel. 2005. Argumentative Zoning for improved citation indexing. In: "Computing Attitude and Affect in Text: Theory and Applications", James G. Shanahan, Yan Qu, Janyce Wiebe (Eds.), Springer, Dordrecht, The Netherlands, pages 159-170.
Invited Speaker and Seminar Talks related to CitRAZ
- April 2007: GLDV-Spring Seminar. University of Tuebingen; invited
speaker: Change of citation use across disciplines: usability for
augmented citation indexing.
- March 2007: University of Toulouse; seminar: The role of citation function in scientific discourse structure.
- January 2007: National Center for Text Mining,
Manchester. Seminar: Mining novelty from citation contexts
- June 2006: Conference ISDD (International Symposium on Discourse
and Document), University of Caen; invited speaker: Discourse structure in scientific articles: argumentation and citation
- Oct 2004: Open University, Milton Keynes; seminar: Information
Access, shallow discourse analysis and citations
- Jan 2004: Conference "Modelling of Linguistic Information
Resources", University of Bielefeld; invited speaker: Discourse-level argumentation in scientific articles