email: sht25@cl.cam.ac.uk.
Reader in Information and Language at the NLIP group at the Computer Laboratory of the
University of Cambridge
Research
My main research focus is on the use of discourse
structure for NLP applications which require text understanding of
some form. Text summarization is one such
application. Others are the detection of novel ideas in the scientific
literature; fine-grained, science-specific information retrieval and
search; improved citation indexing; and educational tools for
scientific writing. Projects I am involved in concern scientific
search (CitRAZ, SciBorg) and detection of emergence in a scientific
field (FUSE).
The discourse analysis I proposed for such tasks is called Argumentative Zoning (AZ). The main factors
contributing to discourse structure in science, according to AZ, are
expressions of sentiment towards cited work, ownership of
ideas, and speech acts which express rhetorical statements
typical of science. Coherence of text pieces also plays an important
role. I am involved in research in coherence and anaphora
recognition and distributional semantics. All these methods
should serve to improve AZ recognition.
I am also interested in cognitive experiments to demonstrate the
usefulness of this type of robust processing in a real user
environment. Another ongoing interest is in the evaluation of
summarization systems (which is a hard problem plaguing the
community), particularly task-based evaluation.
Projects
I am/was involved in the following research projects:
- FUSE (Foresight and Understanding from Scientific Exposition): Detecting the emergence of ideas from the scientific literature
- SciBorg (Semantic analysis of chemistry papers)
- CitRAZ (Citation relations and Argumentative Zoning)
- FlySlip (Curation support for genetics papers)
College
I am Director of Studies in Computer Science at King's College.
Publications
My publications are online here.
Teaching
I am currently teaching the following courses:
I have taught the following courses in earlier years:
- Discourse Processing and Summarisation - a 16-lecture course on
the Computer Speech, Text and Internet Technology (CSTIT) MPhil course
- Information Access - a 16-lecture course on the CSTIT MPhil course
- Computing and the Web - an 8-lecture course on XML technology
PhD Students
- Anna Ritchie: Combining Citation-Based and Statistical
Information Retrieval (2004-2008)
- William Hollingsworth: Automatic Text Skimming of Scientific
Text using Lexical Chains (2004-2008)
- Johanna Geiss: Latent Semantic Analysis for Summarisation (2007-2011)
- Ekaterina Shutova: Interpretation of Figurative Language (2007-2011)
- James Jardine: A recommendation System for Scientific Reading Lists (2009-)
- Awais Athar: Sentiment Classification of Citations (2009-)
MPhil projects
- Coherence in Scientific Discourse (Testuggine, 2012)
- A flexible Scientific Summarizer (Pinnis, 2009)
- Text Segmentation for Scientific Text (Szczurba, 2008)
- Noisy Author Name Identification (Athar, 2008)
- Sentiment Detection for Text to Speech (Syropoulou, 2007)
- Automatic Slide Generation (Williams, 2007)
- Discourse-Based Topic Summarisation (de Souza, 2005)
- Syntactic and lexical variants of cue phrases (Abdalla, 2005)
- Subjectivity and Sentiment Classification for Movie Reviews (Qi, 2005)
- Cluster-based multi-document summarisation (de Silva, 2004)
- Topic Dependence in Sentiment Classification (Engstrom, 2004)
- Sentence Similarity Algorithms (Tennant, 2004)
- Automatic Sentiment Classification of rhetorical statements (O'Shea, 2004)
- Sentence-Based automatic sentiment classification (Bostad, 2003)
- Topic-directed multi-document summarisation (Lal, 2002)
- Web filter and Dynamic Text filter for NLP applications (Case, 2002)
- Word-based v. Citation-based Clustering (Cooper, 2002)
- Objectivity/Subjectivity Detection (Gourlay, 2002)
- Summarisation of Scientific Articles Using Named Entity tagging (2002)
- Automatic detection of cue phrases for Summarisation (Yang, 2002)
Here is a list of my previous and current project suggestions (2010-2012).
Current postdocs
Past postdocs
Part II projects
- Patent Information Retrieval (ongoing, due 2013)
- Compression of XML Data (2003)
- Sentiment Classification of Movie Reviews (2002)
Biography
My first degree in Computer Science is from the University of
Stuttgart, more specifically from the Center for Computational
Linguistics (IMS). At the IMS, I was involved in designing the
STTS tagset for German corpora, and was also a member of the EAGLES
corpus and lexicon standardisation group. I also spent some time at XRCE Xerox in Grenoble, working
on the extraction of nominalizations and collocations.
I received my PhD in Cognitive Science from the School of Informatics
at the University of Edinburgh in 2000. My PhD thesis (on
Argumentative Zoning) is available here. During
my PhD, I was also a member of the HCRC Language Technology Group.
During a postdoc at Columbia University (2000-2001), I worked on the
Digital Libraries Project PERSIVAL, one of whose
aims is to provide patient-specific access to large collections of
scientific articles. In a subpart of the project, we
reranked the output of searches in the field of cardiology to favour those
articles relevant to the particular patient the
cardiologist is currently considering. I also worked on the TIDES
project on multilingual summarization at Columbia.
I joined the NLIP group at the University of Cambridge in 2001 as
a lecturer, and have been Reader in Information and Language since 2010.
Simone Teufel
Created: October 29, 2001