Simone Teufel

University of Cambridge Computer Laboratory
William Gates Building, JJ Thomson Ave,
CAMBRIDGE CB3 0FD, United Kingdom.
work: (+44) 1223 763643, fax: (+44) 1223 334678

Reader in Information and Language at the NLIP group at the Computer Laboratory of the University of Cambridge


My main research focus is on the use of discourse structure for NLP applications which require text understanding of some form. Text summarization is one such application. Others are the detection of novel ideas in the scientific literature; fine-grained, science-specific information retrieval and search; improved citation indexing; and educational tools for scientific writing. Projects I am involved in concern scientific search (CitRAZ, SciBorg) and detection of emergence in a scientific field (FUSE).

The discourse analysis I proposed for such tasks is called Argumentative Zoning (AZ). The main factors contributing to discourse structure in science, according to AZ, are expressions of sentiment towards cited work, ownership of ideas, and speech acts which express rhetorical statements typical of science. Coherence of text pieces also plays an important role. I am involved in research on coherence, anaphora recognition and distributional semantics. All these methods should serve to improve AZ recognition.

I am also interested in cognitive experiments to demonstrate the usefulness of this type of robust processing in a real user environment. Another ongoing interest is the evaluation of summarization systems (a hard problem plaguing the community), particularly task-based evaluation.


I am/was involved in the following research projects:


I was involved in the creation of the following corpora (either in projects or with students), which are distributed here:


I am a Fellow of Computer Science at King's College.


My publications are online here.


I am currently teaching the following courses:

I have taught the following courses in earlier years:

PhD Students

MPhil projects

Here is a list of my previous and current project suggestions (2010-2012).

Current postdocs

Past postdocs

Part II projects


My first degree in Computer Science is from the University of Stuttgart, more specifically from the Center for Computational Linguistics (IMS). At the IMS, I was involved in designing the STTS tagset for German corpora, and was also a member of the EAGLES corpus and lexicon standardisation group. I also spent some time at XRCE Xerox in Grenoble, working on the extraction of nominalizations and collocations.

I received my PhD in Cognitive Science from the School of Informatics at the University of Edinburgh in 2000. My PhD thesis (on Argumentative Zoning) is available here. During my PhD, I was also a member of the HCRC Language Technology Group.

During a postdoc at Columbia University (2000-2001), I worked on the Digital Libraries Project PERSIVAL, one of whose aims is to provide patient-specific access to large collections of scientific articles. In a subpart of the project, we reranked the output of searches in the field of cardiology to favour those articles which are relevant to the particular patient the cardiologist is currently considering. I also worked on the TIDES project on multilingual summarization at Columbia.

I joined the NLIP group at the University of Cambridge in 2001 as a lecturer, and have been Reader in Information and Language since 2010.

Simone Teufel
Created: October 29, 2001