email: sht25@cl.cam.ac.uk.
Professor in Information and Language at
the NLIP group at
the Department of Computer Science and Technology (formerly: Computer Laboratory) of the
University of Cambridge
Research
My area of research is text understanding. In
particular, I develop models of discourse structure and argumentation
in scientific text, and comprehension models for text summarisation. I
am also interested in how "folk logic" connects to linguistic
expressions, which is of particular interest for argument mining.
The logical structure of a text is an important dimension of its
meaning, and several applications could profit from its analysis
-- for instance text summarization,
scientific search engines, improved bibliometrics, detection of "hot
ideas" in a scientific field, and tools for better academic writing.
I have proposed a discourse analysis
called Argumentative Zoning or AZ, which is based on
the recognition of the following phenomena: sentiment expressed
towards cited work, ownership of ideas, and speech acts which express
rhetorical statements typical for scientific
argumentation. Co-reference between entities mentioned in text and
coherence of text pieces also play an important role in my model.
I am also interested in cognitive experiments that demonstrate the
usefulness of this type of robust processing in a real user environment,
particularly in task-based evaluations.
Biography
My first degree in Computer Science is from the University of
Stuttgart, more specifically from the Center for Computational
Linguistics (IMS). At the IMS, I was involved in designing the
STTS tagset for German corpora, and was also a member of the EAGLES
corpus and lexicon standardisation group. I also spent some time at XRCE Xerox in Grenoble, working
on the extraction of nominalizations and collocations.
I received my PhD in Cognitive Science from the School of Informatics
at the University of Edinburgh in 2000. My PhD thesis (on
Argumentative Zoning) is available here. During
my PhD, I was also a member of the HCRC Language Technology Group.
During a Postdoc at Columbia University (2000-2001), I worked on the
Digital Libraries Project PERSIVAL, whose
aim is to provide patient-specific access to large collections of
scientific articles, amongst other things. In one part of the project, we
reranked the output of searches in the field of cardiology to favour those
articles that are relevant to the particular patient the
cardiologist is currently considering. I also worked on the TIDES
project on multilingual summarization at Columbia.
I joined the NLIP group at the University of Cambridge in 2001 as
a lecturer, and have been Professor in Information and Language since
2017. Most of my funded research involves text understanding or text
mining, summarisation and search from scientific articles or from language learner
texts.
Projects
I am/was involved in the following research projects:
- FUSE
(Foresight and Understanding from
Scientific Exposition): Detecting the emergence of ideas from the scientific literature
- SciBorg (Semantic analysis of chemistry papers)
- CitRAZ (Citation relations and Argumentative Zoning)
- FlySlip
(Curation support for genetics papers)
Corpora
I was involved in the creation of the following
corpora (either in projects or with students), which are distributed here:
Publications
My publications are online here.
Teaching
I am teaching the following courses in 2018/19:
I have taught the following courses in earlier years:
- L114 Lexical
Semantics - a 16-hour lecture course on the Advanced Computer Science MPhil course (since 2010/11).
- R216 Discourse
Processing - a 16-hour seminar on the Advanced Computer
Science MPhil course (since 2014/15).
- Information
Retrieval - an 8-hour lecture course on the Computer
Science Tripos (Part II)
- Discourse Processing and Summarisation - a 16-lecture course on
the Computer Speech Text and Internet Technology (CSTIT) MPhil course
- Information Access - a 16-lecture course on the CSTIT MPhil course
- Computing and the Web - an 8-lecture course on XML
technology
- Natural
Language Processing - an 8-lecture course on the
Computer Science Tripos (Part II).
PhD Students
- Anna Ritchie: Combining Citation-Based and Statistical
Information Retrieval (2004--2008)
- William Hollingsworth: Automatic Text Skimming of Scientific
Text using Lexical Chains (2004--2008)
- Johanna Geiss: Latent Semantic Analysis for Summarisation (2007--2011)
- Ekaterina Shutova: Interpretation of Figurative Language (2007--2011)
- James Jardine: A Recommendation System for Scientific Reading
Lists (2009--2014)
- Awais Athar: Sentiment Classification of Citations
(2009--2014)
- Sandro Bauer: Information and Knowledge Extraction using
Structured Knowledge Bases (2014--2017)
- Yimai Fang: A summariser based on human memory constraints (2013--)
- Yiannos Stathopoulos: Mathematical information retrieval
(2013--)
- Kevin Heffernan: Problem structure in Scientific writing (2015--)
- Olesya Razuvayevskaya: Enthymemes and A fortiori Reasoning in
Argumentation (2015--)
- Daniel Bruder: Change-Tracking in structured documents (2016--)
- Guy Aglionby: Neural Cognitive-inspired Summarisation (2018--)
MPhil projects
- Improving the output of a Proposition-based
Summariser (Zhu, 2016)
- Automatic Identification of Innovation in Scientific
Writing (Bauer, 2016)
- Automatic Induction of a Scientific Sentiment Lexicon
(Trendafilov, 2015)
- Communicative Artifacts in Scientific Text (Heffernan, 2015)
- Improving classification of noisy text by unsupervised spelling
correction (Kennedy, 2014)
- Coreference processing in narrative text (Sinclair, 2014)
- Tracking Word meaning through time (Zhang, 2014)
- Lexical simplification of news for children (Charalambides, 2013)
- A summariser based on psycholinguistic principles (Fang, 2013)
- Coherence in Scientific Discourse (Testuggine, 2012)
- A flexible Scientific Summarizer (Pinnis, 2009)
- Text Segmentation for Scientific Text (Szczurba, 2008)
- Noisy Author Name Identification (Athar, 2008)
- Sentiment Detection for Text to Speech (Syropoulou, 2007)
- Automatic Slide Generation (Williams, 2007)
- Discourse-Based Topic Summarisation (de Souza, 2005)
- Syntactic and lexical variants of cue phrases (Abdalla, 2005)
- Subjectivity and Sentiment Classification for Movie Reviews (Qi, 2005)
- Cluster-based multi-document summarisation (de Silva, 2004)
- Topic Dependence in Sentiment Classification (Engstrom, 2004)
- Sentence Similarity Algorithms (Tennant, 2004)
- Automatic Sentiment Classification of rhetorical statements (O'Shea, 2004)
- Sentence-Based automatic sentiment classification (Bostad, 2003)
- Topic-directed multi-document summarisation (Lal, 2002)
- Web filter and Dynamic Text filter for NLP applications (Case, 2002)
- Word-based v. Citation-based Clustering (Cooper, 2002)
- Objectivity/Subjectivity Detection (Gourlay, 2002)
- Summarisation of Scientific Articles Using Named Entity tagging
(2002)
- Automatic detection of cue phrases for Summarisation (Yang, 2002)
Here is a list of my project suggestions for 2014/2015. Some project suggestions from previous years: 2012/3 and 2013/4.
Past postdocs
Part II projects
- Interpreting natural language commands for interaction with a
game world (2014)
- Patent Information Retrieval (2013)
- Compression of XML Data (2003)
- Sentiment Classification of Movie Reviews (2002)
Summer Internships
Currently no openings for summer internships. (When I have
positions, I will announce them here.)
Simone Teufel
Created: October 29, 2001