Simone Teufel

University of Cambridge Computer Laboratory

William Gates Building, JJ Thomson Ave,

CAMBRIDGE CB3 0FD, United Kingdom.

work: (+44) 1223 763643, fax: (+44) 1223 334678

Professor in Information and Language at the NLIP group at the Department of Computer Science and Technology (formerly: Computer Laboratory) of the University of Cambridge

Research

My area of research is text understanding. In particular, I develop models of discourse structure and argumentation in scientific text, and comprehension models for text summarisation. I am also interested in how "folk logic" connects to linguistic expressions, which is of particular interest for argument mining.

The logical structure of a text is an important dimension of its meaning, and several applications could profit from its analysis, for instance text summarisation, scientific search engines, improved bibliometrics, detection of "hot ideas" in a scientific field, and tools for better academic writing. I have proposed a discourse analysis called Argumentative Zoning (AZ), which is based on the recognition of the following phenomena: sentiment expressed towards cited work, ownership of ideas, and speech acts which express rhetorical statements typical of scientific argumentation. Co-reference between entities mentioned in the text and the coherence of text pieces also play an important role in my model. I am also interested in cognitive experiments that demonstrate the usefulness of this type of robust processing in a real user environment, particularly task-based evaluations.

Biography

My first degree in Computer Science is from the University of Stuttgart, more specifically from the Institute for Computational Linguistics (IMS). At the IMS, I was involved in designing the STTS tagset for German corpora, and was also a member of the EAGLES corpus and lexicon standardisation group. I also spent some time at XRCE Xerox in Grenoble, working on the extraction of nominalizations and collocations.

I received my PhD in Cognitive Science from the School of Informatics at the University of Edinburgh in 2000. My PhD thesis (on Argumentative Zoning) is available here. During my PhD, I was also a member of the HCRC Language Technology Group.

During a postdoc at Columbia University (2000-2001), I worked on the Digital Libraries project PERSIVAL, whose aims included, amongst others, providing patient-specific access to large collections of scientific articles. In one subpart of the project, we reranked the results of searches in the cardiology literature so that articles relevant to the particular patient the cardiologist is currently considering were ranked highest. I also worked on the TIDES project on multilingual summarisation at Columbia.

I joined the NLIP group at the University of Cambridge in 2001 as a lecturer, and have been Professor in Information and Language since 2017. Most of my funded research involves text understanding or text mining, summarisation and search from scientific articles or from language learner texts.

Projects

I am/was involved in the following research projects:

FUSE (Foresight and Understanding from Scientific Exposition): Detecting the emergence of ideas from the scientific literature
SciBorg (Semantic analysis of chemistry papers)
CitRAZ (Citation relations and Argumentative Zoning)
FlySlip (Curation support for genetics papers)

Corpora

I was involved in the creation of the following corpora (either in projects or with students), which are distributed here:

AZ corpus -- 80 annotated computational linguistics articles (from my PhD work)
Citation Function Classification corpus -- 161 annotated computational linguistics files (created in the CitRAZ project)
Sciborg corpus -- 50 annotated chemistry articles (created in the SciBorg project)
Awais Athar's Citation Sentiment corpus and Citation Context corpus

Publications

My publications are online here.

Teaching

I am teaching the following courses in 2018/19:

Machine Learning and Real-World Data (MLRD) - a hands-on course in machine learning and experimentation in Java (with Paula Buttery).
Natural Language Processing - a 12-lecture introductory course to computational linguistics and NLP (with Paula Buttery).

I have taught the following courses in earlier years:

L114 Lexical Semantics - a 16-hour lecture course on the MPhil in Advanced Computer Science (since 2010/11).
R216 Discourse Processing - a 16-hour seminar on the MPhil in Advanced Computer Science (since 2014/15).
Information Retrieval - an 8-hour lecture course on the Computer Science Tripos (Part II)
Discourse Processing and Summarisation - a 16-lecture course on the Computer Speech, Text and Internet Technology (CSTIT) MPhil course
Information Access - a 16-lecture course on the CSTIT MPhil course
Computing and the Web - an 8-lecture course on XML technology
Natural Language Processing - an 8-lecture course on the Computer Science Tripos (Part II)

PhD Students

Anna Ritchie: Combining Citation-Based and Statistical Information Retrieval (2004--2008)
William Hollingsworth: Automatic Text Skimming of Scientific Text using Lexical Chains (2004--2008)
Johanna Geiss: Latent Semantic Analysis for Summarisation (2007--2011)
Ekaterina Shutova: Interpretation of Figurative Language (2007--2011)
James Jardine: A Recommendation System for Scientific Reading Lists (2009--2014)
Awais Athar: Sentiment Classification of Citations (2009--2014)
Sandro Bauer: Information and Knowledge Extraction using Structured Knowledge Bases (2014--2017)
Yimai Fang: A summariser based on human memory constraints (2013--)
Yiannos Stathopoulos: Mathematical information retrieval (2013--)
Kevin Heffernan: Problem structure in Scientific writing (2015--)
Olesya Razuvayevskaya: Enthymemes and A fortiori Reasoning in Argumentation (2015--)
Daniel Bruder: Change-Tracking in structured documents (2016--)
Guy Aglionby: Neural Cognitive-inspired Summarisation (2018--)

MPhil projects

Improving the output of a Proposition-based Summariser (Zhu, 2016)
Automatic Identification of Innovation in Scientific Writing (Bauer, 2016)
Automatic Induction of a Scientific Sentiment Lexicon (Trendafilov, 2015)
Communicative Artifacts in Scientific Text (Heffernan, 2015)
Improving classification of noisy text by unsupervised spelling correction (Kennedy, 2014)
Coreference processing in narrative text (Sinclair, 2014)
Tracking Word meaning through time (Zhang, 2014)
Lexical simplification of news for children (Charalambides, 2013)
A summariser based on psycholinguistic principles (Fang, 2013)
Coherence in Scientific Discourse (Testuggine, 2012)
A flexible Scientific Summarizer (Pinnis, 2009)
Text Segmentation for Scientific Text (Szczurba, 2008)
Noisy Author Name Identification (Athar, 2008)
Sentiment Detection for Text to Speech (Syropoulou, 2007)
Automatic Slide Generation (Williams, 2007)
Discourse-Based Topic Summarisation (de Souza, 2005)
Syntactic and lexical variants of cue phrases (Abdalla, 2005)
Subjectivity and Sentiment Classification for Movie Reviews (Qi, 2005)
Cluster-based multi-document summarisation (de Silva, 2004)
Topic Dependence in Sentiment Classification (Engstrom, 2004)
Sentence Similarity Algorithms (Tennant, 2004)
Automatic Sentiment Classification of rhetorical statements (O'Shea, 2004)
Sentence-Based automatic sentiment classification (Bostad, 2003)
Topic-directed multi-document summarisation (Lal, 2002)
Web filter and Dynamic Text filter for NLP applications (Case, 2002)
Word-based v. Citation-based Clustering (Cooper, 2002)
Objectivity/Subjectivity Detection (Gourlay, 2002)
Summarisation of Scientific Articles Using Named Entity tagging (2002)
Automatic detection of cue phrases for Summarisation (Yang, 2002)

Here is a list of my project suggestions for 2014/15. Project suggestions from previous years are also available: 2012/13 and 2013/14.

Past postdocs

Diarmuid O'Seaghdha
Dainan Kaplan
Advaith Siddharthan
Dan Tidhar

Part II projects

Interpreting natural language commands for interaction with a game world (2014)
Patent Information Retrieval (2013)
Compression of XML Data (2003)
Sentiment Classification of Movie Reviews (2002)

Summer Internships

There are currently no openings for summer internships. (When I have positions, I will announce them here.)

Simone Teufel
Created: October 29, 2001