Professor in Information and Language at the NLIP group at the Department of Computer Science and Technology (formerly: Computer Laboratory) of the University of Cambridge


My area of research is text understanding. In particular, I develop models of discourse structure and argumentation in scientific text, and comprehension models for text summarisation. I am also interested in how "folk logic" connects to linguistic expressions, which is of particular interest for argument mining.

The logical structure of a text is an important dimension of its meaning, and several applications could profit from its analysis, for instance text summarisation, scientific search engines, improved bibliometrics, detection of "hot ideas" in a scientific field, and tools for better academic writing. I have proposed a discourse analysis called Argumentative Zoning or AZ, which is based on the recognition of the following phenomena: sentiment expressed towards cited work, ownership of ideas, and speech acts which express rhetorical statements typical of scientific argumentation. Co-reference between entities mentioned in text and coherence of text pieces also play an important role in my model. I am also interested in cognitive experiments to demonstrate the usefulness of this type of robust processing in a real user environment, particularly in task-based evaluations.


My first degree in Computer Science is from the University of Stuttgart, more specifically from the Center for Computational Linguistics (IMS). At the IMS, I was involved in designing the STTS tagset for German corpora, and was also a member of the EAGLES corpus and lexicon standardisation group. I also spent some time at XRCE Xerox in Grenoble, working on the extraction of nominalizations and collocations.

I received my PhD in Cognitive Science from the School of Informatics at the University of Edinburgh in 2000. My PhD thesis (on Argumentative Zoning) is available here. During my PhD, I was also a member of the HCRC Language Technology Group.

During a postdoc at Columbia University (2000-2001), I worked on the Digital Libraries project PERSIVAL, whose aims include providing patient-specific access to large collections of scientific articles. In one part of the project, we reranked the output of searches in the field of cardiology to favour those articles relevant to the particular patient the cardiologist is currently considering. I also worked on the TIDES project on multilingual summarisation at Columbia.

I joined the NLIP group at the University of Cambridge in 2001 as a lecturer, and have been Professor in Information and Language since 2017. Most of my funded research involves text understanding or text mining, summarisation and search from scientific articles or from language learner texts.


I am/was involved in the following research projects:


I was involved in the creation of the following corpora (either in projects or with students), which are distributed here:


My publications are online here.


I will be teaching the following courses in the next year:

I have taught the following courses in earlier years:

