I am a Research Associate in the Natural Language and Information Processing Group at the University of Cambridge Computer Laboratory, working with Anna Korhonen on the CRAB project. My main research interests are statistical NLP and its application to real-world tasks, such as scientific text processing and literature-based discovery.

I hold a PhD in Computation, Cognition and Language and an MPhil in Computer Speech, Text and Internet Technology from the University of Cambridge, and a bachelor's degree in Computer Science from Peking University.



  • Douwe Kiela, Yufan Guo, Ulla Stenius and Anna Korhonen. 2014. Unsupervised Discovery of Information Structure in Biomedical Documents. In Bioinformatics 2014, doi: 10.1093/bioinformatics/btu758. Link
  • Yufan Guo, Diarmuid Ó Séaghdha, Ilona Silins, Lin Sun, Johan Högberg, Ulla Stenius and Anna Korhonen. 2014. CRAB 2.0: A text mining tool for supporting literature review in chemical cancer risk assessment. In Proceedings of COLING 2014. Dublin, Ireland. Link
  • Elisa Omodei, Yufan Guo, Jean-Philippe Cointet and Thierry Poibeau. 2014. Analyse discursive automatique du corpus ACL Anthology. 21ème conférence Traitement Automatique des Langues Naturelles 2014. Marseille, France. Link
  • Xiao Jiang, Yufan Guo, Jeroen Geertzen, Theodora Alexopoulou, Lin Sun and Anna Korhonen. 2014. Native Language Identification Using Large, Longitudinal Data. In Proceedings of LREC 2014. Reykjavik, Iceland. Link
  • Elisa Omodei, Yufan Guo, Jean-Philippe Cointet and Thierry Poibeau. 2014. Social and Semantic Diversity: Socio-semantic Representation of a Scientific Corpus. In Proceedings of LaTeCH 2014. Gothenburg, Sweden. Link
  • Anna Korhonen, Yufan Guo, Meliha Yetisgen-Yildiz, Ulla Stenius, Masashi Narita and Pietro Lio. 2014. Improving Literature-Based Discovery with Text Mining. In Proceedings of CIBB 2014. Cambridge, UK. Link
  • Kristin Larsson, Ilona Silins, Yufan Guo, Anna Korhonen, Ulla Stenius and Marika Berglund. 2014. Text mining for improved human exposure assessment. In Toxicology Letters 2014, doi: 10.1016/j.toxlet.2014.06.427. Link
  • Ilona Silins, Anna Korhonen, Yufan Guo and Ulla Stenius. 2014. A text-mining approach for chemical risk assessment and cancer research. In Toxicology Letters 2014, doi: 10.1016/j.toxlet.2014.06.565. Link


  • Yufan Guo, Ilona Silins, Ulla Stenius and Anna Korhonen. 2013. Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review. In Bioinformatics 2013, doi: 10.1093/bioinformatics/btt163. Link
  • Yufan Guo, Roi Reichart and Anna Korhonen. 2013. Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints. In Proceedings of NAACL 2013. Atlanta, US. Link


  • Yufan Guo, Ilona Silins, Roi Reichart and Anna Korhonen. 2012. CRAB Reader: A Tool for Analysis and Visualization of Argumentative Zones in Scientific Literature. In Proceedings of COLING 2012. Mumbai, India. Link
  • Danish Contractor, Yufan Guo and Anna Korhonen. 2012. Using Argumentative Zones for Extractive Summarization of Scientific Articles. In Proceedings of COLING 2012. Mumbai, India. Link
  • Yufan Guo. 2012. E-mail Spam Filtering and Natural Language Processing. In Hakin9 Exploiting Software, 2:5. Link


  • Yufan Guo, Anna Korhonen, Ilona Silins and Ulla Stenius. 2011. Weakly-supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine? In Bioinformatics 2011, doi: 10.1093/bioinformatics/btr536. Link
  • Yufan Guo, Anna Korhonen and Thierry Poibeau. 2011. A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents. In Proceedings of EMNLP 2011. Edinburgh, UK. Link
  • Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Johan Högberg and Ulla Stenius. 2011. A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment. In BMC Bioinformatics 2011, 12:69. Link


  • Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Lin Sun and Ulla Stenius. 2010. Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes. In Proceedings of BioNLP 2010. Uppsala, Sweden. Link


  • Using Models of Textual Information Structure to Aid the Review of Biomedical Abstracts in Cancer Risk Assessment. Invited talk at Laboratoire d'Informatique de Paris-Nord, France. 2010. Slides
