Department of Computer Science and Technology

Zheng Yuan

Short biography

I am a Research Associate at the Department of Computer Science and Technology of the University of Cambridge, working on Automated Language Teaching and Assessment (ALTA). My research focuses on machine learning, natural language processing, and their use in real-world applications, including automated assessment, grammatical error detection/correction, feedback generation, content/test creation, emotion detection, and sentiment analysis.

I am also a Praeceptor in Computer Science at Corpus Christi College and a Postdoctoral Teaching Fellow in Computer Science at Trinity College, University of Cambridge. I am a Conference Co-Chair of women@CL and co-organise the 2022 Oxbridge Women in Computer Science Conference. I am Treasurer of the Special Interest Group on Building Educational Applications (SIGEDU) of the Association for Computational Linguistics (ACL) and co-organise the annual Workshop on Innovative Use of NLP for Building Educational Applications (BEA Workshop). I am also a member of the ACL Professional Conduct Committee (PCC).

My PhD, completed in the Natural Language and Information Processing (NLIP) group of the University of Cambridge, focused on grammatical error detection and correction for learners of English as a second language. Before that, I gained an MPhil Degree in Advanced Computer Science from the University of Cambridge. I received a BSc(Eng) Degree in Telecommunications Engineering with Management from Queen Mary University of London and Beijing University of Posts and Telecommunications.

Research interests

  • Natural language processing
  • Machine learning
  • Neural networks and deep learning
  • Transfer and multi-task learning
  • Explainable machine learning
  • Educational NLP
  • Language acquisition
  • Machine translation
  • Multilingual NLP
  • SocialNLP

I occasionally offer consultancy services to companies in the areas of natural language processing and machine learning. If you are interested, feel free to get in touch.

Publications

[Google Scholar Profile | ACL Anthology Profile]

  • Zheng Yuan, Shiva Taslimipoor, Christopher Davis and Christopher Bryant. 2021. Multi-Class Grammatical Error Detection for Correction: A Tale of Two Systems. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP-2021). [To appear].
  • Zheng Yuan and David Strohmaier. 2021. Cambridge at SemEval-2021 Task 2: Neural WiC-Model with Data Augmentation and Exploration of Representation. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pp. 730 - 737. Online. Association for Computational Linguistics. [pdf, bib].
    *This paper presents the winning system in the SemEval-2021 Task 2 on Multilingual and Cross-lingual Word-in-Context Disambiguation.
  • Zheng Yuan, Gladys Tyen and David Strohmaier. 2021. Cambridge at SemEval-2021 Task 1: An Ensemble of Feature-Based and Neural Models for Lexical Complexity Prediction. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pp. 590 - 597. Online. Association for Computational Linguistics. [pdf, bib].
  • Øistein E. Andersen, Zheng Yuan, Rebecca Watson and Kevin Yet Fong Cheung. 2021. Benefits of alternative evaluation methods for Automated Essay Scoring. In Proceedings of the 14th International Conference on Educational Data Mining (EDM-2021). Online. [pdf].
  • Zheng Yuan and Christopher Bryant. 2021. Document-level grammatical error correction. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2021), pp. 75 - 84. Online. Association for Computational Linguistics. [pdf, bib, code].
  • Zheng Yuan, Felix Stahlberg, Marek Rei, Bill Byrne and Helen Yannakoudakis. 2019. Neural and FST-based approaches to grammatical error correction. In Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2019), pp. 228 - 239. Florence, Italy. Association for Computational Linguistics. [pdf, bib].
    *This paper presents the top-ranked system amongst academic submissions in the BEA-2019 Shared Task on Grammatical Error Correction.
  • Zheng Yuan. 2018. Neural sequence modelling for learner error prediction. In Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2018), pp. 381 - 388. New Orleans, Louisiana, USA. Association for Computational Linguistics. [pdf, bib]
  • Marek Rei, Mariano Felice, Zheng Yuan and Ted Briscoe. 2017. Artificial Error Generation with Machine Translation and Syntactic Patterns. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2017), pp. 287 - 292. Copenhagen, Denmark. Association for Computational Linguistics. [pdf, bib]
  • Helen Yannakoudakis, Marek Rei, Øistein E. Andersen and Zheng Yuan. 2017. Neural Sequence-Labelling Models for Grammatical Error Correction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP-2017), pp. 2785 - 2794. Copenhagen, Denmark. Association for Computational Linguistics. [pdf, bib]
  • Zheng Yuan. 2017. Grammatical error correction in non-native English. PhD thesis, UCAM-CL-TR-904, Computer Laboratory, University of Cambridge. [pdf]
  • Zheng Yuan, Ted Briscoe and Mariano Felice. 2016. Candidate re-ranking for SMT-based grammatical error correction. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2016), pp. 256 - 266. San Diego, California, USA. Association for Computational Linguistics. [pdf, bib]
  • Zheng Yuan and Ted Briscoe. 2016. Grammatical error correction using neural machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2016), pp. 380 - 386. San Diego, California, USA. Association for Computational Linguistics. [pdf, bib]
  • Zheng Yuan and Matthew Purver. 2015. Predicting Emotion Labels for Chinese Microblog Texts. In M. Gaber et al. (eds.), Advances in Social Media Analysis, pp. 129 - 149, Studies in Computational Intelligence 602, Springer. [publisher's website][draft pdf]
  • Mariano Felice and Zheng Yuan. 2014. To err is human, to correct is divine. In XRDS: Crossroads, The ACM Magazine for Students, vol. 21, no. 1, pp. 22 - 27. New York, NY. ACM. [publisher's website]
  • Mariano Felice, Zheng Yuan, Øistein E. Andersen, Helen Yannakoudakis and Ekaterina Kochmar. 2014. Grammatical error correction using hybrid systems and type filtering. In Proceedings of the 18th Conference on Computational Natural Language Learning: Shared Task (CoNLL 2014 Shared Task), pp. 15 - 24. Baltimore, Maryland, USA. Association for Computational Linguistics. [pdf, bib].
    *This paper presents the winning system in the CoNLL 2014 Shared Task on Grammatical Error Correction.
  • Mariano Felice and Zheng Yuan. 2014. Generating artificial errors for grammatical error correction. In Proceedings of the Student Research Workshop at 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 14), pp. 116 - 126. Gothenburg, Sweden. Association for Computational Linguistics. [pdf, bib]
  • Zheng Yuan and Mariano Felice. 2013. Constrained grammatical error correction using Statistical Machine Translation. In Proceedings of the 17th Conference on Computational Natural Language Learning: Shared Task (CoNLL 2013 Shared Task), pp. 52 - 61. Sofia, Bulgaria. Association for Computational Linguistics. [pdf, bib]
  • Zheng Yuan and Matthew Purver. 2012. Predicting Emotion Labels for Chinese Microblog Texts. In Proceedings of the ECML-PKDD 2012 Workshop on Sentiment Discovery from Affective Data (SDAD 2012), pp. 40 - 47. Bristol, UK. [pdf]

Teaching

[Supervisions]

  • Databases, Part IA CST course, Department of Computer Science and Technology, University of Cambridge, 2021-2022
  • Data Science, Part IB CST course, Department of Computer Science and Technology, University of Cambridge, 2021-2022
  • Formal Models of Language, Part IB/II CST course, Department of Computer Science and Technology, University of Cambridge, 2020-2021
  • Demonstrating and ticking for Machine Learning and Real-world Data, Part IA/IB CST course, Department of Computer Science and Technology, University of Cambridge, 2016-2018, 2020-2021

Students

MPhil in Advanced Computer Science (ACS MPhil)

  • Ahdra Merali (2021) Generating Explanations for Essay Scoring
  • Victoria Liao (2018) The Extensions of Neural Error Correction
  • Gladys Tyen (2018) Supervised Attention for Neural Error Correction

Undergraduate

  • Joshua Cowan (2021) Automated multi-hop fact checking with NLP
  • Anish Das (2021) Investigating the effect of Translation Quality on Summarisation Quality
  • Charlie Seymour (2017) Machine Translation System Combination for Resource Poor Language Pairs

Contact

Department of Computer Science and Technology
University of Cambridge
William Gates Building
JJ Thomson Avenue
Cambridge CB3 0FD, UK
firstname.surname (at) cl.cam.ac.uk
+44 (0)1223 334645