This directory contains the official version of the First Certificate in English (FCE) corpus used in the BEA2019 shared task.

More details about the FCE corpus can be found in the following paper:

Helen Yannakoudakis, Ted Briscoe, and Ben Medlock. 2011. A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 180–189.

The original FCE files are available here: https://ilexir.co.uk/datasets/index.html
The raw dataset is not explicitly split into training, development and test sets, so we recreated the split from the error detection version of the dataset available at the same link.

Converting the raw XML files to M2 format is not entirely straightforward, so we provide only the output M2 files and not the scripts used to make them. Like the other files in BEA2019, these files were made using the ERRANT annotation toolkit: https://github.com/chrisjbryant/errant

Specifically, they were created in Python 3.5 using spaCy v1.9.0 and the en_core_web_sm-1.2.0 model.
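
For readers unfamiliar with M2, the following is a minimal sketch of how such a file could be read in Python. It assumes the usual M2 layout: an "S" line holding the tokenized sentence, followed by zero or more "A" lines of the form "start end|||type|||correction|||REQUIRED|||-NONE-|||annotator", with a blank line between sentences. The function name read_m2 and the SAMPLE text are hypothetical and are not part of this release; the sample sentences are invented for illustration and are not taken from the corpus.

```python
from typing import Iterable, Iterator, List, Tuple

# (start, end, error_type, correction, annotator_id); offsets are token indices.
Edit = Tuple[int, int, str, str, int]

# Invented example in the assumed M2 layout (not taken from the FCE data).
SAMPLE = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0

S No errors here .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""


def read_m2(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[Edit]]]:
    """Yield (tokens, edits) for each sentence block in an M2 stream."""
    tokens = None
    edits: List[Edit] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("S "):
            # Start of a new sentence block.
            tokens = line[2:].split()
            edits = []
        elif line.startswith("A "):
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            # The last field is taken to be the annotator id.
            edits.append((start, end, fields[1], fields[2], int(fields[-1])))
        elif not line.strip() and tokens is not None:
            # A blank line closes the current block.
            yield tokens, edits
            tokens, edits = None, []
    if tokens is not None:
        # The final block may not be followed by a blank line.
        yield tokens, edits


if __name__ == "__main__":
    for sent_tokens, sent_edits in read_m2(SAMPLE.splitlines()):
        print(" ".join(sent_tokens), sent_edits)
```

To read a real file, an open file handle can be passed in place of SAMPLE.splitlines(). In this layout, a sentence with no corrections is typically marked by a single "noop" edit with offsets -1 -1, which the sketch returns unchanged rather than filtering out.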