Accurate and Comprehensive Lexical Classification for Natural Language Processing Applications (ACLEX)

08/2005-07/2008, funded by the EPSRC


PARTICIPANTS

Ted Briscoe, Anna Korhonen and Judita Preiss

University of Cambridge
Computer Laboratory
Natural Language and Information Processing Group
15 JJ Thomson Avenue
Cambridge CB3 0FD, United Kingdom


PROJECT SUMMARY

Lexical classes that capture useful generalizations over a range of (cross-)linguistic properties can support a number of important computational linguistic tasks and applications (e.g. parsing, anaphora resolution, information extraction, open-domain question answering, machine translation). However, their use in NLP has so far been limited because no technology for accurate and comprehensive (i.e. automatic) lexical classification is available. We will build on preliminary research on automatic lexical classification and develop a system capable of acquiring (i) large-scale cross-domain and (ii) domain-specific classifications from corpus data. We will evaluate and demonstrate the capabilities of this system both directly and in the context of a number of NLP tasks, such as parsing and biomedical text mining. We will use the final version of the system to acquire a substantial, relatively domain-independent lexical database from standard corpora and the web, which we will enrich with additional relevant information from corpora and public-domain manual classifications. The resulting resource, which will enable large-scale exploitation of lexical classes, will be distributed freely via the internet, along with the evaluation tools and the software that can be used to tune the frequency information stored in the database to particular domains and tasks.


RESOURCES




PUBLICATIONS

Anna Korhonen, Yuval Krymolowski and Nigel Collier. 2008. The Choice of Features for Classification of Verbs in Biomedical Texts. In Proceedings of Coling 2008. Manchester, UK.

Andreas Vlachos, Zoubin Ghahramani, and Anna Korhonen. 2008. Dirichlet Process Mixture Models for Verb Clustering. In Proceedings of the ICML Workshop on Prior Knowledge for Text and Language. Helsinki, Finland.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. A Large-Scale Classification of English Verbs. In the Journal of Language Resources and Evaluation. 42(1). 21-40.

Lin Sun, Anna Korhonen, and Yuval Krymolowski. 2008. Verb Class Discovery from Rich Syntactic Data. In Proceedings of the 9th International Conference on Intelligent Text Processing and Computational Linguistics. Haifa, Israel.

Lin Sun, Anna Korhonen, and Yuval Krymolowski. 2008. Automatic Classification of English Verbs Using Rich Syntactic Features. In Proceedings of the 3rd International Joint Conference on Natural Language Processing. Hyderabad, India.

Judita Preiss, Ted Briscoe and Anna Korhonen. 2007. A System for Large-scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic.

Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2006. Automatic Classification of Verbs in Biomedical Texts. In Proceedings of ACL-COLING 2006. Sydney, Australia.

Yoko Mizuta, Anna Korhonen, Tony Mullen and Nigel Collier. 2006. Zone Analysis in Biology Articles as a Basis for Information Extraction. In the International Journal of Medical Informatics on Natural Language Processing in Biomedicine and Its Applications. 75(6). 468-87.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. A Large-Scale Extension of VerbNet with Novel Verb Classes. In Proceedings of EURALEX. Turin, Italy.

Anna Korhonen, Yuval Krymolowski, and Ted Briscoe. 2006. A Large Subcategorization Lexicon for Natural Language Processing Applications. In Proceedings of the 5th International Conference on Language Resources and Evaluation. Genova, Italy.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. Extending VerbNet with Novel Verb Classes. In Proceedings of the 5th International Conference on Language Resources and Evaluation. Genova, Italy.