Lexical classes that capture useful generalizations over a range of (cross-)linguistic properties
can support a number of important computational linguistic tasks and applications
(e.g. parsing, anaphora resolution, information extraction, open-domain question answering,
machine translation). To date, however, their use in NLP has been limited because no technology
for accurate and comprehensive (i.e. automatic) lexical classification is available. We will build
on the preliminary research on automatic lexical classification, and develop a system capable of
acquiring (i) large-scale cross-domain and (ii) domain-specific classifications from corpus data.
We will evaluate and demonstrate the capabilities of this system directly and in the context of a
number of NLP tasks, such as parsing and biomedical text mining. We will use the final version of
the system to acquire a substantial, relatively domain-independent lexical database from standard
corpora and the web, which we will enrich with additional relevant information from corpora and
public-domain manual classifications. The resulting resource, which will enable large-scale
exploitation of lexical classes, will be distributed freely via the internet, along with the
evaluation tools and the software which can be used to tune the frequency information stored in
the database to particular domains/tasks.
Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2008.
The Choice of Features for Classification of Verbs in Biomedical Texts.
In Proceedings of Coling 2008. Manchester, UK.
Andreas Vlachos, Zoubin Ghahramani, and Anna Korhonen. 2008.
Dirichlet Process Mixture Models for Verb Clustering.
In Proceedings of the ICML Workshop on Prior Knowledge for Text and Language. Helsinki, Finland.
Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008.
A Large-Scale Classification of English Verbs. In the
Journal of Language Resources and Evaluation. 42(1). 21-40.
Lin Sun, Anna Korhonen, and Yuval Krymolowski. 2008.
Verb Class Discovery from Rich Syntactic Data. In
Proceedings of the 9th International Conference on Intelligent Text Processing
and Computational Linguistics. Haifa, Israel.
Lin Sun, Anna Korhonen, and Yuval Krymolowski. 2008. Automatic Classification of English
Verbs Using Rich Syntactic Features. In Proceedings of the 3rd International Joint
Conference on Natural Language Processing. Hyderabad, India.
Judita Preiss, Ted Briscoe and Anna Korhonen. 2007. A System for Large-scale Acquisition
of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora. In
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic.
Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2006.
Automatic Classification of Verbs in Biomedical Texts.
In Proceedings of ACL-COLING 2006. Sydney, Australia.
Yoko Mizuta, Anna Korhonen, Tony Mullen and Nigel Collier. 2006.
Zone Analysis in Biology Articles as a Basis for Information Extraction. In
the International Journal of Medical Informatics on Natural Language
Processing in Biomedicine and Its Applications. 75(6). 468-87.
Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006.
A Large-Scale Extension of VerbNet with Novel Verb Classes.
In Proceedings of EURALEX. Turin, Italy.
Anna Korhonen, Yuval Krymolowski, and Ted Briscoe. 2006.
A Large Subcategorization Lexicon for Natural Language Processing Applications.
In Proceedings of the 5th International Conference on Language Resources and Evaluation. Genova, Italy.
Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006.
Extending VerbNet with Novel Verb Classes.
In Proceedings of the 5th International Conference on Language Resources and Evaluation. Genova, Italy.