for English Subcategorization Acquisition Systems



In recent years, acquiring subcategorization lexicons from textual corpora has become increasingly popular. Several recently proposed systems can detect comprehensive sets of subcategorization frames (SCFs) and produce large-scale lexicons which include valuable frequency information. Results from the evaluation of these systems have generally been encouraging. However, variation in the evaluation methods, the number of target SCFs, test verbs, gold standards, and test corpora has made direct comparison of different results and systems difficult.

The aim of this website is to provide resources which can be used as a common test bed for evaluating the performance of subcategorization acquisition systems. These resources include an evaluation corpus and a gold standard for a set of 30 test verbs, and software which can be used to automatically evaluate SCF lexicons using several well-established methods.

As a starting point for inter-system comparison, we make available our results on the evaluation corpus, obtained by the current version of Briscoe and Carroll's (1997) subcategorization acquisition system (Korhonen, 2002). We would be pleased to hear about the results obtained by other systems.

Download the Evaluation Resources

Click the following links to download the evaluation resources. Please read the copyright notice below. For a detailed description of the materials provided, see the readme documents that are included in the downloads.


    The 65,000 word evaluation corpus was extracted from 20M words of the British National Corpus. It contains data for 30 test verbs, an average of 1000 occurrences for each verb. The test verbs were selected randomly, subject to the constraint that they take multiple SCFs and occur frequently enough in corpus data.


    The gold standard assumes Briscoe's (2000) subcategorization frame classification, which incorporates 163 SCF distinctions: a superset of those found in the ANLT and COMLEX Syntax dictionaries. The gold standard records the type and relative frequency of each SCF for a given verb. It was obtained via manual analysis of corpus data, by analysing around 300 occurrences for each verb.


    The software can be used to evaluate an automatically acquired SCF lexicon against the gold standard. It calculates the standard ranking accuracy, precision, recall and F-beta measures, and compares the similarity between the acquired and gold standard SCF distributions using various measures of distributional similarity (KL distance, JS divergence, cross entropy, skew divergence, rank correlation, and intersection).
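
    As a rough illustration of the measures listed above, the following Python sketch computes type precision, recall and F-beta over sets of SCFs, and three of the distributional measures (KL, JS and skew divergence) over SCF distributions. It is not the distributed evaluation software: the function names, the SCF labels, the choice of base-2 logarithms, the argument order of the skew divergence and the value alpha = 0.99 are all assumptions made for this example.

```python
import math


def precision_recall_f(acquired, gold, beta=1.0):
    """Type precision, type recall and F-beta over collections of SCF labels."""
    acquired, gold = set(acquired), set(gold)
    true_pos = len(acquired & gold)
    precision = true_pos / len(acquired) if acquired else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q map SCF label -> probability.

    Returns infinity when q assigns zero mass to an SCF that p uses.
    """
    total = 0.0
    for scf, p_i in p.items():
        if p_i == 0.0:
            continue
        q_i = q.get(scf, 0.0)
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log2(p_i / q_i)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    labels = set(p) | set(q)
    mid = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in labels}
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


def skew_divergence(p, q, alpha=0.99):
    """KL(p || alpha*q + (1 - alpha)*p); smoothing q with p keeps it finite."""
    labels = set(p) | set(q)
    mix = {s: alpha * q.get(s, 0.0) + (1 - alpha) * p.get(s, 0.0) for s in labels}
    return kl_divergence(p, mix)


# Hypothetical distributions for a single verb (labels are illustrative only).
gold = {"NP": 0.5, "NP_PP": 0.3, "SCOMP": 0.2}
acquired = {"NP": 0.6, "NP_PP": 0.2, "PP": 0.2}

print(precision_recall_f(acquired, gold))
print(js_divergence(gold, acquired), skew_divergence(gold, acquired))
```

    In this example the plain KL divergence from the gold standard to the acquired distribution is infinite, because the acquired lexicon misses one gold standard SCF entirely. JS and skew divergence stay finite in that situation, which is one reason such variants are useful when acquired lexicons are sparse.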

    The software can also be used to filter out noisy SCFs from the automatically acquired lexicon before the actual evaluation. It provides several filtering and thresholding techniques for this purpose, including the binomial hypothesis test and a threshold on the relative frequencies of SCFs.
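
    The two filters named above can be sketched as follows. This is a minimal illustration, not the code shipped in the download: the function names, the per-SCF error probabilities, the significance level and the frequency threshold are hypothetical values chosen for the example, and the sketch assumes that every occurrence of the verb is assigned exactly one SCF, so that the counts sum to the number of occurrences.

```python
from math import comb


def binomial_tail(m, n, p_error):
    """P(X >= m) for X ~ Binomial(n, p_error).

    This is the probability of seeing the SCF at least m times in n verb
    occurrences if every one of those sightings were an error.
    """
    return sum(
        comb(n, k) * p_error**k * (1 - p_error) ** (n - k)
        for k in range(m, n + 1)
    )


def filter_binomial(counts, error_probs, significance=0.05):
    """Keep SCFs whose counts are unlikely to arise from errors alone."""
    n = sum(counts.values())
    return {
        scf
        for scf, m in counts.items()
        if binomial_tail(m, n, error_probs[scf]) <= significance
    }


def filter_relative_frequency(counts, threshold=0.02):
    """Keep SCFs whose relative frequency reaches the threshold."""
    n = sum(counts.values())
    return {scf for scf, m in counts.items() if m / n >= threshold}


# Hypothetical counts for one verb over 100 occurrences.
counts = {"NP": 60, "NP_PP": 35, "SCOMP": 5}
# Hypothetical probability that each SCF is assigned in error.
error_probs = {"NP": 0.05, "NP_PP": 0.05, "SCOMP": 0.05}

print(filter_binomial(counts, error_probs))
print(filter_relative_frequency(counts, threshold=0.1))
```

    With these illustrative numbers both filters discard the low-frequency frame and keep the other two. The binomial test depends on an estimate of the error probability for each SCF, whereas the relative frequency threshold needs only the counts.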


    We evaluated the current version of Briscoe and Carroll's (1997) subcategorization acquisition system (Korhonen, 2002) using the resources provided here, and have made the results available for comparison. The test files provided with the evaluation software allow our experiment and its results to be replicated.


Ted Briscoe and John Carroll. 1997. Automatic Extraction of Subcategorization from Corpora. In Proceedings of the Fifth Conference on Applied Natural Language Processing. Washington, DC.

Ted Briscoe. 2000. Dictionary and System Subcategorisation Code Mappings. Unpublished manuscript, University of Cambridge Computer Laboratory. Included in the download materials above.

Ted Briscoe. 2001. From Dictionary to Corpus to Self-Organizing Dictionary: Learning Valency Associations in the Face of Variation and Change. In Proceedings of Corpus Linguistics. Lancaster University, UK.

Anna Korhonen, Genevieve Gorrell and Diana McCarthy. 2000. Statistical Filtering and Subcategorization Frame Acquisition. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong.

Anna Korhonen. 2002. Subcategorization Acquisition. PhD thesis published as Technical Report UCAM-CL-TR-530. Computer Laboratory, University of Cambridge.

Anna Korhonen and Yuval Krymolowski. 2002. On the Robustness of Entropy-Based Similarity Measures in Evaluation of Subcategorization Acquisition Systems. In Proceedings of the Sixth Conference on Natural Language Learning. Taipei, Taiwan.


We would be pleased to receive comments on the materials provided here. Please
contact us with any feedback or suggestions you may have.

Copyright Notice

Copyright 2002
Anna Korhonen, University of Cambridge

These resources are distributed freely under the terms of the GNU General Public License. Click here to read the license. Please acknowledge the use of these resources in any publications by providing the appropriate reference and URL (e.g. "The evaluation resources were provided by A. Korhonen, and are available at URL: http://www.cl.cam.ac.uk/users/alk23/subcat/subcat.html").


The resources were produced as part of the UK EPSRC-funded project 'Robust Accurate Statistical Parsing' (


This page was last modified November 4, 2002.