Over the past few years, acquiring subcategorization lexicons from textual corpora has become
increasingly popular. Several systems have recently been proposed that can detect
comprehensive sets of subcategorization frames (SCFs) and produce large-scale
lexicons containing valuable frequency information. Results from the evaluation
of these systems have generally been encouraging. However, variation in
evaluation methods, numbers of target SCFs, test verbs, gold standards and test
corpora has made it difficult to compare results and systems directly.
The aim of this website is to provide resources that can serve as a common
test bed for evaluating the performance of subcategorization acquisition systems.
These resources comprise an evaluation corpus and a gold standard for a set of 30 test
verbs, together with software for automatically evaluating SCF lexicons using
several well-established methods.
As a starting point for inter-system comparison, we make available the results obtained on the
evaluation corpus by the current version of Briscoe and Carroll's (1997)
subcategorization acquisition system (Korhonen, 2002). We would be pleased to hear about
results obtained by other systems.
DOWNLOAD THE EVALUATION RESOURCES
Click the following links to download the evaluation resources. Please read the copyright notice below.
For a detailed description of the materials provided, see the readme documents
that are included in the downloads.
EVALUATION CORPUS
The 65,000-word evaluation corpus was extracted from 20M words of the British
National Corpus. It contains data for 30 test verbs,
with an average of 1000 occurrences per verb. The test verbs were selected
randomly, subject to the constraint that they take multiple SCFs
and occur frequently enough in the corpus data.
SCF CLASSIFICATION AND GOLD STANDARD
The gold standard adopts Briscoe's (2000) subcategorization frame
classification, which incorporates 163 SCF distinctions, a superset of
those found in the ANLT and COMLEX Syntax dictionaries.
The gold standard records the type and relative frequency of each SCF for a
given verb. It was obtained by manually analysing around 300 corpus occurrences
of each verb.
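A gold standard entry of this kind can be pictured as a mapping from SCF types to relative frequencies. The minimal Python sketch below assumes that the manual analysis yields one SCF label per analysed occurrence; the function name `gold_standard_entry` and the example labels are hypothetical and do not reflect the file format of the distributed gold standard.

```python
from collections import Counter


def gold_standard_entry(annotations):
    """Map each SCF type to its relative frequency among the annotated occurrences.

    `annotations` is assumed to hold one manually assigned SCF label per
    analysed corpus occurrence of a single verb (hypothetical input format).
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scf: n / total for scf, n in counts.items()}


if __name__ == "__main__":
    # Illustrative labels for five analysed occurrences of one verb.
    sample = ["NP", "NP", "NP-PP", "INTRANS", "NP"]
    entry = gold_standard_entry(sample)
    # List SCFs from most to least frequent.
    for scf, freq in sorted(entry.items(), key=lambda item: item[1], reverse=True):
        print(f"{scf}\t{freq:.2f}")
```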
EVALUATION SOFTWARE
The software can be used to evaluate an automatically acquired SCF lexicon
against the gold standard. It calculates the standard measures of ranking accuracy,
precision, recall and F-beta, and compares the acquired and gold standard SCF
distributions using several measures of distributional similarity: KL distance,
JS divergence, cross entropy, skew divergence, rank correlation and intersection.
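As a rough illustration of what some of these measures compute, here is a minimal Python sketch; it is not the distributed evaluation software. It assumes each lexicon entry is a dict mapping SCF labels to relative frequencies, computes precision and recall over SCF types, and measures divergence from the gold standard distribution `p` to the acquired distribution `q`. That argument order, the base-2 logarithms, the skew parameter alpha = 0.99 (a commonly used setting for Lee's skew divergence) and all function names are assumptions. Ranking accuracy, rank correlation and intersection are left out.

```python
import math


def precision_recall_f(acquired, gold, beta=1.0):
    """Type precision, recall and F-beta over the SCF types (dict keys)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    acquired_types, gold_types = set(acquired), set(gold)
    correct = len(acquired_types & gold_types)
    precision = correct / len(acquired_types) if acquired_types else 0.0
    recall = correct / len(gold_types) if gold_types else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def kl_divergence(p, q):
    """KL distance D(p || q) in bits; infinite if q gives zero mass to an SCF in p."""
    total = 0.0
    for scf, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(scf, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log2(px / qx)
    return total


def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) log2 q(x), in bits."""
    total = 0.0
    for scf, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(scf, 0.0)
        if qx == 0.0:
            return math.inf
        total -= px * math.log2(qx)
    return total


def _mix(p, q, weight):
    """Pointwise mixture weight * p + (1 - weight) * q over the union of SCFs."""
    scfs = set(p) | set(q)
    return {s: weight * p.get(s, 0.0) + (1 - weight) * q.get(s, 0.0) for s in scfs}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (symmetric, between 0 and 1)."""
    m = _mix(p, q, 0.5)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def skew_divergence(p, q, alpha=0.99):
    """Skew divergence D(p || alpha * q + (1 - alpha) * p); finite for alpha < 1."""
    return kl_divergence(p, _mix(q, p, alpha))


if __name__ == "__main__":
    # Toy distributions with illustrative SCF labels (not real data).
    gold = {"NP": 0.5, "NP-PP": 0.3, "INTRANS": 0.2}
    acquired = {"NP": 0.6, "NP-PP": 0.3, "PP": 0.1}
    print("P/R/F:", precision_recall_f(acquired, gold))
    print("KL:", kl_divergence(gold, acquired))  # inf: INTRANS missing from acquired
    print("JS:", js_divergence(gold, acquired))
    print("skew:", skew_divergence(gold, acquired))
    print("cross entropy:", cross_entropy(gold, acquired))  # also inf here
```

In this toy example the acquired lexicon misses INTRANS, so the KL distance and cross entropy are infinite, while the JS and skew divergences remain finite because they compare against a mixture that always gives mass to the gold standard SCFs.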
The software can also filter noisy SCFs out of an automatically acquired lexicon
before the actual evaluation. It provides several filtering and thresholding
techniques for this purpose, including the binomial hypothesis test and a
threshold on the relative frequencies of SCFs.
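The two filters named above can be sketched in Python as follows. This is a minimal sketch, not the distributed software. It assumes the usual formulation of the binomial hypothesis test for SCF filtering: an SCF hypothesised m times among n occurrences of a verb is accepted only if P(X >= m), with X ~ Binomial(n, p_e), falls below a significance level, where p_e is the probability of hypothesising that SCF for a verb that does not take it. The per-SCF error probabilities, the 0.05 significance level, the 5% relative-frequency threshold and all function names are hypothetical placeholders.

```python
import math


def binomial_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p), with each term computed in log space."""
    if m <= 0:
        return 1.0
    if m > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_not_p = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(m, n + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_not_p)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def binomial_filter(counts, n, error_prob, significance=0.05):
    """Keep SCFs whose frequency is unlikely to be due to error alone.

    counts:     SCF -> number of the verb's n occurrences for which the SCF
                was hypothesised.
    error_prob: SCF -> estimated error probability p_e for that SCF
                (hypothetical estimates supplied by the caller).
    An SCF is accepted when P(X >= count) < significance, X ~ Binomial(n, p_e).
    """
    return {scf: c for scf, c in counts.items()
            if binomial_tail(c, n, error_prob[scf]) < significance}


def relative_frequency_filter(counts, threshold):
    """Keep SCFs whose relative frequency for the verb is at least `threshold`."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scf: c for scf, c in counts.items() if c / total >= threshold}


if __name__ == "__main__":
    # Hypothetical SCF counts for one verb with 82 occurrences.
    counts = {"NP": 50, "NP-PP": 30, "PP": 2}
    error_prob = {"NP": 0.05, "NP-PP": 0.05, "PP": 0.05}
    print(binomial_filter(counts, 82, error_prob))  # PP is rejected
    print(relative_frequency_filter(counts, 0.05))  # PP falls below 5%
```

Summing the binomial terms in log space keeps the tail probability computable for verbs with many occurrences, where the raw binomial coefficients would overflow a float.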
RESULTS FOR BRISCOE AND CARROLL'S SUBCATEGORIZATION ACQUISITION SYSTEM
We evaluated the current version of Briscoe and Carroll's (1997) subcategorization
acquisition system (Korhonen, 2002) using the resources provided here, and have
made the results available for comparison. The test files supplied with the
evaluation software allow our experiments and results to be replicated.
REFERENCES
Ted Briscoe and John Carroll. 1997. Automatic Extraction of Subcategorization from Corpora. In Proceedings of the Fifth Conference on Applied Natural Language Processing. Washington, DC.
Ted Briscoe. 2000. Dictionary and System Subcategorisation Code Mappings. Unpublished manuscript, University of Cambridge Computer Laboratory. Included in the download materials above.
Ted Briscoe. 2001. From Dictionary to Corpus to Self-Organizing Dictionary: Learning Valency Associations in the Face of Variation and Change. In Proceedings of Corpus Linguistics. Lancaster University, UK.
Anna Korhonen, Genevieve Gorrell and Diana McCarthy. 2000. Statistical Filtering and Subcategorization Frame Acquisition. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong.
Anna Korhonen. 2002. Subcategorization Acquisition. PhD thesis, published as Technical Report UCAM-CL-TR-530. Computer Laboratory, University of Cambridge.
Anna Korhonen and Yuval Krymolowski. 2002. On the Robustness of Entropy-Based Similarity Measures in Evaluation of Subcategorization Acquisition Systems. In Proceedings of the Sixth Conference on Natural Language Learning. Taipei, Taiwan.
We would be pleased to receive comments on the materials provided here. Please
contact us with any feedback or suggestions you may have.
Copyright © 2002 Anna Korhonen,
University of Cambridge
These resources are distributed freely under the terms of the GNU General Public License.
Please acknowledge the use of these resources in any publications by providing the appropriate
reference and URL
(e.g. "The evaluation resources were provided by A. Korhonen, and are available at URL:
The resources were produced as part of the UK EPSRC-funded project
'Robust Accurate Statistical Parsing'.