Subcategorization Acquisition as an Evaluation Method for WSD
Task Description
This task involves evaluating word sense disambiguation (WSD) systems
in the context of automatic subcategorization acquisition. We have
shown in our previous work that accurate WSD can improve the
performance of a verbal subcategorization acquisition system (Preiss
and Korhonen 2002, Preiss et al. 2002). When the corpus data is
disambiguated accurately, the system uses correct sets of probability
estimates for the acquisition process. This yields a more accurate
subcategorization lexicon than the first-sense heuristic (i.e.,
assuming the most frequent sense for all corpus instances and
using only a single set of probability estimates for the acquisition
process).
Our task will be restricted to a set of 29 verbs. These are "hard" verbs:
high in frequency and with multiple senses. The participants will be
given the list of verbs in advance to allow a training phase (no
training data will be made available). We will provide the test
corpus. This will contain around 1000 instances of each verb, which
the participants will be expected to annotate with WordNet 1.7.1
senses.
The participants' answers will be submitted to us in a standardized
format (the format specification, identical to that of the other English
tasks, can be found here). After receiving the sense-annotated data, we
will map the detected WordNet senses to our senses, which are based on
broad Levin-style verb classes (Levin, 1993). Levin's notion of a sense
is fairly broad but adequate for our purposes.
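The mapping step can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the sense labels ("verbA#1" and so on), the class labels, the table SENSE_TO_CLASS and the function map_to_classes are hypothetical placeholders, not the mapping table or code actually used in this task.

```python
from collections import Counter

# Hypothetical many-to-one table from fine-grained WordNet senses to broad
# Levin-style classes. The labels are invented placeholders.
SENSE_TO_CLASS = {
    "verbA#1": "CLASS_1",
    "verbA#2": "CLASS_1",  # two fine-grained senses collapse into one class
    "verbA#3": "CLASS_2",
}


def map_to_classes(annotations, sense_to_class):
    """Split (instance_id, sense) pairs into mapped and unmapped lists."""
    mapped, unmapped = [], []
    for instance_id, sense in annotations:
        levin_class = sense_to_class.get(sense)
        if levin_class is None:
            # Keep unmapped answers visible instead of dropping them silently.
            unmapped.append((instance_id, sense))
        else:
            mapped.append((instance_id, levin_class))
    return mapped, unmapped


annotations = [
    ("i1", "verbA#1"),
    ("i2", "verbA#2"),
    ("i3", "verbA#3"),
    ("i4", "verbA#9"),  # a sense with no entry in the table
]
mapped, unmapped = map_to_classes(annotations, SENSE_TO_CLASS)
class_counts = Counter(levin_class for _, levin_class in mapped)
```

Because the classes are broad, several WordNet senses may fall into the same class, so a system can confuse two fine-grained senses without affecting the class it is assigned.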
We will feed the sense-annotated data from each system to Anna
Korhonen's subcategorization acquisition software. The more accurate
the sense annotation, the more comprehensive the probability estimates
used in the acquisition process, and the more accurate we can expect
the acquired subcategorization frames to be. The acquired frames will
be evaluated against manually obtained gold standard frames, which
will yield a ranking of the WSD systems.
Training/Test Data
No training data will be provided. Testing data will consist of up to
1000 sentences for each chosen verb. These sentences will be drawn
from the same corpus as is used for the creation of the gold standard.
Evaluation Methodology
Evaluation will consist of mapping the submitted WordNet 1.7.1 answers
to Levin senses, and generating a set of subcategorization frames for
each verb from each system. It will not be possible to evaluate
systems if too few instances are annotated. The acquired
subcategorization frame distributions will be evaluated against gold
standard distributions created previously (by Anna Korhonen). Using
the method described in Korhonen (2002), we will generate a ranking of
the submitted WSD systems.
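As a rough illustration of how comparing frame distributions can produce a ranking, here is a minimal sketch. It rests on stated assumptions: this description does not say which similarity measure is used, so Kullback-Leibler divergence stands in purely as an example, and the frame labels, system names, smoothing constant and the functions kl_divergence and rank_systems are all hypothetical.

```python
import math


def kl_divergence(gold, acquired, epsilon=1e-6):
    """KL(gold || acquired) over the union of frames, with additive smoothing."""
    frames = set(gold) | set(acquired)

    def smooth(dist):
        # Add a small constant so frames missing from one side do not
        # produce a division by zero or log(0).
        total = sum(dist.get(f, 0.0) + epsilon for f in frames)
        return {f: (dist.get(f, 0.0) + epsilon) / total for f in frames}

    p, q = smooth(gold), smooth(acquired)
    return sum(p[f] * math.log(p[f] / q[f]) for f in frames)


def rank_systems(gold_by_verb, systems):
    """Order systems by mean divergence from the gold standard, best first."""
    scores = {}
    for name, acquired_by_verb in systems.items():
        divergences = [
            kl_divergence(gold_by_verb[verb], acquired_by_verb.get(verb, {}))
            for verb in gold_by_verb
        ]
        scores[name] = sum(divergences) / len(divergences)
    return sorted(scores.items(), key=lambda item: item[1])


# Invented example data: one verb, three frame labels, two systems.
gold = {"verbA": {"NP": 0.6, "NP_PP": 0.3, "SCOMP": 0.1}}
systems = {
    "sys_far": {"verbA": {"NP": 0.1, "NP_PP": 0.1, "SCOMP": 0.8}},
    "sys_close": {"verbA": {"NP": 0.55, "NP_PP": 0.35, "SCOMP": 0.10}},
}
ranking = rank_systems(gold, systems)
```

In this sketch a lower score means the acquired distribution is closer to the gold standard, so the first entry of ranking is the best-placed system.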
Resources
No subcategorization acquisition resources will be made directly
available to participants. The test corpus can be obtained from the
Senseval site, and the list of verbs used in this task is available
through this site. Please let us know if there are any problems with
either of these.
Bibliography:
- Levin B. 1993. English Verb Classes and Alternations. University of
Chicago Press.
- Korhonen A. 2002. Subcategorization Acquisition. PhD
thesis. University of Cambridge.
- Preiss J. and A. Korhonen. 2002. Improving Subcategorization
Acquisition with WSD. In Proceedings of the ACL Workshop on Word Sense
Disambiguation: Recent Successes and Future Directions.
- Preiss J., A. Korhonen and E. Briscoe. 2002. Subcategorization
Acquisition as an Evaluation Method for WSD. In Proceedings of LREC.