Guest editors:
Aline Villavicencio, University of Cambridge, UK
Francis Bond, NTT Communication Science Laboratories, Japan
Anna Korhonen, University of Cambridge, UK
Diana McCarthy, University of Sussex, UK
Multiword expressions (MWEs) include a large range of linguistic phenomena,
such as phrasal verbs (e.g. "add up"), nominal compounds (e.g.
"telephone box"), and institutionalized phrases (e.g. "salt and
pepper"), and they can be syntactically and/or semantically
idiosyncratic in nature. MWEs are used frequently in everyday
language, usually to precisely express ideas and concepts that
cannot be compressed into a single word. A considerable amount of
research has been devoted to this subject, both in terms of theory
and practice, but despite increasing interest in idiomaticity within
linguistic research, there is still a gap between the needs of
natural language processing (NLP) and the descriptive tradition of
linguistics. Most real-world applications tend to ignore MWEs or
address them simply by listing. However, it is clear that successful
applications will need to be able to identify and treat them more
appropriately.
In recent years there has been a growing
awareness in the NLP community of the problems that MWEs pose and
the need for their robust handling. This special issue of Computer
Speech and Language, due for publication in 2005, will be devoted to
the acquisition, identification and treatment of MWEs. We invite
papers adopting a quantitative approach to the following aspects of
MWE research:
* Extraction of MWEs:
There has been considerable research into
extraction of lists of some multiword expressions and collocations
of certain types, such as noun-noun compounds, institutionalized
expressions and verb-particle constructions. Papers which explore
the benefits and weaknesses of methods across different MWE types
and across different languages are particularly welcome. Also, we
encourage papers where the extraction is not limited to an
enumeration of MWEs of a given type, but permits some sort of
subcategorization or analysis of the syntactic or semantic
properties of the expression.
* Evaluation of extracted MWEs:
To date researchers
have tended to evaluate MWE extraction by exploiting available
man-made lexical resources or using manual annotation of either the
input data or the automatically extracted lists. There is
considerable scope for proposals of standard evaluation metrics and
test and training data, and for task-based evaluation.
* Identification of MWEs:
Whilst there has been considerable research on extraction, less attention
has been paid to determining whether a candidate multiword token is in
fact a genuine multiword expression or simply a regular compositional
occurrence of words that can form one, e.g. "She looked up the road"
vs. "She looked up his telephone number".
* The benefits of MWE identification and treatment for applications:
We encourage papers that expose the problems MWEs pose for specific
applications and propose solutions to these problems.
Submission Information:
Deadline for paper submissions: June 5, 2004
All submissions will be subject to the normal peer review process for this
journal.
We recommend that papers do not exceed 15 pages, and they must
conform to the Computer Speech and Language specifications, which are
available at http://authors.elsevier.com/journal/csl.
Submissions are to be made electronically, by sending the paper both
to the editors, at mwe-editors@cl.cam.ac.uk, and to the journal, using
the on-line submission facility at http://authors.elsevier.com/journal/csl.
Any initial queries should be addressed to mwe-editors@cl.cam.ac.uk.