Journal of Computer Speech and Language
Special Issue on Multiword Expressions


Guest editors:

Aline Villavicencio
University of Cambridge, UK

Francis Bond
NTT Communication Science Laboratories, Japan

Anna Korhonen
University of Cambridge, UK

Diana McCarthy
University of Sussex, UK

Multiword expressions (MWEs) include a large range of linguistic phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "telephone box"), and institutionalized phrases (e.g. "salt and pepper"), and they can be syntactically and/or semantically idiosyncratic in nature. MWEs are used frequently in everyday language, usually to precisely express ideas and concepts that cannot be compressed into a single word. A considerable amount of research has been devoted to this subject, both in terms of theory and practice, but despite increasing interest in idiomaticity within linguistic research, there is still a gap between the needs of natural language processing (NLP) and the descriptive tradition of linguistics. Most real-world applications tend to ignore MWEs or address them simply by listing them. However, it is clear that successful applications will need to be able to identify and treat them more appropriately.

In recent years there has been a growing awareness in the NLP community of the problems that MWEs pose and the need for their robust handling. This special issue of Computer Speech and Language, due for publication in 2005, will be devoted to the acquisition, identification and treatment of MWEs. We invite papers adopting a quantitative approach to the following aspects of MWE research:

* Extraction of MWEs:

There has been considerable research into the extraction of lists of multiword expressions and collocations of certain types, such as noun-noun compounds, institutionalized expressions and verb-particle constructions. Papers which explore the benefits and weaknesses of methods across different MWE types and across different languages are particularly welcome. We also encourage papers where the extraction is not limited to an enumeration of MWEs of a given type, but permits some sort of subcategorization or analysis of the syntactic or semantic properties of the expression.

* Evaluation of extracted MWEs:

To date, researchers have tended to evaluate MWE extraction by exploiting available man-made lexical resources or by manually annotating either the input data or the automatically extracted lists. There is considerable scope for proposals of standard evaluation metrics, of test and training data, and of task-based evaluation.

* Identification of MWEs:

Whilst there has been considerable research on extraction, less attention has been paid to determining whether a candidate multiword token is in fact a genuine multiword expression or simply a regular compositional occurrence of the words that can comprise one, e.g. "She looked up the road" vs "She looked up his telephone number".

* The benefits of MWE identification and treatment for applications:

We encourage papers that expose the problems that MWEs pose for specific applications and that propose solutions to these problems.

Submission Information:

Deadline for paper submissions: June 5, 2004

All submissions will be subject to the normal peer review process for this journal.

We recommend that papers do not exceed 15 pages, and they must conform to the Computer Speech and Language specifications, which are available at . Submissions are to be made electronically, by sending the paper both to the editors, mailing , and to the journal, using the on-line submission facility in .

Any initial queries should be addressed to