ACL-2003 Workshop on


endorsed by the ACL Special Interest Group on the Lexicon (SIGLEX)

12 July 2003, Sapporo, Japan

Workshop Description

Multiword expressions (MWEs) include a large range of linguistic phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "telephone box"), and institutionalized phrases (e.g. "salt and pepper"), and they can be syntactically and/or semantically idiosyncratic in nature. MWEs are used frequently in everyday language, usually to express precisely those ideas and concepts that cannot be compressed into a single word.

A considerable amount of research has been devoted to this subject, both in terms of theory and practice, but despite increasing interest in idiomaticity within linguistic research, there is still a gap between the needs of NLP and the descriptive tradition of linguistics. Owing to the lack of adequate resources to identify and treat MWEs properly, they pose a real challenge for NLP. Most real-world applications tend to ignore MWEs or address them simply by listing. However, it is clear that successful applications will need to be able to identify and treat them appropriately. This particularly applies to the many applications which require some degree of semantic processing (e.g. machine translation, question-answering, summarisation, generation).

In recent years there has been a growing awareness in the NLP community of the problems that MWEs pose and the need for their robust handling. A considerable amount of research has been conducted in this area, some within large research projects dedicated to MWEs (e.g. the Multiword Expression Project). There is also a growing interest in MWEs in projects focused on tasks required by real-world applications, such as parsing (e.g. Robust Accurate Statistical Parsing (RASP)) and word sense disambiguation (e.g. MEANING - Developing Multilingual Web-scale Language Technologies).

Previous workshops on MWEs have focused on certain MWE types, notably collocations, terminology and named entities. There are, however, further subtypes of MWEs that are highly relevant for NLP tasks but have not to date received specific attention. One example is lexicalised (non- or semi-compositional) MWEs, which raise specific issues for applications that require semantic interpretation.

Target Audience

This workshop is intended to bring together NLP researchers working on all areas of MWEs. The objective is to summarise what has been achieved in the area, to establish common themes between different approaches, and to discuss future trends, with particular emphasis on addressing the problems that different MWE (sub)types pose for real-world NLP applications.

Areas of Interest

Papers are invited on, but not limited to, the following topics:

Papers can cover one or more of these areas.

Workshop Program

9:00-9:05   Welcome

9:05-9:30   Complex Structuring of Term Variants for Question Answering

James Dowdall, Fabio Rinaldi, Fidelia Ibekwe-SanJuan, and Eric SanJuan

9:30-9:55   Conceptual Structuring through Term Variations

Béatrice Daille

9:55-10:20   Noun-Noun Compound Machine Translation: A Feasibility Study on Shallow Processing

Takaaki Tanaka and Timothy Baldwin

10:50-11:15   Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora

Alexandre Gil and Gaël Dias

11:15-11:40   A Language Model Approach to Keyphrase Extraction

Takashi Tomokiyo and Matthew Hurst

11:40-12:05   Multiword Unit Hybrid Extraction

Gaël Dias

12:05-12:30   Extracting Multiword Expressions with a Semantic Tagger

Scott Piao, Paul Rayson, Dawn Archer, Andrew Wilson, and Tony McEnery

14:00-14:25   Verb-Particle Constructions and Lexical Resources

Aline Villavicencio

14:25-14:50   A Statistical Approach to the Semantics of Verb-Particles

Colin Bannard, Timothy Baldwin, and Alex Lascarides

14:50-15:15   Detecting a Continuum of Compositionality in Phrasal Verbs

Diana McCarthy, Bill Keller, and John Carroll

15:15-15:40   A Disambiguation Method for Japanese Compound Verbs

Kiyoko Uchiyama and Shun Ishizaki

16:10-16:35   An Empirical Model of Multiword Expression Decomposability

Timothy Baldwin, Colin Bannard, Takaaki Tanaka, and Dominic Widdows

16:35-17:00   Licensing Complex Prepositions via Lexical Constraints

Beata Trawinski

17:00-17:30   Discussion

Workshop Chairs

Francis Bond
NTT Communication Science Laboratories, Japan

Anna Korhonen
University of Cambridge, UK

Diana McCarthy
University of Sussex, UK

Aline Villavicencio
University of Cambridge, UK

Program Committee

Anne Abeillé   (Université Paris 7, France)
Timothy Baldwin   (Stanford University, USA)
Ted Briscoe   (University of Cambridge, UK)
Nicoletta Calzolari   (Istituto di Linguistica Computazionale, Italy)
Tony Cowie   (University of Leeds, UK)
Ido Dagan   (Bar-Ilan University, Israel)
Christiane Fellbaum   (Princeton University, USA)
Chuck Fillmore   (UC Berkeley, USA)
Nancy Ide   (Vassar College, USA)
Kyo Kageura   (National Institute of Informatics, Japan)
Brigitte Krenn   (Austrian Research Institute for Artificial Intelligence, Austria)
Mirella Lapata   (University of Edinburgh, UK)
Simonetta Montemagni   (Istituto di Linguistica Computazionale, Italy)
Kentaro Ogura   (NTT Cyber Space Laboratories, Japan)
Darren Pearce   (University of Sussex, UK)
Ivan Sag   (Stanford University, USA)
Tom Wasow   (Stanford University, USA)
Annie Zaenen   (PARC, USA)



