Multiword expressions (MWEs) include a wide range of linguistic phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "telephone box"), and institutionalized phrases (e.g. "salt and pepper"), and they can be syntactically and/or semantically idiosyncratic in nature. MWEs are used frequently in everyday language, usually to express precisely those ideas and concepts that cannot be compressed into a single word.
A considerable amount of research has been devoted to this subject, both in terms of theory and practice, but despite increasing interest in idiomaticity within linguistic research, there is still a gap between the needs of NLP and the descriptive tradition of linguistics. Owing to the lack of adequate resources to identify and treat MWEs properly, they pose a real challenge for NLP. Most real-world applications tend to ignore MWEs or address them simply by listing. However, it is clear that successful applications will need to be able to identify and treat them appropriately. This particularly applies to the many applications which require some degree of semantic processing (e.g. machine translation, question-answering, summarisation, generation).
In recent years there has been a growing awareness in the NLP community of the problems that MWEs pose and the need for their robust handling. A considerable amount of research has been conducted in this area, some within large research projects dedicated to MWEs (e.g. the Multiword Expression Project). There is also a growing interest in MWEs in projects focused on tasks such as parsing (e.g. Robust Accurate Statistical Parsing (RASP)) and word sense disambiguation (e.g. MEANING - Developing Multilingual Web-scale Language Technologies) which are required by real-world applications.
Previous workshops on MWEs have focused on certain MWE types, notably collocations, terminology and named entities. There are, however, further subtypes of MWEs which are highly relevant for NLP tasks but which have not to date received specific attention. One example is lexicalised (non- or semi-compositional) MWEs, which raise specific issues for applications that require semantic interpretation.
This workshop is intended to bring together NLP researchers working on all areas of MWEs. The objective is to summarise what has been achieved in the area, to establish common themes between different approaches, and to discuss future trends, with particular emphasis on addressing the problems that different MWE (sub)types pose for real-world NLP applications.
Papers are invited on, but not limited to, the following topics:
Papers can cover one or more of these areas.
9:05-9:30 Complex Structuring of Term Variants for Question Answering
James Dowdall, Fabio Rinaldi, Fidelia Ibekwe-SanJuan, and Eric SanJuan
9:30-9:55 Conceptual Structuring through Term Variations
9:55-10:20 Noun-Noun Compound Machine Translation: A Feasibility Study on Shallow Processing
Takaaki Tanaka and Timothy Baldwin
10:50-11:15 Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora
Alexandre Gil and Gaël Dias
11:15-11:40 A Language Model Approach to Keyphrase Extraction
Takashi Tomokiyo and Matthew Hurst
11:40-12:05 Multiword Unit Hybrid Extraction
12:05-12:30 Extracting Multiword Expressions with a Semantic Tagger
Scott Piao, Paul Rayson, Dawn Archer, Andrew Wilson, and Tony McEnery
14:00-14:25 Verb-Particle Constructions and Lexical Resources
14:25-14:50 A Statistical Approach to the Semantics of Verb-Particles
Colin Bannard, Timothy Baldwin, and Alex Lascarides
14:50-15:15 Detecting a Continuum of Compositionality in Phrasal Verbs
Diana McCarthy, Bill Keller, and John Carroll
15:15-15:40 A Disambiguation Method for Japanese Compound Verbs
Kiyoko Uchiyama and Shun Ishizaki
16:10-16:35 An Empirical Model of Multiword Expression Decomposability
Timothy Baldwin, Colin Bannard, Takaaki Tanaka, and Dominic Widdows
16:35-17:00 Licensing Complex Prepositions via Lexical Constraints
NTT Communication Science Laboratories, Japan
University of Cambridge, UK
University of Sussex, UK
University of Cambridge, UK
Anne Abeillé (Université Paris 7, France)
Timothy Baldwin (Stanford University, USA)
Ted Briscoe (University of Cambridge, UK)
Nicoletta Calzolari (Istituto di Linguistica Computazionale, Italy)
Tony Cowie (University of Leeds, UK)
Ido Dagan (Bar-Ilan University, Israel)
Christiane Fellbaum (Princeton University, USA)
Chuck Fillmore (UC Berkeley, USA)
Nancy Ide (Vassar College, USA)
Kyo Kageura (National Institute of Informatics, Japan)
Brigitte Krenn (Austrian Research Institute for Artificial Intelligence, Austria)
Mirella Lapata (University of Edinburgh, UK)
Simonetta Montemagni (Istituto di Linguistica Computazionale, Italy)
Kentaro Ogura (NTT Cyber Space Laboratories, Japan)
Darren Pearce (University of Sussex, UK)
Ivan Sag (Stanford University, USA)
Tom Wasow (Stanford University, USA)
Annie Zaenen (PARC, USA)
Workshop registration information is available from the ACL-2003 website. Note that the deadline for Early Registration is June 14. The registration fee will include attendance at the workshop and a copy of the workshop proceedings.
This workshop is supported by