ACL 2004 Workshop on

Multiword Expressions: Integrating Processing

26th July, 2004, at ACL 2004, Forum Convention Centre, Barcelona, Spain

Workshop Description

In recent years, there has been a growing awareness in the NLP community of the problems that Multiword Expressions (MWEs) pose and the need for their robust handling. MWEs include a large range of linguistic phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "telephone box"), and institutionalized phrases (e.g. "salt and pepper"). These expressions, which can be syntactically and/or semantically idiosyncratic in nature, are used frequently in everyday language, usually to express precisely those ideas and concepts that cannot be compressed into a single word.

Most real-world applications tend to ignore MWEs or address them simply by listing them. However, it is clear that successful applications will need to be able to identify and treat them appropriately. This particularly applies to the many applications which require some degree of semantic interpretation (e.g. machine translation, question-answering, summarisation, generation) and which rely on tasks such as parsing and word sense disambiguation.

A considerable amount of research has lately been conducted in this area, some within large research projects dedicated to MWEs. In this context, a successful workshop on MWEs was held at ACL 2003, with papers presenting a cross section of research on MWEs. There is some research on MWEs in general. Some is very computational, examining detection and extraction using a variety of methods. Some is more linguistic, focusing on classification of the various types. There is also a lot of research on particular subtypes of MWEs, especially English phrasal verbs.

In this workshop the focus is on papers that integrate analysis, acquisition and treatment of various kinds of multiword expressions (MWEs) in NLP. For example,

(1) research that combines a linguistic analysis with a method of automatically acquiring the classes described

(2) work that combines the computational treatment of a class of MWEs with a solid linguistic analysis

(3) research that extracts MWEs and either classifies them or uses them in some task.

These combinations of research will help to bridge the gap between the needs of NLP and the descriptive tradition of linguistics.

Target Audience

The workshop will be of interest to anyone working on MWEs, e.g. in the areas of computational grammars, computational lexicography, automatic lexical acquisition, machine translation, information retrieval, text mining, and computer-assisted language teaching and learning. The objective is to summarise what has been achieved in the area, to establish common themes between different approaches, and to discuss future trends.

Areas of Interest

Papers are invited on, but not limited to, the following topics:

Papers can cover one or more of these areas, but research that combines different topics is especially encouraged.

Workshop Program

9:30-9:35 Welcome

9:35-10:00 Statistical Measures of the Semi-Productivity of Light Verb Constructions

Suzanne Stevenson, Afsaneh Fazly and Ryan North

10:00-10:25 Paraphrasing of Japanese Light-verb Constructions Based on Lexical Conceptual Structure

Atsushi Fujita, Kentaro Furihata, Kentaro Inui, Yuji Matsumoto and Koichi Takeuchi

10:25-10:50 What is at Stake: a Case Study of Russian Expressions Starting with a Preposition

Serge Sharoff

10:50-11:20 BREAK

11:20-11:45 Translation by Machine of Complex Nominals: Getting it Right

Timothy Baldwin and Takaaki Tanaka

11:45-12:10 MWEs as Non-propositional Content Indicators

Kosho Shudo, Toshifumi Tanabe, Masahito Takahashi and Kenji Yoshimura

12:10-12:35 Multiword Expression Filtering for Building Knowledge

Shailaja Venkatsubramanyan and Jose Perez-Carballo

12:35-14:00 LUNCH

14:00-14:25 Representation and Treatment of Multiword Expressions in Basque

Inaki Alegria, Olatz Ansa, Xabier Artola, Nerea Ezeiza, Koldo Gojenola and Ruben Urizar

14:25-14:50 Multiword Expressions as Dependency Subgraphs

Ralph Debusmann

14:50-15:15 Integrating Morphology with Multi-word Expression Processing in Turkish

Kemal Oflazer, Ozlem Cetinoglu and Bilge Say

15:15-15:45 BREAK

15:45-16:10 Frozen Sentences of Portuguese: Formal Descriptions for NLP

Jorge Baptista, Anabela Correia and Graca Fernandes

16:10-16:35 Lexical Encoding of MWEs

Aline Villavicencio, Ann Copestake, Benjamin Waldron and Fabre Lambeau

16:35-17:30 DISCUSSION PANEL, e.g. Francis Bond (NTT) and Hitoshi Iida (Tokyo University of Technology)

Organizing Committee

Takaaki Tanaka
NTT Communication Science Laboratories, Japan

Aline Villavicencio
University of Cambridge, UK

Francis Bond
NTT Communication Science Laboratories, Japan

Anna Korhonen
University of Cambridge, UK

Program Committee

Timothy Baldwin (Stanford University, USA)
Colin Bannard (University of Edinburgh, UK)
Gael Dias (Beira Interior University, Portugal)
James Dowdall (University of Zurich, Switzerland)
Dan Flickinger (Stanford University, USA)
Matthew Hurst (Intelliseek, USA)
Stephan Oepen (Stanford University, USA; University of Oslo, Norway)
Kyonghee Paik (ATR Spoken Language Translation Research Laboratories, Japan)
Scott Piao (University of Lancaster, UK)
Beata Trawinski (University of Tuebingen, Germany)
Kiyoko Uchiyama (Keio University, Japan)


