ACL 2004 Workshop
on Multiword Expressions: Integrating Processing
26th July, 2004, at ACL 2004, Forum Convention Centre, Barcelona, Spain
In recent years, there has been a growing awareness in the NLP community of the problems that Multiword Expressions (MWEs) pose and the need for their robust handling. MWEs include a large range of linguistic phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "telephone box"), and institutionalized phrases (e.g. "salt and pepper"). These expressions, which can be syntactically and/or semantically idiosyncratic in nature, are used frequently in everyday language, usually to express precisely those ideas and concepts that cannot be compressed into a single word.
Most real-world applications tend to ignore MWEs or address them simply by listing them. However, it is clear that successful applications will need to be able to identify and treat them appropriately. This particularly applies to the many applications that require some degree of semantic interpretation (e.g. machine translation, question-answering, summarisation, generation) and that depend on tasks such as parsing and word sense disambiguation.
A considerable amount of research has lately been conducted in this area, some within large research projects dedicated to MWEs. In this context, a successful workshop on MWEs was held at ACL 2003, with papers presenting a cross section of research on MWEs. Some of this research addresses MWEs in general: some is strongly computational, examining detection and extraction using a variety of methods, while some is more linguistic, focusing on classification of the various types. There is also a lot of research on particular subtypes of MWEs, especially English phrasal verbs.
In this workshop the focus is on papers that integrate analysis, acquisition and treatment of various kinds of MWEs in NLP. For example:
(1) research that combines a linguistic analysis with a method of automatically acquiring the classes described;
(2) work that combines the computational treatment of a class of MWEs with a solid linguistic analysis;
(3) research that extracts MWEs and either classifies them or uses them in some task.
These combinations of research will help to bridge the gap between the needs of NLP and the descriptive tradition of linguistics.
The workshop will be of interest to anyone working on MWEs, e.g. in the areas of computational grammars, computational lexicography, automatic lexical acquisition, machine translation, information retrieval, text mining, and computer-assisted language teaching and learning. The objective is to summarise what has been achieved in the area, to establish common themes between different approaches, and to discuss future trends.
Papers are invited on, but not limited to, the following topics:
Papers can cover one or more of these areas, but research that combines different topics is especially encouraged.
9:30-9:35 Welcome
9:35-10:00 Statistical Measures of the Semi-Productivity of Light Verb Constructions
Suzanne Stevenson, Afsaneh Fazly and Ryan North
10:00-10:25 Paraphrasing of Japanese Light-verb Constructions Based on Lexical Conceptual Structure
Atsushi Fujita, Kentaro Furihata, Kentaro Inui, Yuji Matsumoto and Koichi Takeuchi
10:25-10:50 What is at Stake: a Case Study of Russian Expressions Starting with a Preposition
Serge Sharoff
10:50-11:20 BREAK
11:20-11:45 Translation by Machine of Complex Nominals: Getting it Right
Timothy Baldwin and Takaaki Tanaka
11:45-12:10 MWEs as Non-propositional Content Indicators
Kosho Shudo, Toshifumi Tanabe, Masahito Takahashi and Kenji Yoshimura
12:10-12:35 Multiword Expression Filtering for Building Knowledge
Shailaja Venkatsubramanyan and Jose Perez-Carballo
12:35-14:00 LUNCH
14:00-14:25 Representation and Treatment of Multiword Expressions in Basque
Inaki Alegria, Olatz Ansa, Xabier Artola, Nerea Ezeiza, Koldo Gojenola and Ruben Urizar
14:25-14:50 Multiword Expressions as Dependency Subgraphs
Ralph Debusmann
14:50-15:15 Integrating Morphology with Multi-word Expression Processing in Turkish
Kemal Oflazer, Ozlem Cetinoglu and Bilge Say
15:15-15:45 BREAK
15:45-16:10 Frozen Sentences of Portuguese: Formal Descriptions for NLP
Jorge Baptista, Anabela Correia and Graca Fernandes
16:10-16:35 Lexical Encoding of MWEs
Aline Villavicencio, Ann Copestake, Benjamin Waldron and Fabre Lambeau
16:35-17:30 DISCUSSION
PANEL, e.g. Francis Bond (NTT) and Hitoshi Iida (Tokyo University of Technology)
Takaaki Tanaka
NTT Communication Science Laboratories, Japan
Aline Villavicencio
University of Cambridge, UK
Francis Bond
NTT Communication Science Laboratories, Japan
Anna Korhonen
University of Cambridge, UK
Timothy Baldwin (Stanford University, USA)
Colin Bannard (University of Edinburgh, UK)
Gael Dias (Beira Interior University, Portugal)
James Dowdall (University of Zurich, Switzerland)
Dan Flickinger (Stanford University, USA)
Matthew Hurst (Intelliseek, USA)
Stephan Oepen (Stanford University, USA; University of Oslo, Norway)
Kyonghee Paik (ATR Spoken Language Translation Research Laboratories, Japan)
Scott Piao (University of Lancaster, UK)
Beata Trawinski (University of Tuebingen, Germany)
Kiyoko Uchiyama (Keio University, Japan)