FINAL REPORT

The Acquilex projects were funded by the European Commission under the Basic Research initiative. The goal of the first project was to explore the utility of constructing a multilingual lexical knowledge base from machine-readable versions of conventional dictionaries. The second project extended this goal by exploring the utility of machine-readable textual corpora as a source of lexical information not coded in conventional dictionaries, and by adding dictionary publishing partners to exploit, for conventional lexicography, the lexical database and corpus extraction software developed by the projects.

The goals and deliverables of both projects were met or exceeded, and the projects have generated a large number of working papers, of which over 50% have already been published in refereed journals or conference proceedings. One book has appeared and another is planned. Three workshops involving participants from the IT and dictionary publishing industries and from academia were organised. There are several follow-up projects, and ideas and software developed under Acquilex are currently being exploited in several ongoing projects to develop a new generation of lexical databases to support the publication of learners' dictionaries as well as to provide more general lexical resources for the language industries.

Information and Results

Exploitation of Results

As Acquilex-II was a Basic Research project with an emphasis on medium-term development, this report will be brief.

The overall goals of the project were met or exceeded and several significant follow-ups in the form of continuing research and industrial exploitation are already underway.

  • Two new European Community projects have been approved for funding under the Luxembourg-based Linguistic Resources and Engineering (LRE) initiative. SPARKLE is a project involving three of the Acquilex partners aiming to further develop robust parsing technology in a multilingual context for the extraction of lexical information from textual corpora. This project builds on corpus extraction software developed under Acquilex-II. It should enable us to solve some of the remaining problems of resource-efficient acquisition of a predicate's argument structure as well as support the development of multilingual information retrieval systems. EuroWordNet is another project involving several Acquilex partners which aims to develop a generic multilingual database containing semantic relations between words for several European languages (English, Dutch, Italian, and Spanish). The networks will be developed using existing resources as far as possible, including resources developed under Acquilex, and will be linked to the WordNet database developed at Princeton, thus providing a tool to improve information retrieval for languages other than English.

    The dictionary publishing industrial partners in Acquilex are exploiting Acquilex-developed software and ideas concerning lexical representation in ongoing projects to develop lexical databases to support the production of learners' dictionaries and to provide lexical resources for the language industries in general. The Cambridge Language Survey lexical database at Cambridge University Press was used as the basis for their recently published and well-received new learners' dictionary, CIDE. This database is a multi-user commercial dictionary development environment which incorporates many ideas from the prototype Acquilex Lexical Database and Lexical Knowledge Base software. The Cambridge system is available on a commercial basis to other dictionary publishers. If the ideas from Acquilex concerning uniform coding of lexical information and integration of corpus evidence embodied in this system are widely adopted, this could lead to a new generation of commercially developed, compatible lexical databases for the community languages.

    More than 120 working papers were produced by the two projects, and of these over 50% have already been published in refereed conference proceedings or journals, with more in the pipeline. One book has been published and another is planned. This creates a significant research resource which will be accessible to the community (directly or indirectly) via the World Wide Web. The results of the Acquilex projects have been influential in the development of a number of international initiatives and projects, including the Text Encoding Initiative (TEI), COMLEX, COMPASS, DELIS, and EAGLES. There are signs that several IT companies are exploiting the results of Acquilex on the basis of this public-domain information; for example, Microsoft (Natural Language Processing group) is making extensive use of machine-readable dictionaries in the development of multilingual lexicons for parsing and cites Acquilex publications in its research reports. Several theses have been based on work associated with Acquilex, and many researchers who were employed on Acquilex are now at major commercial and academic centres around the world, further improving the dissemination of results (see "Where are they now?" for a partial list of ex-Cambridge Computer Laboratory people).

    Much Acquilex software is available for third-party use, and a number of universities and companies are currently using project software in their own lexical research and development, including Sharp Laboratories of Europe, Rank Xerox European Research Centre, Apple Computer Inc., British Telecom, France Telecom, Northwestern University, University of the Basque Country, Copenhagen Business School, University of Quebec at Montreal, Brandeis University, University of Edinburgh, and others.