This file gives brief descriptions of Acquilex software and details of its availability to third parties (as of November 1994).

(Contact: Piek Vossen,

- Parser for the English definitions in LDOCE

Description: Gives constituent structure and predicate argument
relations of the definitions in the form of a labelled and bracketed
tree. Three versions have been developed for the definitions of nouns,
verbs and adjectives. Applied to all senses of LDOCE. Output is stored
as an LDOCE derived dictionary in the LDB.  Technicalities: Parser
developed with Atlas parser-generator, runs on VAX VMS and uses Atlas
run-system and a lexicon.

- Parser for the Dutch definitions in Van Dale

Description: Same as above but restricted to noun definitions. Has
been applied to 2000 food and drink denoting senses and a random
sample of 3000 noun senses. Output is stored as a Van Dale derived
dictionary in the LDB.  Technicalities: Parser developed with Atlas
parser-generator, runs on VAX VMS and uses Atlas run-system and a
lexicon.

- Extract

Description: Converts parse trees, in the form of labelled and
bracketed trees, into flat relational lists that express the logical
form underlying the definitions, abstracting away from non-semantic
syntactic surface structure. Interactive and batch modes are
provided. In interactive mode, any relation specified through the
menus is extracted from a parse tree; in batch mode, all relations
present are extracted.  Technicalities: Developed in Pascal on VAX VMS
and in Procyon Common Lisp (Macintosh, LDB).
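
As a rough illustration of the conversion described above, the sketch
below parses a labelled, bracketed tree and flattens it into
(relation, words) pairs. The tree syntax, the label names (DEF, MOD,
GENUS, SOURCE) and the function names are hypothetical; the actual
formats used by the Atlas parsers and by Extract are not documented in
this file.

```python
def parse_tree(text):
    """Parse '(LABEL child ...)' into (label, [children]); leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words under a node, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def extract_relations(tree, wanted=None):
    """Flatten a tree into (label, words) pairs.

    wanted=None extracts every relation (like the batch mode);
    a set of labels extracts only those (like the interactive mode).
    """
    label, children = tree
    relations = []
    if wanted is None or label in wanted:
        relations.append((label, " ".join(leaves(tree))))
    for child in children:
        if isinstance(child, tuple):
            relations.extend(extract_relations(child, wanted))
    return relations


# Hypothetical parse of "alcoholic drink made from grapes"
tree = parse_tree("(DEF (MOD alcoholic) (GENUS drink) (SOURCE grapes))")
print(extract_relations(tree, {"GENUS"}))   # [('GENUS', 'drink')]
print(extract_relations(tree))              # all four relations
```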

- Word Devil

Description: Browses hierarchical relations in
dictionaries. Compares hierarchies cross-linguistically, disambiguates
senses of genus words, detects circularities and genus-word gaps, and
outputs taxonomy structures.  Technicalities: Developed in Pascal on
VAX VMS using L-tree lexicons, in Procyon Common Lisp (Macintosh,
LDB), and in C on Unix using L-tree lexicons.
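
The detection of circularities and genus-word gaps can be pictured
with a small sketch that follows genus links upward from a sense. The
data layout (a mapping from each sense to its genus sense, with None
marking a top node), the sense names and the function name are
assumptions made for illustration; they are not taken from Word Devil
itself.

```python
def trace_genus_chain(genus_of, sense):
    """Follow genus links upward from `sense`.

    Returns (chain, status), where status is
      'top'      - the chain ends at a sense with no genus,
      'circular' - the chain returns to a sense already visited,
      'gap'      - a genus word has no entry of its own.
    """
    chain = [sense]
    seen = {sense}
    while True:
        current = chain[-1]
        if current not in genus_of:
            return chain, "gap"
        parent = genus_of[current]
        if parent is None:
            return chain, "top"
        if parent in seen:
            return chain + [parent], "circular"
        chain.append(parent)
        seen.add(parent)


# Hypothetical genus lexicon
genus_of = {
    "wine_1": "drink_1",
    "drink_1": "liquid_1",
    "liquid_1": "substance_1",
    "substance_1": None,
    "thing_1": "object_1",
    "object_1": "thing_1",      # circular definition
    "gin_1": "spirit_3",        # spirit_3 has no entry: a gap
}

for start in ("wine_1", "thing_1", "gin_1"):
    print(start, trace_genus_chain(genus_of, start))
```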

- Trans

Description: Creates tlinks between LKB fragments using mono- and
bilingual dictionaries and genus lexicons loaded in the LDB.
Technicalities: Procyon Common Lisp, Macintosh, LDB.

All software available

(Contact: Horacio Rodriguez,

MACO: Morphological Analyser Corpus-Oriented

Description: MACO (Morphological Analyser Corpus-Oriented) is a tool
for morphological analysis of corpora.  MACO has been designed to
attach as much morphological information as possible (of course the
part of speech but also other information, depending on the linguistic
source) to every word in the input text.

MACO has been conceived and designed as a general-purpose morphological
tool, although the current implementation of the system (and the data
sources involved) is devoted to the morphological analysis of Spanish texts.

MACO may be considered a toolbox: it allows the user to tailor it
to the desired working environment.

MACO allows different sets of morphological analysers, with different
coverage and usually adapted to specialized tasks, to be integrated in
different ways. The current Spanish version includes: SegWord, Amcas,
Formario, Number, Accumulate, Proper-noun, Initial and Default-cats.
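
The toolbox idea, several specialised analysers combined with a
fallback, can be sketched as follows. This is only an illustration of
the architecture: the three analyser functions, their tag names and
their behaviour are invented for the example and do not describe the
actual MACO modules listed above.

```python
import re


def number_analyser(word):
    """Hypothetical analyser: recognises digit strings."""
    return ["NUM"] if re.fullmatch(r"\d+([.,]\d+)?", word) else []


def proper_noun_analyser(word):
    """Hypothetical analyser: treats capitalised words as proper nouns."""
    return ["NP"] if word[:1].isupper() else []


def default_analyser(word):
    """Hypothetical fallback: open-class categories for unknown words."""
    return ["N", "ADJ", "V"]


def analyse(text, analysers, fallback):
    """Attach to every word the tags proposed by all analysers.

    If no analyser proposes anything, the fallback supplies the tags.
    """
    result = []
    for word in text.split():
        tags = []
        for analyser in analysers:
            for tag in analyser(word):
                if tag not in tags:
                    tags.append(tag)
        result.append((word, tags or fallback(word)))
    return result


for word, tags in analyse("Madrid tiene 3 museos",
                          [number_analyser, proper_noun_analyser],
                          default_analyser):
    print(word, tags)
```

Tailoring the tool to a working environment then amounts to choosing
which analysers go into the list and in what order.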

Software is fully available. Data sources are available with the exception of
the Vox dictionary, which would require an agreement with Biblograf.

LDB/LKB Integration Software

Description: The central aim of this software is to provide tools for
loading intermediate and relatively stable versions of lexicons
developed in the LKB into the LDB, thereby allowing flexible
database-like access/search to entries based on any aspect of their
contents. The system is fully compatible with LKB functionalities and
its display capabilities are adapted to the special characteristics of
the material (FS) to be displayed.

Fully available.

TGE: Tlinks Generation Environment.

Description: TGE (Tlinks Generation Environment) is a software system
designed and built in order to provide a way of constructing Tlinks
semi-automatically from LKB data and bilingual dictionaries loaded in
the LDB. The system allows several forms of extraction, depending on
the classes of tlinks to be produced, the involved data sources and
the degree of human intervention.

Fully available.

SAIBT (Semi Automatic Index Building Tool)

Description: The aim of the SAIBT (Semi Automatic Index Building Tool)
software system is to help users index MRDs (Machine Readable
Dictionaries) within the LDB environment in a user-friendly way, since
users have had problems building the Dictionary and Interface
Definitions. A good knowledge of the LDB software and environment is
nevertheless essential in order to fully understand the functionality
of SAIBT and to be able to use it. More specifically, SAIBT computes
the *interface-definitions* and the *dictionary-definitions* of the
dictionary to be indexed in an easy way and saves them in a file that
must be loaded when indexing the dictionary. It also offers the user
some help in designing the extract functions and the display function.

Fully available.

(Contact: Ted Briscoe,

The Lexical Data Base (LDB) System

Description: The LDB is a specialised database system implemented in
Common Lisp with a graphical user interface (X-windows, Macintosh)
enabling fast and flexible access to MRDs via index files. It is fully
described in Boguraev, B and Briscoe (eds) "Computational Lexicography
for Natural Language Processing" Longman, 1989 and by Carroll, J in
Sanfilippo (ed) "The (Other) Acquilex Papers", Cambridge Computer
Laboratory, TR-253, 1992.

The Lexical Knowledge Base (LKB) System 

Description: The LKB is an implementation of a multilingual lexicon
system in Common Lisp with a graphical user interface (Allegro CL with
Common Windows, Procyon CL, MCL). The lexicon is structured as 
a multiple default inheritance hierarchy of typed feature structures to which
lexical and morphological rules and translation links can be
applied. It is fully described in Briscoe, Copestake and de Paiva
(eds) "Inheritance, Defaults and the Lexicon" CUP, 1993 and various
working papers.  A standalone version (Stuffit archive)
is currently available for older Macs; a new MCL version will replace
this shortly.

The Acquilex PoS Tagger

Description: An HMM (bigram) part-of-speech tagger implemented in C
with options to apply the Viterbi or Forward-Backward algorithms for
direct or maximum likelihood estimation of transition and/or lexical
probabilities from tagged or untagged training data. This is fully
described in Elworthy, D "Part of Speech Tagging", Acquilex-II Working
Paper 10 and also in Elworthy, D "Does Baum-Welch Re-estimation Help
Taggers?", Proc. of ANLP-94.
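
For readers unfamiliar with the Viterbi algorithm mentioned above, a
minimal sketch for a bigram HMM is given below. It is written in
Python rather than C, uses a made-up three-tag model with invented
probabilities, and says nothing about how the Acquilex tagger itself
is organised; the Forward-Backward re-estimation step is not shown.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    # best[i][t] = (probability of the best path ending in tag t, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, 0.0), p)
                for p in tags
            )
        best.append(column)

    # Trace back from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model: all numbers are invented for the example.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
emit_p = {
    "DET":  {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.5},
}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# ['DET', 'NOUN', 'VERB']
```

A production tagger would normally work with log probabilities to
avoid underflow on long sentences.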


Description: A system implemented in Common Lisp for automatically
assigning tlinks between monolingual LKB entries from lexicons for
different languages developed using a common type system. The system
uses the delta rule to find the best match between alternative
possible pairs, training itself on unambiguous cases. It requires a
bilingual MRD or word list pairing potentially translation-equivalent
word forms between the two languages. It is described fully in
Copestake et al, Multilingual Lexical Representation, Acquilex-I
Working Paper 43.
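
The delta rule adjusts each weight in proportion to the difference
between the target and the current output. A minimal sketch of how
such a rule could be trained on unambiguous pairs and then used to
rank alternative pairs is given below. The three binary features, the
training examples (including the negative ones) and the entry names
are hypothetical; the working paper cited above describes the actual
system.

```python
def score(weights, features):
    """Linear score of a candidate pair."""
    return sum(w * x for w, x in zip(weights, features))


def train_delta(examples, n_features, rate=0.1, epochs=200):
    """Delta rule: w <- w + rate * (target - output) * x."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, target in examples:
            error = target - score(weights, features)
            weights = [w + rate * error * x for w, x in zip(weights, features)]
    return weights


def best_match(weights, candidates):
    """Pick the candidate (name, features) with the highest score."""
    return max(candidates, key=lambda c: score(weights, c[1]))[0]


# Hypothetical features of a source/target entry pair:
#   [same semantic type, same syntactic category, shared subject code]
training = [
    ([1, 1, 1], 1.0),   # unambiguous pairs, taken as correct
    ([1, 1, 0], 1.0),
    ([0, 1, 0], 0.0),   # assumed negative examples
    ([0, 0, 1], 0.0),
]
weights = train_delta(training, n_features=3)

# Two alternative target senses for one source entry
candidates = [("sense_a", [0, 1, 1]), ("sense_b", [1, 1, 0])]
print(best_match(weights, candidates))   # sense_b
```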

Availability: All Cambridge Software is available with full
documentation. It is free to bona fide researchers based in non-profit
educational organisations for approved research on the basis of
acknowledgement of its use and agreement not to distribute it further.
It is available to commercial organisations for research and
commercial use by negotiation, which usually involves a donation to the
Computer Laboratory.

(Contact: Nicoletta Calzolari,


SO-extractor    Parsing System for the              Written in a proprietary
                extraction of typical subjects      code for the IBM-VM
                and objects from the definitions    Operating System
                and the example sentences
                contained in Italian
                monolingual and bilingual
                Machine Dictionaries.               [NOT available]

SO-identifier   Automatic Self-learning System      Written in C Language for
                for the Identification of Subject   the Unix environment
                and Object in Italian. It makes
                use of morphological, syntactic,
                lexico-semantic and pragmatic
                It is based on principles of
                linguistic analogy.
                Training Input: output list
                produced by the SO-extractor
                Test Input: list of Italian
                sentences preprocessed by a
                proprietary Italian Grammar
                Output: identification of subject
                and object relations.               [Available]

SO-disambiguator  Automatic Self-learning System    Written in C Language for
                  for the disambiguation of         the Unix environment
                  Subject and Object assignment
                  in Italian. It makes use of
                  lexico-semantic knowledge and
                  taxonomical generalizations.
                  It is based on principles of
                  linguistic analogy.
                  Input and Output are
                  structurally similar to the input
                  and output of the
                  SO-identifier system, except for
                  a specific focus on lexico-
                  semantic knowledge integrated
                  by taxonomical information.       [Available]

GENUS-extractor   Semantic Parsing System for       Written in a proprietary
                  the extraction of the Genus       code for the IBM-VM
                  terms and the semantic            Operating System
                  relations linking them to the
                  definiendum. It operates on
                  Noun and Verb definitions of
                  Italian monolingual
                  Input: output of a proprietary
                  Italian Grammar [S. Montemagni,
                  1995, Subject and Object Assignment
                  in Italian. PhD dissertation,
                  UMIST Manchester, in preparation]
                  Output: genus/relation lists      [NOT available]

PALCO: Phrasal    Core of a parsing system for      Written in
Analyzer for      the analysis of real texts, in    MacCommonLisp for the
Large COrpora     terms of their syntactic features Macintosh Operating
                  at the phrasal level.             System
                  It consists of:
                  - a grammar doing the analysis
                  - an interface that allows the
                  user to customize his parsing
                  It is based on PGDE [AITech, 1992,
                  PGDE User Manual, AIT.TR].
                  It is interfaced with the DMI
                  (Italian Machine Dictionary) of
                  the ILC.
                  Input: Italian text
                  Output: syntactic structures of   [Available on
                  the text in terms of its phrasal  conditions to be
                  constituents.                     stipulated]

(Contact: Margreet Moerland,

Co-Co... is a full-screen multi-file editor with built-in corpus tools.
The editor allows you to create, edit and save ASCII text files.
The corpus tools allow you to create dictionary entries, compile word
frequencies and find collocations. Co-Co... implements a small fast editor.
Co-Co... is a graphical user interface for:
- KWIC lists
- Frequency lists
- Z-score calculation
- Complex collocations calculation [Stassen:10]
- Viewing: BVD-list [Stassen:7]
           FRQ-list [Stassen:7]
           F-score list [Stassen:8]

Co-Co... runs on the IBM-PC family of computers. It requires at
least 640K of memory to run smoothly and runs on any 80-column monitor.
The minimum requirement is a hard disk and at least one
VDL-microCorpus; the power of Co-Co... can be extended with additional
VDL-microCorpora. Co-Co... also supports a mouse.

MicroCorpora have to be tagged with part-of-speech and lemma.

Available at no cost to research institutions and institutions
participating in CEC projects as indicated in ESPRIT contracts, and on
commercial conditions to all other third parties.