Computer Laboratory

The Dictionary Challenge

Learning Semantic Composition with Dictionaries

The Dictionary Challenge is a resource for training and evaluating models that learn the meaning of phrases and sentences.

The dataset consists of just over 852,000 (definition, word) pairs. Models can be trained to map from each definition to its target word. Here are some examples from the training set:

Definition                      Word
in a comical manner             comically
the last place as on a list     bottom

Because the defined word should be a good reflection of what the definition phrase means, if you train a model with this data it should learn to 'compose' the meaning of the words in a sensible way.
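As a point of reference for what 'composing' can mean, one very simple baseline is to average the embeddings of the words in a definition. The sketch below is only an illustration and is not the method of the paper cited at the end of this page: the function name `compose` and the two-dimensional toy vectors are invented for the example.

```python
from typing import Dict, List


def compose(definition: str, embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the definition words that have an embedding."""
    vectors = [embeddings[w] for w in definition.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no word of the definition has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "comical": [1.0, 0.0],
    "manner": [0.0, 1.0],
}

# "in" and "a" have no toy embedding, so they are skipped.
composed = compose("in a comical manner", toy_embeddings)
print(composed)
```

Averaging ignores word order, so a trained model would be expected to do better; the sketch is mainly useful as a sanity check on a data pipeline.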

Download training definitions

Pre-trained word embeddings

Our initial approach to the problem represented the target word with a pre-trained word embedding. This is not required, but if you take the same approach, you can download the embeddings we used below and use them as targets for your model. The embeddings were trained with the CBOW architecture of the Word2Vec software on the large training corpus downloaded from here.

Download pre-trained word embeddings

Test sets

We provide several ways to evaluate the performance of trained models.

The first uses held-out definitions. For definitions that the model either saw during training (retrieval) or did not see (generalization), measure how highly the model ranks the correct target word among a vocabulary of over 66,000 possible words.
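The ranking step can be sketched as follows, assuming the model outputs a vector in the same space as the target word embeddings and that candidates are scored by cosine similarity. Both are assumptions made for this example, and the names `cosine`, `rank_of_target` and `toy_vocab` are hypothetical; see the paper cited at the end of this page for the exact evaluation metrics.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_of_target(predicted: Sequence[float], target: str,
                   vocab: Dict[str, List[float]]) -> int:
    """1-based rank of the target word among all vocabulary words."""
    scores = {word: cosine(predicted, vec) for word, vec in vocab.items()}
    target_score = scores[target]
    # Rank = 1 + number of words scored strictly higher than the target.
    return 1 + sum(1 for s in scores.values() if s > target_score)


# Toy vocabulary with invented 2-dimensional vectors.
toy_vocab = {
    "bottom": [1.0, 0.0],
    "top": [-1.0, 0.0],
    "comically": [0.0, 1.0],
}

prediction = [0.9, 0.1]  # stand-in for a model output
ranks = [rank_of_target(prediction, w, toy_vocab)
         for w in ("bottom", "comically", "top")]
print(ranks, median(ranks))
```

Per-item ranks can then be summarised over a test set, for example with the median rank or the fraction of items ranked in the top k.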

For a slightly harder test, the same procedure can be applied with test items that are concept descriptions rather than formal definitions. For 200 random words, we asked crowdsourced subjects to write short, informal descriptions of those words (to mimic the input of a reverse dictionary user).

Finally, the same procedure can be applied to general knowledge crossword clues. We provide a selection of clues taken from the Guardian quick crossword. For this evaluation, we allow models access to the length (number of letters) of the correct answer, which can be used to exclude candidates before ranking.
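The length constraint amounts to a filter applied to the candidate list before ranking. A minimal sketch, with a hypothetical helper name and an invented candidate list:

```python
from typing import List


def filter_by_length(candidates: List[str], answer_length: int) -> List[str]:
    """Keep only candidates with the required number of letters, preserving order."""
    return [word for word in candidates if len(word) == answer_length]


# Invented candidate list, already ordered by some model score.
candidates = ["base", "bottom", "floor", "lowest"]
six_letter = filter_by_length(candidates, 6)
print(six_letter)
```

Because the filter preserves the input order, it can be applied either before scoring (to save computation) or after, with the same resulting ranking.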

Download test data


If you use this data, please cite the paper below, where you can also find the performance of several baseline neural language models on the test sets.

Hill, F., Cho, KH., Korhonen, A., and Bengio, Y. Learning to Understand Phrases by Embedding the Dictionary. 2016. Transactions of the Association for Computational Linguistics (TACL).