If you are interested in any of the proposals below, please email Ann Copestake. The deadline for project choice has been extended so that the NLIP preference form can be submitted by November 24. We expect students to meet with potential supervisors before making their project selection, at least for first-choice projects.

Neural network simulations of the recognition of semantically ambiguous words

Proposer: Matt Davis
Supervisors: Matt Davis and Ann Copestake

Description

This is a psycholinguistic project involving neural network simulation of the recognition of semantically ambiguous words (e.g., `bank') and the effect of prior and recent experience on meaning selection for these words (e.g., `river' vs `money'). It extends work carried out by Jenni Rodd and Matt Davis. Good Matlab skills would be necessary since the initial stage of the project would involve using existing Matlab code. Later stages would involve experimenting with different network architectures or learning algorithms.
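
To give a concrete (if much simplified) picture of the kind of simulation involved, here is a small sketch in Python/NumPy rather than the project's existing Matlab code; the vocabulary, meaning vectors and training regime are invented for illustration. An ambiguous form ('bank') is trained on two meaning patterns with unequal frequency, and the network's output for that form ends up as a blend biased towards the more frequent sense - the sort of behaviour whose interaction with prior and recent experience the project would investigate.

    import numpy as np

    rng = np.random.default_rng(1)

    forms = ['bank', 'coin', 'stream']              # toy word forms (one-hot inputs)
    dim = 10                                        # size of the distributed meaning layer
    meaning = {k: rng.integers(0, 2, dim).astype(float)
               for k in ['bank_money', 'bank_river', 'coin', 'stream']}

    def one_hot(i):
        v = np.zeros(len(forms))
        v[i] = 1.0
        return v

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    W = rng.normal(0, 0.1, (dim, len(forms)))       # single form-to-meaning weight layer

    for step in range(5000):
        i = rng.integers(len(forms))
        if forms[i] == 'bank':                      # dominant sense trained 3x as often
            target = meaning['bank_money'] if rng.random() < 0.75 else meaning['bank_river']
        else:
            target = meaning[forms[i]]
        x = one_hot(i)
        y = sigmoid(W @ x)
        W -= 0.1 * np.outer((y - target) * y * (1 - y), x)   # squared-error gradient step

    # the learned output for 'bank' sits closer to the dominant (money) sense
    out = sigmoid(W @ one_hot(forms.index('bank')))
    for sense in ['bank_money', 'bank_river']:
        print(sense, round(float(np.mean((out - meaning[sense]) ** 2)), 3))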

Rodd, J. M., Lopez Cutrin, B., Millar, A., Kirsch, H., & Davis, M. H. (2013). Long-term priming of the meanings of ambiguous words. Journal of Memory and Language, 68(2), 180–198.

Rodd, J. M., Gaskell, M. G., & Marslen-Wilson, W. D. (2004). Modelling the effects of semantic ambiguity in word recognition. Cognitive Science, 28, 89–104.

Email me (Ann Copestake) if you want PDFs of the above.

Language development for deep learning agents in a cooperative task

Proposers: Alexander Kuhnle and Ann Copestake
Supervisor: Ann Copestake with Alexander Kuhnle

Description

Work in natural language processing often focuses on training a machine to understand or produce a human language such as English. In this project, however, we want to investigate how a language can develop between machine agents. The common approach in the field of language evolution (e.g. Steels and Belpaeme) is to set up a task for two (or more) agents with a cooperative goal that can be achieved only if the agents communicate successfully. While the communication channel is pre-defined, it is left to the agents what information they share and how they encode it in language.
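
As a concrete (and deliberately minimal) illustration of this kind of setup, the sketch below implements a Lewis-style signalling game with tabular reinforcement rather than the neural agents the project would actually use; the numbers of objects and symbols and the update rule are invented for illustration. The channel (a fixed set of symbols) is given, but which symbol comes to stand for which object is left to emerge from rewarded interaction.

    import numpy as np

    rng = np.random.default_rng(0)
    N_OBJECTS, N_SYMBOLS = 5, 5

    sender = np.ones((N_OBJECTS, N_SYMBOLS))    # propensity of symbol given object
    receiver = np.ones((N_SYMBOLS, N_OBJECTS))  # propensity of object given symbol

    def sample(weights):
        p = weights / weights.sum()
        return rng.choice(len(weights), p=p)

    successes = 0
    for t in range(20000):
        obj = rng.integers(N_OBJECTS)           # the sender observes an object
        sym = sample(sender[obj])               # ... and emits a symbol
        guess = sample(receiver[sym])           # the receiver guesses the object
        if guess == obj:                        # reward both agents on success
            sender[obj, sym] += 1.0
            receiver[sym, obj] += 1.0
            successes += 1
        if (t + 1) % 5000 == 0:
            print(f"rounds {t + 1}: running success rate {successes / (t + 1):.2f}")

After a few thousand rounds the success rate should climb well above the 0.2 chance level, at which point the two tables encode a shared, if arbitrary, code.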

In a recent paper, Lazaridou et al. present such an investigation of language development between two agents. Both agents were shown two images; the first agent's task was to describe one of the images using a word-like vector representation, and the second agent, which did not know which image the first had picked, had to guess based on that description. Both agents were implemented as deep networks, with convolutional nets to process the images.

The aim of this project is to implement a similar setup, with two deep learning agents solving a cooperative task by successfully developing a language for communication. The implementation will preferably be in Python 3 using TensorFlow. A first step is to replicate Lazaridou et al.'s results on the datasets they used. In a second step, we want to look at abstract images with a richer and more diverse structure than the ones in their experiments, and in doing so address an issue they noted: that the agents are surprisingly successful even when restricted to a vocabulary size of two words. Their analysis suggests that a richer image dataset would prevent the agents from cheating, i.e. developing a very specialised language to solve the problem, and hence encourage the development of a more natural language.
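
A minimal sketch of what such an implementation might look like is given below, using TensorFlow 2 and REINFORCE; random class-prototype feature vectors stand in for the convolutional image features, and the sizes, learning rate and two-candidate setup are illustrative choices rather than Lazaridou et al.'s exact configuration.

    import numpy as np
    import tensorflow as tf

    FEAT, VOCAB, EMB, BATCH = 32, 10, 16, 64

    sender_w = tf.Variable(tf.random.normal([FEAT, VOCAB], stddev=0.1))
    symbol_emb = tf.Variable(tf.random.normal([VOCAB, EMB], stddev=0.1))
    receiver_w = tf.Variable(tf.random.normal([FEAT, EMB], stddev=0.1))
    params = [sender_w, symbol_emb, receiver_w]
    opt = tf.keras.optimizers.Adam(1e-2)

    # Toy "images": class prototypes plus noise, standing in for ConvNet features.
    protos = np.random.randn(VOCAB, FEAT).astype(np.float32)

    def make_batch():
        cls = np.random.randint(VOCAB, size=(BATCH, 2))
        same = cls[:, 0] == cls[:, 1]
        cls[same, 1] = (cls[same, 1] + 1) % VOCAB       # two distinct image classes
        feats = protos[cls] + 0.1 * np.random.randn(BATCH, 2, FEAT)
        target = np.random.randint(2, size=BATCH)       # which candidate is the target
        return feats.astype(np.float32), target

    for step in range(2000):
        feats, target = make_batch()
        target_feats = feats[np.arange(BATCH), target]
        with tf.GradientTape() as tape:
            # Sender: look at the target image only and sample one symbol.
            sym_logits = tf.matmul(target_feats, sender_w)
            sym = tf.random.categorical(sym_logits, 1)[:, 0]
            sym_logp = -tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=sym, logits=sym_logits)
            # Receiver: embed the symbol and both candidates, sample a guess.
            msg = tf.gather(symbol_emb, sym)
            cand = tf.einsum('bcf,fe->bce', tf.constant(feats), receiver_w)
            scores = tf.einsum('be,bce->bc', msg, cand)
            guess = tf.random.categorical(scores, 1)[:, 0]
            guess_logp = -tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=guess, logits=scores)
            # Shared 0/1 reward; REINFORCE with a batch-mean baseline.
            reward = tf.cast(tf.equal(guess, tf.constant(target, dtype=tf.int64)),
                             tf.float32)
            advantage = reward - tf.reduce_mean(reward)
            loss = -tf.reduce_mean(tf.stop_gradient(advantage) * (sym_logp + guess_logp))
        opt.apply_gradients(zip(tape.gradient(loss, params), params))
        if (step + 1) % 500 == 0:
            print(f"step {step + 1}: mean reward {float(tf.reduce_mean(reward)):.2f}")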

The distributional effect of meaning differences in adjective usages

Proposers: Alexander Kuhnle and Ann Copestake
Supervisor: Ann Copestake with Alexander Kuhnle

Description

For many years, distributional semantics - the idea that the meaning of a word can be approximated by the contexts in which it occurs in text - has been established as a very successful approach to lexical semantics for many tasks. However, it is clear that by "averaging" over all occurrences of a specific word, one misses many meaning variations which depend on the word's position in the syntactic structure of a sentence and on the other words involved. Different senses of some words ("river/financial bank") or idiomatic phrases ("kick the bucket") are obvious examples, but there are also more systematic and productive meaning variations in, for instance, noun compounds.
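
For readers unfamiliar with the approach, the toy script below (invented mini-corpus, raw co-occurrence counts in a two-word window, cosine similarity) shows the basic distributional recipe; a real experiment would of course use a large parsed corpus and weighted or learned vectors.

    from collections import Counter, defaultdict
    from math import sqrt

    corpus = [
        "the red car stopped at the bank".split(),
        "the car is red and fast".split(),
        "she opened an account at the bank".split(),
        "the river bank was muddy and red".split(),
    ]

    window = 2
    vectors = defaultdict(Counter)       # word -> counts of nearby context words
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[w][sent[j]] += 1

    def cosine(u, v):
        keys = set(u) | set(v)
        dot = sum(u[k] * v[k] for k in keys)
        norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    print(cosine(vectors["car"], vectors["bank"]))
    print(cosine(vectors["red"], vectors["fast"]))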

In this project, we want to look at another well-known source of meaning variation in syntactic structure - the difference between adjectives in attributive ("the red car") vs predicative ("the car is red") position, where the former is much more prone to idiomatic constructions (consider e.g. "sore loser", "bad luck", etc.). The aim is to investigate the difference between attributive adjectives and the corresponding predicative ones within distributional semantics. Preliminary work by Kuhnle has shown that there seem to be substantial differences in many cases, but did not make the nature or significance of these differences clear.

To be able to distinguish the two usages of adjectives, we will extract the distributional information from a DMRS-parsed corpus (Dependency Minimal Recursion Semantics (DMRS) is a semantic representation of sentences). A first step is then to analyse the difference between attributive and predicative adjective vectors and to investigate whether this difference is substantial, e.g. by contrasting it with the difference to a random or synonymous adjective. Afterwards, we want to analyse whether this difference has an effect on downstream tasks, for instance following Boleda et al. and their work on adjective-noun composition of distributional vectors. A qualitative analysis of some adjective pairs with large differences between attributive and predicative usage could also be attempted.
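
A sketch of the first analysis step is given below, under the assumption that usage-specific vectors have already been extracted and stored; the file name 'adj_vectors.npz' and the key scheme '<adjective>_attr' / '<adjective>_pred' are purely hypothetical placeholders. For each adjective it compares the cosine between its attributive and predicative vectors against the cosine to a randomly chosen other adjective.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical archive of usage-specific vectors, e.g. keys 'red_attr', 'red_pred'.
    data = np.load("adj_vectors.npz")
    adjectives = sorted({k.rsplit("_", 1)[0] for k in data.files})

    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    for adj in adjectives:
        attr, pred = data[f"{adj}_attr"], data[f"{adj}_pred"]
        other = rng.choice([a for a in adjectives if a != adj])
        same_word = cos(attr, pred)               # attributive vs predicative of the same adjective
        baseline = cos(attr, data[f"{other}_attr"])   # vs a random other adjective
        print(f"{adj:>12}  attr~pred {same_word:.2f}   attr~{other} {baseline:.2f}")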

Learning vector representations from dependency graphs

Proposers: Guy Emerson and Ann Copestake
Supervisor: Ann Copestake with Guy Emerson

Description

Vector representations of semantics are currently very popular in NLP, with Word2Vec (Mikolov et al., 2013) being particularly well-known. More recently, Levy and Goldberg (2014) extended the skip-gram training objective to use dependency arcs as contexts, rather than a window of adjacent tokens. However, both of these approaches simply use the dot product of the target vector and the context vector, which makes the assumption that the two should be similar. For many dependency contexts (such as subject-verb, or adjective-noun), we would intuitively expect the two vectors to be related in some way, but not necessarily *similar*. Several authors have tried to tease apart these notions of relatedness and similarity (Agirre et al., 2009; Hill et al., 2015).

The aim of this project is to extend a skip-gram model by treating different types of context differently. Rather than directly taking the dot product of the target word's vector and the context word's vector, we can first multiply the context vector by a matrix chosen according to the type of dependency arc. In other words, we use a different inner product for each type of dependency. By jointly learning the word vectors and the dependency matrices, we would not only be able to use dependency information in learning word representations, but also learn how different types of dependency interact with word meaning.
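
A sketch of the intended scoring function and its training update is given below, under the assumption that the skip-gram negative-sampling objective is kept and only the dot product is replaced by a bilinear form t . M_r c, with one matrix M_r per dependency type; the vocabulary size, dimensionality and listed dependency types are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, VOCAB = 50, 1000
    DEPS = ["nsubj", "dobj", "amod"]                 # example dependency types

    target_vecs = rng.normal(0, 0.1, (VOCAB, DIM))
    context_vecs = rng.normal(0, 0.1, (VOCAB, DIM))
    dep_mats = {d: np.eye(DIM) for d in DEPS}        # initialise at the plain dot product

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgd_step(t_id, c_id, dep, neg_ids, lr=0.025):
        """One negative-sampling update for a (target, context, dependency) triple."""
        for c_i, label in [(c_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
            t, c, M = target_vecs[t_id], context_vecs[c_i], dep_mats[dep]
            g = sigmoid(t @ M @ c) - label           # gradient of the logistic loss w.r.t. the score
            target_vecs[t_id] = t - lr * g * (M @ c)
            context_vecs[c_i] = c - lr * g * (M.T @ t)
            dep_mats[dep] = M - lr * g * np.outer(t, c)

    # e.g. one observed arc with two randomly sampled negative contexts:
    sgd_step(t_id=7, c_id=42, dep="nsubj", neg_ids=rng.integers(VOCAB, size=2))

Initialising each M_r at the identity recovers the standard dot-product model as a starting point, so the dependency matrices only move away from it where the data warrants.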

A first step could be to replicate Levy and Goldberg's work, before trying the joint learning of word vectors and dependency matrices.

Links:
Mikolov et al., 2013
Levy and Goldberg, 2014
Agirre et al., 2009
Hill et al., 2015