Suggestions for Part II projects

I am interested in supervising projects in the general areas of artificial intelligence and machine learning, so if you have suggestions of your own other than the following then please do contact me.

Project suggestions for 2012/13

1. Monte carlo tree search

Until very recently the game of Go was regarded as an insurmountable challenge for AI research, primarily as its branching factor is vastly higher than that for games such as chess and no technique providing a significant advantage over α-β minimax based search was known. Recently the introduction of Monte Carlo Tree Search has seen vast improvements made to computer Go players; the advances have in fact been so significant that even very recently they would have been unthinkable.

If you are already a good Go player (and this is a hard constraint to which I will make no exceptions) then a project may easily be run around reproducing a good level of play. However if you are not a Go player then it is of interest to explore how the new search technique might be applied to other games or other problems.

2. Matlab toolbox for transductive and semi-supervised learning

Matlab is now the default language for the development of machine learning algorithms at the prototyping and research stage (see further projects below) and has a large number of toolboxes consisting of code for particular application areas such as neural networks, signal processing etc.

While there are many toolboxes aimed at supervised learning, the situation for transductive and semi-supervised learning — essentially approaches to learning from a mixture of labelled and unlabelled data — is very different. A project to develop a Matlab toolbox for these algorithms would be extremely useful for applied research in, for instance, the next project in this list.

3. Machine learning for stem cell research

I have an ongoing collaboration with Dr Matt Trotter, who has large quantities of real data related to learning problems in stem cell research. There is potentially an opportunity here to construct projects around many different areas of machine learning.

4. Bayesian learning and margin methods

A somewhat more open ended project. Bayesian inference has seen very significant success in recent years. Its closest competitors are support vector machines and boosting, both of which can be understood in terms of the "margins" that they achieve on training examples. Despite large quantities of theoretical research it is not clear at present how far the margin interpretation goes toward explaining the effectiveness of SVMs and boosting, or to what extend the same ideas can be used in the context of Bayesian learning. Can you construct simulation experiments that clarify matters?

5. Functional programming and machine learning

Traditionally machine learning at the research stage has been implemented in Matlab, and at the later stages in languages such as C and C++. This was initially due to efficiency and familiarity. However the current Glasgow Haskell Compiler can compete well with C in terms of efficiency, and the advantages of functional coding are well-known. This begs the question of the extent to which Haskell might represent a better language for machine learning in the future, compared with imperative and object-oriented languages. There are three projects that immediately come to mind.

Functional linear algebra

Linear algebra has a ubiquitous presence in machine learning. My own investigations in using Haskell have so far implemented linear algebra using the HMatrix library, which is essentially a functional interface to BLAS, LAPACK and other existing libraries written in imperative languages. The advantages of functional programming should apply to linear algebra as to anything else.

Can you develop a good purely functional linear algebra library? While doing this in C would not make a very interesting project, Haskell makes it much more interesting. To do it well you will need to address the fact that C etc. allow constant time access to arrays and allow you to update the contents of an array easily. As Haskell is a pure, lazy language it has immutible arrays and constant time access is not straightforward. Also, as the language has no side-effects updating has to be handled carefully. Its Map library improves matters by using a well-implemented tree-based map data structure, however I suspect that to implement matrices really well probably requires use of the ST monad and some very careful coding.

Can you develop a comprehensive library with efficiency comparable to that of those used by HMatrix?

Functional inductive logic programming

Inductive logic programming is an approach to machine learning based on first order logic. Existing libraries are implemented predominantly in C, however the fact that learning involves significant manipulation of first order formulae suggests that a functional approach might be more convenient. Can you develop an implementation comparable in efficiency to existing libraries?

Haskell for general machine learning

I am interested in general in the use of Haskell as a language for machine learning, and would be interested in supervising projects which involve the development of functional implementations of learning algorithms and the comparison of the performance of the resulting code with traditional approaches.

6. AI and interactive fiction

I have always thought that interactive fiction provides a fantastic array of opportunities for experimenting with AI.

Interactive fiction has died out somewhat since its heyday in the 1980s, probably because computer games have for obvious reasons taken a more graphical route. The idea was simple in essence, but extremely hard to implement convincingly: it was to make textual books wherein one could interact with the story by typing commands describing what your character should do. Typically they involved puzzle solving as a key element. In response to a command, you would get a description of what happened, and the object was typically to complete some task within a textually described world. If you're completely unaware of this kind of game, take a look at The Hitchhiker's Guide to the Galaxy, which is based on the book of the same name and was developed by Infocom, who were probably the leading company at the time.

From an AI perspective, the challenges should be clear. One needs in essence to model a convincing world, the state of which is updated in response to commands supplied in natural language, and for which the results of actions are conveyed in natural language. At the very least one might expect this to require considerable effort in effective knowledge representation: we need to know that a small wooder box can contain a pencil but not a bus, and also not anything gaseous, although for different reasons. Of course, once you want to include other characters in the story as AI agents things get really tricky.

If this genre of gaming interests you, I'm sure we can put together a project proposal addressing some relevant area.

7. Gaussian process inference for drug design

We have recently investigated the use of multiple kernel learning algorithms for drug design. The principle is simple: a chemical can be described as a collection of features related to its structure, and the learning problem is to learn to identify specific properties that might be useful, such as toxicity. Multiple kernel learners are attractive as they can provide more information than standard learning algorithms. We'd now like to repeat the investigation using an alternative based on Gaussian process inference rather than the support vector machine.

8. Large-scale machine learning using Gaussian processes

Gaussian processes are a state-of-the-art method for supervised machine learning. One of the limiting factors in their deployment is computational complexity. For some data sets this is not an issue. However massive data sets have started to appear for which it is. There are several algorithms that try to address this by maintaining only a subset of the data, and several possibilities for further developing such algorithms.