I have a long-standing research interest in computational learning theory, a body of work which aims to prove bounds on the performance of learning systems. My primary interest here has been in results applying in some way to the learning curves of supervised learners; that is, graphs of generalization performance against training set size, either in the worst case (with respect, for example, to the statistics of the examples) or in the average case. Proofs of worst-case results tend to depend on recent work in mathematical statistics; average-case results often employ techniques developed in statistical mechanics.
The following subjects are at present of particular interest to me:
I have worked for quite a long time on the analysis of error estimation techniques within a framework based on probably approximately correct (PAC) learning. In particular, I am interested in the sample complexity of the cross-validation estimate, which remains a source of several interesting open problems. John Langford has offered a prize for a solution to one of the problems of interest.
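To fix ideas, here is a minimal sketch of the estimate in question: the k-fold cross-validation error of a hypothetical one-dimensional threshold learner. This is a toy of my own for illustration, not taken from any of the work mentioned above.

```python
import random

def train_threshold(data):
    """Fit a 1-D threshold classifier (predict True iff x >= t) by
    choosing the training point that minimises training error."""
    best_t, best_err = 0.0, float("inf")
    for t in (x for x, _ in data):
        err = sum(1 for x, y in data if (x >= t) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def cv_error(data, k=5):
    """k-fold cross-validation estimate of the generalisation error:
    average the held-out error over k train/test splits."""
    data = data[:]
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        held_out = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        t = train_threshold(train)
        total += sum(1 for x, y in held_out if (x >= t) != y) / len(held_out)
    return total / k
```

The sample-complexity questions mentioned above concern how tightly an estimate of this kind concentrates around the true generalization error as a function of the amount of data available.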
Some recent work in learning theory has attempted to obtain results of the kind seen in PAC learning specifically within the framework of Bayesian learning. See some of David McAllester's papers for examples.
I am interested in attempts to bring together assorted approaches to obtaining PAC-like bounds. A good introduction can be found here.
I am interested in the general problem of Bayesian inference. Contrary to what some might argue, this interest is not something I consider to be in conflict with my interest in computational learning theory. Philosophically speaking, the two subjects tend to fall into two camps regarding interpretation: the Bayesian camp and the Frequentist camp. Both have much to offer. Each asks its own questions. Each provides answers of interest within its own context. Most importantly, as far as I'm concerned, each is fun to think about.
Specifically, I am interested in the following issues:
Pretty much any quantity of interest in this area involves an intractable integral. Several ideas are available for approximating these integrals in some way: Markov chain Monte Carlo, variational approximation, and so on. I am investigating new approaches to this problem.
Sparse and robust methods.
Latent variable methods.
Extended ensemble Monte Carlo methods.
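As a minimal illustration of the first of these issues, the following sketch uses a random-walk Metropolis sampler to approximate a posterior expectation that would otherwise require an intractable integral. It is a toy of my own, with a deliberately simple target whose true mean is known, so the approximation can be checked.

```python
import math
import random

def metropolis(log_post, x0, steps=20000, scale=1.0):
    """Random-walk Metropolis: draw (correlated) samples from a 1-D
    density exp(log_post), known only up to its normalising constant."""
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(steps):
        proposal = x + random.gauss(0.0, scale)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, post(proposal) / post(x)).
        if math.log(random.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples

# Toy target: a standard normal, for which the "intractable" integral
# E[x] is known to be 0, so the Monte Carlo estimate can be verified.
random.seed(0)
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0)
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
```

Discarding the first few thousand samples as burn-in is the usual precaution against dependence on the starting point; in realistic models the art lies in designing proposals and extended ensembles that mix quickly.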
Boosting is a thoroughly intriguing procedure allowing several weak learners to be combined to form a single, more powerful classifier. It has its roots in computational learning theory, but most recent work has focussed on explaining its apparent, and rather counterintuitive, ability to resist overfitting, by analyzing its operation within a number of different frameworks.
I am interested in Boosting in general, and in particular in alternative explanations for the effectiveness of this procedure in its applied forms.
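For concreteness, here is a minimal sketch of the original AdaBoost procedure with decision stumps as the weak learners. It is a toy implementation of my own, purely to fix ideas: no single stump can label an interval of the line correctly, but the boosted combination can.

```python
import math

def stump(x, t, polarity):
    """Weak learner: predict +1 or -1 by thresholding a 1-D input."""
    return polarity if x >= t else -polarity

def best_stump(data, weights):
    """Exhaustively choose the stump with minimum weighted error."""
    best = None
    for t in sorted({x for x, _ in data}):
        for pol in (1, -1):
            err = sum(w for (x, y), w in zip(data, weights)
                      if stump(x, t, pol) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(data, rounds=10):
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol = best_stump(data, weights)
        err = max(err, 1e-12)          # avoid log(0) on a perfect stump
        if err >= 0.5:
            break                      # no useful weak learner remains
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, pol))
        # Reweight: misclassified points gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump(x, t, pol))
                   for (x, y), w in zip(data, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def predict(ensemble, x):
    vote = sum(a * stump(x, t, p) for a, t, p in ensemble)
    return 1 if vote >= 0 else -1

# Points in [0.3, 0.7] are positive, the rest negative: three rounds of
# boosting suffice to classify all of this training set correctly.
data = [(i / 10.0, 1 if 3 <= i <= 7 else -1) for i in range(11)]
ensemble = adaboost(data, rounds=3)
train_errors = sum(1 for x, y in data if predict(ensemble, x) != y)
```

The counterintuitive behaviour mentioned above concerns what happens well beyond this point: test error often continues to fall even after the training error reaches zero.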
A long-standing and very successful research programme in this laboratory is that of automated theorem proving. I am working in collaboration with Larry Paulson and James Bridge on ways in which machine learning might be incorporated into such systems.
At present I supervise:
Nicholas Pilkington: researching multiple kernel learning methods applied to drug design, and fast parameter selection for large problems.
Past research students are:
Andrew Naish-Guzmann: Thesis Sparse and Robust Kernel Methods, 2007.
Ulrich Paquet: Thesis Bayesian Inference for Latent Variable Models, 2007.
Matthew Trotter: Thesis Machine Learning for Drug Design, 2006.
How do we design good drugs?
Prior to around 1980 the approach was simple: a compound was designed specifically, by one or more clever chemists, to interact in a desired way with a target of interest. This process was, unsurprisingly, both expensive and time-consuming.
More recently the process has changed. Instead of designing a single compound of interest we search chemical space by elimination for compounds with desirable properties such as low toxicity or "drug-likeness". This can be cast as a classic, although very hard, supervised learning problem wherein compounds are described by attribute vectors and labeled with the presence or otherwise of the property of interest. Typically the resulting learning problem has highly non-symmetric priors (as most compounds are not interesting) and non-symmetric costs (as the cost of rejecting a compound as inactive when in fact it is a perfect cure for HIV is extremely high).
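To make the cost asymmetry concrete: if a model supplies an estimate p of the probability that a compound is active, the minimum-expected-cost rule flags it whenever p times the false-negative cost exceeds (1 - p) times the false-positive cost. A tiny sketch, with purely illustrative cost figures of my own choosing:

```python
def decision_threshold(cost_fp, cost_fn):
    """Posterior probability above which flagging a compound as active
    minimises expected cost: flag when p * cost_fn >= (1 - p) * cost_fp."""
    return cost_fp / (cost_fp + cost_fn)

def flag_as_active(p_active, cost_fp=1.0, cost_fn=100.0):
    """With a false negative taken (illustratively) to be 100 times as
    costly as a false positive, even weakly promising compounds are
    retained for further screening."""
    return p_active >= decision_threshold(cost_fp, cost_fn)
```

With these numbers the threshold is roughly 0.01, so a compound need only look slightly promising to survive elimination; the non-symmetric prior then matters because almost everything scoring above threshold will still turn out to be inactive.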
Problems of this kind have traditionally been addressed using various statistical and other intelligent systems techniques. As part of the Rocket project we applied support vector machines to this problem, in collaboration with GlaxoSmithKline.
Quantum computation is a very intriguing concept indeed. And that's putting it mildly. We are at present living in very interesting times; although there is evidence that quantum phenomena might be harnessed to perform computations considered intractable (in the formal sense) for any standard (non-quantum) computer, we do not at present know exactly how the complexity of quantum algorithms relates to the non-quantum complexity classes. Also, it is not clear whether a large-scale quantum computer can be built, and the design of quantum algorithms is not as straightforward (?!) as the design of algorithms in the classical sense.
I am interested in the way in which the advantages potentially offered by quantum computers might be of benefit within the general field of machine learning.