Applying to do a PhD in machine learning

Unfortunately, I receive rather a large number of enquiries from people who have not familiarised themselves with either the process for applying to Cambridge or, in some cases, my research interests. (No kidding!) Please read the following in its entirety before contacting me, and make sure that the magic word appears in the subject of your email. If you don't receive a reply, it's probably because you didn't include the magic word...

General

I cannot offer short-term placements, internships or similar positions to undergraduate or postgraduate students. Please do not ask. If you do, then at best your email will be ignored, and at worst I reserve the right to get quite cross.

At present I aim to take on approximately two PhD students per year, working in some area of machine learning of interest to me. This is not a figure set in stone: think in terms of somewhere between zero and four, with two being the target I have in mind. (I can always find room for a couple extra if there are good applicants.)

First of all, take a look at the information published by the Computer Laboratory on the PhD applications process:

http://www.cl.cam.ac.uk/admissions/phd/

There is potentially quite a lot for you to do, but the primary issue - and one that you may want to talk to me about first - is preparing a statement of proposed research.

Your statement of proposed research

Now, this is likely to be challenging, and Cambridge is perhaps unusual in requiring it as part of your application, but it's an extraordinarily useful exercise. You need to produce up to 3000 words outlining your intended topic and your research strategy. If you are serious about applying, then before choosing a research topic you should talk to me, and to anyone else in the lab whose interests align with your own. This is critical, as your application needs to get past a major hurdle: somebody has to agree to supervise you. I will not agree to supervise someone whose research topic is not of interest to me, so some discussion ahead of time is likely to be a good idea.

It's worth noting that in no sense does your research proposal bind you to the proposed research. You might of course continue with it and eventually submit a dissertation based entirely on the proposal as submitted during the application process. It is quite likely, however, that in the course of the huge quantity of reading you'll do in your first year you will find something that sends you in an alternative direction. This is the nature of research; it is to be expected, regardless of what well-meaning people who expect everything to be fixed in detail four years in advance might try to tell you. (And I speak from experience: I started my PhD looking at classification of time series, and submitted one on computational learning theory.)

If you have an idea of what you're interested in researching, please call or email and discuss it, but bear in mind that if it's not close to one of my areas of interest then I'm unlikely to be the best person to supervise you.

If you do not have an idea, then take a look at my research page, and at my suggested reading for PhD applicants, then call and have a chat: at any given time I have a list of potential PhD topics that you might want to take a closer look at. If by this time you think you like the kind of work I do, you should be in a good position to demonstrate that you understand what's already been done in the field, and to talk about an open problem. The statement also requires you to come up with goals and deliverables for the first year, and to demonstrate that you know how to attain them. These should, hopefully, follow without too much trouble.

Words of warning. (Or rant, depending on your point of view...)

There are some major words of warning. These may sound obvious, but I have in the past seen numerous offenders:

  1. Do not claim that you will make an intelligent machine. You won't. There are hundreds of fantastic researchers in artificial intelligence (AI) who haven't done it yet, so unless you're a genius don't claim to be able to. (If you are such a genius then you should DEFINITELY send me an application.)

  2. Don't believe the hype, or the popular press. I will be particularly cross if your proposal includes any of the following: "singularity", "consciousness", "emotion" or "enslavement of humanity by our robot overlords". (And only the last of those was intended as a joke.)

  3. Do not claim that getting hold of an enormous amount of computing power will let you solve everything. It won't. Pretty much any task of interest in AI is NP-complete or worse, and the best applied research takes great care over complexity.

  4. In fact, a huge amount of modern AI is essentially the attempt to find algorithms for obtaining good approximate solutions, most of the time, to problems for which finding the optimum solution is provably computationally intractable. For example, the current state of the art for reasoning under uncertainty is Bayesian inference. Exact inference is intractable, and consequently there is a great deal of work on variational and other approximations. Similarly, the recent progress on playing the game of Go relies on randomizing the search through the game tree, as searching the whole tree is intractable. (There is a toy sketch of this idea after this list.)

  5. Be realistic. This follows from the last point. History is littered with people who have underestimated how wonderful brains actually are, which is why the brain as a whole is not much studied by AI researchers: these days we tend to look at specific types of task and to try to gain a better understanding of how to solve them.

  6. If you're going to jump on a bandwagon, jump on the bit that's worth riding. To elaborate: every few years an interesting idea comes along and starts to get a lot of funding. If the idea genuinely is interesting then this is good. Funding is nice, so everyone in sight jumps on the bandwagon. Now, a bandwagon is not necessarily a bad thing; however, the good part of the idea is generally followed up rigorously by only a minority. Make sure you are part of that minority. In the 80s and early 90s the bandwagon was neural networks, and an awful lot of rot was written about them, along with a smaller body of work that endures and continues to be developed with great success. The researchers who did the worthwhile stuff were the ones who noticed that neural networks are nonlinear statistical techniques and should be treated like any other. At present there is a major bandwagon trundling around bearing names such as "natural computing", "evolutionary computing" and so on. (Make sure you put the magic word "Darwin" in the subject line of your email.) As usual there is some outstanding work being done in these fields, but do not be led astray by the fancy names. A genetic algorithm is an optimization technique. Nothing more. It just sounds more appealing than "simulated annealing". (A second sketch after this list makes this point concrete.)

  7. The last point hints at a second reason not to believe the hype. The amount of funding a new area attracts is not necessarily a reflection of whether it's interesting or worthwhile. If it's not an interesting idea, then don't jump on the bandwagon at all.

  8. Concentrate on finding optimal solutions (or as near optimal as you can) to interesting problems. This is related to some of the earlier points. Edsger Dijkstra is famous for, amongst other things, pointing out that the question of whether computers can think is about as relevant as whether submarines can swim. Of course the people who designed submarines didn't try to copy fish: they looked for good solutions to what is in fact a different problem. Similarly, nobody tries to build fighter jets that flap their wings. Unsurprisingly, when faced with a difficult problem in optimization, it doesn't necessarily make much sense to start from: "Let's see, how would an ant colony do it?"

  9. Don't claim that you need to build special-purpose hardware. Experience suggests that by the time you've finished, Moore's law will have caught up with you and you might as well just have written a program. (In addition, if you claim your special hardware does something that a program can't, then you're demonstrating a somewhat less than graduate-level grasp of the theory of computation.) It is of course something of an open problem whether or not quantum computation will help, but I am not the appropriate supervisor to help you build a large-scale quantum computer, and research on the relevant algorithms is sufficiently tricky that I would advise you against it.
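
To make point 4 concrete, here is a minimal sketch, in Python, of the randomized-search idea: rather than exhaustively searching a game tree, we estimate the value of a position from random playouts. The game (a single Nim heap) and every name in the code are invented purely for illustration; real Go programs are of course vastly more sophisticated.

    import random

    def playout(stones, to_move):
        # Play uniformly random moves to the end; return the winner (0 or 1).
        # Each move removes between 1 and 3 stones; taking the last one wins.
        while stones > 0:
            stones -= random.randint(1, min(3, stones))
            if stones == 0:
                return to_move        # this player took the last stone
            to_move = 1 - to_move
        return 1 - to_move            # empty heap: the previous player won

    def estimate_win_probability(stones, to_move, n_playouts=10000):
        # Monte Carlo estimate of `to_move`'s winning chance under random
        # play; exhaustive search would visit exponentially many lines.
        wins = sum(playout(stones, to_move) == to_move
                   for _ in range(n_playouts))
        return wins / n_playouts

    # 21 stones, player 0 to move: estimate rather than enumerate.
    print(estimate_win_probability(21, 0))

The trade is the same one that appears throughout modern AI: an exact answer is out of reach, so we settle for a cheap, randomized estimate that is good most of the time.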
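And to make point 6 concrete: stripped of the Darwinian terminology, a genetic algorithm is simply a population-based way of minimizing an objective function. The following toy sketch (again, everything in it is made up for illustration) minimizes a quadratic; simulated annealing, or indeed plain gradient descent, could be dropped in to do the same job.

    import random

    def f(x):
        # Objective to minimize: a simple quadratic bowl with minimum at 0.
        return sum(xi * xi for xi in x)

    def genetic_algorithm(dim=5, pop_size=50, generations=200, sigma=0.1):
        # The standard GA ingredients: population, selection, crossover,
        # mutation. Underneath the biology-flavoured names, this is just
        # randomized search over candidate solutions.
        pop = [[random.uniform(-5, 5) for _ in range(dim)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=f)                    # selection: fitter half lives
            parents = pop[:pop_size // 2]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, dim)  # one-point crossover
                child = [xi + random.gauss(0, sigma)  # mutation
                         for xi in a[:cut] + b[cut:]]
                children.append(child)
            pop = parents + children
        return min(pop, key=f)

    best = genetic_algorithm()
    print(f(best))   # close to zero; simulated annealing would do much the same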

Other advice

As a couple of pieces of advice, rather than warning: learn LaTeX and install Linux if you haven't already done so. My group uses them, so you'll need them if you want to work effectively with us. And in any case, you'll thank me when you're writing your thesis... (If you've never met LaTeX, there's a small example below.)
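
For the uninitiated, a complete LaTeX document can be as small as the following (a generic sketch, not any official Cambridge template); the payoff is that mathematics is easy to typeset well:

    \documentclass{article}
    \usepackage{amsmath}

    \begin{document}

    Bayes' rule, which you will meet rather a lot:
    \[
      P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
    \]

    \end{document}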

Funding

I cannot at present offer funding for PhD study from personally held research grants.

However, there are a number of potential sources of funding to which you might wish to apply. Again, details are available at:

http://www.cl.cam.ac.uk/admissions/phd/

Other considerations

My approach to machine learning, in common with that of most people in Cambridge who are interested in this and related areas, is based on a formal mathematical treatment of the subject. The aim is still to produce workable applied techniques, but history tells us loud and clear that over the long term the formal approach is the one that works.

First of all, be aware that the maths you did in your first degree, while providing a good background, is unlikely to be sufficient on its own: you'll almost certainly need to learn some more, and to put it into practice. Moral: if you don't like maths, then machine learning in this lab is possibly not for you!

Second, don't let that scare you off. While you will certainly have to learn some new things, it is not necessary (depending on the actual subject of your research) to cope with the kind of mathematical complexity that a PhD in number theory might entail. (Having said that, some of the more esoteric areas of computational learning theory get pretty close, if that's what you want to do.) Realistically, research in machine learning, done the right way, will mean learning more advanced calculus and linear algebra than you currently know, and perhaps some more advanced or obscure material that depends on your field of study. And this of course counts as FUN!