There are some truly wonderful books that deal with machine learning and AI in general. For an introduction covering a large part of the field at a reasonably easy level, try:
Machine Learning
Tom Mitchell
McGraw Hill 1997
For more depth, try:
Pattern Classification
Second Edition
Richard O. Duda, Peter E. Hart and David G. Stork
Wiley-Interscience 2000
The Elements of Statistical Learning
T. Hastie, R. Tibshirani and J. H. Friedman
Springer-Verlag 2001
There are many books available on neural networks. Of everything I've seen to date, the two best to look at are:
Neural Networks for Pattern Recognition
Christopher M. Bishop
Oxford University Press 1995
Neural Networks: A Comprehensive Foundation
Second Edition
Simon Haykin
Prentice Hall 1998
If your background is more towards physics, or you want to see some further material with that kind of flavour, try:
Introduction to the Theory of Neural Networks
John Hertz, Anders Krogh and Richard G. Palmer
Perseus Books Group 1991
For general AI the best thing you can read is:
Artificial Intelligence: A Modern Approach
Second Edition
Stuart Russell and Peter Norvig
This is long, and it will take you a while to get through, but its presentation and its general approach are outstanding.
There is an excellent web site at:
containing numerous resources. For a readable introduction to probably approximately correct (PAC) learning and related topics, try:
Computational Learning Theory
Martin Anthony and Norman Biggs
Cambridge University Press
and for more in-depth coverage of more recent and advanced material, try:
Neural Network Learning: Theoretical Foundations
Martin Anthony and Peter L. Bartlett
Cambridge University Press 1999
Another introductory book, with a different feel and covering some further areas, is:
An Introduction to Computational Learning Theory
Michael J. Kearns and Umesh V. Vazirani
The MIT Press 1994
An excellent and readable account of support vector machines is:
An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
Nello Cristianini and John Shawe-Taylor
Cambridge University Press 2000
For in-depth coverage of some more recent material, in particular the connections between the learning-theoretic approach and Bayesian methods, take a look at:
Learning Kernel Classifiers: Theory and Algorithms
Ralf Herbrich
The MIT Press 2002
Again, there is a nice web site with many resources at:
A nice introduction to boosting (you should first read something on general machine learning from the suggestions above) is:
The Boosting Approach to Machine Learning: An Overview
Robert E. Schapire
MSRI Workshop on Nonlinear Estimation and Classification 2002
The following two papers propose explanations for how boosting works:
Additive Logistic Regression: a Statistical View of Boosting
J. Friedman, T. Hastie and R. Tibshirani
The Annals of Statistics
Volume 28, number 2, pages 337-374, 2000
Boosting the margin: A new explanation for the effectiveness of voting methods
Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee
The Annals of Statistics
Volume 26, number 5, pages 1651-1686, 1998
An excellent introduction to a very wide range of modern techniques can be found in:
Pattern Recognition and Machine Learning
Christopher M. Bishop
Springer 2006
A good introduction to many of the issues can be found in
Information Theory, Inference and Learning Algorithms
David J. C. MacKay
Cambridge University Press 2002
although your best bet is to concentrate on the machine learning parts. The coding-theory material is very interesting, but less relevant if you are applying here for a machine learning PhD. A great book by a legend is:
Probability Theory: The Logic of Science
E. T. Jaynes and G. Larry Bretthorst
Cambridge University Press 2003
For a good summary, see Zoubin Ghahramani's ICML 2004 tutorial at:
http://www.gatsby.ucl.ac.uk/~zoubin/ICML04-tutorial.html
An excellent introduction to Gaussian processes as applied to machine learning, now available online, is:
Gaussian Processes for Machine Learning
Carl E. Rasmussen and Christopher K. I. Williams
The MIT Press 2006
More specific material on approximate integration can be found in:
An Introduction to MCMC for Machine Learning
C. Andrieu, N. de Freitas, A. Doucet and M. Jordan
Machine Learning, Volume 50, pages 5-43, 2003
On the same subject, see also the excellent review:
Probabilistic Inference Using Markov Chain Monte Carlo Methods
Radford M. Neal
This is available from:
http://www.cs.toronto.edu/~radford/publications.html