Next: Artificial Intelligence
Up: Lent Term 1999: Part
Previous: Information Retrieval
Lecturers: Dr J.G. Daugman and Dr C.M. Bishop
(jgd1000@cl.cam.ac.uk and cmbishop@microsoft.com)
No. of lectures: 16
Prerequisite courses: Continuous Mathematics, Probability
- Natural versus artificial substrates of intelligence.
- Investigation into how biological nervous systems accomplish many of
the goals of machine intelligence, but using radically different
strategies, architectures, and hardware. Comparison of those differences,
and examination of their importance or irrelevance. Levels of analysis;
mechanism and explanation; philosophical issues. Basic neural network
architectures compared with rule-based or symbolic approaches to
learning and problem-solving.
- Neurobiological wetware: architecture and function of the brain.
- Human brain architecture. Sensation and perception; learning and memory.
What we can learn from neurology of brain trauma; modular organisation
and specialisation of function. Aphasias, agnosias, apraxias.
How stochastic communications media, unreliable and randomly distributed
hardware, very slow and asynchronous clocking, and imprecise connectivity
blueprints, give us unrivalled performance in real-time tasks involving
perception, learning, and motor control.
- Neural processing and signalling.
- Information content of neural signals. Spike generation processes.
Neural hardware for both processing and communications. Can the
mechanisms for neural processing and signalling be viably separated?
Biophysics of nerve cell membranes and differential ionic
permeability. Excitable membranes. Logical operators.
- Stochasticity in neural codes.
- Principal Components Analysis of spike trains. Evidence for detailed
temporal modulation as a neural coding and communications strategy.
Is stochasticity also a fundamental neural computing strategy for
searching large solution spaces, entertaining candidate hypotheses
about patterns, and memory retrieval? John von Neumann's conjecture.
Simulated annealing.
- Neural operators that encode, analyse, and represent image structure.
- How the mammalian visual system, from retina to brain, extracts information
from optical images and sequences of them to make sense of the world.
Description and modelling of neural operators in engineering terms as
filters, coders, compressors, and pattern matchers.
- Cognition and evolution. Neuropsychology of face recognition.
- The sorts of tasks, primarily social, that shaped the evolution of
human brains. The computational load of social cognition as the
driving factor for the evolution of large brains. How the
degrees-of-freedom within faces and between faces are extracted and
encoded by specialised areas of the brain concerned with the
detection, recognition, and interpretation of faces and facial
expressions. Efforts to simulate these faculties in artificial
systems.
- Pattern recognition.
- A brief history of artificial neural networks. Examples of successful
applications. Central concepts of learning from data, and foundations
in probability theory. Regression and classification problems viewed
as non-linear mappings. Analogy with polynomial curve
fitting. General "linear" models. The curse of dimensionality and the
need for adaptive basis functions. Brief review of Perceptrons and
their limitations.
- Feed-forward networks.
- Two-layer feed-forward neural network model. Derivation of the error
back-propagation algorithm for feed-forward networks of arbitrary
topology. Efficiency of back-propagation and comparison with numerical
differentiation. Gradient descent optimisation and its limitations.
- Generalisation and model complexity.
- Relation of model complexity to generalisation error. Training and
validation sets. Cross validation. Regularisation using simple weight
decay. Analysis of weight decay in terms of eigenvector decomposition
of the Hessian matrix. Illustration of regularisation using a simple
radial basis function model.
- Probabilistic inference.
- Sum and product rules of probability. Conditional and marginal
distributions. Bayes' theorem. Use of probability to quantify
uncertainty. Bayesian and frequentist viewpoints. Density estimation,
regression and classification expressed in terms of probability
distributions. Likelihood function. Maximum likelihood illustrated
using a Gaussian distribution. Conditional Gaussian distribution for
regression, and derivation of the sum-of-squares error
function. Network output viewed as conditional mean.
- Network models for classification.
- Probabilistic formulation of classification problems. Prior and
posterior probabilities. Decision theory and minimum misclassification
rate. The distinction between inference and decision. Estimation of
posterior probabilities compared with the use of discriminant
functions. Neural networks as estimators of posterior
probabilities. Two-class problems and the Bernoulli distribution.
Derivation of the cross-entropy error function. Derivation of the
logistic sigmoid activation function from the assumption of Gaussian
class-conditional distributions of hidden unit activations. Concept
of a canonical link function.
- Classification and decision theory.
- Multi-class problems and the multinomial distribution. Derivation of
the cross-entropy error function. Derivation of the softmax activation
function. Compensating for different prior probabilities in training
and test sets. Loss matrices and risk minimisation. Reject
option. Illustration using a hypothetical medical screening example.
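As a rough illustration of the final lecture topics (softmax outputs, compensating for different priors, loss matrices, and the reject option), the following minimal sketch may help. It is not taken from the course materials: the function names, the loss values, and the priors are all hypothetical, chosen only to show how a minimum-risk decision can differ from a minimum-misclassification decision in a screening setting.

```python
import math

def softmax(activations):
    """Map real-valued output activations to posterior probabilities."""
    m = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def correct_priors(posteriors, train_priors, test_priors):
    """Rescale posteriors learned under train_priors to hold under test_priors."""
    scaled = [p * t / s for p, s, t in zip(posteriors, train_priors, test_priors)]
    total = sum(scaled)
    return [v / total for v in scaled]

def decide(posteriors, loss, reject_threshold=None):
    """Pick the decision with minimum expected loss; optionally reject.

    loss[k][j] is the loss incurred by deciding class j when the truth is k.
    Returns the index of the chosen class, or None for the reject option.
    """
    if reject_threshold is not None and max(posteriors) < reject_threshold:
        return None
    n_decisions = len(loss[0])
    risks = [sum(posteriors[k] * loss[k][j] for k in range(len(posteriors)))
             for j in range(n_decisions)]
    return min(range(n_decisions), key=risks.__getitem__)

# Hypothetical screening example: class 0 = healthy, class 1 = disease.
# All numbers below are invented for illustration.
LOSS = [[0.0, 1.0],     # truth healthy: a false alarm costs 1
        [100.0, 0.0]]   # truth disease: a missed case costs 100

posterior = softmax([2.0, 0.0])  # outputs of a network trained on balanced classes
adjusted = correct_priors(posterior, [0.5, 0.5], [0.99, 0.01])
decision_balanced = decide(posterior, LOSS)   # risk-minimising choice, balanced priors
decision_screening = decide(adjusted, LOSS)   # same, after correcting to rare disease
```

With these invented numbers the most probable class is "healthy" in both cases, yet under balanced priors the asymmetric loss matrix makes "disease" the lower-risk decision; after correcting to a 1% disease prior the lower-risk decision becomes "healthy". Passing `reject_threshold` makes `decide` return `None` whenever no posterior is large enough, which is one simple way of expressing the reject option.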
Main recommended book:
Bishop, C.M. (1995). Neural Networks for Pattern Recognition.
Oxford University Press.
Other recommended books:
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation.
Macmillan.
Hecht-Nielsen, R. (1991). Neurocomputing. Addison-Wesley.
Aleksander, I. (1989). Neural Computing Architectures.
North Oxford Academic Press.
Christine Northeast
1998-10-01