Neural Computing

Next: Artificial Intelligence Up: Lent Term 1999: Part Previous: Information Retrieval


Lecturers: Dr J.G. Daugman and Dr C.M. Bishop

(jgd1000@cl.cam.ac.uk and cmbishop@microsoft.com)

No. of lectures: 16

Prerequisite courses: Continuous Mathematics, Probability

Natural versus artificial substrates of intelligence: Investigation into how biological nervous systems accomplish many of the goals of machine intelligence, but using radically different strategies, architectures, and hardware. Comparison of those differences, and examination of their importance or irrelevance. Levels of analysis; mechanism and explanation; philosophical issues. Basic neural network architectures compared with rule-based or symbolic approaches to learning and problem-solving.
Neurobiological wetware: architecture and function of the brain: Human brain architecture. Sensation and perception; learning and memory. What we can learn from the neurology of brain trauma; modular organisation and specialisation of function. Aphasias, agnosias, apraxias. How stochastic communications media, unreliable and randomly distributed hardware, very slow and asynchronous clocking, and imprecise connectivity blueprints give us unrivalled performance in real-time tasks involving perception, learning, and motor control.
Neural processing and signalling: Information content of neural signals. Spike generation processes. Neural hardware for both processing and communications. Can the mechanisms for neural processing and signalling be viably separated? Biophysics of nerve cell membranes and differential ionic permeability. Excitable membranes. Logical operators.
Stochasticity in neural codes: Principal Components Analysis of spike trains. Evidence for detailed temporal modulation as a neural coding and communications strategy. Is stochasticity also a fundamental neural computing strategy for searching large solution spaces, entertaining candidate hypotheses about patterns, and memory retrieval? John von Neumann's conjecture. Simulated annealing.
Neural operators that encode, analyse, and represent image structure: How the mammalian visual system, from retina to brain, extracts information from optical images and sequences of them to make sense of the world. Description and modelling of neural operators in engineering terms as filters, coders, compressors, and pattern matchers.
Cognition and evolution. Neuropsychology of face recognition: The sorts of tasks, primarily social, that shaped the evolution of human brains. The computational load of social cognition as the driving factor for the evolution of large brains. How the degrees-of-freedom within faces and between faces are extracted and encoded by specialised areas of the brain concerned with the detection, recognition, and interpretation of faces and facial expressions. Efforts to simulate these faculties in artificial systems.
Pattern recognition: A brief history of artificial neural networks. Examples of successful applications. Central concepts of learning from data, and foundations in probability theory. Regression and classification problems viewed as non-linear mappings. Analogy with polynomial curve fitting. General "linear" models. The curse of dimensionality and the need for adaptive basis functions. Brief review of Perceptrons and their limitations.
Feed-forward networks: Two-layer feed-forward neural network model. Derivation of the error back-propagation algorithm for feed-forward networks of arbitrary topology. Efficiency of back-propagation and comparison with numerical differentiation. Gradient descent optimisation and its limitations.
Generalisation and model complexity: Relation of model complexity to generalisation error. Training and validation sets. Cross-validation. Regularisation using simple weight decay. Analysis of weight decay in terms of the eigenvector decomposition of the Hessian matrix. Illustration of regularisation using a simple radial basis function model.
Probabilistic inference: Sum and product rules of probability. Conditional and marginal distributions. Bayes' theorem. Use of probability to quantify uncertainty. Bayesian and frequentist viewpoints. Density estimation, regression and classification expressed in terms of probability distributions. Likelihood function. Maximum likelihood illustrated using a Gaussian distribution. Conditional Gaussian distribution for regression, and derivation of the sum-of-squares error function. Network output viewed as conditional mean.
Network models for classification: Probabilistic formulation of classification problems. Prior and posterior probabilities. Decision theory and minimum misclassification rate. The distinction between inference and decision. Estimation of posterior probabilities compared with the use of discriminant functions. Neural networks as estimators of posterior probabilities. Two-class problems and the Bernoulli distribution. Derivation of the cross-entropy error function. Derivation of the logistic sigmoid activation function from the assumption of Gaussian class-conditional distributions of hidden unit activations. Concept of a canonical link function.
Classification and decision theory: Multi-class problems and the multinomial distribution. Derivation of the multi-class cross-entropy error function. Derivation of the softmax activation function. Compensating for different prior probabilities in training and test sets. Loss matrices and risk minimisation. Reject option. Illustration using a hypothetical medical screening example.
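The stochasticity item above lists simulated annealing as a stochastic strategy for searching large solution spaces. As a rough sketch (not taken from the course materials), the following Python minimises a one-dimensional energy using the Metropolis acceptance rule and a geometric cooling schedule. The function names, the double-well energy and all parameter values (`t0`, `cooling`, `step_size`, the temperature floor) are illustrative assumptions.

```python
import math
import random

def accept(delta_e, temperature, rng):
    """Metropolis rule: always accept a downhill move; accept an uphill move
    with probability exp(-delta_e / temperature)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)

def anneal(energy, x0, steps=20000, t0=5.0, cooling=0.999, step_size=0.5, seed=0):
    """Minimise a 1-D energy function by simulated annealing; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)   # random local proposal
        ce = energy(candidate)
        if accept(ce - e, t, rng):
            x, e = candidate, ce
            if e < best_e:
                best_x, best_e = x, e
        t = max(t * cooling, 1e-3)   # geometric cooling, floored so T never reaches 0
    return best_x, best_e

def double_well(x):
    """Hypothetical energy: a local minimum near x = 2, a lower one near x = -2."""
    return (x * x - 4) ** 2 + x

best_x, best_e = anneal(double_well, x0=2.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

At high temperature the walk can climb out of the well it starts in; as the temperature falls it settles into a minimum. Whether that is the global minimum depends on the cooling schedule, which is the main practical difficulty with the method.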
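The feed-forward networks item compares back-propagation with numerical differentiation. A minimal Python sketch of that comparison follows. It assumes a two-layer network with tanh hidden units, a single linear output and a sum-of-squares error for one pattern; all names and sizes are hypothetical. Back-propagation obtains every derivative from one forward and one backward pass. The central-difference estimate needs two forward passes per weight, so its total cost grows roughly as the square of the number of weights. It is therefore mainly useful for checking an implementation.

```python
import math
import random

def forward(w1, b1, w2, b2, x):
    """Two-layer network: tanh hidden units feeding a single linear output."""
    a = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
    z = [math.tanh(aj) for aj in a]
    y = sum(w * zj for w, zj in zip(w2, z)) + b2
    return z, y

def error(w1, b1, w2, b2, x, t):
    """Sum-of-squares error for a single pattern: E = (y - t)**2 / 2."""
    _, y = forward(w1, b1, w2, b2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, b1, w2, b2, x, t):
    """All derivatives of E from one forward and one backward pass."""
    z, y = forward(w1, b1, w2, b2, x)
    delta_out = y - t                                   # dE/dy at the linear output
    g_w2 = [delta_out * zj for zj in z]
    g_b2 = delta_out
    # Propagate back through tanh, using d tanh(a)/da = 1 - tanh(a)**2
    delta_hid = [(1 - zj ** 2) * wj * delta_out for zj, wj in zip(z, w2)]
    g_w1 = [[dj * xi for xi in x] for dj in delta_hid]
    g_b1 = delta_hid
    return g_w1, g_b1, g_w2, g_b2

def numerical_w1(w1, b1, w2, b2, x, t, j, i, eps=1e-6):
    """Central-difference estimate of dE/dw1[j][i]: two forward passes per weight."""
    plus = [row[:] for row in w1]
    minus = [row[:] for row in w1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (error(plus, b1, w2, b2, x, t) - error(minus, b1, w2, b2, x, t)) / (2 * eps)

# Hypothetical example: 2 inputs, 3 hidden units, random weights
rng = random.Random(0)
w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [rng.uniform(-1, 1) for _ in range(3)]
w2 = [rng.uniform(-1, 1) for _ in range(3)]
b2 = 0.1
x, t = [0.5, -1.2], 0.3

g_w1, g_b1, g_w2, g_b2 = backprop(w1, b1, w2, b2, x, t)
max_diff = max(abs(g_w1[j][i] - numerical_w1(w1, b1, w2, b2, x, t, j, i))
               for j in range(3) for i in range(2))
print(f"largest discrepancy: {max_diff:.2e}")
```

The two sets of first-layer derivatives should agree to within the finite-difference error.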
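The generalisation item refers to an analysis of weight decay through the eigenvectors of the Hessian. One standard form of that argument is sketched below. It assumes the unregularised error E is approximated by a quadratic around its minimum w*, where the gradient vanishes, with Hessian H. It uses ν for the weight-decay coefficient and orthonormal eigenvectors u_i of H; the notation is assumed here.

```latex
\begin{aligned}
\tilde{E}(\mathbf{w}) &\approx E(\mathbf{w}^*) + \tfrac{1}{2}(\mathbf{w}-\mathbf{w}^*)^{\mathrm{T}}\mathbf{H}(\mathbf{w}-\mathbf{w}^*) + \tfrac{\nu}{2}\,\mathbf{w}^{\mathrm{T}}\mathbf{w} \\
\nabla\tilde{E}(\tilde{\mathbf{w}}) = 0 \;&\Rightarrow\; \mathbf{H}(\tilde{\mathbf{w}}-\mathbf{w}^*) + \nu\,\tilde{\mathbf{w}} = 0 \\
\mathbf{H}\mathbf{u}_i = \lambda_i\mathbf{u}_i,\quad \mathbf{w}^* &= \sum_i \alpha_i\mathbf{u}_i,\quad \tilde{\mathbf{w}} = \sum_i \tilde{\alpha}_i\mathbf{u}_i \\
\lambda_i(\tilde{\alpha}_i - \alpha_i) + \nu\,\tilde{\alpha}_i = 0 \;&\Rightarrow\; \tilde{\alpha}_i = \frac{\lambda_i}{\lambda_i + \nu}\,\alpha_i
\end{aligned}
```

Components along directions with λ_i much larger than ν are left almost unchanged. Components along directions with λ_i much smaller than ν, where the error surface is flat, are shrunk towards zero.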
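For the maximum-likelihood item under probabilistic inference, here is a minimal Python sketch of fitting a univariate Gaussian. The estimates are the sample mean and the average squared deviation, which divides by N and so gives a biased variance. The function names and data values are made up for illustration.

```python
import math

def ml_gaussian(data):
    """Maximum-likelihood mean and variance of a univariate Gaussian."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n   # divides by N, so this estimate is biased
    return mu, var

def log_likelihood(data, mu, var):
    """ln p(D | mu, var) for independent samples from N(mu, var)."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in data) / (2 * var))

data = [2.1, 1.7, 2.9, 2.4, 1.9]          # made-up sample
mu, var = ml_gaussian(data)               # mu ≈ 2.2, var ≈ 0.176
print(mu, var, log_likelihood(data, mu, var))
```

For a fixed variance, maximising this log-likelihood over the mean is the same as minimising a sum of squared deviations. Replacing the constant mean with a network output y(x, w) gives the link to the sum-of-squares error for regression.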
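Two results from the network models for classification item can be checked numerically in a short Python sketch; all names and parameter values are hypothetical. The first result concerns the derivative of the two-class cross-entropy with respect to a sigmoid unit's activation a. The sigmoid's own derivative cancels and leaves simply y - t, which is what makes the logistic sigmoid the canonical link for a Bernoulli target. The second result concerns the class-conditional densities. If those densities for a scalar input, such as a hidden-unit activation, are Gaussian with a shared variance, then the Bayes posterior equals a logistic sigmoid of a linear function of that input.

```python
import math

def sigmoid(a):
    """Logistic sigmoid g(a) = 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def cross_entropy(a, t):
    """Two-class cross-entropy for one pattern: E = -[t ln y + (1 - t) ln(1 - y)], y = g(a)."""
    y = sigmoid(a)
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

def grad_wrt_activation(a, t):
    """dE/da for the pair above; the sigmoid's derivative cancels, leaving y - t."""
    return sigmoid(a) - t

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_bayes(x, mu1, mu2, var, p1):
    """P(C1 | x) computed directly from Bayes' theorem."""
    num = gaussian_pdf(x, mu1, var) * p1
    return num / (num + gaussian_pdf(x, mu2, var) * (1 - p1))

def posterior_sigmoid(x, mu1, mu2, var, p1):
    """The same posterior as g(w*x + w0): with a shared variance the log-odds are linear in x."""
    w = (mu1 - mu2) / var
    w0 = (mu2 ** 2 - mu1 ** 2) / (2 * var) + math.log(p1 / (1 - p1))
    return sigmoid(w * x + w0)

# Hypothetical class means, shared variance and prior P(C1)
print(posterior_bayes(0.3, 1.0, -1.0, 0.5, 0.3),
      posterior_sigmoid(0.3, 1.0, -1.0, 0.5, 0.3))
```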
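The classification and decision theory item can be sketched in Python as follows. The sketch covers softmax outputs, the multi-class cross-entropy, reweighting posteriors when class priors differ between training and test sets, and choosing the class of minimum expected loss with a reject option. All names are hypothetical. The reject rule shown is the simple one, a threshold on the largest posterior. The screening loss values are invented for illustration.

```python
import math

def softmax(a):
    """Softmax activation y_k = exp(a_k) / sum_j exp(a_j)."""
    m = max(a)                                  # subtracting the max avoids overflow
    e = [math.exp(ak - m) for ak in a]
    s = sum(e)
    return [ek / s for ek in e]

def multiclass_cross_entropy(a, t):
    """E = -sum_k t_k ln y_k for a 1-of-K target vector t."""
    y = softmax(a)
    return -sum(tk * math.log(yk) for tk, yk in zip(t, y) if tk > 0)

def adjust_priors(post, train_priors, test_priors):
    """Rescale posteriors by the ratio of test to training priors, then renormalise."""
    w = [p * new / old for p, new, old in zip(post, test_priors, train_priors)]
    s = sum(w)
    return [wk / s for wk in w]

def decide(post, loss, reject_threshold=None):
    """Pick the class j minimising sum_k loss[k][j] * post[k], where loss[k][j] is
    the cost of assigning class j when the true class is k. Returns None (reject)
    if the largest posterior falls below reject_threshold."""
    if reject_threshold is not None and max(post) < reject_threshold:
        return None
    n = len(post)
    risk = [sum(loss[k][j] * post[k] for k in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: risk[j])

# Hypothetical screening example: class 0 = "normal", class 1 = "abnormal".
# Missing an abnormal case is assumed to cost 100 times more than a false alarm.
loss = [[0, 1],
        [100, 0]]
print(decide([0.9, 0.1], loss))   # 1: flagged despite the low posterior
```

With this loss matrix, a posterior of only 0.1 for the abnormal class already gives it the lower expected loss. This shows how the decision stage can differ from simply picking the most probable class.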

Main recommended book:

Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.

Other recommended books:

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan.

Hecht-Nielsen, R. (1991). Neurocomputing. Addison-Wesley.

Aleksander, I. (1989). Neural Computing Architectures. North Oxford Academic Press.


Christine Northeast
1998-10-01