Course material 2010–11
- Lecture Notes (updated Feb 2011)
- Exercises (updated Feb 2011)
- Slides: Lecture 1
- Past exam questions
- Information for supervisors (contact lecturer for access permission)
No. of lectures: 12
Prerequisite courses: Probability; Mathematical Methods for Computer Science; Artificial Intelligence I (recommended)
The aims of this course are to introduce the principles, models and applications of computer vision. The course will cover: image formation, structure, and coding; edge and feature detection; texture, colour, stereo, and motion; wavelet methods for visual coding and analysis; interpretation of surfaces, solids, and shapes; appearance modelling; pattern recognition and classification; visual inference and learning. Several of these issues will be illustrated using the examples of optical character recognition, image retrieval, and face recognition.
- Goals of computer vision; why they are so difficult. How images are formed, and the ill-posed problem of making 3D inferences from them about objects and their properties.
- Image sensing, pixel arrays, cameras. Elementary operations on image arrays; coding and information measures. Sampling and aliasing. Biological vision.
- Mathematical operators for extracting image structure. Finite differences and directional derivatives. Filters; convolution; correlation. Fourier and wavelet transforms.
- Edge detection operators; the information revealed by edges. The Laplacian operator and its zero-crossings. Logan’s theorem.
- Multi-scale feature detection and matching. Gaussian pyramids and SIFT (scale-invariant feature transform). Energy-minimising snakes. 2D wavelets as visual primitives.
- Texture, colour, stereo, and motion descriptors. Disambiguation and the achievement of invariances. Image and motion segmentation.
- Lambertian and specular surfaces. Reflectance maps. Image formation geometry. Discounting the illuminant when inferring 3D structure and surface properties.
- Shape representation. Inferring 3D shape from shading; surface geometry. Boundary descriptors; codons; superquadrics and the “2.5-Dimensional” sketch.
- Perceptual psychology and visual cognition. Vision as model-building and graphics in the brain. Learning to see. Visual illusions, and what they may imply about how vision works.
- Bayesian inference in vision; knowledge-driven interpretations. Classifiers and pattern recognition. Probabilistic methods in vision.
- Applications of machine learning in computer vision. Discriminative and generative methods. Optical character recognition. Content-based image retrieval.
- Approaches to face detection, face recognition, and facial interpretation. Appearance-based and model-based representations. 2D and 3D approaches. Cascaded detectors.
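As a small illustration of the operators named in lectures 3 and 4 (convolution, finite differences, the Laplacian and its zero-crossings), the following is a minimal sketch in Python with NumPy. It is not taken from the lecture notes: the 4-neighbour Laplacian kernel, the function names `convolve2d` and `zero_crossings`, and the synthetic step-edge image are all assumptions chosen for illustration.

```python
import numpy as np

# A common finite-difference approximation to the Laplacian (4-neighbour form).
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])


def convolve2d(image, kernel):
    """'Valid' 2D convolution written with explicit loops for clarity."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel; correlation does not
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out


def zero_crossings(response):
    """Mark pixels whose response has the opposite sign to the right or lower neighbour."""
    zc = np.zeros(response.shape, dtype=bool)
    zc[:, :-1] |= (response[:, :-1] * response[:, 1:]) < 0
    zc[:-1, :] |= (response[:-1, :] * response[1:, :]) < 0
    return zc


# Hypothetical test image: an 8x8 array with a vertical step edge
# between columns 3 and 4 (dark on the left, bright on the right).
image = np.zeros((8, 8))
image[:, 4:] = 1.0

response = convolve2d(image, LAPLACIAN)  # 6x6 output, offset by one pixel
edges = zero_crossings(response)

# Columns of the output that contain a zero-crossing.
print(np.flatnonzero(edges.any(axis=0)))
```

The Laplacian response is positive on the dark side of the step and negative on the bright side, so the sign change between adjacent pixels localises the edge. In practice the image is usually smoothed first (for example with a Gaussian), since second derivatives amplify noise.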
At the end of the course students should:
- understand visual processing from both “bottom-up” (data-oriented) and “top-down” (goal-oriented) perspectives;
- be able to decompose visual tasks into sequences of image analysis operations, representations, specific algorithms, and inference principles;
- understand the roles of image transformations and their invariances in pattern recognition and classification;
- be able to describe and contrast techniques for extracting and representing features, edges, shapes, and textures;
- be able to analyse the robustness, brittleness, generalisability, and performance of different approaches in computer vision;
- understand some of the major practical application problems, such as face interpretation, character recognition, and image retrieval.
- Forsyth, D.A. & Ponce, J. (2003). Computer vision: a modern approach. Prentice Hall.
- Shapiro, L. & Stockman, G. (2001). Computer vision. Prentice Hall.
Background material: some research papers on
- SIFT (see lecture 5)
- Convolutional neural networks (see lecture 11)
- CBIR, and a CBIR survey paper (see lecture 11)
- Face recognition (see lecture 12)
- Iris recognition and another about how it works (not covered in detail in lectures)
Exercises and examples
- Lectures 1–3: Exercises 1–3. For an example of "split brain" phenomena, see this clip.
- Lectures 4 and 5: Exercises 4–8. You might also like to experiment with this online image analysis tool. More information on Fourier and wavelet approaches to image processing and filtering can be found here.
- Lecture 6: Exercises 9–10.
- Lectures 7–9: Exercises 11–12. As mentioned in lectures, see whether you can get the dancer to spin in both directions (clip 1, clip 2, clip 3). Some other fascinating dynamic illusions are presented here. Also study this compelling lightness illusion, this illustration of colour-constancy, this motion illusion, and this collection of dynamic, colour, and cognitive illusions, and try to explain them! More collections exist here.
- Lectures 10–12: Exercises 13–15. View this 5-minute video about 3-D morphable face representations, and this 1-minute demonstration of generative models for facial expression, applied dynamically to ("talking") paintings and photographs. If you are really interested, you might also like Yann LeCun's Google Tech Talk on convolutional neural networks for "deep learning" and pattern recognition.