Computer Vision
Principal lecturers: Dr Christopher Town, Prof John Daugman, Dr Marwa Mahmoud
Taken by: MPhil ACS
Code: L248
Hours: 18 (16 lectures plus an additional briefing session)
Class limit: max. 20 students
Aims
The aims of this course are to introduce the principles, models and applications of computer vision, as well as some mechanisms used in biological visual systems that may inspire design of artificial ones. The course will cover: image formation, structure, and coding; edge and feature detection; neural operators for image analysis; texture, colour, stereo, and motion; wavelet methods for visual coding and analysis; interpretation of surfaces, solids, and shapes; probabilistic classifiers; visual inference, recognition, and learning.
Lectures
- Goals of computer vision; why they are so difficult. Image formation, and the ill-posed problem of making 3D inferences about objects and their properties from images.
- Image sensing, pixel arrays, CCD and CMOS cameras. Image coding and information measures. Elementary operations on image arrays.
- Biological visual mechanisms, from retina to cortex. Photoreceptor sampling; receptive field profiles; stochastic impulse codes; channels and pathways. Neural image encoding operators.
- Mathematical operations for extracting image structure. Finite differences and directional derivatives. Filters; convolution; correlation. 2D Fourier domain theorems.
- Edge detection operators; the information revealed by edges. Gradient vector field; Laplacian operator and its zero-crossings.
- Multi-scale contours, feature detection and matching. SIFT (scale-invariant feature transform); pyramids. 2D wavelets as visual primitives. Active contours. Energy-minimising snakes.
- Higher visual operations in brain cortical areas. Multiple parallel mappings; streaming and divisions of labour; reciprocal feedback through the visual system.
- Texture, colour, stereo, and motion descriptors. Disambiguation and the achievement of invariances. Colour computation, motion and image segmentation.
- Lambertian and specular surfaces; reflectance maps. Geometric analysis of image formation from surfaces. Discounting the illuminant when inferring 3D structure and surface properties.
- Shape representation. Inferring 3D shape from shading; surface geometry. Boundary descriptors; codons. Object-centred volumetric coordinates.
- Perceptual organisation and cognition. Vision as model-building and graphics in the brain. Learning to see.
- Lessons from neurological trauma and visual deficits. Visual agnosias and illusions, and what they may imply about how vision works.
- Bayesian inference in vision; knowledge-driven interpretations. Classifiers, decision-making, and pattern recognition.
- Model estimation. Machine learning and statistical methods in vision.
- Applications of machine learning in computer vision. Discriminative and generative methods. Convolutional neural networks.
- Approaches to face detection, face recognition, and facial interpretation. Cascaded detectors. Appearance versus model-based methods.
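As an illustration of the lecture topics on finite differences, the gradient vector field, and the Laplacian operator with its zero-crossings, the following is a minimal sketch in plain Python. It is not taken from the course materials: the function names (`gradient`, `laplacian`, `zero_crossings`) and the toy step-edge image are assumptions chosen for illustration, and only sign changes between horizontal neighbours are checked.

```python
def gradient(img):
    """Central-difference gradient (gx, gy) at interior pixels; borders stay 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def laplacian(img):
    """Five-point discrete Laplacian at interior pixels; borders stay 0."""
    h, w = len(img), len(img[0])
    lap = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap[y][x] = (img[y][x + 1] + img[y][x - 1]
                         + img[y + 1][x] + img[y - 1][x]
                         - 4.0 * img[y][x])
    return lap


def zero_crossings(lap):
    """Positions (y, x) where the Laplacian changes sign between x and x + 1."""
    return [(y, x)
            for y in range(len(lap))
            for x in range(len(lap[0]) - 1)
            if lap[y][x] * lap[y][x + 1] < 0]


# Toy 6x6 image (an assumption): dark left half, bright right half,
# giving a vertical step edge between columns 2 and 3.
image = [[0.0] * 3 + [10.0] * 3 for _ in range(6)]

gx, gy = gradient(image)
edges = zero_crossings(laplacian(image))
```

On this toy image the horizontal gradient is non-zero only in the two columns next to the step, and the Laplacian changes sign exactly across the step, which is the sense in which zero-crossings localise an edge.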
Objectives
At the end of the course students should:
- understand visual processing from both “bottom-up” (data oriented) and “top-down” (goals oriented) perspectives;
- be able to decompose visual tasks into sequences of image analysis operations, representations, specific algorithms, and inference principles;
- understand the roles of image transformations and their invariances in pattern recognition and classification;
- be able to describe and contrast techniques for extracting and representing features, edges, shapes, and textures;
- be able to describe key aspects of how biological visual systems work, and to propose ways in which biological visual strategies might be implemented in machine vision, despite the enormous differences in hardware;
- be able to analyse the robustness, brittleness, generalisability, and performance of different approaches in computer vision;
- understand the roles of machine learning in computer vision today, including probabilistic inference and both discriminative and generative methods;
- understand in depth at least one major practical application problem, such as face recognition, detection, or interpretation.
Recommended reading
* Forsyth, D. A. and Ponce, J. (2003). Computer Vision: A Modern Approach. Prentice Hall.
Shapiro, L. and Stockman, G. (2001). Computer Vision. Prentice Hall.
Practical work
Two practical exercises and a mini-project are carried out in Lent Term and early Easter Term.
In addition to lectures, briefing and feedback meetings will be scheduled. Details will follow.
Assessment
- Exercise 1: 10%
- Exercise 2: 20%
- Mini-project: 70% (proposal and presentation: 5%; final report: 65%)
Further Information
Due to COVID-19, the method of teaching for this module will be adjusted to cater for physical distancing and for students who are working remotely. We will confirm precisely how the module will be taught closer to the start of term.
Current Cambridge undergraduate students who are continuing on to Part III or the MPhil in Advanced Computer Science may only take this module if they did NOT take it in Part II.
This course is borrowed from Part II of the Computer Science Tripos. As such, assessment will be adjusted to an appropriate level for those enrolled for Part III of the Tripos or the MPhil in Advanced Computer Science. Further information about assessment and practicals will follow at the first lecture.