Project Suggestions by Chris Town

Nov 2005 Update: Due to overwhelming demand from Part II students, I am unlikely to be able to supervise any Diploma projects this year. However, it might be possible to make an exception for a particularly strong candidate who already has some relevant experience.

Here are my project suggestions for Part II or Diploma students in the academic year 2005/2006. Some of the information on last year's suggestions may also be relevant. I have supervised about 12 Part II and Diploma projects in recent years, almost all of which received 1st class marks, with several being singled out for special commendation by the examiners. I also recently co-authored three academic papers with former project students of mine. In short, my project suggestions are likely to be challenging, but I am fully committed to providing the best support I can to make sure the project is completed successfully. Who knows, you might end up having a lot of fun too!

The platform of choice for implementation of the projects is Matlab, which is available in most Colleges and in the CL. Matlab has excellent facilities for numerical computation and visualisation, and there are many useful toolboxes (e.g. for image processing, statistics, optimisation, and neural networks). For reasons of runtime efficiency, it might however be appropriate to implement part of the required functionality in a lower-level compiled language such as C++ and integrate such modules into Matlab by means of its MEX interface. There are various free computer vision packages available which use or support C/C++, such as OpenCV, VXL, and Lush.
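To give a flavour of this workflow, here is a minimal sketch of compiling and calling a C++ routine from within Matlab. The file name fastfilter.cpp is a hypothetical example; it would need to define the standard mexFunction entry point.

    % Compile a C++ source file into a MEX module callable from Matlab
    % (fastfilter.cpp is a hypothetical example and must define mexFunction)
    mex fastfilter.cpp

    % Once compiled, the routine is called like any other Matlab function
    img = double(imread('test.jpg')) / 255;   % load and normalise an image
    result = fastfilter(img);                 % run the compiled C++ code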

No previous experience of image and video processing is required, just enthusiasm. The projects are challenging in that they address interesting research problems, but plenty of support will be available. Apart from an interest in the project, a reasonable grounding in continuous mathematics and probability theory would be helpful, as would proficiency with high level programming languages such as C++ or the Matlab environment.

As regards general references, I particularly recommend the textbook by Forsyth and Ponce on computer vision. References on image processing, such as the book by Gonzalez and Woods, and on numerical methods, such as Numerical Recipes in C++, might also be handy. Useful online resources for computer vision include CVonline and the Computer Vision Homepage at CMU. The best (free) online tools for finding papers etc. are Google, Google Scholar, and Citeseer. There are many online tutorials for Matlab, including a local one at CUED. Eventually you might want to use TeX/LaTeX to produce your dissertation; a basic introduction is available here, and further information can be found here and here.

Context-based face detection

Face detection remains an important and challenging problem. Much progress has recently been made using modern machine learning methods applied to very large data sets of faces. However, most of these methods are restricted to a particular view of a face (e.g. full frontal) and are sensitive to lighting conditions, occlusions (glasses, beards, hair, etc.), scale, and noise (the latter being a particular problem in the case of e.g. CCTV footage).

This project will investigate how image context can be used to build face detectors which are more robust. Whereas most face detectors are currently trained only on images of the face itself, the aim is to make use of information from features such as hair and skin colour, and focussing cues such as shoulder outlines and body silhouettes. For example, a "head and shoulders" detector may be used to quickly identify potential faces, and such regions could then be selectively passed to a more specialised (and computationally more intensive) face detector. I already have a range of Matlab code and data sets for face detection which could form the basis of this project.
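As a concrete illustration, skin colour alone already provides a cheap pre-filter for proposing candidate regions. The following is a minimal sketch; the HSV thresholds and area cut-off are illustrative assumptions rather than tuned values.

    % Sketch: propose candidate face regions by skin-colour filtering
    rgb = imread('group_photo.jpg');            % hypothetical input image
    hsv = rgb2hsv(rgb);
    skin = hsv(:,:,1) < 0.1 & hsv(:,:,2) > 0.2 & hsv(:,:,2) < 0.7;
    skin = imclose(skin, strel('disk', 5));     % fill small gaps in the mask
    stats = regionprops(bwlabel(skin), 'BoundingBox', 'Area');
    candidates = stats([stats.Area] > 400);     % keep sufficiently large blobs
    % Each candidate bounding box would then be passed to the (slower)
    % specialised face detector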

Papers which suggest using context information for face detection include Kruppa 2003 (see also this paper), Sinha 2002 (see also this paper and this page), and Liao 1997. A good starting point to learn more about face detection in general is the Face detection homepage. One of the best performing face detection methods is that proposed by Viola and Jones (see Viola 2001 and Lienhart 2002). Also see the project resources below for more relevant links on face detection.

Multi-view face detection

As noted above, most face detection methods only work on full frontal images. This project will investigate ways to make face detection work for different poses (out-of-plane rotations, e.g. frontal, 30 degree angle, and profile) and orientations (in-plane rotation around the camera's optical axis). The simplest option would be to train an existing face detector with an extended training set to create either a single multi-view face detector (the danger being that this may suffer from a high false positive rate) or a set of detectors (which will require some disambiguation to deal with multiple detections of the same face). I already have some data sets and face detector code which could be used as a starting point.
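The disambiguation step for a set of pose-specific detectors amounts to merging overlapping detections of the same face. Below is a minimal sketch of greedy suppression by detection score; the function and variable names are illustrative, detections are assumed to be rows of [x y w h], and the file would be saved as merge_detections.m.

    function keep = merge_detections(boxes, scores)
    % Greedily keep the highest-scoring detection and suppress any
    % detection that overlaps an already-kept one by more than half
    [dummy, order] = sort(scores, 'descend'); %#ok<ASGLU>
    keep = [];
    for i = order(:)'
        suppressed = false;
        for k = keep
            if box_overlap(boxes(i,:), boxes(k,:)) > 0.5
                suppressed = true;
                break;
            end
        end
        if ~suppressed
            keep(end+1) = i; %#ok<AGROW>
        end
    end

    function o = box_overlap(a, b)
    % Intersection area divided by the area of the smaller box
    x1 = max(a(1), b(1));  y1 = max(a(2), b(2));
    x2 = min(a(1)+a(3), b(1)+b(3));
    y2 = min(a(2)+a(4), b(2)+b(4));
    inter = max(0, x2-x1) * max(0, y2-y1);
    o = inter / min(a(3)*a(4), b(3)*b(4));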

Facial feature detection

A number of computer vision applications such as pose estimation, expression recognition, and image indexing benefit from being able to automatically detect facial features such as points around the eyes, nose, mouth, and cheeks. This project will use a well-known method such as Active Shape Models or Active Appearance Models to build a robust model of facial appearance and fit it to face images in order to estimate the position of landmark points around facial components.
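At the core of both methods is a point distribution model: a PCA of aligned landmark coordinates. A minimal sketch, assuming X is an n-by-2k matrix with one row of k (x,y) landmark pairs per training face, and xnew is a hypothetical new shape given as a 1-by-2k row vector:

    % Build a linear shape model from aligned landmark data
    xbar = mean(X, 1);                       % mean shape
    [V, D] = eig(cov(X));                    % eigenvectors of the covariance
    [evals, idx] = sort(diag(D), 'descend');
    V = V(:, idx);
    % Retain enough modes to explain 95% of the shape variance
    t = find(cumsum(evals) / sum(evals) >= 0.95, 1);
    P = V(:, 1:t);
    % Any plausible shape is approximated as x = xbar' + P*b; fitting the
    % model to an image amounts to searching for good parameters b
    b = P' * (xnew - xbar)';                 % project a new shape onto the model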

"Body plans" for detection and recognition of human shapes

In order to index images for purposes such as content-based retrieval, it is often important to be able to identify people and body parts such as limbs on the basis of contour (silhouette) and shape information.

An important line of work in this area is the "body parts" approach, which works by finding candidate body segments and then linking them in a way which is consistent with constraints based on human appearance and kinematics. See e.g. Forsyth 1997 and Ioffe 1998. Micilotta 2005 shows how this can be used for tracking and pose estimation (see here for some demos and here for a related paper with further details). Other approaches include part-based appearance models (see Xie 2004), mixtures of tree-structured probabilistic models (Ramanan 2003), and contour-based learning (Shotton 2005). In order to detect potential body segments, the segmentation and recognition frameworks presented by Martin 2002 and Mori 2004 might be of use.
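To make the linking step concrete, here is a minimal sketch of scoring one candidate torso-limb pairing under a simple kinematic constraint. Segments are assumed to be rows of [x y angle length], and the scoring function and its soft constraints are illustrative assumptions, not learned values.

    function s = link_score(torso, limb, dist_scale, angle_scale)
    % Distance from the torso's far endpoint to the limb's base point
    attach = torso(1:2) + torso(4) * [cos(torso(3)), sin(torso(3))];
    d = norm(attach - limb(1:2));
    % Angular deviation between the two segments, wrapped to [-pi, pi]
    da = abs(mod(torso(3) - limb(3) + pi, 2*pi) - pi);
    % Score decays smoothly with distance and angular deviation
    s = exp(-d / dist_scale) * exp(-da / angle_scale);

A full system would evaluate such scores over many candidate pairs and search for the assembly that best satisfies all of the pairwise constraints.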

Linking names to image content for face analysis

Many news images carry captions or annotations such as "George Bush meeting Tony Blair". By extracting such phrases from text (so-called named entity recognition, NER) and linking them to the output of face detectors, it is possible to automatically derive labelled training data which could be used to build face recognisers for frequently photographed people such as politicians and other celebrities. Search engines such as Google and Yahoo already extract image captions and textual context, and many image archives carry basic annotations. Depending on interest, the project could emphasise the natural language processing aspects of NER, or build a simple parser (e.g. using a fixed name list) and focus more on the computer vision tasks of identifying data outliers (mislabelled faces) and training face recognisers. The task will be made easier through the use of existing face detection and other image analysis code.
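The fixed-name-list variant of the parser can be very simple indeed. A minimal sketch (the name list and caption are illustrative examples):

    % Spot known names in an image caption
    names = {'George Bush', 'Tony Blair', 'Kofi Annan'};
    caption = 'George Bush meeting Tony Blair at the summit';
    found = {};
    for i = 1:length(names)
        if ~isempty(strfind(caption, names{i}))
            found{end+1} = names{i}; %#ok<AGROW>
        end
    end
    % found now lists the names to be associated with the faces that the
    % detector returns for this image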

Global image content inference

While arguably most computer vision research is concerned with the detection and recognition of particular objects, there is much scope for work which seeks to characterise the visual content of images at a broader "macroscopic" level. For example, for the purposes of image retrieval it is generally more important to be able to describe images with labels such as "indoors", "city", or "BBQ" than to identify and label every object in the scene.

The aim of the project is to develop a Bayesian probabilistic framework for inferring the overall scene depicted in a digital photograph over a limited set of labels (e.g. indoor, outdoor, daytime, at night, city, countryside, beach, etc.). This will be based on both image classification information (you can make use of various methods which I have already implemented) and image metadata such as digital camera settings. For example, knowing that the flash was used might make it more likely that the picture was taken indoors, and images containing lots of straight edges are more likely to show man-made objects such as buildings.
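To illustrate the kind of inference involved, here is a minimal naive-Bayes sketch combining two binary cues. All probabilities are illustrative assumptions, not trained values.

    % Combine metadata and image cues into a posterior over two scene labels
    prior   = [0.5 0.5];        % P(indoor), P(outdoor)
    p_flash = [0.7 0.1];        % P(flash fired | indoor), P(... | outdoor)
    p_edges = [0.6 0.3];        % P(many straight edges | indoor/outdoor)
    flash = 1; edges = 0;       % observed cues for one image (1 = present)
    lik = (flash * p_flash + (1-flash) * (1-p_flash)) .* ...
          (edges * p_edges + (1-edges) * (1-p_edges));
    post = prior .* lik / sum(prior .* lik);   % P(label | cues)
    % With these numbers, the flash observation pushes the posterior
    % strongly towards "indoor" (post is [0.8 0.2])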

Recent references are Boutell 2004 (a longer version is available here), Vogel 2004, Luo 2001, and Vailaya 2001. A useful overview can be found here.

Estimating the orientation of digital photographs

A common problem in digital photography is that images captured in portrait rather than landscape format typically need to be identified and rotated manually by the user for proper viewing and printing on a PC. This project will develop methods to identify whether or not a given photographic image is "right-side-up" and, if necessary, perform the required 90 degree (clockwise or anti-clockwise) or 180 degree rotation.

Some modern digital cameras have orientation sensors which allow the orientation of the image to be inferred based on file metadata. However, in most cases some analysis of the visual content of the image is required. A number of suitable methods for this have recently been proposed in the literature. For example, one can consider the predominant orientation of line segments in the image or infer the position of content such as sky to estimate the orientation of the horizon and hence the image.
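As a simple example of such content analysis, the balance of gradient energy along the two image axes already carries an orientation cue: horizontal structures such as the horizon produce mostly vertical gradients in an upright image. A minimal sketch (the input file name and the magnitude threshold are illustrative assumptions):

    % Estimate whether dominant scene structure is horizontal or vertical
    img = double(rgb2gray(imread('photo.jpg')));
    [gx, gy] = gradient(img);
    mag = sqrt(gx.^2 + gy.^2);
    strong = mag > 0.5 * max(mag(:));    % consider only strong edges
    gy_energy = sum(abs(gy(strong)));    % energy of vertical gradients
    gx_energy = sum(abs(gx(strong)));    % energy of horizontal gradients
    if gy_energy >= gx_energy
        disp('dominant structures are horizontal: image is probably upright');
    else
        disp('dominant structures are vertical: a 90 degree rotation may be needed');
    end

Such a cue is of course weak on its own and would be combined with others, such as the inferred position of sky regions.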

Useful references include Wang 2004 (PDF available here, links to cited papers here), Zhang 2002, and Vailaya 2002. A good overview is given in the recent MSc dissertation by Su.


Chris Town, Copyright 2005