Project Suggestions by Chris Town

Here are my project suggestions for Part II or Diploma students in the academic year 2007/2008. Some of the information on last year's suggestions may also be relevant. I have supervised about 22 Part II and Diploma projects in recent years, most of which received 1st class marks, with several being singled out for special commendation by the examiners. I also recently co-authored five academic papers with former project students of mine. In short, my project suggestions are likely to be challenging, but I am fully committed to putting in the effort needed to provide the best support I can and make sure the project is completed successfully. Who knows, you might end up having a lot of fun too!

The platform of choice for implementation of most projects is Matlab, which is available in most Colleges and in the CL. Matlab has excellent facilities for numerical computation and visualisation, and there are many useful toolboxes (e.g. for image processing, statistics, optimisation, neural networks). For reasons of runtime efficiency, it might however be appropriate to implement part of the required functionality in a lower level compiled language such as Java or C++ and integrate such modules into Matlab by means of the Matlab compiler package. There are various free computer vision packages available which use or support C/C++ such as OpenCV, VXL, and Lush.

No previous experience of image and video processing is required, just enthusiasm. The projects are challenging in that they address interesting research problems, but plenty of support will be available. Apart from an interest in the project, a reasonable grounding in continuous mathematics and probability theory would be helpful, as would proficiency with high level programming languages such as Java, C++, or the Matlab environment.

As regards general references, I particularly recommend the textbook by Forsyth and Ponce on computer vision. References on image processing, such as the book by Gonzalez and Woods, and on numerical methods, such as Numerical Recipes in C++, might also be handy. Useful online resources for computer vision include CVonline and the Computer Vision Homepage at CMU. The best (free) online tools for finding papers etc. are Google, Google Scholar, and Citeseer. There are many online tutorials for Matlab, including a local one at CUED. Eventually you might want to use TeX/LaTeX to produce your dissertation; a basic introduction and further information are available online.

Some of the following descriptions are still a bit brief and open to interpretation, watch this space or (better) contact me to find out more. I may add or change project suggestions as the various deadlines approach.

Semi-supervised image annotation

Automated image annotation is a promising approach for searching through large collections of unlabelled images. When it comes to object recognition, one of the major problems is obtaining a large set of labelled examples of the required object class. Usually these data sets consist of images generated through manual annotation. The other way these data sets are generated is by expanding an existing image set using automated techniques, such as perturbing exemplars in some way, or even generating new objects and views using computer graphics.

Conversely, the internet can be viewed as a vast repository of poorly labelled training data. There are several billion images on the web, most of which have some associated text that may or may not be related to some aspects of the image content. For example, a picture might carry a caption such as "Tom Cruise getting married to Katie Holmes" or "Football world cup in Germany". Some text strings occur so often that one can build a statistical model of the co-occurrence between such words and images of their denoted content: names of famous people (e.g. "Ronan Keating" or "Michael Jackson"), places (e.g. "Oxford" or "Cambridge"), terms for generic settings (e.g. "forest" and "beach"), and objects (e.g. "ball" and "ships"). This mapping is likely to be very noisy and uncertain. Words found on a webpage may not clearly relate to the content of proximate images, or there may be ambiguity regarding which parts of an image they refer to, for example when there are several people or several objects in the picture. Recently there has been some research which shows how fairly simple features extracted from images can be used to train probabilistic models to predict appropriate labels for images, after having trained them on a sufficiently large corpus such as the web.
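The co-occurrence idea can be sketched very simply. The following Python fragment (a toy illustration, not part of the project code; the "visual features" and captions are invented stand-ins for real image descriptors and web text) counts how often caption words co-occur with quantised image features and then ranks candidate labels for a new image with a naive-Bayes style score:

```python
from collections import Counter, defaultdict

# Toy corpus: each "image" is a set of quantised visual features
# (stand-ins for real descriptors) paired with its noisy caption words.
corpus = [
    ({"blue", "sand"}, {"beach", "holiday"}),
    ({"blue", "sand"}, {"beach"}),
    ({"green", "bark"}, {"forest"}),
    ({"green", "bark"}, {"forest", "hike"}),
]

word_count = Counter()
cooc = defaultdict(Counter)  # cooc[word][feature] -> co-occurrence count
for features, words in corpus:
    for w in words:
        word_count[w] += 1
        for f in features:
            cooc[w][f] += 1

def annotate(features, vocabulary, alpha=1.0):
    """Rank words by P(word) * prod P(feature | word), with smoothing."""
    n = sum(word_count.values())
    scores = {}
    for w in vocabulary:
        score = word_count[w] / n
        for f in features:
            # Laplace smoothing copes with unseen word/feature pairs
            score *= (cooc[w][f] + alpha) / (word_count[w] + 2 * alpha)
        scores[w] = score
    return sorted(scores, key=scores.get, reverse=True)

print(annotate({"blue", "sand"}, word_count.keys()))  # "beach" ranks first
```

In a real system the features would be colour, texture, or region descriptors, and the corpus would be harvested from the web, but the statistical core is the same: labels are predicted from learned word/feature co-occurrence rather than from hand-annotated examples.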

Automated Scene Classification for Image Enhancement

Cameras often have special shooting modes such as Portrait, Sports, Landscape, and Night mode. These result in automated adjustment of camera settings such as exposure, shutter speed, ISO, white balance, focus mode, and image sharpening. The photographer must manually select these modes before taking the photo. This project will implement a set of image classifiers to automatically categorise an image into one or several of these modes, to allow automated image enhancement after capture.
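One plausible starting point is a simple feature-based classifier. The sketch below (illustrative only; the feature values and the nearest-centroid rule are my assumptions, not a prescribed design) classifies an image into a shooting mode from two toy global features, mean brightness and edge density:

```python
import math

# Toy training features per image: (mean brightness 0..1, edge density 0..1).
# Real features would be computed from the image; these values are invented.
training = {
    "Night":     [(0.10, 0.20), (0.15, 0.30)],
    "Landscape": [(0.60, 0.50), (0.65, 0.55)],
    "Portrait":  [(0.50, 0.15), (0.55, 0.20)],
}

def centroid(points):
    """Mean feature vector of a class's training examples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

centroids = {mode: centroid(pts) for mode, pts in training.items()}

def classify(features):
    """Assign the mode whose class centroid is nearest in feature space."""
    return min(centroids, key=lambda m: math.dist(features, centroids[m]))

print(classify((0.12, 0.25)))  # a dark, moderately textured image -> "Night"
```

A real solution would use richer features (colour histograms, edge statistics, sharpness measures) and a stronger classifier such as an SVM or neural network, possibly implemented in Matlab's statistics or neural network toolboxes, but the pipeline shape (features in, mode label out) is the same.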

Cascaded multi-pose face detection

Face detection remains an important and challenging problem. Much progress has recently been made using modern machine learning methods applied to very large data sets of faces. However, most of these methods are restricted to a particular view of a face (e.g. full frontal) and are sensitive to lighting conditions, occlusions (glasses, beards, hair, etc.), scale, and noise (which is a particular problem in the case of e.g. CCTV footage).

Since most face detection methods only work on full frontal images, this project will investigate ways to make face detection work for different poses (out-of-plane rotations, e.g. frontal, 30 degree angle, and profile). The simplest option would be to train an existing face detector with an extended training set to create either a single multi-pose face detector (the danger being that this may suffer from a high false positive rate) or a set of detectors (which will require some disambiguation to deal with multiple detections of the same face). I already have some data sets and face detector code which could be used as a starting point.
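The disambiguation step for a set of pose-specific detectors is typically some form of non-maximum suppression: pool all detections, keep the highest-scoring box, and discard rivals that overlap it heavily. A minimal sketch (the greedy scheme and the 0.3 overlap threshold are illustrative choices, not fixed requirements):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_detections(detections, thresh=0.3):
    """Greedy non-maximum suppression over pooled detector outputs.

    detections: list of (score, box) gathered from all pose-specific
    detectors. Highest-scoring boxes are kept; overlapping rivals
    (likely the same face seen by another detector) are suppressed.
    """
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, k) < thresh for _, k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (10, 10, 50, 50)),    # frontal detector
        (0.7, (12, 12, 52, 52)),    # 30-degree detector, same face
        (0.8, (100, 20, 140, 60))]  # profile detector, different face
print(merge_detections(dets))  # two faces survive suppression
```

In practice the scores from different detectors may not be directly comparable and would need calibrating first, which is itself an interesting sub-problem for the project.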

Detection of man-made objects

Artificial objects such as buildings and cars can often be distinguished on the basis of simple properties such as the density and orientation of straight line segments and texture or colour features. Rather than building specific detectors and recognisers for each conceivable class of object, it should therefore be possible to construct a classifier which incorporates such heuristics to broadly distinguish "artificial" from "natural" objects in photographic images (ignoring special cases such as crystals).
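One of the simplest such heuristics is that man-made structures produce many axis-aligned straight edges, while natural scenes spread edge orientations more uniformly. The fragment below (a toy sketch; the segment angles and the 10-degree tolerance are invented for illustration) scores a set of detected line-segment orientations accordingly:

```python
def manmade_score(segment_angles_deg):
    """Fraction of line segments aligned near horizontal or vertical.

    Buildings and vehicles tend to show many axis-aligned straight
    edges; foliage and other natural textures do not. Thresholding
    this score gives a crude artificial-vs-natural classifier.
    """
    def aligned(theta):
        theta = theta % 180
        # within 10 degrees of horizontal (0/180) or vertical (90)
        return min(theta, abs(theta - 90), abs(theta - 180)) < 10
    return sum(aligned(t) for t in segment_angles_deg) / len(segment_angles_deg)

building_edges = [0, 2, 88, 91, 179, 90, 1]    # mostly axis-aligned
foliage_edges = [13, 47, 62, 145, 33, 110, 71]  # scattered orientations
print(manmade_score(building_edges), manmade_score(foliage_edges))
```

A full classifier would combine several such cues (line density, texture regularity, colour statistics) in a learned model rather than relying on any single heuristic, which is where the references below come in.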

Further reading: Zhu 2004 demonstrates an approach to car detection based on integrating multiple cues. Agarwal 2004 proposes a parts-based representation for object detection. Iqbal 2002 shows how perceptual grouping techniques can be applied to classify images containing large scale man-made objects. Overviews of object detection and recognition methods are also available online.


Chris Town, Copyright 2008