Project Suggestions by Chris Town

Here are my project suggestions for Part II or Diploma students in the academic year 2007/2008. Some of the information on last year's suggestions may also be relevant. I have supervised about 20 Part II and Diploma projects in recent years, most of which received 1st class marks, with several singled out for special commendation by the examiners. I also recently co-authored five academic papers with former project students of mine. In short, my project suggestions are likely to be challenging, but I am fully committed to putting in a lot of effort to provide the best support I can and to make sure the project is completed successfully. Who knows, you might end up having a lot of fun too!

The platform of choice for implementing most projects is Matlab, which is available in most Colleges and in the CL. Matlab has excellent facilities for numerical computation and visualisation, and there are many useful toolboxes (e.g. for image processing, statistics, optimisation, and neural networks). For reasons of runtime efficiency, it may however be appropriate to implement part of the required functionality in a lower-level language such as Java or C++ and integrate such modules into Matlab (e.g. via the MEX interface for C/C++, or Matlab's built-in Java support). There are also various free computer vision packages available which use or support C/C++, such as OpenCV, VXL, and Lush.
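
As an illustration of the Matlab-side workflow (the file fastfilter.c and its calling convention are made-up examples, not existing code):

    % Compile a C source file into a MEX function, then call it like any
    % other Matlab function. 'fastfilter.c' and 'test.jpg' are placeholders.
    mex fastfilter.c
    img = double(imread('test.jpg'));
    out = fastfilter(img);   % the heavy inner loop now runs as compiled code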

No previous experience of image and video processing is required, just enthusiasm. The projects are challenging in that they address interesting research problems, but plenty of support will be available. Apart from an interest in the project, a reasonable grounding in continuous mathematics and probability theory would be helpful, as would proficiency in a high-level programming language such as Java or C++, or with the Matlab environment.

As regards general references, I particularly recommend the textbook by Forsyth and Ponce on computer vision. References on image processing, such as the book by Gonzalez and Woods, and on numerical methods, such as Numerical Recipes in C++, might also be handy. Useful online resources for computer vision include CVonline and the Computer Vision Homepage at CMU. The best (free) online tools for finding papers are Google, Google Scholar, and CiteSeer. There are many online tutorials for Matlab, including a local one at CUED. Eventually you might want to use TeX/LaTeX to produce your dissertation; basic introductions and further reference material are easy to find online.

Some of the following descriptions are still a bit brief and open to interpretation; watch this space or (better) contact me to find out more. I may add or change project suggestions as the various deadlines approach.

Automated image annotation

A big problem in the use of machine learning for object recognition is the need to have a large set of labelled examples of the desired object class. Such data sets are often generated through manual annotation, a tedious and expensive process, or by expanding an existing image set using automated techniques (e.g. perturbing exemplars in some way, or even generating new objects and views using computer graphics).

Conversely, the internet can be viewed as a vast repository of "weakly labelled" training data. There are several billion images on the web, most of which have some associated text that may or may not be related to some aspects of the image content. For example, a picture might carry a caption such as "President George Bush meeting PM Tony Blair" or "Christmas holiday in Hawaii". Some text strings, such as names of famous people ("David Beckham") and places ("Paris"), or terms for generic settings ("beach") and objects ("cars"), occur so often that one can build a statistical model of the co-occurrence between such words and images of their denoted content. This mapping is likely to be very noisy and uncertain, for example words found on a webpage may not clearly relate to the content of proximate images, or there may be ambiguity regarding which parts of an image they refer to (e.g. there might be several people or objects in an image).

Recently there has been some research showing how fairly simple features extracted from images can be used to train probabilistic models which, given a sufficiently large training corpus (e.g. gathered from the web), can predict appropriate labels for new images. Some data sets and relevant image processing code will be provided. I may also be in a position to provide substantial processing resources so that large (multi-million image) datasets can be used as part of the evaluation.
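
As a rough sketch of the kind of model involved, the following Matlab function scores candidate words for a query image using naive Bayes over quantised feature histograms; the data layout, add-one smoothing, and function name are illustrative assumptions rather than a prescribed design:

    function scores = annotateSketch(feats, labels, query)
    % feats(i,:)  - quantised feature histogram of training image i (N x B)
    % labels(i,j) - 1 if word j co-occurred with training image i (N x W)
    % query       - feature histogram of the image to annotate (1 x B)
    prior = mean(labels, 1);                  % empirical word frequencies P(word)
    cooc  = feats' * labels + 1;              % feature/word co-occurrence, add-one smoothed
    condp = cooc ./ repmat(sum(cooc, 1), size(cooc, 1), 1);  % P(feature bin | word)
    scores = log(prior) + query * log(condp); % unnormalised log-posterior for each word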

Automated aesthetic analysis of photographic images

Most image retrieval research is only concerned with finding images which are relevant to a given query. However, users generally also want to find "good" images, i.e. pictures which are artistically pleasing and have good visual properties such as composition, focus, contrast, and lighting.

This project will consider ways of automatically identifying criteria that allow one to distinguish "good" from "bad" images. While aesthetic appeal is ultimately a subjective human-attributed quality, there are features such as composition and contrast which can be assessed algorithmically. For some applications such as portrait shots it may be sufficient to specify heuristics over such features manually, whereas for broader categories such as scenery images some form of machine learning or automated clustering could be more appropriate. I have some relevant data sets and image processing code that could form the basis for the project.
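
By way of illustration, a few such low-level measures can be computed in a handful of lines of Matlab; the file name, kernel choice, and band width below are arbitrary assumptions:

    img = im2double(rgb2gray(imread('photo.jpg')));   % placeholder file name
    contrast  = std(img(:));                          % global RMS contrast
    sharpness = mean2(abs(imfilter(img, fspecial('laplacian'))));  % focus proxy
    e = edge(img, 'canny');
    w = size(img, 2);
    third = round(w/3);                               % left rule-of-thirds line
    band  = e(:, max(1, third-10):min(w, third+10));
    thirdsScore = sum(band(:)) / max(sum(e(:)), 1);   % edge energy near the line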

Generic object detection using object boundary models

Most object recognition methods are either appearance based or model based. However, some recent work has shown that simpler representations based on lines and object outlines can be more efficient in practice.

This project will implement a generic object detection module based on the work of Jamie Shotton at the Engineering Department. This will entail developing a fast tree-based fitting procedure for finding generic objects from line and region boundary information in images. The code and datasets from Jamie's PhD work are available, and I have various other datasets that can be used to assess and optimise the performance of the methods developed in the project. An overview of object detection and recognition methods is also available.
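
A core primitive in this line of work is matching an outline template against image edges, for instance via chamfer matching on a distance transform. The sketch below shows only that primitive, not Jamie's full method, and the file names are placeholders:

    img  = rgb2gray(imread('scene.jpg'));        % placeholder
    dt   = double(bwdist(edge(img, 'canny')));   % distance transform of the edge map
    tmpl = imread('outline.png') > 0;            % binary outline template (placeholder)
    cost = filter2(double(tmpl), dt, 'valid');   % summed edge distances per placement
    cost = cost / sum(tmpl(:));                  % mean chamfer distance
    [best, idx] = min(cost(:));
    [y, x] = ind2sub(size(cost), idx);           % top-left corner of the best match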

Detection of man-made objects

Artificial objects such as buildings and cars can often be distinguished on the basis of simple properties such as the density and orientation of straight line segments and texture or colour features. Rather than building specific detectors and recognisers for each conceivable class of object, it should therefore be possible to construct a classifier which incorporates such heuristics to broadly distinguish "artificial" from "natural" objects in photographic images (ignoring special cases such as crystals).
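
As a sketch, the straight-line cue described above can be computed with the Image Processing Toolbox Hough functions; the file name and thresholds are illustrative choices:

    img = rgb2gray(imread('scene.jpg'));      % placeholder
    bw  = edge(img, 'canny');
    [H, theta, rho] = hough(bw);
    peaks = houghpeaks(H, 30);                % up to 30 strongest lines
    lines = houghlines(bw, theta, rho, peaks, 'MinLength', 40);
    lineDensity = numel(lines) / numel(img);  % crude "man-made" cue
    angles = [lines.theta];                   % artificial scenes tend to peak near 0 and 90 degrees
    hist(angles, -90:10:80);                  % orientation histogram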

Further reading: Zhu 2004 demonstrates an approach to car detection based on integrating multiple cues. Agarwal 2004 proposes a parts-based representation for object detection. Iqbal 2002 shows how perceptual grouping techniques can be applied to classify images containing large-scale man-made objects. An overview of object detection and recognition methods is also available.

Linking names to image content for face analysis

Many news images carry captions or annotations such as "George Bush meeting Tony Blair". By extracting such names from the text (so-called named entity recognition, NER) and linking them to the output of face detectors, it is possible to automatically derive labelled training data which could be used to build face recognisers for frequently photographed people such as politicians and other celebrities. Search engines such as Google and Yahoo already extract image captions and textual context, and many image archives carry basic annotations. Depending on interest, the project could emphasise the natural language processing aspects of NER, or build a simple parser (e.g. using a fixed name list) and focus more on the computer vision tasks of identifying data outliers (mislabelled faces) and training face recognisers. The task will be made easier through the use of existing face detection and other image analysis code.
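
The simple parser option can start from something as basic as the following fixed-list matcher; the name list and caption are just examples taken from the text above:

    names   = {'George Bush', 'Tony Blair', 'David Beckham'};
    caption = 'President George Bush meeting PM Tony Blair';
    found   = {};
    for k = 1:numel(names)
        if ~isempty(strfind(caption, names{k}))
            found{end+1} = names{k};          % names detected in this caption
        end
    end
    % 'found' can now be linked to the face detector output for the image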


Multi-pose face detection

Face detection remains an important and challenging problem. Much progress has recently been made using modern machine learning methods applied to very large data sets of faces. However, most of these methods are restricted to a particular view of a face (e.g. full frontal) and are sensitive to lighting conditions, occlusions (glasses, beards, hair, etc.), scale, and noise (a particular problem in e.g. CCTV footage).

This project will investigate ways of making face detection work for different poses (out-of-plane rotations, e.g. frontal, 30-degree, and profile views). The simplest option would be to train an existing face detector with an extended training set to create either a single multi-pose face detector (the danger being that this may suffer from a high false positive rate) or a set of pose-specific detectors (which will require some disambiguation to deal with multiple detections of the same face, as sketched below). I already have some data sets and face detector code which could be used as a starting point.
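
For the multiple-detector option, merging overlapping detections could use greedy non-maximum suppression along the following lines; the box format and overlap threshold are assumptions:

    function keep = nmsSketch(boxes, thresh)
    % boxes  - one detection per row: [x y w h score]
    % thresh - maximum allowed overlap (intersection over union)
    [ignored, order] = sort(boxes(:,5), 'descend');  % strongest first
    keep = [];
    while ~isempty(order)
        i = order(1);
        keep(end+1) = i;                 % accept the strongest remaining box
        rest = order(2:end);
        ov = zeros(size(rest));
        for j = 1:numel(rest)
            ov(j) = overlap(boxes(i,1:4), boxes(rest(j),1:4));
        end
        order = rest(ov < thresh);       % drop boxes covering the same face
    end

    function ov = overlap(a, b)
    % intersection over union of two [x y w h] boxes
    x1 = max(a(1), b(1));            y1 = max(a(2), b(2));
    x2 = min(a(1)+a(3), b(1)+b(3));  y2 = min(a(2)+a(4), b(2)+b(4));
    inter = max(0, x2-x1) * max(0, y2-y1);
    ov = inter / (a(3)*a(4) + b(3)*b(4) - inter);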

Chris Town, Copyright 2007