Computer Laboratory



Active depth cues for stereoscopic displays

My binocular eye tracker

On any display, depth cues (such as occlusion, shading and relative size) give a viewer the perception of 3D content. Stereoscopic displays add binocular depth cues (convergence and binocular disparity), but these new cues can conflict with other depth cues in the image. For example, when looking at an object close to them, a viewer would expect the background to be blurred due to the limited depth of focus of the human eye.

At best, this conflict causes a loss of 3D perception; at worst, it can cause viewing discomfort and nausea. It is especially noticeable on personal stereo displays, such as computer monitors or 3DTVs, which are becoming increasingly common.

My research is into enhancing the comfort and 3D perception of content on personal stereoscopic 3D displays by actively changing the displayed content to match what a viewer would expect to see. In particular, I am building a binocular eye tracker on top of a pair of active stereo glasses, to track gaze position in 3D and adapt the focal blur of the displayed image to match the viewer's expectation.
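As a rough illustration of the idea, the sketch below triangulates a gaze depth from the on-screen gaze points of the two eyes, then estimates how much blur a viewer focused at that depth would expect for an object at another depth. It assumes a simplified geometry (eyes level and facing the screen head-on) and the common small-angle approximation that angular blur equals pupil diameter times defocus in dioptres. The function names and example values are hypothetical and are not taken from the tracker itself.

```python
def gaze_depth(ipd, screen_dist, x_left, x_right):
    """Depth (metres) at which the two gaze rays intersect.

    ipd         -- interpupillary distance in metres
    screen_dist -- distance from the eyes to the screen in metres
    x_left      -- horizontal on-screen gaze position of the left eye (metres)
    x_right     -- horizontal on-screen gaze position of the right eye (metres)
    """
    disparity = x_right - x_left  # zero when fixating on the screen plane
    if disparity >= ipd:
        # Parallel or diverging gaze rays: treat as looking at infinity.
        return float("inf")
    # Similar triangles between the eye baseline and the on-screen disparity.
    return ipd * screen_dist / (ipd - disparity)


def blur_angle(pupil_diameter, focus_depth, object_depth):
    """Approximate angular blur (radians) of an object while focused elsewhere.

    Small-angle approximation: blur = aperture * |defocus in dioptres|.
    """
    return pupil_diameter * abs(1.0 / focus_depth - 1.0 / object_depth)


# Hypothetical example: 65 mm IPD, screen at 0.6 m, crossed disparity of 20 mm.
depth = gaze_depth(0.065, 0.6, 0.01, -0.01)
background_blur = blur_angle(0.004, depth, 2.0)  # 4 mm pupil, background at 2 m
```

A renderer could map `background_blur` to a blur kernel size for each pixel, given the display's angular resolution.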

Pupil Tracking

The pupil tracker I have developed as part of this project, for use with the binocular eye tracker, was published in ETRA 2012. See the project page for more details, the paper, source code and datasets.

Fully automatic, glint-free eye gaze estimation

In PETMEI 2013, I presented a glint-free, calibration-free approach to gaze estimation using a model fitted to several eye images over time. I also presented a realistic eye model, rendered in Blender, which I used as ground truth. See the project page for more details, the paper, source code, the rendered dataset and the Blender model.

Cambridge AUV

In my spare time, I work on a project called Cambridge AUV (CAUV). This is a project to build an autonomous underwater vehicle for use in surveying and mapping large areas of water with little human input. For more information, see the CAUV website.


Layered photo pop-up

Masters project

A common technique in documentaries, when video footage is not available, is to animate photographs by panning across them slowly. More recently, it has become popular to divide such photographs into layers, and to animate these layers moving over each other to create a motion parallax effect, commonly known as the “3D Ken Burns effect”. Although this effect is now ubiquitous in documentaries, producing it is laborious: it requires hours of manual layer segmentation, clone-brushing, positioning in 3D, and adjusting the panning speeds of individual layers.

Given depth information, most of this work can be automated. This project investigated how this could be achieved in practice. I developed a novel workflow which, given an image with depth, mimics the manual creation of this motion parallax effect, by creating a layered image representation. Objects in the image are segmented into separate layers, and the regions behind them are filled by inpainting from the surrounding background. This imitation of the manual process gives a user the means to adjust the layers at various stages; for example, objects can be manually marked for segmentation, or automatically filled regions can be augmented using human knowledge of the scene. However, the amount of user interaction needed is usually minimal; for example, precise segmentation can be achieved with only a rough labelling.

The final result of the system is a layered image which contains colour and depth information for each layer, and can be rendered as the user desires. The project also included a real-time mesh-based renderer that can render the layered image from novel views, with an optional depth-of-field effect.
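To make the layered representation concrete, here is a minimal sketch of a layered image in which every layer stores colour and depth per pixel. It uses plain depth thresholds as a stand-in for the segmentation step, which is an assumption for illustration only; the project itself segments objects from rough labellings and fills occluded regions by inpainting, neither of which is shown here. The names `Layer`, `split_into_layers` and `parallax_shift` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Layer:
    mean_depth: float = 0.0
    # Maps (x, y) to (colour, depth) for the pixels owned by this layer.
    pixels: dict = field(default_factory=dict)


def split_into_layers(colour, depth, boundaries):
    """Assign each pixel to a layer by depth thresholding.

    colour, depth -- 2D lists of equal size (rows of pixels)
    boundaries    -- sorted depth thresholds; n thresholds give n + 1 layers
    """
    layers = [Layer() for _ in range(len(boundaries) + 1)]
    for y, row in enumerate(depth):
        for x, z in enumerate(row):
            index = bisect_right(boundaries, z)
            layers[index].pixels[(x, y)] = (colour[y][x], z)
    for layer in layers:
        if layer.pixels:
            depths = [z for _, z in layer.pixels.values()]
            layer.mean_depth = sum(depths) / len(depths)
    return layers


def parallax_shift(focal_px, camera_shift, layer_depth):
    """Image-space shift in pixels of a layer for a sideways camera move.

    Pinhole camera model: nearer layers shift more than distant ones.
    """
    return focal_px * camera_shift / layer_depth


# Tiny hypothetical image: left column is near (1 m), right column far (5 m).
colour = [["a", "b"], ["c", "d"]]
depth = [[1.0, 5.0], [1.0, 5.0]]
near, far = split_into_layers(colour, depth, boundaries=[3.0])
```

Moving the virtual camera then shifts `near` by `parallax_shift(f, t, near.mean_depth)` pixels and `far` by less, which is what produces the motion parallax.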

Automatic people removal from photographs

Undergraduate final year project

The aim of this project was to write a program that, given a set of images taken from approximately the same location, automatically infers the background by aligning the images, detecting foreground obstructions in each photo, and constructing a new image out of the remaining background regions. The program also allows a user to align the images manually and to amend the automatic selection of foreground anomalies by hand.
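One simple way to see why several photos of the same scene are enough to recover the background is a per-pixel median over already-aligned images. The sketch below is that baseline, not necessarily the detection method used in the project: it assumes the images are aligned, greyscale, the same size, and that each pixel shows the background in more than half of them.

```python
from statistics import median


def median_background(images):
    """Per-pixel median of aligned greyscale images (2D lists of numbers)."""
    height = len(images[0])
    width = len(images[0][0])
    return [
        [median(image[y][x] for image in images) for x in range(width)]
        for y in range(height)
    ]


# Three hypothetical 1x2 images; a bright "person" covers one pixel in two of them.
shots = [
    [[10, 20]],
    [[10, 200]],
    [[250, 20]],
]
background = median_background(shots)
```

Because the obstruction appears at each pixel in only one of the three shots, the median discards it and keeps the background value.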