Real and virtual sphere photographed in our ultra-realistic HDR-MF-S display.

Overview

The goal of the project is to convey the sense of perceiving real scenes when viewing content on a custom-built electronic display. More broadly, we want to capture, encode and display highly realistic images that go beyond typical 2D images, video or 3D stereo content.

Being able to capture, represent and display visual content is important for several emerging technologies and applications, such as AR/VR, remote operation, remote exploration, telepresence and entertainment. For example, we want to be able to send robotic drones to places where it is too expensive or too risky to send people (space exploration, deep-sea exploration, disaster areas) and still be able to perceive, experience and interact with those environments as if we were present there.

The problem area is very broad, and in this project, we focus on how we can exploit the limitations of our visual system to reduce the amount of data and the hardware requirements for a perceptually realistic imaging pipeline. We want to capture, encode and display only the visual information that is visible to the human eye and discard anything that is imperceptible.

Research

The following sections cover the main areas of investigation:

Capture

We built camera systems (rigs) for capturing high-dynamic-range light fields of both small and large scenes. To overcome the limitations of direct capture, we explored existing methods for 3D scene acquisition, from traditional multi-view stereo to recent learning-based methods that rely on multi-plane images. Our initial investigation found that existing multi-view / light-field methods, which do not attempt to recover 3D information, do not offer sufficient quality and data efficiency for our application. We therefore moved to methods that either attempt to recover depth or rely on depth information from other sources. We explored recent differentiable volumetric rendering techniques, such as neural radiance fields, but found them unsuitable for high-resolution light fields and incapable of delivering the required quality. We also found that the colour accuracy of existing imaging pipelines is insufficient for our ultra-realistic display. To that end, we developed methods for more accurate high-dynamic-range merging [HDRutils].
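
As an illustration of the HDR merging step, the sketch below merges a bracketed exposure stack into a single radiance map using saturation-aware weighting. It shows the general technique behind tools such as HDRutils; the weighting scheme, saturation threshold and function names are illustrative assumptions, not the library's actual API or noise model.

    # Minimal sketch of exposure merging: weighted average of radiance
    # estimates from a bracketed stack, down-weighting clipped pixels.
    # The weights and threshold below are illustrative assumptions.
    import numpy as np

    def merge_exposures(images, exposure_times, sat_level=0.95):
        """Merge a bracketed stack (linear sensor values in [0, 1]) into an
        HDR radiance map. images: list of float arrays of identical shape;
        exposure_times: list of exposure times in seconds."""
        num = np.zeros_like(images[0], dtype=np.float64)
        den = np.zeros_like(images[0], dtype=np.float64)
        for img, t in zip(images, exposure_times):
            # Down-weight saturated pixels and very dark, noisy pixels.
            w = np.where(img < sat_level, img + 1e-4, 0.0)
            num += w * (img / t)   # per-frame estimate of scene radiance
            den += w
        return num / np.maximum(den, 1e-9)

    # Example: three synthetic exposures (1/4 s, 1/16 s, 1/64 s) of one scene.
    radiance = np.random.rand(4, 4) * 4.0
    times = [1/4, 1/16, 1/64]
    stack = [np.clip(radiance * t, 0, 1) for t in times]
    hdr = merge_exposures(stack, times)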

Visual models

To develop solutions that convey the optimal amount of visual information, we need visual models that can predict what is and what is not visible to the human eye. To that end, we built comprehensive models of spatio-chromatic and spatio-temporal contrast sensitivity. We also collected datasets of visible differences [Visually lossless compression, LocVisVC] and of image quality degradation due to distortions [UPIQ]. All of these let us create new visibility and quality metrics for images [DPVM] and video [FovVideoVDP], based on both machine-learning techniques and psychophysical models of vision. Such metrics can be used to optimize imaging, video transmission and display systems to align their performance with perceptual limitations.
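
To make the role of such models concrete, the sketch below uses the classic Mannos-Sakrison (1974) contrast sensitivity approximation to predict whether a sinusoidal grating of a given contrast and spatial frequency is visible. It only illustrates how a contrast sensitivity function turns into a visibility prediction; the project's spatio-chromatic and spatio-temporal models are considerably more detailed.

    # Illustrative visibility prediction from a contrast sensitivity function
    # (CSF), using the Mannos-Sakrison approximation as a stand-in model.
    import math

    def csf_mannos_sakrison(f_cpd):
        """Relative contrast sensitivity at spatial frequency f (cycles/deg)."""
        return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-(0.114 * f_cpd) ** 1.1)

    def is_visible(michelson_contrast, f_cpd, peak_sensitivity=250.0):
        """A grating is predicted visible when its contrast exceeds 1/sensitivity."""
        sensitivity = peak_sensitivity * csf_mannos_sakrison(f_cpd)
        return michelson_contrast > 1.0 / max(sensitivity, 1e-9)

    print(is_visible(0.01, 4.0))   # 1% contrast at 4 cpd  -> predicted visible
    print(is_visible(0.01, 40.0))  # 1% contrast at 40 cpd -> predicted invisible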

Encoding

We have made substantial progress on the efficient encoding of visual content in three domains: temporal, luminance contrast and colour. We devised a technique called temporal resolution multiplexing (TRM), which allows smooth motion to be displayed at high frame rates while every second frame is rendered and encoded at half the resolution [TRM]. This work received the Best IEEE VR Journal Paper Award in 2019. To further exploit the limitations of spatio-temporal vision, we devised a control mechanism for finding the best trade-off between spatial and temporal resolution [Motion quality], which we later extended to also control the local shading rate on the latest generation of GPUs [ALSaRR].
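
The sketch below gives a much-simplified picture of the bandwidth saving behind TRM: every second frame is kept at half resolution and upsampled before display. The real method additionally compensates the full-resolution frames so that each frame pair integrates correctly on the retina; that compensation step is omitted here.

    # Much-simplified illustration of the pixel savings in temporal resolution
    # multiplexing: alternate full-resolution and half-resolution frames.
    import numpy as np

    def downsample2(frame):
        return frame[::2, ::2]

    def upsample2(frame):
        return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

    def trm_encode(frames):
        """Keep even-indexed frames at full resolution, odd-indexed at half."""
        return [f if i % 2 == 0 else downsample2(f) for i, f in enumerate(frames)]

    def trm_decode(encoded):
        return [f if i % 2 == 0 else upsample2(f) for i, f in enumerate(encoded)]

    frames = [np.random.rand(8, 8) for _ in range(4)]
    encoded = trm_encode(frames)
    pixels_full = sum(f.size for f in frames)    # 256 pixels
    pixels_sent = sum(f.size for f in encoded)   # 160 pixels -> ~37% fewer
    decoded = trm_decode(encoded)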

We developed a new encoding of luminance (PU21) that represents high-dynamic-range luminance values in a perceptually uniform manner. This lets us adapt a large number of metrics intended for standard-dynamic-range content so that they also work on high-dynamic-range content.
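
The sketch below shows the usage pattern such an encoding enables: map absolute luminance through a perceptually uniform transfer function, then reuse a standard-dynamic-range metric, here PSNR, on the encoded values. The SMPTE ST 2084 (PQ) curve stands in for the fitted PU21 transfer function, whose coefficients are not reproduced here.

    # "PU metric" pattern: perceptually encode HDR luminance, then apply an
    # SDR metric. PQ is used here as a stand-in for the PU21 curve.
    import numpy as np

    def pq_encode(Y):
        """Encode absolute luminance Y (cd/m^2, up to 10000) to [0, 1]."""
        m1, m2 = 2610 / 16384, 2523 / 4096 * 128
        c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
        Yn = np.clip(Y / 10000.0, 0.0, 1.0) ** m1
        return ((c1 + c2 * Yn) / (1 + c3 * Yn)) ** m2

    def pu_psnr(Y_test, Y_ref):
        """PSNR computed on perceptually encoded luminance (peak = 1)."""
        e_test, e_ref = pq_encode(Y_test), pq_encode(Y_ref)
        mse = np.mean((e_test - e_ref) ** 2)
        return 10 * np.log10(1.0 / mse)

    Y_ref = np.logspace(-1, 3, 512)                      # 0.1 to 1000 cd/m^2
    Y_test = Y_ref * (1 + 0.02 * np.random.randn(512))   # 2% multiplicative noise
    print(pu_psnr(Y_test, Y_ref))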

To efficiently encode high-dynamic-range colour information, we need a colour space that can encode colour with the fewest possible bits without introducing banding or contouring artifacts. To address this problem, we developed psychophysical models of banding visibility, for both luminance and colour.
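
As a toy illustration of the banding problem, the sketch below quantises a smooth luminance ramp at different bit depths and flags banding when the resulting luminance steps exceed a fixed 0.5% Weber fraction. The fixed threshold is a crude stand-in for the psychophysical visibility models mentioned above.

    # Toy banding check: linear quantisation of a smooth ramp at a given bit
    # depth, with a fixed Weber-fraction threshold standing in for a real
    # visibility model.
    import numpy as np

    def banding_visible(luminance_ramp, bits, weber_threshold=0.005):
        levels = 2 ** bits - 1
        code = np.round(luminance_ramp / luminance_ramp.max() * levels)
        quantised = code / levels * luminance_ramp.max()
        steps = np.abs(np.diff(quantised))
        weber = steps / np.maximum(quantised[:-1], 1e-6)
        return weber.max() > weber_threshold   # True -> banding predicted visible

    ramp = np.linspace(10.0, 100.0, 4096)      # smooth ramp, 10..100 cd/m^2
    for bits in (8, 10, 12):
        print(bits, banding_visible(ramp, bits))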

Display

We have completed the construction of a high-dynamic-range multi-focal stereo (HDR-MF-S) display, which delivers high brightness (4000 nit), deep blacks, high resolution (100 pixels per degree), stereo disparity, and two focal planes providing accommodation and defocus depth cues. Furthermore, the display has see-through capability, so the displayed images can be seen on top of a real-world scene (as in AR displays) or on their own. The display is equipped with an eye-tracking camera, which provides feedback on the position of the viewer's eyes. All this is combined with a real-time 3D rendering algorithm that can deliver images matching the appearance of real scenes.
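
As a quick sanity check of the angular-resolution figure, pixels per degree follows from pixel pitch and viewing distance; the sketch below uses assumed example values rather than the actual HDR-MF-S optical parameters.

    # Pixels per degree (ppd) from pixel pitch and viewing distance.
    # The pitch and distance are assumed example values.
    import math

    def pixels_per_degree(pixel_pitch_mm, viewing_distance_mm):
        angle_deg = 2 * math.degrees(math.atan(pixel_pitch_mm / (2 * viewing_distance_mm)))
        return 1.0 / angle_deg

    print(pixels_per_degree(0.1745, 1000.0))  # ~0.17 mm pixels at 1 m -> ~100 ppd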

Efficient perceptual measurements

Our work relies, to a large degree, on perceptual measurements. Since collecting perceptual data typically requires tedious psychophysical experiments, we devoted some effort to new machine-learning techniques that make such measurements as efficient and accurate as possible. For this purpose, we developed a new active sampling method that collects data in an optimal manner by sampling the points in the problem space that deliver the most information [ASAP]. This work was recognized with a best paper award.
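
The sketch below illustrates the idea of active pair selection with a simple heuristic: always compare the pair of conditions with similar current scores and high joint uncertainty. It is only a heuristic illustration; ASAP itself selects pairs by expected information gain under a full posterior over the scale values.

    # Heuristic illustration of active sampling for pairwise comparisons:
    # prefer pairs whose outcome is uncertain (close scores) and whose
    # conditions are still poorly measured (high variance).
    import itertools
    import numpy as np

    def pick_next_pair(scores, variances):
        """Return the (i, j) pair with the highest informativeness heuristic."""
        best, best_pair = -np.inf, None
        for i, j in itertools.combinations(range(len(scores)), 2):
            closeness = -abs(scores[i] - scores[j])    # close scores -> uncertain outcome
            uncertainty = variances[i] + variances[j]  # poorly measured conditions
            value = closeness + uncertainty
            if value > best:
                best, best_pair = value, (i, j)
        return best_pair

    scores = np.array([0.0, 0.4, 0.5, 2.0])     # current quality estimates (JOD-like)
    variances = np.array([1.0, 0.2, 0.2, 1.0])  # uncertainty of each estimate
    print(pick_next_pair(scores, variances))    # -> (0, 1): close scores, high uncertainty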

Contact

Please contact Rafał K. Mantiuk with any questions regarding the project.