WHAT'S BEHIND THE MASK: ESTIMATING UNCERTAINTY IN IMAGE-TO-IMAGE PROBLEMS

Anonymous

Abstract

Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triplets of the input (degraded), reconstructed, and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use L_p-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high-quality performance on each.

1. INTRODUCTION

Deep learning has been very successful in many applications, spanning computer vision, speech recognition, natural language processing, and beyond. For many years, researchers were mainly content to develop new techniques to achieve unprecedented accuracy, without concern for understanding the uncertainty implicit in such models. More recently, however, there has been a concerted effort within the research community to understand and quantify the uncertainty of deep models.

This paper addresses the problem of estimating uncertainty in the realm of image-to-image (sometimes referred to as image reconstruction) tasks. Such tasks include super-resolution, deblurring, colorization, and image completion, amongst others. Computing the uncertainty is important generally, but is particularly so in application domains such as biological and medical imaging, in which fidelity to the ground truth is paramount. If there is an area of the reconstructed image where such fidelity is unlikely or unreliable due to high uncertainty, this is crucial to convey.

Our approach to uncertainty estimation is based on masking. Specifically, we are interested in the possibility of computing a mask such that the uncertain regions in the image are masked out. More formally, we would like the distance between the masked reconstructed image and the masked true image to be small in expectation. Ideally, the method should be agnostic to the choice of distance function, which should be dictated by the application. A high-level overview of our approach is illustrated in Figure 1. We show a direct connection between the mask and a theoretically well-founded definition of uncertainty that matches image-to-image tasks. We then derive an algorithm for computing such a mask which can apply to any existing (i.e., pre-trained) image-to-image network, and any distance function between image pairs.
All that is required for training the mask network is triplets of the input (degraded), reconstructed, and true images. Using a procedure based on conformal prediction (Angelopoulos & Bates, 2021a), we can guarantee that the masks so produced satisfy the following criterion: the distance between the masked reconstructed image and the masked true image is less than a specified threshold, with high probability. We demonstrate the power of the method on image colorization, image completion, and super-resolution tasks.
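To make the calibration step concrete, the following is a minimal, hypothetical split-conformal sketch, not the paper's exact procedure. It assumes a mask network has already produced a per-pixel uncertainty heatmap for each calibration triplet, uses a plain L2 distance on the kept pixels, and calibrates a single scalar masking level (the fraction of most-uncertain pixels to mask out) so that, under exchangeability, a fresh image's masked distance falls below the threshold `alpha` with probability at least `1 - delta`.

```python
import numpy as np

def masked_dist(recon, truth, keep):
    """L2 distance restricted to the kept (certain) pixels."""
    diff = (recon - truth) * keep
    return np.sqrt((diff ** 2).sum())

def min_lambda(heat, recon, truth, alpha, grid):
    """Smallest masking level lam in `grid` (ascending in [0, 1]) such that
    masking out the lam most-uncertain pixels brings the distance below alpha."""
    for lam in grid:
        tau = np.quantile(heat, 1.0 - lam)       # uncertainty cutoff
        keep = (heat <= tau).astype(float)       # mask = pixels above the cutoff
        if masked_dist(recon, truth, keep) <= alpha:
            return lam
    return grid[-1]

def calibrate(heatmaps, recons, truths, alpha, delta, grid):
    """Split-conformal calibration: per-example minimal masking levels,
    then a conformal quantile that is valid for a new exchangeable image."""
    lams = [min_lambda(h, r, t, alpha, grid)
            for h, r, t in zip(heatmaps, recons, truths)]
    n = len(lams)
    q = np.ceil((n + 1) * (1.0 - delta)) / n     # conformal correction
    return np.quantile(lams, min(q, 1.0))
```

At test time one would mask out the calibrated fraction of a new image's most-uncertain pixels; the names (`heatmaps`, `min_lambda`, the grid search over masking levels) are illustrative assumptions rather than the paper's notation.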

