WHAT'S BEHIND THE MASK: ESTIMATING UNCERTAINTY IN IMAGE-TO-IMAGE PROBLEMS

Anonymous

Abstract

Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triples of the input (degraded), reconstructed and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use ℓ_p-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high quality performance on each.

1. INTRODUCTION

Deep learning has been very successful in many applications, spanning computer vision, speech recognition, natural language processing, and beyond. For many years, researchers were mainly content to develop new techniques achieving unprecedented accuracy, without concern for understanding the uncertainty implicit in such models. More recently, however, there has been a concerted effort within the research community to understand and quantify the uncertainty of deep models. This paper addresses the problem of estimating uncertainty in the realm of image-to-image (sometimes referred to as image reconstruction) tasks. Such tasks include super-resolution, deblurring, colorization, and image completion, amongst others. Computing the uncertainty is important generally, but is particularly so in application domains such as biological and medical imaging, in which fidelity to the ground truth is paramount. If there is an area of the reconstructed image where such fidelity is unlikely or unreliable due to high uncertainty, this is crucial to convey.

Our approach to uncertainty estimation is based on masking. Specifically, we are interested in computing a mask such that the uncertain regions in the image are masked out. More formally, we would like the distance between the masked reconstructed image and the masked true image to be small in expectation. Ideally, the method should be agnostic to the choice of distance function, which should be dictated by the application. A high-level overview of our approach is illustrated in Figure 1. We show a direct connection between the mask and a theoretically well-founded definition of uncertainty that matches image-to-image tasks. We then derive an algorithm for computing such a mask which can be applied to any existing (i.e. pre-trained) image-to-image network, and any distance function between image pairs.
All that is required for training the mask network is triplets of the input (degraded), reconstructed and true images. Using a procedure based on conformal prediction (Angelopoulos & Bates, 2021a), we can guarantee that the masks so produced satisfy the following criterion: the distance between the masked reconstructed image and the masked true image is less than a specified threshold, with high probability. We demonstrate the power of the method on image colorization, image completion, and super-resolution tasks. Our contributions are as follows:

• We present an approach to uncertainty computation based on masking. We show that moving from binary to continuous masks preserves the connection with a theoretically well-founded definition of uncertainty that is relevant for image-to-image tasks.

• We derive an algorithm for computing the masks which works for arbitrary image-to-image networks, and any distance function between image pairs. The masks are guaranteed to yield a small distance between the masked ground-truth and reconstructed images, with high probability.

• We demonstrate the effectiveness of the method on image colorization, image completion, and super-resolution tasks, attaining high-quality performance on each.
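To give a flavor of the calibration step, the following is a minimal, hypothetical NumPy sketch of a conformal-style calibration loop. The function names, the per-pixel "heat" scores, the binary thresholded mask, and the choice of squared-ℓ2 distortion are all illustrative assumptions and not the paper's actual algorithm; the sketch only shows how a masking level can be tuned on calibration triplets so that the masked distortion exceeds a threshold ε on at most a δ fraction of images.

```python
import numpy as np

def masked_distortion(y, y_hat, m):
    """d_m(y, y_hat) = d(m * y, m * y_hat), with d taken here as squared L2
    (an arbitrary choice; the method is agnostic to the distortion measure)."""
    return float(np.sum((m * y - m * y_hat) ** 2))

def calibrate_lambda(y_cal, yhat_cal, heat_cal, epsilon, delta):
    """Sweep a masking level lam from least to most aggressive and return the
    largest one for which, on the calibration set, the fraction of images whose
    masked distortion exceeds epsilon is at most delta, with the usual
    conformal (n + 1) finite-sample correction."""
    n = len(y_cal)
    for lam in np.linspace(1.0, 0.0, 101):
        # Binary mask: keep pixels whose predicted "heat" is at most lam.
        masks = [(h <= lam).astype(float) for h in heat_cal]
        violations = sum(
            masked_distortion(y, yh, m) > epsilon
            for y, yh, m in zip(y_cal, yhat_cal, masks)
        )
        if (violations + 1) / (n + 1) <= delta:
            return float(lam)
    return 0.0  # fall back to masking everything
```

Lower masking levels zero out more pixels and hence shrink the masked distortion, so the sweep terminates at the least aggressive mask that still meets the risk bound.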

NOTATIONS

Throughout the paper we represent an image as a column-stacked vector x ∈ ℝ^n. We use x_k to denote the k-th image of a collection, while x^(i) marks the i-th entry of x. We define 0 ∈ ℝ^n and 1 ∈ ℝ^n to be the vectors of all zeros and all ones, respectively. The operation x ⊙ y stands for the Hadamard (point-wise) product of x and y. For p ≥ 1, we denote the ℓ_p-norm of x by ‖x‖_p, where ‖x‖_p^p ≜ Σ_{i=1}^n |x^(i)|^p. The symbol d(·, ·) represents a general distortion measure between two images. We define a continuous mask as a vector m ∈ [0, 1]^n, where the average size of a mask equals ‖1 − m‖_1 / n, so that the mask 1 (no masking) has size 0 while the mask 0 has size 1. Given a mask m, we define d_m(x, y) ≜ d(m ⊙ x, m ⊙ y) as the distortion measure between the masked versions of images x and y. For a natural number n ≥ 1 we define the set [n] ≜ {1, ..., n}.
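As a small worked example of this notation (our own illustration; the choice d = squared ℓ_2 distance is an arbitrary assumption, since the method is distortion-agnostic), the following NumPy snippet computes a mask's average size and the masked distortion d_m:

```python
import numpy as np

n = 4  # image with n pixels, column-stacked

x = np.array([0.2, 0.9, 0.5, 0.1])  # true image
y = np.array([0.3, 0.1, 0.5, 0.2])  # reconstructed image
m = np.array([1.0, 0.0, 1.0, 0.5])  # continuous mask, entries in [0, 1]

# Average mask size: ||1 - m||_1 / n (0 for no masking, 1 for full masking).
mask_size = np.abs(1.0 - m).sum() / n  # -> 0.375

# Masked distortion d_m(x, y) = d(m ⊙ x, m ⊙ y), with d = squared L2.
# The highly uncertain second pixel is zeroed out and does not contribute.
d_m = np.sum((m * x - m * y) ** 2)
```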

2. RELATED WORK

Bayesian Uncertainty Quantification. The Bayesian paradigm defines uncertainty by assuming a distribution over the model parameters and/or activation functions. The most prevalent approach is Bayesian neural networks (MacKay, 1992; Valentin Jospin et al., 2020; Izmailov et al., 2020), which are stochastic models trained using Bayesian inference. Yet, as the number of model parameters has



Figure 1: A high-level view of our approach: x is the degraded input (colorization in this example) to the regression model. x and ŷ (the colorized image) serve as the input to the masking model, which predicts the uncertainty mask m. The ground-truth image is denoted by y. With high probability, the distortion between the masked images, y ⊙ m and ŷ ⊙ m, is low.
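The data flow in the figure can be sketched as follows; `regression_model` and `masking_model` are hypothetical stand-ins (here a trivial identity and an all-ones mask) for the pre-trained image-to-image network and the mask network, respectively:

```python
import numpy as np

def regression_model(x):
    # Stand-in for any pre-trained image-to-image network; a real model
    # would, e.g., colorize or super-resolve x. Here: identity.
    return x.copy()

def masking_model(x, y_hat):
    # Stand-in mask predictor. A trained network would assign low mask
    # values to uncertain pixels; here we keep everything (all-ones mask).
    return np.ones_like(y_hat)

# Pipeline of Figure 1: degraded input -> reconstruction -> uncertainty mask.
x = np.array([0.1, 0.4, 0.8])                # degraded input
y_hat = regression_model(x)                  # reconstructed image ŷ
m = masking_model(x, y_hat)                  # continuous uncertainty mask m
mask_size = np.abs(1.0 - m).sum() / m.size   # 0: nothing is masked out here
```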

