UNCERTAINTY ESTIMATION VIA DISCRETE LATENT REPRESENTATION

Abstract

Many important problems in the real world do not have unique solutions. It is therefore important for machine learning models to be capable of proposing different plausible solutions with meaningful probability measures. In this work we propose a novel deep learning framework, named modal uncertainty estimation (MUE), to learn the one-to-many mappings between inputs and outputs, together with faithful uncertainty estimation. Motivated by the multi-modal posterior collapse problem in current conditional generative models, MUE uses a set of discrete latent variables, each representing a latent mode hypothesis that explains one type of input-output relationship, to generate the one-to-many mappings. Benefiting from the discrete nature of the latent representation, MUE can effectively estimate, for any input, the conditional probability distribution over the outputs. Moreover, MUE is efficient during training, since the discrete latent space and its uncertainty estimation are jointly learned. We also develop the theoretical background of MUE and extensively validate it on both synthetic and realistic tasks. MUE demonstrates (1) significantly more accurate uncertainty estimation than the current state of the art, and (2) its informativeness for practical use.

1. INTRODUCTION

Making predictions in the real world involves various uncertainties. Arguably the most common source of uncertainty is partial or corrupted observation, which is often insufficient for making a unique and deterministic prediction. For example, when inspecting whether a single CT scan of a patient contains a lesion, radiologists may, without further information, reach different conclusions as a result of the different hypotheses they hold about the image. In such an ambiguous scenario, the question is thus: given the observation, which one(s) out of the many possibilities are more reasonable than others? Mathematically, this is a one-to-many mapping problem and can be formulated as follows. Suppose the observed information is x ∈ X in the input space; we are asked to estimate the conditional distribution p(y|x) for y ∈ Y in the prediction space, based on the training sample pairs (x, y). Immediate challenges prevent p(y|x) from being estimated directly in practical situations. First, both X and Y, e.g., as spaces of images, can be embedded in very high dimensional spaces with very complex structures. Second, only the unorganized pairs (x, y), not the one-to-many mappings x → {y_i}_i, are explicitly available. Fortunately, recent advances in conditional generative models based on the Variational Auto-Encoder (VAE) framework of Kingma & Welling (2014) shed light on how to tackle our problem. By modelling through latent variables c = c(x), one aims to explain the underlying mechanism of how y is assigned to x, so that, hopefully, variation in c results in variation in the output ŷ(x, c), which approximates the true one-to-many mapping in distribution. Many current conditional generative models, including cVAE in Sohn et al. (2015), BicycleGAN in Zhu et al. (2017b), Probabilistic U-Net in Kohl et al. (2018), etc., are developed upon the VAE framework, with a Gaussian distribution with diagonal covariance as the de facto parametrization of the latent variables. However, in the following we will show that such a parametrization creates a dilemma between model training and actual inference, as a form of what is known as the posterior collapse problem in the VAE literature (Alemi et al., 2018; Razavi et al., 2018). This issue is particularly easy to understand in our setting, where we assume there are multiple y's for a given x.
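To make the latent-variable mechanism concrete, the following is a minimal toy sketch, not the paper's actual model: K discrete latent codes each act as a mode hypothesis, a stand-in encoder `q_codes` produces a categorical distribution q(c|x) over the codes, and a stand-in decoder `decode` maps (x, c) to one output hypothesis ŷ(x, c). All function bodies and constants here are illustrative assumptions; the point is only that with discrete codes the probabilities q(c|x) can directly serve as the uncertainty estimate over the hypotheses.

```python
import numpy as np

K = 3  # number of discrete latent codes (mode hypotheses), chosen for illustration

def q_codes(x):
    """Hypothetical stand-in for a learned encoder: returns the categorical
    distribution q(c|x) over the K discrete latent codes via a softmax."""
    logits = np.array([x, -x, 0.5 * x])          # toy logits, not learned
    exp = np.exp(logits - logits.max())          # stabilized softmax
    return exp / exp.sum()

def decode(x, c):
    """Hypothetical stand-in for a learned decoder: each code c maps the
    same input x to a different plausible output y_hat(x, c)."""
    offsets = np.array([-1.0, 0.0, 1.0])         # one output mode per code
    return float(x + offsets[c])

x = 2.0
probs = q_codes(x)                               # uncertainty over hypotheses
hypotheses = [decode(x, c) for c in range(K)]    # one-to-many outputs for x
for c in range(K):
    print(f"code {c}: y_hat = {hypotheses[c]:.2f}, q(c|x) = {probs[c]:.3f}")
```

Because the latent space is a finite set of codes, enumerating q(c|x) over all K codes recovers the full conditional distribution over the output hypotheses, which is exactly what a continuous Gaussian latent cannot offer in closed form.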

