VARIATIONAL DETERMINISTIC UNCERTAINTY QUANTIFICATION

Abstract

Building on recent advances in uncertainty quantification using a single deep deterministic model (DUQ), we introduce variational Deterministic Uncertainty Quantification (vDUQ). We overcome several shortcomings of DUQ by recasting it as a Gaussian process (GP) approximation. Our principled approximation is based on an inducing point GP in combination with Deep Kernel Learning. This gives vDUQ rigorous probabilistic foundations and enables it to work not only on classification but also on regression problems. We avoid uncertainty collapse away from the training data by regularizing the spectral norm of the deep feature extractor. Our method matches SotA accuracy, 96.2% on CIFAR-10, while maintaining the speed of softmax models, and provides uncertainty estimates competitive with Deep Ensembles. We demonstrate our method on regression problems and by estimating uncertainty in causal inference for personalized medicine.

1. INTRODUCTION

Deploying machine learning algorithms as part of automated decision-making systems, such as self-driving cars and medical diagnostics, requires implementing fail-safes. Whenever the model is presented with a novel or ambiguous situation, it would not be wise to simply trust its prediction. Instead, the system should try to get more information, or simply withhold or defer judgment. While significant progress has been made towards estimating predictive uncertainty reliably in deep learning (Gal & Ghahramani, 2016; Lakshminarayanan et al., 2017), no single method has been shown to work on large datasets in both classification and regression without significant computational overhead, such as multiple forward passes. We propose variational Deterministic Uncertainty Quantification (vDUQ), a method for obtaining predictive uncertainty in deep learning for both classification and regression problems in only a single forward pass.

In previous work, van Amersfoort et al. (2020) show that combining a distance-aware decision function with a regularized feature extractor, in the form of a deep RBF network, leads to a model (DUQ) that matches a softmax model in accuracy and is competitive with Deep Ensembles for uncertainty on large datasets. The feature extractor is regularized using a two-sided gradient penalty, which encourages the model to be sensitive to changes in the input, avoiding feature collapse, and encourages generalization by controlling the Lipschitz constant. This model, however, has several limitations: for example, the uncertainty (a distance in feature space) cannot be interpreted probabilistically, and it is difficult to disentangle aleatoric and epistemic uncertainty. Additionally, the loss function and centroid update scheme are not principled and do not extend to regression tasks.
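To make the two-sided gradient penalty concrete, the sketch below shows the penalty term lam * (||grad_x f(x)||_2 - 1)^2 computed by central finite differences on a toy scalar function. The function name and the finite-difference approach are illustrative assumptions, not the paper's implementation (which would use automatic differentiation on a deep feature extractor):

```python
import numpy as np

def two_sided_gradient_penalty(f, x, eps=1e-4, lam=1.0):
    """Sketch: penalize deviation of the input gradient norm from 1.

    Approximates grad_x f(x) by central finite differences, then
    returns lam * (||grad||_2 - 1)^2. "Two-sided" means gradients
    both above and below norm 1 are penalized, softly pushing the
    Lipschitz constant of f towards 1.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = eps
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return lam * (np.linalg.norm(grad) - 1.0) ** 2

# Toy example: f(x) = 2 * x[0] has gradient norm 2 everywhere,
# so the penalty is (2 - 1)^2 = 1.
penalty = two_sided_gradient_penalty(lambda x: 2.0 * x[0], np.zeros(3))
```

In practice this term is added to the training loss, so the optimizer trades off fit against keeping the feature extractor's sensitivity to the input close to 1.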
A probabilistic and principled alternative to deep RBF networks is Gaussian processes (GPs) in combination with Deep Kernel Learning (DKL) (Hinton & Salakhutdinov, 2008; Wilson et al., 2016b). DKL was introduced as a "best of both worlds" solution: apply a deep model to the training data and learn the GP in feature space, ideally obtaining the advantages of both models. In practice, however, DKL suffers from the same failure as deep RBF networks: the deep model is free to map out-of-distribution data close to the feature representation of the training data, removing the attractive properties of GPs with distance-sensitive kernels. Using insights from DUQ, we are able to mitigate the problem of uncertainty collapse in DKL. In particular, we use direct spectral normalization (Gouk et al., 2018; Miyato et al., 2018) in combination with a ResNet (He et al., 2016), a variation that was suggested in Liu et al. (2020). The spectral
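Direct spectral normalization rescales a weight matrix by its largest singular value, estimated with power iteration. The sketch below is a minimal NumPy illustration of that idea; it runs power iteration to convergence on a fixed matrix, whereas practical implementations (e.g. per-layer normalization during training, as in Miyato et al., 2018) typically take one power-iteration step per update and reuse the running vectors:

```python
import numpy as np

def spectral_normalize(W, n_iters=50, seed=0):
    """Sketch: divide W by its largest singular value (spectral norm).

    Power iteration: repeatedly apply W^T and W to a random vector;
    the Rayleigh quotient u^T W v converges to the top singular
    value sigma, and W / sigma has spectral norm 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W / sigma

# A matrix with singular values (3, 1); after normalization its
# largest singular value is 1, bounding the layer's Lipschitz constant.
W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn = spectral_normalize(W)
```

Constraining each layer's spectral norm bounds the Lipschitz constant of the whole feature extractor, which is what prevents the deep model from collapsing distant inputs onto nearby features.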

