EVALUATING ROBUSTNESS OF PREDICTIVE UNCERTAINTY ESTIMATION: ARE DIRICHLET-BASED MODELS RELIABLE?

Abstract

Robustness to adversarial perturbations and accurate uncertainty estimation are crucial for the reliable application of deep learning in real-world settings. Dirichlet-based uncertainty (DBU) models are a family of models that predict the parameters of a Dirichlet distribution (instead of a categorical one) and promise to signal when not to trust their predictions. On unknown or ambiguous samples, where predictions are untrustworthy, these models are expected to report high uncertainty. In this work, we show that DBU models with standard training are not robust w.r.t. three important tasks in the field of uncertainty estimation. First, we evaluate how useful the uncertainty estimates are to (1) indicate correctly classified samples. Our results show that while they are a good indicator on unperturbed data, their performance on perturbed data decreases dramatically. (2) We evaluate whether uncertainty estimates are able to detect adversarial examples that try to fool classification. It turns out that uncertainty estimates are able to detect FGSM attacks, but not PGD attacks. We further evaluate the reliability of DBU models on the task of (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. To this end, we present the first study of certifiable robustness for DBU models. Furthermore, we propose novel uncertainty attacks that fool models into assigning high confidence to OOD data or low confidence to ID data. Both approaches show that detecting OOD samples and distinguishing between ID data and OOD data is not robust. Based on our results, we explore the first approaches to making DBU models more robust. We use adversarial training procedures based on label attacks, uncertainty attacks, or random noise, and demonstrate how they affect the robustness of DBU models on ID data and OOD data.

1. INTRODUCTION

Neural networks achieve high predictive accuracy in many tasks, but they are known to have two substantial weaknesses: First, neural networks are not robust against adversarial perturbations, i.e., semantically meaningless input changes that lead to wrong predictions (Szegedy et al., 2014; Goodfellow et al., 2015). Second, neural networks tend to make over-confident predictions at test time (Lakshminarayanan et al., 2017). Even worse, standard neural networks are unable to identify samples that differ from those they were trained on; in these cases, they provide uninformed decisions instead of abstaining. These two weaknesses make them impracticable in sensitive domains such as finance, autonomous driving, or medicine, which require trust in predictions. To increase trust in neural networks, models that provide predictions along with the corresponding uncertainty have been proposed. There are three main families of models that aim to provide meaningful estimates of their predictive uncertainty. The first family are Bayesian Neural Networks (Blundell et al., 2015; Osawa et al., 2019; Maddox et al., 2019), which have the drawback of being computationally demanding. The second family consists of Monte-Carlo dropout based models (Gal & Ghahramani, 2016) and ensembles (Lakshminarayanan et al., 2017), which estimate uncertainty by aggregating the forward passes of multiple models and computing statistics such as mean and variance. A disadvantage of all of these models is that uncertainty estimation at inference time is expensive.
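As a minimal sketch of the ensemble-style aggregation described above (all function names and the toy logits are invented for illustration, not taken from any of the cited works), predictive uncertainty for one input can be estimated by averaging the class probabilities produced by several forward passes and measuring the entropy of the averaged distribution:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_uncertainty(logits_per_model):
    """Aggregate forward passes of multiple models for a single input.

    logits_per_model: array of shape (n_models, n_classes), one forward
    pass per ensemble member (or per MC-dropout sample).
    Returns the mean predicted distribution and its predictive entropy,
    a common scalar uncertainty measure.
    """
    probs = softmax(np.asarray(logits_per_model, dtype=float))
    mean_probs = probs.mean(axis=0)  # average over ensemble members
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum()
    return mean_probs, entropy

# Toy example: when the members disagree, the averaged distribution is
# flatter, so the predictive entropy (uncertainty) is higher.
agree = [[5.0, 0.0, 0.0], [4.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
disagree = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
_, h_agree = ensemble_uncertainty(agree)
_, h_disagree = ensemble_uncertainty(disagree)
```

The sketch also makes the cost argument concrete: every uncertainty estimate requires `n_models` full forward passes at inference time, which is exactly the expense DBU models aim to avoid by predicting a Dirichlet distribution in a single pass.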

