Islands of Confidence: Robust Neural Network Classification with Uncertainty Quantification

Abstract

We propose a Gaussian confidence measure, and an optimization procedure for it, for use in neural network classifiers. The measure comes with theoretical results that simultaneously address two pressing problems in Deep Neural Network classification: uncertainty quantification and robustness. Existing research in uncertainty quantification mostly revolves around the confidence reflected in the input feature space. Instead, we focus on the learned representation of the network and analyze the confidence in the penultimate layer space. We formally prove that, independent of optimization-procedural effects, a set of centroids always exists such that softmax classifiers are nearest-centroid classifiers. Softmax confidence, however, does not reflect that the classification is based on nearest centroids: artificially inflated confidence is also given to out-of-distribution samples that are not near any centroid, but merely slightly less distant from one centroid than from the others. Our new confidence measure is centroid-based, and hence no longer suffers from this artificial confidence inflation on out-of-distribution samples. We also show that our proposed centroidal confidence measure provides a robustness certificate against attacks. As such, it manages to reflect what the model does not know (as demanded by uncertainty quantification) and to resolve the issue of robustness of neural networks.

1. Introduction

The last layer of state-of-the-art neural networks computes the final classification via the softmax function (Boltzmann, 1868). This function partitions the transformed input space into Voronoi cells, each of which encompasses a single class. Conceptually, this is equivalent to placing a number of centroids in this transformed space and clustering the data points by proximity to these centroids, as in k-means. Several recent papers posited that exploring the relation between softmax and k-means can be beneficial (Kilinc & Uysal, 2018; Peng et al., 2018; Schilling et al., 2018). The current state of scientific knowledge on this relation is empirical. In this paper, we theoretically prove that softmax is a centroid-based classifier, and we derive a centroid-based robustness certificate. This certificate motivates the usage of a confidence measure¹, the Gauss confidence, which reflects the distance of observations to their assigned centroids. Gauss confidence therefore expresses the uncertainties of the model; moreover, it indicates vulnerabilities to attacks. We show that our Gauss networks can match (median absolute difference: 0.45 percentage points) the test accuracy of softmax networks, but at a lower confidence (as desired); both outperform the competing DUQ networks (van Amersfoort et al., 2020) when the dataset has many classes. The lower confidence also makes Gauss networks much less susceptible to adversarial attacks. Hence, the islands of confidence illustrated in the rightmost plot of Figure 1 reflect reality much better than the confidence landscapes of existing methods (cf. the other two plots in Figure 1).

[…] et al., 2015), and the effect that softmax probabilities are badly calibrated (Guo et al., 2017).
Figure 1 illustrates that this effect is not surprising: in the transformed feature space of the penultimate layer, points that do not lie directly on the decision boundaries are all confidently assigned to a class. Softmax confidence has no capability to express what the model doesn't know.
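Both observations can be checked numerically. The NumPy sketch below is our own illustration, not the paper's construction: it builds one possible set of centroids for a random softmax layer by lifting the feature space one dimension (the specific lifting and all variable names are ours), verifies that the softmax decision coincides with the nearest-centroid decision, and then shows that softmax confidence saturates toward 1 as a point moves away from the decision boundary, irrespective of its distance to any data.

```python
import numpy as np

rng = np.random.default_rng(0)
c, d = 5, 2                      # classes, penultimate-layer dimension
W = rng.normal(size=(c, d))      # last-layer weights
b = rng.normal(size=c)           # last-layer biases

# One possible centroid construction (illustrative, not necessarily the
# paper's proof): embed z -> (z, 0) in R^{d+1} and set C_k = (W_k, s_k)
# with s_k^2 = 2M - 2 b_k - ||W_k||^2, where M is large enough that every
# s_k^2 >= 0.  Expanding ||(z,0) - C_k||^2 shows
#   argmin_k ||(z,0) - C_k||^2 = argmax_k (W_k^T z + b_k).
M = np.max(b + 0.5 * np.sum(W**2, axis=1)) + 1.0
s = np.sqrt(2.0 * M - 2.0 * b - np.sum(W**2, axis=1))
centroids = np.hstack([W, s[:, None]])          # shape (c, d+1)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Softmax classification agrees with nearest-centroid classification.
for _ in range(1000):
    z = rng.normal(scale=3.0, size=d)
    softmax_class = np.argmax(W @ z + b)
    z_lift = np.append(z, 0.0)
    centroid_class = np.argmin(np.sum((centroids - z_lift) ** 2, axis=1))
    assert softmax_class == centroid_class

# Softmax confidence saturates far from the boundary: scaling z away from
# the origin drives the maximum probability toward 1, no matter how far
# the point is from any training data.
z = rng.normal(size=d)
for t in [1.0, 10.0, 100.0]:
    print(t, softmax(W @ (t * z) + b).max())
```

The construction relies only on the last layer being affine; any monotone link function on top of the logits (softmax included) yields the same argmax and hence the same nearest-centroid behavior.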

2.1. Quantifying Uncertainties of DNNs

¹ Previously published informally (non-peer-reviewed) in January as (NN et al., 2020).

Figure 1: Classification confidence plots of the actual decision boundaries/areas of a d = 2-dimensional penultimate layer space for three distinct Deep Neural Networks. The handwritten numbers indicate the MNIST class predictions in {0, . . . , 4}.

To remedy the softmax issue, various methods have been proposed. Although networks with a centroid-based confidence intuitively make sense, training them is a major challenge. When using the softmax cross-entropy loss, pushing a transformed feature vector ϕ(x) away from one class region means that it is pushed towards other class regions. Employing a centroidal confidence, one could instead push a point away from all centroids; this, however, introduces trivial global optima of common loss functions and does not lead to a well-performing classifier. To this end, Wen et al. (2016); Pang et al. (2019) minimize the distance of transformed feature points to given (fixed) centroids. Wan et al. (2018) employ a centroidal cross-entropy loss, which directly optimizes for a nearest-centroid classifier but does not increase the density around centroids. Likewise, Lebedev et al. (2018) define a confidence measure based on the normalized RBF kernel function. Hobbhahn et al. (2022); Mukhoti et al. (2021) propose retraining only the last layer to reflect uncertainties based on a multivariate Gaussian representation of confidences. Closest to our approach are Deterministic Uncertainty Classifiers (DUQs) (van Amersfoort et al., 2020), which directly optimize for a multivariate Gaussian confidence measure (cf. Figure 1). The authors employ the binary cross-entropy loss, which for one-hot encoded labels $Y^* \in \mathbb{R}^{m \times c}$ for $m$ samples and $c$ classes is given by
$$\ell_{\mathrm{BCE}}(f_{\mathrm{DUQ}}, Y^*) = -\sum_{j=1}^{m}\sum_{k=1}^{c}\Big[\, Y^*_{jk} \log\big(f_{\mathrm{DUQ}}(x_j)_k\big) + (1 - Y^*_{jk}) \log\big(1 - f_{\mathrm{DUQ}}(x_j)_k\big)\Big],$$
with the per-class Gaussian confidence
$$f_{\mathrm{DUQ}}(x)_k = \exp\Big(-\big(\phi(x) - Z_{\bullet k}\big)^\top \Sigma_k^{-1} \big(\phi(x) - Z_{\bullet k}\big)\Big).$$
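A DUQ-style Gaussian confidence and its binary cross-entropy loss can be sketched in NumPy as follows (a toy illustration under simplifying assumptions: identity Σ_k, random features and centroids; the function and variable names are ours, not from the DUQ implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, c = 8, 4, 3                      # samples, feature dimension, classes
phi = rng.normal(size=(m, d))          # phi(x_j): transformed feature vectors
Z = rng.normal(size=(d, c))            # Z[:, k] is the centroid of class k
Sigma_inv = np.stack([np.eye(d)] * c)  # Sigma_k^{-1}; identity for simplicity

def f_duq(phi_x):
    """Per-class Gaussian confidence exp(-(phi - Z_k)^T Sigma_k^{-1} (phi - Z_k))."""
    out = np.empty((phi_x.shape[0], c))
    for k in range(c):
        diff = phi_x - Z[:, k]                                        # (n, d)
        out[:, k] = np.exp(-np.einsum('nd,de,ne->n', diff, Sigma_inv[k], diff))
    return out

def bce_loss(conf, Y):
    """Binary cross-entropy over all sample/class pairs of the confidence matrix."""
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(Y * np.log(conf + eps) + (1.0 - Y) * np.log(1.0 - conf + eps))

Y = np.eye(c)[rng.integers(0, c, size=m)]   # one-hot labels Y* in R^{m x c}
conf = f_duq(phi)
print(bce_loss(conf, Y))

# Unlike softmax, this confidence vanishes for points far from every centroid:
print(f_duq(phi + 100.0).max())   # effectively 0
```

Note that the sketch only evaluates the confidence and the loss; the actual DUQ training additionally updates the centroids with an exponential moving average and regularizes the feature map with a gradient penalty.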

