IMPROVING CALIBRATION THROUGH THE RELATIONSHIP WITH ADVERSARIAL ROBUSTNESS

Abstract

Neural networks lack adversarial robustness: they are vulnerable to adversarial examples, inputs with small perturbations that cause incorrect predictions. Further, trust is undermined when models give miscalibrated uncertainty estimates, i.e., the predicted probability is not a good indicator of how much we should trust our model. In this paper, we study the connection between adversarial robustness and calibration on four classification networks and datasets. We find that inputs for which the model is sensitive to small perturbations (inputs that are easily attacked) are more likely to have poorly calibrated predictions. Based on this insight, we examine whether calibration can be improved by addressing those adversarially unrobust inputs. To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS), which integrates the correlation between adversarial robustness and uncertainty into training by adaptively softening the label of each example based on how easily it can be attacked by an adversary. We find that our method, by taking the adversarial robustness of the in-distribution data into consideration, leads to better calibration even under distributional shift. In addition, AR-AdaLS can be applied to an ensemble model to further improve its calibration.

1. INTRODUCTION

The robustness of machine learning algorithms is becoming increasingly important as ML systems are used in higher-stakes applications. In one line of research, neural networks have been shown to lack adversarial robustness: small perturbations to the input can successfully fool classifiers into making incorrect predictions (Szegedy et al., 2014; Goodfellow et al., 2014; Carlini & Wagner, 2017b; Madry et al., 2017; Qin et al., 2020b). In largely separate lines of work, researchers have studied uncertainty in models' predictions. For example, models are often miscalibrated: the predicted confidence is not indicative of the true likelihood of the model being correct (Guo et al., 2017; Thulasidasan et al., 2019; Lakshminarayanan et al., 2017; Wen et al., 2020; Kull et al., 2019). Miscalibration is exacerbated when models are asked to make predictions on data different from the training distribution (Snoek et al., 2019), which becomes an issue in practical settings where we must be able to trust model predictions under distributional shift. Despite robustness, in all its forms, being a popular area of research, the relationship between these perspectives has not been extensively explored. In this paper, we study the correlation between adversarial robustness and calibration. We discover that inputs that are sensitive to small adversarial perturbations (are easily attacked) are more likely to have poorly calibrated predictions. This holds true on a number of network architectures for classification and on all the datasets we consider: SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). This suggests that miscalibrated uncertainty estimates on adversarially unrobust data greatly degrade a model's overall calibration.
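Calibration in this setting is commonly quantified with the expected calibration error (ECE). The following is a minimal sketch of that metric; the equal-width binning scheme and bin count are illustrative choices, not details fixed by the text above:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: partition predictions into confidence bins and take the
    size-weighted average gap between accuracy and mean confidence."""
    n = len(confidences)
    # Per-bin accumulators: [count, sum of confidences, sum of correct flags].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += hit
    ece = 0.0
    for count, sum_conf, sum_hit in bins:
        if count:
            ece += (count / n) * abs(sum_hit / count - sum_conf / count)
    return ece

# An over-confident model: 95% confidence but only 50% accuracy gives ECE 0.45.
expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 0, 1, 0])
```

A perfectly calibrated model (confidence matching accuracy in every bin) scores an ECE of zero.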
Based on this insight, we hypothesize and study whether calibration can be improved by giving the model different supervision depending on the adversarial robustness of each training example. To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS), which integrates the correlation between adversarial robustness and calibration into training by adaptively smoothing the training labels conditioned on how unrobust an input is. Our method improves on label smoothing (Szegedy et al., 2016) by explicitly teaching the model to differentiate the training data according to their adversarial robustness and then adaptively smooth their labels. By giving different supervision to the training data, our method leads to better calibration without increasing latency during inference. In particular, since adversarially unrobust data can be considered outliers of the underlying data distribution (Carlini et al., 2019), our method, by taking the adversarial robustness of the in-distribution data into consideration during training, can greatly improve calibration even on held-out shifted data. Further, we propose "AR-AdaLS of Ensemble" to combine AR-AdaLS with deep ensembles (Lakshminarayanan et al., 2017), the state-of-the-art method especially under distributional shift (Snoek et al., 2019), to further improve calibration on shifted data. Last, we find that an additional benefit of AR-AdaLS is improved model stability (i.e., decreased variance), which is valuable in practical applications where changes in predictions across runs (churn) are problematic and deploying ensembles is too costly (Milani Fard et al., 2016). In summary, our main contributions are as follows:

• Relationship among robustness metrics: We find a significant correlation between adversarial robustness and calibration: inputs that are unrobust to adversarial attacks are more likely to have poorly calibrated predictions.

• Algorithm: We propose AR-AdaLS to automatically learn how much to soften the labels of training data based on their adversarial robustness. Further, we introduce "AR-AdaLS of Ensemble" to show how to apply AR-AdaLS to an ensemble model.

• Experimental analysis: On CIFAR-10, CIFAR-100 and ImageNet, we find that AR-AdaLS is more effective than previous label smoothing methods at improving calibration, particularly for shifted data. Further, we find that while ensembling can be beneficial, applying AR-AdaLS to adaptively calibrate ensembles offers further improvements in calibration.
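The adaptive scheme described above can be sketched as follows. This is an illustrative reading rather than the paper's exact algorithm: we assume training examples are binned into groups by adversarial robustness, each group g keeps its own smoothing rate ε_g, and ε_g is increased when that group's validation confidence exceeds its accuracy. The specific update rule, learning rate, and group assignment here are assumptions:

```python
import numpy as np

def update_group_smoothing(eps, val_conf, val_acc, lr=0.1):
    # Nudge each group's smoothing rate toward closing its calibration gap:
    # over-confident groups (confidence > accuracy) receive more smoothing.
    eps = np.asarray(eps) + lr * (np.asarray(val_conf) - np.asarray(val_acc))
    return np.clip(eps, 0.0, 0.99)

def smooth_targets(one_hot, group_ids, eps):
    # Soften each example's one-hot label with its group's rate:
    # y_smooth = (1 - eps_g) * y + eps_g / K  for K classes.
    k = one_hot.shape[1]
    e = eps[np.asarray(group_ids)][:, None]
    return (1.0 - e) * one_hot + e / k
```

For example, with K = 3 classes and group rates [0.0, 0.3], a robust example keeps its hard label while an easily attacked one with true class 1 is softened to [0.1, 0.8, 0.1]; the hard labels of unrobust examples carry less weight in the training loss.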

2. RELATED WORK

Uncertainty Estimates How to better estimate a model's predictive uncertainty is an important research topic, since many models designed with a focus on accuracy may fall short in predictive uncertainty. A popular way to improve a model's predictive uncertainty is to make the model well-calibrated, e.g., post-hoc calibration by temperature scaling (Guo et al., 2017) and multiclass Dirichlet calibration (Kull et al., 2019). In addition, Bayesian neural networks, through learning a posterior distribution over network parameters, can also be used to quantify a model's predictive uncertainty, e.g., Graves (2011); Blundell et al. (2015); Welling & Teh (2011). Dropout-based variational inference (Gal & Ghahramani, 2016; Kingma et al., 2015) can help DNN models make less over-confident predictions and be better calibrated. Recently, mixup training (Zhang et al., 2018) has been shown to improve both a model's generalization and its calibration (Thulasidasan et al., 2019) by preventing the model from being over-confident in its predictions. Despite this success on in-distribution data, Snoek et al. (2019) argue that improved uncertainty estimates do not usually translate to better performance on data that shift from the training distribution. Among all the methods evaluated by Snoek et al. (2019) under distributional shift, ensembles of deep neural networks (Lakshminarayanan et al., 2017) are shown to be most robust to dataset shift, producing the best uncertainty estimates.

Adversarial Robustness Cissé (2018) defines adversarial robustness as the minimum distance in the input domain required to change the model's output prediction by constructing an adversarial attack. The recent work closest to ours, Carlini et al. (2019), makes the interesting observation that easily attackable data are often outliers in the underlying data distribution and then uses adversarial robustness to determine an improved ordering for curriculum learning. Our work, instead, explores the relationship between adversarial robustness and calibration. In addition, we use adversarial robustness as an indicator to adaptively smooth the training labels so as to improve a model's calibration.

Label Smoothing Label smoothing was originally proposed in Szegedy et al. (2016) and is shown to be effective in improving the quality of uncertainty estimates in Müller et al. (2019); Thulasidasan
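As a concrete reference point for the post-hoc calibration methods discussed above, temperature scaling divides a trained model's logits by a single scalar T fit on a validation set. A minimal sketch (the logits shown are illustrative):

```python
import math

def temperature_scale(logits, T):
    # Divide logits by a scalar temperature T (T > 1 softens over-confident
    # predictions) and renormalize with a numerically stable softmax.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Because every class shares the same T, the argmax prediction (and hence accuracy) is unchanged; only the confidence is rescaled, which is why temperature scaling is purely a calibration fix.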

