IMPROVING NEURAL NETWORK ACCURACY AND CALIBRATION UNDER DISTRIBUTIONAL SHIFT WITH PRIOR AUGMENTED DATA

Abstract

Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, neural networks are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. The problem of overconfidence becomes especially apparent when the test-time data distribution differs from the one seen during training. We propose a solution to this problem by seeking out regions in arbitrary feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the Bayesian prior on the distribution of the labels. Our method results in a better-calibrated network and is agnostic to the underlying model structure, so it can be applied to any neural network that produces a probability density as an output. We demonstrate the effectiveness of our method and validate its performance on both classification and regression problems by applying it to the training of recent state-of-the-art neural network models.
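As an illustration (and not necessarily the exact formulation used in this work), one simple way to raise predictive entropy toward a prior over labels is to penalize the divergence between the predicted class distribution and that prior, e.g. a uniform prior for classification. The function name and interface below are hypothetical:

```python
import numpy as np

def entropy_raising_penalty(probs, prior=None):
    """Penalty that is zero when predictions match the label prior and
    grows as predictions become more confident than the prior warrants.
    probs: (batch, n_classes) array of predicted class probabilities.
    prior: prior label distribution; defaults to uniform."""
    n_classes = probs.shape[1]
    if prior is None:
        prior = np.full(n_classes, 1.0 / n_classes)
    eps = 1e-12  # numerical stability for log
    # KL(prior || probs), averaged over the batch: minimized when the
    # prediction equals the prior, i.e. maximum entropy for a uniform prior.
    return np.mean(np.sum(prior * np.log((prior + eps) / (probs + eps)), axis=1))
```

In practice such a term would be applied conditionally, only on inputs identified as lying in overconfident regions of feature space, rather than on all training examples.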

1. INTRODUCTION

While deep neural networks have achieved success on many diverse tasks due to their ability to learn highly expressive task-specific representations, they are known to be overconfident when presented with unseen inputs from unknown data distributions. Probabilistic models should perform well in terms of both accuracy and calibration. Accuracy measures how often the model's predictions agree with the labels in the dataset. Calibration measures the reliability of the uncertainty around a probabilistic output: for example, an event predicted with 10% probability should occur empirically 10% of the time. The probabilities assigned to rare but important outlier events need to be trustworthy for mission-critical tasks such as autonomous driving.

Bayesian neural networks (BNNs) and ensembling methods are popular ways to obtain a predictive distribution for both classification and regression models. Since Gal & Ghahramani (2015) showed that Monte Carlo Dropout acts as a Bayesian approximation, there have been numerous advances in modeling predictive uncertainty with BNNs. As laid out by Kendall & Gal (2017), models need to account for sources of both aleatoric and epistemic uncertainty. Epistemic uncertainty arises from uncertainty in the knowledge or beliefs encoded in a system; for parametric models such as BNNs, this presents as uncertainty in the parameters that are trained to encode knowledge about a data distribution. Aleatoric uncertainty arises from irreducible noise in the data. Correctly modeling both forms of uncertainty is essential for forming accurate and calibrated predictions.

Accuracy and calibration are negatively impacted when the data seen during deployment varies substantially from that seen during training. It has been shown that when test data has undergone a significant distributional shift from the training data, performance degrades across all models (Snoek et al., 2019). A recurring result from Snoek et al. (2019) is that Deep Ensembles (Lakshminarayanan et al., 2017) show superior performance on shifted test data. Previous work has also shown that BNNs fail to accurately model epistemic uncertainty, as regions with sparse training data often lead to confident predictions even when evidence to justify such confidence is lacking (Sun et al., 2019). Bayesian non-parametric models such as Gaussian

