LEARNABLE UNCERTAINTY UNDER LAPLACE APPROXIMATIONS

Abstract

Laplace approximations are classic, computationally lightweight means for constructing Bayesian neural networks (BNNs). As in other approximate BNNs, one cannot necessarily expect the induced predictive uncertainty to be calibrated. Here we develop a formalism to explicitly "train" the uncertainty in a way that is decoupled from the prediction itself. To this end we introduce uncertainty units for Laplace-approximated networks: hidden units with zeroed weights that can be added to any pre-trained, point-estimated network. Since these units are inactive, they do not affect the predictions. But their presence changes the geometry (in particular the Hessian) of the loss landscape around the point estimate, thereby affecting the network's uncertainty estimates under a Laplace approximation. We show that such units can be trained via an uncertainty-aware objective, making the Laplace approximation competitive with more expensive alternative uncertainty-quantification frameworks.

1. INTRODUCTION

The point estimates of neural networks (NNs), constructed as maximum a posteriori (MAP) estimates via (regularized) empirical risk minimization, empirically achieve high predictive performance. However, they tend to underestimate the uncertainty of their predictions, leading to an overconfidence problem (Hein et al., 2019), which could be disastrous in safety-critical applications such as autonomous driving. Bayesian inference offers a principled path to overcome this issue. The goal is to turn a "vanilla" NN into a Bayesian neural network (BNN), where the posterior distribution over the network's weights is inferred via Bayes' rule and subsequently taken into account when making predictions.

Since the cost of exact posterior inference in a BNN is often prohibitive, approximate Bayesian methods are employed instead. Laplace approximations (LAs) are classic methods for this purpose (MacKay, 1992b). The key idea is to obtain an approximate posterior by "surrounding" a MAP estimate of a network with a Gaussian, based on the geometry of the loss landscape around it. A standard practice in LAs is to tune a single hyperparameter, the prior precision, which is inflexible (Ritter et al., 2018b; Kristiadi et al., 2020).

Here, we aim to improve the flexibility of uncertainty tuning in LAs. To this end, we introduce Learnable Uncertainty under Laplace Approximations (LULA) units, which are hidden units associated with zeroed weights. They can be added to the hidden layers of any MAP-trained network. Because they are inactive, such units do not affect the prediction of the underlying network. However, they can still contribute to the Hessian of the loss with respect to the parameters, and hence induce additional structure in the posterior covariance under a LA. LULA units can be trained via an uncertainty-aware objective (Hendrycks et al., 2019; Hein et al., 2019, etc.), such that they improve the predictive uncertainty-quantification (UQ) performance of the Laplace-approximated BNN.

Figure 1 demonstrates trained LULA units in action: they improve the UQ performance of a standard LA while keeping the MAP predictions in both regression and classification tasks. In summary, we (i) introduce LULA units: inactive hidden units for tuning the uncertainty of a LA, (ii) bring a robust training technique from the non-Bayesian literature for training these units, and (iii) show empirically that LULA-augmented, Laplace-approximated BNNs can yield better UQ performance than both previous tuning techniques and contemporary, more expensive baselines.
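The core mechanism above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; all weight values, the network size, and the squared-loss curvature proxy below are illustrative assumptions. It shows the two properties claimed in the text: augmenting a layer with units whose outgoing weights are zero leaves the prediction unchanged, yet the gradient with respect to those zeroed outgoing weights is generally nonzero, so a curvature-based (Laplace) posterior covariance gains structure from the new units.

```python
import numpy as np

# A tiny "MAP-trained" one-hidden-layer regression net (hypothetical weights).
W1 = np.array([[0.3], [-0.5], [1.2]])   # input -> hidden
W2 = np.array([[0.8, -0.4, 0.5]])       # hidden -> output

def forward(x, W1, W2):
    """ReLU network f(x) = W2 max(0, W1 x)."""
    return W2 @ np.maximum(0.0, W1 @ x)

# Augment with k LULA-style units: free incoming weights (new rows of W1),
# zeroed outgoing weights (new columns of W2). The incoming weights here are
# arbitrary placeholders; in the paper they are trained.
k = 2
W1_aug = np.vstack([W1, np.array([[1.5], [0.8]])])
W2_aug = np.hstack([W2, np.zeros((1, k))])

x = np.array([[0.7]])
y = 1.0

# (1) The prediction is unchanged: the new units are inactive.
pred_unchanged = np.allclose(forward(x, W1, W2), forward(x, W1_aug, W2_aug))

# (2) The new units still shape the curvature. For the squared loss
# L = 0.5 (f(x) - y)^2, the gradient w.r.t. the outgoing weights is
# (f(x) - y) h^T, which is nonzero in the LULA columns whenever those units
# activate. A squared-gradient (empirical-Fisher-style) diagonal therefore
# assigns them nonzero curvature, so the Laplace covariance changes.
h_aug = np.maximum(0.0, W1_aug @ x)            # hidden activations incl. new units
resid = float(forward(x, W1_aug, W2_aug) - y)  # f(x) - y
g_W2 = resid * h_aug.T                         # gradient w.r.t. W2_aug
fisher_diag_W2 = g_W2 ** 2                     # curvature proxy per weight
```

Note that the first-order gradient with respect to the new units' *incoming* weights is blocked by the zero outgoing weights, which is why, as the text describes, these units are trained through a separate uncertainty-aware objective rather than the ordinary training loss.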

