EXPRESSIVE YET TRACTABLE BAYESIAN DEEP LEARNING VIA SUBNETWORK INFERENCE

Abstract

The Bayesian paradigm has the potential to solve some of the core issues in modern deep learning, such as poor calibration, data inefficiency, and catastrophic forgetting. However, scaling Bayesian inference to the high-dimensional parameter spaces of deep neural networks requires restrictive approximations. In this paper, we propose performing inference over only a small subset of the model parameters while keeping all others as point estimates. This enables us to use expressive posterior approximations that would otherwise be intractable for the full model. In particular, we develop a practical and scalable Bayesian deep learning method that first trains a point estimate, and then infers a full covariance Gaussian posterior approximation over a subnetwork. We propose a subnetwork selection procedure which aims to maximally preserve posterior uncertainty. We empirically demonstrate the effectiveness of our approach compared to point-estimated networks and methods that use less expressive posterior approximations over the full network.

1. INTRODUCTION

Deep neural networks (DNNs) still suffer from critical shortcomings that make them unfit for important applications. For instance, DNNs tend to be poorly calibrated and overconfident in their predictions, especially when there is a shift between the train and test distributions (Nguyen et al., 2015; Guo et al., 2017). To reliably inform decision making, DNNs must be able to robustly quantify the uncertainty in their predictions, which is particularly important in safety-critical areas such as healthcare or autonomous driving (Amodei et al., 2016; Filos et al., 2019a; Fridman et al., 2019). Bayesian modeling (Ghahramani, 2015; Gal, 2016) presents a principled way to capture predictive uncertainty via the posterior distribution over model parameters. Unfortunately, due to their nonlinearities, exact posterior inference in DNNs is intractable. Despite recent successes in the field of Bayesian deep learning (Blundell et al., 2015; Gal & Ghahramani, 2016; Osawa et al., 2019; Maddox et al., 2019; Dusenberry et al., 2020), existing methods only scale to modern DNNs with large numbers of parameters by invoking unrealistic assumptions. These severely limit the expressiveness of the inferred posterior and thus deteriorate the quality of the induced uncertainty estimates (Ovadia et al., 2019; Fort et al., 2019; Foong et al., 2019a; Ashukha et al., 2020a). Due to the heavy overparameterization of DNNs, their accuracy is well-preserved by a small subnetwork (Cheng et al., 2017). Additionally, recent work by Izmailov et al. (2019) has shown that performing inference over a low-dimensional subspace of the weights can yield accurate uncertainty quantification. These observations prompt the following question: can the model uncertainty of a full DNN be well-preserved by the model uncertainty of a small subnetwork? We answer this question in the affirmative.
We show both theoretically and empirically that the full network posterior can be well represented by a subnetwork's posterior. As a result, we can use more expensive but faithful posterior approximations over just that subnetwork. We show that this achieves better uncertainty quantification than using cheaper but cruder posterior approximations over the full network. The contributions of this paper are as follows:

1. We propose a new Bayesian deep learning approach that performs Bayesian inference over only a small subset of the model weights and keeps all other weights deterministic. This allows us to use expressive posterior approximations that are typically intractable in DNNs.
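To make the pipeline concrete, the following is a minimal sketch of the three stages (point estimate, subnetwork selection, full-covariance Gaussian posterior over the subnetwork) on a toy linear-regression model standing in for a DNN. All function names and the selection heuristic used here (ranking weights by their marginal variance under a diagonal Laplace approximation) are illustrative assumptions, not the paper's exact procedure; in the linear-Gaussian case the Hessian of the log-posterior is available in closed form, which keeps the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = Phi @ w + noise, standing in for a DNN.
n, d, sigma = 50, 10, 0.1
Phi = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = Phi @ w_true + sigma * rng.normal(size=n)

# Stage 1: train a point estimate (MAP under a Gaussian prior
# with precision alpha; closed form for this linear model).
alpha = 1.0
H_full = Phi.T @ Phi / sigma**2 + alpha * np.eye(d)  # log-posterior Hessian
w_map = np.linalg.solve(H_full, Phi.T @ y / sigma**2)

# Stage 2: select a subnetwork. Here: the k weights with the largest
# marginal variance under a diagonal Laplace approximation -- a simple
# stand-in for a selection rule that preserves posterior uncertainty.
k = 3
marginal_var = 1.0 / np.diag(H_full)
subset = np.argsort(marginal_var)[-k:]

# Stage 3: full-covariance Gaussian (Laplace) posterior over the
# subnetwork only; all remaining weights stay fixed at their MAP values.
H_sub = H_full[np.ix_(subset, subset)]
Sigma_sub = np.linalg.inv(H_sub)

def predict(x, n_samples=1000):
    """Predictive mean/variance by sampling the subnetwork posterior."""
    w = np.tile(w_map, (n_samples, 1))
    w[:, subset] = rng.multivariate_normal(w_map[subset], Sigma_sub,
                                           size=n_samples)
    f = w @ x
    return f.mean(), f.var()

mean, var = predict(Phi[0])
```

The expensive object, the full covariance `Sigma_sub`, is only k x k rather than d x d, which is what makes an otherwise intractable posterior approximation affordable when k is much smaller than the total parameter count.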

