GETTING A CLUE: A METHOD FOR EXPLAINING UNCERTAINTY ESTIMATES

Abstract

Both uncertainty estimation and interpretability are important factors for trustworthy machine learning systems. However, there is little work at the intersection of these two areas. We address this gap by proposing a novel method for interpreting uncertainty estimates from differentiable probabilistic models, like Bayesian Neural Networks (BNNs). Our method, Counterfactual Latent Uncertainty Explanations (CLUE), indicates how to change an input, while keeping it on the data manifold, such that a BNN becomes more confident about the input's prediction. We validate CLUE through 1) a novel framework for evaluating counterfactual explanations of uncertainty, 2) a series of ablation experiments, and 3) a user study. Our experiments show that CLUE outperforms baselines and enables practitioners to better understand which input patterns are responsible for predictive uncertainty.

1. INTRODUCTION

There is growing interest in probabilistic machine learning models, which aim to provide reliable estimates of uncertainty about their predictions (MacKay, 1992). These estimates are helpful in high-stakes applications such as predicting loan defaults or recidivism, or in work towards autonomous vehicles. Well-calibrated uncertainty can be as important as making accurate predictions: it increases the robustness of automated decision-making systems and helps prevent them from behaving erratically on out-of-distribution (OOD) test points.

In practice, predictive uncertainty conveys skepticism about a model's output. However, its utility need not stop there: we posit that predictive uncertainty could be rendered more useful and actionable if it were expressed in terms of model inputs, answering the question: "Which input patterns lead my prediction to be uncertain?" Understanding which input features are responsible for predictive uncertainty can help practitioners learn in which regions the training data is sparse. For example, when training a loan default predictor, a data scientist (i.e., practitioner) can identify sub-groups (by age, gender, race, etc.) that are under-represented in the training data. Collecting more data from these groups, and thus further constraining the model's parameters, could lead to accurate predictions for a broader range of clients. In a clinical scenario, a doctor (i.e., domain expert) can use an automated decision-making system to assess whether a patient should receive a treatment. In the case of high uncertainty, the system would suggest that the doctor should not rely on its output. If uncertainty were explained in terms of which features the model finds anomalous, the doctor could direct their attention appropriately.
While explaining predictions from deep models has become a burgeoning field (Montavon et al., 2018; Bhatt et al., 2020b), there has been relatively little research on explaining what leads to neural networks' predictive uncertainty. In this work, we introduce Counterfactual Latent Uncertainty Explanations (CLUE), to our knowledge the first approach to shed light on the subset of input-space features that are responsible for uncertainty in probabilistic models. Specifically, we focus on explaining Bayesian Neural Networks (BNNs). We refer to the explanations given by our method as CLUEs.

Figure 1: Workflow for automated decision making with transparency. Our probabilistic classifier produces a distribution over outputs. In cases of high uncertainty, CLUE allows us to identify features which are responsible for class ambiguity in the input (denoted by ∆ and highlighted in dark blue). Otherwise, we resort to existing feature importance approaches (LIME, Ribeiro et al., 2016; Integrated Gradients, Sundararajan et al., 2017; FIDO, Chang et al., 2019) to explain certain decisions.

CLUEs try to answer the question: "What is the smallest change that could be made to an input, while keeping it in distribution, so that our model becomes certain in its decision for said input?" CLUEs can be generated for tabular and image data on both classification and regression tasks. An application of CLUE is to improve transparency in the real-world deployment of a probabilistic model, such as a BNN, by complementing existing approaches to model interpretability (Ribeiro et al., 2016; Sundararajan et al., 2017; Chang et al., 2019). When the BNN is confident in its prediction, practitioners can generate an explanation via earlier feature importance techniques. When the BNN is uncertain, its prediction may well be wrong. This potentially wrong prediction could be the result of factors unrelated to the actual patterns present in the input data, e.g., parameter initialization or randomness in mini-batch construction. An explanation of an uncertain prediction will be disproportionately affected by these factors. Indeed, recent work on feature attribution touches on the unreliability of saliency maps when test points are OOD (Adebayo et al., 2020). Therefore, when the BNN is uncertain, it makes sense to provide an explanation of why the BNN is uncertain (i.e., a CLUE) instead of an explanation of the BNN's prediction. This is illustrated in Figure 1. Our code is at: github.com/cambridge-mlg/CLUE.
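The search that CLUE performs can be sketched on a toy problem. The snippet below is a minimal, illustrative sketch of the objective — minimise the model's predictive entropy plus a distance penalty to the original input, optimising over the latent space of a generative model — and is not the paper's implementation. The ensemble of linear classifiers standing in for a BNN posterior, the identity `decode` standing in for a DGM decoder, and the finite-difference optimiser are all simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy stand-in for a BNN posterior: an ensemble of linear classifiers.
W = rng.normal(size=(5, 2, 3))  # 5 "posterior samples", 2-d inputs, 3 classes

def bnn_probs(x):
    """Per-posterior-sample class probabilities, shape (samples, classes)."""
    return np.stack([softmax(x @ w) for w in W])

def predictive_entropy(x):
    """Entropy of the mean predictive distribution: the uncertainty to explain."""
    p = bnn_probs(x).mean(axis=0)
    return -np.sum(p * np.log(p + 1e-12))

def decode(z):
    """Toy stand-in for a DGM decoder; here latent space == input space."""
    return z

def clue(x0, lam=0.05, lr=0.1, steps=200, eps=1e-4):
    """Minimise H(decode(z)) + lam * ||decode(z) - x0||_1 over z by
    finite-difference gradient descent (illustrative only)."""
    z = x0.copy()

    def objective(z):
        xz = decode(z)
        return predictive_entropy(xz) + lam * np.abs(xz - x0).sum()

    for _ in range(steps):
        grad = np.zeros_like(z)
        for i in range(z.size):
            d = np.zeros_like(z)
            d[i] = eps
            grad[i] = (objective(z + d) - objective(z - d)) / (2 * eps)
        z = z - lr * grad
    return decode(z)

x = np.array([0.1, -0.2])   # an ambiguous input: near-uniform predictions
x_clue = clue(x)
delta = x_clue - x          # the change CLUE would highlight for the user
```

The distance penalty keeps the counterfactual close to the original input, while the decoder constrains the search to the data manifold; in the real method the latent space is that of a DGM trained on the data, so large, off-manifold edits are excluded by construction.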
We highlight the following contributions:

• We introduce CLUE, an approach that finds counterfactual explanations of uncertainty in input space by searching in the latent space of a deep generative model (DGM). We put forth an algorithm for generating CLUEs and show how CLUEs are best displayed.

• We propose a computationally grounded approach for evaluating counterfactual explanations of uncertainty. It leverages a separate conditional DGM as a synthetic data generator, allowing us to quantify how well explanations reflect the true generative process of the data.

• We evaluate CLUE quantitatively through comparison to baseline approaches under the above framework and through ablative analysis. We also perform a user study, showing that CLUEs allow practitioners to predict on which new inputs a BNN will be uncertain.

2. PRELIMINARIES

In BNNs, predictions are made by marginalising over the posterior on parameters θ:

p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ.    (1)

For BNNs, both the posterior over parameters and the predictive distribution (1) are intractable. Fortunately, there is a rich literature concerning approximations to these objects (MacKay, 1992; Hernández-Lobato & Adams, 2015; Gal, 2016). In this work, we use scale-adapted Stochastic Gradient Hamiltonian Monte Carlo (SG-HMC) (Springenberg et al., 2016). For regression, we use
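In practice, the integral in (1) is approximated by a Monte Carlo average over posterior samples of the parameters (e.g., draws produced by SG-HMC). The sketch below illustrates this average; the linear "network" and randomly drawn parameter samples are hypothetical stand-ins, not the paper's model or sampler.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mc_predictive(x, theta_samples):
    """Monte Carlo estimate of the predictive distribution (1):
    p(y | x, D) ≈ (1/M) * Σ_m p(y | x, θ_m)."""
    per_sample = np.stack([softmax(x @ th) for th in theta_samples])
    return per_sample.mean(axis=0)

rng = np.random.default_rng(1)
theta_samples = rng.normal(size=(10, 4, 3))  # M=10 "posterior samples" of a toy linear model
x = rng.normal(size=4)

p = mc_predictive(x, theta_samples)          # a distribution over 3 classes
```

Averaging the per-sample distributions (rather than, say, the logits) matters: disagreement between posterior samples spreads the mean distribution out, which is exactly the epistemic component of the uncertainty that CLUE sets out to explain.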

