EXPLANATION UNCERTAINTY WITH DECISION BOUNDARY AWARENESS Anonymous authors Paper under double-blind review

Abstract

Post-hoc explanation methods have become increasingly depended upon for understanding black-box classifiers in high-stakes applications, precipitating a need for reliable explanations. While numerous explanation methods have been proposed, recent works have shown that many existing methods can be inconsistent or unstable. In addition, high-performing classifiers are often highly nonlinear and can exhibit complex behavior around the decision boundary, leading to brittle or misleading local explanations. Therefore, there is an impending need to quantify the uncertainty of such explanation methods in order to understand when explanations are trustworthy. We introduce a novel uncertainty quantification method parameterized by a Gaussian Process model, which combines the uncertainty approximation of existing methods with a novel geodesic-based similarity which captures the complexity of the target black-box decision boundary. The proposed framework is highly flexible; it can be used with any black-box classifier and feature attribution method to amortize uncertainty estimates for explanations. We show theoretically that our proposed geodesic-based kernel similarity increases with the complexity of the decision boundary. Empirical results on multiple tabular and image datasets show that our decision boundary-aware uncertainty estimate improves understanding of explanations as compared to existing methods.

1. INTRODUCTION

Machine learning models are becoming increasingly prevalent in a wide variety of industries and applications. In many such applications, the best performing model is opaque; post-hoc explainability methods are one of the crucial tools by which we understand and diagnose the model's predictions. Recently, many explainability methods, termed explainers, have been introduced in the category of local feature attribution methods. That is, methods that return a real-valued score for each feature of a given data sample, representing the feature's relative importance with respect to the sample prediction. These explanations are local in that each data sample may have a different explanation. Using local feature attribution methods therefore helps users better understand nonlinear and complex black-box models, since these models are not limited to using the same decision rules throughout the data distribution. Recent works have shown that existing explainers can be inconsistent or unstable. For example, given similar samples, explainers might provide different explanations (Alvarez-Melis & Jaakkola, 2018; Slack et al., 2020) . When working in high-stakes applications, it is imperative to provide the user with an understanding of whether an explanation is reliable, potentially problematic, or even misleading. A way to guide users regarding an explainer's reliability is to provide corresponding uncertainty quantification estimates. One can consider explainers as function approximators; as such, standard techniques for quantifying the uncertainty of estimators can be utilized to quantify the uncertainty of explainers. This is the strategy utilized by existing methods for producing uncertainty estimates of explainers (Slack et al., 2021; Schwab & Karlen, 2019) . 
However, we observe that for explainers, this is not sufficient: in addition to uncertainty due to the function approximation of explainers, explainers also have to deal with the uncertainty due to the complexity of the decision boundary (DB) of the black-box model in the local region being explained.

Figure 1: Two similar patients with similar predictions are given opposing feature importance scores, which could result in misguided recommendations. We define a similarity based on the geometry of the decision boundary between any two given samples (red line). While the two patients are close together in the Euclidean sense, they are dissimilar under the proposed WEG kernel similarity. Using GPEC would return a high uncertainty measure for the explanations, which would flag the results for further investigation.

Consider the following example: we are using a prediction model for a medical diagnosis using two features, level of physical activity and body mass index (BMI) (Fig. 1). In order to understand the prediction and give actionable recommendations to the patient, we use a feature attribution method to evaluate the relative importance of each feature. Because of the nonlinearity of the prediction model, patients A and B show very similar symptoms, but are given very different recommendations. Note that while this issue is related to the notion of explainer uncertainty, measures of uncertainty that only consider the explainer would not capture this phenomenon. This suggests that any notion of uncertainty is incomplete without capturing information related to the local behavior of the model. Therefore, the ability to quantify uncertainty for DB-related explanation instability is desirable. We approach this problem from the perspective of similarity: given two samples and their respective explanations, how closely related should the explanations be?
From the previous intuition, we define this similarity based on a geometric perspective of the DB complexity between these two points. Specifically, we propose a novel geodesic-based kernel similarity metric, which we call the Weighted Exponential Geodesic (WEG) kernel. The WEG kernel encodes our expectation that two samples close in Euclidean space may not actually be similar if the DB within a local neighborhood of the samples is highly complex. Using our similarity formulation, we propose the Gaussian Process Explanation UnCertainty (GPEC) framework, which is an instance-wise, model-agnostic, and explainer-agnostic method to quantify the uncertainty of explanations. The proposed notion of uncertainty is complementary to existing quantification methods. Existing methods primarily estimate the uncertainty related to the choice of model parameters and the fitting of the explainer, which we call function approximation uncertainty, and do not capture uncertainty related to the DB. GPEC can combine the DB-based uncertainty with function approximation uncertainty derived from any local feature attribution method. In summary, we make the following contributions:
• We introduce a geometric perspective on capturing explanation uncertainty and define a novel geodesic-based similarity between explanations. We prove theoretically that the proposed similarity captures the complexity of the decision boundary from a given black-box classifier.
• We propose a novel Gaussian Process-based framework that combines A) uncertainty from decision boundary complexity and B) explainer-specific uncertainty to generate uncertainty estimates for any given feature attribution method and black-box model.
• Empirical results show GPEC uncertainty improves understanding of feature attribution methods.

2. RELATED WORKS

Explanation Methods. A wide variety of methods have been proposed for the purpose of improving transparency for pre-trained black-box prediction models (Guidotti et al., 2018; Barredo Arrieta et al., 2020). Within this category of post-hoc methods, many methods focus on local explanations, that is, explaining individual predictions rather than the entire model. Some of these methods generate explanations through local feature selection (Chen et al., 2018; Masoomi et al., 2020). In this work, we focus primarily on feature attribution methods, which return a real-valued score for each feature in the sample. For example, LIME (Ribeiro et al., 2016) trains a local linear regression model to approximate the black-box model. Lundberg & Lee (2017) generalizes LIME and five other feature attribution methods using the SHAP framework, which fulfills a number of desirable axioms. While some methods such as LIME and its SHAP variant, KernelSHAP, are model-agnostic, others are designed for specific model architectures, such as neural networks (Bach et al., 2015; Shrikumar et al., 2017; Sundararajan et al., 2017; Erion et al., 2021), tree ensembles (Lundberg et al., 2020), or Bayesian neural networks (Bykov et al., 2020), taking advantage of those specific architectures. Another class of methods involves the training of separate surrogate models to explain the black-box model (Dabkowski & Gal, 2017; Chen et al., 2018; Schwab & Karlen, 2019; Guo et al., 2018; Jethani et al., 2022). Once trained, surrogate-based methods are typically very fast; explanation generation generally only requires a single inference step from the surrogate model.

Figure 2: Schematic for the GPEC uncertainty estimation process. GPEC can be used in conjunction with a black-box classifier and explainer to derive an estimate of explanation uncertainty. GPEC takes samples from the classifier's decision boundary plus (possibly noisy) explanations from the explainer and fits a Gaussian Process Regression model with the Weighted Exponential Geodesic kernel. The variance of the predictive distribution combines uncertainty from the black-box classifier complexity and the approximation uncertainty from the explainer.

Explanation Uncertainty. One option for improving the trustworthiness of explainers is to quantify the associated explanation uncertainty. 
Bootstrap resampling techniques have been proposed as a way to estimate uncertainty from surrogate-based explainers (Schwab & Karlen, 2019; Schulz et al., 2022) . Guo et al. (2018) also proposes a surrogate explainer parameterized with a Bayesian mixture model. Alternatively, Bykov et al. (2020) and Patro et al. (2019) introduce methods for explaining Bayesian neural networks, which can be transferred to non-Bayesian neural networks. Covert & Lee (2021) derives an unbiased version of KernelSHAP and investigates an efficient way of estimating its uncertainty. Zhang et al. (2019) categorizes different sources of variance in LIME estimates. Several methods also investigate LIME and KernelSHAP in a Bayesian context; for example calculating a posterior over attributions (Slack et al., 2021) , investigating the use of priors for explanations (Zhao et al., 2021) , or using active learning to improve sampling (Saini & Prasad, 2022) . However, all existing methods for quantifying explanation uncertainty only consider the uncertainty of the explainer as a function approximator. This work introduces an additional notion of uncertainty for explainers that takes into account the uncertainty of the explainer due to the DB.

3. UNCERTAINTY QUANTIFICATION FOR BLACK-BOX EXPLAINERS

We now outline the GPEC framework, which we parameterize with a Gaussian Process (GP) regression model. We define a vector-valued GP which is trained on data samples as input and explanations as labels. More concretely, consider a data sample $x \in \mathcal{X} \subset \mathbb{R}^d$ that we want to explain in the context of a black-box prediction model $F : \mathcal{X} \to [0, 1]$, in which the output corresponds to the probability for the positive class. For simplicity, we consider the binary classifier case; we extend to the multiclass case in Section C.3. We apply any given feature attribution method $E : \mathcal{X} \to \mathbb{R}^s$, where $s$ is the dimension of the explanations $e = E(x)$. Let us draw samples $x_1, \ldots, x_M$ from the data distribution and generate their respective explanations $e_1, \ldots, e_M$. We assume that each explanation $e_m$ is generated from an unobserved latent function $E$ plus some independent Gaussian noise $\eta_m$:

$$e_m = E(x_m) + \eta_m \quad \text{s.t.} \quad \underbrace{E(x_m) \sim \mathcal{GP}(0, k(x, x'))}_{\text{Decision Boundary-Aware Uncertainty}}, \qquad \underbrace{\eta_m \sim \mathcal{N}(0, \tau_m^{-1})}_{\text{Function Approximation Uncertainty}} \tag{1}$$

where $k(\cdot, \cdot)$ is the specified kernel function and $\tau_m > 0$. We decompose each explanation into two components, $E(x_m)$ and $\eta_m$, which represent two separate sources of uncertainty: 1) a decision boundary-aware uncertainty which we capture using the kernel similarity, and 2) a function approximation uncertainty from the explainer. After specifying $E(x_m)$ and $\eta_m$, we can combine the two sources by calculating the posterior predictive distribution for the GP model; we take the 95% confidence interval of this distribution to be the GPEC uncertainty estimate. Due to space constraints, we provide an overview of GP regression in App. B.2. Function Approximation Uncertainty. The $\eta_m$ component of Eq. 1 represents the uncertainty stemming from explainer specification. 
For example, this uncertainty can be captured from increased variance from undersampling (perturbation-based explainers) or from the increased bootstrap resampling variance from model misspecification (surrogate-based explainers). We can therefore define $\tau$ based on the chosen explainer. Explainers that include some estimate of uncertainty (e.g. BayesLIME, BayesSHAP, CXPlain) can be directly used to estimate $\tau$. For example, CXPlain returns a distribution of feature importance values for a sample $x_m$; the variance of this distribution can be directly used as the variance of $\eta_m$. For other stochastic explanation methods that do not explicitly estimate uncertainty, we can estimate $\tau$ empirically by resampling explanations for the same data sample:

$$\tau_m = \left[ \frac{1}{|K|} \sum_{i=1}^{K} \left( E_i(x_m) - \bar{E}(x_m) \right)^2 \right]^{-1} \quad \text{s.t.} \quad \bar{E}(x_m) = \frac{1}{|K|} \sum_{i=1}^{K} E_i(x_m)$$

where each $E_i(x_m)$ is a sampled explanation for $x_m$. Alternatively, for deterministic explanation methods we can omit the noise and assume that the generated explanations are noiseless. Decision Boundary-Aware Uncertainty. In contrast, the $E(x_m)$ component of Eq. 1 draws possible functions from the GP prior that could have generated the observed explanations, with the function space defined by the choice of kernel. Intuitively, the kernel encodes our a priori definition of how similar two explanations should be given the similarity of the inputs. In other words, how much information do we expect a given point $x$ to provide for a nearby point $x'$ with respect to their explanations? We use this kernel specification to define a source of uncertainty dependent on the behavior of nearby explanations. In particular, we consider a novel kernel formulation that reflects the complexity of the DB in a local neighborhood of the samples; this is detailed in Section 4. For any given kernel, we can interpret the distribution of possible functions $E(x_m)$ as an estimate of uncertainty based on the set of previously observed or sampled explanations.
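The empirical estimate of τ by resampling can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `estimate_precision` and the toy `noisy_explainer` are hypothetical names, and the explainer is assumed to be any stochastic function that returns one attribution vector per call.

```python
import numpy as np


def estimate_precision(explain_fn, x, n_resamples=50, rng=None):
    """Estimate the per-feature precision tau for sample x by resampling.

    explain_fn(x, rng) is a hypothetical stochastic explainer that returns
    a length-s attribution vector each time it is called.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw K explanations E_1(x), ..., E_K(x) for the same input.
    samples = np.stack([explain_fn(x, rng) for _ in range(n_resamples)])
    mean = samples.mean(axis=0)                  # average explanation
    var = ((samples - mean) ** 2).mean(axis=0)   # (1/K) * sum of squared deviations
    return 1.0 / var                             # tau is the inverse variance


def noisy_explainer(x, rng, noise_std=0.1):
    # Toy explainer (assumption): the attribution is the input plus Gaussian noise.
    return x + rng.normal(0.0, noise_std, size=x.shape)


rng = np.random.default_rng(0)
tau = estimate_precision(noisy_explainer, np.array([1.0, -2.0]),
                         n_resamples=2000, rng=rng)
print(1.0 / tau)  # estimated noise variance per feature
```

For a deterministic explainer the resampled variance is zero and the inverse is undefined, which matches the remark that the noise term is simply omitted in that case.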

4. WEIGHTED EXPONENTIAL GEODESIC KERNEL

Intuitively, the GP kernel encodes the assumption that each individual explanation gives some information about the explanations around it; the choice of kernel defines the neighborhood and magnitude of this shared information. In the GP framework introduced in Section 3, the kernels define the relationship or similarity between two explanations $E(x)$ and $E(x')$ based solely on some function of their inputs $x$ and $x'$. For example, stationary kernels assign the same similarity for any two inputs regardless of where in the data manifold they are located. Instead, we want to encode the assumption that when the DB is complex, knowing an explanation $E(x)$ gives limited information about other nearby explanations (see Fig. 3).

4.1. GEOMETRY AND GEODESICS OF THE DECISION BOUNDARY

We want to relate kernel similarity to the behavior of the black-box model; specifically, the complexity or smoothness of the DB. Given any two points on the DB, the relative complexity of the boundary segment between them can be approximated by the segment length. The simplest form that the DB can take is a linear boundary connecting the two points; this is exactly the minimum distance between the points. As the complexity of the DB grows, there is a general corresponding increase in segment length. Concretely, we consider the manifold $\mathcal{M} = \{x \in \mathbb{R}^d : F(x) = \tfrac{1}{2}\}$ representing the DB of $F$. Given this interpretation, we can define distances along the DB as geodesic distances in $\mathcal{M}$:

$$d_{geo}(m, m') = \min_{\gamma} \int_0^1 \|\dot{\gamma}(t)\| \, dt \quad \forall m, m' \in \mathcal{M}$$

The mapping $\gamma : [0, 1] \to \mathcal{M}$, defined such that $\gamma(0) = m$ and $\gamma(1) = m'$, is a parametric representation of a 1-dimensional curve on $\mathcal{M}$, and $\dot{\gamma}$ denotes its derivative with respect to $t$. We can adapt geodesic distance in our kernel selection through the exponential geodesic (EG) kernel (Feragen et al., 2015), which is a generalization of the Radial Basis Function (RBF) kernel substituting $\ell_2$ distance with geodesic distance:

$$k_{EG}(x, x') = \exp(-\lambda \, d_{geo}(x, x')) \qquad k_{RBF}(x, x') = \exp(-\lambda \|x - x'\|_2^2)$$

The EG kernel has been previously investigated in the context of Riemannian manifolds (Feragen et al., 2015; Feragen & Hauberg, 2016). In particular, while prior work shows that the EG kernel fails to be positive definite for all values of $\lambda$ in non-Euclidean space, there exist large intervals of $\lambda > 0$ for which the EG kernel is positive definite. Appropriate values can be selected through grid search and cross validation; we assume that a valid value of $\lambda$ has been selected. Note that the manifold we are interested in is the DB; applying the EG kernel on the data manifold (i.e. directly using $k_{EG}$ in the GPEC formulation) would not capture model complexity. 
In addition, naïvely using the EG kernel on the DB manifold would not capture the local complexity with respect to a given explanation; the similarity would be invariant to observed explanations. We therefore need to relate the geodesic distances on M to samples in the data space X .
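To make the geodesic distance concrete, the sketch below approximates it from samples of a toy boundary using shortest paths on a nearest-neighbor graph, in the spirit of ISOMAP. The half-circle boundary, the function names, and the Floyd-Warshall shortest-path step are all assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
import numpy as np


def geodesic_distances(points, n_neighbors=2):
    """Approximate pairwise geodesic distances between manifold samples using
    shortest paths on a k-nearest-neighbor graph (an ISOMAP-style estimate)."""
    n = len(points)
    euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        nbrs = np.argsort(euclid[i])[1:n_neighbors + 1]  # skip the point itself
        graph[i, nbrs] = euclid[i, nbrs]
        graph[nbrs, i] = euclid[i, nbrs]                 # keep the graph symmetric
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    return graph


def eg_kernel(d_geo, lam=1.0):
    # Exponential geodesic kernel: exp(-lambda * d_geo).
    return np.exp(-lam * d_geo)


# Toy decision boundary (assumption): a half circle of radius 1.
theta = np.linspace(0.0, np.pi, 101)
boundary = np.stack([np.cos(theta), np.sin(theta)], axis=1)
d_geo = geodesic_distances(boundary)

straight = np.linalg.norm(boundary[0] - boundary[-1])  # Euclidean distance between endpoints
print(straight, d_geo[0, -1], eg_kernel(d_geo[0, -1]))
```

The two endpoints are 2 apart in Euclidean distance but roughly π apart along the curved boundary, so a kernel built on geodesic distance treats them as less similar than the straight-line distance alone would suggest.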

4.2. WEIGHTING THE DECISION BOUNDARY

Consider a probability distribution $p(M)$ with its support defined over $\mathcal{M}$. We weight $p(M)$ according to the $\ell_2$ distance between $M$ and a fixed data sample $x$:

$$q(M \mid x, \rho) \propto \exp[-\rho \|x - M\|_2^2] \, p(M)$$

We evaluate the kernel function $k(x, x')$ by taking the expected value over the weighted distribution:

$$k_{WEG}(x, x') = \int\!\!\int \exp[-\lambda \, d_{geo}(m, m')] \, q(m \mid x, \rho) \, q(m' \mid x', \rho) \, dm \, dm' \tag{6}$$

Our formulation is an example of a marginalized kernel (Tsuda et al., 2002): a kernel defined on two observed samples $x, x'$ and taking the expected value over some hidden variable. Given that the underlying EG kernel is positive definite, it follows that the WEG kernel forms a valid kernel. To evaluate the WEG kernel, we theoretically investigate two properties of the kernel. Theorem 1 shows that the WEG kernel is an extension of the EG kernel for data samples not directly on the DB. Theorem 1. Given two points $x, x' \in \mathcal{M}$, then $\lim_{\rho \to \infty} k_{WEG}(x, x') = k_{EG}(x, x')$. Proof details are in App. C.1. Intuitively, as we increase $\rho$, the manifold distribution closest to the points $x, x'$ becomes weighted increasingly heavily. At the limit, the weighting becomes concentrated entirely on the points $x, x'$ themselves, which recovers the EG kernel. Therefore we see that the WEG kernel is simply a weighting of the EG kernel, which is controlled by $\rho$. Theorem 2 establishes the inverse relationship between DB complexity and WEG kernel similarity. Given a black-box model with a piecewise linear DB, we show that this DB represents a local maximum with respect to WEG kernel similarity; i.e. as we perturb the DB to be nonlinear, kernel similarity decreases. We first define perturbations on the DB. Note that $\mathrm{int}(S)$ indicates the interior of a set $S$, $f|_S$ represents the function $f$ restricted to $S$, and $\mathrm{id}$ indicates the identity mapping. Definition 1. Let $\{U_\alpha\}_{\alpha \in I}$ be charts of an atlas for a manifold $P \subset \mathbb{R}^d$, where $I$ is a set of indices. 
Let $P$ and $\tilde{P}$ be differentiable manifolds embedded in $\mathbb{R}^d$, where $P$ is a Piecewise Linear (PL) manifold. Let $R : P \to \tilde{P}$ be a diffeomorphism. We say $\tilde{P}$ is a perturbation of $P$ on the $i$-th chart if $R$ satisfies the following two conditions. 1) There exists a compact subset $K_i \subset U_i$ s.t. $R|_{P \setminus \mathrm{int}(K_i)} = \mathrm{id}|_{P \setminus \mathrm{int}(K_i)}$ and $R|_{\mathrm{int}(K_i)} \neq \mathrm{id}|_{\mathrm{int}(K_i)}$. 2) There exists a linear homeomorphism between an open subset $\tilde{U}_i \subseteq U_i$, which contains $K_i$, and $\mathbb{R}^{d-1}$. Theorem 2. Let $P$ be a $(d-1)$-dimensional PL manifold embedded in $\mathbb{R}^d$. Let $\tilde{P}$ be a perturbation of $P$ and define $k(x, x')$ and $\tilde{k}(x, x')$ as the WEG kernel defined on $P$ and $\tilde{P}$ respectively. Then $\tilde{k}(x, x') < k(x, x')$ $\forall x, x' \in \mathbb{R}^d$. Proof details are in App. C.2. Theorem 2 implies that, for any two fixed points $x, x'$, their kernel similarity $k_{WEG}(x, x')$ decreases as the black-box DB complexity increases. Within GPEC, the explanations for $x, x'$ become less informative for other nearby explanations and induce a higher explanation uncertainty estimate. To improve the interpretation of the WEG kernel, we apply a normalization to scale the similarity values to lie in $[0, 1]$. We construct the normalized kernel $k^*_{WEG}$ as follows:

$$k^*_{WEG}(x, x') = \frac{k_{WEG}(x, x')}{\sqrt{k_{WEG}(x, x) \, k_{WEG}(x', x')}}$$

4.3. WEG KERNEL APPROXIMATION

In practice, the integral in Eq. 6 is intractable; we approximate the expected value using Monte Carlo (MC) sampling with $K$ samples:

$$k_{WEG}(x, x') \approx \frac{1}{Z_m Z_{m'} K^2} \sum_{i=1}^{K} \sum_{j=1}^{K} \exp[-\lambda \, d_{geo}(m_i, m_j)] \exp[-\rho(\|x - m_i\|_2^2 + \|x' - m_j\|_2^2)] \tag{8}$$

$Z_m, Z_{m'}$ are the normalization constants for $q(M \mid x, \rho)$ and $q(M \mid x', \rho)$, respectively. We can similarly estimate these values using MC sampling:

$$Z_m = \int \exp[-\rho \|x - m\|_2^2] \, p(m) \, dm \approx \frac{1}{K} \sum_{i=1}^{K} \exp[-\rho \|x - m_i\|_2^2]$$

[…] 2019)). Once GPEC is trained, training cost is amortized during inference; estimating uncertainty for test samples using a GP generally has time complexity of $O(n^3)$, which can be reduced to $O(n^2)$ using BBMM (Gardner et al., 2018), and further with variational methods (e.g., Hensman et al. (2015)).
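The Monte Carlo approximation in Eq. 8 can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions: the function names are hypothetical, the boundary is a toy straight line so that geodesic distances are known exactly, and the normalized variant assumes the kernel is divided by the square root of the product of the two self-similarities.

```python
import numpy as np


def weg_kernel(x, x_prime, boundary, d_geo, lam=1.0, rho=0.1):
    """Monte Carlo estimate of the WEG kernel (Eq. 8).

    boundary: (K, d) array of decision-boundary samples m_1, ..., m_K.
    d_geo:    (K, K) array of geodesic distances between those samples.
    """
    w = np.exp(-rho * np.sum((boundary - x) ** 2, axis=1))          # weights for x
    w_p = np.exp(-rho * np.sum((boundary - x_prime) ** 2, axis=1))  # weights for x'
    z, z_p = w.mean(), w_p.mean()                                   # MC estimates of Z_m, Z_m'
    k = len(boundary)
    # (1 / (Z_m Z_m' K^2)) * sum_ij exp(-lam * d_geo(m_i, m_j)) * w_i * w'_j
    return w @ np.exp(-lam * d_geo) @ w_p / (z * z_p * k ** 2)


def normalized_weg_kernel(x, x_prime, boundary, d_geo, lam=1.0, rho=0.1):
    # Scale by the self-similarities so that k*(x, x) = 1 (assumed normalization).
    k_xy = weg_kernel(x, x_prime, boundary, d_geo, lam, rho)
    k_xx = weg_kernel(x, x, boundary, d_geo, lam, rho)
    k_yy = weg_kernel(x_prime, x_prime, boundary, d_geo, lam, rho)
    return k_xy / np.sqrt(k_xx * k_yy)


# Toy boundary (assumption): the line segment {(t, 0) : -3 <= t <= 3}, for which
# the geodesic distance between two samples is simply |t_i - t_j|.
t = np.linspace(-3.0, 3.0, 61)
boundary = np.stack([t, np.zeros_like(t)], axis=1)
d_geo = np.abs(t[:, None] - t[None, :])

x = np.array([0.0, 0.5])
k_self = normalized_weg_kernel(x, x, boundary, d_geo, rho=1.0)
k_near = normalized_weg_kernel(x, np.array([0.2, 0.5]), boundary, d_geo, rho=1.0)
k_far = normalized_weg_kernel(x, np.array([2.5, 0.5]), boundary, d_geo, rho=1.0)
print(k_self, k_near, k_far)
```

Because the weights are divided by their own mean, the estimate is a weighted average of EG kernel values over pairs of boundary samples, with the weights concentrating on the samples closest to each input as ρ grows.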

5. EXPERIMENTS

We evaluate how well GPEC captures 1) DB-aware uncertainty and 2) function approximation uncertainty on a variety of datasets and classifiers. In Section 5.2 we compare the DB-aware uncertainty and approximation uncertainty components of GPEC. Section 5.3 evaluates how GPEC captures DB complexity. Section 5.4 evaluates how well GPEC combines sources of uncertainty. Due to space constraints, additional results are provided in the appendix, including a sensitivity analysis for GPEC parameters (F.5), an execution time comparison (F.1), and additional experiments on combining uncertainty (F.3). All experiments were run on an internal cluster using AMD EPYC 7302 16-Core processors, and all source code will be made public.

5.1. EXPERIMENTAL SETUP

Datasets and Models. Experiments are performed on three tabular datasets (Census, Online Shoppers (Sakar et al., 2019), German Credit) from the UCI data repository (Dua & Graff, 2017) and two image datasets (MNIST (LeCun & Cortes, 2010) and fashion-MNIST (f-MNIST) (Xiao et al., 2017)). GPEC can be used with any black-box model; for our experiments we use XGBoost (Chen & Guestrin, 2016) with log-loss for the tabular datasets and a 4-layer Multi-Layer Perceptron (MLP) model for the image datasets. Additional dataset details are outlined in App. E.1. Implementation Details. For comparison purposes, we consider GPEC using two different kernels: GPEC-WEG (WEG kernel) and GPEC-RBF (RBF kernel). Unless otherwise stated, we choose λ = 1.0 and ρ = 0.1 (see App. F.5 for experiments on parameter sensitivity). For the Uncertainty Visualization (Sec. 5.2) and Regularization Test (Sec. 5.3) we use GPEC with the KernelSHAP explainer. We train the GP parameterizing GPEC with BBMM (Gardner et al., 2018). Samples from the DB are drawn using DBPS (Yan & Xu, 2008) for tabular datasets and DeepDIG (Karimi et al., 2019) for image datasets. Geodesic distances are estimated using ISOMAP (Tenenbaum et al., 2000). Additional implementation details are available in App. D. Competing Methods. We compare GPEC to three competing explanation uncertainty estimation methods. BayesSHAP and BayesLIME (Slack et al., 2021) are extensions of KernelSHAP and LIME, respectively, that fit Bayesian linear regression models to perturbed data samples. After fitting, 95% credible intervals can be estimated by sampling the posterior distribution of the feature attributions. CXPlain (Schwab & Karlen, 2019) trains a separate explanation model using a causal-based objective, and applies a bootstrap resampling approach to estimate explanation uncertainty. We report the 95% confidence interval from the set of bootstrapped explanations. Unless otherwise stated, we use the default settings in the provided implementation for all three of these methods.

[Figure panels: German Credit, Census]

5.2. UNCERTAINTY VISUALIZATION

In order to visualize explanation uncertainty, we train XGBoost binary classifiers using two selected features, which we take to be the black-box model. Both GPEC and competing methods are used to quantify the uncertainty for the feature attributions of the test samples: a grid of 10,000 samples over the data domain. Uncertainty estimates for the x-axis feature are plotted in Figure 4 (results for the y-axis variable are included in App. F.4), where darker values in the heatmap indicate higher uncertainty. GPEC-WEG, GPEC-RBF, and CXPlain require training samples for their amortized uncertainty estimates; these samples, also used as the reference distribution for BayesSHAP and BayesLIME, are plotted in red. The DB is plotted as the black line. We ablate the function approximation uncertainty component of GPEC-WEG and GPEC-RBF in order to evaluate the effects of their respective kernels. Comparing GPEC-WEG and GPEC-RBF shows that using the WEG kernel attributes higher uncertainty to samples near nonlinearities in the DB. In contrast, GPEC-RBF provides uncertainty estimates that relate only to the training samples; test sample uncertainty is proportional to distance from the training samples. The competing methods BayesSHAP, BayesLIME, and CXPlain result in relatively uniform uncertainty estimates over the test samples. CXPlain shows areas of higher uncertainty for Census; however, the magnitude of these estimates is small. The uncertainty estimates produced by these competing methods are unable to capture the properties of the black-box model.

5.3. REGULARIZATION TEST

In this experiment we compare the average uncertainty of explanations as the model is increasingly regularized in order to assess the impact of restricting model complexity. For XGBoost models, we vary the parameter γ, which penalizes the number of leaves in the regression tree functions (Eq. 2 in Chen & Guestrin (2016)). For neural networks, we use two types of regularization. First, we add an $\ell_2$ penalty to the weights; this penalty increases as the parameter λ increases. Second, we change the ReLU activation functions to Softplus, a smooth approximation of ReLU with smoothness inversely proportional to a parameter β, which we vary (Dombrowski et al., 2019). We observe in Table 1 that the average uncertainty estimate generated by GPEC-WEG decreases as regularization increases, suggesting that the uncertainty estimates reflect the overall complexity of the underlying black-box model. For the tabular datasets, the estimates for BayesSHAP, BayesLIME, and CXPlain stay relatively flat. Interestingly, the estimates from these methods decrease for the image datasets. We hypothesize that the regularization for the neural network model also increases the overall stability of the explanations. GPEC can capture both the uncertainty from the WEG kernel and the estimated uncertainty from the function approximation for the explainer, which we demonstrate in Section 5.4.

Table 1: Average explanation uncertainty estimates over all features for a given classifier with varying levels of regularization. Higher regularization (increasing left to right) generally results in smoother and simpler classifiers.
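The role of β in the Softplus substitution can be illustrated numerically. The sketch below assumes the common parameterization softplus(x) = (1/β) log(1 + exp(βx)); whether the experiments use exactly this form is not stated in the text, so treat it as an assumption.

```python
import numpy as np


def softplus(x, beta=1.0):
    # (1 / beta) * log(1 + exp(beta * x)), computed stably via logaddexp.
    return np.logaddexp(0.0, beta * x) / beta


def relu(x):
    return np.maximum(x, 0.0)


x = np.linspace(-3.0, 3.0, 601)
# Largest pointwise gap between Softplus and ReLU on the grid, for two values of beta.
gap_smooth = np.max(np.abs(softplus(x, beta=1.0) - relu(x)))
gap_sharp = np.max(np.abs(softplus(x, beta=10.0) - relu(x)))
print(gap_smooth, gap_sharp)
```

Under this parameterization the gap is largest at x = 0, where it equals log(2)/β, so a smaller β gives a smoother activation that departs further from ReLU.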

5.4. COMBINING APPROXIMATION UNCERTAINTY

Using the WEG kernel with noisy explanation labels enables GPEC to incorporate uncertainty estimates from the explainer. Explainers such as BayesSHAP provide an estimate of uncertainty for their explanations, which can be used as fixed noise in the Gaussian likelihood for GPEC. For explainers with no such estimate, we can empirically resample explanations for the same test sample and take the explanation variance as an estimate of uncertainty. Here, we compare GPEC uncertainty results before and after applying explanation noise estimates. We combine GPEC with two different explainers: BayesSHAP and Shapley Sampling Values (SSV) (Strumbelj & Kononenko, 2013). The two selected explainers are different methods of approximating SHAP values; however, the former has an inherent method for quantifying its uncertainty while the latter does not. In Figure 5 row (A) we plot the heatmap of GPEC uncertainty estimates using only the WEG kernel and no added noise from the explainers. In row (B) we add the additional uncertainty estimates from BayesSHAP and SSV. The difference in quantified uncertainty is visualized in row (C). We observe that combining the uncertainty estimates from BayesSHAP and SSV in the GPEC Gaussian Process formulation increases the overall uncertainty estimate.
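The effect of adding fixed, per-sample explainer noise to the GP can be sketched with a toy one-dimensional example. Everything here is an assumption for illustration: the kernel exp(-λ|a - b|) stands in for the WEG kernel, the noise variances are made up, and the variable names `row_a`, `row_b`, and `row_c` merely mirror the three rows of Figure 5.

```python
import numpy as np


def toy_kernel(a, b, lam=1.0):
    # exp(-lam * |a - b|) on 1-D inputs; a stand-in for any valid GP kernel.
    return np.exp(-lam * np.abs(a[:, None] - b[None, :]))


def predictive_variance(x_tr, x_te, noise_var=None):
    """GP predictive variance at x_te, with optional fixed per-sample noise."""
    k_tr = toy_kernel(x_tr, x_tr)
    if noise_var is not None:
        k_tr = k_tr + np.diag(noise_var)      # one noise variance per training sample
    k_cross = toy_kernel(x_tr, x_te)          # shape (n_train, n_test)
    prior = np.ones(len(x_te))                # k(x, x) = 1 for this kernel
    return prior - np.sum(k_cross * np.linalg.solve(k_tr, k_cross), axis=0)


x_tr = np.array([0.0, 1.0, 2.0])
x_te = np.array([0.5, 1.5, 3.0])
row_a = predictive_variance(x_tr, x_te)                                      # kernel only
row_b = predictive_variance(x_tr, x_te, noise_var=np.array([0.05, 0.5, 0.05]))  # kernel + noise
row_c = row_b - row_a                                                        # difference
print(row_a, row_b, row_c)
```

In this toy setting the difference is positive at every test point, which is consistent with the observation that including explainer noise increases the overall uncertainty estimate.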

6. LIMITATIONS AND CONCLUSION

Generating uncertainty estimates for feature attribution explanations is essential for building reliable explanations. We introduce a novel GP-based approach that can be used with any black-box classifier and feature attribution method. GPEC generates uncertainty estimates for explanations that capture the complexity of the black-box model. Experiments show that capturing this uncertainty improves understanding of the explanations and the black-box model itself. Regarding limitations, GPEC relies on DB estimation methods, which are an ongoing area of research. Due to the time complexity of DB estimation, this can result in a tradeoff between computation time and approximation accuracy or sample bias. However, the effects of DB sampling time are minimized during inference, as the DB only needs to be sampled during training. Additionally, in its current implementation GPEC is limited to black-box classifiers; we leave the extension to regression as future work.

A SOCIETAL IMPACTS

As machine learning models are increasingly relied upon in a diverse set of high-impact domains ranging from health-care to financial lending (Esteva et al., 2019; Kose et al., 2021; Doshi-Velez & Kim, 2017; Sheikh et al., 2020; Singh et al., 2021), it is crucial that users of these models can accurately interpret why predictions are made. An understanding of why a model is making a certain prediction is important for users to trust it; for instance, a doctor may wish to know if a skin-cancer classifier's high test-set accuracy comes from the leveraging of truly diagnostic features, or a specific imaging device artifact. However, further spurred by the increasing popularity of deep learning (Krizhevsky et al., 2017), many of the models deployed in these high-stakes fields are complex black boxes, producing predictions whose reasoning is non-trivial to explain. The development of many methods for explaining black-box predictions has arisen from this situation (Ribeiro et al., 2016; Lundberg & Lee, 2017; Covert et al., 2020; Masoomi et al., 2021), but explanations may have varying quality and consistency. Before utilizing explanations in practice, it is essential that users know when, and when not, to trust them. Explanation uncertainty is one proxy for this notion of trust, in which more uncertain explanations may be deemed less trustworthy. In this work, we explore a new way to model explanation uncertainty, in terms of local decision-boundary complexity. In tandem with the careful consideration of domain experts, our methodology may be used to assist in determining when explanations are reliable. Our theoretical results provide new insights into what explanation uncertainty entails, and open the door for future methods that build upon our formulation.

B BACKGROUND

B.1 RELATED WORKS: RELIABILITY OF EXPLANATIONS

While feature attribution methods have gained wide popularity, a number of issues relating to the reliability of such methods have been uncovered. Alvarez-Melis & Jaakkola (2018) investigate the notion of robustness and show that many feature attribution methods are sensitive to small changes in input. This has been further investigated in the adversarial setting for perturbation-based methods (Slack et al., 2020) and neural network-based methods (Ghorbani et al., 2019). Kindermans et al. (2019) show that many feature attribution methods are affected by distribution transformations such as those common in preprocessing. The generated explanations can also be very sensitive to hyperparameter choice (Bansal et al., 2020). A number of metrics have been proposed for evaluating explainer reliability, such as with respect to adversarial attack (Hsieh et al., 2021), local perturbations (Alvarez-Melis & Jaakkola, 2018; Visani et al., 2022), black-box smoothness (Khan et al., 2022), fidelity to the black-box model (Yeh et al., 2019), or combinations of these metrics (Bhatt et al., 2020).

B.2 GAUSSIAN PROCESS REVIEW

A single-output Gaussian Process represents a distribution over functions $f : \mathcal{X} \to \mathbb{R}$,

$$f(x) \sim \mathcal{GP}(m(x), k(x, x')). \tag{11}$$

Here $m : \mathcal{X} \to \mathbb{R}$ and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ are the mean and kernel (or covariance) functions respectively, which are chosen a priori to encode the user's assumptions about the data. The kernel function $k(x, x')$ reflects a notion of similarity between data points, which the predictive distributions over $f(x)$, $f(x')$ respect. The prior mean $m(x)$, frequently considered to be less important, is commonly chosen to be the constant $m(x) = 0$.

Specifically, a GP is an infinite collection of random variables $f(x)$, each indexed by an element $x \in \mathcal{X}$. Importantly, any finite sub-collection of these random variables $f(X_{tr}) = (f(x_1), \ldots, f(x_n)) \in \mathbb{R}^n$, corresponding to some index set $X_{tr} = \{x_i\}_{i=1}^n \subset \mathcal{X}$, follows the multivariate normal (MVN) distribution, i.e.

$$f(X_{tr}) \sim \mathcal{N}(m(X_{tr}), K(X_{tr}, X_{tr})). \tag{12}$$

Here the mean vector $m(X_{tr}) = (m(x_1), \ldots, m(x_n)) \in \mathbb{R}^n$ represents the mean function applied to each $x \in X_{tr}$, and the covariance matrix $K \in \mathbb{R}^{n \times n}$, also known as the Gram matrix, contains each pairwise kernel-based similarity value $K_{ij} = k(x_i, x_j)$. Kernel function outputs correspond to dot products in a potentially infinite-dimensional expanded feature space, which allows for the encoding of nuanced notions of similarity; e.g. the exponential geodesic kernel referenced in this work (Feragen et al., 2015).

Making predictions with a GP is analogous to simply conditioning this normal distribution on our data. Considering a set of input, noise-free label pairs

$$D = \{(x_i, f(x_i))\}_{i=1}^n \tag{13}$$

we may update our posterior over any subset of the random variables $f(x)$ by considering the joint normal over the subset and $D$ and conditioning on $D$. For instance, when choosing a singleton index set $\{x_0\}$, the posterior over $f(x_0) \mid D$ is another normal distribution which may be written as (assuming prior $m(x) = 0$)

$$f(x_0) \sim \mathcal{N}(\bar{f}(x_0), \mathbb{V}[f(x_0)]) \tag{14}$$

where

$$\bar{f}(x_0) = K(x_0, X_{tr})^T K(X_{tr}, X_{tr})^{-1} f(X_{tr}) \tag{15}$$

$$\mathbb{V}[f(x_0)] = k(x_0, x_0) - K(x_0, X_{tr})^T K(X_{tr}, X_{tr})^{-1} K(x_0, X_{tr}) \tag{16}$$

and $K(x_0, X_{tr}) \in \mathbb{R}^n$ is defined element-wise by $K(x_0, X_{tr})_i = k(x_0, x_i)$.

Now we may consider the situation where our labels are noisy:

$$D = \{(x_i, y_i)\}_{i=1}^n, \quad y_i = f(x_i) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \quad \sigma^2 \in \mathbb{R}^+. \tag{17}$$

Here $y_i$ is equal to the quantity we wish to model, $f(x_i)$, with the addition of the noise variable $\epsilon$. The conditional is still an MVN, but the mean and variance equations are slightly modified:

$$\bar{f}(x_0) = K(x_0, X_{tr})^T (K(X_{tr}, X_{tr}) + \sigma^2 I)^{-1} Y \tag{18}$$

$$\mathbb{V}[f(x_0)] = k(x_0, x_0) - K(x_0, X_{tr})^T (K(X_{tr}, X_{tr}) + \sigma^2 I)^{-1} K(x_0, X_{tr}), \tag{19}$$

where $Y \in \mathbb{R}^n$ has elements $Y_i = y_i$. Notice how the variance $\sigma^2 I$ is added to $K(X_{tr}, X_{tr})$ in the quadratic form in Eq. 19, resulting in smaller eigenvalues after matrix inversion. Since this quadratic form is subtracted, the decision to model labels as noisy increases the uncertainty (variance) of the estimates the GP posterior provides. This agrees with the intuition that noisy labels should result in more uncertain predictions. While GPs may also be defined over vector-valued functions, in this work the independence of each output component is assumed, allowing for modeling with $c \geq 1$ independent GPs. For more details see Ch. 2 of Rasmussen & Williams (2005), from which the notation and content of this section were inspired.
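As an illustration of Eqs. 15-19, the following is a minimal NumPy sketch of the zero-mean GP posterior at a single test point. The RBF kernel, the function names `rbf_kernel` and `gp_posterior`, and the small `jitter` term added for numerical stability are assumptions made for this sketch; they are not taken from the GPEC implementation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X_tr, y_tr, x0, kernel, noise_var=0.0, jitter=1e-10):
    """Posterior mean and variance at one test point x0, assuming prior mean m(x) = 0.

    noise_var = 0 corresponds to Eqs. 15-16; noise_var = sigma^2 > 0 to Eqs. 18-19.
    """
    K = kernel(X_tr, X_tr)                    # (n, n) Gram matrix K(X_tr, X_tr)
    k0 = kernel(X_tr, x0[None, :])[:, 0]      # (n,) vector K(x0, X_tr)
    A = K + (noise_var + jitter) * np.eye(len(X_tr))
    L = np.linalg.cholesky(A)                 # A = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))  # A^{-1} y
    v = np.linalg.solve(L, k0)                # so that v @ v = k0^T A^{-1} k0
    mean = k0 @ alpha
    var = kernel(x0[None, :], x0[None, :])[0, 0] - v @ v
    return mean, var

# Usage: the predictive variance grows when the labels are modeled as noisy.
X_tr = np.linspace(0.0, 1.0, 5)[:, None]
y_tr = np.sin(2 * np.pi * X_tr[:, 0])
kern = lambda A, B: rbf_kernel(A, B, lengthscale=0.3)
x0 = np.array([0.6])
print(gp_posterior(X_tr, y_tr, x0, kern))
print(gp_posterior(X_tr, y_tr, x0, kern, noise_var=0.1))
```

The Cholesky factorization is used instead of an explicit matrix inverse because it is the usual numerically stable choice for symmetric positive-definite systems.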

C PROOF OF THEOREMS AND MULTICLASS EXTENSION

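C.1 THEOREM 1: RELATION TO EXPONENTIAL GEODESIC KERNEL

$$k(x, y) = \iint \exp[-\lambda d_{geo}(m, m')]\, q(m|x, \rho)\, q(m'|y, \rho)\, dm\, dm' \quad \text{s.t.} \quad q(m|x, \rho) \propto \exp[-\rho \|x - m\|_2^2]\, p(m)$$

Note that $\rho$ controls how to weight manifold samples close to $x$, $y$. We take $\lim_{\rho \to \infty}$:

$$\lim_{\rho \to \infty} q(m|x, \rho)\, q(m'|y, \rho) = \begin{cases} 1 & x = m \text{ and } y = m' \\ 0 & \text{otherwise} \end{cases}$$

Therefore the function within the integral of $k(x, y)$ evaluates to zero at all points except $x = m$ and $y = m'$. Since $x, y \in \mathcal{M}$ we can evaluate the integral:

$$k(x, y) = \exp[-\lambda d_{geo}(x, y)]$$

C.2 THEOREM 2: KERNEL SIMILARITY AND DECISION BOUNDARY COMPLEXITY

From Definition 1, given any perturbation of $\mathcal{P}$, there must exist a compact subset $K_i \subset U_i$ s.t. $R|_{\mathcal{P} \setminus \mathrm{int}(K_i)} = \mathrm{id}|_{\mathcal{P} \setminus \mathrm{int}(K_i)}$ and $R|_{\mathrm{int}(K_i)} \neq \mathrm{id}|_{\mathrm{int}(K_i)}$. Furthermore there exists a linear homeomorphism between an open subset $\tilde{U}_i \subseteq U_i$, which contains $K_i$, and $\mathbb{R}^{d-1}$. We parametrize $K_i$ using a smooth function $g : \mathcal{T} \to K_i$ s.t. $g(t) \in \partial K_i \ \forall t \in \partial \mathcal{T}$. We further define $g_\epsilon(t) = g(t) + \epsilon \eta(t)$, for some perturbation $\epsilon \in \mathbb{R}$ and a smooth function $\eta : \mathcal{T} \to \mathbb{R}^{d-1}$. We also restrict $\eta$ such that $\eta(t) = 0 \ \forall t \in \partial \mathcal{T}$ and $\exists\, t_0 \in \mathcal{T}$ s.t. $\eta(t_0) \neq g(t_0)$. In other words, $\eta$ is a smooth function where $g_\epsilon(t) = g(t) \ \forall \epsilon > 0, \forall t \in \partial \mathcal{T}$, but is not identical to $g$ for all $t \in \mathcal{T}$. Using $g_\epsilon(t)$, we define the manifold $\mathcal{P}_\epsilon = \{g_\epsilon(t) : t \in \mathcal{T}\}$.

To complete the proof, we want to show that the kernel similarity between any two given points $x, y \in \mathbb{R}^d$ is lower when using the manifold $\mathcal{P}_\epsilon$ for $\epsilon > 0$ as opposed to the manifold $\mathcal{P}_0$. We therefore want to compare the two respective kernels $k_\epsilon(x, y)$ and $k_0(x, y)$. Note that in this proof we consider the local effects of $\mathcal{P}$ on the kernel similarity through $\mathcal{P}_0$ and $\mathcal{P}_\epsilon$ exclusively, ignoring the manifold $\mathcal{P} \setminus U_0$.

Using Euler-Lagrange, we can calculate a lower bound for $d_{geo}(g_\epsilon(t), g_\epsilon(t'))$. In particular, for any $t, t' \in \mathcal{T}$:

$$d_{geo}(g_\epsilon(t), g_\epsilon(t')) \geq d_{geo}(g_0(t), g_0(t')) \tag{20}$$

$$\exp[-\lambda d_{geo}(g_\epsilon(t), g_\epsilon(t'))] \leq \exp[-\lambda d_{geo}(g_0(t), g_0(t'))] \tag{21}$$

$$\int_{\mathcal{T}} \int_{\mathcal{T}} \exp[-\lambda d_{geo}(g_\epsilon(t), g_\epsilon(t'))]\, dt\, dt' \leq \int_{\mathcal{T}} \int_{\mathcal{T}} \exp[-\lambda d_{geo}(g_0(t), g_0(t'))]\, dt\, dt' \tag{22}$$

Note that in Eq. 22 we are integrating over all possible values of $t, t'$; therefore the inequality is tight iff $g_\epsilon(t) = g_0(t) \ \forall t \in \mathcal{T}$, i.e. $\epsilon = 0$ (see proof in C.2.1). The case of $\epsilon = 0$ is trivial; we instead assume $\epsilon > 0$, in which case we can establish the following strict inequality:

$$\int_{\mathcal{T}} \int_{\mathcal{T}} \exp[-\lambda d_{geo}(g_\epsilon(t), g_\epsilon(t'))]\, dt\, dt' < \int_{\mathcal{T}} \int_{\mathcal{T}} \exp[-\lambda d_{geo}(g_0(t), g_0(t'))]\, dt\, dt' \tag{23}$$

Define uniform random variables $T, T'$ over the domain of $g$, i.e. $T, T' \sim \mathcal{U}_{\mathcal{T}}$. Then we have:

$$\mathbb{E}_{T, T' \sim \mathcal{U}[0,1]}\big[\exp[-\lambda d_{geo}(g_\epsilon(T), g_\epsilon(T'))]\big] < \mathbb{E}_{T, T' \sim \mathcal{U}[0,1]}\big[\exp[-\lambda d_{geo}(g_0(T), g_0(T'))]\big] \tag{24}$$

$$\mathbb{E}_{M, M' \sim p_\epsilon(M)}\big[\exp[-\lambda d_{geo}(M, M')]\big] < \mathbb{E}_{M, M' \sim p_0(M)}\big[\exp[-\lambda d_{geo}(M, M')]\big] \tag{25}$$

We define the random variable $M = g_\epsilon(T)$ with distribution $p_\epsilon(M)$. The distribution $p_\epsilon(M)$ represents the uniform distribution $\mathcal{U}_{\mathcal{T}}$ mapped to the manifold $\mathcal{P}_\epsilon$ using $g_\epsilon(T)$. The step from Eq. 24 to Eq. 25 uses a property of distribution transformations (Eq. 2.2.5 in Casella & Berger (2001)). Next, compare either side of Eq. 25 to our kernel formulation shown below in Eq. 26. The kernel $k_\epsilon(x, y|\rho, \lambda)$ takes an expected value over $q_\epsilon(M|x, \rho)$ and $q_\epsilon(M'|y, \rho)$, which are equivalent to $p_\epsilon(M)$ and $p_\epsilon(M')$ weighted with respect to $x$, $y$, and a hyperparameter $\rho \geq 0$.

$$k_\epsilon(x, y|\rho, \lambda) = \mathbb{E}_{M \sim q_\epsilon(M|x, \rho),\, M' \sim q_\epsilon(M'|y, \rho)}\big[\exp[-\lambda d_{geo}(M, M')]\big] \tag{26}$$

$$\text{s.t.} \quad q_\epsilon(M|x, \rho) \propto \exp[-\rho \|x - M\|_2^2]\, p_\epsilon(M)$$

$$\text{s.t.} \quad q_\epsilon(M'|y, \rho) \propto \exp[-\rho \|y - M'\|_2^2]\, p_\epsilon(M')$$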
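The inequality between the two averaged exponential geodesic similarities (Eqs. 22-24) can be checked numerically on a toy one-dimensional boundary. The sketch below is only an illustration under assumptions chosen for this example: $\mathcal{T} = [0, 1]$, a flat base curve $g_0(t) = (t, 0)$, the perturbation $\eta(t) = (0, \sin(\pi t))$ (which vanishes on $\partial \mathcal{T}$), and geodesic distance approximated by polyline arc length. The function names are hypothetical.

```python
import numpy as np

def arc_length_table(curve_pts):
    """Cumulative arc length along an ordered polyline of shape (n, 2)."""
    seg = np.linalg.norm(np.diff(curve_pts, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(seg)])

def mean_eg_similarity(eps, lam=1.0, n=400):
    """Average of exp(-lam * d_geo) over all pairs g_eps(t), g_eps(t') on a grid of t."""
    t = np.linspace(0.0, 1.0, n)
    eta = np.sin(np.pi * t)                   # zero at t = 0 and t = 1
    pts = np.stack([t, eps * eta], axis=1)    # g_eps(t) = g_0(t) + eps * eta(t)
    s = arc_length_table(pts)
    d_geo = np.abs(s[:, None] - s[None, :])   # distance along a 1-D curve
    return np.exp(-lam * d_geo).mean()

# Usage: the averaged similarity shrinks as the perturbation grows.
for eps in (0.0, 0.25, 0.5, 1.0):
    print(eps, mean_eg_similarity(eps))
```

Every polyline segment has length $\sqrt{\Delta t^2 + \epsilon^2 \Delta \eta^2}$, which is non-decreasing in $\epsilon$, so all pairwise distances grow and the average similarity falls, in line with the direction of the inequality.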

D.2 ADVERSARIAL SAMPLE FILTERING MULTI-CLASS MODELS

We elect to sample from multi-class neural network decision boundaries by running binary search on pairs of train points and their adversarial examples. Specifically, given a test point $x_0 \in \mathbb{R}^d$ and model prediction $y = \arg\max_{k \in \mathcal{Y}} F(x_0)$, decision boundary points may be generated by the following procedure. First, for each class $v \in \mathcal{Y}$ a set of $M_v$ points is randomly sampled from the set of train points on which the model predicts class $v$:

$$X_v \subseteq \{x : \arg\max_{k \in \mathcal{Y}} F(x) = v,\ x \in X_{tr}\}, \quad |X_v| = M_v \quad \forall v \in \mathcal{Y}. \tag{39}$$

An untargeted adversarial attack using some $\ell_p$ norm and radius $\epsilon$ is run on each point in $X_y$, the set of points with the same class prediction as $x_0$. Each attack output $\mathrm{Attack}_{un}(x, \epsilon) \in \mathbb{R}^d$ is paired with its corresponding input, resulting in the set

$$\tilde{X}_y = \{(x, \mathrm{Attack}_{un}(x, \epsilon)) : x \in X_y\},$$

where for an element $(a, b) \in \tilde{X}_y$ we have $\arg\max_{k \in \mathcal{Y}} F(a) = y$ and $\arg\max_{k \in \mathcal{Y}} F(b) = v \neq y$, where $v$ is an unspecified class. Likewise a targeted adversarial attack, with target class $y$, is run on each point in each of the sets of points that are not predicted as class $y$. Each attack output $\mathrm{Attack}_y(x, \epsilon) \in \mathbb{R}^d$ may be paired with its input $x$, resulting in sets

$$\tilde{X}_v = \{(x, \mathrm{Attack}_y(x, \epsilon)) : x \in X_v\} \tag{41}$$

In practice the entire procedure may be amortized for each class, and run for all classes as a single post-processing step immediately after training. This results in a dictionary of boundary points which may be efficiently queried on demand via the model-predicted class of any given test point.

Each adversarial attack is attempted multiple times, once using each radius value $\epsilon$ in the list: [0.0, 2e-4, 5e-4, 8e-4, 1e-3, 1e-3, 1.5e-3, 2e-3, 3e-3, 1e-2, 1e-1, 3e-1, 5e-1, 1.0]. For a given input, the output of the successful attack with smallest $\epsilon$ is used. If no attack is successful at any radius, the input is discarded from further consideration. In this work the Foolbox Rauber et al. 
(2017; 2020)

We take the expected value of the posterior distribution as the point estimate for feature attributions, and the 95% credible interval as the estimate of uncertainty. To implement BayesLIME and BayesSHAP we use the public implementation (https://github.com/dylan-slack/Modeling-Uncertainty-Local-Explainability). We set the number of samples to 200, disable discretization for continuous variables, and calculate the explanations over all features. Otherwise, we use the default parameters for the implementation.

CXPlain. Schwab & Karlen (2019) introduce the explanation method CXPlain, which trains a surrogate explanation model based on a causal loss function. After training the surrogate model, the authors propose using a bootstrap resampling technique to estimate the variance of the predictions. In our experiments we use the publicly available code (https://github.com/d909b/cxplain). We use the default parameters, which include using a 2-layer UNet model (Ronneberger et al., 2015) for the image datasets and a 2-layer MLP model for the tabular datasets. We take a 95% confidence interval from the bootstrapped results as the estimate of uncertainty.

F ADDITIONAL RESULTS

F.1 EXECUTION TIME COMPARISON

In Table 2 we include an execution time comparison between the methods implemented in this paper. Results are averaged over 100 test samples. For the MNIST and f-MNIST datasets, results evaluate the time to calculate uncertainty estimates with respect to all classes. All experiments were run on an internal cluster using AMD EPYC 7302 16-Core processors. We observe that the amortized methods (GPEC-WEG, GPEC-RBF, CXPlain) are significantly faster than the perturbation methods BayesLIME and BayesSHAP.

F.2 TOY EXAMPLE: EVALUATING GPEC ON A LINEAR CLASSIFIER

In order to understand how the GPEC-WEG uncertainty estimate behaves for linear models, we use a toy example shown in Fig. 6. In the top row, we visualize the GPEC-WEG uncertainty; we see that the uncertainty estimate is a small, constant value for the linear model (left) whereas uncertainty increases for the nonlinear model (right). Intuitively, GPEC derives its uncertainty estimate by evaluating the distribution of explanations with respect to the black-box model. Since it is generally infeasible to evaluate the space of all possible explanations, GPEC will typically estimate a "baseline" amount of uncertainty for every explanation, which depends on the sampling of the training distribution and is minimized for a linear model. In the bottom row we visualize the magnitude of the gradient for the two respective models, which is a noiseless estimate of feature importance. We see that even with this deterministic feature importance method, the estimates can fail to be robust due to the nonlinearity, i.e. nearby samples within the same class can have very different explanations.
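The instability of the gradient-based importance on the nonlinear toy model can be reproduced with a few lines of NumPy. This sketch assumes the toy models are $f_{linear}(x_1, x_2) = x_2$ and $f_{cos}(x_1, x_2) = 2\cos(10 / x_1) - x_2$; the $10 / x_1$ reading of the cosine argument is an assumption based on the figure captions, and the function names are hypothetical.

```python
import numpy as np

def grad_norm_linear(x1, x2):
    """Gradient norm of f_linear(x1, x2) = x2; the gradient is (0, 1) everywhere."""
    return np.ones_like(np.asarray(x1, dtype=float))

def grad_norm_cos(x1, x2):
    """Gradient norm of f_cos(x1, x2) = 2*cos(10/x1) - x2 (assumed form), for x1 != 0."""
    x1 = np.asarray(x1, dtype=float)
    d_dx1 = 20.0 * np.sin(10.0 / x1) / x1**2   # chain rule applied to 2*cos(10/x1)
    d_dx2 = -1.0
    return np.sqrt(d_dx1**2 + d_dx2**2)

# Usage: two nearby inputs, compared under each model.
x1 = np.array([0.50, 0.52])
x2 = np.array([0.0, 0.0])
print(grad_norm_linear(x1, x2))
print(grad_norm_cos(x1, x2))
```

For the linear model the two inputs receive identical gradient norms, while for the cosine model the norm changes substantially over a shift of 0.02 in $x_1$.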

F.3 VISUALIZING EFFECTS OF EXPLAINER UNCERTAINTY IN GPEC ESTIMATE

In section 5.4 we evaluate GPEC's ability to combine uncertainty from the black-box decision boundary and the uncertainty estimate from BayesSHAP and SSV explainers. In Figure 7 we extend this experiment to evaluate how well GPEC can capture the explainer uncertainty. We calculate the combined GPEC+explainer estimate using different numbers of approximation samples. Both BayesSHAP and SSV depend on sampling to generate their explanations; having fewer samples increases the variance of their estimates. As we decrease the number of samples from 200 (Row A) to 5 (Row B) we would expect that the explainer uncertainty, and consequently the combined GPEC uncertainty, would increase. We see in Row ∆ that the results follow our intuition; uncertainty increases for most of the plotted test points and uncertainty does not decrease for any points.

F.4 ADDITIONAL RESULTS FOR UNCERTAINTY VISUALIZATION EXPERIMENT

Results for Y-Axis Feature. In Figure 4 we visualize the estimated explanation uncertainty as a heatmap for a grid of explanations. The generated plots only visualize the uncertainty for the feature on the x-axis. Due to space constraints, we list the results for the y-axis feature in the appendix, in Figure 8. We can see that the results are in line with those from the x-axis figure.

Black-box model output. For reference, in Figure 9 we plot the probability output of the XGBoost models used in the visualization experiment (Figure 4).

F.5 SENSITIVITY ANALYSIS OF WEG KERNEL PARAMETERS

The WEG kernel formulation uses two parameters, $\rho$ and $\lambda$. The parameter $\rho$ controls the weighting between each datapoint and the manifold samples. As $\rho$ increases, the WEG kernel places more weight on manifold samples close in $\ell_2$ distance to the given datapoint. The parameter $\lambda$ acts as a bandwidth parameter for the exponential geodesic kernel. Increasing $\lambda$ increases the effect of the geodesic distance along the manifold. Therefore decision boundaries with higher complexity will have an increased effect on the WEG kernel similarity. In Figures 10, 11



Without loss of generality, we assume that the decision rule for the classifier is set to be 1
assuming prior m(x) = 0
https://github.com/dylan-slack/Modeling-Uncertainty-Local-Explainability
https://github.com/d909b/cxplain



Figure 1: Illustrative example of how similar data samples can result in very different feature importance scores given a black-box model with nonlinear decision boundary. Here, two similar patients with similar predictions are given opposing feature importance scores which could result in misguided recommendations. We define a similarity based on the geometry of the decision boundary between any two given samples (red line). While the two patients are close together in the Euclidean sense, they are dissimilar under the proposed WEG kernel similarity. Using GPEC would return a high uncertainty measure for the explanations, which would flag the results for further investigation.

Figure 3: Comparison of RBF and WEG kernel similarity neighborhoods. The gray highlighted region $N_{0.8}(x) = \{x' : k(x, x') \geq 0.8\}$ indicates points where the kernel similarity is at least 0.8 with respect to the red point $x$. The black line is the decision boundary for the classifier $f(x_1, x_2) = 2\cos(10 / x_1) - x_2$, where $f(x_1, x_2) \geq 0$ indicates class 1, and $f(x_1, x_2) < 0$ indicates class 0. We see that for GPEC, $N_{0.8}(x)$ shrinks as $x$ moves closer to segments of the decision boundary which are more complex.

Figure 4: Visualization of estimated uncertainty of explanations for different models and competing methods. The heatmap represents level of uncertainty for a grid of explanations for the feature on the x-axis; darker areas represent higher uncertainty. Red points represent training samples for GPEC and CXPlain as well as the reference samples for BayesSHAP and BayesLIME. The black line represents the black-box DB. The heatmap shows that GPEC-WEG is the only method that captures uncertainty from the DB, due to the WEG kernel similarity formulation.

Figure 5: Evaluation of GPEC's ability to combine DB-aware uncertainty and functional approximation uncertainty. Row (A) visualizes the GPEC-WEG uncertainty estimate using only DB-aware uncertainty. Row (B) combines the DB-aware uncertainty in (A) with the functional approximation uncertainty from the two explainers: BayesSHAP and SSV. Row (C) visualizes the change in uncertainty estimate between (A) and (B).

$\forall v \neq y \in \mathcal{Y}$. Here, for an element $(a, b) \in \tilde{X}_v$ we have $\arg\max_{k \in \mathcal{Y}} F(a) = v$ and $\arg\max_{k \in \mathcal{Y}} F(b) = y$. Thus, we have generated a diverse set of $\sum_{v \in \mathcal{Y}} M_v$ pairs of points that lie on opposite sides of the decision boundary for class $y$. The segment between any pair from a given set $\tilde{X}_v$, $v \neq y$, will necessarily contain a point on the class $v$ vs. class $y$ decision boundary. Likewise, in the interest of further diversity, segments between any pair from the set $\tilde{X}_y$ will contain a point on the class $v$ vs. class $y$ decision boundary, where $v \neq y \in \mathcal{Y}$ is unspecified. A binary search may be run on each pair to find the boundary point in the middle.
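The binary search step can be sketched as a bisection along the segment between the two points of a pair. This is a minimal illustration, not the implementation used in the experiments: `boundary_point`, `predict`, `tol`, and `max_iter` are hypothetical names, and `predict` stands in for $\arg\max_{k} F(\cdot)$. Any midpoint whose prediction differs from that of `a` is treated as having crossed the boundary, so in the multi-class case the search returns a point on the boundary of the region predicted as the class of `a`.

```python
import numpy as np

def boundary_point(predict, a, b, tol=1e-6, max_iter=100):
    """Bisect the segment [a, b] until its ends straddle the boundary within tol."""
    lo, hi = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    label_a = predict(lo)
    if predict(hi) == label_a:
        raise ValueError("a and b must receive different predicted classes")
    for _ in range(max_iter):
        if np.linalg.norm(hi - lo) <= tol:
            break
        mid = 0.5 * (lo + hi)
        if predict(mid) == label_a:
            lo = mid    # mid is still on a's side of the boundary
        else:
            hi = mid    # mid has crossed into a different class
    return 0.5 * (lo + hi)

# Usage with a toy two-class rule whose boundary is the line x1 + x2 = 1.
toy_predict = lambda x: int(x[0] + x[1] > 1.0)
print(boundary_point(toy_predict, [0.0, 0.0], [2.0, 2.0]))
```

Each iteration halves the segment, so the number of model queries grows only logarithmically with the required precision.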

, and 12 we plot heatmaps for various combinations of $\rho$ and $\lambda$ parameters to evaluate the change in the uncertainty estimate. The black line is the decision boundary and the red points are the samples used for training GPEC. Please note that the heatmap scales are not necessarily the same for each plot.

Algorithm 1 GPEC Model Training

Input: GPEC training samples $X \in \mathbb{R}^{M \times d}$. Explainer $E$. Hyperparameters $P$ (# DB samples), $J$ (optional: # samples for functional approximation uncertainty estimate)
Output: Explanations $L \in \mathbb{R}^{M \times S}$, explanation uncertainty $U \in \mathbb{R}^{M \times S}$, WEG kernel $K \in [0, 1]^{M \times M}$

// Calculate functional approximation uncertainty
Initialize $L \in \mathbb{R}^{M \times S}$, $U \in \mathbb{R}^{M \times S}$
if Explainer returns uncertainty estimate then
    for $i = 1, 2, \ldots, M$ do
        $L_{i,:}, U_{i,:} \leftarrow E(X_{i,:})$    // Get explanations ($L_{i,:}$) and explanation uncertainty ($U_{i,:}$) from Explainer
    end
else if Explainer is stochastic and $J > 1$ then
    for $i = 1, 2, \ldots, M$ do
        Initialize $Q \in \mathbb{R}^{J \times S}$
        for $j = 1, 2, \ldots, J$ do
            $Q_{j,:} \leftarrow E(X_{i,:})$    // Draw stochastic explanations for same data sample
        end
        $L_{i,:} \leftarrow$ […]
    end
else if Explainer is deterministic then
    for $i = 1, 2, \ldots, M$ do
        $L_{i,:} \leftarrow E(X_{i,:})$
    end
    $U \leftarrow 0$    // Set functional approximation uncertainty to zero
end

// Calculate EG kernel matrix
$B \leftarrow$ Decision Boundary Sampler$(F, P)$    // Draw $P$ DB samples of dimension $d$; $B \in \mathbb{R}^{P \times d}$
Initialize $G \in [0, 1]^{P \times P}$
for $i = 1, 2, \ldots, P$ do
    for $j = 1, 2, \ldots, P$ do
        $G_{i,j} \leftarrow \exp(-\lambda d_{geo}(B_{i,:}, B_{j,:}))$    // Eq. 4
    end
end

// Calculate WEG kernel matrix
Initialize $W \in [0, 1]^{M \times P}$    // Initialize weighting matrix
for $i = 1, 2, \ldots, M$ do
    for $j = 1, 2, \ldots, P$ do
        $W_{i,j} \leftarrow \exp(-\rho \|X_{i,:} - B_{j,:}\|_2^2)$
    end
    […]    // Normalize weighting distribution (Eq. 9)
end
$K \leftarrow W G W^T$    // Apply weighting to EG kernel matrix
Return $L, U, K$
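The last part of Alg. 1 (building the WEG kernel matrix from boundary samples) can be sketched in NumPy as follows. This is a hedged illustration rather than the GPEC code: the function name `weg_kernel` is hypothetical, the geodesic distance matrix `D_geo` is assumed to be precomputed, and the normalization step is assumed to rescale each row of $W$ to sum to one.

```python
import numpy as np

def weg_kernel(X, B, D_geo, rho=1.0, lam=1.0):
    """Return K = W G W^T for samples X (M, d) and boundary samples B (P, d).

    G[i, j] = exp(-lam * D_geo[i, j]) is the EG kernel on the boundary samples;
    W[i, j] is proportional to exp(-rho * ||X_i - B_j||^2), with rows summing to 1
    (assumed normalization).
    """
    G = np.exp(-lam * D_geo)                                   # (P, P)
    sq = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)    # (M, P) squared l2 distances
    logw = -rho * sq
    logw -= logw.max(axis=1, keepdims=True)                    # avoid underflow in exp
    W = np.exp(logw)
    W /= W.sum(axis=1, keepdims=True)
    return W @ G @ W.T

# Usage on a toy 1-D boundary, with arc length standing in for geodesic distance.
t = np.linspace(0.0, 1.0, 50)
B = np.stack([t, np.sin(3 * t)], axis=1)
s = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(B, axis=0), axis=1))])
D_geo = np.abs(s[:, None] - s[None, :])
X = np.array([[0.1, 0.5], [0.9, 0.0]])
print(weg_kernel(X, B, D_geo, rho=5.0, lam=2.0))
```

With a very large `rho` and inputs that coincide with boundary samples, each row of `W` approaches a one-hot vector and the result approaches the plain exponential geodesic kernel, which matches the limit stated in Theorem 1.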

Figure 6: Comparison of GPEC uncertainty estimates on linear and nonlinear toy models: $f_{linear}(x_1, x_2) = x_2$ and $f_{cos}(x_1, x_2) = 2\cos(10 / x_1) - x_2$. TOP: Uncertainty estimate from GPEC. Applying GPEC on a linear model results in a small, relatively constant variance for the test explanations. As the DB becomes more complex, as in $f_{cos}$, the uncertainty estimate increases around the nonlinearities in the DB. BOTTOM: Gradient norm, which is an estimate of feature importance. The feature importance estimate becomes unstable (i.e. nearby samples of the same class can have very different explanations) due to the nonlinearity of $f_{cos}$.

Figure 7: Comparison of the change in quantified uncertainty of explanations as we change the number of samples for BayesSHAP and SSV. Row (A) visualizes the combined uncertainty estimate using GPEC and either BayesSHAP or SSV, using 200 samples for approximating the BayesSHAP / SSV explanation. In Row (B) we decrease the number of samples to 5 and recalculate the estimated uncertainty. Row (∆) represents the change in uncertainty estimate between (A) and (B). We see that the average uncertainty changes as we decrease the number of samples, which indicates that GPEC is able to capture the uncertainty arising from BayesSHAP / SSV approximation.

implementation of the Projected Gradient Descent (PGD) (Madry et al., 2018) attack with the $\ell_\infty$ norm was used for both targeted and untargeted attacks. The $M_c$ values used for the relevant datasets are indicated below in Appendix E.1.

Online Shopper. The UCI Online Shoppers dataset consists of clickstream data from 12,330 web sessions. Each session is generated from a different individual and specifies whether a revenue-generating transaction takes place. There are 17 other features including device information, types of pages accessed during the session, and date information. An XGBoost model is trained to predict whether a purchase occurs.

German Credit. The German Credit dataset consists of 1,000 samples; each sample represents an individual who takes credit from a bank. The classification task is to predict whether an individual is

Table 2: Execution time comparison for estimating the uncertainty for all features for 100 samples (in seconds). For the MNIST and f-MNIST datasets, results represent execution time for calculating uncertainty estimates with respect to all ten classes. For the GPEC-WEG, GPEC-RBF, and CXPlain methods, the results show inference times.

considered a good or bad risk. Features include demographic information, credit history, and information about existing loans. Categorical features are converted using a one-hot encoding, resulting in 24 total features.

MNIST. The MNIST dataset (LeCun & Cortes, 2010) consists of 70k grayscale images of dimension 28x28. Each image has a single handwritten numeral, from 0-9. A fully connected network with layer sizes 784-700-400-200-100-10 and ReLU activation functions was trained and validated on 50,000 and 10,000 image-label pairs, respectively. Training lasted for 30 epochs with an initial learning rate of 2 and a learning rate decay of $\gamma = 0.5$ when the training loss plateaued. During adversarial example generation we used $M_y = 500$ and $M_c = 50 \ \forall c \neq y$.

Fashion MNIST. The Fashion MNIST dataset (Xiao et al., 2017) contains 70k grayscale images of dimension 28x28. There are 10 classes, each indicating a different article of clothing. We train an MLP model with the same architecture used for the MNIST dataset; however, we increase training to 100 epochs and increase the initial learning rate to 3. During adversarial example generation we used $M_y = 500$ and $M_c = 50 \ \forall c \neq y$.

Slack et al. (2021) extend the methods LIME and KernelSHAP to use a Bayesian framework. BayesLIME and BayesSHAP are fit using Bayesian linear regression models on perturbed outputs of the black-box model. The posterior distribution of the model weights is taken as the feature attributions instead of the frequentist estimate that characterizes LIME and KernelSHAP.


Note that when $\rho$ is set to zero, $q_\epsilon(M|x, 0) = p_\epsilon(M)$ and $q_\epsilon(M'|y, 0) = p_\epsilon(M')$. Therefore Eq. 25 is equivalent to the inequality $k_\epsilon(x, y|0, \lambda) < k_0(x, y|0, \lambda)$.

We next want to prove that the inequality $k_\epsilon(x, y|\rho, \lambda) < k_0(x, y|\rho, \lambda)$ also holds for non-zero values of $\rho$. For convenience, define

$$f(\rho) = k_0(x, y|\rho, \lambda) - k_\epsilon(x, y|\rho, \lambda).$$

Under this definition, we want to prove there exists $\rho_0 > 0$ such that $f(\rho) > 0 \ \forall \rho < \rho_0$. From Eq. 25, we established that $f(0) > 0$. Assume that $f(0) = c$. It therefore follows that $c > 0$. In addition, note that $f(\rho)$ is continuous with respect to $\rho$ (see proof in Section C.2.3). Therefore for any $\epsilon' > 0$ there exists $\delta > 0$ s.t. $\rho < \delta$ implies $|f(\rho) - c| < \epsilon'$. We choose $\epsilon' = c$ and define the corresponding $\delta$ to be $\rho_0$. Therefore:

$$|f(\rho) - c| < c \implies f(\rho) > 0 \quad \forall \rho < \rho_0.$$

Since this result holds for any $i$, it follows that the piecewise linear manifold $\mathcal{P}$ is a local minimum under any perturbation along a specific chart or combination of charts with respect to the kernel similarity $k(x, y) \ \forall x, y \in \mathbb{R}^d$.

C.2.1 PROOF: EQ. 23

We want to prove: […] Consider the LHS of Eq. 31: […] Define $h(t, t')$ as the function inside the integrals in Eq. 33. From Eq. 21, $h(t, t')$ […] (Ch. 6, Rudin (1976)). It therefore follows that: […] From the definition of $\eta(t)$ in $g_\epsilon(t) = g(t) + \epsilon \eta(t)$, there must exist $t \in \mathcal{T}$ s.t. $\eta(t) \neq 0$. Therefore $\epsilon$ must be zero for Eq. 34 to hold. It follows that $g_\epsilon(t) = g_0(t) \ \forall t \in \mathcal{T}$.

C.2.2 PROOF: CONTINUITY OF $h(t, t')$

We prove that $h(t, t')$ is continuous with respect to $t, t'$. First note that by definition, $g_\epsilon(t)$ is a continuous parametrization of the manifold $\mathcal{P}_\epsilon$. From Burago et al. (2001), it follows that for any two points $g_\epsilon(t), g_\epsilon(t') \in \mathcal{P}_\epsilon$, $d_{geo}(g_\epsilon(t), g_\epsilon(t'))$ is continuous. Since the exponential function preserves continuity and the sum of continuous functions is also continuous, it follows that $h(t, t')$ is continuous.

C.2.3 PROOF: CONTINUITY OF $k(x, y)$ WITH RESPECT TO $\rho$

We prove that $k(x, y)$ is continuous with respect to $\rho$. […], where $\rho_0$ is a fixed positive constant: […] We set $B$ to be $\|x - m\|_2^2$, $\|y - m'\|_2^2$, and $\|x - m\|_2^2 + \|y - m'\|_2^2$, which shows that $Z_m(\rho)$, $Z_{m'}(\rho)$, and $Z(\rho)$ are also continuous, respectively. It then follows that the entirety of Eq. 36 is continuous.

C.3 EXTENDING TO MULTICLASS CLASSIFIERS

In the multiclass case we define a black-box prediction model $F : \mathcal{X} \to \mathbb{R}^c$. We consider the one-vs-all DB for every class $y \in \mathcal{Y} = \{1, \ldots, c\}$, defined as […], where $F_k$ indicates the model output for class $k$. We then apply the GPEC framework separately to each class using the respective DB. The uncertainty estimate of the GP model would be of dimension $d \times s$.

D IMPLEMENTATION DETAILS

D.1 ALGORITHM

The GPEC training algorithm is outlined in Alg. 1. GPEC is parametrized using a multi-output Gaussian Process Regression model using the explanations as labels. Once the explanations L, explanation uncertainty U , and WEG kernel matrix K are generated from Alg. 1, we can directly use these values to update the GP posterior and calculate the prediction variance for new test samples (Eq. 19). 

