ANALYZING THE EFFECTS OF CLASSIFIER LIPSCHITZ-NESS ON EXPLAINERS

Abstract

Machine learning methods are getting increasingly better at making predictions, but at the same time they are also becoming more complicated and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. As crucial diagnostic tools, it is important that these explainers themselves are reliable. In this paper we focus on one particular aspect of reliability, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to the astuteness of classifiers. Our formalism is inspired by the concept of probabilistic Lipschitzness, which captures the probability of local smoothness of a function. For a variety of explainers (e.g., SHAP, RISE, CXPlain), we provide lower-bound guarantees on the astuteness of these explainers given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.

1. INTRODUCTION

Machine learning models have improved over time at prediction and classification, especially with the advances made in deep learning and the availability of large amounts of data. These gains in predictive power have often been achieved using increasingly complex, black-box models. This has led to significant interest in, and a proliferation of, explainers that provide explanations for the predictions made by these black-box models. Given the crucial importance of these explainers, it is imperative to understand what makes them reliable. In this paper we focus on explainer robustness. A robust explainer is one where similar inputs result in similar explanations (Alvarez-Melis & Jaakkola, 2018). For example, consider two patients given the same diagnosis in a medical setting. These two patients share identical symptoms and are demographically very similar; a diagnostician would therefore expect the factors influencing the model's decision to be similar as well. Prior work in explainer robustness suggests that this expectation does not always hold true (Alvarez-Melis & Jaakkola, 2018; Ghorbani et al., 2019); small changes to the input samples can result in large shifts in explanation. For this reason we investigate the theoretical underpinnings of explainer robustness. Specifically, we focus on the connection between explainer robustness and the smoothness of the black-box function being explained. We propose and formally define explainer astuteness, a property of explainers that captures the probability that a given method provides similar explanations to similar data points. This definition allows us to evaluate the robustness of a given explainer over the entire dataset and helps tie explainer robustness to the probabilistic Lipschitzness of classifiers. We then provide a theoretical way to connect this explainer astuteness to the probabilistic Lipschitzness of the black-box function that is being explained.
Since probabilistic Lipschitzness is a measure of the probability that a function is smooth in a local neighborhood, our results demonstrate how the smoothness of the black-box function itself impacts the astuteness of the explainer. This implies that enforcing smoothness on black-box functions lends them to more robust explanations.

Related Work. A wide variety of explainers have been proposed in the literature (Guidotti et al., 2018; Arrieta et al., 2020). Explainers can broadly be categorized as feature attribution or feature selection explainers. Feature attribution explainers assign continuous-valued importance scores to each of the input features, while feature selection explainers provide binary decisions on whether a feature is important or not. Some popular feature attribution explainers can be viewed through the lens of Shapley values, such as SHAP (Lundberg & Lee, 2017), LIME (Ribeiro et al., 2016) and LIFT (Shrikumar et al., 2016). Some models such as CXPlain (Schwab & Karlen, 2019), PredDiff (Zintgraf et al., 2017) and feature ablation explainers (Lei et al., 2018) calculate feature attributions by simulating individual feature removal, while other methods such as RISE (Petsiuk et al., 2018) calculate the mean effect of a feature's presence to attribute importance to it. In contrast, feature selection methods include individual selector approaches such as L2X (Chen et al., 2018) and INVASE (Yoon et al., 2018), and group-wise selection approaches such as gI (Masoomi et al., 2020). While seemingly diverse, these models have been shown to have striking underlying similarities; for example, Lundberg & Lee (2017) unify six different explainers under a single framework. Recently, Covert et al. (2020) went a step further and combined 25 existing methods under the overall class of removal-based explainers.
Similarly, there has been a recent increase in research focused on analyzing the behaviour of these explainers themselves, in ways similar to how classification models have been analyzed. Recent work has focused on dissecting various properties of explainers. Yin et al. (2021) propose stability and sensitivity as measures of faithfulness of explainers to the decision-making process of the black-box model and empirically demonstrate the usefulness of these measures. Li et al. (2020) explore connections between local explainability and model generalization. Ghorbani et al. (2019) test the robustness of explainers through systematic and adversarial perturbations. Agarwal et al. (2022) define and discuss theoretical guarantees around faithfulness and stability in the context of Graph Neural Networks. Our definition of astuteness is related to what they call stability, but is defined as a probability over all available instances in such a way that the connection to the probabilistic Lipschitzness of the classifier becomes clear. Alvarez-Melis & Jaakkola (2018) empirically show that robustness, in the sense that explainers should provide similar explanations for similar inputs, is a desirable property, and that enforcing this property yields better explanations. Recently, Agarwal et al. (2021) explored the robustness of LIME (Ribeiro et al., 2016) and SmoothGrad (Smilkov et al., 2017), and proved that for these two methods robustness is related to the maximum value of the gradient of the predictor function. Our work is closely related to Alvarez-Melis & Jaakkola (2018) and Agarwal et al. (2021) on explainer robustness. However, instead of enforcing explainers to be robust themselves (Alvarez-Melis & Jaakkola, 2018), our theoretical results suggest that the robustness of explanations also depends on the smoothness of the black-box function being explained. Our results are complementary to those of Agarwal et al. (2021) in that our theorems cover a wider variety of explainers, as compared to only Continuous LIME and SmoothGrad (see contributions below). We further relate robustness to the probabilistic Lipschitzness of black-box models, a quantity that can be empirically estimated. Additionally, there has been recent work on estimating upper bounds of the Lipschitz constant for neural networks (Virmaux & Scaman, 2018; Fazlyab et al., 2019; Gouk et al., 2021), and on enforcing Lipschitz continuity during neural network training with an eye towards improving classifier robustness (Gouk et al., 2021; Aziznejad et al., 2020; Fawzi et al., 2017; Alemi et al., 2016). Fel et al. (2022) empirically demonstrated that 1-Lipschitz networks are better suited as predictors that are more explainable and trustworthy. Our work provides crucial additional motivation for that line of research; i.e., it provides theoretical reasons to improve the Lipschitzness of neural networks from the perspective of enabling more robust explanations.

Contributions:

• We formalize and define explainer astuteness, which captures the probability that a given explainer provides similar explanations to similar points. This formalism allows us to theoretically analyze robustness properties of explainers.

• We provide theoretical results that connect the astuteness of explainers to the smoothness of the black-box function they are providing explanations on. Our results suggest that smooth black-box functions result in explainers providing more astute explanations. While this statement is intuitive, proving it is non-trivial and requires additional assumptions for different explainers (see Section 3.2).

• Specifically, we prove this result for the astuteness of three classes of explainers: (1) Shapley value based (e.g. SHAP), (2) explainers that simulate the mean effect of features (e.g. RISE), and (3) explainers that simulate individual feature removal (e.g. CXPlain). Formally, our theorems establish a lower bound on explainer astuteness that depends on the Lipschitzness of the black-box function and the square root of the data dimensionality. Figure 1 summarizes this main contribution of our work.

Figure 1: In this figure we visualize the implication of our theoretical results. For a black-box prediction function that is locally Lipschitz with a constant $L_1$, the predictions for any two points $x, x'$ such that $d_p(x, x') \leq r$ are within $L_1 d_p(x, x')$ of each other. Given such a prediction function, the explanations for the same data points are also expected to be within $\lambda_1 d_p(x, x')$ of each other, where $\lambda_1 = C L_1 \sqrt{d}$ for a constant $C$. A second black-box function with $L_2 > L_1$ results in $\lambda_2 > \lambda_1$, indicating that the explanations for this black-box function can end up farther apart than those for the first prediction function. This result implies that locally smooth black-box functions lend themselves to more astute (i.e., robust) explanations.
• We demonstrate experimentally that this lower bound indeed holds in practice by comparing the astuteness predicted by our theorems to the observed astuteness on simulated and real datasets. We also demonstrate experimentally that the same neural network, when trained with Lipschitz constraints, lends itself to more astute explanations compared to when it is trained with no constraints.

2. BACKGROUND AND NOTATIONS

2.1. REMOVAL-BASED EXPLAINERS

Many popular explainers belong to the class of removal-based explainers, including SHAP (Lundberg & Lee, 2017), RISE (Petsiuk et al., 2018), CXPlain (Schwab & Karlen, 2019), PredDiff (Zintgraf et al., 2017), permutation tests (Strobl et al., 2008), and feature ablation explainers (Lei et al., 2018). All of these methods simulate feature removal either explicitly or implicitly. For example, SHAP explicitly considers the effect of using subsets that include a feature as compared to the effect of removing that feature from the subset. RISE removes subsets of features while always keeping the feature that is being evaluated, and estimates the average effect of keeping that feature when other features are randomly removed. CXPlain explicitly considers the impact of removing a feature on the loss function used in training the predictor function.

2.2. NOTATION

We denote $d$-dimensional input data as $x \in \mathbb{R}^d$, drawn from a data distribution $\mathcal{D}$. The black-box predictor function is denoted by $f$, where $f(x)$ is the prediction given $x$; this function is assumed to have been trained on training samples from $\mathcal{D}$. The explainer is represented by a function $\phi$, where $\phi(x) \in \mathbb{R}^d$ is the feature attribution vector representing attributions for all features in $x$, and $\phi_i(x) \in \mathbb{R}$ is the attribution for the $i$-th feature. To simulate the presence or absence of features in a given subset of features, we use an indicator vector $z \in \{0,1\}^d$, where $z_i = 1$ when the $i$-th feature is present in the subset. To indicate that we only use subsets where $z_i = 1$, we write $z_{+i}$; to indicate that we only use subsets where $z_i = 0$, we write $z_{-i}$. Lastly, the $p$-norm induced distance between any two points $x, x'$ is denoted by $d_p(x, x') = \|x - x'\|_p$, where $\|\cdot\|_p$ is the $p$-norm.
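To make the masking notation concrete, here is a small hypothetical sketch in NumPy (the vector x, the subset z, and the feature index i are made up purely for illustration):

```python
import numpy as np

d = 4
x = np.array([0.5, -1.2, 3.0, 0.7])   # a d-dimensional input

z = np.array([1, 0, 1, 0])            # subset containing features 0 and 2
masked = x * z                        # x ⊙ z: absent features are zeroed out

# z_{+i}: the subset constrained to include feature i;
# z_{-i}: the same subset with feature i removed.
i = 2
z_plus_i = z.copy(); z_plus_i[i] = 1
z_minus_i = z.copy(); z_minus_i[i] = 0
```

Elementwise multiplication with the indicator vector is the $x \odot z$ operation used throughout the theorems below.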

3. EXPLAINER ASTUTENESS

Our main interest is in defining a metric that captures the difference in explanations an explainer assigns to points that are close to each other in the input space. The same question has been asked for classifiers. Bhattacharjee & Chaudhuri (2020) introduced the concept of astuteness of classifiers, which captures the probability that similar points are assigned the same label by a classifier. Formally, they provide the following definition:

Definition 1. Astuteness of classifiers (Bhattacharjee & Chaudhuri, 2020): The astuteness of a classifier $f$ over $\mathcal{D}$, denoted $A_r(f, \mathcal{D})$, is the probability that for all $x, x' \in \mathcal{D}$ such that $d(x, x') \leq r$ the classifier will predict the same label:
$$A_r(f, \mathcal{D}) = P_{x, x' \sim \mathcal{D}}\left[f(x) = f(x') \mid d(x, x') \leq r\right] \qquad (1)$$

The obvious difference when adapting this definition of astuteness to explainers is that explanations for nearby points do not have to be exactly the same. Keeping this in mind, we propose and formalize explainer astuteness as the probability that the explainer assigns similar explanations to similar points. The formal definition is as follows:

Definition 2. Explainer astuteness: The explainer astuteness of an explainer $E$ over $\mathcal{D}$, denoted $A_{r,\lambda}(E, \mathcal{D})$, is the probability that for all $x, x' \in \mathcal{D}$ such that $d_p(x, x') \leq r$ the explainer $E$ provides explanations $\phi(x), \phi(x')$ that are at most $\lambda \cdot d_p(x, x')$ away from each other, where $\lambda \geq 0$:
$$A_{r,\lambda}(E, \mathcal{D}) = P_{x, x' \sim \mathcal{D}}\left[d_p(\phi(x), \phi(x')) \leq \lambda \cdot d_p(x, x') \mid d_p(x, x') \leq r\right] \qquad (2)$$

A critical observation about Definition 2 is that it not only relates to the previously defined notion of classifier astuteness, but also connects to the concept of probabilistic Lipschitzness. Probabilistic Lipschitzness captures the probability of a function being locally smooth given a radius $r$. It is especially useful for capturing a notion of smoothness for complicated neural network functions, for which enforcing global and deterministic Lipschitzness is difficult. Mangal et al. (2020) formally defined probabilistic Lipschitzness as follows:

Definition 3. Probabilistic Lipschitzness (Mangal et al., 2020): Given $0 \leq \alpha \leq 1$ and $r \geq 0$, a function $f : \mathcal{X} \to \mathbb{R}$ is probabilistically Lipschitz with a constant $L \geq 0$ if
$$P_{x, x' \sim \mathcal{D}}\left[d_p(f(x), f(x')) \leq L \cdot d_p(x, x') \mid d_p(x, x') \leq r\right] \geq 1 - \alpha \qquad (3)$$
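Both Definition 2 and Definition 3 can be estimated empirically by checking all sufficiently close pairs in a sample. The following sketch (the function names and the exhaustive pairwise loop are our own; it assumes a scalar-output predictor f and an attribution function phi) mirrors the conditional probabilities in equations 2 and 3:

```python
import numpy as np

def probabilistic_lipschitz_estimate(f, X, L, r, p=2):
    """Empirical estimate of P[d_p(f(x), f(x')) <= L * d_p(x, x') | d_p(x, x') <= r]
    over all pairs in the sample X (a sketch of Definition 3)."""
    hits, total = 0, 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = np.linalg.norm(X[i] - X[j], ord=p)
            if 0 < dx <= r:
                total += 1
                df = np.linalg.norm(np.atleast_1d(f(X[i]) - f(X[j])), ord=p)
                hits += df <= L * dx
    return hits / total if total else 1.0

def explainer_astuteness_estimate(phi, X, lam, r, p=2):
    """Empirical estimate of A_{r,lambda} (Definition 2); phi(x) returns the
    d-dimensional attribution vector for x."""
    hits, total = 0, 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = np.linalg.norm(X[i] - X[j], ord=p)
            if 0 < dx <= r:
                total += 1
                dphi = np.linalg.norm(phi(X[i]) - phi(X[j]), ord=p)
                hits += dphi <= lam * dx
    return hits / total if total else 1.0
```

This is exactly the estimation strategy used in the experiments of Section 4, where the probability is evaluated over all training pairs within radius $r$.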

3.1. THEORETICAL BOUNDS OF ASTUTENESS

A cursory comparison between equation 2 and equation 3 hints at the two concepts being related. In fact, explainer astuteness can be viewed as the probabilistic Lipschitzness of the explainer when the explainer is itself viewed as a function with Lipschitz constant $\lambda$. However, a much more interesting question is how the astuteness of explainers is connected to the Lipschitzness of the black-box model they are trying to explain. We introduce and prove the following theorems, which provide theoretical bounds connecting the two.

3.1.1. ASTUTENESS OF SHAP

Lundberg & Lee (2017) unify six existing explanation approaches within the SHAP framework. Each of these explanation approaches (such as DeepLIFT and kernelSHAP) can be viewed as an approximation of SHAP, since SHAP in its theoretical form is difficult to calculate. However, in this section we use the theoretical definition of SHAP to establish bounds on astuteness. For a given data point $x \in \mathcal{X}$ and a prediction function $f$, the feature attribution provided by SHAP for the $i$-th feature is given by:
$$\phi_i(x) = \sum_{z_{-i}} \frac{|z_{-i}|!\,(d - |z_{-i}| - 1)!}{d!}\left[f(x \odot z_{+i}) - f(x \odot z_{-i})\right] \qquad (4)$$

Before moving on to the actual theorem, we introduce and prove the following lemma, which is necessary for the proof of Theorem 3.1.

Lemma 1. If
$$P_{x, x' \sim \mathcal{D}}\left[d_p(f(x), f(x')) \leq L \cdot d_p(x, x') \mid d_p(x, x') \leq r\right] \geq 1 - \alpha,$$
then for $y = x \odot z_{+i}$, $y' = x' \odot z_{+i}$, i.e. $y, y' \in \bigcup_k N_k$ with $N_k = \{y \mid y \in \mathbb{R}^d, \|y\|_0 = k, y_i \neq 0\}$ for $k = 1, \ldots, d$,
$$P_{x, x' \sim \mathcal{D}}\left[d_p(f(y), f(y')) \leq L \cdot d_p(y, y') \mid d_p(y, y') \leq r\right] \geq 1 - \beta$$
where $\beta \geq \alpha$, assuming that the distribution $\mathcal{D}$ is defined for all $x$ and $y$. The equality is approached if the probability of sampling points from the set $N_k$ approaches zero for $k = 2, \ldots, d$ relative to the probability of sampling points from $N_1$.

Proof.
(Sketch, full proof in Appendix A.) Assume $p_k$ is the probability of occurrence of the set $N_k = \{x \mid x \in \mathbb{R}^d, \|x\|_0 = k, x_i \neq 0\}$ in the input space, and $\gamma_k$ is the probability of the set of points that violate Lipschitzness within $N_k$. In the finite case, each set $N_k$ can be mapped to a set $N'_k$ of cardinality $2^{d-k}|N_k|$ after masking with all possible $z_{+i}$. In probability terms, the probability of $N'_k$ can be written as
$$p'_k = \frac{2^{d-k} p_k}{\sum_{j=1}^{d} 2^{d-j} p_j} = \frac{2^{-k} p_k}{\sum_{j=1}^{d} 2^{-j} p_j}.$$
Let $\beta$ be the proportion of points in all $N'_k$ that also violate Lipschitzness in their unmasked form; then $\beta$ can be written as
$$\beta = \frac{\sum_{k=1}^{d} 2^{-k} p_k \gamma_k}{\sum_{j=1}^{d} 2^{-j} p_j}.$$
Considering the worst-case $\beta$ requires solving the following problem:
$$\beta^* = \max_{\gamma_1, \ldots, \gamma_d} \frac{\sum_{k=1}^{d} 2^{-k} p_k \gamma_k}{\sum_{j=1}^{d} 2^{-j} p_j} \quad \text{s.t.} \quad \sum_{i=1}^{d} p_i \gamma_i = \alpha, \;\; 0 \leq \alpha \leq 1, \;\; 0 \leq \gamma_i \leq 1 \;\; \forall i = 1, \ldots, d.$$
The result of this maximization is $\beta^* \geq \alpha$. In the specific case where $p_k \to 0$ for $k = 2, \ldots, d$ (i.e., where the probability of sampling any $x$ with a $0$-valued element is $0$), $\beta \to \alpha$.

Theorem 3.1. (Astuteness of SHAP) Consider a given $r \geq 0$ and $0 \leq \alpha \leq 1$, and a trained predictive function $f$ that is probabilistically Lipschitz with a constant $L$, radius $r$ measured using $d_p(\cdot, \cdot)$, and probability at least $1 - \alpha$. Then for SHAP explainers we have astuteness $A_{r,\lambda} \geq 1 - \beta$ for $\lambda = 2\sqrt[p]{d}L$, where $\beta \geq \alpha$ and $\beta \to \alpha$ under the conditions specified in Lemma 1.

Proof. Given input $x$ and another input $x'$ s.t. $d_p(x, x') \leq r$, let $C_z = \frac{|z_{-i}|!\,(d - |z_{-i}| - 1)!}{d!}$.
Using equation 4 we can write
$$d_p(\phi_i(x), \phi_i(x')) = \Big\| \sum_{z_{-i}} C_z\left[f(x \odot z_{+i}) - f(x \odot z_{-i})\right] - \sum_{z_{-i}} C_z\left[f(x' \odot z_{+i}) - f(x' \odot z_{-i})\right] \Big\|_p \qquad (6)$$
Combining the two sums and re-arranging the R.H.S.,
$$d_p(\phi_i(x), \phi_i(x')) = \Big\| \sum_{z_{-i}} C_z\left[f(x \odot z_{+i}) - f(x' \odot z_{+i}) + f(x' \odot z_{-i}) - f(x \odot z_{-i})\right] \Big\|_p \qquad (7)$$
Using the triangle inequality on the R.H.S. twice,
$$d_p(\phi_i(x), \phi_i(x')) \leq \Big\| \sum_{z_{-i}} C_z\left[f(x \odot z_{+i}) - f(x' \odot z_{+i})\right] \Big\|_p + \Big\| \sum_{z_{-i}} C_z\left[f(x' \odot z_{-i}) - f(x \odot z_{-i})\right] \Big\|_p \leq \sum_{z_{-i}} C_z \big\|f(x \odot z_{+i}) - f(x' \odot z_{+i})\big\|_p + \sum_{z_{-i}} C_z \big\|f(x' \odot z_{-i}) - f(x \odot z_{-i})\big\|_p \qquad (8)$$
We can replace each term inside the sums in equation 8 with the maximum term across either sum. Doing so preserves the inequality in equation 8, as a sum of $n$ values is at most the maximum of those values summed $n$ times. Without loss of generality, assume this maximum is $\|f(x \odot z^*_{+i}) - f(x' \odot z^*_{+i})\|_p$ for some particular $z^*$. This gives us:
$$d_p(\phi_i(x), \phi_i(x')) \leq \big\|f(x \odot z^*_{+i}) - f(x' \odot z^*_{+i})\big\|_p \sum_{z_{-i}} C_z + \big\|f(x \odot z^*_{+i}) - f(x' \odot z^*_{+i})\big\|_p \sum_{z_{-i}} C_z \qquad (9)$$
However, $\sum_{z_{-i}} C_z = \sum_{z_{-i}} \frac{|z_{-i}|!\,(d - |z_{-i}| - 1)!}{d!} = 1$, which gives us
$$d_p(\phi_i(x), \phi_i(x')) \leq 2\big\|f(x \odot z^*_{+i}) - f(x' \odot z^*_{+i})\big\|_p = 2\, d_p\big(f(x \odot z^*_{+i}), f(x' \odot z^*_{+i})\big) \qquad (10)$$
Using the fact that $f$ is probabilistically Lipschitz with a given constant $L \geq 0$, that $d_p(x, x') \leq r$, that $d_p(x \odot z^*_{+i}, x' \odot z^*_{+i}) \leq d_p(x, x')$, and Lemma 1, we get:
$$P\left[2\, d_p\big(f(x \odot z^*_{+i}), f(x' \odot z^*_{+i})\big) \leq 2L \cdot d_p(x, x')\right] \geq 1 - \beta$$
Since equation 10 establishes that $d_p(\phi_i(x), \phi_i(x')) \leq 2\, d_p(f(x \odot z^*_{+i}), f(x' \odot z^*_{+i}))$, the following inequality can now be established:
$$P\left[d_p(\phi_i(x), \phi_i(x')) \leq 2L \cdot d_p(x, x')\right] \geq 1 - \beta \qquad (11)$$
Note that equation 11 holds for each feature $i \in \{1, \ldots, d\}$.
To conclude our proof, we note that
$$d_p(x, y) = \Big(\sum_{i=1}^{d} |x_i - y_i|^p\Big)^{1/p} \leq \Big(\sum_{i=1}^{d} \max_i |x_i - y_i|^p\Big)^{1/p} = \sqrt[p]{d} \, \max_i d_p(x_i, y_i)$$
Utilizing this in equation 11, and without loss of generality assuming $d_p(\phi_i(x), \phi_i(x'))$ corresponds to the maximum, gives us:
$$P\left[d_p(\phi(x), \phi(x')) \leq 2\sqrt[p]{d}L \cdot d_p(x, x')\right] \geq 1 - \beta \qquad (12)$$
Since $P[d_p(\phi(x), \phi(x')) \leq 2\sqrt[p]{d}L \cdot d_p(x, x')]$ in equation 12 defines $A_{r,\lambda}$ for $\lambda = 2\sqrt[p]{d}L$, this concludes the proof.

Corollary 1. If the prediction function $f$ is locally deterministically $L$-Lipschitz ($\alpha = 0$) at radius $r$, then Shapley explainers are $\lambda$-astute for radius $r \geq 0$ with $\lambda = 2\sqrt[p]{d}L$.

Proof. Note that Definition 3 reduces to the definition of deterministic Lipschitzness when $\alpha = 0$, which means equation 12 holds with probability 1. This concludes the proof.
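As a concrete illustration of the theoretical SHAP attribution in equation 4, the following brute-force sketch enumerates all subsets explicitly (the function name and the zero baseline for absent features are our own assumptions, matching the $x \odot z$ masking used above; the enumeration is exponential in $d$, so this is only viable for toy dimensionalities):

```python
import itertools
import math
import numpy as np

def shap_exact(f, x, baseline=None):
    """Brute-force Shapley attributions per the theoretical SHAP definition:
    phi_i(x) = sum over subsets S not containing i of
               |S|!(d-|S|-1)!/d! * [f(x masked to S+{i}) - f(x masked to S)].
    Absent features are replaced by `baseline` (zeros by default)."""
    d = len(x)
    if baseline is None:
        baseline = np.zeros(d)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):  # subset sizes |S| = 0, ..., d-1
            for S in itertools.combinations(others, k):
                w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                x_with = baseline.copy()
                x_with[list(S) + [i]] = x[list(S) + [i]]     # x ⊙ z_{+i}
                x_without = baseline.copy()
                x_without[list(S)] = x[list(S)]              # x ⊙ z_{-i}
                phi[i] += w * (f(x_with) - f(x_without))
    return phi
```

For a linear predictor with a zero baseline, these attributions reduce to the familiar per-feature contributions, which makes the function easy to sanity-check.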

3.1.2. ASTUTENESS OF "REMOVE INDIVIDUAL" EXPLAINERS

Within the framework of feature removal explainers, a sub-category consists of explainers that work by removing a single feature from the set of all features and calculating feature attributions based on the change in prediction that results from removing that feature. This category includes Occlusion, CXPlain (Schwab & Karlen, 2019), PredDiff (Zintgraf et al., 2017), permutation tests (Strobl et al., 2008), and feature ablation explainers (Lei et al., 2018). "Remove individual" explainers determine the explanation for the $i$-th feature by calculating the difference in prediction with and without that feature included for a given point $x$. Let $z_{-i} \in \{0,1\}^d$ represent a binary vector with $z_i = 0$; then the explanation for feature $i$ can be written as:
$$\phi_i(x) = f(x) - f(x \odot z_{-i}) \qquad (13)$$

Theorem 3.2. (Astuteness of remove-individual explainers) Consider a given $r \geq 0$ and $0 \leq \alpha \leq 1$, and a trained predictive function $f$ that is locally probabilistically Lipschitz with a constant $L$, radius $r$ measured using $d_p(\cdot, \cdot)$, and probability at least $1 - \alpha$. Then for remove-individual explainers we have astuteness $A_{r,\lambda} \geq 1 - \alpha$ for $\lambda = 2\sqrt[p]{d}L$, where $d$ is the dimensionality of the data.

Proof. (Sketch, full proof in Appendix A.) By considering another point $x'$ such that $d_p(x, x') \leq r$ and equation 13, we get
$$d_p(\phi_i(x), \phi_i(x')) = d_p\big(f(x) - f(x \odot z_{-i}),\; f(x') - f(x' \odot z_{-i})\big) \qquad (14)$$
Then, following the exact same steps as the proof of Theorem 3.1, i.e. writing the right-hand side in terms of the $p$-norm, utilizing the triangle inequality, and applying the definition of probabilistic Lipschitzness, leads us to the desired result.

Corollary 2. If the prediction function $f$ is locally $L$-Lipschitz at radius $r \geq 0$, then remove-individual explanations are $\lambda$-astute for radius $r$ with $\lambda = 2\sqrt[p]{d}L$.

Proof. Same as the proof of Corollary 1.
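Equation 13 translates almost directly into code. The sketch below (the function name and the zero-masking convention for removed features are our assumptions; CXPlain in particular measures the change in training loss rather than the raw prediction) computes remove-individual attributions for a generic scalar predictor:

```python
import numpy as np

def remove_individual_attributions(f, x):
    """Remove-individual attributions per equation (13):
    phi_i(x) = f(x) - f(x ⊙ z_{-i}), i.e. the change in prediction
    when feature i alone is zeroed out."""
    d = len(x)
    fx = f(x)
    phi = np.empty(d)
    for i in range(d):
        x_masked = x.copy()
        x_masked[i] = 0.0            # x ⊙ z_{-i}
        phi[i] = fx - f(x_masked)
    return phi
```

This requires only $d + 1$ evaluations of $f$, which is why this sub-category of explainers is comparatively cheap.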

3.1.3. ASTUTENESS OF RISE

RISE determines the explanation for the $i$-th feature by sampling subsets of features and calculating the mean value of the prediction function when feature $i$ is included in the subset. The RISE feature attribution for a given point $x$, feature $i$, and prediction function $f$ can be written as:
$$\phi_i(x) = \mathbb{E}_{p(z \mid z_i = 1)}\left[f(x \odot z)\right] \qquad (15)$$
The following theorem establishes the bound on $\lambda$ for the explainer astuteness of RISE in relation to the Lipschitzness of the black-box prediction function.

Theorem 3.3. (Astuteness of RISE) Consider a given $r \geq 0$, and a trained predictive function $f$ that is locally deterministically Lipschitz with a constant $L$ (i.e., $\alpha = 0$) at radius $r$ measured using $d_p(\cdot, \cdot)$. Then the RISE explainer is $\lambda$-astute for radius $r$ with $\lambda = \sqrt[p]{d}L$.

Proof. (Sketch, full proof in Appendix A.) Given input $x$ and another input $x'$ s.t. $d_p(x, x') \leq r$, using equation 15 we can write
$$d_p(\phi_i(x), \phi_i(x')) = d_p\big(\mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z)], \mathbb{E}_{p(z \mid z_i = 1)}[f(x' \odot z)]\big) = \big\|\mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z)] - \mathbb{E}_{p(z \mid z_i = 1)}[f(x' \odot z)]\big\|_p = \big\|\mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z) - f(x' \odot z)]\big\|_p \qquad (16)$$
Using Jensen's inequality on the R.H.S., followed by the fact that $\mathbb{E}[f] \leq \max f$,
$$d_p(\phi_i(x), \phi_i(x')) \leq \max_z d_p\big(f(x \odot z), f(x' \odot z)\big) \qquad (17)$$
Using the fact that $f$ is deterministically Lipschitz and that $d_p(\phi(x), \phi(x')) \leq \sqrt[p]{d} \max_i d_p(\phi_i(x), \phi_i(x'))$ gives us
$$P\left[d_p(\phi(x), \phi(x')) \leq \sqrt[p]{d}L \cdot d_p(x, x')\right] \geq 1 \qquad (18)$$
Since $P[d_p(\phi(x), \phi(x')) \leq \sqrt[p]{d}L \cdot d_p(x, x')]$ defines $A_{r,\lambda}$ for $\lambda = \sqrt[p]{d}L$, this concludes the proof.
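The expectation in equation 15 is approximated by Monte-Carlo sampling over random masks, which is how RISE operates in practice. A minimal sketch follows (the i.i.d. Bernoulli mask distribution, the sample count, and the function name are our assumptions; RISE as published also upsamples smooth masks for images):

```python
import numpy as np

def rise_attributions(f, x, n_samples=2000, keep_prob=0.5, seed=0):
    """Monte-Carlo RISE attributions per equation (15):
    phi_i(x) ≈ mean of f(x ⊙ z) over random masks z with z_i forced to 1."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    # sample binary masks z ~ Bernoulli(keep_prob) per coordinate
    Z = (rng.random((n_samples, d)) < keep_prob).astype(float)
    for i in range(d):
        Zi = Z.copy()
        Zi[:, i] = 1.0                # condition on z_i = 1
        phi[i] = np.mean([f(x * z) for z in Zi])
    return phi
```

Reusing the same base masks across features keeps the per-feature estimates correlated, which reduces variance in the attribution differences.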

3.2. IMPLICATIONS

The above theoretical results all carry the same critical implication: explainer astuteness is lower bounded in terms of the Lipschitzness of the prediction function. This means that black-box classifiers that are locally smooth (have a small $L$ at a given radius $r$) lend themselves to probabilistically more robust explanations. This work thus provides theoretical support for the importance of enforcing smoothness of classifiers to the astuteness of explanations. Note that while this implication makes intuitive sense, proving it for specific explainers is non-trivial, as demonstrated by the three theorems above. The statement holds true for all three explainers when the classifier can be assumed to be deterministically Lipschitz; the conditions under which it remains true for probabilistic Lipschitzness vary in each case. For Theorem 3.1 we have to assume that the distribution $\mathcal{D}$ is defined over masked data in addition to the input data, and ideally that the probability of sampling masked data is significantly smaller than the probability of sampling points with no value exactly equal to 0. For Theorem 3.2 the statement is true without additional assumptions. For Theorem 3.3 we can only prove the statement for the deterministic case.

4. EXPERIMENTS

To demonstrate the validity of our theoretical results, we perform a series of experiments. We train four different classifiers on each of five datasets, and then explain the decisions of these classifiers using three explainers. We use three simulated datasets and two real datasets from the UCI repository (Asuncion & Newman, 2007), namely Rice (Cinar & Koklu, 2019) and Telescope (Ferenc et al., 2005). Details for these datasets can be found in Appendix B.

For each dataset we train the following four classifiers. 2layer: a two-layer MLP with ReLU activations; for simulated datasets each layer has 200 neurons, while for the 2 real datasets we use 32 neurons in each layer. 4layer: a four-layer MLP with ReLU activations, with the same number of neurons per layer as 2layer. linear: a linear classifier. svm: a support vector machine with a Gaussian kernel. The idea is that each of these classifiers will have a different Lipschitz behavior which, according to our theoretical results, should lower bound the explainer astuteness when explaining each of these classifiers. We evaluate three explainers that are representative of our three theorems.

4.1. EFFECT OF LIPSCHITZ CONSTRAINTS

Following Gouk et al. (2021)'s proposal, we constrain the Lipschitz constant of each layer by adding a projection step during training: after each update, the weight matrices are projected to a feasible set if they violate the constraints on the Lipschitz constant; the constraints can be controlled via a hyperparameter. We use this method to train a four-layer MLP with a high, a low, and no Lipschitz constraint. We then calculate the astuteness of each of our explainers for all three versions of this neural network. Figure 2 shows the results. The goal of this set of experiments is to demonstrate the relationship between the Lipschitz regularity of a NN and the astuteness of explainers. As the same NN is trained on the same data but with different levels of Lipschitz constraints enforced, the astuteness of explainers varies accordingly.
In all cases we see astuteness reaching 1 at smaller values of λ for the same NN when it is highly constrained (lower Lipschitz constant L) vs. less constrained or unconstrained. The results provide empirical evidence in support of the main conclusion that can be drawn from our work: enforcing Lipschitzness on classifiers lends them to more astute post-hoc explanations.
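The projection step used in these experiments can be sketched as follows for a single linear layer (a simplified, hypothetical version of the Gouk et al. (2021) scheme; a real training loop would apply this to every layer after each optimizer step, and the operator norm used depends on the chosen layer-wise metric):

```python
import numpy as np

def project_lipschitz(W, c):
    """Rescale the weight matrix W so its spectral norm (largest singular
    value) is at most c, which bounds the Lipschitz constant of the
    corresponding linear layer under the 2-norm. If W already satisfies
    the constraint, it is returned unchanged."""
    sigma = np.linalg.norm(W, ord=2)  # largest singular value
    return W * (c / sigma) if sigma > c else W
```

Because a composition of layers is Lipschitz with constant at most the product of the per-layer constants, constraining each layer to norm $c$ bounds the whole (ReLU) network by $c^{\text{(number of layers)}}$, giving the hyperparameter-controlled constraint described above.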

4.2. ESTIMATING PROBABILISTIC LIPSCHITZNESS AND LOWER BOUND FOR ASTUTENESS

To demonstrate the connection between explainer astuteness and probabilistic Lipschitzness alluded to by our theory, we need to estimate the probabilistic Lipschitzness of classifiers. In our experiments we achieve this by empirically estimating the probability $P_{x, x' \sim \mathcal{D}}$ in equation 3 for a range of values of $L \in (0, 1)$ in increments of 0.1. We do this for each classifier and each dataset $\mathcal{D}$, and set $r$ to the median pairwise distance between all training points. According to equation 3, this yields the largest $1 - \alpha$ for which we can say that, for a given $L$ and $r$, the classifier is Lipschitz with probability at least $1 - \alpha$.

We can use these estimates of probabilistic Lipschitzness to predict the lower bound on astuteness given by our theorems. We do this by noting that our theorems imply that for $\lambda = CL\sqrt{d}$, explainer astuteness is at least $1 - \alpha$. This means we can guarantee that for $\lambda \geq CL\sqrt{d}$ explainer astuteness should be lower bounded by $1 - \alpha$.

For each dataset-classifier-explainer combination we can plot two curves: first, the predicted lower bound on explainer astuteness given a classifier, as described in the previous paragraph; second, the actual estimates of explainer astuteness using Definition 2. According to our theoretical results, at a given $\lambda$ the estimated explainer astuteness should stay above the predicted astuteness based on the Lipschitzness of the classifier. We show these curves in Appendix Figure 3 but summarize them in tabular form in Table 1 to conserve space. The table shows the difference between the area under the estimated astuteness curves (AUC) and the area under the predicted lower bound (AUC_lb). This number captures the average gap above the lower bound over a range of $\lambda$ values. Note that the values are all positive, supporting our result as a lower bound.
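The estimation procedure described here can be sketched end-to-end as follows (the function name, the scalar-predictor assumption, and the explicit L grid are our own; per Theorems 3.1-3.3, $C = 2$ for SHAP and remove-individual explainers and $C = 1$ for RISE):

```python
import numpy as np

def predicted_astuteness_lower_bound(f, X, r, d, C=2.0, p=2,
                                     L_grid=np.arange(0.1, 1.01, 0.1)):
    """For each L in the grid, empirically estimate 1 - alpha (the
    probability in equation 3 over all close pairs in X), then map L to
    the lambda at which the theorems guarantee astuteness >= 1 - alpha,
    via lambda = C * L * sqrt(d). Returns (lambdas, lower_bounds)."""
    ratios = []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            dx = np.linalg.norm(X[i] - X[j], ord=p)
            if 0 < dx <= r:
                ratios.append(abs(f(X[i]) - f(X[j])) / dx)
    ratios = np.array(ratios)
    lambdas, bounds = [], []
    for L in L_grid:
        one_minus_alpha = float(np.mean(ratios <= L)) if len(ratios) else 1.0
        lambdas.append(C * L * np.sqrt(d))
        bounds.append(one_minus_alpha)
    return np.array(lambdas), np.array(bounds)
```

Plotting `bounds` against `lambdas`, alongside the astuteness estimated directly from Definition 2, produces the two curves compared in Appendix Figure 3 and summarized in Table 1.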

5. CONCLUSION, LIMITATIONS AND BROADER IMPACT

In this paper we formally defined explainer astuteness, which captures the probability that a given explainer will assign similar explanations to similar points. We theoretically prove that this explainer astuteness is lower bounded by a quantity determined by the probabilistic Lipschitzness of the black-box function being explained. As probabilistic Lipschitzness captures the local smoothness properties of a function, this result suggests that enforcing smoothness on black-box models can lend these models to more robust explanations.

In terms of limitations, we observe that our empirical results suggest that our predicted lower bound can be tightened further. One possible conjecture is that the tightness of this bound depends on how different explainers calculate attribution scores; e.g., empirically we observe that RISE and SHAP (which both depend on expectations over subsets) behave similarly to each other but differently from CXPlain. Some explainers, such as LIME for tabular data, have the option to use a discretization step prior to calculating feature attributions. As a consequence, two observations with all features belonging to the same bins would receive exactly the same explanation, whereas two arbitrarily close inputs may receive completely different explanations (when the number of perturbed samples is large (Garreau & von Luxburg, 2020)). In that sense, tabular LIME would not be astute by our formulation, regardless of classifier Lipschitzness. Robustness is also only one property of a reliable explainer; there are other properties investigated in recent literature, as we outline in Section 1. These other properties, e.g. faithfulness (Agarwal et al., 2022), may also be theoretically probed in ways very similar to what we did for robustness. Additionally, robustness can sometimes be at odds with correctness (see, for example, Zhou et al. (2022) and "Logic Trap 3" in Ju et al. (2022)) and is best viewed as one part of explanation reliability and trustworthiness (Zhou et al., 2022).

From a broader societal impact perspective, we would like to make it clear that merely enforcing Lipschitzness on black-box classifiers should not be considered as doing enough to make them more transparent and interpretable. Our work is intended as a call to action for the field to concentrate more on improving black-box models for explainability purposes when they are conceptualized and trained, and it provides one of possibly many ways to achieve that goal.

A DETAILED PROOFS

We include the detailed proofs for Lemma 1, Theorems 3.3 and 3.2 here. Proof. (Proof for Lemma 1) Let us assume, p k = P [N k ], s.t.N k = {x | x ∈ R d , ||x|| 0 = k, x i ̸ = 0} and let L be the set of points that violate Lipschitzness, then assume, γ k = P [ L | N k ] given that α is the probability of the set of points that violate Lipschitzness across D, we can use Bayes' rule to write, α = P [ L] = d k=1 p k γ k If we consider the case where the sets N k are finite, each N k can be mapped to a set N ′ k of cardinality, |N ′ k | = d-k b=0 d -k b |N k | = 2 d-k |N k | In more general terms, the probability of N ′ k can be written as, p ′ k = P [N ′ k ] = 2 d-k p k d j=1 2 d-j p j = 2 -k p k d j=1 2 -j p j Let us define β as the proportion of points in all N ′ k that also violate Lipschitzness in their unmasked form. This leads us to the following equation for β β = d k=1 2 -k p k γ k d j=1 2 -j pj The worse case β would then be obtained by considering a maximization over γ k , β * = max γ 1 ,...,γ d d k=1 2 -k p k γ k d j=1 2 -j pj , d i=1 piγi = α, 0 ≤ α ≤ 1, 0 ≤ γi ≤ 1, ∀i = 1, . . . , d This constrained optimization problem can be solved by assigning γ k = 1 for the largest p k until the budget α is exhausted where only a fractional value of γ can be assigned, and 0 for the remaining values of k. This β * ≥ α in general. In the specific case where p k → 0 for k = 2, . . . , d, when compared to p 1 (i.e. where the probability of sampling a point from D such that any of the values are exactly 0 is very small compared to the probability of sampling points with all non-zero values which would generally be the case for sampling real data), β * → α Proof. 
Proof. (For Theorem 3.2) Consider another point $x'$ such that $d_p(x, x') \le r$. By equation 13 we get
$$d_p(\phi_i(x), \phi_i(x')) = d_p\big(f(x) - f(x \odot z_{-i}),\; f(x') - f(x' \odot z_{-i})\big).$$
Using the fact that $d_p(x, y) = \|x - y\|_p$, where $\|\cdot\|_p$ is the $p$-norm, the right-hand side gives us
$$d_p(\phi_i(x), \phi_i(x')) = \|f(x) - f(x \odot z_{-i}) - f(x') + f(x' \odot z_{-i})\|_p.$$
Using the triangle inequality,
$$d_p(\phi_i(x), \phi_i(x')) \le \|f(x) - f(x')\|_p + \|f(x' \odot z_{-i}) - f(x \odot z_{-i})\|_p.$$
Assuming w.l.o.g. that the first term on the right is larger than the second,
$$d_p(\phi_i(x), \phi_i(x')) \le 2\|f(x) - f(x')\|_p = 2 d_p(f(x), f(x')).$$
Using the fact that $f$ is probabilistically Lipschitz,
$$P\big[d_p(\phi_i(x), \phi_i(x')) \le 2 L \, d_p(x, x')\big] \ge 1 - \alpha.$$
To conclude the proof, note that $d_p(\phi(x), \phi(x')) \le \sqrt[p]{d} \cdot \max_i d_p(\phi_i(x), \phi_i(x'))$, which gives us
$$P\big[d_p(\phi(x), \phi(x')) \le 2 \sqrt[p]{d} L \cdot d_p(x, x')\big] \ge 1 - \alpha. \qquad \square$$

Proof. (For Theorem 3.3) Given input $x$ and another input $x'$ s.t. $d_p(x, x') \le r$, using equation 15 we can write
$$d_p(\phi_i(x), \phi_i(x')) = d_p\big(\mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z)],\; \mathbb{E}_{p(z \mid z_i = 1)}[f(x' \odot z)]\big) = \big\|\mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z)] - \mathbb{E}_{p(z \mid z_i = 1)}[f(x' \odot z)]\big\|_p = \big\|\mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z) - f(x' \odot z)]\big\|_p.$$
Using Jensen's inequality on the right-hand side,
$$d_p(\phi_i(x), \phi_i(x')) \le \mathbb{E}_{p(z \mid z_i = 1)}\big[\|f(x \odot z) - f(x' \odot z)\|_p\big].$$
Using the fact that $\mathbb{E}[f] \le \max f$,
$$d_p(\phi_i(x), \phi_i(x')) \le \max_z \|f(x \odot z) - f(x' \odot z)\|_p = \max_z d_p(f(x \odot z), f(x' \odot z)). \quad (28)$$
Since $f$ is deterministically Lipschitz with some constant $L \ge 0$, and $d_p(x \odot z, x' \odot z) \le d_p(x, x')$ for all $z$, the definition of probabilistic Lipschitzness with $\alpha = 0$ gives us
$$P\big[\max_z d_p(f(x \odot z), f(x' \odot z)) \le L \, d_p(x, x')\big] \ge 1. \quad (29)$$
Using this in equation 28 gives us
$$P\big[d_p(\phi_i(x), \phi_i(x')) \le L \, d_p(x, x')\big] \ge 1. \quad (30)$$
Note that equation 30 is true for each feature $i \in \{1, \ldots, d\}$. To conclude the proof, note that $d_p(\phi(x), \phi(x')) \le \sqrt[p]{d} \cdot \max_i d_p(\phi_i(x), \phi_i(x'))$.
Utilizing this with equation 30 leads us to
$$P\big[d_p(\phi(x), \phi(x')) \le \sqrt[p]{d} L \cdot d_p(x, x')\big] \ge 1. \quad (31)$$
Since $P[d_p(\phi(x), \phi(x')) \le \sqrt[p]{d} L \cdot d_p(x, x')]$ defines $A_{\lambda,r}$ for $\lambda \ge \sqrt[p]{d} L$, this concludes the proof. □
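The final bound can be sanity-checked numerically. The sketch below is our own construction, not code from the paper: it uses a linear model $f(x) = w^\top x$, which is $L$-Lipschitz in the 2-norm with $L = \|w\|_2$, computes the RISE-style attribution $\phi_i(x) = \mathbb{E}_{p(z \mid z_i = 1)}[f(x \odot z)]$ exactly by enumerating masks, and verifies the $p = 2$ case of equation 31:

```python
import itertools
import numpy as np

# Toy Lipschitz model: f(x) = w @ x has Lipschitz constant L = ||w||_2.
w = np.array([0.5, -1.0, 2.0, 0.25])
d = len(w)
L = np.linalg.norm(w)

def f(x):
    return w @ x

def phi(x):
    # phi_i(x) = E_{p(z | z_i = 1)}[f(x * z)], computed exactly by
    # enumerating all binary masks z with z_i = 1 (uniform over them).
    attr = np.zeros(d)
    for i in range(d):
        vals = [f(x * np.array(z, dtype=float))
                for z in itertools.product([0, 1], repeat=d) if z[i] == 1]
        attr[i] = np.mean(vals)
    return attr

rng = np.random.default_rng(0)
x = rng.standard_normal(d)
x2 = x + 0.1 * rng.standard_normal(d)

lhs = np.linalg.norm(phi(x) - phi(x2))       # d_2(phi(x), phi(x'))
rhs = d ** 0.5 * L * np.linalg.norm(x - x2)  # sqrt(d) * L * d_2(x, x')
assert lhs <= rhs  # the bound of equation 31 holds for this pair
```

For a deterministically Lipschitz $f$ the inequality must hold for every pair $(x, x')$, so any randomly drawn pair provides a valid (if weak) check of the theorem.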

B DATASET DETAILS

• Orange-skin: The input data is again generated from a 10-dimensional standard Gaussian distribution. The ground-truth class probabilities are proportional to $\exp\{\sum_{i=1}^{4} X_i^2 - 4\}$. In this case the first 4 features are important globally for all data points.

• Nonlinear-additive: Similar to the Orange-skin dataset, except the ground-truth class probabilities are proportional to $\exp\{-100 \sin(2X_1) + 2|X_2| + X_3 + \exp\{-X_4\}\}$; each of the 4 important features is therefore nonlinearly related to the prediction itself.

• Switch: This simulated dataset is designed specifically for instance-wise feature explanations. Feature $X_1$ is generated from a mixture of Gaussian distributions centered at $\pm 3$. If $X_1$ is drawn from the Gaussian centered at $+3$, features $X_2$ to $X_5$ are used to generate the prediction probabilities according to the Orange-skin model; otherwise $X_6$ to $X_9$ are used to generate the prediction probabilities according to the Nonlinear-additive model.

• Rice (Cinar & Koklu, 2019): This dataset consists of 3810 samples of rice grains of two different varieties (Cammeo and Osmancik). 7 morphological features are provided for each sample.

• Telescope (Ferenc et al., 2005): This dataset consists of 19000+ Monte-Carlo generated samples simulating the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. Each sample is labelled as either background or gamma signal and consists of 10 features.
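For concreteness, the three synthetic datasets can be generated roughly as follows. This is a sketch based on our reading of Chen et al. (2018): we assume the "proportional to exp{...}" exponent acts as a logit for $P(Y = 1 \mid X)$, and the function names are our own, not from any released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def _sample_labels(logit):
    # Assumption: each exponent defines a Bernoulli label with
    # P(Y = 1 | X) = sigmoid(logit).
    p = 1.0 / (1.0 + np.exp(-logit))
    return (rng.random(len(logit)) < p).astype(int)

def orange_skin(n, d=10):
    # First 4 features globally important: logit = sum_{i<=4} X_i^2 - 4
    X = rng.standard_normal((n, d))
    return X, _sample_labels(np.sum(X[:, :4] ** 2, axis=1) - 4.0)

def nonlinear_additive(n, d=10):
    # logit = -100 sin(2 X_1) + 2|X_2| + X_3 + exp(-X_4)
    X = rng.standard_normal((n, d))
    logit = (-100.0 * np.sin(2 * X[:, 0]) + 2 * np.abs(X[:, 1])
             + X[:, 2] + np.exp(-X[:, 3]))
    return X, _sample_labels(logit)

def switch(n, d=10):
    # X_1 from a Gaussian mixture centered at +3 / -3; the component
    # decides which block of features drives the label.
    X = rng.standard_normal((n, d))
    comp = rng.random(n) < 0.5
    X[:, 0] = rng.standard_normal(n) + np.where(comp, 3.0, -3.0)
    logit = np.where(
        comp,
        np.sum(X[:, 1:5] ** 2, axis=1) - 4.0,          # Orange-skin on X_2..X_5
        (-100.0 * np.sin(2 * X[:, 5]) + 2 * np.abs(X[:, 6])
         + X[:, 7] + np.exp(-X[:, 8])),                # Nonlinear-additive on X_6..X_9
    )
    return X, _sample_labels(logit)
```

Under this reading, an instance-wise explainer evaluated on Switch should attribute importance to $X_2$-$X_5$ or $X_6$-$X_9$ depending on the mixture component of $X_1$.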

C TRAINING DETAILS

Training splits and hyperparameter choices have relatively little effect on our experiments. Regardless, the details used to produce the reported results are provided here for completeness:

• Train/Test Split: For all synthetic datasets we use 10^6 training points and 10^3 test points. The neural network classifiers were trained with a batch size of 1000 for 2 epochs, while the SVM was trained with the default parameters of https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html. For the Telescope and Rice datasets, test set sizes of 5% and 33% were used respectively, with a batch size of 32 trained for 100 epochs. The SVM was again trained with default parameters.

• Radius r: For all experiments we set the radius equal to the median of pairwise distances. This is standard practice and also yields an r large enough that we can sample sufficient points to provide empirical estimates.
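The choice of r above amounts to a few lines of NumPy; a minimal sketch (the helper name is our own) for the median pairwise Euclidean distance:

```python
import numpy as np

def median_radius(X):
    # All pairwise Euclidean distances via broadcasting; keep only the
    # upper triangle (i < j) so each pair is counted exactly once.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)
    return np.median(D[i, j])

# Example: pairwise distances {5, 0, 5} -> median 5
print(median_radius(np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])))  # -> 5.0
```

The broadcasted distance matrix is O(n^2 d) in memory, so for large datasets the median is typically estimated on a random subsample of pairs.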

D ADDITIONAL RESULTS

Table 2 shows the normalized AUC for the estimated explainer astuteness and the predicted AUC based on the predicted lower-bound curve. As expected, the predicted AUC lower bounds the estimated AUC. Figure 3 shows the same plots as shown in Figure ?? but includes all datasets. For each combination of dataset, classifier, and explainer we observe that the estimated explainer astuteness for SHAP, RISE, and CXPLAIN is lower bounded by the astuteness predicted by our theoretical results for a given value of λ. The predicted lower bound is depicted by dashed lines, while solid lines depict the actual estimate of explainer astuteness.



SHAP: https://github.com/slundberg/shap, RISE: https://github.com/eclique/RISE, CXPLAIN: https://github.com/d909b/cxplain



Figure 2: Regularizing the Lipschitzness of a neural network during training results in higher astuteness for the same value of λ. Higher regularization results in a lower Lipschitz constant (Gouk et al., 2021). Astuteness reaches 1 for smaller values of λ with Lipschitz-regularized training, as expected from our theorems. The error bars represent results across 5 runs to account for randomness in explainer runs.

Figure 3: This figure experimentally shows the implications of our theoretical results. It corresponds to the AUC values shown in Table 1. For each combination of dataset, classifier, and explainer we observe that the estimated explainer astuteness for SHAP, RISE, and CXPLAIN is lower bounded by the astuteness predicted by our theoretical results for a given value of λ. The predicted lower bound is depicted by dashed lines, while solid lines depict the actual estimate of explainer astuteness.

Figure 4: Regularizing the Lipschitzness of a neural network during training results in higher astuteness for the same value of λ. Higher regularization results in a lower Lipschitz constant (Gouk et al., 2021). Astuteness reaches 1 for smaller values of λ with Lipschitz-regularized training, as expected from our theorems. The error bars represent results across 5 runs to account for randomness in training.

We relate the Lipschitz constant L of the black-box model to the astuteness of various explainers, including SHAP (Lundberg & Lee, 2017), RISE (Petsiuk et al., 2018), and methods that simulate individual feature removal such as CXPlain (Schwab & Karlen, 2019).

The gradient-based approximation and the Kernel SHAP approximation of SHAP (Lundberg & Lee, 2017), used for the NN classifiers and the SVM respectively, serve as representatives of Theorem 3.1; both are included in the implementation provided by the authors. We modify the implementation of RISE (Petsiuk et al., 2018), provided by the authors for image datasets, to work with tabular datasets; this serves as a representative for Theorem 3.3.

AUC − AUC_lb (↓). The observed AUC is lower bounded by the predicted AUC. As expected, the difference between the two is always ≥ 0.

Observed AUC and (Predicted AUC). The observed AUC is lower bounded by the predicted AUC, so the observed AUC should always be higher than the predicted AUC. The AUC values are normalized between 0 and 1. [Table columns: Datasets, then SHAP, RISE, CXP, and the predicted lower bound (LB) for each classifier.]


We utilize three simulated datasets introduced by Chen et al. (2018), namely Orange-skin (OS), Nonlinear-additive (NA), and Switch, and two real-world datasets from the UCI Machine Learning Repository.

