AXIOMATIC EXPLAINER LOCALITY WITH OPTIMAL TRANSPORT

Abstract

Explainability methods have been notoriously difficult to evaluate and compare. Because of this, practitioners are often left guessing as to which explainer they should use for their task. Locality is one critical property of explainers that grants insight into the diversity of the explanations they produce. In this paper, we define a set of axioms that align with natural intuition regarding globalness, the inverse of locality. We then introduce Wasserstein Globalness, a novel measure that uses optimal transport to quantify how local or global a given explainer is. Finally, we provide theoretical results describing the sample complexity of estimating Wasserstein Globalness, and experimentally demonstrate how globalness can be used to effectively compare explainers. These results illustrate connections between globalness and both explainer fidelity and explainer robustness.

1. INTRODUCTION

Machine Learning (ML) models are increasingly complex and capable of impressive performance in several domains. However, as models become more complex, they also become less interpretable. For this reason, researchers have begun to explore the topic of explainability, where model decisions are assigned an explanation. These explanations come in many forms, but often indicate how important each feature is to the model's prediction. Explainers need to be trustworthy for their explanations to be valuable. However, ML practitioners have very little information at their disposal when deciding which explainer is right for them. Unlike traditional ML models, whose accuracy can be computed on held-out testing datasets, there is no obvious metric by which we can compare explainers. Ground-truth explanations are rarely known, meaning we cannot directly compute the accuracy of an explainer. Some authors have tried to argue that their explainer is best by proposing a variety of pseudo-accuracy metrics. When explaining image classifiers, for example, one may follow the lead of Zhang et al. (2018) and Wang et al. (2020) by evaluating explainers based on how well they concentrate their saliency map's energy around the object of interest. This kind of evaluation metric is far from perfect, as it penalizes explainers for using scene context and is heavily biased towards concentrated saliency. In addition, these pseudo-accuracy metrics are only applicable to the specific task of feature attribution for image data, when in reality there are many other types of explanations and many other types of data. Clearly, it would be of great interest to the explainability community to be able to compare and contrast general-purpose explainers for any ML task.

Globalness is a property that can be used to compare and contrast explainers. When one explanation fully explains the model's behavior, we call this a global explanation. In the past, models were typically explained globally.
For example, feature selection would be done globally, meaning it would generate a single group of salient features for the entire dataset (Song et al., 2010; Yu & Liu, 2004; John et al., 1994; Dy & Brodley, 2004). More recently, in order to explain complex black-box models, it has become common to generate instance-wise explanations rather than a single explanation for the entire model. In this case, the explainer outputs local explanations which apply to only a subset of the model inputs. Since this distinction between local explainers and global explainers emerged, researchers have begun to acknowledge locality/globalness as a property of explainers. Globalness is also related to the concepts of stability and robustness. The robustness of an explainer is limited by its globalness, and we can even use an explainer's globalness to measure its local robustness, since an explainer that is robust to small changes in the input will be near-global in a local neighborhood of the input space. This is another way in which a measure of globalness can facilitate a better understanding of our explainers and allow us to compare and contrast their behavior.

In this paper, we study the property of globalness. We provide a novel way to measure it and present several examples of how to use it in practice. To our knowledge, this is the first work to provide a formal measure of explainer locality/globalness. We believe that this advances the field by enabling a more thorough analysis of various explanation techniques. With our measure of globalness, one can compare and contrast explainers in a way that was previously difficult and heuristic. The contributions of the paper are as follows. First, we introduce axiomatic properties which align with human intuition surrounding the notion of locality/globalness. Second, we propose Wasserstein Globalness, a novel measure of globalness that satisfies all of these properties.
We also present theoretical results regarding the sample complexity of estimating Wasserstein Globalness. Finally, through our experiments, we demonstrate how explainers can be differentiated by their globalness, and make a connection between globalness and adversarial robustness.
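To make the intuition concrete, the following sketch contrasts a perfectly global explainer (one explanation shared by every input) with a highly local one, scoring each by the Wasserstein distance between its empirical explanation distribution and a uniform reference over the explanation space. This is only an illustration of the idea, not the paper's exact definition: the uniform reference, the `wasserstein2` helper, and the toy explanation sets are all assumptions made here. For two equal-size empirical distributions with uniform weights, the Wasserstein distance reduces to an optimal point matching, which we solve with `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein2(a, b):
    """W2 distance between two equal-size empirical distributions,
    computed via an optimal one-to-one matching of their points."""
    cost = cdist(a, b, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)  # minimum-cost matching
    return np.sqrt(cost[rows, cols].mean())

rng = np.random.default_rng(0)
n, d = 200, 2
reference = rng.uniform(-1, 1, size=(n, d))  # maximally local reference

# A global explainer emits the same attribution vector for every input;
# a local explainer's attributions vary from input to input.
global_expl = np.tile([[0.3, -0.3]], (n, 1))
local_expl = rng.uniform(-1, 1, size=(n, d))

g_global = wasserstein2(global_expl, reference)
g_local = wasserstein2(local_expl, reference)
```

Under this toy instantiation, the global explainer's point-mass distribution sits far from the diffuse reference while the local explainer's spread-out explanations lie close to it, so `g_global` exceeds `g_local`, giving a continuous score rather than a binary local/global label.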

2. RELATED WORK

As mentioned before, the community has struggled to effectively evaluate explainers, since real-world data rarely comes with ground-truth explanations. For this reason, several authors have turned to synthetic data, where the relevance of features is known (Chen et al., 2018; 2017). Instead of accuracy, others seek to describe explainer properties like transparency, sparsity, or robustness in order to inform ML practitioners' choice of explainer (Zhang et al., 2021). Because of the interaction between the data and the explainer, we often want to describe properties of an explainer when applied to a specific data sample, like locality. A measure of globalness which accounts for both the data and the explainer would provide a quantitative method for comparing and contrasting different explainers. Several authors have discussed the locality/globalness of explainers. For example, Doshi-Velez & Kim (2017) describe two types of explainers: local and global. They claim that global explainers are useful for scientific understanding or bias detection, while local explainers are useful for understanding specific model predictions. Zhang et al. (2021) survey the interpretability literature and categorize explainers along several axes. One of these axes is related to the locality of the explainer, where explainers are either "global", "local", or "semi-local". The distinctions provided by Doshi-Velez & Kim (2017) and Zhang et al. (2021) are categorical rather than continuous. Though the community has begun to consider this property, there is still no formal continuous measure of locality/globalness. One common explainer, LIME, requires the user to specify a kernel width which is roughly related to locality/globalness (Ribeiro et al., 2016). Anchors are rule-based explanations given by a constrained optimization problem, where the objective is the "coverage" of the anchor (Ribeiro et al., 2018).
An anchor's coverage indicates how broadly applicable the rule is, so it too is directly related to locality/globalness, but it is tied into the objective of the explainer. Some authors, like Ribeiro et al. (2016), offer ways to construct a single global explanation from many local explanations, thereby granting a higher-level understanding of the model. While this offers explanations at two levels of globalness, we emphasize that this is again only a binary distinction. While works like this acknowledge the property of locality and its importance to explainability, they offer no way to quantify the locality of an explainer. The need for such a measure is becoming increasingly apparent as more researchers study the theory of explainability. For example, Li et al. (2020) define an object called the neighborhood disjointedness factor in order to study the generalization of finite-sample-based local approximation explainers. Neighborhood disjointedness roughly measures how far apart points are from one another, and could be applied to explanations as a measure of globalness. However, this is limited to a small class of explainers, and does not apply to general explanation frameworks. We advance the existing literature by formalizing the property of explainer globalness and proposing a method for measuring it in practice. This is a new tool that the machine learning community can use to analyze and compare explainers.
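The kernel width mentioned above for LIME gives a simple picture of how a single parameter moves an explainer along the local-global spectrum. The sketch below fits a LIME-style weighted linear surrogate to a known nonlinear function; it is a minimal illustration with assumed names (`lime_style_slope`, the toy function `f`), not the actual LIME implementation, and it uses an exponential proximity kernel in place of LIME's full sampling procedure.

```python
import numpy as np

def lime_style_slope(f, x0, xs, width):
    """Slope of a weighted linear surrogate of f around x0 (1-D).
    Samples near x0 are up-weighted by an exponential kernel."""
    w = np.exp(-((xs - x0) ** 2) / width ** 2)   # proximity weights
    X = np.column_stack([np.ones_like(xs), xs])  # intercept + feature
    sw = np.sqrt(w)
    # weighted least squares via rescaled ordinary least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], f(xs) * sw, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
xs = rng.uniform(-2, 2, size=2000)
f = lambda x: x ** 3  # true local slope at x0 = 1 is f'(1) = 3

local_slope = lime_style_slope(f, 1.0, xs, width=0.1)    # narrow kernel
global_slope = lime_style_slope(f, 1.0, xs, width=50.0)  # very wide kernel
```

With a narrow kernel the surrogate slope approaches the true derivative at the query point (3 here), while a very wide kernel recovers the global least-squares fit of the cubic over the whole dataset (about 2.4 on this domain): the same explainer smoothly interpolates between local and global behavior, which is exactly the continuum a globalness measure should capture.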

