GLOBAL COUNTERFACTUAL EXPLANATIONS ARE RELIABLE OR EFFICIENT, BUT NOT BOTH

Abstract

Counterfactual explanations have been widely studied in explainability, with a range of application-dependent methods emerging in fairness, recourse and model understanding. The major shortcoming associated with these methods, however, is their inability to provide explanations beyond the local or instance level. While many works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are both reliable and computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing methods, improving the efficiency of Actionable Recourse Summaries (AReS), one of the only known global recourse frameworks, and proposing Global & Efficient Counterfactual Explanations (GLOBE-CE), a novel and flexible framework that tackles the scalability issues associated with current state-of-the-art methods, particularly on higher-dimensional datasets and in the presence of continuous features. Furthermore, we provide a unique mathematical analysis of categorical feature translations, utilising it in our method. Experimental evaluation with real-world datasets and user studies verifies the speed, reliability and interpretability improvements of our framework.

1. INTRODUCTION

Counterfactual explanations (CEs) construct input perturbations that result in desired predictions from machine learning (ML) models (Verma et al., 2020; Karimi et al., 2020; Stepin et al., 2021). A key benefit of these explanations is their ability to offer recourse to affected individuals in certain settings (e.g., automated credit decisioning). Recent years have witnessed a surge of subsequent research, identifying desirable properties of CEs (Wachter et al., 2018; Barocas et al., 2020; Venkatasubramanian & Alfano, 2020), developing the methods to model those properties (Poyiadzi et al., 2020; Ustun et al., 2019; Mothilal et al., 2020; Pawelczyk et al., 2021), and understanding the weaknesses and vulnerabilities of the proposed methods (Dominguez-Olmedo et al., 2021; Slack et al., 2021; Upadhyay et al., 2021; Pawelczyk et al., 2022). Importantly, however, the research efforts thus far have largely centered around local analysis, generating explanations for individual inputs. Such analyses can vet model behaviour at the instance level, though it is seldom obvious that any of the resulting insights would generalise globally. For example, a local CE may suggest that a model is not biased against a protected attribute (e.g., race, gender), despite net biases existing. A potential way to gain such insights is to aggregate local explanations (Lundberg et al., 2020; Pedreschi et al., 2019; Gao et al., 2021), but since the generation of CEs is generally computationally expensive, it is not evident that such an approach would scale well or lead to reliable conclusions about a model's behaviour. Be it during training or post-hoc evaluation, global understanding ought to underpin the development of ML models prior to deployment, and reliability and efficiency play important roles therein. We seek to address this in the context of global counterfactual explanations (GCEs).

1.1. CONTRIBUTIONS: INVESTIGATIONS, IMPLEMENTATIONS & IMPROVEMENTS

Given the current lack of a precise definition, we posit in this work that a GCE should apply to multiple inputs simultaneously, while maximising accuracy across such inputs. For clarity, we distinguish counterfactuals (the altered inputs) from counterfactual explanations (the extension of counterfactuals to any of their possible representations, e.g., translation vectors, rules denoting fixed values, etc.).

Investigations. Section 2 summarises GCE research, introducing the recent Actionable Recourse Summaries (AReS) framework of Rawal & Lakkaraju (2020). We then discuss motivations, defining reliability and justifying our claim that current GCE methods are reliable or efficient, but not both.

Our Framework. Section 3 proceeds to introduce our framework, Global & Efficient Counterfactual Explanations (GLOBE-CE). Though not strictly bound to recourse, our framework has the ability, as in AReS, to seek answers to the big-picture questions regarding a model's recourses (Figure 1), namely the potential disparities between affected subgroups: do race or gender biases exist in the recourses of a model, and can we reliably convey these in an interpretable manner? Our major contribution is a shift in paradigm: all research so far assumes GCEs to be fixed. We instead represent each GCE with a fixed translation vector δ, multiplied by an input-dependent scalar variable k (Figure 1). To determine the direction of each translation, our framework deploys a general CE generation scheme, flexible to various desiderata, including but not limited to sparsity, diversity, actionability and model-specific CEs (Section 2.1). The novelty of our method lies mainly in (a) varying k input-wise and (b) proving that arbitrary translations on one-hot encodings can be expressed using If/Then rules. To the best of our knowledge, this is the first work that mathematically addresses the direct addition of translation vectors to one-hot encodings in the context of CEs.
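The scaled-translation idea can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: the classifier, inputs, direction δ and candidate magnitudes are all hypothetical, and the minimum-k search is a simple grid scan. It shows how a single fixed direction, scaled per input, yields the accuracy and cost statistics a GCE is judged by.

```python
import numpy as np

# Hypothetical stand-in for a black-box binary classifier:
# class 1 (the desired outcome) when the feature sum exceeds 1.
def predict(X):
    return (X.sum(axis=1) > 1.0).astype(int)

def scaled_translation(X, delta, ks):
    """For a fixed direction delta, find the smallest magnitude k per
    input that achieves the desired prediction (class 1)."""
    best_k = np.full(len(X), np.inf)
    for k in ks:
        flipped = predict(X + k * delta) == 1
        best_k = np.where(flipped & (k < best_k), k, best_k)
    covered = np.isfinite(best_k)          # inputs successfully flipped
    accuracy = covered.mean()              # fraction of inputs covered
    costs = best_k[covered] * np.linalg.norm(delta)  # per-input cost
    return accuracy, costs, best_k

# Three inputs currently receiving the undesired prediction (sums <= 1).
X = np.array([[0.2, 0.3], [0.4, 0.5], [0.1, 0.2]])
delta = np.array([1.0, 1.0]) / np.sqrt(2)  # unit-norm direction
ks = np.linspace(0.0, 2.0, 201)            # candidate magnitudes
accuracy, costs, best_k = scaled_translation(X, delta, ks)
```

A single fixed translation (k = 1 for every input) would either under-cover inputs far from the boundary or over-charge inputs close to it; letting k vary per input recovers the minimum cost for each while retaining one interpretable direction.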
AReS Implementations. Section 4 subsequently outlines our AReS and Fast AReS implementations; in the latter we propose amendments to the algorithm and demonstrate that these lead to significant speed and performance improvements on four benchmarked financial datasets. Both implementations are thereafter utilised as baselines in the experimental evaluation of GLOBE-CE.

Improvements. Section 5 evaluates the efficacy of our Fast AReS and GLOBE-CE frameworks along three fundamental dimensions: accuracy (the percentage of inputs with successfully altered predictions), average cost (the difficulty associated with executing successful GCEs) and speed (the time spent computing GCEs). We argue that GCEs that fail to attain maximum accuracy or minimum cost can be misleading, raising concerns around the safety of ML models vetted by such explanations. We target these metrics, demonstrating significant speedups at concurrently higher accuracies and lower costs. User studies comprising ML practitioners additionally demonstrate the ability of the GLOBE-CE framework to more reliably detect recourse biases where previous methods fall short.

Figure 1: Left: GLOBE-CE scaled translations. We argue that, while many translation directions cannot be interpreted, we can optimise GCEs by allowing variable magnitudes per input. Right: Example comparisons with synthetic ForeignWorker subgroups. Left to right: accuracy-cost trade-offs (covering more inputs requires larger magnitudes), minimum costs per input, and the mean translation direction for each subgroup.

2.1. LOCAL COUNTERFACTUAL EXPLANATIONS: INSTANCE-LEVEL MODEL INSIGHTS

Wachter et al. (2018) is one of the earliest introductions of CEs in the context of understanding black-box ML models, defining CEs as points close to the query input (w.r.t. some distance metric) that result in a desired prediction. This inspired several follow-up works proposing desirable properties of CEs and presenting approaches to generate them. Mothilal et al. (2020) argue the importance of diversity, while other approaches aim to generate plausible CEs by considering proximity to the data manifold (Poyiadzi et al., 2020; Van Looveren & Klaise, 2021; Kanamori et al., 2020) or by accounting for causal relations among input features (Mahajan et al., 2019). Actionability of recourse is another important desideratum, suggesting certain features be excluded or limited (Ustun et al., 2019).

In another direction, some works generate CEs for specific model categories, such as tree-based (Lucic et al., 2022; Tolomei et al., 2017; Parmentier & Vidal, 2021) or differentiable (Dhurandhar et al., 2018) models. Detailed surveys on CEs naturally follow (Karimi et al., 2020; Verma et al., 2020).
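The categorical claim from Section 1.1, that a fixed translation over one-hot encodings induces If/Then rules, can be illustrated with a toy example. The categories, translation values and the argmax re-discretisation step below are illustrative assumptions for the sketch, not the paper's formal construction:

```python
import numpy as np

# Toy one-hot group for a single categorical feature (hypothetical values).
categories = ["HS-Diploma", "Bachelors", "Masters"]

def translate_onehot(x, delta):
    """Add a translation to a one-hot vector, then re-discretise via
    argmax (an assumed decoding step for this sketch)."""
    return np.argmax(x + delta)

def induced_rules(delta):
    """Enumerate the deterministic If/Then rule a fixed translation
    implies for each possible starting category."""
    rules = []
    for i, cat in enumerate(categories):
        x = np.eye(len(categories))[i]   # one-hot vector for category i
        j = translate_onehot(x, delta)
        if j != i:                       # category changed: emit a rule
            rules.append(f"If {cat}, Then {categories[j]}")
    return rules

# A translation pushing weight away from "HS-Diploma" and onto "Masters".
delta = np.array([-0.6, 0.0, 0.5])
rules = induced_rules(delta)
# -> ["If HS-Diploma, Then Masters"]; "Bachelors" and "Masters" are unmoved
```

Because the rule set depends only on δ and the starting category, a single continuous translation vector admits a finite, human-readable summary over categorical features, which is what makes the representation interpretable at the global level.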

