GLOBAL COUNTERFACTUAL EXPLANATIONS ARE RELIABLE OR EFFICIENT, BUT NOT BOTH

Abstract

Counterfactual explanations have been widely studied in explainability, with a range of application-dependent methods emerging in fairness, recourse and model understanding. The major shortcoming of these methods, however, is their inability to provide explanations beyond the local or instance level. While many works touch upon the notion of a global explanation, typically suggesting that masses of local explanations be aggregated in the hope of ascertaining global properties, few provide frameworks that are both reliable and computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing methods, improving the efficiency of Actionable Recourse Summaries (AReS), one of the only known global recourse frameworks, and proposing Global & Efficient Counterfactual Explanations (GLOBE-CE), a novel and flexible framework that tackles the scalability issues of the current state of the art, particularly on higher-dimensional datasets and in the presence of continuous features. Furthermore, we provide a unique mathematical analysis of categorical feature translations, which we utilise in our method. Experimental evaluation on real-world datasets, together with user studies, verifies the speed, reliability and interpretability improvements of our framework.

1. INTRODUCTION

Counterfactual explanations (CEs) construct input perturbations that result in desired predictions from machine learning (ML) models (Verma et al., 2020; Karimi et al., 2020; Stepin et al., 2021). A key benefit of these explanations is their ability to offer recourse to affected individuals in certain settings (e.g., automated credit decisioning). Recent years have witnessed a surge of subsequent research, identifying desirable properties of CEs (Wachter et al., 2018; Barocas et al., 2020; Venkatasubramanian & Alfano, 2020), developing methods to model those properties (Poyiadzi et al., 2020; Ustun et al., 2019; Mothilal et al., 2020; Pawelczyk et al., 2021), and understanding the weaknesses and vulnerabilities of the proposed methods (Dominguez-Olmedo et al., 2021; Slack et al., 2021; Upadhyay et al., 2021; Pawelczyk et al., 2022). Importantly, however, research efforts thus far have largely centred on local analysis, generating explanations for individual inputs. Such analyses can vet model behaviour at the instance level, though it is seldom obvious that the resulting insights generalise globally. For example, a local CE may suggest that a model is not biased against a protected attribute (e.g., race, gender), even though a net bias exists. A potential way to gain such insights is to aggregate local explanations (Lundberg et al., 2020; Pedreschi et al., 2019; Gao et al., 2021), but since generating CEs is generally computationally expensive, it is not evident that such an approach would scale well or lead to reliable conclusions about a model's behaviour. Whether during training or in post-hoc evaluation, global understanding ought to underpin the development of ML models prior to deployment, and reliability and efficiency play important roles therein. We seek to address this in the context of global counterfactual explanations (GCEs).
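To make the local setting above concrete, a minimal sketch of a CE search is given below. This is a toy illustration only, not any method from the literature or from this paper: the classifier, the greedy gradient-based perturbation, and the function name `local_counterfactual` are all illustrative assumptions.

```python
import numpy as np

# Toy linear classifier: predict 1 (e.g., "loan approved") if w.x + b > 0.
w = np.array([1.0, -2.0])
b = -0.5
predict = lambda x: int(np.dot(w, x) + b > 0)

def local_counterfactual(x, step=0.05, max_iter=1000):
    """Greedily perturb x along the score gradient (here simply w)
    until the model's prediction flips to the desired class (1)."""
    x_cf = x.astype(float).copy()
    for _ in range(max_iter):
        if predict(x_cf) == 1:
            return x_cf          # counterfactual found
        x_cf += step * w / np.linalg.norm(w)  # steepest ascent on the score
    return None                  # no counterfactual within the budget

x = np.array([0.0, 0.5])   # originally rejected: w.x + b = -1.5 < 0
x_cf = local_counterfactual(x)
```

The returned `x_cf` is the counterfactual (altered input); the translation `x_cf - x` is one possible representation of the corresponding counterfactual explanation. Note that this sketch is run per input, which is exactly why naively aggregating local CEs over a large dataset becomes expensive.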

1.1. CONTRIBUTIONS: INVESTIGATIONS, IMPLEMENTATIONS & IMPROVEMENTS

Given the current lack of a precise definition, we posit in this work that a GCE should apply to multiple inputs simultaneously while maximising accuracy across those inputs. For clarity, we distinguish counterfactuals (the altered inputs themselves) from counterfactual explanations (any representation of those counterfactuals, e.g., translation vectors or rules denoting fixed values). Investigations. Section 2 summarises GCE research, introducing the recent Actionable Recourse Summaries (AReS) framework of Rawal & Lakkaraju (2020). We then discuss motivations, defining reliability and justifying our claim that current GCE methods are reliable or efficient, but not both.
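The working definition above, a single explanation applied to multiple inputs and scored by the fraction it succeeds on, can be sketched as follows. This is an illustrative assumption of the setting, not the paper's algorithm: the toy classifier, the candidate translation `delta`, and the helper `gce_accuracy` are all hypothetical.

```python
import numpy as np

# Toy linear classifier over a batch: predict 1 where X @ w + b > 0.
w = np.array([1.0, -2.0])
b = -0.5
predict = lambda X: (X @ w + b > 0).astype(int)

def gce_accuracy(X_rejected, delta):
    """Accuracy of one candidate GCE: the fraction of rejected inputs
    whose prediction flips when the single translation delta is applied."""
    return predict(X_rejected + delta).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X_rej = X[predict(X) == 0]            # the negatively classified group
delta = 0.8 * w / np.linalg.norm(w)   # one candidate global translation
acc = gce_accuracy(X_rej, delta)
```

A single `delta` rarely flips every rejected input, which is the tension the paper's title names: covering the whole group reliably while keeping the search over candidate explanations efficient.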

