COUNTERNET: END-TO-END TRAINING OF PREDICTION-AWARE COUNTERFACTUAL EXPLANATIONS

Abstract

Counterfactual (or CF) explanations are a type of local explanation for Machine Learning (ML) model predictions: they offer a contrastive case by finding the smallest changes (in feature space) to an input data point that lead to a different prediction by the ML model. Existing CF explanation techniques suffer from two major limitations: (i) all of them are post-hoc methods designed for use with proprietary ML models; as a result, their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations; and (ii) most of them rely on solving separate, time-intensive optimization problems to find a CF explanation for each input data point, which negatively impacts their runtime. This work makes a novel departure from the prevalent post-hoc paradigm of generating CF explanations by presenting CounterNet, an end-to-end learning framework which integrates predictive model training and the generation of counterfactual (CF) explanations into a single pipeline. We adopt a block-wise coordinate descent procedure which helps in effectively training CounterNet's network. Our extensive experiments on multiple real-world datasets show that CounterNet generates high-quality predictions, consistently achieves 100% CF validity and low proximity scores (thereby achieving a well-balanced cost-invalidity trade-off) for any new input instance, and runs 3X faster than existing state-of-the-art baselines.

1. INTRODUCTION

A counterfactual (CF) explanation offers a contrastive case: to explain the prediction made by a Machine Learning (ML) model on a data point x, CF explanation methods find a new counterfactual example x′, which is similar to x but receives a different (or opposite) prediction from the ML model. From an end-user perspective, CF explanation methods 1 (Wachter et al., 2017) may be preferable to other methods of explaining ML models, as they can be used to offer recourse to vulnerable groups. For example, if a person applies for a loan and gets rejected by a bank's ML algorithm, CF explanation methods can suggest corrective measures to the loan applicant, which can be incorporated into a future loan application to improve their chances of approval. Generating high-quality CF explanations is a challenging problem because of the need to balance the cost-invalidity trade-off (Rawal et al., 2020) between: (i) the invalidity, i.e., the probability that a CF example is invalid, meaning it does not achieve the desired (or opposite) prediction from the ML model; and (ii) the cost of change, i.e., the L1-norm distance between the input instance x and the CF example x′. Figure 1 illustrates this trade-off by showing three different CF examples for an input instance x. If invalidity is ignored (and only the cost of change is optimized), the generated CF example can be trivially set to x itself. Conversely, if the cost of change is ignored (and only invalidity is optimized), the generated CF example can be set to x′_2 (or any sufficiently distant instance with a different label). More generally, CF examples with high (low) invalidity usually imply a low (high) cost of change. To optimally balance this trade-off, it is critical for CF explanation methods to have access to the
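The two sides of this trade-off can be made concrete with a short sketch. Below, a hypothetical helper `cf_metrics` (not from the paper) scores a candidate CF example against a binary classifier: validity checks whether the model's prediction flips, and cost is the L1-norm distance defined above. The toy linear "model" is purely illustrative.

```python
import numpy as np

def cf_metrics(model_predict, x, x_cf):
    """Score a counterfactual x_cf for input x against a binary classifier.

    model_predict: callable mapping a feature vector to a class label (0 or 1).
    Returns (valid, cost): whether the prediction flips, and the L1 distance.
    """
    valid = model_predict(x_cf) != model_predict(x)   # invalidity = 1 - valid
    cost = float(np.sum(np.abs(np.asarray(x_cf) - np.asarray(x))))  # L1 norm
    return bool(valid), cost

# Toy linear "model": predicts 1 iff the feature sum exceeds 1.
predict = lambda v: int(np.sum(v) > 1.0)

x = np.array([0.2, 0.3])       # predicted class 0
x_trivial = x.copy()           # zero cost, but invalid (prediction unchanged)
x_far = np.array([5.0, 5.0])   # valid (prediction flips), but very high cost

print(cf_metrics(predict, x, x_trivial))  # (False, 0.0)
print(cf_metrics(predict, x, x_far))      # (True, 9.5)
```

The two extreme candidates mirror the text: `x_trivial` minimizes cost while maximizing invalidity, and `x_far` does the opposite; a good CF method seeks a low-cost point just across the decision boundary.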



CF explanations are closely related to algorithmic recourse (Ustun et al., 2019) and contrastive explanations (Dhurandhar et al., 2018). Although these terms were proposed in different contexts, their differences from CF explanations have become blurred (Verma et al., 2020; Stepin et al., 2021), i.e., these terms are often used interchangeably.

