ON THE TRADE-OFF BETWEEN ACTIONABLE EXPLANATIONS AND THE RIGHT TO BE FORGOTTEN

Abstract

As machine learning (ML) models are increasingly deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten", which gives users the right to have their data deleted. Another key principle is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms, and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates of the predictive model. For the setting of differentiable models, we suggest a framework to identify a minimal subset of critical training points which, when removed, maximizes the fraction of invalidated recourses. Using our framework, we empirically show that the removal of as few as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Thus, our work raises fundamental questions about the compatibility of the "right to an actionable explanation" with the "right to be forgotten", while also providing constructive insights on the factors that determine recourse robustness.

1. INTRODUCTION

Machine learning (ML) models make a variety of consequential decisions in domains such as finance, healthcare, and policy. To protect users, laws such as the European Union's General Data Protection Regulation (GDPR) (GDPR, 2016) or the California Consumer Privacy Act (CCPA) (OAG, 2021) constrain the usage of personal data and ML model deployments. For example, individuals who have been adversely impacted by the predictions of these models have the right to recourse (Voigt & Von dem Bussche, 2017), i.e., a constructive instruction on how to act to arrive at a more desirable outcome (e.g., change a model prediction from "loan denied" to "approved"). Several approaches in recent literature have tackled the problem of providing recourses by generating instance-level counterfactual explanations (Wachter et al., 2018; Ustun et al., 2019; Karimi et al., 2020; Pawelczyk et al., 2020a). Complementarily, data protection laws provide users with greater authority over their personal data. For instance, users are granted the right to withdraw consent to the usage of their data at any time (Biega & Finck, 2021). These regulations affect technology platforms that train their ML models on personal user data under the respective legal regime. Legal scholars have argued that the continued use of ML models relying on deleted data instances could be deemed illegal (Villaronga et al., 2018). Irrespective of the underlying mandate, data deletion has raised a number of algorithmic research questions. In particular, recent literature has focused on the efficiency of deletion (i.e., how to delete individual data points without retraining the model (Ginart et al., 2019; Golatkar et al., 2020a)) and on the model accuracy aspects of data deletion (i.e., how to remove data without compromising model accuracy (Biega et al., 2020; Goldsteen et al., 2021)). An aspect of data deletion that has not been examined before is whether and how data deletion may impact model explanation frameworks.
Thus, there is a need to understand and systematically characterize the limitations of recourse algorithms when personal user data may need to be deleted from trained ML models. Indeed, deletion of certain data instances might invalidate actionable model explanations, both for the deleting user and, critically, for unsuspecting other users. Such invalidations can be especially problematic in cases where users have already started to take costly actions to change their model outcomes based on previously received explanations. In this paper, we formally examine the problem of algorithmic recourse in the context of data deletion requests. We consider the setting where a small set of individuals has decided to withdraw their data and, as a consequence of the deletion request, the model needs to be updated (Ginart et al., 2019). In particular, this work tackles the following pressing question: What is the worst impact that a deleted data instance can have on recourse validity? We approach this question by considering two distinct scenarios. The first scenario considers to what extent the outdated recourses still lead to a desirable prediction (e.g., loan approval) on the updated model. For this scenario, we suggest a robustness measure called recourse outcome instability to quantify the fragility of recourse methods. Second, we consider the setting where the recourse action is updated as a consequence of the prediction model update. In this case, we study what maximal change in recourse will be required to maintain the desirable prediction. To quantify the extent of this second problem, we suggest the notion of recourse action instability. Given these robustness measures, we derive and analyze theoretical worst-case guarantees on the maximal instability induced for linear models and for neural networks in the overparameterized regime, which we study through the lens of neural tangent kernels.
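As a rough empirical illustration of these two measures (not the paper's formal definitions), the sketch below computes a recourse outcome instability, i.e., the share of previously valid recourses that the updated model no longer honors, and a worst-case recourse action instability, i.e., the largest shift in the prescribed actions. All model and variable names here are hypothetical:

```python
import numpy as np

def recourse_outcome_instability(model_before, model_after, recourses):
    """Fraction of previously valid recourses that no longer receive the
    favorable prediction (label 1) after the deletion-induced model update.
    `model_before` / `model_after` are any objects exposing a `predict`
    method that returns 0/1 labels."""
    before = model_before.predict(recourses)
    after = model_after.predict(recourses)
    # A recourse is invalidated if it was favorable before but not after.
    invalidated = (before == 1) & (after == 0)
    return invalidated.mean()

def recourse_action_instability(actions_before, actions_after, p=2):
    """Largest p-norm shift between the recourse actions issued before and
    after the model update (worst case over the affected individuals)."""
    diffs = np.linalg.norm(actions_before - actions_after, ord=p, axis=1)
    return diffs.max()
```

For instance, a linear classifier whose decision boundary rotates after a deletion request can invalidate half of the issued recourses, which the first measure reports as 0.5.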
We furthermore define an optimization problem for empirically quantifying recourse instability under data deletion. For a given trained ML model, we identify small sets of data points that maximize the proposed instability measures when deleted. Since the brute-force approach (i.e., retraining the model for every possible removal set) is computationally intractable, we propose two relaxations for recourse instability maximization that can be optimized (i) using end-to-end gradient descent or (ii) via a greedy approximation algorithm. To summarize, in this work we make the following key contributions:

• Novel recourse robustness problem. We introduce the problem of recourse invalidation under the right to be forgotten by defining two new recourse instability measures.

• Theoretical analysis. Through rigorous theoretical analysis, we identify the factors that determine the instability of recourses when users whose data is part of the training set submit deletion requests.

• Tractable algorithms. Using our instability measures, we present an optimization framework to identify a small set of critical training data points which, when removed, invalidates most of the issued recourses.

• Comprehensive experiments. We conduct extensive experiments on multiple real-world data sets for both regression and classification tasks with our proposed algorithms, showing that the removal of even one point from the training set can invalidate up to 95 percent of all recourses output by state-of-the-art methods.

Our results also have practical implications for system designers. First, our analysis and algorithms help identify parameters and model classes leading to higher stability when a trained ML model is subjected to deletion requests. Furthermore, our proposed methods can provide an informed path towards practical implementations of data minimization (Finck & Biega, 2021), as one could argue that data points contributing to recourse instability could be minimized out.
Hence, our methods could raise designers' awareness and improve the compliance of their trained models.
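The greedy relaxation described above can be sketched as follows. This is an illustrative stand-in, not the paper's algorithm: `train_fn` and all other names are hypothetical, and full retraining after each candidate deletion takes the place of the faster model-update schemes referenced earlier.

```python
import numpy as np

def greedy_critical_points(X, y, recourses, train_fn, k=2):
    """Greedily delete, one at a time, the training point whose removal
    invalidates the largest share of the issued recourses.
    `train_fn(X, y)` returns a fitted model with a 0/1 `predict` method."""
    removed = []
    keep = np.ones(len(X), dtype=bool)
    for _ in range(k):
        best_i, best_frac = None, -1.0
        for i in np.flatnonzero(keep):
            trial = keep.copy()
            trial[i] = False  # tentatively delete point i
            model = train_fn(X[trial], y[trial])
            # Share of recourses not receiving the favorable label anymore.
            frac = (model.predict(recourses) == 0).mean()
            if frac > best_frac:
                best_i, best_frac = i, frac
        keep[best_i] = False  # commit the most damaging deletion
        removed.append(best_i)
    return removed
```

The inner loop retrains once per remaining candidate, so each of the `k` greedy steps costs one retraining pass over the data set; this is what the gradient-based relaxation avoids.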

2. RELATED WORK

Algorithmic Approaches to Recourse. Several approaches in recent literature have been suggested to generate recourse for users who have been negatively impacted by model predictions (Tolomei et al., 2017; Laugel et al., 2017; Dhurandhar et al., 2018; Wachter et al., 2018; Ustun et al., 2019;  

