ON THE IMPACT OF ADVERSARIALLY ROBUST MODELS ON ALGORITHMIC RECOURSE

Abstract

The widespread deployment of machine learning models in various high-stakes settings has underscored the need for ensuring that individuals who are adversely impacted by model predictions are provided with a means for recourse. To this end, several algorithms have been proposed in recent literature to generate recourses. Recent research has also demonstrated that the recourses generated by these algorithms often correspond to adversarial examples. This key finding emphasizes the need for a deeper understanding of the impact of adversarially robust models (which are designed to guard against adversarial examples) on algorithmic recourse. In this work, we make one of the first attempts at studying the impact of adversarially robust models on algorithmic recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of the recourses output by state-of-the-art algorithms when the underlying models are adversarially robust. More specifically, we construct theoretical bounds on the differences between the cost and the validity of the recourses generated by various state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. We also carry out extensive empirical analysis with multiple real-world datasets to not only validate our theoretical results, but also analyze the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our theoretical and empirical analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thereby shedding light on the inherent trade-offs between achieving adversarial robustness in predictive models and providing easy-to-implement and reliable algorithmic recourse.

1. INTRODUCTION

As machine learning (ML) models are increasingly being deployed in high-stakes domains such as banking, healthcare, and criminal justice, it becomes critical to ensure that individuals who have been adversely impacted (e.g., loan denied) by the predictions of these models are provided with a means for recourse. To this end, several techniques have been proposed in recent literature to provide recourses to affected individuals by generating counterfactual explanations which highlight what features need to be changed and by how much in order to flip a model's prediction (Wachter et al., 2017; Dhurandhar et al., 2018; Ustun et al., 2019a; Pawelczyk et al., 2020a; Karimi et al., 2021a;b; Verma et al., 2020) foot_0. For instance, Wachter et al. (2017) proposed a gradient-based approach which returns the nearest counterfactual resulting in the desired prediction. Ustun et al. (2019a) proposed an integer programming-based approach to obtain actionable recourses for linear classifiers. More recently, Karimi et al. (2021b; 2020c) leveraged the causal structure of the underlying data for generating recourses (Barocas et al., 2020; Mahajan et al., 2019; Pawelczyk et al., 2020a). Prior research has also theoretically and empirically analyzed the properties of the recourses generated by state-of-the-art algorithms. For instance, several recent works (Rawal et al., 2021; Pawelczyk et al., 2020a; Dominguez-Olmedo et al., 2022; Upadhyay et al., 2021b) demonstrated that the recourses output by state-of-the-art algorithms are not robust to small perturbations to input instances, underlying model parameters, and to the recourses themselves. More recently, Pawelczyk et al. (2022a) demonstrated that the recourses output by state-of-the-art algorithms are very similar to adversarial examples.
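The gradient-based scheme of Wachter et al. (2017) can be illustrated with a minimal sketch. The sketch below is an assumption-laden toy (a hand-specified logistic "loan" model and hand-picked hyperparameters `lam`, `lr`, `steps`, all hypothetical), not the authors' implementation: it searches for the nearest counterfactual by descending a loss that trades off a squared prediction term against an L1 distance (cost) term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wachter_counterfactual(x, w, b, target=1.0, lam=0.1, lr=0.1, steps=500):
    """Gradient-based counterfactual search in the spirit of Wachter et al. (2017):
    minimize (f(x') - target)^2 + lam * ||x' - x||_1 over x'."""
    x_cf = x.copy()
    for _ in range(steps):
        pred = sigmoid(w @ x_cf + b)
        # gradient of the squared prediction loss, through the sigmoid
        grad_pred = 2 * (pred - target) * pred * (1 - pred) * w
        # subgradient of the L1 distance (cost) term
        grad_dist = lam * np.sign(x_cf - x)
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

# toy linear "loan" model: x is denied; the counterfactual should flip the prediction
w, b = np.array([1.0, 2.0]), -1.0
x = np.array([-1.0, -0.5])            # sigmoid(w @ x + b) < 0.5 -> denied
x_cf = wachter_counterfactual(x, w, b)
print(sigmoid(w @ x_cf + b) > 0.5)    # counterfactual obtains the desired prediction
```

The L1 penalty is what makes the returned point a *recourse* rather than an arbitrary flip: it keeps the counterfactual close to the original instance, i.e., cheap to implement.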
This finding is critical because there have been several efforts in the literature on adversarial ML (Huang et al., 2011; Kurakin et al., 2016; Biggio & Roli, 2018) to build adversarially robust models that are not susceptible to adversarial examples. However, the impact of such models on the quality and the correctness of the recourses output by state-of-the-art algorithms remains unexplored. The aforementioned connections between adversarial examples and recourses underscore the need for a deeper investigation of the impact of adversarially robust models (which are designed to guard against adversarial examples) on algorithmic recourse. Such an investigation becomes particularly critical because the adversarial robustness of predictive models, as well as the ability to obtain easy-to-implement and reliable recourses, have often been touted as cornerstones of trustworthy and safe ML both by prior research and by recent regulations (Voigt & Von dem Bussche, 2017; Hamon et al., 2020). However, there is no prior work that investigates the relationship and/or the trade-offs between these two critical pillars of trustworthy and safe ML. In this work, we address the aforementioned gaps by making one of the first attempts at studying the impact of adversarially robust models on algorithmic recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of the recourses output by state-of-the-art algorithms when the underlying models are adversarially robust. More specifically, we construct theoretical bounds on the differences between the cost and the validity of the recourses generated by various state-of-the-art algorithms (e.g., gradient-based (Wachter et al., 2017; Laugel et al., 2017) and manifold-based (Pawelczyk et al., 2020b) methods) when the underlying models are adversarially robust vs. non-robust (see Section 4).
To this end, we first derive theoretical bounds on the differences between the weights (parameters) of adversarially robust vs. non-robust models and then leverage these to bound the differences in the costs and validity of the recourses corresponding to these two sets of models. We also carried out extensive empirical analysis with multiple real-world datasets from diverse domains. This analysis not only validated our theoretical bounds, but also unearthed several interesting insights pertaining to the relationship between adversarial robustness of predictive models and algorithmic recourse. More specifically, we found that the cost differences between the recourses corresponding to adversarially robust vs. non-robust models increase as the degree of robustness of the adversarially robust models increases. We also observed that the validity of recourses worsens as the degree of robustness of the underlying models increases. We further probed these insights by visualizing the resulting recourses in low dimensions using t-SNE plots, and found that the number of valid recourses around a given instance reduces as the degree of robustness of the underlying model increases.
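The two quantities analyzed throughout, cost (ease of implementation) and validity (probability of obtaining the positive prediction), are straightforward to operationalize. The sketch below is an illustrative assumption, not the paper's evaluation code: it measures cost as L1 distance (other norms are equally common) and validity as the fraction of counterfactuals that a toy linear classifier accepts; the names `recourse_cost` and `recourse_validity` are hypothetical.

```python
import numpy as np

def recourse_cost(x, x_cf):
    """Cost as ease of implementation, here measured as L1 distance."""
    return np.abs(x_cf - x).sum()

def recourse_validity(model_predict, counterfactuals):
    """Fraction of recourses that actually obtain the positive prediction."""
    preds = np.array([model_predict(x_cf) for x_cf in counterfactuals])
    return float((preds == 1).mean())

# toy linear classifier: predict 1 iff w @ x + b >= 0
w, b = np.array([1.0, 2.0]), -1.0
predict = lambda x: int(w @ x + b >= 0)

cfs = [np.array([1.0, 0.5]), np.array([0.0, 0.0])]   # the second one is invalid
print(recourse_validity(predict, cfs))               # -> 0.5
print(recourse_cost(np.array([0.0, 0.0]), cfs[0]))   # -> 1.5
```

Under this framing, the paper's central claim is that as a model's degree of adversarial robustness grows, `recourse_cost` increases and `recourse_validity` decreases for the same recourse-generation algorithms.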

2. RELATED WORK

Algorithmic Recourse. Several approaches have been proposed in recent literature to provide recourses to affected individuals (Dhurandhar et al., 2018; Wachter et al., 2017; Ustun et al., 2019a; Van Looveren & Klaise, 2019; Pawelczyk et al., 2020a; Mahajan et al., 2019; Karimi et al., 2020a;c; Dandl et al., 2020). These approaches can be broadly categorized along the following dimensions (Verma et al., 2020): type of the underlying predictive model (e.g., tree-based vs. differentiable classifier), type of access they require to the underlying predictive model (e.g., black box vs. gradient access), whether they encourage sparsity in counterfactuals (i.e., only a small number of features should be changed), whether counterfactuals should lie on the data manifold, whether the underlying causal relationships should be accounted for when generating counterfactuals, and whether the output produced by the method should be multiple diverse counterfactuals or a single counterfactual. In addition, Rawal & Lakkaraju (2020) also studied how to generate global, interpretable summaries of counterfactual explanations. Some recent works also demonstrated that the recourses output by state-of-the-art techniques might not be robust, i.e., small perturbations to the original instance (Dominguez-Olmedo et al., 2021; Slack et al., 2021), the underlying model (Upadhyay et al., 2021a; Rawal et al., 2021), or the recourse (Pawelczyk et al., 2022c) itself may render the previously prescribed recourse(s) invalid. These works also formulated and solved minimax optimization problems to find robust recourses to address the aforementioned challenges.

Adversarial Examples and Robustness. Prior works have shown that complex machine learning models, such as deep neural networks, are vulnerable to small changes in input (Szegedy et al., 2013). This behavior of predictive models allows for generating adversarial examples (AEs) by



foot_0: The terms counterfactual explanations (Wachter et al., 2017), contrastive explanations (Karimi et al., 2020b), and recourse (Ustun et al., 2019a) have often been used interchangeably in prior literature.

