ON THE IMPACT OF ADVERSARIALLY ROBUST MODELS ON ALGORITHMIC RECOURSE

Abstract

The widespread deployment of machine learning models in various high-stakes settings has underscored the need for ensuring that individuals who are adversely impacted by model predictions are provided with a means for recourse. To this end, several algorithms have been proposed in recent literature to generate recourses. Recent research has also demonstrated that the recourses generated by these algorithms often correspond to adversarial examples. This key finding emphasizes the need for a deeper understanding of the impact of adversarially robust models (which are designed to guard against adversarial examples) on algorithmic recourse. In this work, we make one of the first attempts at studying the impact of adversarially robust models on algorithmic recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of the recourses output by state-of-the-art algorithms when the underlying models are adversarially robust. More specifically, we construct theoretical bounds on the differences between the cost and the validity of the recourses generated by various state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. We also carry out extensive empirical analysis with multiple real-world datasets to not only validate our theoretical results, but also analyze the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our theoretical and empirical analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thereby shedding light on the inherent trade-offs between achieving adversarial robustness in predictive models and providing easy-to-implement and reliable algorithmic recourse.

1. INTRODUCTION

As machine learning (ML) models are increasingly being deployed in high-stakes domains such as banking, healthcare, and criminal justice, it becomes critical to ensure that individuals who have been adversely impacted (e.g., loan denied) by the predictions of these models are provided with a means for recourse. To this end, several techniques have been proposed in recent literature to provide recourses to affected individuals by generating counterfactual explanations, which highlight what features need to be changed and by how much in order to flip a model's prediction (Wachter et al., 2017; Dhurandhar et al., 2018; Ustun et al., 2019a; Pawelczyk et al., 2020a; Karimi et al., 2021a;b; Verma et al., 2020)¹. For instance, Wachter et al. (2017) proposed a gradient-based approach which returns the nearest counterfactual resulting in the desired prediction. Ustun et al. (2019a) proposed an integer programming-based approach to obtain actionable recourses for linear classifiers. More recently, Karimi et al. (2021b; 2020c) leveraged the causal structure of the underlying data to generate recourses (Barocas et al., 2020; Mahajan et al., 2019; Pawelczyk et al., 2020a). Prior research has also theoretically and empirically analyzed the properties of the recourses generated by state-of-the-art algorithms. For instance, several recent works (Rawal et al., 2021; Pawelczyk et al., 2020a; Dominguez-Olmedo et al., 2022; Upadhyay et al., 2021b) demonstrated that the recourses output by state-of-the-art algorithms are not robust to small perturbations to the input instances, to the underlying model parameters, or to the recourses themselves. More recently, Pawelczyk et al.

¹ The terms counterfactual explanations (Wachter et al., 2017), contrastive explanations (Karimi et al., 2020b), and recourse (Ustun et al., 2019a) have often been used interchangeably in prior literature.
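As an illustration, the gradient-based search of Wachter et al. (2017) can be sketched as follows for a simple logistic-regression classifier. This is a minimal sketch, not the paper's exact procedure: the closed-form logistic model, the squared-L2 distance, and the fixed trade-off weight `lam` are illustrative assumptions (Wachter et al. instead anneal the trade-off weight until the counterfactual attains the desired prediction, and consider more general distance functions).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wachter_recourse(x, w, b, target=1.0, lam=0.001, lr=0.1, steps=2000):
    """Gradient-based counterfactual search in the spirit of Wachter et al. (2017).

    Minimizes (f(x_cf) - target)^2 + lam * ||x_cf - x||^2 by gradient descent,
    where f(x) = sigmoid(w @ x + b) is an assumed logistic model. A small `lam`
    prioritizes flipping the prediction over staying close to the original x.
    """
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        # gradient of the prediction-loss term (p - target)^2 w.r.t. x_cf
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w
        # gradient of the distance term lam * ||x_cf - x||^2 w.r.t. x_cf
        grad_dist = 2.0 * lam * (x_cf - x)
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

# Toy example: an individual with a negative prediction (f(x) < 0.5).
w, b = np.array([1.0, 1.0]), -2.0
x = np.array([0.0, 0.0])
x_cf = wachter_recourse(x, w, b)
# x_cf is a nearby point for which the model yields the desired prediction.
```

Note that `x_cf` is, by construction, a small perturbation of `x` that crosses the decision boundary; this is precisely why such recourses often coincide with adversarial examples, and why adversarially robust models (which push the boundary away from the data) can make them costlier to find.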

