PROBABILISTICALLY ROBUST RECOURSE: NAVIGATING THE TRADE-OFFS BETWEEN COSTS AND ROBUSTNESS IN ALGORITHMIC RECOURSE

Abstract

As machine learning models are increasingly employed to make consequential decisions in real-world settings, it becomes critical to ensure that individuals who are adversely impacted by the predictions of these models (e.g., denied a loan) are provided with a means for recourse. While several approaches have been proposed to construct recourses for affected individuals, the recourses output by these methods achieve either low costs (i.e., ease of implementation) or robustness to small perturbations (i.e., noisy implementations of recourses), but not both, due to the inherent trade-offs between recourse costs and robustness. Furthermore, prior approaches do not give end users any agency in navigating these trade-offs. In this work, we address these challenges by proposing the first algorithmic framework that enables users to effectively manage the recourse cost vs. robustness trade-offs. More specifically, our framework Probabilistically ROBust rEcourse (PROBE) lets users choose the probability with which a recourse could get invalidated (recourse invalidation rate) if small changes are made to the recourse, i.e., if the recourse is implemented somewhat noisily. To this end, we propose a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and ensures that the resulting recourse achieves a positive model prediction. We develop novel theoretical results to characterize the recourse invalidation rates corresponding to any given instance w.r.t. different classes of underlying models (e.g., linear models, tree-based models), and leverage these results to efficiently optimize the proposed objective. Experimental evaluation with multiple real-world datasets demonstrates the efficacy of the proposed framework.

1. INTRODUCTION

Machine learning (ML) models are increasingly being deployed to make a variety of consequential decisions in domains such as finance, healthcare, and policy. Consequently, there is a growing emphasis on designing tools and techniques which can provide recourse to individuals who have been adversely impacted by the predictions of these models (Voigt & Von dem Bussche, 2017). For example, when an individual is denied a loan by a model employed by a bank, they should be informed about the reasons for this decision and what can be done to reverse it. To this end, several approaches in recent literature tackled the problem of providing recourse by generating counterfactual explanations (Wachter et al., 2018; Ustun et al., 2019; Karimi et al., 2020a), which highlight what features need to be changed and by how much to flip a model's prediction. While these approaches output low-cost recourses that are easy to implement (i.e., the corresponding counterfactuals are close to the original instances), the resulting recourses suffer from a severe lack of robustness, as demonstrated by prior works (Pawelczyk et al., 2020b; Rawal et al., 2021). For example, these approaches generate recourses which do not remain valid (i.e., result in a positive model prediction) when small changes are made to them (see Figure 1a). However, recourses are often noisily implemented in real-world settings, as noted by prior research (Björkegren et al., 2020). For instance, an individual who was asked to increase their salary by $500 may get a promotion which comes with a raise of $505 or even $499.95.

Figure 1: Pictorial representation of the recourses (counterfactuals) output by various state-of-the-art recourse methods and our framework. The blue line is the decision boundary, and the shaded areas correspond to the regions of recourse invalidation. Fig. 1a shows the recourse output by approaches such as Wachter et al. (2018), where both the recourse cost and the robustness are low. Fig. 1b shows the recourse output by our framework PROBE in response to user input requesting an intermediate level of recourse robustness. Fig. 1c shows the recourse output by approaches such as Dominguez-Olmedo et al. (2022), where both the recourse cost and the robustness are high.

Prior works by Upadhyay et al. (2021) and Dominguez-Olmedo et al. (2022) proposed methods to address some of these challenges and generate robust recourses. While the former constructed recourses that are robust to small shifts in the underlying model, the latter constructed recourses that are robust to small input perturbations. These approaches adapted the classic minimax objective functions commonly employed in the adversarial robustness and robust optimization literature to the setting of algorithmic recourse, and used gradient-descent-style approaches to optimize these functions. To be robust to either small shifts in the model or small input perturbations, these approaches find recourses that are farther away from the underlying model's decision boundaries (Tsipras et al., 2018; Raghunathan et al., 2019), thereby increasing the recourse costs, i.e., the distance between the counterfactuals (recourses) and the original instances. Higher-cost recourses are harder for end users to implement, as they are farther away from the original instance vectors (current user profiles). Putting it all together, these approaches generate robust recourses that are often high in cost and therefore harder to implement (see Figure 1c), without providing end users with any say in the matter.
In practice, each individual user may have a different preference for navigating the trade-offs between recourse costs and robustness: e.g., some users may be willing to tolerate additional cost in exchange for more robustness to noisy responses, whereas other users may not. In this work, we address the aforementioned challenges by proposing a novel algorithmic framework called Probabilistically ROBust rEcourse (PROBE), which enables end users to effectively manage the recourse cost vs. robustness trade-offs by letting them choose the probability with which a recourse could get invalidated (recourse invalidation rate) if small changes are made to the recourse, i.e., if the recourse is implemented somewhat noisily (see Figure 1b). To the best of our knowledge, this work is the first to formulate and address the problem of enabling users to navigate the trade-offs between recourse costs and robustness. Our framework can ensure that a resulting recourse is invalidated with probability at most r when it is noisily implemented, where r is provided as input by the end user requesting recourse. To operationalize this, we propose a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and ensures that the resulting recourse achieves a positive model prediction. We develop novel theoretical results to characterize the recourse invalidation rates corresponding to any given instance w.r.t. different classes of underlying models (e.g., linear models, tree-based models), and leverage these results to efficiently optimize the proposed objective. We also carried out extensive experiments with multiple real-world data sets. Our empirical analysis not only validated our theoretical results but also demonstrated the efficacy of our proposed framework.
More specifically, we found that our framework PROBE generates recourses that are not only up to three times less costly than the recourses output by the robust baseline approaches (Upadhyay et al., 2021; Dominguez-Olmedo et al., 2022), but also more robust (see Table 1). Further, PROBE reliably identified low-cost recourses at various target recourse invalidation rates r for both linear and non-linear classifiers (see Table 1 and Figure 4). In contrast, the robust baseline approaches were not only ill-suited to achieving target recourse invalidation rates but also had trouble finding recourses for non-linear classifiers.

2. RELATED WORK

Algorithmic Approaches to Recourse. As discussed earlier, several approaches have been proposed in the literature to provide recourse to individuals who have been negatively impacted by model predictions (Tolomei et al., 2017; Laugel et al., 2017; Wachter et al., 2018; Ustun et al., 2019; Van Looveren & Klaise, 2019; Pawelczyk et al., 2020a; Mahajan et al., 2019; Mothilal et al., 2020; Karimi et al., 2020a; Rawal & Lakkaraju, 2020; Karimi et al., 2020b; Dandl et al., 2020; Antorán et al., 2021; Spooner et al., 2021). These approaches can be roughly categorized along the following dimensions (Verma et al., 2020): the type of the underlying predictive model (e.g., tree-based vs. differentiable classifier), whether they encourage sparsity in counterfactuals (i.e., only a small number of features should be changed), whether counterfactuals should lie on the data manifold, and whether the underlying causal relationships should be accounted for when generating counterfactuals. All these approaches generate recourses assuming that the prescribed recourses will be correctly implemented by users.

Robustness of Algorithmic Recourse. Prior works have focused on determining the extent to which recourses remain robust to the choice of the underlying model (Pawelczyk et al., 2020b; Black et al., 2021; Pawelczyk et al., 2023), shifts or changes in the underlying models (Rawal et al., 2021; Upadhyay et al., 2021), or small perturbations to the input instances (Artelt et al., 2021; Dominguez-Olmedo et al., 2022; Slack et al., 2021). To address these problems, these works have primarily proposed adversarial minimax objectives that minimize the worst-case loss over a plausible set of instance perturbations for linear models (Upadhyay et al., 2021; Dominguez-Olmedo et al., 2022); such objectives are known to generate overly costly recourse suggestions.
In contrast to the aforementioned approaches, our work focuses on a user-driven framework for navigating the trade-offs between recourse costs and robustness to noisy responses, based on a novel probabilistic recourse formulation. To this end, we present several algorithms that handle both linear and non-linear models (e.g., deep neural networks, tree-based models) effectively, resulting in better recourse cost/invalidation rate trade-offs than both Upadhyay et al. (2021) and Dominguez-Olmedo et al. (2022).

3. PRELIMINARIES

Here, we first discuss the generic formulation leveraged by several state-of-the-art recourse methods including Wachter et al. (2018) . We then define the notion of recourse invalidation rate formally.

3.1. ALGORITHMIC RECOURSE: GENERAL FORMULATION

Notation. Let $h: \mathcal{X} \to \mathcal{Y}$ denote a classifier which maps features $x \in \mathcal{X} \subseteq \mathbb{R}^d$ to labels $\mathcal{Y}$. Let $\mathcal{Y} = \{0, 1\}$, where 0 and 1 denote an unfavorable outcome (e.g., loan denied) and a favorable outcome (e.g., loan approved), respectively. We also define $h(x) = g(f(x))$, where $f: \mathcal{X} \to \mathbb{R}$ is a differentiable scoring function (e.g., a logit scoring function) and $g: \mathbb{R} \to \mathcal{Y}$ is an activation function that maps logit scores to binary labels. Throughout the remainder of this work we use $g(u) = \mathbb{I}[u > \xi]$, where $\xi$ is a decision threshold in logit space; w.l.o.g., we set $\xi = 0$. Counterfactual (CF) explanation methods provide recourses by identifying which attributes to change to reverse an unfavorable model prediction. Since counterfactuals that propose changes to features such as gender are not actionable, we restrict the search space so that only actionable changes are allowed. Let $\mathcal{A}$ denote the set of actionable counterfactuals. For a given predictive model $h$ and a predefined cost function $d_c: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$, the problem of finding a counterfactual explanation $\check{x} = x + \delta$ for an instance $x \in \mathbb{R}^d$ is expressed by the following optimization problem:

$\check{x} = \arg\min_{x' \in \mathcal{A}} \; \ell(h(x'), 1) + \lambda \cdot d_c(x, x'),$ (1)

where $\lambda \geq 0$ is a trade-off parameter and $\ell(\cdot, \cdot)$ is the mean-squared-error (MSE) loss. The first term on the right-hand side ensures that the model prediction corresponding to the counterfactual, i.e., $h(x')$, is close to the favorable outcome label 1. The second term encourages low-cost recourses; for example, Wachter et al. (2018) propose $\ell_1$ or $\ell_2$ distances to ensure that the distance between the original instance $x$ and the counterfactual $\check{x}$ is small.
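To make the generic formulation above concrete, the following is a minimal gradient-descent sketch for a logistic-regression score $f(x) = w^\top x + b$. It is an illustration under stated assumptions, not the implementation of any cited method: because $h$ is an indicator with zero gradient almost everywhere, the sketch replaces $h(x')$ by the surrogate $\mathrm{sigmoid}(f(x'))$, takes $d_c$ to be the squared $\ell_2$ distance, and treats every feature as actionable. The names `generic_counterfactual`, `lam`, `lr`, and `n_steps`, as well as the toy model, are hypothetical.

```python
import numpy as np


def sigmoid(u):
    """Overflow-free logistic function: sigmoid(u) = (1 + tanh(u / 2)) / 2."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def generic_counterfactual(x, w, b, lam=0.01, lr=0.1, n_steps=500):
    """Minimise (sigmoid(f(x')) - 1)^2 + lam * ||x - x'||_2^2 by gradient descent.

    Assumptions (illustrative only): f(x) = w @ x + b is a logistic-regression
    score, the indicator h is replaced by sigmoid(f), and all features are
    actionable (the constraint set A is ignored).
    """
    x = np.asarray(x, dtype=float)
    x_cf = x.copy()
    for _ in range(n_steps):
        p = sigmoid(w @ x_cf + b)
        # Gradient of the MSE term: 2 (p - 1) * p (1 - p) * w.
        grad_loss = 2.0 * (p - 1.0) * p * (1.0 - p) * w
        # Gradient of the cost term lam * ||x' - x||_2^2.
        grad_cost = 2.0 * lam * (x_cf - x)
        x_cf -= lr * (grad_loss + grad_cost)
    return x_cf


# Hypothetical toy model and a denied instance (f(x) = -0.5 < 0).
w, b = np.array([1.0, -1.0]), -0.5
x = np.zeros(2)
x_cf = generic_counterfactual(x, w, b)
print(w @ x + b, w @ x_cf + b)  # logit score before and after
```

A larger `lam` trades validity for lower cost and can leave the minimiser on the unfavorable side of the decision boundary.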

3.2. DEFINING THE RECOURSE INVALIDATION RATE

In order to enable end users to effectively navigate the trade-offs between recourse costs and robustness, we let them choose the probability with which a recourse could get invalidated (recourse invalidation rate) if small changes are made to it, i.e., if the recourse is implemented somewhat noisily. To this end, we formally define the notion of Recourse Invalidation Rate (IR) in this section. We first introduce two key terms, namely prescribed recourses and implemented recourses. A prescribed recourse is a recourse that was provided to an end user by some recourse method (e.g., increase salary by $500). An implemented recourse corresponds to the recourse that the end user finally implemented (e.g., a salary increment of $505) upon being provided with the prescribed recourse. With this terminology in place, we now formally define the Recourse Invalidation Rate (IR).

Definition 1 (Recourse Invalidation Rate). For a given classifier $h$, the recourse invalidation rate corresponding to the counterfactual $\check{x}_E = x + \delta_E$ output by a recourse method $E$ is given by:

$\Delta(\check{x}_E) = \mathbb{E}_{\varepsilon}\big[\underbrace{h(\check{x}_E)}_{\text{CF class}} - \underbrace{h(\check{x}_E + \varepsilon)}_{\text{class after response}}\big],$ (2)

where the expectation is taken with respect to a random variable $\varepsilon$ with probability distribution $p_\varepsilon$, which captures the noise in human responses.

Since the implemented recourses do not typically match the prescribed recourses $\check{x}_E$ (Björkegren et al., 2020), we add $\varepsilon$ to model the noise in human responses. As we primarily compute recourses for individuals $x$ such that $h(x) = 0$, the label corresponding to the counterfactual is $h(\check{x}_E) = 1$, and therefore $\Delta \in [0, 1]$.
The following cases illustrate the recourse invalidation rate: when $\Delta = 0$, the implemented recourse always receives the same (favorable) prediction as the prescribed recourse; when $\Delta = 0.5$, the two predictions agree half of the time; and when $\Delta = 1$, they never agree. To illustrate our ideas, we use the IR with a Gaussian distribution, i.e., $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, to model the noise in human responses.
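Definition 1 can be estimated directly by sampling noisy implementations. The sketch below is a minimal Monte-Carlo estimator under the Gaussian noise model $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$; the function name `invalidation_rate_mc`, the toy linear classifier, and its weights are hypothetical choices for illustration. Any `predict` function returning 0/1 labels for a batch of inputs can be plugged in.

```python
import numpy as np


def invalidation_rate_mc(x_cf, predict, sigma2=0.01, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of Delta(x_cf) = E[h(x_cf) - h(x_cf + eps)]."""
    rng = np.random.default_rng(seed)
    x_cf = np.asarray(x_cf, dtype=float)
    # Noisy implementations of the prescribed recourse.
    eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_samples, x_cf.size))
    label_cf = predict(x_cf[None, :])[0]          # class of the prescribed recourse
    labels_noisy = predict(x_cf[None, :] + eps)   # classes after noisy responses
    return float(np.mean(label_cf - labels_noisy))


# Hypothetical linear classifier h(x) = I[w @ x + b > 0].
w, b = np.array([1.0, -1.0]), -0.5


def predict(X):
    return (X @ w + b > 0).astype(int)


ir_near = invalidation_rate_mc([0.501, 0.0], predict)  # f = 0.001, just past the boundary
ir_far = invalidation_rate_mc([2.0, 0.0], predict)     # f = 1.5, far from the boundary
print(ir_near, ir_far)
```

A recourse sitting just past the decision boundary is invalidated roughly half of the time, whereas one far from the boundary is almost never invalidated, matching the cases $\Delta \approx 0.5$ and $\Delta \approx 0$ above.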

4. OUR FRAMEWORK: PROBABILISTICALLY ROBUST RECOURSE

Below we present our objective function, which is followed by a discussion on how to operationalize it efficiently.

4.1. RECOURSE INVALIDATION RATE AWARE OBJECTIVE

The core idea is to find a recourse $\check{x}$ whose prediction at any point $y$ within some set around $\check{x}$ belongs to the positive class with probability at least $1 - r$. Hence, our goal is to devise an algorithm that reliably guides the recourse search towards regions of low invalidation probability while keeping the recourse cost low (see Fig. 2 for a practical example). For a fixed model, our objective reads:

$\mathcal{L}(x'; \sigma^2 I, r, \lambda) = \lambda_1 R(x'; r, \sigma^2 I) + \lambda_2 \ell(f(x'), s) + \lambda_3 d_c(x', x),$ (3)

where $s$ is the target score for the input $x$, $R(x'; r, \sigma^2 I) = \max(0, \Delta(x'; \sigma^2 I) - r)$ with $r$ being the target IR, $\Delta(x'; \sigma^2 I)$ is the recourse invalidation rate from equation 2, $\lambda_1$ to $\lambda_3$ are balance parameters, and $d_c$ quantifies the distance between the input and the prescribed recourse. To arrive at an output probability of 0.5, the target score $f(x')$ for a sigmoid function is $s = 0$, since this score corresponds to a probability of 0.5 for $y = 1$. The new component $R$ is a hinge loss encouraging the prescribed recourse to have a low probability of invalidation; the parameter $\sigma^2$ is the uncertainty magnitude and controls the size of the neighbourhood in which the recourse has to be robust. The middle term encourages the score at the prescribed recourse $f(\check{x})$ to be close to the target score $s$, while the last term encourages the distance between the input $x$ and the recourse $\check{x}$ to be small. In practice, the choice of $r$ depends on the risk aversion of the end user: if the end user is not confident about achieving a 'precision landing', then a rather low invalidation target should be chosen (i.e., $r < 0.5$).
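The objective in equation 3 is a weighted sum of three scalar terms. The following sketch evaluates it from precomputed ingredients, which makes the effect of the hinge term $R$ easy to inspect: once the invalidation rate drops to the target $r$, $R$ contributes nothing and only the score and cost terms remain. The function names are hypothetical, $\ell$ is assumed to be the squared error, and computing $\Delta(x')$ itself is left to a separate routine.

```python
def hinge_ir_penalty(ir, r):
    """R(x'; r, sigma^2 I) = max(0, Delta(x') - r)."""
    return max(0.0, ir - r)


def probe_objective(ir, score, cost, r, s=0.0, lambdas=(1.0, 1.0, 1.0)):
    """Value of the PROBE objective (equation 3) for one candidate x'.

    ir:    invalidation rate Delta(x'; sigma^2 I) at the candidate
    score: logit score f(x')
    cost:  distance d_c(x', x) to the original instance
    r:     target invalidation rate chosen by the user
    s:     target logit score (0 corresponds to probability 0.5)
    """
    lam1, lam2, lam3 = lambdas
    return lam1 * hinge_ir_penalty(ir, r) + lam2 * (score - s) ** 2 + lam3 * cost


# The hinge is inactive once Delta(x') <= r ...
print(probe_objective(ir=0.15, score=0.0, cost=0.0, r=0.2))  # 0.0
# ... and grows linearly with the excess invalidation rate otherwise.
print(probe_objective(ir=0.50, score=0.0, cost=0.0, r=0.2))
```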

4.2. OPTIMIZING THE RECOURSE INVALIDATION RATE AWARE OBJECTIVE

Algorithm 1 PROBE
Input: $x$ s.t. $f(x) < 0$, $f$, $\sigma^2$, $\lambda > 0$, $\alpha$, $r > 0$
Init.: $x' = x$; compute $\tilde{\Delta}(x')$ ▷ from Theorem 1
while $\tilde{\Delta}(x') > r$ or $f(x') < 0$ do
    $\tilde{\Delta} = \text{ClosedFormIR}(f, \sigma^2, x')$ ▷ from Theorem 1
    $x' = x' - \alpha \cdot \nabla_{x'} \mathcal{L}(x'; \sigma^2, r, \lambda)$ ▷ optimize equation 3
end while
Return: $\check{x} = x'$

In order for the objective in equation 3 to guide us reliably towards recourses with low target invalidation rate $r$, we need to approximate the invalidation rate $\Delta(x')$ at any $x' \in \mathbb{R}^d$. However, such an approximation is non-trivial since the recourse invalidation rate, which depends on the classifier $h$, is generally non-differentiable: the classifier $h(x) = \mathbb{I}(f(x) > \xi)$ defined in Section 3 involves an indicator function acting on the score $f$. To circumvent this issue, we derive a closed-form expression for the IR using a local approximation of the predictive model $f$. The procedure remains applicable to non-linear models, since the local behavior of a non-linear model can often be well approximated by a locally linear model (Ribeiro et al., 2016; Ustun et al., 2019).

Theorem 1 (Closed-Form Recourse Invalidation Rate). A first-order approximation $\tilde{\Delta}$ to the recourse invalidation rate $\Delta$ in equation 2 under Gaussian distributed noise in human responses $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ is given by:

$\tilde{\Delta}(\check{x}_E; \sigma^2 I) = 1 - \Phi\left(\frac{f(\check{x}_E)}{\sqrt{\nabla f(\check{x}_E)^\top \sigma^2 I \, \nabla f(\check{x}_E)}}\right),$ (4)

where $\Phi$ is the CDF of the univariate standard normal distribution $\mathcal{N}(0, 1)$, $f(\check{x}_E)$ denotes the logit score at $\check{x}_E$, the recourse output by a recourse method $E$, and $h(\check{x}_E) \in \{0, 1\}$.

All theoretical proofs, including the proof of the above theorem, can be found in Appendix D. Algorithm 1 shows pseudo-code of our optimization procedure: using gradient descent, we update the recourse repeatedly until the class label flips from 0 to 1 and the IR $\tilde{\Delta}$ is smaller than the targeted invalidation rate $r$.
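For a linear score $f(x) = w^\top x + b$ the gradient $\nabla f = w$ is constant, so the first-order approximation in Theorem 1 is exact and the gradient of the hinge term can be written by hand. The sketch below follows Algorithm 1 under these assumptions and stops once the label has flipped and the closed-form IR is at most $r$. It is not the authors' implementation: $\ell$ is taken to be the squared error, $d_c$ the $\ell_1$ distance (handled with a subgradient), and the names `closed_form_ir` and `probe_linear`, the step size, the balance parameters, and the toy model are hypothetical. For non-linear models, $f$ and $\nabla f$ would instead come from an automatic-differentiation framework.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); provides .cdf and .pdf


def closed_form_ir(score, grad, sigma2):
    """Theorem 1: 1 - Phi(f(x) / sqrt(grad^T (sigma^2 I) grad))."""
    scale = float(np.sqrt(sigma2 * (grad @ grad)))
    return 1.0 - STD_NORMAL.cdf(float(score) / scale)


def probe_linear(x, w, b, sigma2=0.01, r=0.2, s=0.0,
                 lambdas=(1.0, 1.0, 0.1), lr=0.01, max_iter=1000):
    """Sketch of Algorithm 1 for a linear score f(x) = w @ x + b.

    Assumed (hypothetical) choices: l is the squared error, d_c is the l1
    distance, and lambdas / lr / max_iter are illustrative values.
    """
    lam1, lam2, lam3 = lambdas
    scale = float(np.sqrt(sigma2 * (w @ w)))   # sigma * ||w||_2
    x = np.asarray(x, dtype=float)
    x_cf = x.copy()
    for _ in range(max_iter):
        score = float(w @ x_cf + b)
        ir = closed_form_ir(score, w, sigma2)
        if ir <= r and score > 0:               # label flipped and IR target met
            break
        # Hinge max(0, Delta - r) with Delta = 1 - Phi(f / scale):
        # its gradient is -phi(f / scale) / scale * w while Delta > r.
        if ir > r:
            grad_hinge = -STD_NORMAL.pdf(score / scale) / scale * w
        else:
            grad_hinge = np.zeros_like(x_cf)
        grad_score = 2.0 * (score - s) * w      # gradient of (f(x') - s)^2
        grad_cost = np.sign(x_cf - x)           # subgradient of ||x' - x||_1
        x_cf = x_cf - lr * (lam1 * grad_hinge + lam2 * grad_score + lam3 * grad_cost)
    return x_cf


w, b = np.array([1.0, -1.0]), -0.5            # hypothetical toy model
x = np.zeros(2)                               # f(x) = -0.5: unfavorable outcome
x_cf = probe_linear(x, w, b, r=0.2)
print(w @ x_cf + b, closed_form_ir(w @ x_cf + b, w, 0.01))
```

Lowering `r` makes the loop travel further past the decision boundary before it stops, which is the cost-robustness trade-off quantified in Section 4.3.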
In essence, the result in Theorem 1 serves as our regularizer since it steers recourses towards low-invalidation regions. For example, when $f(\check{x}_E) = 0$, then $\tilde{\Delta} = 0.5$ since $\Phi(0) = \frac{1}{2}$. This means that the prescribed recourse and the recourse implemented by the user agree 50% of the time. On the other hand, as $f(\check{x}_E) \to +\infty$, $\tilde{\Delta} \to 0$ since $\Phi \to 1$, which means that the prescribed recourse and the recourse implemented by the user (almost) always agree. Figure 3 demonstrates how PROBE finds recourses relative to a standard low-cost algorithm (Wachter et al., 2018).

We now leverage the recourse invalidation rate derived in Theorem 1 to show how the recourses output by Wachter et al. (2018) can be made more robust. Pawelczyk et al. (2022) provide a closed-form solution for the recourse output by Wachter et al. (2018) for the special case of a logistic regression classifier when $d_c = \|x - x'\|_2$ and the MSE loss is used. This solution takes the following form: $\check{x}_{\text{Wachter}}(s) = x + \frac{s - f(x)}{\|\nabla f(x)\|_2^2} \nabla f(x)$, where $s$ is the target logit score. More specifically, to arrive at the desired class with probability 0.5, the target score for a sigmoid function is $s = 0$, since this logit corresponds to a probability of 0.5 for $y = 1$. The next statement quantifies the IR of recourses output by Wachter et al. (2018).

Proposition 1 (Exact Recourse IR). For logistic regression, consider the recourse output by Wachter et al. (2018): $\check{x}_{\text{Wachter}}(s) = x + \frac{s - f(x)}{\|\nabla f(x)\|_2^2} \nabla f(x)$. Then the recourse invalidation rate is given by:

$\Delta(\check{x}_{\text{Wachter}}(s); \sigma^2 I) = 1 - \Phi\left(\frac{s}{\sigma \|\nabla f(x)\|_2}\right),$ (5)

where $s$ is the target logit score.

A recourse generated by Wachter et al. (2018) such that $f(\check{x}_{\text{Wachter}}) = s = 0$ will result in $\Delta = 0.5$. To obtain recourses that are more robust to noisy responses from users, i.e., $\Delta \to 0$, the decision maker can choose a higher logit target score $s' > s \geq 0$, since this decreases the recourse invalidation rate, i.e., $\Delta(\check{x}_{\text{Wachter}}(s)) > \Delta(\check{x}_{\text{Wachter}}(s'))$.
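Proposition 1 is easy to check numerically. The sketch below builds the closed-form Wachter recourse for a hypothetical linear score $f(x) = w^\top x + b$ (so that $\nabla f(x) = w$), evaluates the IR formula of Proposition 1, and compares it with a plain Monte-Carlo estimate of Definition 1. The helper names, the toy model, and the sample size are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist


def wachter_closed_form(x, w, b, s=0.0):
    """x_Wachter(s) = x + (s - f(x)) / ||w||_2^2 * w for f(x) = w @ x + b."""
    return x + (s - (w @ x + b)) / (w @ w) * w


def wachter_ir(s, w, sigma2):
    """Proposition 1: Delta = 1 - Phi(s / (sigma * ||w||_2))."""
    return 1.0 - NormalDist().cdf(s / float(np.sqrt(sigma2) * np.linalg.norm(w)))


def mc_ir(x_cf, w, b, sigma2, n_samples=10_000, seed=0):
    """Fraction of noisy implementations x_cf + eps that fall back to class 0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_samples, x_cf.size))
    return float(np.mean((x_cf + eps) @ w + b <= 0))


w, b, sigma2 = np.array([1.0, -1.0]), -0.5, 0.01   # hypothetical toy model
x = np.zeros(2)
for s in (0.0, 0.1, 0.3):                          # higher target score, lower IR
    x_cf = wachter_closed_form(x, w, b, s)
    print(s, w @ x_cf + b, wachter_ir(s, w, sigma2), mc_ir(x_cf, w, b, sigma2))
```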
The next statement makes precise how $s$ should be chosen to achieve a desired robustness level.

Corollary 1. Under the conditions of Proposition 1, choosing $s_r = \sigma \|\nabla f(x)\|_2 \, \Phi^{-1}(1 - r)$ guarantees a recourse invalidation rate of $r$, i.e., $\Delta(\check{x}_{\text{Wachter}}(s_r); \sigma^2 I) = r$.

On extensions to general noise distributions and tree-based classifiers. In Appendix A we present extensions of our framework to obtain reliable recourses (i) under general noise distributions and (ii) for tree-based classifiers. These two cases pose non-trivial difficulties, as the recourse invalidation rate is generally non-differentiable. For more general noise distributions, we develop a Monte-Carlo approach in Appendix A.1, which relies on a differentiable approximation of the indicator function required to obtain a Monte-Carlo estimate of the invalidation rate. For tree-based classifiers, we develop a closed-form solution for the recourse invalidation rate (see Theorem 2).

In Section 4.3, we leverage the recourse invalidation rate expressions derived above to theoretically show that (i) an additional cost has to be incurred to generate robust recourses in the face of noisy human responses, and (ii) a general upper bound on the IR holds for any valid recourse provided by any method when the underlying classifier is a differentiable model.
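Corollary 1 inverts the IR formula of Proposition 1: given a user's target rate $r$, it returns the logit score $s_r$ at which the Wachter recourse should be placed. A minimal sketch for a hypothetical linear model follows; `target_score` and `ir_at_score` are illustrative names, and `statistics.NormalDist.inv_cdf` supplies $\Phi^{-1}$.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()


def target_score(w, sigma2, r):
    """Corollary 1: s_r = sigma * ||w||_2 * Phi^{-1}(1 - r) for f(x) = w @ x + b."""
    return float(np.sqrt(sigma2) * np.linalg.norm(w)) * N01.inv_cdf(1.0 - r)


def ir_at_score(s, w, sigma2):
    """IR of the Wachter recourse placed at logit score s (Proposition 1)."""
    return 1.0 - N01.cdf(s / float(np.sqrt(sigma2) * np.linalg.norm(w)))


w, sigma2 = np.array([1.0, -1.0]), 0.01          # hypothetical toy model
for r in (0.5, 0.2, 0.05):
    s_r = target_score(w, sigma2, r)
    print(r, s_r, ir_at_score(s_r, w, sigma2))  # last column recovers r up to rounding
```

Setting $r = 0.5$ gives $s_r = 0$, i.e., the standard Wachter recourse; smaller targets push $s_r$ and hence the cost upwards.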

4.3. ADDITIONAL THEORETICAL RESULTS

Next, we show that there exists a trade-off between robustness to noisy human responses and cost. To this end, we fix the target invalidation rate $r$ and ask what cost is needed to achieve it.

Proposition 2 (General Cost of Recourse). For a linear classifier, let $r \in (0, 1)$ and let $\check{x}_E = x + \delta_E$ be the output produced by some recourse method $E$ such that $h(\check{x}_E) = 1$. Then the cost required to achieve a fixed invalidation target $r$ is:

$\|\delta_E\|_2 = \frac{\sigma}{\omega}\left(\Phi^{-1}(1 - r) - c\right),$

where $c = \frac{f(x)}{\sigma \cdot \|\nabla f(x)\|_2}$ is a constant and $\omega > 0$ is the cosine of the angle between $\nabla f(x)$ and $\delta_E$.

From Proposition 2, we see that the target invalidation rate $r$ decreases as the recourse cost increases for a given uncertainty magnitude $\sigma^2$. The next statement makes this cost-robustness trade-off precise.

Proposition 3 (Cost-Robustness Trade-off). Under the same conditions as in Proposition 2, we have

$\frac{\partial \|\delta_E\|_2}{\partial (1 - r)} = \frac{\sigma}{\omega} \frac{1}{\phi\left(\Phi^{-1}(1 - r)\right)} > 0,$

i.e., an infinitesimal increase in robustness (i.e., in $1 - r$) increases the cost of recourse by $\frac{\sigma}{\omega} \frac{1}{\phi(\Phi^{-1}(1 - r))}$, where $\phi$ denotes the density of $\mathcal{N}(0, 1)$.

Now, we derive a general upper bound on the recourse invalidation rate. This bound is applicable to any method $E$ that provides recourses resulting in a positive outcome.

Proposition 4 (Upper Bound). Let $\check{x}_E$ be the output produced by some recourse method $E$ such that $h(\check{x}_E) = 1$. Then an upper bound on $\tilde{\Delta}$ from equation 4 is given by:

$\tilde{\Delta}(\check{x}_E; \sigma^2 I) \leq 1 - \Phi\left(c + \frac{\omega}{\sigma} \frac{\|\nabla f(x)\|_2}{\|\nabla f(\check{x}_E)\|_2} \frac{\|\delta_E\|_1}{\sqrt{\|\delta_E\|_0}}\right),$

where $c = \frac{f(x)}{\sigma \cdot \|\nabla f(x)\|_2}$, $\delta_E = \check{x}_E - x$, and $\omega > 0$ is the cosine of the angle between $\nabla f(x)$ and $\delta_E$.

The right-hand side of the inequality shows that the upper bound depends on the ratio $\|\delta_E\|_1 / \sqrt{\|\delta_E\|_0}$ of the recourse action $\delta_E$ provided by recourse method $E$. The higher this $\ell_1/\ell_0$ ratio of the recourse actions, the tighter the bound. The bound is tight when $\|\delta_E\|_0$ takes its minimum value, i.e., $\|\delta_E\|_0 = 1$ (at least one feature needs to be changed to flip the model prediction), since then $\|\delta_E\|_1 = \|\delta_E\|_2$.
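For a linear classifier, Propositions 2 to 4 can be checked numerically. The sketch below (hypothetical helper names and toy model; the cost and the cosine $\omega$ follow the definitions above) computes the cost from Proposition 2 for a chosen direction, verifies that moving that far along the direction yields exactly the target IR under Theorem 1, compares Proposition 3 with a finite-difference derivative, and confirms that the bound of Proposition 4 holds and is attained when only one feature changes. For a linear model $\nabla f(x) = \nabla f(\check{x}_E) = w$, so the gradient-norm ratio in Proposition 4 equals 1.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
w, b, sigma2 = np.array([1.0, -1.0]), -0.5, 0.01    # hypothetical toy model
sigma, x = float(np.sqrt(sigma2)), np.zeros(2)
c = float(w @ x + b) / (sigma * np.linalg.norm(w))  # constant c from Proposition 2


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def ir(x_cf):
    """Theorem 1 for the linear model (exact in this case)."""
    return 1.0 - N01.cdf(float(w @ x_cf + b) / (sigma * np.linalg.norm(w)))


def cost_for_target(r, direction):
    """Proposition 2: ||delta||_2 = sigma / omega * (Phi^{-1}(1 - r) - c)."""
    return sigma / cosine(w, direction) * (N01.inv_cdf(1.0 - r) - c)


def cost_sensitivity(r, direction):
    """Proposition 3: d||delta||_2 / d(1 - r) = sigma / omega / phi(Phi^{-1}(1 - r))."""
    return sigma / cosine(w, direction) / N01.pdf(N01.inv_cdf(1.0 - r))


def upper_bound(delta):
    """Proposition 4 with gradient-norm ratio 1 (linear model)."""
    l1, l0 = float(np.abs(delta).sum()), np.count_nonzero(delta)
    return 1.0 - N01.cdf(c + cosine(w, delta) / sigma * l1 / np.sqrt(l0))


r = 0.2
for direction in (np.array([1.0, 0.0]), np.array([1.0, -0.5])):
    u = direction / np.linalg.norm(direction)
    delta = cost_for_target(r, direction) * u      # action with the Prop. 2 cost
    h = 1e-5                                       # finite-difference step in r
    fd = (cost_for_target(r - h, direction) - cost_for_target(r + h, direction)) / (2 * h)
    print(ir(x + delta), upper_bound(delta), fd, cost_sensitivity(r, direction))
```

With the one-feature direction the bound coincides with the IR; with the two-feature direction it is strictly larger, as the $\ell_1/\ell_0$ discussion predicts.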

5. EXPERIMENTAL EVALUATION

We now present our empirical analysis. First, we validate our theoretical results on the recourse invalidation rates across various recourse methods. Second, we study the effectiveness of PROBE at finding robust recourses in the presence of noisy human responses.

Real-World Data and Noisy Responses. We use the same real-world data sets as provided in the recourse and counterfactual explanation library CARLA (Pawelczyk et al., 2021). The Adult data set (Dua & Graff, 2017) originates from the 1994 Census database and consists of 14 attributes and 48,842 instances. The class label indicates whether an individual has an income greater than 50,000 USD/year. The Give Me Some Credit (GMC) data set (Kaggle-Competition, 2011) is a credit scoring data set consisting of 150,000 observations and 11 features. The class label indicates whether the corresponding individual will experience financial distress within the next two years (SeriousDlqin2yrs is 1) or not. The COMPAS data set (Angwin et al., 2016) contains data for more than 10,000 criminal defendants in Florida. It is used by the jurisdiction to score defendants' likelihood of re-offending. The class label indicates whether the corresponding defendant is at high or low risk of recidivism. All data sets were normalized so that $x \in [0, 1]^d$. Across all experiments, we add noise $\varepsilon$ to the prescribed recourse $\check{x}_E$, where $\varepsilon \sim \mathcal{N}(0, \sigma^2 \cdot I)$ and $\sigma^2 = 0.01$.

Methods. We compare the recourses generated by PROBE to four baseline methods which aim to generate low-cost recourses using fundamentally different principles: AR (-LIME) uses an integer-programming-based objective (Ustun et al., 2019), Wachter uses a gradient-based objective (Wachter et al., 2018), DICE uses a diversity-based objective (Mothilal et al., 2020), and GS is based on a random search algorithm (Laugel et al., 2017).
Further, we compare with methods that use adversarial minimax objectives to generate robust recourses: ARAR (Dominguez-Olmedo et al., 2022) and ROAR (Upadhyay et al., 2021). We used the recourse implementations from CARLA (Pawelczyk et al., 2021). Following Upadhyay et al. (2021), all methods search for counterfactuals over the same set of balance parameters $\lambda \in \{0, 0.25, 0.5, 0.75, 1\}$ when applicable.

Prediction Models. For all data sets, we trained both ReLU-based NN models with 50 hidden layers (App. B) and a logistic regression (LR) model. All recourses were generated with respect to these classifiers.

Measures. We consider three measures in our evaluation: 1) We measure the average cost (AC) required to act upon the prescribed recourses, where the average is taken over all instances in the test set for which a given method provides recourse. Since all our algorithms optimize for the $\ell_1$-norm, we use it as our cost measure. 2) We use recourse accuracy (RA), defined as the fraction of instances in the test set for which acting upon the prescribed recourse results in the desired prediction. 3) We compute the average IR (AIR) across the test set: for every instance, we sample 10,000 points from $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ and compute the IR in equation 2. The average IR then quantifies recourse robustness, where the individual IRs are averaged over all instances from the test set for which a given method provides recourse.
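The three measures can be computed as follows. This is a minimal sketch, not CARLA's evaluation code: the function name, the convention of marking instances without a recourse by a row of NaNs, and the toy data are assumptions. AC uses the $\ell_1$ cost, RA counts every test instance (instances without a valid recourse count as failures), and the average IR uses 10,000 Gaussian samples per instance as described above.

```python
import numpy as np


def evaluate_recourses(X, X_cf, predict, sigma2=0.01, n_samples=10_000, seed=0):
    """Return (AC, RA, AIR) for prescribed recourses X_cf of test inputs X.

    Rows of X_cf that contain NaN mark instances for which the method
    returned no recourse (an illustrative convention).
    """
    rng = np.random.default_rng(seed)
    found = ~np.isnan(X_cf).any(axis=1)
    # AC: mean l1 cost over instances that received a recourse.
    ac = float(np.abs(X_cf[found] - X[found]).sum(axis=1).mean())
    # RA: fraction of all test instances whose recourse is classified as 1.
    valid = np.zeros(len(X), dtype=bool)
    valid[found] = predict(X_cf[found]) == 1
    ra = float(valid.mean())
    # AIR: Monte-Carlo IR (Definition 1) averaged over instances with recourse.
    irs = []
    for x_cf in X_cf[found]:
        eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_samples, x_cf.size))
        irs.append(np.mean(predict(x_cf[None, :])[0] - predict(x_cf + eps)))
    return ac, ra, float(np.mean(irs))


# Hypothetical linear classifier and three test instances.
w, b = np.array([1.0, -1.0]), -0.5


def predict(X):
    return (X @ w + b > 0).astype(int)


X = np.zeros((3, 2))
X_cf = np.array([[2.0, 0.0],          # far past the boundary
                 [0.6, 0.0],          # just past the boundary
                 [np.nan, np.nan]])   # no recourse found
ac, ra, air = evaluate_recourses(X, X_cf, predict)
print(ac, ra, air)
```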

5.1. VALIDATING OUR THEORETICAL BOUNDS

Computing Bounds. We empirically validate the theoretical upper bounds derived in Section 4.3. To do so, we first estimate the bounds for each instance in the test set according to Proposition 4 and compare them with empirical estimates of the IR, which we obtain from Monte-Carlo estimates of the IR in equation 2 using 10,000 samples to get a stable estimate.

Results. In Figure 5, we validate the bounds obtained in Proposition 4 for the GMC data set. We relegate results for the COMPAS and Adult data sets and for other values of $\sigma^2$ to Appendix C. Note that the trivial upper bound is 1 since $\Delta \leq 1$; our bounds usually lie well below this value, which suggests that they are meaningful. We observe that these upper bounds are quite tight, thus providing accurate estimates of the worst-case recourse invalidation rates. Notably, GS tends to yield looser bounds, since its recourses tend to have lower $\ell_1/\ell_0$ ratios: its random search procedure increases the $\ell_0$-norms of the recourses relative to those output by other recourse methods. Consequently, the randomly sampled recourses of GS yield looser worst-case IR estimates than all the other methods, which do use gradient information (e.g., Wachter, AR, and PROBE).

5.2. EVALUATING THE PROBE FRAMEWORK

Results. Here, we evaluate the robustness, costs, and recourse accuracy of the recourses generated by our framework PROBE relative to the baselines. We consider a recourse robust if it remains valid (i.e., results in a positive outcome) even after small changes are made to it (i.e., humans implement it in a noisy manner; see Table 1b). We also consider whether the robustness achieved by our framework comes at an additional cost, i.e., by sacrificing recourse accuracy (RA) or by increasing the average recourse cost (AC). We compute the AC of the recourses output by all algorithms and find that PROBE usually has the highest or second-highest recourse costs, while its RA is 100% across classifiers and data sets. Finally, we provide a more detailed comparison between PROBE and the adversarially robust recourse methods ARAR and ROAR. To do so, we plot Pareto frontiers in Figure 4, which demonstrate the inherent trade-offs between the average cost of recourse and the average recourse invalidation rate computed over all recourse-seeking individuals for different uncertainty magnitudes $\sigma^2, \epsilon \in \{0.005, 0.01, 0.15\}$. For ARAR and ROAR we expect to see AIRs close to 0 (by construction). However, this is only the case for the linear classifiers. Moreover, ROAR provides recourses with up to 3 times higher cost than PROBE. Note also that ARAR and ROAR have trouble finding recourses for non-linear classifiers, resulting in RA scores of around 5% in the worst case, while not being able to maintain low invalidation rates. This is likely due to the local linear approximation used by these methods. In summary, PROBE finds recourses for 100% of the test instances, in line with the promise of an invalidation probability of at most $r$, while being less costly than ROAR and ARAR.

Relegated results. The relegated experiments in Appendix C (i) demonstrate that baseline recourse methods are not robust to noisy human responses (Figures 8 and 9), (ii) verify that the targeted invalidation rates match the empirical recourse invalidation rates (Figures 13, 14, and 15), and (iii) demonstrate the trade-off between recourse costs and robustness, verifying Corollary 3 (Figures 16 and 17).

6. CONCLUSION

In this work, we proposed a novel algorithmic framework called Probabilistically ROBust rEcourse (PROBE), which enables end users to effectively manage the recourse cost vs. robustness trade-offs by letting them choose the probability with which a recourse could get invalidated (recourse invalidation rate) if small changes are made to the recourse, i.e., if the recourse is implemented somewhat noisily. To the best of our knowledge, this work is the first to formulate and address the problem of enabling users to navigate the trade-offs between recourse costs and robustness. Our framework can ensure that a resulting recourse is invalidated with probability at most r when it is noisily implemented, where r is provided as input by the end user requesting recourse. To operationalize this, we proposed a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and ensures that the resulting recourse achieves a positive model prediction. We developed novel theoretical results to characterize the recourse invalidation rates corresponding to any given instance w.r.t. different classes of underlying models (e.g., linear models, tree-based models), and leveraged these results to efficiently optimize the proposed objective. Experimental evaluation with multiple real-world data sets not only demonstrated the efficacy of the proposed framework but also validated our theoretical findings. Our work also paves the way for several interesting future research directions in algorithmic recourse. For instance, it would be interesting to build on this work to develop approaches which generate recourses that are simultaneously robust to noisy human responses, noise in the inputs, and shifts in the underlying models.

A EXTENSIONS TO OTHER NOISE DISTRIBUTIONS AND TREE BASED CLASSIFIERS

A.1 EXTENSIONS TO GENERAL NOISE DISTRIBUTIONS

A.1.1 A MONTE-CARLO APPROACH FOR GENERAL NOISE DISTRIBUTIONS

Algorithm 2: PROBE-MC
Input: $x$ s.t. $f(x) < 0$, $f$, $\sigma^2$, $\lambda > 0$, $t$, $\alpha$, $r > 0$
Init.: $x' = x$; compute $\hat{\Delta}_{\mathrm{MC}}(x')$ ▷ from equation 11
while $\hat{\Delta}_{\mathrm{MC}}(x') > r$ and $f(x') < 0$ do
    compute $\hat{\Delta}_{\mathrm{MC}}(x')$ ▷ from equation 11
    $x' = x' - \alpha \cdot \nabla_{x'} L(x'; \sigma^2, r, \lambda)$ ▷ optimize equation 3
end while
Return: $x_E = x'$

In Section 4, we introduced our PROBE framework, which guides the search for counterfactual explanations towards regions with a targeted low invalidation rate. Recall that the optimization procedure in Section 4 relied on a first-order approximation to the recourse invalidation rate under Gaussian distributed noisy human responses. In this section, we develop an algorithm that is agnostic to the specifics of the parameterized noise distribution. To this end, we suggest a Monte Carlo estimator of the recourse IR from Definition 1:

$$\hat{\Delta}_{\mathrm{MC}}(x') = \frac{1}{K}\sum_{k=1}^{K}\big(1 - h(x' + \varepsilon_k)\big),$$

where $\varepsilon_1, \dots, \varepsilon_K$ are i.i.d. draws from the noise distribution. We highlight that the estimator $\hat{\Delta}_{\mathrm{MC}}$ allows for a flexible specification of various noise distributions, and thus does not depend on specific distributional assumptions on $\varepsilon$. The following result shows that we can estimate the true IR $\Delta(x')$ to any desired precision using $\hat{\Delta}_{\mathrm{MC}}(x')$.

Proposition 5. The mean-squared error (MSE) between the true IR $\Delta(x')$ and the empirical Monte-Carlo estimate $\hat{\Delta}_{\mathrm{MC}}(x')$ is upper bounded as follows:

$$\mathbb{E}_{\varepsilon}\big[(\Delta(x') - \hat{\Delta}_{\mathrm{MC}}(x'))^2\big] \le \frac{1}{4K}.$$

Since we are free to choose $K$, we can make the MSE arbitrarily small and reliably estimate the true invalidation rate $\Delta(x')$. A problem with the estimator $\hat{\Delta}_{\mathrm{MC}}$ is that it is not amenable to the automatic differentiation required by our gradient-based algorithm. This is due to the discontinuity at the threshold $\xi$ introduced by the indicator function, which is applied to the logit score when computing the recourse invalidation rate (i.e., $h(x) = \mathbb{I}(f(x) > \xi)$; see Definition 1).
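As an illustration only (not the authors' implementation), the Python sketch below evaluates the Monte Carlo estimator $\hat{\Delta}_{\mathrm{MC}}$ many times and compares its empirical MSE with the bound of Proposition 5. It assumes a hypothetical linear logit $f(x) = w^\top x + b$, threshold $\xi = 0$, and Gaussian noise; Gaussian noise is used only because Theorem 1's expression is then exact for a linear model and provides a reference value, while the estimator itself makes no distributional assumption. The names `mc_invalidation_rate` and `std_normal_cdf` are hypothetical.

```python
import math

import numpy as np


def std_normal_cdf(u: float) -> float:
    """Standard normal CDF Phi(u), computed via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def mc_invalidation_rate(f, x_prime, sigma, K, rng):
    """Monte Carlo IR: mean over k of 1 - h(x' + eps_k), with h(x) = I(f(x) > 0)."""
    eps = rng.normal(0.0, sigma, size=(K, x_prime.shape[0]))  # K i.i.d. noise draws
    h = (f(x_prime + eps) > 0.0).astype(float)
    return float(np.mean(1.0 - h))


# Hypothetical linear logit f(x) = w^T x + b (illustration only)
w = np.array([1.0, -2.0])
b = 0.5


def f(X):
    return X @ w + b


x_prime = np.array([0.3, 0.1])  # f(x') = 0.6 > 0, i.e., a valid recourse
sigma = 0.5

# Reference value: for a linear f with Gaussian noise, Theorem 1's expression is exact
true_ir = 1.0 - std_normal_cdf(f(x_prime) / (sigma * np.linalg.norm(w)))

# Repeat the estimator many times to measure its empirical MSE
rng = np.random.default_rng(0)
K = 100
estimates = np.array([mc_invalidation_rate(f, x_prime, sigma, K, rng) for _ in range(2000)])
mse = float(np.mean((estimates - true_ir) ** 2))
print(f"true IR = {true_ir:.4f}, empirical MSE = {mse:.5f}, bound 1/(4K) = {1 / (4 * K):.5f}")
```

Since each indicator is a Bernoulli variable with variance $p(1-p)$, the empirical MSE here should sit close to $p(1-p)/K$, which is below $1/(4K)$.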

A.1.2 A DIFFERENTIABLE APPROXIMATION TO $\hat{\Delta}_{\mathrm{MC}}$

To mitigate the non-differentiability of the indicator function, we suggest using a sigmoid function with an appropriate temperature $t$ to approximate the indicator at the threshold $\xi$:

$$S\big((x - \xi) \cdot t\big) = \frac{1}{1 + \exp\big(-(x - \xi) \cdot t\big)}.$$

As $t \to \infty$, the sigmoid $S$ converges to the indicator function $\mathbb{I}(x > \xi)$ for every $x \neq \xi$. We illustrate this behaviour in Figure 6 for different temperature levels $t \in \{1, 2, 10, 25, 100\}$ when the threshold is $\xi = 0$. Using this differentiable approximation to the indicator function, we can now state a differentiable estimator of the recourse invalidation rate, which we use to guide our gradient descent procedure towards regions with low recourse invalidation:

$$\tilde{\Delta}_{\mathrm{MC}}(x'; 0, t) = \frac{1}{K}\sum_{k=1}^{K}\big(1 - S(t \cdot f(x' + \varepsilon_k))\big).$$
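The following Python sketch, again assuming a hypothetical linear logit with threshold $\xi = 0$, compares the indicator-based estimator with the sigmoid-smoothed estimator $\tilde{\Delta}_{\mathrm{MC}}$ on the same noise draws for increasing temperatures $t$, and checks the analytic gradient of the smoothed estimator against central finite differences. It does not reproduce the full loss $L$ of equation 3; all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function S(z) = 1 / (1 + exp(-z)), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


# Hypothetical linear logit f(x) = w^T x + b (illustration only); its gradient is w
w = np.array([1.0, -2.0])
b = 0.5


def logits(x_prime, eps):
    return (x_prime + eps) @ w + b


def hard_mc_ir(x_prime, eps):
    """Indicator version: mean_k [1 - I(f(x' + eps_k) > 0)]."""
    return float(np.mean(logits(x_prime, eps) <= 0.0))


def smooth_mc_ir(x_prime, eps, t):
    """Sigmoid version with xi = 0: mean_k [1 - S(t * f(x' + eps_k))]."""
    return float(np.mean(1.0 - sigmoid(t * logits(x_prime, eps))))


def smooth_mc_ir_grad(x_prime, eps, t):
    """Chain rule: d/dx' [1 - S(t f)] = -t S(t f) (1 - S(t f)) grad f, with grad f = w."""
    s = sigmoid(t * logits(x_prime, eps))
    return -t * np.mean(s * (1.0 - s)) * w


rng = np.random.default_rng(0)
x_prime = np.array([0.3, 0.1])
eps = rng.normal(0.0, 0.5, size=(10_000, 2))  # fixed noise draws, reused for every t

for t in [1, 10, 100, 1000]:
    print(f"t={t:5d}  smoothed IR={smooth_mc_ir(x_prime, eps, t):.4f}  hard IR={hard_mc_ir(x_prime, eps):.4f}")

# Central finite differences should agree with the analytic gradient (here at t = 10)
step = 1e-5
numeric_grad = np.array([
    (smooth_mc_ir(x_prime + step * e, eps, 10) - smooth_mc_ir(x_prime - step * e, eps, 10)) / (2 * step)
    for e in np.eye(2)
])
analytic_grad = smooth_mc_ir_grad(x_prime, eps, 10)
print(numeric_grad, analytic_grad)
```

Keeping the noise draws fixed while changing $x'$ makes the smoothed estimator a deterministic, differentiable function of $x'$, which is what a gradient step on it requires.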

A.2 EXTENSIONS TO TREE BASED CLASSIFIERS

The recourse literature commonly considers consequential decision problems which heavily rely on tabular data. For this data modality, ensembles of decision trees such as Random Forests (RF) (Breiman, 2001) or Gradient Boosted Decision Trees (GBDT) (Friedman, 2001) are considered among the state-of-the-art models (Borisov et al., 2021). As a consequence, some recourse methods were developed specifically to find recourses for tree ensembles (Tolomei et al., 2017; Lucic et al., 2022), since their non-differentiability prevents a direct application of the recourse objective in equation 1. To extend our method to tree-based classifiers, we also derive an IR expression for tree ensembles, and develop a method which computes low-IR recourses for these models.

Tree Ensemble Classifiers. An object of interest is the predicted output of a decision tree:

$$T(x) = \sum_{R \in \mathcal{R}_T} c_T(R) \cdot \mathbb{I}(x \in R),$$

where $c_T(R) \in \{0, 1\}$ is the constant prediction assigned to region $R \in \mathcal{R}_T$ of tree $T$. Moreover, a decision forest is formed by a set of $M_T$ decision trees, and forms the probabilistic output:

$$f_{\mathrm{Forest}}(x) = \frac{1}{M_T}\sum_{m=1}^{M_T} T_m(x). \tag{13}$$

The predicted class of an input $x$ is formed via a vote by the trees, where each tree assigns a probability estimate to the input; that is, the predicted class is the one with the highest mean probability estimate across the trees. After the trees are combined, the multiple models form a single model again (Domingos, 1997). Thus, the predicted class corresponding to equation 13 is given by:

$$F(x) = \sum_{R \in \mathcal{R}_F} c_F(R) \cdot \mathbb{I}(x \in R), \tag{14}$$

where $c_F(R) \in \{0, 1\}$ is the constant prediction assigned to region $R \in \mathcal{R}_F$ for the ensemble of trees $F$. Furthermore, note that for each ensemble, there is an active subset of ensemble-specific features $S_F \subseteq \{1, \dots, d\}$ on which axis-aligned splits took place.
Finally, we note that this formulation is quite general as it subsumes a large class of popular tree-based models such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT).
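To make the step from individual trees to the ensemble prediction $F$ concrete, the Python sketch below builds a hypothetical forest of three decision stumps, computes $f_{\mathrm{Forest}}$ as the mean vote and $F$ as the majority class, and checks that $F$ is constant on every cell of the grid formed by all split points. The stumps, thresholds, and function names are illustrative assumptions, not models from the paper.

```python
import numpy as np

# Hypothetical forest of M_T = 3 depth-one trees (stumps); each entry is
# (feature index j, split threshold, prediction c_T(R) on the region x_j > threshold).
STUMPS = [(0, 0.0, 1), (1, 0.5, 1), (0, 1.0, 1)]


def tree_predict(X, feature, threshold, label_right):
    """T(x) = sum_R c_T(R) I(x in R) for a stump with two regions."""
    right = X[:, feature] > threshold
    return np.where(right, label_right, 1 - label_right)


def forest_proba(X):
    """f_Forest(x) = (1 / M_T) * sum_m T_m(x)."""
    return np.mean([tree_predict(X, *stump) for stump in STUMPS], axis=0)


def forest_class(X):
    """F(x): majority vote, i.e., predict 1 when the mean vote exceeds 1/2."""
    return (forest_proba(X) > 0.5).astype(int)


# The union of all split points per feature defines a grid of axis-aligned cells
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=(5_000, 2))
cell_id = np.digitize(X[:, 0], [0.0, 1.0]) * 2 + np.digitize(X[:, 1], [0.5])
pred = forest_class(X)

for c in np.unique(cell_id):
    print(f"cell {c}: predictions {np.unique(pred[cell_id == c])}")
```

Each region $R \in \mathcal{R}_F$ of the ensemble is then a union of such cells, which is why the ensemble again behaves like a single piecewise-constant model.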

A.3 THE RECOURSE IR FOR TREE ENSEMBLE CLASSIFIERS

Theorem 2 (IR for Tree-Ensemble Classifiers). Consider the decision forest classifier in equation 14. The recourse invalidation rate under Gaussian distributed response inconsistencies $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ is given by:

$$\Delta(x_E; \Sigma) = 1 - \sum_{R \in \mathcal{R}_F} c_F(R) \prod_{j \in S_F} d_{j,R}(x_{E,j}),$$

where

$$d_{j,R}(x_{E,j}) = \Phi\Big(\frac{\bar{t}_{j,R} - x_{E,j}}{\sigma_j}\Big) - \Phi\Big(\frac{\underline{t}_{j,R} - x_{E,j}}{\sigma_j}\Big),$$

$\Phi$ is the standard Gaussian CDF, and $\bar{t}_{j,R}$ and $\underline{t}_{j,R}$ are the upper and lower split points for feature $j \in S_F$ that define the hypercube formed by region $R$.

Proof Sketch. The proof uses the insight that a decision forest based on trees with axis-aligned splits partitions the input space into hypercubes on which the prediction is either 0 or 1. It then remains to evaluate Gaussian integrals subject to the constraints set by the hypercubes. The full proof is given in Appendix D.3.

Our proof of Theorem 2 assumes that the split points $\bar{t}_{j,R}$ and $\underline{t}_{j,R}$ corresponding to the tree ensemble are readily available. However, the hypercubes formed by the tree ensemble, on which the prediction is constant, are a function of all individual trees and of how they are combined. Thus, the clear-cut division into hypercubes present in each of the trees is lost in the process of model averaging.

Model Distillation to Evaluate IR. We suggest a solution to this problem using a technique called model distillation (Domingos, 1997; Bucilua et al., 2006; Hinton et al., 2015; Phuong & Lampert, 2019). In a nutshell, we wish to change the form of the model (to a simpler decision tree) while keeping the same knowledge (from our tree ensemble) (Hinton et al., 2015). Thus, the goal of this technique is to distil the knowledge of a larger model (possibly an ensemble) into a single, small (and interpretable) model. In our case, the ensemble is formed by decision trees, and the target model is a decision tree as well. Moreover, the method is simple to operationalize: let $h$ denote the complex model and $g$ the simple model.
Then we use our data $\{x_i, y_i\}_{i=1}^n$ to train and validate the model $h$. The target model $g$, however, is trained on samples from $\{x_i, h(x_i)\}_{i=1}^n$ to mimic the behaviour of the complex model. We refer to panels 1 to 3 in Figure 7 for intuition on how this technique works on a non-linear 2-dimensional data set.
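Once the hypercubes have been read off a (distilled) tree, the expression in Theorem 2 is a finite sum of products of one-dimensional Gaussian probabilities. The Python sketch below evaluates it for two hypothetical disjoint hypercubes with $c_F(R) = 1$ (regions with $c_F(R) = 0$ contribute nothing) and cross-checks the result against Monte Carlo sampling. The boxes, recourse, noise scales, and function names are assumptions for illustration only.

```python
import math

import numpy as np


def std_normal_cdf(u: float) -> float:
    """Phi(u); math.erf handles +/- infinity, so unbounded faces are fine."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


# Hypothetical disjoint hypercubes R with c_F(R) = 1, e.g. as read off a distilled tree.
# Each hypercube lists (lower, upper) split points per feature.
POSITIVE_BOXES = [
    [(0.0, 1.0), (0.5, math.inf)],
    [(1.0, math.inf), (-math.inf, math.inf)],
]


def tree_ensemble_ir(x_e, sigma, boxes):
    """Theorem 2: 1 - sum_R c_F(R) prod_j [Phi((upper - x_j)/sigma_j) - Phi((lower - x_j)/sigma_j)]."""
    positive_mass = 0.0
    for box in boxes:
        prob = 1.0
        for j, (lower, upper) in enumerate(box):
            prob *= std_normal_cdf((upper - x_e[j]) / sigma[j]) - std_normal_cdf((lower - x_e[j]) / sigma[j])
        positive_mass += prob
    return 1.0 - positive_mass


def in_boxes(X, boxes):
    """Membership of each row of X in the union of the (disjoint) hypercubes."""
    inside = np.zeros(len(X), dtype=bool)
    for box in boxes:
        mask = np.ones(len(X), dtype=bool)
        for j, (lower, upper) in enumerate(box):
            mask &= (X[:, j] >= lower) & (X[:, j] < upper)
        inside |= mask
    return inside


x_e = np.array([0.8, 0.7])    # a recourse inside the first positive box
sigma = np.array([0.3, 0.3])  # per-feature noise scales sigma_j

exact_ir = tree_ensemble_ir(x_e, sigma, POSITIVE_BOXES)

# Monte Carlo cross-check with eps ~ N(0, diag(sigma^2))
rng = np.random.default_rng(1)
X = x_e + sigma * rng.standard_normal((200_000, 2))
mc_ir = 1.0 - in_boxes(X, POSITIVE_BOXES).mean()
print(f"Theorem 2: {exact_ir:.4f}   Monte Carlo: {mc_ir:.4f}")
```

Features that a hypercube does not split on get bounds $(-\infty, \infty)$, so their factor evaluates to 1, which matches restricting the product to $j \in S_F$.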

B EXPERIMENTAL DETAILS

In this section, we describe the hyperparameter choices and how the classification models were fitted. We have used CARLA's built-in functionality to fit classifiers using PyTorch (Paszke et al., 2019) and treat all variables as continuous. We set $\lambda_1 = 2$, $\lambda_2 = 1$ and search over $\lambda_3$ in the usual way (Wachter et al., 2018).

We observe in Figures 8 and 9 that recourses generated by state-of-the-art approaches are, on average, invalidated up to 50% of the time when small changes are made to them. It is worth highlighting that the maximum invalidation rates can become as high as 61%, which motivates the need for a recourse method that explicitly controls the invalidation rate.

LR, RA (↑): 0.98 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
LR, AIR (↓): 0.5 ± 0.01 | 0.48 ± 0.01 | 0.4 ± 0.08 | 0.28 ± 0.02 | 0.49 ± 0.03 | 0.48 ± 0.02 | 0.36 ± 0.14 | 0.31 ± 0.01 | 0.48 ± 0.04 | 0.47 ± 0.02 | 0.49 ± 0.02 | 0.3 ± 0.01
LR, AC (↓): 0.55 ± 0.4 | 0.62 ± 0.43 | 2.06 ± 1.03 | 2.21 ± 3.17 | 0.16 ± 0.17 | 0.22 ± 0.17 | 0.73 ± 0.45 | 0.68 ± 0.28 | 0.29 ± 0.27 | 0.49 ± 0.51 | 0.28 ± 0.32 | 1.22 ± 2.29
NN, RA (↑): 0.38 | 1.0 | 1.0 | 1.0 | 0.84 | 1.0 | 1.0 | 1.0 | 0.4 | 1.0 | 1.0 | 1.0
NN, AIR (↓): 0.51 ± 0.02 | 0.51 ± 0.01 | 0.5 ± 0.02 | 0.33 ± 0.01 | 0.39 ± 0.06 | 0.46 ± 0.02 | 0.41 ± 0.07 | 0.25 ± 0.02 | 0.37 ± 0.05 | 0.42 ± 0.03 | 0.44 ± 0.02 | 0.34 ± 0.02
NN, AC (↓): 1.05 ± 0.22 | 0.3 ± 0.19 | 3.11 ± 1.62 | 1.98 ± 2.35 | 1.15 ± 0.52 | 0.2 ± 0.16 | 1.0 ± 0.17 | 0.84 ± 0.34 | 0.2 ± 0.16 | 0.26 ± 0.18 | 0.11 ± 0.09 | 0.41 ± 0.23

Figure 13: Verifying that the invalidation rate for our framework PROBE (blue line) is at most the invalidation target r on the Adult data set for different σ² ∈ {0.01, 0.025} across both classifiers. We compute the mean IR across every instance in the test set. To do that, we sample 10,000 points from ε ∼ N(0, σ²I) for every instance and compute the IR in equation 2. The mean IR then quantifies recourse robustness, where the individual IRs are averaged over all instances from the test set. The shaded regions indicate the corresponding standard deviations.
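The evaluation protocol described in the caption (10,000 noise draws per instance, per-instance IR, then mean and standard deviation over instances) can be sketched in Python as follows. This is a minimal illustration, not the evaluation code behind the figures: the linear logit model, the randomly drawn "recourses", and the function name `average_invalidation_rate` are hypothetical; the linear model is chosen so that Theorem 1's expression is exact and can serve as a reference.

```python
from statistics import NormalDist

import numpy as np

PHI = NormalDist().cdf  # standard normal CDF


def average_invalidation_rate(f, recourses, sigma, n_draws=10_000, seed=0):
    """Per-instance Monte Carlo IR (1 - h(x + eps) with h = I(f > 0)), then mean and std over instances."""
    rng = np.random.default_rng(seed)
    irs = np.empty(len(recourses))
    for i, x_rec in enumerate(recourses):
        noisy = x_rec + rng.normal(0.0, sigma, size=(n_draws, x_rec.shape[0]))
        irs[i] = np.mean(f(noisy) <= 0.0)  # fraction flipped back to the undesired class
    return float(irs.mean()), float(irs.std()), irs


# Hypothetical linear logit model and hypothetical recourses (illustration only)
w = np.array([0.8, -0.5, 1.2])
b = -0.2


def f(X):
    return X @ w + b


rng = np.random.default_rng(42)
candidates = rng.normal(0.0, 1.0, size=(50, 3))
recourses = candidates[f(candidates) > 0]  # keep valid recourses only

sigma = np.sqrt(0.025)  # sigma^2 = 0.025, one of the noise levels in the captions
air, air_std, irs = average_invalidation_rate(f, recourses, sigma)

# For a linear f, Theorem 1's expression is exact and gives a per-instance reference
exact = np.array([1.0 - PHI(float(f(x)) / (sigma * np.linalg.norm(w))) for x in recourses])
print(f"AIR = {air:.3f} +/- {air_std:.3f}   closed form: {exact.mean():.3f}")
```

With 10,000 draws, the Monte Carlo standard error of each per-instance IR is at most 0.005 (by Proposition 5), so the reported shaded regions mostly reflect variation across instances rather than sampling noise.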

C.4 DEMONSTRATING THE COST-ROBUSTNESS TRADEOFF

In Figures 16 and 17, we demonstrate that there exists a trade-off between recourse costs and the robustness of recourses to noisy responses.

C.5 DETAILED COMPARISON WITH ROAR AND ARAR

In this section, we compare our method with two approaches that aim at generating robust algorithmic recourse in different settings. We further report results for DICE, which does not generate robust recourse. Thus, we expect the cost performance (i.e., AC) of DICE to serve as a lower bound, while its robustness performance (i.e., AIR) serves as an upper bound. Regarding the methods that suggest robust recourse, we refer to Upadhyay et al. (2021), who proposed a minimax objective to generate recourses that are robust to model updates (ROAR), while Dominguez-Olmedo et al. (2022) use a slight variation of this objective to find recourses that are robust to uncertainty in the inputs (ARAR). Moreover, at a high level, these objectives differ from our approach since the epsilon neighborhoods that PROBE constructs are probabilistic.

Discussion. The AIR for PROBE should be at most 0.35, which is in line with our results. For ARAR and ROAR, we should expect AIRs close to 0, which is only the case for the linear classifiers. Additionally, ARAR and ROAR provide recourses with up to 10 times higher cost relative to our method PROBE. Note also that ARAR and ROAR have trouble finding recourses for non-linear classifiers, resulting in RA scores of around 5% in the worst case, while not being able to maintain low invalidation rates. This is likely due to the local linear approximation that these methods need to use. For ARAR, only up to 5 percent of all recourses are found (i.e., it only finds recourses that lie at low cost from the decision boundary), and for those identified recourses the average invalidation rate is close to a random coin flip. In summary, PROBE finds recourses for 100% of the test instances, in line with the promise of an invalidation probability of at most 0.35, while being substantially less costly than ROAR.

D PROOFS

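D.1 PROOF OF PROPOSITION 5

Proposition 5. The mean-squared error (MSE) between the true IR $\Delta(x')$ and the empirical Monte-Carlo estimate $\hat{\Delta}_{\mathrm{MC}}(x')$ is upper bounded as follows: $\mathbb{E}_{\varepsilon}[(\Delta(x') - \hat{\Delta}_{\mathrm{MC}}(x'))^2] \le \frac{1}{4K}$.

Proof. First, recall that the empirical Monte-Carlo estimator is given by:
$$\hat{\Delta}_{\mathrm{MC}} = \frac{1}{K}\sum_{k=1}^{K}\big(1 - h(x' + \varepsilon_k)\big). \tag{18}$$
Next, note that $\mathbb{E}_{\varepsilon}[1 - h(x' + \varepsilon)] = \Delta(x')$. Further, the mean-squared error between the nominal invalidation rate $\Delta(x')$ and the Monte-Carlo estimate $\hat{\Delta}_{\mathrm{MC}}$ decomposes as
$$\mathbb{E}_{\varepsilon}\big[(\Delta(x') - \hat{\Delta}_{\mathrm{MC}})^2\big] = \mathbb{V}_{\varepsilon}(\hat{\Delta}_{\mathrm{MC}}) + \big(\mathbb{E}_{\varepsilon}[\hat{\Delta}_{\mathrm{MC}}] - \Delta(x')\big)^2,$$
which is the bias-variance decomposition. We first compute the squared bias term:
$$\big(\mathbb{E}_{\varepsilon}[\hat{\Delta}_{\mathrm{MC}}] - \Delta(x')\big)^2 = \Big[\frac{1}{K} \cdot K \cdot \mathbb{E}_{\varepsilon}[1 - h(x' + \varepsilon)] - \Delta(x')\Big]^2 = 0, \tag{20}$$
where we have used that the $\varepsilon_k$ are identically distributed. We now turn to the variance term; since the $\varepsilon_k$ are also independent, we find:
$$\mathbb{V}_{\varepsilon}(\hat{\Delta}_{\mathrm{MC}}) = \frac{1}{K^2} \cdot K \cdot \mathbb{V}_{\varepsilon}[1 - h(x' + \varepsilon)] = \frac{1}{K} \cdot \mathbb{V}_{\varepsilon}[h(x' + \varepsilon)].$$
It remains to upper bound $\mathbb{V}_{\varepsilon}(h(x' + \varepsilon))$. Since $h(x' + \varepsilon)$ is binary, it is a Bernoulli random variable with some success probability $p$, so its variance is $p(1 - p) \le \frac{1}{4}$. Combining the expression for the squared bias and the upper bound on the variance yields the desired result.

D.2 PROOFS OF THEOREM 1, PROPOSITIONS 1-4 AND COROLLARY 1

Theorem 1. A first-order approximation $\tilde{\Delta}$ to the recourse invalidation rate $\Delta$ in equation 2 under a Gaussian distribution $\varepsilon \sim \mathcal{N}(0, \Sigma)$ capturing the noise in human responses is given by:
$$\tilde{\Delta}(x_E; \Sigma) = 1 - \Phi\left(\frac{f(x_E)}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}\right),$$
where $\Phi$ is the CDF of the univariate standard normal distribution $\mathcal{N}(0, 1)$, $f(x_E)$ denotes the logit score at $x_E$, which is the recourse output by a recourse method $E$, and $h(x_E) \in \{0, 1\}$.

Proof. Let the random variable $\varepsilon$ follow a multivariate normal distribution, i.e., $\varepsilon \sim \mathcal{N}(\mu, \Sigma)$. The following result is a well-known fact: $v^\top \varepsilon \sim \mathcal{N}(v^\top \mu, v^\top \Sigma v)$ for $v \in \mathbb{R}^d$. Let $x$ denote the input sample for which we wish to find a counterfactual $x_E = x + \delta_E$.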
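Recall from Definition 1 that we have to evaluate:
$$\Delta = \mathbb{E}_{\varepsilon}\big[\underbrace{h(x_E)}_{\text{CE class}} - \underbrace{h(x_E + \varepsilon)}_{\text{class after response}}\big] = 1 - \mathbb{E}_{\varepsilon}[h(x_E + \varepsilon)], \tag{25}$$
where we have used that the first term is a constant that evaluates to 1 by the definition of a counterfactual explanation. It remains to evaluate the expectation $\mathbb{E}_{\varepsilon}[h(x_E + \varepsilon)]$. Next, we note that equation 25 can equivalently be expressed in terms of the logit outcomes:
$$\Delta = \mathbb{E}_{\varepsilon}\big[\underbrace{\mathbb{I}[f(x_E) > 0]}_{\text{CE class}} - \underbrace{\mathbb{I}[f(x_E + \varepsilon) > 0]}_{\text{class after perturbation}}\big] = 1 - \mathbb{E}_{\varepsilon}\big[\mathbb{I}[f(x_E + \varepsilon) > 0]\big].$$
Again, we are interested in the second term, which evaluates to:
$$\mathbb{E}_{\varepsilon}\big[\mathbb{I}[f(x_E + \varepsilon) > 0]\big] = 0 \cdot P(f(x_E + \varepsilon) < 0) + 1 \cdot P(f(x_E + \varepsilon) > 0).$$
Next, consider the first-order Taylor approximation $f(x_E + \varepsilon) \approx f(x_E) + \nabla f(x_E)^\top \varepsilon$. By the fact above with $\mu = 0$, we know that $\nabla f(x_E)^\top \varepsilon \sim \mathcal{N}(0, \nabla f(x_E)^\top \Sigma \nabla f(x_E))$. Now, the second term can be computed as follows:
$$\begin{aligned}
P(f(x_E + \varepsilon) > 0) &\approx P(f(x_E) > -\nabla f(x_E)^\top \varepsilon) = P(-f(x_E) < \nabla f(x_E)^\top \varepsilon) \qquad (28)\\
&= 1 - P(-f(x_E) > \nabla f(x_E)^\top \varepsilon) \qquad (29)\\
&= 1 - P\Bigg(\underbrace{\frac{\nabla f(x_E)^\top \varepsilon}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}}_{\text{standard Gaussian RV}} < \underbrace{-\frac{f(x_E)}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}}_{\text{constant}}\Bigg)\\
&= 1 - \Phi\left(-\frac{f(x_E)}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}\right) = \Phi\left(\frac{f(x_E)}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}\right),
\end{aligned}$$
where the last step follows from the symmetry of the standard normal distribution (i.e., $\Phi(-u) = 1 - \Phi(u)$). Putting the pieces together, we have:
$$\mathbb{E}_{\varepsilon}\big[\mathbb{I}[f(x_E + \varepsilon) > 0]\big] = 0 \cdot P(f(x_E + \varepsilon) < 0) + 1 \cdot P(f(x_E + \varepsilon) \ge 0) \approx \Phi\left(\frac{f(x_E)}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}\right). \tag{31}$$
Thus, we have:
$$\Delta \approx \tilde{\Delta} = 1 - \Phi\left(\frac{f(x_E)}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}\right),$$
which completes our proof. Note that this is equivalent to approximating $P(f(x_E + \varepsilon) < 0)$, and thus we are "counting" how often perturbations to $x_E$ sampled from $\varepsilon \sim \mathcal{N}(0, \Sigma)$ result in flips back to the undesired class.

Proposition 2. For a linear classifier, let $r \in (0, 1)$ and let $x_E = x + \delta_E$ be the output produced by some recourse method $E$ such that $h(x_E) = 1$.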
Then the cost required to achieve a fixed invalidation target $r$ is given by:
$$\|\delta_E\|_2 = \frac{\sigma}{\omega}\big(\Phi^{-1}(1 - r) - c\big),$$
where $c = \frac{f(x)}{\sigma \cdot \|\nabla f(x)\|_2}$ is a constant, and $\omega > 0$ is the cosine of the angle between the vectors $\nabla f(x)$ and $\delta_E$.

Proof. Under a logistic classifier, we have $\nabla f(x_E) = \nabla f(x)$ and $f(x_E) = f(x) + \nabla f(x)^\top \delta_E$. The result then immediately follows by setting the expression from Theorem 1 (with $\Sigma = \sigma^2 I$) equal to $r$, using the identity $\nabla f(x)^\top \delta_E = \omega \cdot \|\nabla f(x)\|_2 \cdot \|\delta_E\|_2$, where $\omega$ is the cosine of the angle between the vectors $\nabla f(x)$ and $\delta_E$, and rearranging for $\|\delta_E\|_2$.

Proposition 3. Under the same conditions as in Proposition 2, we have $\frac{\partial \|\delta_E\|_2}{\partial (1 - r)} = \frac{\sigma}{\omega}\frac{1}{\phi(\Phi^{-1}(1 - r))} > 0$, i.e., an infinitesimal increase in robustness (i.e., in $1 - r$) increases the cost of recourse by $\frac{\sigma}{\omega}\frac{1}{\phi(\Phi^{-1}(1 - r))}$.

Proof. We compute the derivative of $\|\delta_E\|_2 = \frac{\sigma}{\omega}(\Phi^{-1}(1 - r) - c)$ with respect to $1 - r$ and show that it is positive for all $r \in (0, 1)$:
$$\frac{\partial \|\delta_E\|_2}{\partial (1 - r)} = \frac{\sigma}{\omega}\frac{1}{\phi(\Phi^{-1}(1 - r))} > 0,$$
where $\phi$ is the probability density function (PDF) of the standard Gaussian distribution, and we used $\frac{d}{dp}\Phi^{-1}(p) = 1/\phi(\Phi^{-1}(p))$. Since the Gaussian PDF is strictly positive, we have $\phi(\Phi^{-1}(1 - r)) > 0$, and we know that $\sigma, \omega > 0$. Thus, the result follows.

Proposition 4. Let $x_E$ be the output produced by some recourse method $E$ such that $h(x_E) = 1$. Then, an upper bound on $\tilde{\Delta}$ from equation 4 is given by:
$$\tilde{\Delta}(x_E; \sigma^2 I) \le 1 - \Phi\left(c + \frac{\omega}{\sigma}\frac{\|\nabla f(x)\|_2}{\|\nabla f(x_E)\|_2}\frac{\|\delta_E\|_1}{\sqrt{\|\delta_E\|_0}}\right),$$
where $c = \frac{f(x)}{\sigma \cdot \|\nabla f(x_E)\|_2}$ is a constant, $\delta_E = x_E - x$, and $\omega > 0$ is the cosine of the angle between the vectors $\nabla f(x)$ and $\delta_E$.

Proof. We start by noting the basic inequality $\|z\|_1 \le \sqrt{\|z\|_0} \cdot \|z\|_2$, to which we will refer as the basic inequality going forward. Moreover, note that $\Phi$ is monotonically increasing, so $\Phi(a) \le \Phi(a')$ for $a \le a'$. Note that $f(x_E) \approx f(x) + \nabla f(x)^\top \delta_E$. Thus, we obtain the following approximation:
$$\tilde{\Delta} \approx 1 - \Phi\left(\frac{f(x) + \nabla f(x)^\top \delta_E}{\sqrt{\nabla f(x_E)^\top \Sigma \nabla f(x_E)}}\right). \tag{37}$$
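Next, we will find an upper bound for the term on the right. Before doing so, we rewrite the above expression to highlight the impact of the counterfactual action $\delta_E$ more explicitly. To do that, note that $\nabla f(x)^\top \delta_E = \omega \cdot \|\nabla f(x)\|_2 \cdot \|\delta_E\|_2$, where $\omega$ is the cosine of the angle between the vectors $\nabla f(x)$ and $\delta_E$. Using $\Sigma = \sigma^2 I$, we obtain:
$$\Phi\left(\frac{f(x) + \nabla f(x)^\top \delta_E}{\sigma\|\nabla f(x_E)\|_2}\right) = \Phi\left(c + \frac{\|\nabla f(x)\|_2}{\|\nabla f(x_E)\|_2} \cdot \frac{\omega}{\sigma} \cdot \|\delta_E\|_2\right),$$
where we defined the constant $c = \frac{f(x)}{\sigma\|\nabla f(x_E)\|_2}$ using quantities that we keep fixed in our analysis, namely $x$, the gradients $\nabla f(x)$ and $\nabla f(x_E)$, and $\sigma$. Also note that $x$ is the factual input, and thus its logit score satisfies $f(x) < 0$. Since $\delta_E$ is a valid perturbation, we must have $\omega > 0$ for the perturbation to change the class prediction. By the basic inequality stated above and the monotonicity of $\Phi$, the following lower bound holds:
$$\Phi\left(c + \frac{\|\nabla f(x)\|_2}{\|\nabla f(x_E)\|_2} \cdot \frac{\omega}{\sigma} \cdot \|\delta_E\|_2\right) \ge \Phi\left(c + \frac{\|\nabla f(x)\|_2}{\|\nabla f(x_E)\|_2} \cdot \frac{\omega}{\sigma} \cdot \frac{\|\delta_E\|_1}{\sqrt{\|\delta_E\|_0}}\right).$$
As a consequence, we obtain the following upper bound on the IR:
$$\tilde{\Delta} \le 1 - \Phi\left(c + \frac{\|\nabla f(x)\|_2}{\|\nabla f(x_E)\|_2} \cdot \frac{\omega}{\sigma} \cdot \frac{\|\delta_E\|_1}{\sqrt{\|\delta_E\|_0}}\right), \tag{41}$$
which completes the proof of Proposition 4.

Proof of Proposition 1 (closed-form IR of the Wachter et al. (2018) recourse $x_{\text{Wachter}}(s)$ for the logistic regression classifier, where $s$ is the target logit score). Since we are in the linear case, we have $\nabla f(x_E) = \nabla f(x)$. Also, note that $f(x_E) = f(x) + \nabla f(x)^\top \delta_E$. Using $\Sigma = \sigma^2 I$, we obtain the following exact expression:
$$\Delta = 1 - \Phi\left(\frac{f(x) + \nabla f(x)^\top \delta_E}{\sigma\|\nabla f(x)\|_2}\right). \tag{42}$$
From Pawelczyk et al. (2022), we have:
$$\delta_{\text{Wachter}} = \frac{s - f(x)}{\|\nabla f(x)\|_2^2}\nabla f(x). \tag{43}$$
Plugging equation 43 into equation 42, we obtain:
$$\begin{aligned}
\Delta &= 1 - \Phi\left(\frac{f(x)}{\sigma\|\nabla f(x)\|_2} + \frac{\nabla f(x)^\top \delta_E}{\sigma\|\nabla f(x)\|_2}\right) \qquad (44)\\
&= 1 - \Phi\left(\frac{f(x)}{\sigma\|\nabla f(x)\|_2} + \frac{1}{\sigma\|\nabla f(x)\|_2} \cdot \nabla f(x)^\top \nabla f(x)\,\frac{s - f(x)}{\|\nabla f(x)\|_2^2}\right)\\
&= 1 - \Phi\left(\frac{f(x)}{\sigma\|\nabla f(x)\|_2} + \frac{s - f(x)}{\sigma\|\nabla f(x)\|_2}\right) = 1 - \Phi\left(\frac{s}{\sigma\|\nabla f(x)\|_2}\right),
\end{aligned}$$
which concludes the proof.

Published as a conference paper at ICLR 2023

Corollary 1.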
Under the conditions of Proposition 1, choosing $s_r = \sigma\|\nabla f(x)\|_2\,\Phi^{-1}(1 - r)$ guarantees a recourse invalidation rate of $r$, i.e., $\Delta(x_{\text{Wachter}}(s_r); \sigma^2 I) = r$.

Proof. The result follows directly from plugging $s_r = \sigma\|\nabla f(x)\|_2\,\Phi^{-1}(1 - r)$ into the optimal recourse $\delta_{\text{Wachter}}$ from equation 43 and subsequently evaluating the recourse invalidation rate from equation 5, which gives $1 - \Phi(\Phi^{-1}(1 - r)) = r$.
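Corollary 1 and Propositions 2 and 3 can be checked numerically in a few lines. The Python sketch below assumes a hypothetical logistic-regression logit $f(x) = w^\top x + b$ and factual input $x$ with $f(x) < 0$; for each target $r$ it builds the Wachter et al. (2018) recourse with $s_r$ from Corollary 1, evaluates the closed-form IR, and compares the resulting cost with Proposition 2 (where $\omega = 1$, because $\delta_{\text{Wachter}}$ is parallel to $\nabla f(x)$). Function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()  # .cdf is Phi, .inv_cdf is Phi^{-1}

# Hypothetical logistic-regression logit f(x) = w^T x + b and factual input with f(x) < 0
w = np.array([1.5, -0.5])
b = -1.0
x = np.array([0.0, 0.0])  # f(x) = -1.0
sigma = 0.1


def f(z):
    return float(z @ w + b)


def wachter_recourse(x, s):
    """x_Wachter(s) = x + (s - f(x)) / ||grad f||^2 * grad f, with grad f = w for a linear logit."""
    return x + (s - f(x)) / float(w @ w) * w


def invalidation_rate(x_rec):
    """Closed form for a linear logit: 1 - Phi(f(x_E) / (sigma * ||w||))."""
    return 1.0 - STD_NORMAL.cdf(f(x_rec) / (sigma * np.linalg.norm(w)))


def corollary1_target_logit(r):
    """s_r = sigma * ||grad f(x)||_2 * Phi^{-1}(1 - r)."""
    return sigma * np.linalg.norm(w) * STD_NORMAL.inv_cdf(1.0 - r)


c = f(x) / (sigma * np.linalg.norm(w))  # the constant c of Proposition 2
results = []
for r in [0.35, 0.25, 0.15, 0.05]:
    x_rec = wachter_recourse(x, corollary1_target_logit(r))
    cost = float(np.linalg.norm(x_rec - x))
    # Proposition 2 with omega = 1, since delta_Wachter is parallel to grad f
    prop2_cost = sigma * (STD_NORMAL.inv_cdf(1.0 - r) - c)
    results.append((r, invalidation_rate(x_rec), cost, prop2_cost))
    print(f"r={r:.2f}  IR={results[-1][1]:.4f}  cost={cost:.4f}  Proposition 2: {prop2_cost:.4f}")
```

As $r$ decreases, the printed cost grows, which is the monotone cost-robustness trade-off stated in Proposition 3.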

D.3 PROOF OF THEOREM 2

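Proof. From Definition 1 we know:
$$\begin{aligned}
\Delta_{\text{Forest}} &= \mathbb{E}_{\varepsilon}\big[\underbrace{F(x_E)}_{\text{CE class}} - \underbrace{F(x_E + \varepsilon)}_{\text{class after response}}\big] \qquad (46)\\
&= 1 - \mathbb{E}_{\varepsilon}[F(x_E + \varepsilon)]. \qquad (47)
\end{aligned}$$
It remains to evaluate $\mathbb{E}_{\varepsilon}[F(x_E + \varepsilon)]$. Using equation 14, we have:
$$\begin{aligned}
\mathbb{E}_{\varepsilon}[F(x_E + \varepsilon)] &= \mathbb{E}_{\varepsilon}\Big[\sum_{R \in \mathcal{R}_F} c_F(R) \cdot \mathbb{I}(x_E + \varepsilon \in R)\Big] \qquad (48)\\
&= \sum_{R \in \mathcal{R}_F} c_F(R) \cdot \mathbb{E}_{\varepsilon}[\mathbb{I}(x_E + \varepsilon \in R)] \qquad \text{(linearity of expectation)}\\
&= \sum_{R \in \mathcal{R}_F} c_F(R) \cdot \int_R p(y)\,dy \qquad (p(y) = \mathcal{N}(x_E, \sigma^2 I), \text{ since } \varepsilon \text{ is Gaussian})\\
&= \sum_{R \in \mathcal{R}_F} c_F(R) \prod_{j \in S_F}\Big[\Phi\Big(\frac{\bar{t}_{j,R} - x_{E,j}}{\sigma_j}\Big) - \Phi\Big(\frac{\underline{t}_{j,R} - x_{E,j}}{\sigma_j}\Big)\Big],
\end{aligned}$$
where the last step uses that $R$ is an axis-aligned hypercube and that the Gaussian density with diagonal covariance factorizes over the coordinates, so the integral splits into a product of one-dimensional Gaussian probabilities (for features $j \notin S_F$, $R$ is unbounded and the corresponding factor equals 1). Using our definition of robustness, we thus have
$$\Delta_{\text{Forest}} = 1 - \sum_{R \in \mathcal{R}_F} c_F(R) \prod_{j \in S_F}\Big[\Phi\Big(\frac{\bar{t}_{j,R} - x_{E,j}}{\sigma_j}\Big) - \Phi\Big(\frac{\underline{t}_{j,R} - x_{E,j}}{\sigma_j}\Big)\Big],$$
which completes the proof.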



Figure 2: Practical view on navigating the cost/robustness tradeoff for a credit loan example.

Figure 3: Navigating between high- and low-invalidation recourses. The circles around PROBE's recourses have radius 2σ, i.e., they mark the region in which roughly 86% of recourse inaccuracies fall when σ² = 0.05 (for isotropic two-dimensional Gaussian noise, P(∥ε∥₂ ≤ 2σ) = 1 − e^(−2) ≈ 0.86). For instance, on the left we set an invalidation target of r = 0.35, i.e., 35% of the recourse responses would fail under spherical inaccuracies ε ∼ N(0, 0.05 · I).

Figure 4: Comparing PROBE to adversarially robust recourse methods using Pareto plots that show the trade-off between average costs and average invalidation rate (towards the bottom left indicates better performance). For PROBE, the invalidation target is r ∈ {0.35, 0.3, 0.25, 0.20, 0.15}, and we generated recourses by setting σ² ∈ {0.005, 0.01, 0.015}; for ARAR and ROAR, we set ϵ ∈ {0.005, 0.01, 0.015}.

Figure 5: Verifying the theoretical upper bound from Proposition 4 on the logistic regression model. The red boxplots show the empirical recourse invalidation rates for AR(-LIME), Wachter, GS, DICE, ARAR (ϵ = 0.01), ROAR (ϵ = 0.01) and PROBE (r = 0.35, σ² = 0.01). The blue boxplots show the distribution of upper bounds obtained by plugging the corresponding quantities (i.e., σ², ω, etc.) into the bound. The results show no violations of our theoretical bounds. See Appendix C for the full set of results.

Figure 6: Differentiable approximations of the indicator function I(x > 0) using the sigmoid function S(y) = 1/(1 + exp(−y)) evaluated at different temperatures t ∈ {1, 2, 10, 25, 100} when ξ = 0.

Distilling a RF classifier (left) into a single tree (right). In the left panel, the RF classifier averages 30 decision trees, indicating that the final axis-aligned regions (not shown) are complicated functions of all 30 decision trees.

Computing recourse for the RF model (right) based on the hypercubes (left). The circle has radius 2σ, i.e., it marks the region in which roughly 86% of recourse inaccuracies fall when σ² = 0.025 (in two dimensions, P(∥ε∥₂ ≤ 2σ) = 1 − e^(−2) ≈ 0.86). The input x has IR ≈ 0.5; the CE has IR ≈ 0.05.

Figure 7: Computing certified recourses on the 2d Moon data set (Pedregosa et al., 2011) for a RF classifier. Figure a): Distilling a RF classifier (left panel) into a single decision tree (right panel) using knowledge distillation (Domingos, 1997). Figure b): Using the distilled tree, we form the hypercubes (left panel) required to compute IR according to Theorem 2. We then optimize equation 3 to find certified recourses for the RF model (right panel).

Figure 8: Boxplots of recourse invalidation probabilities across successfully generated recourses x for logistic regression on three data sets. Recourses were generated by four different explanation methods (i.e., AR, Wachter, GS, and DICE), which use different techniques (i.e., integer programming, gradient search, random search, diverse recourse) to find minimum-cost recourses. We perturbed the recourses by adding small normally distributed response inaccuracies ε ∼ N(0, σ² · I) to x.

Figure 12: Verifying the theoretical upper bound from Proposition 4 for the logistic regression and artificial neural network classifiers on all data sets when σ² = 0.025. The green boxplots show the empirical recourse IRs for AR(-LIME), Wachter, GS, and PROBE. The blue boxplots show the distribution of upper bounds, which we evaluated by plugging the corresponding quantities (i.e., σ², ω, etc.) into the upper bound from Proposition 4. The results show no violations of our bounds.

Figure: Verifying that the invalidation rate for our framework PROBE (blue line) is at most the invalidation target r on the GMC data set for different σ² ∈ {0.01, 0.025} across both classifiers. We compute the mean IR across every instance in the test set. To do that, we sample 10,000 points from ε ∼ N(0, σ²I) for every instance and compute the IR in equation 2. The mean IR then quantifies recourse robustness, where the individual IRs are averaged over all instances from the test set. The shaded regions indicate the corresponding standard deviations.

Figure 16: Trading off recourse costs against robustness by choosing the invalidation target r in our PROBE framework. We generated recourses by setting r ∈ {0.20, 0.25, 0.30, 0.35, 0.40} and σ² = 0.01 for the logistic regression classifier.

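Proposition 1. For the logistic regression classifier, consider the recourse output by Wachter et al. (2018): $x_{\text{Wachter}}(s) = x + \frac{s - f(x)}{\|\nabla f(x)\|_2^2}\nabla f(x)$. Then the recourse invalidation rate has the following closed form: $\Delta(x_{\text{Wachter}}(s); \sigma^2 I) = 1 - \Phi\left(\frac{s}{\sigma\|\nabla f(x)\|_2}\right)$, where $s$ is the target logit score.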


± 0.01 0.46 ± 0.02 0.35 ± 0.11 0.34 ± 0.02 0.48 ± 0.04 0.47 ± 0.02 0.3 ± 0.18 0.28 ± 0.02 0.47 ± 0.06 0.45 ± 0.03 0.48 ± 0.04 0.24 ± 0.01 AC (↓) 0.55 ± 0.4 0.62 ± 0.43 2.12 ± 1.05 1.56 ± 0.92 0.16 ± 0.17 0.22 ± 0.17 0.73 ± 0.45 0.63 ± 0.39 0.29 ± 0.27 0.49 ± 0.51 0.28 ± 0.31 0.60 ± 0.56 ± 0.02 0.48 ± 0.02 0.35 ± 0.01 0.34 ± 0.09 0.46 ± 0.02 0.43 ± 0.07 0.33 ± 0.02 0.34 ± 0.07 0.43 ± 0.03 0.45 ± 0.03 0.25 ± 0.03

Comparing PROBE to recourse methods from the literature using recourse accuracy (RA), average recourse invalidation rate (AIR) for σ² = 0.01, and average cost (AC) across different recourse methods. For PROBE, we generated recourses by setting r = 0.35 and σ² = 0.01. (a): Recourses that use our framework PROBE are more robust compared to those produced by existing baselines.

All models use an 80-20 train-test split for model training and evaluation. We evaluate model quality based on model accuracy. All models are trained with the same architectures across the data sets:

Recourse accuracy (RA), average recourse invalidation rate (AIR) for σ² = 0.025, and average cost (AC) across different recourse methods. Recourses that use our framework PROBE are more robust compared to those produced by existing baselines. For PROBE, we generated recourses by setting r = 0.35. Thus, the AIR should be at most 0.35, in line with our results.

C.3 VERIFYING THE VALIDITY OF THE EMPIRICAL INVALIDATION RATE

In Figures 13, 14, and 15, we show that the IRs of the recourses produced by our framework can be controlled by setting r to the desired values.

Verifying that the invalidation rate for our framework PROBE (blue line) is at most the invalidation target r on the Compas data set for different σ² ∈ {0.01, 0.025} across both classifiers. We compute the mean IR across every instance in the test set. To do that, we sample 10,000 points from ε ∼ N(0, σ²I) for every instance and compute the IR in equation 2. The mean IR then quantifies recourse robustness, where the individual IRs are averaged over all instances from the test set. The shaded regions indicate the corresponding standard deviations.

Pareto plots showing the trade-off between average costs and average invalidation rate when the underlying model is linear. For PROBE, the invalidation target r (dotted line) is set to 0.3, and we generated recourses by setting σ² ∈ {0.005, 0.01, 0.015}; for ARAR and ROAR, we set ϵ ∈ {0.005, 0.01, 0.015}. Following the suggestion by Upadhyay et al. (2021), all recourse methods search for the optimal counterfactuals over the same set of balance parameters λ ∈ {0, 0.25, 0.5, 0.75, 1}.

Pareto plots showing the trade-off between average costs and average invalidation rate when the underlying model is a neural network. For PROBE, the invalidation target r (dotted line) is set to 0.35, and we generated recourses by setting σ² ∈ {0.005, 0.01, 0.015}; for ARAR and ROAR, we set ϵ ∈ {0.005, 0.01, 0.015}. Following the suggestion by Upadhyay et al. (2021), all recourse methods search for the optimal counterfactuals over the same set of balance parameters λ ∈ {0, 0.25, 0.5, 0.75, 1}.

ACKNOWLEDGEMENTS

We would like to thank the anonymous reviewers for their insightful feedback. This work is supported in part by the NSF awards #IIS-2008461 and #IIS-2040989, and research awards from Google, JP Morgan, Amazon, Bayer, Harvard Data Science Initiative, and D^3 Institute at Harvard. HL would like to thank Sujatha and Mohan Lakkaraju for their continued support and encouragement. The views expressed here are those of the authors and do not reflect the official policy or position of the funding agencies.

