MODEL TRANSFERABILITY WITH RESPONSIVE DECISION SUBJECTS

Abstract

This paper studies model transferability when human decision subjects respond to a deployed machine learning model. In our setting, an agent or a user corresponds to a sample (X, Y) drawn from a distribution D and will face a model h and its classification result h(X). Agents can modify X to adapt to h, which will incur a distribution shift on (X, Y). Therefore, when training h, the learner needs to consider the subsequently "induced" distribution when the output model is deployed. Our formulation is motivated by applications where the deployed machine learning models interact with human agents, and will ultimately face responsive and interactive data distributions. We formalize the discussion of the transferability of a model by studying how a model trained on the available source distribution (data) translates to performance on the induced domain. We provide both upper bounds for the performance gap due to the induced domain shift, and lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We provide further instantiated analysis for two popular domain adaptation settings: covariate shift and target shift.

1. INTRODUCTION

Decision makers are increasingly required to be transparent about their decision making in order to offer the "right to explanation" (Goodman & Flaxman, 2017; Selbst & Powles, 2018; Ustun et al., 2019). Being transparent also invites potential adaptations from the population, leading to distribution shifts. We are motivated by settings where the deployed machine learning models interact with human agents, and will ultimately face data distributions that reflect how human agents respond to the models. For instance, when a model is used to decide loan applications, candidates may adapt their features based on the model specification in order to maximize their chances of approval; thus the loan decision classifier observes a data distribution caused by its own deployment (e.g., see Figure 1 for a demonstration). Similar observations can be articulated for applications in the insurance sector (e.g., designing policies when customers may adapt their behavior to lower their premiums (Haghtalab et al., 2020)), the education sector (e.g., designing courses when students are less incentivized to cheat (Kleinberg & Raghavan, 2020)), and so on. This paper investigates model transferability when the underlying distribution shift is induced by the deployed model. What we would like to have is a guarantee on the transferability of a classifier; that is, how training on the available source distribution D_S translates to performance on the induced domain D(h), which depends on the model h being deployed. A key concept in our setting is the induced risk, defined as the error a model incurs on the distribution induced by itself:

Induced risk: Err_{D(h)}(h) := P_{D(h)}(h(X) ≠ Y)    (1)

Most relevant to the above formulation is the strategic classification literature (Hardt et al., 2016a; Chen et al., 2020b). In this literature, agents are modeled as rational utility maximizers, and game-theoretic solutions were proposed to characterize the induced risk.
However, our results are motivated by the following challenges in more general scenarios:

• Modeling assumptions being restrictive. In many practical situations, it is often hard to faithfully characterize agents' utilities. Furthermore, agents might not be fully rational when they respond. These uncertainties can lead to a far more complicated distribution change in (X, Y), as compared to the often-made assumption that agents only change X but not Y (Chen et al., 2020b).

• Lack of access to response data. Another literature relevant to our work is performative prediction (Perdomo et al., 2020). In performative prediction, one would often require knowing D(h) or having samples observed from D(h) through repeated experiments. We posit that machine learning practitioners may only have access to data from the source distribution during training, and although they anticipate changes in the population due to human agents' responses, they cannot observe this new distribution until the model is actually deployed.

• Retraining being costly. Even when samples from the induced data distribution are available, retraining the model from scratch may be impractical due to computational constraints.

The above observations motivate us to understand the transferability of a model trained on the source data to the domain induced by its own deployment. We study several fundamental questions:

• Source risk ⇒ Induced risk. For a given model h, how different is Err_{D(h)}(h), the error on the distribution induced by h, from Err_{D_S}(h) := P_{D_S}(h(X) ≠ Y), the error on the source?

• Induced risk ⇒ Minimum induced risk. How much higher is Err_{D(h)}(h), the error on the induced distribution, than min_{h'} Err_{D(h')}(h'), the minimum achievable induced error?
• Induced risk of source optimal ⇒ Minimum induced risk. Of particular interest, and as a special case of the above, how does Err_{D(h*_S)}(h*_S), the induced error of the optimal model trained on the source distribution, h*_S := arg min_h Err_{D_S}(h), compare to the induced error of h*_T := arg min_h Err_{D(h)}(h)?

• Lower bound for learning tradeoffs. What is the minimum error a model must incur on either the source distribution, Err_{D_S}(h), or its induced distribution, Err_{D(h)}(h)?

For the first three questions, we prove upper bounds on the additional error incurred when a model trained on a source distribution is transferred to its induced domain. We also provide lower bounds for the trade-offs a classifier has to suffer on either the source training distribution or the induced target distribution. We then show how to specialize our results to two popular domain adaptation settings: covariate shift (Shimodaira, 2000; Zadrozny, 2004; Sugiyama et al., 2007; 2008; Zhang et al., 2013b) and target shift (Lipton et al., 2018; Guo et al., 2020; Zhang et al., 2013b). All omitted proofs can be found in the Appendix.

1.1 RELATED WORKS

Most relevant to us are three topics: strategic classification (Hardt et al., 2016a; Chen et al., 2020b; Dekel et al., 2010; Dong et al., 2018; Chen et al., 2020a; Miller et al., 2020; Kleinberg & Raghavan, 2020), a recently proposed notion of performative prediction (Perdomo et al., 2020; Mendler-Dünner et al., 2020), and domain adaptation (Jiang, 2008; Ben-David et al., 2010; Sugiyama et al., 2008; Zhang et al., 2019; Kang et al., 2019; Zhang et al., 2020). Hardt et al. (2016a) pioneered the formalization of strategic behavior in classification based on a sequential two-player game between agents and classifiers. Subsequently, Chen et al. (2020b) addressed the question of repeatedly learning linear classifiers against agents who strategically try to game the deployed classifiers.
Most of the existing literature focuses on finding the optimal classifier by assuming fully rational agents (and by characterizing their equilibrium response). In contrast, we do not make these assumptions and primarily study transferability when only knowledge of the source data is available. Our results were inspired by the transferability results in domain adaptation (Ben-David et al., 2010; Crammer et al., 2008; David et al., 2010). Later works examined specific domain adaptation models, such as covariate shift (Shimodaira, 2000; Zadrozny, 2004; Gretton et al., 2009; Sugiyama et al., 2008; Zhang et al., 2013b;a) and target/label shift (Lipton et al., 2018; Azizzadenesheli et al., 2019). A commonly established solution is to perform reweighted training on the source data, and robust and efficient solutions have been developed to estimate the weights accurately (Sugiyama et al., 2008; Zhang et al., 2013b;a; Lipton et al., 2018; Guo et al., 2020). Our work, at first sight, looks similar to several other areas of study. For instance, the notion of observing an "induced distribution" bears similarity to the adversarial machine learning literature (Lowd & Meek, 2005; Huang et al., 2011; Vorobeychik & Kantarcioglu, 2018). One of the major differences is that in adversarial machine learning the true label Y stays the same for the attacked feature, while in our paper both X and Y might change in the adapted distribution D(h). In Appendix A.2, we provide detailed comparisons with some areas in domain adaptation, including domain generalization, adversarial attacks, and test-time adaptation. In particular, similar to domain generalization, one of the biggest challenges in our setting is the lack of access to data from the target distribution during training.

2. FORMULATION

Suppose we are learning a parametric model h ∈ H for a binary classification problem. Its training data set S := {(x_i, y_i)}_{i=1}^N is drawn from a source distribution D_S, where x_i ∈ R^d and y_i ∈ {−1, +1}. However, h will then be deployed in a setting where the samples come from a test or target distribution D_T that can differ substantially from D_S. Therefore, instead of minimizing the prediction error on the source distribution, Err_{D_S}(h) := P_{D_S}(h(X) ≠ Y), the goal is to find h* that minimizes Err_{D_T}(h) := P_{D_T}(h(X) ≠ Y). This is often referred to as the domain adaptation problem, where, typically, the transition from D_S to D_T is assumed to be independent of the model h being deployed. We consider a setting in which the distribution shift depends on h, or is thought of as being induced by h. We will use D(h) to denote the domain induced by h:

D_S → encounters model h → D(h)

Strictly speaking, the induced distribution is a function of both D_S and h and would be better denoted by D_S(h). To ease the notation, we will stick with D(h), but we shall keep in mind its dependence on D_S. For now, we do not restrict how D(h) depends on D_S and h, but later in Sections 4 and 5 we will further instantiate D(h) under specific domain adaptation settings. The challenge in the above setting is that when training h, the learner needs to keep in mind that D(h) is the distribution on which h will ultimately be evaluated, and hence the distribution that training should care about. Formally, we define the induced risk of a classifier h as the 0-1 error on the distribution h induces:

Induced risk: Err_{D(h)}(h) := P_{D(h)}(h(X) ≠ Y)

Denote by h*_T := arg min_{h∈H} Err_{D(h)}(h) the classifier with minimum induced risk. More generally, when the loss ℓ may not be the 0-1 loss, we define the induced ℓ-risk as

Induced ℓ-risk: Err_{ℓ,D(h)}(h) := E_{z∼D(h)}[ℓ(h; z)]

The induced risks will be the primary quantities that we are interested in minimizing.
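To make the formulation concrete, the sketch below evaluates Err_{D_S}(h) and Err_{D(h)}(h) for a threshold classifier on a small discrete distribution. The distribution, the conditional P(Y|X), and the response map `induce` are all our own hypothetical choices for illustration; the paper's formal setup does not fix a particular response model here.

```python
# Illustrative sketch: source risk vs. induced risk for a threshold classifier.
# The distribution, P(Y|X), and the response map below are hypothetical.

p_x = {0: 0.4, 1: 0.3, 2: 0.3}           # source feature marginal P_S(X)
p_y_given_x = {0: 0.1, 1: 0.5, 2: 0.9}   # P(Y = +1 | X = x), held fixed

def h(x):                                 # threshold classifier: accept x >= 2
    return +1 if x >= 2 else -1

def risk(px, h):
    """0-1 risk P(h(X) != Y) under feature marginal px, with P(Y|X) fixed."""
    return sum(p * (p_y_given_x[x] if h(x) == -1 else 1 - p_y_given_x[x])
               for x, p in px.items())

def induce(px, h, shift=0.5):
    """Hypothetical response map: a `shift` fraction of the rejected source
    mass at each x moves to x + 1; P(Y|X) is unchanged (covariate shift)."""
    new = dict(px)
    for x in sorted(px):
        if h(x) == -1 and x + 1 in new:
            moved = shift * px[x]
            new[x] -= moved
            new[x + 1] += moved
    return new

src_risk = risk(p_x, h)             # error on the source distribution
ind_risk = risk(induce(p_x, h), h)  # error on the distribution h itself induces
```

The two risks generally differ even though h itself is unchanged; bounding this gap is exactly the subject of Section 3.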

2.1. EXAMPLES OF DISTRIBUTION SHIFTS INDUCED BY MODEL DEPLOYMENT

We provide two example models to demonstrate the use cases for the distribution shift models described in our paper. We provide more details in Section 4.3 and Section 5.3.

Strategic Classification

An example of induced distribution shift arises when decision subjects respond strategically to a decision rule. It is well known that when human agents are subject to a decision rule, they will adapt their features so as to get a favorable prediction outcome. In the literature of strategic classification, we say the human agents perform strategic adaptation (Hardt et al., 2016a). It is natural to assume that the feature distribution before and after the human agents' best response satisfies covariate shift: namely, the feature distribution P(X) will change, but P(Y|X), the mapping between Y and X, remains unchanged. Notice that this is different from the assumption made in the classic strategic classification setting (Hardt et al., 2016a), which is that changes in the feature X do not change the underlying true qualification Y. In our paper, we assume that changes in the feature X could potentially lead to changes in the true qualification Y, and that the mapping between Y and X remains the same before and after the adaptation. This is a common assumption in a recent line of work on incentivizing improvement behaviors from human agents (see, e.g., Chen et al. (2020a); Shavit et al. (2020)). We use Figure 2 (Left) as a demonstration of how the distribution might shift in the strategic response setting. In Section 4.3, we will use the strategic classification setup to verify our obtained results.

Replicator Dynamics

Replicator dynamics is a commonly used model to study the evolution of an adopted "strategy" in evolutionary game theory (Tuyls et al., 2006; Friedman & Sinervo, 2016; Taylor & Jonker, 1978; Raab & Liu, 2021). Its core notion is that the growth or decline of the population adopting each strategy depends on the strategy's "fitness".
Consider the label Y ∈ {−1, +1} as the strategy, and consider the following behavioral response model to capture the induced target shift:

P_{D(h)}(Y = +1) / P_{D_S}(Y = +1) = Fitness(Y = +1) / E_{D_S}[Fitness(Y)]

In short, the change in the Y = +1 population depends on how the strategy Y = +1 "fits" a certain utility function. For instance, the "fitness" can take the form of the prediction accuracy of h for class +1: Fitness(Y = +1) := P_{D_S}(h(X) = +1 | Y = +1). Intuitively speaking, a higher "fitness" describes more success for agents who adopted a certain strategy (Y = −1 or Y = +1). Therefore, agents will imitate or replicate these successful peer agents by adopting the same strategy, resulting in an increase of the corresponding population proportion P_{D(h)}(Y). Assuming that P(X|Y) stays unchanged, this instantiates one example of a specific induced target shift. We will specify the condition for target shift in Section 5. We use Figure 2 (Right) as a demonstration of how the distribution might shift in the replicator dynamics setting. In Section 5.3, we will use a detailed replicator dynamics model to further instantiate our results.

3. TRANSFERABILITY OF LEARNING TO INDUCED DOMAINS

In this section, we first provide upper bounds for the transfer error of a classifier h (that is, the difference between Err D(h) (h) and Err D S (h)), as well as between Err D(h) (h) and Err D(h * T ) (h * T ). We then provide lower bounds for max{Err D S (h), Err D(h) (h)}, that is, the minimum error a model h must incur on either the source distribution D S or the induced distribution D(h).

3.1. UPPER BOUND

We first investigate upper bounds for the transfer errors. We begin by showing generic bounds, and further instantiate them for specific domain adaptation settings in Sections 4 and 5. We begin by answering a central question in domain adaptation: how does a model h trained on its training data set fare on the induced distribution D(h)? To that end, define the minimum and the h-dependent combined errors of two distributions D and D' as:

λ_{D→D'} := min_{h'∈H} [Err_D(h') + Err_{D'}(h')],    Λ_{D→D'}(h) := Err_D(h) + Err_{D'}(h)

and the H-divergence as

d_{H×H}(D, D') = 2 sup_{h,h'∈H} |P_D(h(X) ≠ h'(X)) − P_{D'}(h(X) ≠ h'(X))|.

The H-divergence is a celebrated measure proposed in the domain adaptation literature (Ben-David et al., 2010), which will be useful for bounding the difference in errors of two classifiers. Repeating classical arguments from Ben-David et al. (2010), we can easily prove the following:

Theorem 3.1 (Source risk ⇒ Induced risk). The difference between Err_{D(h)}(h) and Err_{D_S}(h) is upper bounded by:

Err_{D(h)}(h) ≤ Err_{D_S}(h) + λ_{D_S→D(h)} + (1/2) d_{H×H}(D_S, D(h)).

The transferability of a model h between Err_{D(h)}(h) and Err_{D_S}(h) looks precisely the same as in the classical domain adaptation setting (Ben-David et al., 2010). Nonetheless, an arguably more interesting quantity to understand in our setting is the difference between the induced error of a given model h and the error induced by the optimal model h*_T: Err_{D(h)}(h) − Err_{D(h*_T)}(h*_T). We get the following bound, which differs from the one in Theorem 3.1:

Theorem 3.2 (Induced risk ⇒ Minimum induced risk). The difference between Err_{D(h)}(h) and Err_{D(h*_T)}(h*_T) is upper bounded by:

Err_{D(h)}(h) − Err_{D(h*_T)}(h*_T) ≤ (λ_{D(h)→D(h*_T)} + Λ_{D(h)→D(h*_T)}(h)) / 2 + (1/2) · d_{H×H}(D(h*_T), D(h)).
The above theorem informs us that the induced transfer error is bounded by the "average" achievable error on both distributions D(h) and D(h * T ), as well as the H × H divergence between the two distributions. Reflecting on the difference between the bounds of Theorem 3.1 and Theorem 3.2, we see that the primary change is replacing the minimum achievable error λ with the average of λ and Λ.
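For small hypothesis classes, the d_{H×H} term appearing in both theorems can be estimated directly from finite samples by scanning over pairs of hypotheses. The sketch below is our own illustrative estimator for a finite class of one-dimensional threshold classifiers (two thresholds disagree exactly on the interval between them); it is not code from the paper, and the integer-valued samples are assumed for exact arithmetic.

```python
# Empirical H x H-divergence for a finite class of 1-D threshold classifiers
# h_a(x) = sign(x - a): d = 2 * sup_{a,b} |P_P(h_a != h_b) - P_Q(h_a != h_b)|.

def empirical_dhh(xs_p, xs_q, thresholds):
    """Plug-in estimate of d_{HxH} between two samples over threshold pairs."""
    def disagreement(xs, a, b):
        lo, hi = min(a, b), max(a, b)
        # h_a and h_b disagree exactly when x falls between the thresholds
        return sum(lo <= x < hi for x in xs) / len(xs)
    return 2 * max(
        abs(disagreement(xs_p, a, b) - disagreement(xs_q, a, b))
        for a in thresholds for b in thresholds
    )

src = list(range(100))                    # ~ uniform sample on {0, ..., 99}
induced = [min(99, x + 20) for x in src]  # hypothetical upward response shift
ths = list(range(0, 101, 10))             # finite class of thresholds

d_same = empirical_dhh(src, src, ths)     # identical samples: divergence 0
d_shift = empirical_dhh(src, induced, ths)
```

Identical samples yield 0, while the shifted sample produces a strictly positive value that feeds directly into the bounds of Theorems 3.1 and 3.2.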

3.2. LOWER BOUND

Now we provide a lower bound on the induced transfer error. In particular, we show that at least one of the two errors, Err_{D_S}(h) and Err_{D(h)}(h), must be lower bounded by a certain quantity.

Theorem 3.3 (Lower bound for learning tradeoffs). Any model h must incur the following error on either the source or the induced distribution:

max{Err_{D_S}(h), Err_{D(h)}(h)} ≥ (d_TV(D_{Y|S}, D_Y(h)) − d_TV(D_{h|S}, D_h(h))) / 2,

where D_{Y|S} and D_Y(h) denote the marginal distributions of Y under D_S and D(h), and D_{h|S} and D_h(h) the corresponding marginal distributions of the prediction h(X). The proof leverages the triangle inequality of d_TV. This bound is dependent on h; however, by the data processing inequality of d_TV (and of f-divergences in general) (Liese & Vajda, 2006), we have d_TV(D_{h|S}, D_h(h)) ≤ d_TV(D_{X|S}, D_X(h)). Applying this to Theorem 3.3 yields:

Corollary 3.4. For any model h, max{Err_{D_S}(h), Err_{D(h)}(h)} ≥ (d_TV(D_{Y|S}, D_Y(h)) − d_TV(D_{X|S}, D_X(h))) / 2.
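In the binary case, each total variation term in Theorem 3.3 reduces to an absolute difference of Bernoulli parameters, so the lower bound is immediate to evaluate once the four marginals are known or estimated. The numbers below are hypothetical values chosen for illustration.

```python
# Evaluating the Theorem 3.3 lower bound for binary labels and predictions.
# All four marginal probabilities below are hypothetical.

def tv_binary(p, q):
    """Total variation distance between Bernoulli(p) and Bernoulli(q)."""
    return abs(p - q)

p_y_src, p_y_ind = 0.5, 0.2    # P(Y = +1) under D_S and under D(h)
p_h_src, p_h_ind = 0.45, 0.40  # P(h(X) = +1) under D_S and under D(h)

# max{Err_S(h), Err_{D(h)}(h)} >= (d_TV of labels - d_TV of predictions) / 2
lower_bound = (tv_binary(p_y_src, p_y_ind) - tv_binary(p_h_src, p_h_ind)) / 2
```

Here the label marginal shifts much more than the prediction marginal, so the bound is strictly positive: no classifier can be simultaneously accurate on both D_S and D(h).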

3.3. HOW TO USE OUR BOUNDS

The upper and lower bounds we derived in the previous sections (Theorem 3.2 and Theorem 3.3) depend, explicitly or implicitly, on the following two quantities: 1) the distribution D(h) induced by the deployment of the model h in question, and 2) the optimal target classifier h*_T as well as the distribution D(h*_T) it induces. The bounds may therefore seem to be of only theoretical interest, since in reality we generally cannot compute D(h) without actual deployment, let alone compute h*_T. Thus, in general, it is unclear how to compute the value of these bounds. Nevertheless, our bounds can still be useful and informative in the following ways.

General modeling framework with flexible hypothetical shifting models. The bounds can be evaluated if the decision maker has a particular shift model in mind, which specifies how the population would adapt to a model. A common special case is when the decision maker posits an individual-level agent response model (e.g., the strategic agent of Hardt et al. (2016a); we demonstrate how to evaluate the bounds in this case in Section 4.3). In these cases, the H-divergence can be consistently estimated from finite samples of the population (Wang et al., 2005), allowing the decision maker to estimate the performance gap of a given h without deploying it. The general bounds provided can thus be viewed as a framework from which specialized, computationally tractable bounds can be derived.
Estimating the optimal target classifier h*_T from a set of imperfect models. Secondly, when the decision maker has access to a set of imperfect models h̃_1, h̃_2, ..., h̃_t ∈ H_T that predict a range of possible shifted distributions D(h̃_1), ..., D(h̃_t) ∈ D_T and a set of possibly optimal target classifiers, the bounds involving h*_T can be further instantiated by calculating the worst case over this predicted set:

Err_{D(h)}(h) − Err_{D(h*_T)}(h*_T) ≤ max_{D'∈D_T, h'∈H_T} UpperBound(D', h'),

max{Err_{D_S}(h), Err_{D(h*_T)}(h*_T)} ≥ min_{D'∈D_T, h'∈H_T} LowerBound(D', h').

In addition, the challenge we face in this paper also sheds light on the danger of directly applying existing standard domain adaptation techniques when the shift is caused by the deployment of the classifier itself, since the bound will depend on the resulting distribution as well. We add discussions on the tightness of our theoretical bounds in Appendix G.

4. COVARIATE SHIFT

In this section, we focus on a particular domain adaptation setting known as covariate shift, in which the distribution of features changes, but the distribution of labels conditioned on features does not:

P_{D(h)}(Y = y | X = x) = P_{D_S}(Y = y | X = x),    P_{D(h)}(X = x) ≠ P_{D_S}(X = x)

Thus, under covariate shift, we have

P_{D(h)}(X = x, Y = y) = P_{D(h)}(Y = y | X = x) · P_{D(h)}(X = x) = P_{D_S}(Y = y | X = x) · P_{D(h)}(X = x)

Let ω_x(h) := P_{D(h)}(X = x) / P_{D_S}(X = x) be the importance weight at x, which characterizes the amount of adaptation induced by h at instance x. Then for any loss function ℓ we have:

Proposition 4.1 (Expected Loss on D(h)). E_{D(h)}[ℓ(h; X, Y)] = E_{D_S}[ω_x(h) · ℓ(h; x, y)].

The above derivation is not new and offers the basis for performing importance reweighting when learning under covariate shift (Sugiyama et al., 2008). This particular form informs us that ω_x(h) controls the generation of D(h) and encodes its dependence on both D_S and h, and it is critical for deriving our results below.
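Proposition 4.1 can be verified numerically on a toy two-point feature space: evaluating the expected 0-1 loss directly under the induced feature marginal gives the same value as reweighting the source expectation by ω_x(h). The source marginal, the posited induced marginal, and P(Y|X) below are all hypothetical.

```python
# Numerical check of Proposition 4.1 (importance reweighting identity).
# The marginals and P(Y|X) below are hypothetical; under covariate shift
# P(Y|X) is shared between the source and the induced distribution.

p_src = {0: 0.5, 1: 0.5}   # P_{D_S}(X = x)
p_ind = {0: 0.2, 1: 0.8}   # P_{D(h)}(X = x), posited induced marginal
p_pos = {0: 0.1, 1: 0.9}   # P(Y = +1 | X = x), unchanged by the shift

def h(x):
    return +1 if x == 1 else -1

def exp_loss(x):
    """E_Y[ 1{h(x) != Y} | X = x ] under the shared conditional P(Y|X)."""
    return p_pos[x] if h(x) == -1 else 1 - p_pos[x]

# direct evaluation of the induced risk on the induced marginal
direct = sum(p_ind[x] * exp_loss(x) for x in p_ind)

# importance-reweighted evaluation on the source marginal (Proposition 4.1)
omega = {x: p_ind[x] / p_src[x] for x in p_src}
reweighted = sum(p_src[x] * omega[x] * exp_loss(x) for x in p_src)
```

The two computations agree exactly, which is the basis for the reweighted ERM approach revisited in the concluding discussion.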

4.1. UPPER BOUND

We now derive an upper bound for transferability under covariate shift. We will focus particularly on the optimal model trained on the source data D_S, which we denote as h*_S := arg min_{h∈H} Err_{D_S}(h). Recall that the classifier with minimum induced risk is denoted by h*_T := arg min_{h∈H} Err_{D(h)}(h). We can upper bound the difference between h*_S and h*_T as follows:

Theorem 4.2 (Suboptimality of h*_S). Let X be distributed according to D_S. We have:

Err_{D(h*_S)}(h*_S) − Err_{D(h*_T)}(h*_T) ≤ Err_{D_S}(h*_T) · ( √Var(ω_X(h*_S)) + √Var(ω_X(h*_T)) ).

This result can be interpreted as follows: h*_T incurs an irreducible amount of error on the source data set, represented by Err_{D_S}(h*_T). Moreover, the difference in error between h*_S and h*_T is at its maximum when the two classifiers induce adaptations in "opposite" directions; this is captured by the sum of the standard deviations of their importance weights, √Var(ω_X(h*_S)) + √Var(ω_X(h*_T)).

4.2. LOWER BOUND

Recall that for the general setting of Theorem 3.3, it is unclear whether the lower bound is strictly positive. In this section, we provide further understanding of when the lower bound (d_TV(D_{Y|S}, D_Y(h)) − d_TV(D_{h|S}, D_h(h))) / 2 is indeed positive under covariate shift. Under the following assumptions, the lower bound in Theorem 3.3 is strictly positive with covariate shift. Let X_+(h) = {x : ω_x(h) ≥ 1} and X_−(h) = {x : ω_x(h) < 1}.

Assumption 4.3. |E_{X∈X_+(h), Y=+1}[1 − ω_X(h)]| ≥ |E_{X∈X_−(h), Y=+1}[1 − ω_X(h)]|. This assumption states that points with increased ω_x(h) values are more likely to have positive labels.

Assumption 4.4. |E_{X∈X_+(h), h(X)=+1}[1 − ω_X(h)]| ≥ |E_{X∈X_−(h), h(X)=+1}[1 − ω_X(h)]|. This assumption states that points with increased ω_x(h) values are more likely to be classified as positive.

Assumption 4.5. Cov(P_{D_S}(Y = +1 | X = x) − P_{D_S}(h(x) = +1 | X = x), ω_x(h)) > 0. This assumption states that, for a classifier h, within all h(X) = +1 or h(X) = −1, a higher P_{D_S}(Y = +1 | X = x) is associated with a higher ω_x(h).

Theorem 4.6. Under Assumptions 4.3-4.5, the lower bound is strictly positive under covariate shift:

max{Err_{D_S}(h), Err_{D(h)}(h)} ≥ (d_TV(D_{Y|S}, D_Y(h)) − d_TV(D_{h|S}, D_h(h))) / 2 > 0.

4.3. EXAMPLE USING STRATEGIC CLASSIFICATION

As introduced in Section 2.1, we consider a setting in which agents are classified by, and strategically adapt to, a binary threshold classifier. In particular, each agent is associated with a d-dimensional continuous feature x ∈ R^d and a binary true qualification y(x) ∈ {−1, +1}, where y(x) is a function of the feature vector x. Consistent with the literature on strategic classification (Hardt et al., 2016a), we consider a simple case in which, after seeing the threshold decision rule h(x) = 2 · 1[x ≥ τ_h] − 1, agents best respond to it by maximizing the following utility function:

u(x, x') = h(x') − h(x) − c(x, x'),

where c(x, x') is the cost for a decision subject to modify their feature from x to x'. We assume all agents are rational utility maximizers: they will only attempt to change their features when the benefit of manipulation exceeds the cost (i.e., when c(x, x') ≤ 2), and agents will not change their features if they are already accepted (i.e., h(x) = +1). For a given threshold τ_h and manipulation budget B, the theoretical best response of an agent with original feature x is:

∆(x) = arg max_{x'} u(x, x')    s.t. c(x, x') ≤ B.

To make the problem tractable and meaningful, we further specify the following setups:

Setup 1 (Initial Feature). Agents' initial features are uniformly distributed on [0, 1] ⊂ R.

Setup 2 (Agent's Cost Function). The cost of changing from x to x' is proportional to the distance between them: c(x, x') = |x − x'|.

Setup 2 implies that only agents whose features lie in [τ_h − B, τ_h) will attempt to change their features. We also assume that feature updates are probabilistic, such that agents with features closer to the decision boundary τ_h have a greater chance of updating their features, and each updated feature x' is sampled from a uniform distribution depending on τ_h, B, and x (see Setups 3 & 4):

Setup 3 (Agent's Success Manipulation Probability). For agents who attempt to update their features, the probability of a successful feature update is P(X' ≠ X) = 1 − |x − τ_h| / B.

Setup 4 (Adapted Feature's Distribution). An agent's updated feature x', given original feature x, manipulation budget B, and classification boundary τ_h, is sampled as X' ∼ Unif(τ_h, τ_h + |B − x|).

Setup 4 aims to capture the fact that even though agents aim to move their feature to the decision boundary τ_h (the least-cost action that yields a favorable prediction outcome), they might end up with a feature that is beyond the decision boundary. With the above setups, we can specialize the bound in Theorem 4.2 to the strategic response setting as follows:

Proposition 4.7 (Strategic Response Setting). Err_{D(h*_S)}(h*_S) − Err_{D(h*_T)}(h*_T) ≤ (2B/3) · Err_{D_S}(h*_T).

We can see that the upper bound for the strategic response setting depends on the manipulation budget B and the error the ideal classifier makes on the source distribution, Err_{D_S}(h*_T). This aligns with our intuition that the smaller the manipulation budget is, the less agents will change their features, leading to a tighter upper bound on the difference between Err_{D(h*_S)}(h*_S) and Err_{D(h*_T)}(h*_T). This bound also allows us to bound this quantity even without knowledge of the mapping between h and D(h), since we can directly compute Err_{D_S}(h*_T) from the source distribution and an estimated optimal classifier h*_T.
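Setups 1-4 are straightforward to simulate. The Monte Carlo sketch below (our own illustrative code, not the paper's) draws agents from Unif[0, 1], lets those in [τ_h − B, τ_h) attempt manipulation with the success probability of Setup 3, and resamples successful agents' features per Setup 4; the particular values τ_h = 0.6 and B = 0.2 are assumptions.

```python
import random

def simulate(tau, B, n=100_000, seed=0):
    """Acceptance rate before and after strategic adaptation (Setups 1-4)."""
    rng = random.Random(seed)
    accepted_before = accepted_after = 0
    for _ in range(n):
        x = rng.random()                           # Setup 1: x ~ Unif[0, 1]
        accepted_before += x >= tau
        if tau - B <= x < tau:                     # Setup 2: only these attempt
            if rng.random() < 1 - abs(x - tau) / B:        # Setup 3: success prob.
                x = rng.uniform(tau, tau + abs(B - x))     # Setup 4: adapted x'
        accepted_after += x >= tau
    return accepted_before / n, accepted_after / n

before, after = simulate(tau=0.6, B=0.2)
```

With these values, the average success probability over the attempting interval is 1/2, so the acceptance rate rises from about 0.4 toward about 0.5, and the induced feature density develops the extra mass just above τ_h sketched in Figure 2 (Left).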

5. TARGET SHIFT

We consider another popular domain adaptation setting known as target shift, in which the distribution of labels changes, but the distribution of features conditioned on the label remains the same:

P_{D(h)}(X = x | Y = y) = P_{D_S}(X = x | Y = y),    P_{D(h)}(Y = y) ≠ P_{D_S}(Y = y)

For binary classification, let p := P_{D_S}(Y = +1) and p(h) := P_{D(h)}(Y = +1), so that P_{D(h)}(Y = −1) = 1 − p(h). Again, p(h) encodes the adaptation induced by D_S and h. Then, for any proper loss function ℓ:

E_{D(h)}[ℓ(h; X, Y)] = p(h) · E_{D(h)}[ℓ(h; X, Y) | Y = +1] + (1 − p(h)) · E_{D(h)}[ℓ(h; X, Y) | Y = −1]
= p(h) · E_{D_S}[ℓ(h; X, Y) | Y = +1] + (1 − p(h)) · E_{D_S}[ℓ(h; X, Y) | Y = −1]

We will adopt the following shorthands: Err_+(h) := E_{D_S}[ℓ(h; X, Y) | Y = +1], Err_−(h) := E_{D_S}[ℓ(h; X, Y) | Y = −1]. Note that Err_+(h) and Err_−(h) are evaluated on the source distribution D_S.

5.1. UPPER BOUND

Theorem 5.1 (Suboptimality of h*_S under target shift).

Err_{D(h*_S)}(h*_S) − Err_{D(h*_T)}(h*_T) ≤ |p(h*_S) − p(h*_T)| + (1 + p) · (d_TV(D_+(h*_S), D_+(h*_T)) + d_TV(D_−(h*_S), D_−(h*_T))).

The above upper bound consists of two components. The first quantity captures the difference between the two induced distributions D(h*_S) and D(h*_T). The second quantity characterizes the difference between the two classifiers h*_S, h*_T on the source distribution.

5.2. LOWER BOUND

Now we discuss lower bounds. Denote by TPR_S(h) and FPR_S(h) the true positive and false positive rates of h on the source distribution D_S. We prove the following:

Theorem 5.2. For target shift, any model h must incur the following error on either D_S or D(h):

max{Err_{D_S}(h), Err_{D(h)}(h)} ≥ |p − p(h)| · (1 − |TPR_S(h) − FPR_S(h)|) / 2.

The proof extends the bound of Theorem 3.3 by further explicating each of d_TV(D_{Y|S}, D_Y(h)) and d_TV(D_{h|S}, D_h(h)) under the target shift assumption. Since |TPR_S(h) − FPR_S(h)| < 1 unless we have a trivial classifier with either TPR_S(h) = 1, FPR_S(h) = 0 or TPR_S(h) = 0, FPR_S(h) = 1, the lower bound is strictly positive whenever p ≠ p(h).
Taking a closer look, the lower bound scales linearly with how much the label distribution shifts: |p − p(h)|. It is further determined by the performance of h on the source distribution through 1 − |TPR_S(h) − FPR_S(h)|. For instance, when TPR_S(h) > FPR_S(h), this quantity becomes FNR_S(h) + FPR_S(h); that is, the more errors h makes, the larger the lower bound will be.
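The lower bound of Theorem 5.2 is directly computable from four scalar quantities; a minimal sketch with hypothetical values (the particular p, p(h), TPR, and FPR below are our assumptions for illustration):

```python
def target_shift_lower_bound(p, p_h, tpr, fpr):
    """Theorem 5.2: max{Err_S(h), Err_{D(h)}(h)} >= |p - p(h)| (1 - |TPR - FPR|) / 2."""
    return abs(p - p_h) * (1 - abs(tpr - fpr)) / 2

# a sizable label shift combined with a moderately accurate classifier
lb = target_shift_lower_bound(p=0.5, p_h=0.7, tpr=0.8, fpr=0.3)

# a trivial perfect separator (TPR = 1, FPR = 0) makes the bound vacuous,
# matching the caveat in the text above
lb_trivial = target_shift_lower_bound(p=0.5, p_h=0.7, tpr=1.0, fpr=0.0)
```

As the text predicts, a less accurate classifier (smaller |TPR − FPR|) forces a larger unavoidable error on at least one of the two distributions.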

5.3. EXAMPLE USING REPLICATOR DYNAMICS

Let us instantiate the discussion using a specific fitness function for the replicator dynamics model (Section 2.1), namely the prediction accuracy of h for class y: Fitness(Y = y) := P_{D_S}(h(X) = y | Y = y). Then we have E_{D_S}[Fitness(Y)] = 1 − Err_{D_S}(h), and

p(h) / P_{D_S}(Y = +1) = P_{D_S}(h(X) = +1 | Y = +1) / (1 − Err_{D_S}(h)).

Plugging this back into Theorem 5.1, we have

|p(h*_S) − p(h*_T)| ≤ P_{D_S}(Y = +1) · |Err_{D_S}(h*_S) − Err_{D_S}(h*_T)| · |TPR_S(h*_S) − TPR_S(h*_T)| / ((1 − Err_{D_S}(h*_S)) · (1 − Err_{D_S}(h*_T))).

That is, the difference between Err_{D(h*_S)}(h*_S) and Err_{D(h*_T)}(h*_T) further depends on the difference between the two classifiers' performances on the source data D_S. This offers an opportunity to evaluate the possible error transferability using the source data only.
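Under this accuracy-based fitness, the induced positive rate has the closed form p(h) = P_{D_S}(Y = +1) · TPR_S(h) / (1 − Err_{D_S}(h)), which the small sketch below computes; the particular rates are hypothetical.

```python
def induced_positive_rate(p, tpr, tnr):
    """Replicator update with Fitness(Y = y) = P_S(h(X) = y | Y = y).

    E[Fitness(Y)] = p * TPR + (1 - p) * TNR = 1 - Err_S(h), so the
    induced label rate is p(h) = p * TPR / (1 - Err_S(h)).
    """
    mean_fitness = p * tpr + (1 - p) * tnr   # = 1 - Err_S(h)
    return p * tpr / mean_fitness

# a classifier that is more accurate on the positive class than on average
# grows the positive population: p(h) > p
p_new = induced_positive_rate(p=0.4, tpr=0.9, tnr=0.7)
```

Iterating this update models repeated deployment of h, which is exactly the dynamic simulated in the experiments of Section 6.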

6. EXPERIMENTS

We perform synthetic experiments using real-world data to demonstrate our bounds. In particular, we use the FICO credit score data set (Board of Governors of the Federal Reserve System (US), 2007), which contains more than 300k records of TransUnion credit scores of clients from different demographic groups. For our experiment on the preprocessed FICO data set (Hardt et al., 2016b), we convert the cumulative distribution function (CDF) of the TransRisk score among different groups into group-wise credit score densities, from which we generate a balanced sample to represent a population where groups have equal representation. We demonstrate the application of our results in a series of resource allocations. We consider the hypothesis class of threshold classifiers and treat the classification outcome as the decision received by individuals. For each time step K = k, we compute h*_S, the statistically optimal classifier on the source distribution (i.e., the current reality at step K = k), and update the credit score for each individual according to the received decision to form the new reality for time step K = k + 1. Details of the data generation are again deferred to Appendix C. We report our results in Figure 3.

Challenges in Minimizing Induced Risk and Concluding Remarks. We presented a sequence of model transferability results for settings where agents respond to a deployed model. The response leads to an induced distribution that the learner does not know before deploying the model. Our results cover both a general response setting and specific ones (covariate shift and target shift). Looking forward to solving the induced risk minimization problem, the domain adaptation literature provides solutions for minimizing the risk on the target distribution via a nicely developed set of results (Sugiyama et al., 2008; 2007; Shimodaira, 2000). This allows us to extend those solutions to minimize the induced risk too.
Nonetheless, we highlight additional computational challenges, using the covariate shift setting as an example; the scenario for target shift is similar. For covariate shift, recall that earlier we derived the following fact:

(Importance Reweighting): E_{D(h)}[ℓ(h; X, Y)] = E_{D_S}[ω_x(h) · ℓ(h; x, y)].

This formula suggests a promising solution: use ω_x(h) to perform reweighted ERM. There are two primary challenges when carrying out the optimization of this objective. The first challenge that stands in the way is how to obtain ω_x(h). When one can build models to predict the response D(h), and hence ω_x(h) (e.g., using the replicator dynamics model we introduced earlier), one can rework the above loss and apply standard gradient descent approaches; we provide a concrete example and discussion in Appendix E. Without making any assumptions on the mapping between h and D(h), one can only potentially rely on the bandit feedback from the decision subjects to estimate the influence of h on D(h); we also lay out a possibility for this in Appendix E. It can also be inferred from Eqn. ( 6) that the second challenge is that the induced risk minimization objective might not even be convex; due to the limit of space, we defer the detailed discussion to Appendix D.
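To illustrate the first challenge, the sketch below performs reweighted ERM with a posited weight model ω_x(h): the weights are recomputed from the current classifier at every step, and, as a simplification, the gradient treats ω as constant (the dω/dθ term is dropped). The data, the unnormalized weight model, and the step size are all our own illustrative assumptions, not the paper's method.

```python
import math

# hypothetical 1-D data: (feature, label) pairs
data = [(-1.5, -1), (-0.5, -1), (0.5, +1), (1.5, +1)]

def omega(x, theta):
    """Posited (unnormalized) response model: mass concentrates near the
    decision boundary theta. Purely illustrative."""
    return 1.0 + 0.5 * math.exp(-abs(x - theta))

def reweighted_erm(data, steps=300, lr=0.3):
    """Gradient descent on the reweighted logistic loss
    (1/n) sum_i omega(x_i, theta) * log(1 + exp(-y_i (x_i - theta)))."""
    theta = -1.0                              # initial threshold
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            w = omega(x, theta)               # weights depend on the current h
            margin = y * (x - theta)
            # d/dtheta log(1 + exp(-margin)) = y / (1 + exp(margin)),
            # holding omega fixed (the d omega / d theta term is dropped)
            grad += w * y / (1 + math.exp(margin)) / len(data)
        theta -= lr * grad
    return theta

theta_hat = reweighted_erm(data)
```

On this symmetric toy data the threshold settles near 0. In general, because ω_x(h) re-enters the objective through h itself, the full induced-risk objective need not be convex, which is the second challenge discussed in Appendix D.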

7. ETHICAL STATEMENT

The primary goal of our study is to put humans at the center when considering domain shift. The development of the paper is fully aware of fairness concerns, and we expect positive societal impact. Unawareness of a potential distribution shift might lead to unintended consequences when training a machine learning model. One goal of this paper is to raise awareness of this issue for the safe deployment of machine learning methods in high-stakes societal applications. A subset of our results is developed under assumptions (e.g., Theorem 4.6); we therefore want to caution readers against potential misinterpretation of the applicability of the reported theoretical guarantees. Our contributions are mostly theoretical, and our experiments use synthetic agent models to simulate distribution shift. A future direction is to collect real human experiment data to support the findings. Our paper ends with a discussion of the challenges in learning under the responding distribution and other objectives that might arise. We believe this is a promising research direction for the machine learning community, both as an unaddressed technical problem and as a stepping stone toward putting humans at the center when training machine learning models.

8. REPRODUCIBILITY STATEMENT

We provide the following checklist for the purpose of reproducibility:

1. Generals:

It is also known that it is possible to steal model parameters if agents have incentives to do so. For instance, spammers frequently infer detection mechanisms by sending different email variants; they then adjust their spam content accordingly.

A.2 COMPARISON OF OUR SETTING AND SOME AREAS IN DOMAIN ADAPTATION

We compare our setting (which we refer to as IDA, for "induced domain adaptation") with the following areas:

A.3 PROOF OF THEOREM 3.1

Define the cross-prediction disagreement between two classifiers $h, h'$ on a distribution $D$ as $\mathrm{Err}_D(h, h') := \mathbb{P}_D(h(X) \neq h'(X))$.

Lemma A.1. For any hypotheses $h, h' \in \mathcal{H}$ and distributions $D, D'$,
$$|\mathrm{Err}_D(h, h') - \mathrm{Err}_{D'}(h, h')| \le \frac{d_{\mathcal{H}\Delta\mathcal{H}}(D, D')}{2}.$$

Proof. By the definition of the $\mathcal{H}\Delta\mathcal{H}$-divergence,
$$d_{\mathcal{H}\Delta\mathcal{H}}(D, D') = 2\sup_{h, h' \in \mathcal{H}} |\mathbb{P}_D(h(X) \neq h'(X)) - \mathbb{P}_{D'}(h(X) \neq h'(X))| = 2\sup_{h, h' \in \mathcal{H}} |\mathrm{Err}_D(h, h') - \mathrm{Err}_{D'}(h, h')| \ge 2\,|\mathrm{Err}_D(h, h') - \mathrm{Err}_{D'}(h, h')|.$$

Another helpful lemma for us is the well-known fact that the 0-1 error obeys the triangle inequality (see, e.g., Crammer et al. (2008)):

Lemma A.2. For any distribution $D$ over instances and any labeling functions $f_1, f_2, f_3$, we have $\mathrm{Err}_D(f_1, f_2) \le \mathrm{Err}_D(f_1, f_3) + \mathrm{Err}_D(f_2, f_3)$.

Denote by $h^*$ the ideal joint hypothesis, which minimizes the combined error:
$$h^* := \arg\min_{h' \in \mathcal{H}} \mathrm{Err}_{D(h)}(h') + \mathrm{Err}_{D_S}(h').$$
We have:
$$\begin{aligned}
\mathrm{Err}_{D(h)}(h) &\le \mathrm{Err}_{D(h)}(h^*) + \mathrm{Err}_{D(h)}(h, h^*) && \text{(Lemma A.2)}\\
&\le \mathrm{Err}_{D(h)}(h^*) + \mathrm{Err}_{D_S}(h, h^*) + |\mathrm{Err}_{D(h)}(h, h^*) - \mathrm{Err}_{D_S}(h, h^*)|\\
&\le \mathrm{Err}_{D(h)}(h^*) + \mathrm{Err}_{D_S}(h) + \mathrm{Err}_{D_S}(h^*) + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D(h)) && \text{(Lemmas A.1 and A.2)}\\
&= \mathrm{Err}_{D_S}(h) + \lambda_{D_S \to D(h)} + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D(h)). && \text{(definition of } h^*\text{)}
\end{aligned}$$

A.4 PROOF OF THEOREM 3.2

Proof.
Invoking Theorem 3.1 with $h$ replaced by $h^*_T$ and $D_S$ replaced by $D(h^*_T)$, we have
$$\mathrm{Err}_{D(h)}(h^*_T) \le \mathrm{Err}_{D(h^*_T)}(h^*_T) + \lambda_{D(h) \to D(h^*_T)} + \tfrac12\, d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)). \tag{7}$$
Now observe that
$$\begin{aligned}
\mathrm{Err}_{D(h)}(h) &\le \mathrm{Err}_{D(h)}(h^*_T) + \mathrm{Err}_{D(h)}(h, h^*_T)\\
&\le \mathrm{Err}_{D(h)}(h^*_T) + \mathrm{Err}_{D(h^*_T)}(h, h^*_T) + |\mathrm{Err}_{D(h)}(h, h^*_T) - \mathrm{Err}_{D(h^*_T)}(h, h^*_T)|\\
&\le \mathrm{Err}_{D(h)}(h^*_T) + \mathrm{Err}_{D(h^*_T)}(h, h^*_T) + \tfrac12\, d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)) && \text{(by Lemma A.1)}\\
&\le \mathrm{Err}_{D(h)}(h^*_T) + \mathrm{Err}_{D(h^*_T)}(h) + \mathrm{Err}_{D(h^*_T)}(h^*_T) + \tfrac12\, d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)) && \text{(by Lemma A.2)}\\
&\le \mathrm{Err}_{D(h^*_T)}(h^*_T) + \lambda_{D(h)\to D(h^*_T)} + \tfrac12\, d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)) && \text{(by equation 7)}\\
&\quad + \mathrm{Err}_{D(h^*_T)}(h) + \mathrm{Err}_{D(h^*_T)}(h^*_T) + \tfrac12\, d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)).
\end{aligned}$$
Adding $\mathrm{Err}_{D(h)}(h)$ to both sides and rearranging terms yields
$$2\,\mathrm{Err}_{D(h)}(h) - 2\,\mathrm{Err}_{D(h^*_T)}(h^*_T) \le \mathrm{Err}_{D(h)}(h) + \mathrm{Err}_{D(h^*_T)}(h) + \lambda_{D(h)\to D(h^*_T)} + d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)) = \Lambda_{D(h)\to D(h^*_T)}(h) + \lambda_{D(h)\to D(h^*_T)} + d_{\mathcal{H}\Delta\mathcal{H}}(D(h^*_T), D(h)).$$
Dividing both sides by 2 completes the proof.

A.5 PROOF OF THEOREM 3.3

Proof. Using the triangle inequality of $d_{TV}$, we have
$$d_{TV}(D_{Y|S}, D_Y(h)) \le d_{TV}(D_{Y|S}, D_{h|S}) + d_{TV}(D_{h|S}, D_h(h)) + d_{TV}(D_h(h), D_Y(h)), \tag{8}$$
and by the definition of $d_{TV}$, the first term becomes
$$d_{TV}(D_{Y|S}, D_{h|S}) = |\mathbb{P}_{D_S}(Y = +1) - \mathbb{P}_{D_S}(h(X) = +1)| = \left|\frac{\mathbb{E}_{D_S}[Y] + 1}{2} - \frac{\mathbb{E}_{D_S}[h(X)] + 1}{2}\right| = \left|\frac{\mathbb{E}_{D_S}[Y]}{2} - \frac{\mathbb{E}_{D_S}[h(X)]}{2}\right| \le \frac12\,\mathbb{E}_{D_S}[|Y - h(X)|] = \mathrm{Err}_{D_S}(h).$$
Similarly, we have $d_{TV}(D_h(h), D_Y(h)) \le \mathrm{Err}_{D(h)}(h)$. As a result,
$$\mathrm{Err}_{D_S}(h) + \mathrm{Err}_{D(h)}(h) \ge d_{TV}(D_{Y|S}, D_{h|S}) + d_{TV}(D_h(h), D_Y(h)) \ge d_{TV}(D_{Y|S}, D_Y(h)) - d_{TV}(D_{h|S}, D_h(h)) \quad \text{(by equation 8)},$$
which implies
$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \ge \frac{d_{TV}(D_{Y|S}, D_Y(h)) - d_{TV}(D_{h|S}, D_h(h))}{2}.$$

A.6 PROOF OF PROPOSITION 4.1

Proof.
$$\begin{aligned}
\mathbb{E}_{D(h)}[\ell(h; X, Y)] &= \int \mathbb{P}_{D(h)}(X = x, Y = y)\,\ell(h; x, y)\,dx\,dy\\
&= \int \mathbb{P}_{D_S}(Y = y | X = x)\,\mathbb{P}_{D(h)}(X = x)\,\ell(h; x, y)\,dx\,dy\\
&= \int \mathbb{P}_{D_S}(Y = y | X = x)\,\mathbb{P}_{D_S}(X = x)\,\frac{\mathbb{P}_{D(h)}(X = x)}{\mathbb{P}_{D_S}(X = x)}\,\ell(h; x, y)\,dx\,dy\\
&= \int \mathbb{P}_{D_S}(Y = y | X = x)\,\mathbb{P}_{D_S}(X = x)\,\omega_x(h)\,\ell(h; x, y)\,dx\,dy\\
&= \mathbb{E}_{D_S}[\omega_x(h)\cdot\ell(h; x, y)].
\end{aligned}$$

A.7 PROOF OF THEOREM 4.2

Proof. We start from the error induced by $h^*_S$. Let the average importance weight induced by $h^*_S$ be $\bar\omega(h^*_S) = \mathbb{E}_{D_S}[\omega_x(h^*_S)]$; we add and subtract it from the error:
$$\mathrm{Err}_{D(h^*_S)}(h^*_S) = \mathbb{E}_{D_S}[\omega_x(h^*_S)\,\mathbb{1}(h^*_S(x) \neq y)] = \mathbb{E}_{D_S}[\bar\omega(h^*_S)\,\mathbb{1}(h^*_S(x) \neq y)] + \mathbb{E}_{D_S}[(\omega_x(h^*_S) - \bar\omega(h^*_S))\,\mathbb{1}(h^*_S(x) \neq y)].$$
In fact, $\bar\omega(h^*_S) = 1$, since
$$\bar\omega(h^*_S) = \mathbb{E}_{D_S}[\omega_x(h^*_S)] = \int \omega_x(h^*_S)\,\mathbb{P}_{D_S}(X = x)\,dx = \int \frac{\mathbb{P}_{D(h^*_S)}(X = x)}{\mathbb{P}_{D_S}(X = x)}\,\mathbb{P}_{D_S}(X = x)\,dx = \int \mathbb{P}_{D(h^*_S)}(X = x)\,dx = 1.$$
Now consider any other classifier $h$.
We have
$$\begin{aligned}
\mathrm{Err}_{D(h^*_S)}(h^*_S) &= \mathbb{E}_{D_S}[\mathbb{1}(h^*_S(x) \neq y)] + \mathbb{E}_{D_S}[(\omega_x(h^*_S) - \bar\omega(h^*_S))\,\mathbb{1}(h^*_S(x) \neq y)]\\
&\le \mathbb{E}_{D_S}[\mathbb{1}(h(x) \neq y)] + \mathbb{E}_{D_S}[(\omega_x(h^*_S) - \bar\omega(h^*_S))\,\mathbb{1}(h^*_S(x) \neq y)] && \text{(by optimality of } h^*_S \text{ on } D_S\text{)}\\
&= \mathbb{E}_{D_S}[\bar\omega(h)\,\mathbb{1}(h(x) \neq y)] + \mathbb{E}_{D_S}[(\omega_x(h^*_S) - \bar\omega(h^*_S))\,\mathbb{1}(h^*_S(x) \neq y)] && \text{(since } \bar\omega(h) = 1\text{)}\\
&= \mathbb{E}_{D_S}[\omega_x(h)\,\mathbb{1}(h(x) \neq y)] + \mathbb{E}_{D_S}[(\bar\omega(h) - \omega_x(h))\,\mathbb{1}(h(x) \neq y)] && \text{(add and subtract } \omega_x(h)\text{)}\\
&\quad + \mathbb{E}_{D_S}[(\omega_x(h^*_S) - \bar\omega(h^*_S))\,\mathbb{1}(h^*_S(x) \neq y)]\\
&= \mathrm{Err}_{D(h)}(h) + \mathrm{Cov}(\omega_x(h^*_S), \mathbb{1}(h^*_S(x) \neq y)) - \mathrm{Cov}(\omega_x(h), \mathbb{1}(h(x) \neq y)).
\end{aligned}$$
Moving the error terms to one side, we have
$$\begin{aligned}
\mathrm{Err}_{D(h^*_S)}(h^*_S) - \mathrm{Err}_{D(h)}(h) &\le \mathrm{Cov}(\omega_x(h^*_S), \mathbb{1}(h^*_S(x) \neq y)) - \mathrm{Cov}(\omega_x(h), \mathbb{1}(h(x) \neq y))\\
&\le \sqrt{\mathrm{Var}(\omega_x(h^*_S))\,\mathrm{Var}(\mathbb{1}(h^*_S(x) \neq y))} + \sqrt{\mathrm{Var}(\omega_x(h))\,\mathrm{Var}(\mathbb{1}(h(x) \neq y))} && (|\mathrm{Cov}(X, Y)| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)})\\
&= \sqrt{\mathrm{Var}(\omega_x(h^*_S))\,\mathrm{Err}_{D_S}(h^*_S)(1 - \mathrm{Err}_{D_S}(h^*_S))} + \sqrt{\mathrm{Var}(\omega_x(h))\,\mathrm{Err}_{D_S}(h)(1 - \mathrm{Err}_{D_S}(h))}\\
&\le \sqrt{\mathrm{Var}(\omega_x(h^*_S))\,\mathrm{Err}_{D_S}(h^*_S)} + \sqrt{\mathrm{Var}(\omega_x(h))\,\mathrm{Err}_{D_S}(h)} && (1 - \mathrm{Err}_{D_S}(h) \le 1)\\
&\le \sqrt{\mathrm{Err}_{D_S}(h)}\left(\sqrt{\mathrm{Var}(\omega_x(h^*_S))} + \sqrt{\mathrm{Var}(\omega_x(h))}\right). && (\mathrm{Err}_{D_S}(h^*_S) \le \mathrm{Err}_{D_S}(h))
\end{aligned}$$
Since this holds for any $h$, it certainly holds for $h = h^*_T$.

A.8 OMITTED ASSUMPTIONS AND PROOF OF THEOREM 4.6

Denote $\mathcal{X}_+(h) = \{x : \omega_x(h) \ge 1\}$ and $\mathcal{X}_-(h) = \{x : \omega_x(h) < 1\}$. First we observe that
$$\int_{\mathcal{X}_+(h)} \mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx + \int_{\mathcal{X}_-(h)} \mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx = 0.$$
This is simply because $\int_x \mathbb{P}_{D_S}(X = x)\,\omega_x(h)\,dx = \int_x \mathbb{P}_{D(h)}(X = x)\,dx = 1$.

Proof.
Notice that in the setting of binary classification, we can write the total variation distance between $D_{Y|S}$ and $D_Y(h)$ as the difference between the two probabilities of $Y = +1$:
$$d_{TV}(D_{Y|S}, D_Y(h)) = |\mathbb{P}_{D_S}(Y = +1) - \mathbb{P}_{D(h)}(Y = +1)| = \left|\int \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)\,dx - \int \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)\,\omega_x(h)\,dx\right| = \left|\int \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx\right|. \tag{9}$$
Similarly, we have
$$d_{TV}(D_{h|S}, D_h(h)) = \left|\int \mathbb{P}_{D_S}(h(x) = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx\right|.$$
We can further expand the total variation distance between $D_{Y|S}$ and $D_Y(h)$ as follows:
$$\begin{aligned}
d_{TV}(D_{Y|S}, D_Y(h)) &= \Bigl|\underbrace{\int_{\mathcal{X}_+(h)} \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx}_{\le 0} + \underbrace{\int_{\mathcal{X}_-(h)} \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx}_{> 0}\Bigr|\\
&= -\int_{\mathcal{X}_+(h)} \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx - \int_{\mathcal{X}_-(h)} \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(1 - \omega_x(h))\,dx && \text{(by Assumption 4.3)}\\
&= \int \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(\omega_x(h) - 1)\,dx. && \text{(by equation 9)}
\end{aligned}$$
Therefore,
$$\begin{aligned}
d_{TV}(D_{Y|S}, D_Y(h)) - d_{TV}(D_{h|S}, D_h(h)) &= \int \mathbb{P}_{D_S}(Y = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(\omega_x(h) - 1)\,dx - \int \mathbb{P}_{D_S}(h(x) = +1|X = x)\,\mathbb{P}_{D_S}(X = x)(\omega_x(h) - 1)\,dx\\
&= \int \bigl[\mathbb{P}_{D_S}(Y = +1|X = x) - \mathbb{P}_{D_S}(h(x) = +1|X = x)\bigr]\,\mathbb{P}_{D_S}(X = x)(\omega_x(h) - 1)\,dx\\
&= \mathbb{E}_{D_S}\bigl[(\mathbb{P}_{D_S}(Y = +1|X = x) - \mathbb{P}_{D_S}(h(x) = +1|X = x))(\omega_x(h) - 1)\bigr]\\
&> \mathbb{E}_{D_S}\bigl[\mathbb{P}_{D_S}(Y = +1|X = x) - \mathbb{P}_{D_S}(h(x) = +1|X = x)\bigr]\,\mathbb{E}_{D_S}[\omega_x(h) - 1] && \text{(by Assumption 4.5)}\\
&= 0.
\end{aligned}$$
Combining the above with Theorem 3.3, we have
$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \ge \frac{d_{TV}(D_{Y|S}, D_Y(h)) - d_{TV}(D_{h|S}, D_h(h))}{2} > 0.$$

A.9 OMITTED DETAILS FOR SECTION 4.3

With Setup 2 - Setup 4, we can further specify the importance weight $\omega_x(h)$ for the strategic response setting: Lemma A.3.
Recall the definition of the covariate shift importance weight $\omega_x(h) := \frac{\mathbb{P}_{D(h)}(X = x)}{\mathbb{P}_{D_S}(X = x)}$. For our strategic response setting, we have
$$\omega_x(h) = \begin{cases} 1, & x \in [0, \tau_h - B) \\ \frac{\tau_h - x}{B}, & x \in [\tau_h - B, \tau_h) \\ \frac{1}{B}(-x + \tau_h + 2B), & x \in [\tau_h, \tau_h + B) \\ 1, & x \in [\tau_h + B, 1], \end{cases}$$
and $\omega_x(h) = 0$ otherwise.

Proof of Lemma A.3. We discuss the induced distribution $D(h)$ by cases:
- For features in $[0, \tau_h - B)$: since we assume agents are rational, under Setup 2 agents with features below $\tau_h - B$ will not perform any adaptation, and no other agents will adapt their features into this range either, so the distribution on $[0, \tau_h - B)$ remains the same as the source.
- The induced distribution on $[\tau_h - B, \tau_h)$ can be computed directly from Setup 3.
- For the distribution on $[\tau_h, \tau_h + B)$: consider a particular feature $x' \in [\tau_h, \tau_h + B)$. Under Setup 4, its new density becomes
$$\mathbb{P}_{D(h)}(X = x') = 1 + \int_{x' - B}^{\tau_h} \frac{1 - \frac{\tau_h - z}{B}}{B - \tau_h + z}\,dz = 1 + \int_{x' - B}^{\tau_h} \frac{1}{B}\,dz = \frac{1}{B}(-x' + \tau_h + 2B).$$
- For $[\tau_h + B, 1]$: under Setups 2 and 4, no agents will move their features into this region, so the distribution remains the same as the source.
Since the source distribution is uniform on $[0, 1]$ in this setup, the importance weight $\omega_x(h)$ coincides with the induced density on $[0, 1]$, which gives the claimed expression.

Proof of Proposition 4.7. According to Lemma A.3, the variance of $\omega_x(h)$ is
$$\mathrm{Var}(\omega_x(h)) = \mathbb{E}[\omega_x(h)^2] - (\mathbb{E}[\omega_x(h)])^2 = \frac{2}{3}B.$$
Plugging this into the general bound of Theorem 4.2 gives the desired result.
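The identities $\mathbb{E}[\omega_x(h)] = 1$ and $\mathrm{Var}(\omega_x(h)) = \tfrac{2}{3}B$ can be checked numerically against the piecewise form in Lemma A.3, assuming (as in the setup) a uniform source distribution on $[0, 1]$:

```python
import numpy as np

tau, B = 0.5, 0.1
n = 200_000
# Midpoint grid on [0, 1]; since the source is uniform, expectations
# over D_S are plain averages over the grid.
xs = (np.arange(n) + 0.5) / n

w = np.ones(n)                               # omega = 1 outside the response band
m1 = (xs >= tau - B) & (xs < tau)
m2 = (xs >= tau) & (xs < tau + B)
w[m1] = (tau - xs[m1]) / B                   # Lemma A.3, second branch
w[m2] = (-xs[m2] + tau + 2 * B) / B          # Lemma A.3, third branch

mean_w = w.mean()                            # should be 1: D(h) is a distribution
var_w = (w ** 2).mean() - mean_w ** 2        # should match 2B/3 (Proposition 4.7)
print(mean_w, var_w, 2 * B / 3)
```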
Defining $p := \mathbb{P}_{D_S}(Y = +1)$ and $p(h) := \mathbb{P}_{D(h)}(Y = +1)$, we have
$$\begin{aligned}
\mathrm{Err}_{D(h^*_S)}(h^*_S) &= p(h^*_S)\cdot\mathrm{Err}^+(h^*_S) + (1 - p(h^*_S))\cdot\mathrm{Err}^-(h^*_S) && \text{(by definitions of } p(h^*_S), \mathrm{Err}^+(h^*_S), \mathrm{Err}^-(h^*_S)\text{)}\\
&= \underbrace{p\cdot\mathrm{Err}^+(h^*_S) + (1 - p)\cdot\mathrm{Err}^-(h^*_S)}_{\text{(I)}} + (p(h^*_S) - p)\bigl[\mathrm{Err}^+(h^*_S) - \mathrm{Err}^-(h^*_S)\bigr]. \tag{13}
\end{aligned}$$
We can expand (I) as follows:
$$\begin{aligned}
p\cdot\mathrm{Err}^+(h^*_S) + (1 - p)\cdot\mathrm{Err}^-(h^*_S) &\le p\cdot\mathrm{Err}^+(h^*_T) + (1 - p)\cdot\mathrm{Err}^-(h^*_T) && \text{(by optimality of } h^*_S \text{ on } D_S\text{)}\\
&= p(h^*_T)\cdot\mathrm{Err}^+(h^*_T) + (1 - p(h^*_T))\cdot\mathrm{Err}^-(h^*_T) + (p - p(h^*_T))\cdot\bigl[\mathrm{Err}^+(h^*_T) - \mathrm{Err}^-(h^*_T)\bigr]\\
&= \mathrm{Err}_{D(h^*_T)}(h^*_T) + (p - p(h^*_T))\cdot\bigl[\mathrm{Err}^+(h^*_T) - \mathrm{Err}^-(h^*_T)\bigr].
\end{aligned}$$
Plugging this back into equation 13, we have
$$\mathrm{Err}_{D(h^*_S)}(h^*_S) - \mathrm{Err}_{D(h^*_T)}(h^*_T) \le (p(h^*_S) - p)\bigl[\mathrm{Err}^+(h^*_S) - \mathrm{Err}^-(h^*_S)\bigr] + (p - p(h^*_T))\cdot\bigl[\mathrm{Err}^+(h^*_T) - \mathrm{Err}^-(h^*_T)\bigr].$$
Notice that
$$\tfrac12(\mathrm{Err}^+(h) - \mathrm{Err}^-(h)) = \tfrac12 - \tfrac12\,\mathbb{P}(h(X) = +1 | Y = +1) - \tfrac12\,\mathbb{P}(h(X) = +1 | Y = -1) = \tfrac12 - \mathbb{P}_{D_u}(h(X) = +1),$$
where $D_u$ is a distribution with a uniform prior over $Y$.
Then
$$(p(h^*_S) - p)\bigl[\mathrm{Err}^+(h^*_S) - \mathrm{Err}^-(h^*_S)\bigr] = 2(p(h^*_S) - p)\,(0.5 - \mathbb{P}_{D_u}(h^*_S(X) = +1)),$$
$$(p - p(h^*_T))\bigl[\mathrm{Err}^+(h^*_T) - \mathrm{Err}^-(h^*_T)\bigr] = 2(p - p(h^*_T))\,(0.5 - \mathbb{P}_{D_u}(h^*_T(X) = +1)).$$
Adding together these two equations yields
$$\begin{aligned}
&2(p(h^*_S) - p)\,(0.5 - \mathbb{P}_{D_u}(h^*_S(X) = +1)) + 2(p - p(h^*_T))\,(0.5 - \mathbb{P}_{D_u}(h^*_T(X) = +1))\\
&= (p(h^*_S) - p(h^*_T)) - 2\bigl(p(h^*_S)\,\mathbb{P}_{D_u}(h^*_S(X) = +1) - p(h^*_T)\,\mathbb{P}_{D_u}(h^*_T(X) = +1)\bigr) + 2p\,\bigl(\mathbb{P}_{D_u}(h^*_S(X) = +1) - \mathbb{P}_{D_u}(h^*_T(X) = +1)\bigr)\\
&\le |p(h^*_S) - p(h^*_T)|\cdot\bigl(1 + 2\,|\mathbb{P}_{D_u}(h^*_S(X) = +1) - \mathbb{P}_{D_u}(h^*_T(X) = +1)|\bigr) + 2p\cdot|\mathbb{P}_{D_u}(h^*_S(X) = +1) - \mathbb{P}_{D_u}(h^*_T(X) = +1)|. \tag{14}
\end{aligned}$$
Meanwhile,
$$\begin{aligned}
|\mathbb{P}_{D_u}(h^*_S(X) = +1) - \mathbb{P}_{D_u}(h^*_T(X) = +1)| &\le \tfrac12\,|\mathbb{P}_{D|Y=+1}(h^*_S(X) = +1) - \mathbb{P}_{D|Y=+1}(h^*_T(X) = +1)| + \tfrac12\,|\mathbb{P}_{D|Y=-1}(h^*_S(X) = +1) - \mathbb{P}_{D|Y=-1}(h^*_T(X) = +1)|\\
&\le \tfrac12\,\bigl(d_{TV}(D^+(h^*_S), D^+(h^*_T)) + d_{TV}(D^-(h^*_S), D^-(h^*_T))\bigr). \tag{15}
\end{aligned}$$
Combining equations 14 and 15 gives
$$\begin{aligned}
&|p(h^*_S) - p(h^*_T)|\cdot\bigl(1 + 2\,|\mathbb{P}_{D_u}(h^*_S(X) = +1) - \mathbb{P}_{D_u}(h^*_T(X) = +1)|\bigr) + 2p\cdot|\mathbb{P}_{D_u}(h^*_S(X) = +1) - \mathbb{P}_{D_u}(h^*_T(X) = +1)|\\
&\le |p(h^*_S) - p(h^*_T)|\cdot\bigl(1 + d_{TV}(D^+(h^*_S), D^+(h^*_T)) + d_{TV}(D^-(h^*_S), D^-(h^*_T))\bigr) + p\cdot\bigl(d_{TV}(D^+(h^*_S), D^+(h^*_T)) + d_{TV}(D^-(h^*_S), D^-(h^*_T))\bigr)\\
&\le |p(h^*_S) - p(h^*_T)| + (1 + p)\cdot\bigl(d_{TV}(D^+(h^*_S), D^+(h^*_T)) + d_{TV}(D^-(h^*_S), D^-(h^*_T))\bigr).
\end{aligned}$$

A.11 PROOF OF THEOREM B.1

We will make use of the following fact:

Lemma A.4. Under label shift, $\mathrm{TPR}_S(h) = \mathrm{TPR}_h(h)$ and $\mathrm{FPR}_S(h) = \mathrm{FPR}_h(h)$.

Proof. We have
$$\begin{aligned}
\mathrm{TPR}_h(h) &= \mathbb{P}_{D(h)}(h(X) = +1 | Y = +1) = \int \mathbb{P}_{D(h)}(h(X) = +1, X = x | Y = +1)\,dx\\
&= \int \mathbb{P}_{D(h)}(h(X) = +1 | X = x, Y = +1)\,\mathbb{P}_{D(h)}(X = x | Y = +1)\,dx\\
&= \int \mathbb{1}(h(x) = +1)\,\mathbb{P}_{D(h)}(X = x | Y = +1)\,dx\\
&= \int \mathbb{1}(h(x) = +1)\,\mathbb{P}_{D_S}(X = x | Y = +1)\,dx && \text{(by definition of label shift)}\\
&= \int \mathbb{P}_{D_S}(h(X) = +1 | X = x, Y = +1)\,\mathbb{P}_{D_S}(X = x | Y = +1)\,dx = \mathrm{TPR}_S(h).
\end{aligned}$$
The argument for $\mathrm{FPR}_h(h) = \mathrm{FPR}_S(h)$ is analogous. Now we proceed to prove the theorem.
Proof of Theorem B.1. In Section 3.2 we showed a general lower bound on the maximum of $\mathrm{Err}_{D_S}(h)$ and $\mathrm{Err}_{D(h)}(h)$:
$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \ge \frac{d_{TV}(D_{Y|S}, D_Y(h)) - d_{TV}(D_{h|S}, D_h(h))}{2}.$$
In the case of label shift, and by the definitions of $p$ and $p(h)$,
$$d_{TV}(D_{Y|S}, D_Y(h)) = |\mathbb{P}_{D_S}(Y = +1) - \mathbb{P}_{D(h)}(Y = +1)| = |p - p(h)|. \tag{16}$$
In addition, we have
$$\mathbb{P}_{D_S}(h(X) = +1) = p\cdot\mathrm{TPR}_S(h) + (1 - p)\cdot\mathrm{FPR}_S(h), \tag{17}$$
and similarly
$$\mathbb{P}_{D(h)}(h(X) = +1) = p(h)\cdot\mathrm{TPR}_h(h) + (1 - p(h))\cdot\mathrm{FPR}_h(h) = p(h)\cdot\mathrm{TPR}_S(h) + (1 - p(h))\cdot\mathrm{FPR}_S(h) \quad \text{(by Lemma A.4)}. \tag{18}$$
Therefore,
$$d_{TV}(D_{h|S}, D_h(h)) = |\mathbb{P}_{D_S}(h(X) = +1) - \mathbb{P}_{D(h)}(h(X) = +1)| = |(p - p(h))\cdot\mathrm{TPR}_S(h) + (p(h) - p)\cdot\mathrm{FPR}_S(h)| = |p - p(h)|\cdot|\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)| \quad \text{(by equations 17 and 18)}, \tag{19}$$
which yields
$$d_{TV}(D_{Y|S}, D_Y(h)) - d_{TV}(D_{h|S}, D_h(h)) = |p - p(h)|\,(1 - |\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)|) \quad \text{(by equations 16 and 19)},$$
completing the proof.

A.12 PROOF OF PROPOSITION B.2

Proof.
$$|p(h^*_S) - p(h^*_T)|\cdot\frac{1}{\mathbb{P}_{D_S}(Y = +1)} = \frac{|(1 - \mathrm{Err}_{D_S}(h^*_S))\,\mathrm{TPR}_S(h^*_S) - (1 - \mathrm{Err}_{D_S}(h^*_T))\,\mathrm{TPR}_S(h^*_T)|}{(1 - \mathrm{Err}_{D_S}(h^*_S))\cdot(1 - \mathrm{Err}_{D_S}(h^*_T))} \le \frac{|\mathrm{Err}_{D_S}(h^*_S) - \mathrm{Err}_{D_S}(h^*_T)|\cdot|\mathrm{TPR}_S(h^*_S) - \mathrm{TPR}_S(h^*_T)|}{(1 - \mathrm{Err}_{D_S}(h^*_S))\cdot(1 - \mathrm{Err}_{D_S}(h^*_T))}.$$
The inequality above is due to Lemma 7 of Liu & Liu (2015).

B LOWER BOUND AND EXAMPLE FOR TARGET SHIFT

B.1 LOWER BOUND

Now we discuss lower bounds. Denote by $\mathrm{TPR}_S(h)$ and $\mathrm{FPR}_S(h)$ the true positive and false positive rates of $h$ on the source distribution $D_S$. We prove the following:

Theorem B.1. Under target shift, any model $h$ must incur the following error on either $D_S$ or $D(h)$:
$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \ge \frac{|p - p(h)|\cdot(1 - |\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)|)}{2}.$$
The proof extends the bound of Theorem 3.3 by further explicating each of $d_{TV}(D_{Y|S}, D_Y(h))$ and $d_{TV}(D_{h|S}, D_h(h))$ under the assumption of target shift. Since $|\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)| < 1$ unless we have a degenerate classifier with either $\mathrm{TPR}_S(h) = 1, \mathrm{FPR}_S(h) = 0$ or $\mathrm{TPR}_S(h) = 0, \mathrm{FPR}_S(h) = 1$, the lower bound is strictly positive. Taking a closer look, the lower bound depends linearly on how much the label distribution shifts, $p - p(h)$. It is further determined by the performance of $h$ on the source distribution through $1 - |\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)|$. For instance, when $\mathrm{TPR}_S(h) > \mathrm{FPR}_S(h)$, this quantity becomes $\mathrm{FNR}_S(h) + \mathrm{FPR}_S(h)$; that is, the more errors $h$ makes, the larger the lower bound.
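The bound can be checked numerically in a simple Gaussian label-shift instance; the class conditionals, priors, and threshold below are illustrative choices, not from the paper:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Class conditionals N(+1, 1) and N(-1, 1) are shared by source and target
# (target/label shift); only the positive-class prior changes: p -> p_h.
p, p_h, theta = 0.5, 0.7, 0.2       # theta: decision threshold of h(x) = 1[x >= theta]

TPR = 1.0 - Phi(theta - 1.0)        # P(h = +1 | Y = +1), same on source and target
FPR = 1.0 - Phi(theta + 1.0)        # P(h = +1 | Y = -1)

err_src = p * (1 - TPR) + (1 - p) * FPR       # error on D_S
err_ind = p_h * (1 - TPR) + (1 - p_h) * FPR   # error on D(h)
bound = abs(p - p_h) * (1.0 - abs(TPR - FPR)) / 2.0
print(max(err_src, err_ind), bound)
```

The larger of the two errors should always dominate the Theorem B.1 lower bound, and the bound is strictly positive whenever the prior actually shifts.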

B.2 EXAMPLE USING REPLICATOR DYNAMICS

Let us instantiate the discussion using a specific fitness function for the replicator dynamics model (Section 2.1): the prediction accuracy of $h$ for class $+1$,
$$[\text{Fitness of } Y = +1] := \mathbb{P}_{D_S}(h(X) = +1 \mid Y = +1).$$
Then we have $\mathbb{E}[\text{Fitness of } Y] = 1 - \mathrm{Err}_{D_S}(h)$, and
$$\frac{p(h)}{\mathbb{P}_{D_S}(Y = +1)} = \frac{\mathbb{P}_{D_S}(h(X) = +1 \mid Y = +1)}{1 - \mathrm{Err}_{D_S}(h)}.$$
Plugging this result back into Proposition B.2, we have
$$|p(h^*_S) - p(h^*_T)| \le \mathbb{P}_{D_S}(Y = +1)\cdot\frac{|\mathrm{Err}_{D_S}(h^*_S) - \mathrm{Err}_{D_S}(h^*_T)|\cdot|\mathrm{TPR}_S(h^*_S) - \mathrm{TPR}_S(h^*_T)|}{(1 - \mathrm{Err}_{D_S}(h^*_S))\cdot(1 - \mathrm{Err}_{D_S}(h^*_T))}.$$
That is, the difference between $\mathrm{Err}_{D(h^*_S)}(h^*_S)$ and $\mathrm{Err}_{D(h^*_T)}(h^*_T)$ further depends on the difference between the two classifiers' performances on the source data $D_S$. This offers an opportunity to evaluate the possible error transferability using the source data only.

C MISSING EXPERIMENTAL DETAILS C.1 SYNTHETIC EXPERIMENTS USING DAG

Synthetic experiments using simulated data. We generate synthetic data sets from structural equation models on the simple causal DAGs in Figure 2, for covariate shift and target shift. To generate the induced distribution $D(h)$, we posit a specific adaptation function $\Delta : \mathbb{R}^d \times \mathcal{H} \to \mathbb{R}^d$, so that when an input $x$ encounters classifier $h \in \mathcal{H}$, its induced features are precisely $x' = \Delta(x, h)$. We take our training data set $\{x_1, \ldots, x_n\}$ and learn a "base" logistic regression model $h(x) = \sigma(w \cdot x)$. We then consider the hypothesis class $\mathcal{H} := \{h_\tau \mid \tau \in [0, 1]\}$, where $h_\tau(x) := 2\cdot\mathbb{1}[\sigma(w \cdot x) > \tau] - 1$. To compute $h^*_S$, the model that performs best on the source distribution, we simply vary $\tau$ and take the $h_\tau$ with the lowest prediction error. Then, we posit a specific adaptation function $\Delta(x, h_\tau)$. Finally, to compute $h^*_T$, we vary $\tau$ from 0 to 1 and find the classifier $h_\tau$ that minimizes the prediction error on its induced data set $\{\Delta(x_1, h_\tau), \ldots, \Delta(x_n, h_\tau)\}$. We report our results in Figure 4.

Covariate shift. We specify the causal DAG for the covariate shift setting as follows:
$$X_1 \sim \mathrm{Unif}(-1, 1), \quad X_2 \sim 1.2\,X_1 + \mathcal{N}(0, \sigma_2^2), \quad X_3 \sim -X_1^2 + \mathcal{N}(0, \sigma_3^2), \quad Y := 2\cdot\mathbb{1}[X_2 > 0] - 1,$$
where $\sigma_2^2$ and $\sigma_3^2$ are parameters of our choice.

Adaptation function. We assume the new distribution of feature $X_1$ is generated as
$$X_1' = \Delta(X) = X_1 + c\cdot(h(X) - 1),$$
where $c > 0$ is the parameter controlling how much the prediction $h(X)$ affects the generation of $X_1$, namely the magnitude of the distribution shift. Intuitively, this adaptation function means that if a feature $x$ is predicted to be positive ($h(x) = +1$), then decision subjects are more likely to keep that feature in the induced distribution; otherwise, decision subjects move away from $x$, since they know it leads to a negative prediction.
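The covariate-shift structural equations and adaptation function above can be sketched directly. The weight vector `w`, the noise scales, and `c` below are illustrative values (in the experiment, `w` comes from a fitted logistic regression):

```python
import numpy as np

rng = np.random.default_rng(2)
n, s2, s3, c = 2000, 0.3, 0.3, 0.2

# Structural equations for the covariate-shift DAG.
x1 = rng.uniform(-1, 1, n)
x2 = 1.2 * x1 + rng.normal(0, s2, n)
x3 = -x1**2 + rng.normal(0, s3, n)
y = 2 * (x2 > 0).astype(int) - 1                  # labels in {-1, +1}

def h_tau(X, w, tau):
    """Thresholded logistic model h_tau from the hypothesis class H."""
    score = 1.0 / (1.0 + np.exp(-X @ w))
    return 2 * (score > tau).astype(int) - 1

w = np.array([0.0, 2.0, 0.0])                     # hypothetical base-model weights
X = np.column_stack([x1, x2, x3])
pred = h_tau(X, w, 0.5)

# Adaptation: X1' = X1 + c * (h(X) - 1); negatively classified agents move away.
x1_new = x1 + c * (pred - 1)
print(x1_new.mean(), x1.mean())
```

Since $h(X) - 1 \le 0$, the induced $X_1'$ can only move down for rejected agents, which is exactly the shift the adaptation function encodes.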
Target shift. We specify the causal DAG for the target shift setting as follows:
$$(Y + 1)/2 \sim \mathrm{Bernoulli}(\alpha), \quad X_1 | Y = y \sim \mathcal{N}_{[0,1]}(\mu_y, \sigma^2), \quad X_2 = -0.8\,X_1 + \mathcal{N}(0, \sigma_2^2), \quad X_3 = 0.2\,Y + \mathcal{N}(0, \sigma_3^2).$$

(Figure 5(a): L1 penalty, strong regularization strength; curves show Max and LB over time steps K.)

C.2.2 ADDITIONAL EXPERIMENTAL RESULTS

In this section, we present additional experimental results on the real-world FICO credit score data set. With the initialization of the credit score distribution Q and the specified dynamics, we present results comparing the influence of vanilla regularization terms used in decision making (when estimating the credit score Q) on the calculation of bounds for induced risks. In particular, we consider L1-norm (Figure 5) and L2-norm (Figure 6) regularization terms when optimizing decision-making policies on the source domain. As the results show, applying vanilla regularization terms (e.g., L1 norm and L2 norm) on the source domain, without specific consideration of the risk-inducing mechanism, does not provide significant performance improvement in terms of a smaller induced risk. For example, there is no significant decrease of the term Diff as the regularization strength increases, for either the L1-norm (Figure 5) or the L2-norm (Figure 6) regularization.

D CHALLENGES IN MINIMIZING INDUCED RISK D.1 COMPUTATIONAL CHALLENGES

The literature on domain adaptation provides solutions for minimizing the risk on the target distribution via a nicely developed set of results (Sugiyama et al., 2008; 2007; Shimodaira, 2000). This allows us to extend those solutions to minimize the induced risk too. Nonetheless, we will highlight additional computational challenges. We focus on the covariate shift setting; the scenario for target shift is similar. For covariate shift, recall that earlier we derived the following fact:
$$\mathbb{E}_{D(h)}[\ell(h; X, Y)] = \mathbb{E}_{D_S}[\omega_x(h)\cdot\ell(h; x, y)].$$
This formula suggests a promising solution: use $\omega_x(h)$ to perform reweighted ERM. Of course, the primary challenge that stands in the way is how to know $\omega_x(h)$. Different methods have been proposed in the literature to estimate $\omega_x(h)$ when one has access to $D(h)$ (Zhang et al., 2013b; Long et al., 2016; Gong et al., 2016). How any of these specific techniques performs in our induced domain adaptation setting is left for a more thorough future study. In this section, we focus on explaining the computational challenges even when such knowledge of $\omega_x(h)$ can be obtained for each model $h$ considered during training. Though $\omega_x(h)$ and $\ell(h; x, y)$ might both be convex with respect to (the output of) the classifier $h$, their product is not necessarily convex. Consider the following example.

Example ($\omega_x(h)\cdot\ell(h; x, y)$ is generally non-convex). Let $\mathcal{X} = (0, 1]$. Let the true label of each $x \in \mathcal{X}$ be $y(x) = \mathbb{1}[x \ge \tfrac12]$. Let $\ell(h; x, y) = \tfrac12(h(x) - y)^2$, and let $h(x) = x$ (a simple linear model). Notice that $\ell$ is convex in $h$. Let $D$ be the distribution whose density function is $f_D(x) = 1$ for $0 < x \le 1$ and $0$ otherwise. Notice that if the training data is drawn from $D$, then $h$ is the linear classifier that minimizes the expected loss. Suppose that, since $h$ rewards large values of $x$, it induces decision subjects to shift toward higher feature values.
In particular, let $D(h)$ have density function $f_{D(h)}(x) = 2x$ for $0 < x \le 1$ and $0$ otherwise. Then for all $x \in \mathcal{X}$, $\omega_x(h) = f_{D(h)}(x)/f_D(x) = 2x$. Notice that $\omega_x(h) = 2x$ is convex in $h(x) = x$. Then
$$\omega_x(h)\cdot\ell(h; x, y) = 2x\cdot\tfrac12(h(x) - y)^2 = x(x - y)^2 = \begin{cases} x^3, & 0 < x < \tfrac12\\ x(x - 1)^2, & \tfrac12 \le x \le 1, \end{cases}$$
which is clearly non-convex. Nonetheless, we provide sufficient conditions under which $\omega_x(h)\cdot\ell(h; x, y)$ is in fact convex:

Proposition D.1. Suppose $\omega_x(h)$ and $\ell(h; x, y)$ are both convex in $h$, and $\omega_x(h)$ and $\ell(h; x, y)$ satisfy, for all $h, h', x, y$:
$$(\omega_x(h) - \omega_x(h'))\cdot(\ell(h; x, y) - \ell(h'; x, y)) \ge 0.$$
Then $\omega_x(h)\cdot\ell(h; x, y)$ is convex.

Proof. Let us use the shorthand $\omega(h) := \omega_x(h)$ and $\ell(h) := \ell(h; x, y)$. To show that $\omega(h)\cdot\ell(h)$ is convex, it suffices to show that for any $\alpha \in [0, 1]$ and any two hypotheses $h, h'$ we have
$$\omega(\alpha h + (1 - \alpha) h')\cdot\ell(\alpha h + (1 - \alpha) h') \le \alpha\,\omega(h)\,\ell(h) + (1 - \alpha)\,\omega(h')\,\ell(h').$$
By the convexity of $\omega$ and the convexity of $\ell$ (both non-negative),
$$\omega(\alpha h + (1 - \alpha) h') \le \alpha\,\omega(h) + (1 - \alpha)\,\omega(h'), \qquad \ell(\alpha h + (1 - \alpha) h') \le \alpha\,\ell(h) + (1 - \alpha)\,\ell(h').$$
Therefore it suffices to show that
$$\begin{aligned}
&[\alpha\,\omega(h) + (1 - \alpha)\,\omega(h')]\cdot[\alpha\,\ell(h) + (1 - \alpha)\,\ell(h')] - \alpha\,\omega(h)\,\ell(h) - (1 - \alpha)\,\omega(h')\,\ell(h') \le 0\\
\Leftrightarrow\;& \alpha(\alpha - 1)\,\omega(h)\,\ell(h) - \alpha(\alpha - 1)\,[\omega(h)\,\ell(h') + \omega(h')\,\ell(h)] + \alpha(\alpha - 1)\,\omega(h')\,\ell(h') \le 0\\
\Leftrightarrow\;& \alpha(\alpha - 1)\,[\omega(h) - \omega(h')]\cdot[\ell(h) - \ell(h')] \le 0\\
\Leftrightarrow\;& [\omega(h) - \omega(h')]\cdot[\ell(h) - \ell(h')] \ge 0.
\end{aligned}$$
By the assumed condition, the left-hand side is indeed non-negative, which proves the claim.

This condition is intuitive when each $x$ belongs to a rational agent who responds to a classifier $h$ to maximize her chance of being classified as $+1$: for $y = +1$, the higher-loss points are those close to the decision boundary, so more negatively labeled points may shift toward them, resulting in a larger $\omega_x(h)$; for $y = -1$, the higher-loss points are those likely to be misclassified as $+1$, which "attract" instances to deviate toward them.
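The non-convexity in the example can be verified at a single midpoint; this checks the composite objective $x(x - y(x))^2$ from the example directly:

```python
def y(x):
    """True label from the example: y(x) = 1[x >= 1/2]."""
    return 1.0 if x >= 0.5 else 0.0

def objective(x):
    """omega_x(h) * loss = 2x * (1/2)(h(x) - y)^2 with h(x) = x, i.e. x(x - y)^2."""
    return x * (x - y(x)) ** 2

# Convexity would require objective(0.5) <= (objective(0.4) + objective(0.6)) / 2.
a, b = 0.4, 0.6
mid = objective(0.5)
chord = 0.5 * (objective(a) + objective(b))
print(mid, chord)   # mid exceeds the chord: convexity is violated
```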

D.2 CHALLENGES DUE TO THE LACK OF ACCESS TO DATA

We discuss the challenges in performing induced domain adaptation. In standard domain adaptation settings, one often assumes access to a sample set of X from the target domain, which already poses challenges when there is no access to the labels Y after the adaptation. Nonetheless, the literature has seen a fruitful development of solutions (Sugiyama et al., 2008; Zhang et al., 2013b; Gong et al., 2016). One might think these ideas can be applied to our IDA setting rather straightforwardly by assuming access to samples from D(h), the induced distribution under each model h, during training. However, we often do not know precisely how the distribution will shift under a model h until we deploy it. This is particularly true when the distribution shifts are caused by humans responding to a model. Therefore, the ability to "predict" accurately how samples "react" to h plays a very important role (Ustun et al., 2019). Indeed, the strategic classification literature enables this capability by assuming fully rational human agents. For more general settings, building robust domain adaptation tools that are resistant to the above "prediction error" will also be a crucial criterion.

E DISCUSSIONS ON PERFORMING DIRECT INDUCED RISK MINIMIZATION

In this section, we provide discussions on how to directly perform induced risk minimization in our induced domain adaptation setting. We first provide a gradient descent based method for a particular label shift setting where the underlying dynamics is the replicator dynamics described in Section 5.3. Then we propose a solution for a more general induced domain adaptation setting where we do not make any particular assumptions on the underlying distribution shift model.

E.1 GRADIENT DESCENT BASED METHOD

Here we provide a toy example of performing direct induced risk minimization under the assumption of label shift, with the replicator dynamics described in Section 5.3 as the underlying dynamics.

Setting. Consider a simple setting in which each decision subject is associated with a one-dimensional continuous feature $x \in \mathbb{R}$ and a binary true qualification $y \in \{-1, +1\}$. We assume the label shift setting, and the underlying population evolves according to the replicator dynamics described in Section 5.3. We consider a simple threshold classifier $\hat Y = h(x) = \mathbb{1}[x \ge \theta]$, so the classifier is completely characterized by the threshold parameter $\theta$. Below we will use $\hat Y$ and $h(X)$ interchangeably to represent the classification outcome. Recall that the replicator dynamics is specified as follows:
$$\frac{\mathbb{P}_{D(h)}(Y = y)}{\mathbb{P}_{D_S}(Y = y)} = \frac{\mathrm{Fitness}(Y = y)}{\mathbb{E}_{D_S}[\mathrm{Fitness}(Y)]},$$
where $\mathbb{E}_{D_S}[\mathrm{Fitness}(Y)] = \mathrm{Fitness}(Y = y)\,\mathbb{P}_{D_S}(Y = y) + \mathrm{Fitness}(Y = -y)\,(1 - \mathbb{P}_{D_S}(Y = y))$. $\mathrm{Fitness}(Y = y)$ is the fitness of strategy $Y = y$, which is further defined in terms of the expected utility $U_{y, \hat y}$ of each qualification-classification outcome pair $(y, \hat y)$:
$$\mathrm{Fitness}(Y = y) := \sum_{\hat y} \mathbb{P}[\hat Y = \hat y \mid Y = y]\cdot U_{y, \hat y},$$
where $U_{y, \hat y}$ is the utility (or reward) for each qualification-classification outcome combination. $\mathbb{P}(X \mid Y = y)$ is sampled according to a Gaussian distribution and remains unchanged, since we consider a label shift setting. To initialize the distributions, we specify the initial qualification rate $\mathbb{P}_{D_S}(Y = +1)$. To test different settings, we vary the specification of the utility matrix $U_{y, \hat y}$ and generate different dynamics.

Formulating the induced risk as a function of $h$. To minimize the induced risk, we first formulate the induced risk as a function of the classifier's parameter $\theta$, taking into account the underlying dynamics, and then perform gradient descent to solve for a locally optimal classifier $h^*_T$.
Recall from Section 5 that under label shift we can rewrite the induced risk as
$$\mathrm{Err}_{D(h)}(h) = p(h)\cdot\mathrm{Err}^+(h) + (1 - p(h))\cdot\mathrm{Err}^-(h).$$

Experimental results. Figure 7 shows the experimental results for this toy example. We can see that in each setting, compared to the baseline classifier $h^*_S$, the proposed gradient-based optimization procedure returns a classifier that achieves better prediction accuracy (and thus lower induced risk) than the source optimal classifier.
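A minimal simulation of the replicator update for the label prior can be sketched as follows; the Gaussian class conditionals and the utility matrix `U` are illustrative choices, not the experiment's actual parameters:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def step(p, theta, U):
    """One replicator update of the positive-class prior under h(x) = 1[x >= theta].
    Class conditionals X|Y=+1 ~ N(1,1) and X|Y=-1 ~ N(-1,1) stay fixed (label shift)."""
    acc_pos = 1.0 - Phi(theta - 1.0)            # P(h = +1 | Y = +1)
    acc_neg = 1.0 - Phi(theta + 1.0)            # P(h = +1 | Y = -1)
    fit_pos = acc_pos * U[+1][1] + (1 - acc_pos) * U[+1][0]
    fit_neg = acc_neg * U[-1][1] + (1 - acc_neg) * U[-1][0]
    mean_fit = p * fit_pos + (1 - p) * fit_neg
    return p * fit_pos / mean_fit               # replicator dynamics ratio

# U[y][y_hat]: utility of each (qualification, decision) pair; rewards
# being accurately classified more than being misclassified.
U = {+1: {1: 2.0, 0: 0.5}, -1: {1: 1.0, 0: 1.5}}
p = 0.5                                         # initial qualification rate
for _ in range(10):
    p = step(p, theta=0.0, U=U)
print(p)
```

With these utilities the positive class has higher fitness under the deployed threshold, so its prior grows over the time steps, which is the kind of induced label shift the gradient-based procedure must anticipate.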

E.2 GENERAL SETTING: INDUCED RISK MINIMIZATION WITH BANDIT FEEDBACK

In general, finding the classifier $h^*_T$ that achieves the optimal induced risk is a hard problem due to the interactive nature of the setting (see, e.g., the literature on performative prediction, Perdomo et al. (2020), for more detailed discussion). Without making any assumptions on the mapping between $h$ and $D(h)$, one can only rely on the bandit feedback from the decision subjects to estimate the influence of $h$ on $D(h)$: when the induced risk is a convex function of the classifier's parameter $\theta$, one possible approach is to use standard techniques from bandit optimization (Flaxman et al., 2004) to iteratively find the induced optimal classifier $h^*_T$. The basic idea is: at each step $t = 1, \ldots, T$, the decision maker deploys a classifier $h_t$, observes data points sampled from $D(h_t)$ and their losses, and uses them to construct an approximate gradient of the induced risk with respect to the model parameter $\theta_t$. When the induced risk is convex in the model parameter $\theta$, this approach is guaranteed to converge to $h^*_T$ and has sublinear regret in the total number of steps $T$.
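The zeroth-order procedure can be sketched as follows. The noisy quadratic `induced_risk` is a hypothetical stand-in for the bandit feedback observed by deploying a model, and a two-evaluation variant of the Flaxman et al. style gradient estimate is used here for variance reduction; step sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def induced_risk(theta):
    """Opaque bandit feedback: the risk on D(h_theta), observable only by
    deploying theta. This noisy quadratic is a hypothetical example."""
    return (theta - 1.3) ** 2 + 0.01 * rng.normal()

theta, eta, delta = 0.0, 0.05, 0.1
for t in range(2000):
    u = rng.choice([-1.0, 1.0])                   # random probe direction
    # Deploy two perturbed models and form a finite-difference gradient estimate.
    g = (induced_risk(theta + delta * u) - induced_risk(theta - delta * u)) / (2 * delta) * u
    theta -= eta * g                              # gradient step on the estimate
print(theta)
```

When the induced risk is convex in $\theta$, the iterates concentrate around its minimizer despite never observing a gradient directly.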

F REGULARIZED TRAINING

In this section, we discuss the possibility that minimizing a regularized risk will indeed lead to a tighter upper bound. Consider the target shift setting. Recall that $p(h) := \mathbb{P}_{D(h)}(Y = +1)$, and for any proper loss function $\ell$ we have
$$\mathbb{E}_{D(h)}[\ell(h; X, Y)] = p(h)\cdot\mathbb{E}_{D_S}[\ell(h; X, Y) \mid Y = +1] + (1 - p(h))\cdot\mathbb{E}_{D_S}[\ell(h; X, Y) \mid Y = -1].$$
Suppose $p < p(h^*_T)$; we now claim that minimizing the following regularized/penalized risk leads to a smaller upper bound:
$$\mathbb{E}_{D_S}[\ell(h; X, Y)] + \alpha\cdot\mathbb{E}_{D_{\mathrm{uniform}}}\left[\left\|\frac{h(X) + 1}{2}\right\|\right],$$
where $D_{\mathrm{uniform}}$ is a distribution with a uniform prior for $Y$. We impose the following assumption:



Footnotes:
- See Appendix A.1 for more detailed discussions.
- The ":=" defines the RHS as the probability measure function for the LHS. For continuous X, the probability measure should be read as the density function.
- UpperBound and LowerBound are the RHS expressions in Theorem 3.3 and Theorem 3.2.
- https://datasociety.net/library/poverty-lawgorithms/
- https://www.wired.com/2016/09/how-to-steal-an-ai/
- σ(·) is the logistic function and w ∈ R³ denotes the weights.
- The regularization that involves induced risk considerations is discussed in Appendix F.



Figure1: An example of an agent who originally has both savings and debt, observes that the classifier penalizes debt (weight -10) more than it rewards savings (weight +5), and concludes that their most efficient adaptation is to use their savings to pay down their debt.

• Distribution of Y on a distribution D: $D_Y := \mathbb{P}_D(Y = y)$; in particular, $D_Y(h) := \mathbb{P}_{D(h)}(Y = y)$ and $D_{Y|S} := \mathbb{P}_{D_S}(Y = y)$.
• Distribution of h on a distribution D: $D_h := \mathbb{P}_D(h(X) = y)$; in particular, $D_h(h) := \mathbb{P}_{D(h)}(h(X) = y)$ and $D_{h|S} := \mathbb{P}_{D_S}(h(X) = y)$.
• Marginal distribution of X on a distribution D: $D_X := \mathbb{P}_D(X = x)$; in particular, $D_X(h) := \mathbb{P}_{D(h)}(X = x)$ and $D_{X|S} := \mathbb{P}_{D_S}(X = x)$.
• Total variation distance (Ali & Silvey, 1966): $d_{TV}(D, D') := \sup_O |\mathbb{P}_D(O) - \mathbb{P}_{D'}(O)|$.

Figure 2: Example causal graphs annotated to demonstrate covariate shift (left) / target shift (right) as a result of the deployment of h. Grey nodes indicate observable variables, and transparent nodes are not observed at the training stage. Red arrows emphasize that h induces changes in certain variables.

are both defined on the conditional source distribution, which is invariant under the target shift assumption.

5.1 UPPER BOUND

We again upper bound the transferability of h*_S under target shift. Denote by D_+ the positive-label distribution on D_S (i.e., P_{D_S}(X = x | Y = +1)) and by D_- the negative-label distribution on D_S (i.e., P_{D_S}(X = x | Y = -1)). Let p := P_{D_S}(Y = +1).

Theorem 5.1. Under target shift, the difference between Err_{D(h*_S)}(h*_S) and Err_{D(h*_T)}(h*_T) is bounded as:

Figure 3: Diff := Err_{D(h*_S)}(h*_S) - Err_{D(h*_T)}(h*_T), Max := max{Err_{D_S}(h*_T), Err_{D(h*_T)}(h*_T)}, UB := upper bound specified in Theorem 4.2, and LB := lower bound specified in Theorem 4.6. For each time step K = k, we compute and deploy the source-optimal classifier h*_S and update the credit score for each individual according to the received decision, which becomes the new reality for time step K = k + 1. Details of the data generation are again deferred to Appendix C.

Proposition 5.3. Under the replicator dynamics model in Eqn. (21), |ω(h*_S) - ω(h*_T)| is bounded as:

We do observe positive gaps Err_{D(h*_S)}(h*_S) - Err_{D(h*_T)}(h*_T), indicating the suboptimality of training on D_S. The gaps are well bounded by the theoretical upper bound (UB). Our lower bounds (LB) return meaningful positive gaps, demonstrating the trade-offs that a classifier has to suffer on either the source distribution or the induced target distribution.

(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
(b) Did you describe the limitations of your work? [Yes] We have stated our assumptions and the limitations of our results. We also discuss the limitations in the conclusion.
(c) Did you discuss any potential negative societal impacts of your work? [Yes] One of our work's goals is to raise awareness of this issue for the safe deployment of machine learning methods in high-stakes societal applications. We discuss the potential misinterpretation of our results in the conclusion.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]

2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes]
(b) Did you include complete proofs of all theoretical results? [Yes] We present the complete proofs in the appendix.

3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We included experiment details in the appendix and submitted the implementation in the supplementary materials.
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A] In our controlled experiments, we do not tune parameters and do not observe significant variance in the results.
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]

4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes]
(b) Did you mention the license of the assets? [Yes]
(c) Did you include any new assets either in the supplemental material or as a URL? [No]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [N/A]
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]

5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]

• Adversarial attacks Chakraborty et al. (2018); Papernot et al. (2016); Song et al. (2019): in an adversarial attack, the true label Y stays the same for the attacked feature, while in IDA we allow the true label to change as well. One can think of an adversarial attack as a specific form of IDA where the induced distribution has a specific target, namely to maximize the classifier's error by perturbing/modifying features only. Our transferability bound does, however, provide insights into how standard training results transfer to the attack setting.
• Domain generalization Wang et al. (2021b); Li et al. (2017); Muandet et al. (2013): the goal of domain generalization is to learn a model that generalizes to any unseen distribution. Similar to our setting, one of the biggest challenges in domain generalization is also the lack of a target distribution during training. The major difference, however, is that our focus is to understand how the performance of a classifier trained on the source distribution degrades when evaluated on the induced distribution (which depends on how the population of decision subjects responds); this degradation depends on the classifier itself.
• Test-time adaptation Varsavsky et al. (2020); Wang et al. (2021a); Nado et al. (2021): test-time adaptation falls into the classical domain adaptation setting, where the adaptation is independent of the model being deployed. Applying this technique to our problem would require accessing data (either unsupervised or supervised) drawn from D_S(h) for each h being evaluated during different training epochs.

A.3 PROOF OF THEOREM 3.1

Proof. We first establish two lemmas that will be helpful for bounding the errors of a pair of classifiers. Both are standard results from the domain adaptation literature Ben-David et al. (2010).

dx

Similarly, by Assumption 4.4 and Equation (10), we have

$d_{TV}(\mathcal{D}_{h|S}, \mathcal{D}_h(h)) = \int P_{\mathcal{D}_S}(h(x) = +1 \mid X = x) \, P_{\mathcal{D}_S}(X = x) \cdot (\omega_x(h) - 1) \, dx$

Thus we can bound the difference between $d_{TV}(\mathcal{D}_{Y|S}, \mathcal{D}_Y(h))$ and $d_{TV}(\mathcal{D}_{h|S}, \mathcal{D}_h(h))$ as follows:

Figure 4: Results for synthetic experiments on simulated and real-world data. Diff := Err_{D(h*_S)}(h*_S) - Err_{D(h*_T)}(h*_T), Max := max{Err_{D_S}(h*_T), Err_{D(h*_T)}(h*_T)}, UB := upper bound specified in Theorem 4.2, and LB := lower bound specified in Theorem 4.6.

L1 penalty, regularization strength.

Figure 5: Results of applying an L1 penalty with different strengths when constructing h*_S. The left column, consisting of panels (a), (c), and (e), compares Max := max{Err_{D_S}(h*_T), Err_{D(h*_T)}(h*_T)} and LB := lower bound specified in Theorem 4.6. The right column, consisting of panels (b), (d), and (f), compares Diff := Err_{D(h*_S)}(h*_S) - Err_{D(h*_T)}(h*_T) and UB := upper bound specified in Theorem 4.2. For each time step K = k, we compute and deploy the source-optimal classifier h*_S and update the credit score for each individual according to the received decision, which becomes the new reality for time step K = k + 1.

L2 penalty, weak regularization strength.

Figure 6: Results of applying an L2 penalty with different strengths when constructing h*_S. The left column, consisting of panels (a), (c), and (e), compares Max := max{Err_{D_S}(h*_T), Err_{D(h*_T)}(h*_T)} and LB := lower bound specified in Theorem 4.6. The right column, consisting of panels (b), (d), and (f), compares Diff := Err_{D(h*_S)}(h*_S) - Err_{D(h*_T)}(h*_T) and UB := upper bound specified in Theorem 4.2. For each time step K = k, we compute and deploy the source-optimal classifier h*_S and update the credit score for each individual according to the received decision, which becomes the new reality for time step K = k + 1.

$\mathbb{E}_{\mathcal{D}(h)}[\ell(h; X, Y)] = p(h) \cdot \mathbb{E}_{\mathcal{D}_S}[\ell(h; X, Y) \mid Y = +1] + (1 - p(h)) \cdot \mathbb{E}_{\mathcal{D}_S}[\ell(h; X, Y) \mid Y = -1]$

where $p(h) = P_{\mathcal{D}(h)}(Y = +1)$. Since $\mathbb{E}_{\mathcal{D}_S}[\ell(h; X, Y) \mid Y = +1]$ and $\mathbb{E}_{\mathcal{D}_S}[\ell(h; X, Y) \mid Y = -1]$ are already functions of both $h$ and $\mathcal{D}_S$, it suffices to show that $p(h) = P_{\mathcal{D}(h)}(Y = +1)$ can also be expressed as a function of $\theta$ and $\mathcal{D}_S$.

To see this, recall that for a threshold classifier $\hat{Y} = \mathbb{1}[X > \theta]$, the prediction accuracy can be written as a function of the threshold $\theta$ and the distribution $\mathcal{D}(h)$:

$P_{\mathcal{D}(h)}(\hat{Y} = Y) = P_{\mathcal{D}(h)}(\hat{Y} = +1, Y = +1) + P_{\mathcal{D}(h)}(\hat{Y} = -1, Y = -1) = P_{\mathcal{D}(h)}(X \geq \theta, Y = +1) + P_{\mathcal{D}(h)}(X \leq \theta, Y = -1)$

$= \int \mathbb{1}[x \geq \theta] \, P_{\mathcal{D}(h)}(Y = +1) \, \underbrace{P(X = x \mid Y = +1)}_{\text{unchanged under label shift}} dx + \int \mathbb{1}[x \leq \theta] \, P_{\mathcal{D}(h)}(Y = -1) \, \underbrace{P(X = x \mid Y = -1)}_{\text{unchanged under label shift}} dx \quad (23)$

where $P(X \mid Y = y)$ remains unchanged over time, and $P_{\mathcal{D}(h)}(Y = y)$ evolves over time according to Equation (22), namely

$P_{\mathcal{D}(h)}(Y = y) = \frac{P_{\mathcal{D}_S}(Y = y) \times \text{Fitness}_g(Y = y)}{\mathbb{E}_{\mathcal{D}_S}[\text{Fitness}_g(Y)]} = \frac{P_{\mathcal{D}_S}(Y = y) \times \sum_{\hat{y}} P_{\mathcal{D}_S}[\hat{Y} = \hat{y} \mid Y = y, G = g] \cdot U_{\hat{y}, y}}{\sum_y \left( \sum_{\hat{y}} P_{\mathcal{D}_S}[\hat{Y} = \hat{y} \mid Y = y, G = g] \cdot U_{\hat{y}, y} \right) P_{\mathcal{D}_S}[Y = y]} \quad (24)$

Noticing that $\hat{Y}$ is a function of $\theta$ only and that the $U_{\hat{y}, y}$ are fixed quantities, the derivation above shows that $P_{\mathcal{D}(h)}(Y = y)$ can be expressed as a function of $\theta$ and $\mathcal{D}_S$. Plugging this back into Equation (23), we see that the accuracy, and hence the induced risk, can also be expressed as a function of the classifier's parameter $\theta$. Thus we can run gradient descent, using automatic differentiation w.r.t. $\theta$, to find an optimal classifier $h^*_T$ that minimizes the induced risk.
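The pipeline above — express the induced prior via the replicator update, plug it into the induced error, and descend on θ — might be sketched as follows. The Gaussian class-conditionals, the utility matrix U, and the finite-difference gradient (standing in for automatic differentiation) are our illustrative assumptions, not the paper's experimental setup:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

P0 = 0.5                                  # initial qualification rate P_{D_S}(Y=+1)
U = {(+1, +1): 2.0, (-1, +1): 0.5,        # U[(y_hat, y)]: utility of (prediction, label)
     (+1, -1): 1.0, (-1, -1): 1.0}

def induced_risk(theta):
    # Acceptance rate per class under D_S for the threshold classifier 1[x >= theta],
    # assuming X|Y=+1 ~ N(+1, 1) and X|Y=-1 ~ N(-1, 1).
    acc_pos = 1.0 - phi(theta - 1.0)
    acc_neg = 1.0 - phi(theta + 1.0)
    # Replicator update: induced prior is proportional to prior times fitness (Eqn. (24)).
    fit_pos = acc_pos * U[(+1, +1)] + (1 - acc_pos) * U[(-1, +1)]
    fit_neg = acc_neg * U[(+1, -1)] + (1 - acc_neg) * U[(-1, -1)]
    p = P0 * fit_pos / (P0 * fit_pos + (1 - P0) * fit_neg)
    # Error on the induced distribution (complement of the Eqn. (23) accuracy).
    accuracy = p * acc_pos + (1 - p) * (1.0 - acc_neg)
    return 1.0 - accuracy

def minimize_induced_risk(theta=0.0, lr=0.5, steps=200, eps=1e-5):
    """Gradient descent on the induced risk, differentiating through the response."""
    for _ in range(steps):
        grad = (induced_risk(theta + eps) - induced_risk(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

theta_T = minimize_induced_risk()
```

The key point mirrors the proof: because the replicator prior p is itself a differentiable function of θ, the whole induced risk is differentiable in θ and ordinary gradient descent applies.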

Figure 7: Experimental results of directly optimizing the induced risk under the replicator dynamics assumption. The X-axis denotes Err_{D(h*_S)}(h*_S), where h*_S is the source-optimal classifier under each setting. The Y-axis is the percentage of performance improvement from using the classifier h*_T = arg min_h Err_{D(h)}(h), for which the decision maker accounts for the underlying response dynamics of the decision subjects (according to the replicator dynamics in Equation (22)). Different colors represent different utility functions, reflected by the specification of the values U_{y,ŷ}; within each color, different dots represent different initial qualification rates.

The number of predicted +1's among examples with Y = +1 and among examples with Y = -1 are both monotonic with respect to α. Consider the simpler setting where $\ell$ is the 0-1 loss. Then

$\mathbb{E}_{\mathcal{D}_{\text{uniform}}}\left\|\frac{h(X)+1}{2}\right\| - 0.5 = 0.5 \cdot \left( P_{X|Y=+1}[h(X) = +1] + P_{X|Y=-1}[h(X) = +1] \right) - 0.5 = 0.5 \cdot \left( \mathbb{E}_{X|Y=-1}[\ell(h(X), -1)] - \mathbb{E}_{X|Y=+1}[\ell(h(X), +1)] \right)$
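A quick numerical check of this identity, with an assumed threshold classifier and Gaussian class-conditionals of our own choosing (the identity itself holds for any classifier):

```python
import random

random.seed(0)
N = 200_000
xp = [random.gauss(+0.5, 1.0) for _ in range(N)]   # samples of X | Y=+1
xn = [random.gauss(-0.5, 1.0) for _ in range(N)]   # samples of X | Y=-1

h = lambda x: 1 if x >= 0.0 else -1                # illustrative threshold classifier

p_pos = sum(h(x) == 1 for x in xp) / N             # P_{X|Y=+1}[h(X) = +1]
p_neg = sum(h(x) == 1 for x in xn) / N             # P_{X|Y=-1}[h(X) = +1]

lhs = 0.5 * (p_pos + p_neg) - 0.5                  # E_{D_uniform}[(h(X)+1)/2] - 0.5
err_pos = 1.0 - p_pos                              # 0-1 loss on the positive class
err_neg = p_neg                                    # 0-1 loss on the negative class
rhs = 0.5 * (err_neg - err_pos)
# lhs == rhs holds exactly: both equal 0.5 * (p_pos + p_neg - 1)
```

Note the agreement is algebraic, not statistical: both sides reduce to the same expression in the empirical acceptance rates, so no Monte Carlo tolerance is needed.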

Algorithm 1: One-point bandit gradient descent for performative prediction
Result: return θ_T after T rounds
θ_1 ← 0
foreach time step t ← 1, . . . , T do
    Sample a unit vector u_t ∼ Unif(S)
    θ_t^+ ← θ_t + δ u_t
    Observe data points z_1, . . . , z_{n_t} ∼ D(θ_t^+)
    ĝ_t(θ_t) ← (d/δ) · (empirical loss on z_1, . . . , z_{n_t}) · u_t        // ĝ_t(θ_t) is an approximation of ∇_θ IR(θ_t)
    θ_{t+1} ← Π_{(1-δ)Θ}(θ_t − η ĝ_t(θ_t))        // take gradient step; project onto (1-δ)Θ := {(1-δ)θ | θ ∈ Θ}
end
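A minimal 1-d instance of Algorithm 1 is sketched below. The induced distribution D(θ) and the squared loss are our assumptions, chosen so that the induced-risk minimizer is known in closed form; the paper's algorithm is stated for general Θ ⊂ R^d:

```python
import random

# Assumed dynamics: deploying theta shifts the data mean to A*theta + B, so
# IR(theta) = E_{z~D(theta)}[(theta - z)^2] = ((1-A)*theta - B)^2 + noise_var,
# whose minimizer is theta* = B / (1 - A) = 2.0 for the constants below.
A, B = 0.5, 1.0
THETA_RADIUS = 5.0                        # Theta = [-5, 5]

def sample_induced(theta, n=50):
    """Draw n data points from the distribution induced by deploying theta."""
    return [A * theta + B + random.gauss(0.0, 0.1) for _ in range(n)]

def empirical_loss(theta, zs):
    return sum((theta - z) ** 2 for z in zs) / len(zs)

def one_point_bandit_gd(T=3000, delta=0.5, eta=0.05):
    d = 1                                 # dimension of theta
    theta = 0.0
    for _ in range(T):
        u = random.choice([-1.0, 1.0])    # uniform on the 1-d unit sphere
        zs = sample_induced(theta + delta * u)
        # One-point gradient estimate of the induced risk IR(theta).
        g = (d / delta) * empirical_loss(theta + delta * u, zs) * u
        theta -= eta * g
        # Project onto the shrunk set (1 - delta) * Theta.
        bound = (1.0 - delta) * THETA_RADIUS
        theta = max(-bound, min(bound, theta))
    return theta

random.seed(0)
theta_hat = one_point_bandit_gd()
```

The single-query structure is the point of the algorithm: each round deploys one perturbed model θ_t + δu_t, observes only the realized loss on the distribution it induces, and still obtains a (high-variance) unbiased estimate of the smoothed induced-risk gradient.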

A APPENDIX

We arrange the appendix as follows:
• Appendix A.1 provides some real-life scenarios where transparent models are useful or required.
• Appendix A.2 provides comparisons between our setting and other sub-areas of domain adaptation.
• Appendix A.3 provides the proof of Theorem 3.1.
• Appendix A.4 provides the proof of Theorem 3.2.
• Appendix A.5 provides the proof of Theorem 3.3.
• Appendix A.6 provides the proof of Proposition 4.1.
• Appendix A.7 provides the proof of Theorem 4.2.
• Appendix A.8 provides the proof of Theorem 4.6.
• Appendix A.9 provides omitted assumptions and proofs for Section 4.3.
• Appendix A.10 provides the proof of Theorem 5.1.
• Appendix A.11 provides the proof of Theorem B.1.
• Appendix A.12 provides the proof of Proposition B.2.
• Appendix B provides an additional lower bound and examples for the target shift setting.
• Appendix C provides missing experimental results, including new experimental results using synthetic datasets generated according to the causal graphs defined in Figure 2. We also add additional experimental results on the credit score dataset.
• Appendix D discusses challenges in minimizing the induced risk.
• Appendix E provides discussions on how to directly minimize the induced risk.
• Appendix F provides discussions on adding regularization to the objective function.
• Appendix G provides discussions on the tightness of our theoretical bounds.

A.1 EXAMPLE USAGES OF TRANSPARENT MODELS

As mentioned in Section 1, there is an increasing requirement for decision rules to be transparent, due to their potentially consequential impacts on individual decision subjects. We provide the following reasons for using transparent models:
• Government regulation may require the model to be transparent, especially in public services;
• In some cases, companies may want to disclose their models so that users will have explanations and are incentivized to make better use of the provided services;
• Regardless of whether models are published voluntarily, model parameters can often be inferred via well-known query "attacks".

In addition, we name some concrete examples of real-life applications:
• Consider the Medicaid health insurance program in the United States, which serves low-income people. There is an obligation to provide transparency and disclose the rules (the model automating the decisions) that decide whether individuals qualify for the program; in fact, most public services have "terms" that are usually set in stone and explained in the documentation. Agents can observe the rules and will adapt their profiles to qualify if needed. For instance, an agent can decide to provide the additional documentation they need to guarantee approval. For more applications along these lines, please refer to this report.
• Credit score companies directly publish their criteria for assessing credit risk scores. In loan application settings, companies actually have an incentive to release their criteria, in order to incentivize agents to meet their qualifications and use their services. Furthermore, making decision models transparent helps gain the trust of users.

where N_{[0,1]} represents a truncated Gaussian distribution taking values between 0 and 1, and α, µ_y, σ², σ²_2, and σ²_3 are parameters of our choice.
Adaptation function. We assume the new distribution of the qualification Y will be updated in the following way: where c_{hy} ∈ [0, 1] represents the likelihood that a person with original qualification Y = y who gets predicted as h(X) = h will be qualified in the next step (Y = +1).

Discussion of the results. For all four datasets, we do observe positive gaps Err

C.2.1 PARAMETERS FOR DYNAMICS

Since we are considering the dynamic setting, we further specify the data-generating process in the following way (from time step T = t to T = t + 1): where $\{\cdot\}_{(0,1]}$ represents truncation to the interval (0, 1], $f_t(\cdot)$ represents the decision policy over input features, and $\epsilon_1$, $\epsilon_2$, $\sigma$ are parameters of choice. In our experiments, we set

Within the same time step, i.e., for variables that share the subscript t, Q_t and A_t are the root causes of all other variables (X_{t,1}, X_{t,2}, X_{t,3}, D_t, Y_t). At each time step T = t, the institution first estimates the credit score Q_t (which is not directly visible to the institution but is reflected in the visible outcome label Y_t) based on (A_t, X_{t,1}, X_{t,2}, X_{t,3}), then produces the binary decision D_t according to the optimal threshold (in terms of accuracy).

Across time steps, e.g., from T = t to T = t + 1, the new distribution at T = t + 1 is induced by the deployment of the decision policy D_t. This impact is modeled by a multiplicative update from Q_t to Q_{t+1}, with parameters (or functions) α_D(·) and α_Y(·) that depend on D_t and Y_t, respectively. In our experiments, we set α_D = 0.01 and α_Y = 0.005 to capture the scenario where the one-step influence of the decision on the credit score is stronger than that of the ground-truth label.

The above regularized risk minimization problem is equivalent to

Recall the upper bound in Theorem 5.1. With a properly specified α > 0, this leads to a distribution with a smaller gap $|p(\hat{h}_S) - p(h^*_T)|$, where $\hat{h}_S$ denotes the optimal classifier of the penalized risk minimization; this reduces Term 1 in the bound of Theorem 5.1. Furthermore, the induced risk minimization problem corresponds to an α such that $\alpha^* = \frac{p(h^*_T) - p}{0.5}$, while the original $h^*_S$ corresponds to α = 0. Using the monotonicity assumption, we can establish that the second term in Theorem 5.1 will also be smaller when we tune α properly.
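The multiplicative credit-score update described in C.2.1 might be simulated as in the sketch below. The exact functional forms of α_D(·), α_Y(·) and the decision/label dynamics are our illustrative assumptions; only the constants α_D = 0.01, α_Y = 0.005 and the truncation to (0, 1] come from the text:

```python
import random

ALPHA_D, ALPHA_Y = 0.01, 0.005            # decision influence stronger than label influence

def update_score(q, d, y):
    """One illustrative multiplicative step Q_t -> Q_{t+1}, truncated to (0, 1].
    d, y in {+1, -1} are the received decision and the ground-truth label."""
    q_next = q * (1.0 + ALPHA_D * d + ALPHA_Y * y)
    return min(max(q_next, 1e-6), 1.0)    # keep the score inside (0, 1]

random.seed(1)
q = 0.5                                    # initial credit score Q_0
for t in range(100):
    d = 1 if q >= 0.5 else -1              # thresholded decision on the current score
    y = 1 if random.random() < q else -1   # label drawn from the current score
    q = update_score(q, d, y)
```

This makes the feedback loop concrete: a favorable decision multiplicatively raises the next-step score, which in turn makes another favorable decision more likely, which is exactly the induced shift the experiments track over time steps K = k.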

G DISCUSSION ON THE TIGHTNESS OF OUR THEORETICAL BOUNDS

General bounds in Section 3. For the general bounds reported in Section 3, it is not trivial to fully quantify the tightness without further specifying the relevant quantities, e.g., the H-divergence between the source and the induced distribution, and the average error a classifier has to incur on both distributions. This part of our results is adapted from the classical literature on learning from multiple domains Ben-David et al. (2010); the tightness of using the H-divergence and related terms is partially validated therein.

Bounds in Sections 4 and 5. For the more specific bounds provided in Section 4 (covariate shift) and Section 5 (target shift), however, it is relatively easier to argue about tightness: the proofs are more transparent, making it easier to back out the conditions under which the inequalities are loose. For example, in Theorem 5.1, the inequalities in our bound are introduced primarily in two places: 1) using the optimality of h*_S on the source distribution, and 2) bounding the statistical difference between h*_S's and h*_T's predictions on the positive and negative examples. Both suggest that if the differences between the two classifiers' predictions are confined to a small range, then the result in Theorem 5.1 is relatively tight.

