MODEL TRANSFERABILITY WITH RESPONSIVE DECISION SUBJECTS

Abstract

This paper studies model transferability when human decision subjects respond to a deployed machine learning model. In our setting, an agent or a user corresponds to a sample (X, Y) drawn from a distribution D and will face a model h and its classification result h(X). Agents can modify X to adapt to h, which incurs a distribution shift on (X, Y). Therefore, when training h, the learner needs to account for the subsequently "induced" distribution that arises once the output model is deployed. Our formulation is motivated by applications where deployed machine learning models interact with human agents and ultimately face responsive, interactive data distributions. We formalize the discussion of model transferability by studying how a model trained on the available source distribution (data) translates to performance on the induced domain. We provide upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We further provide instantiated analyses for two popular domain adaptation settings: covariate shift and target shift.

1. INTRODUCTION

Decision makers are increasingly required to be transparent about their decision making to offer the "right to explanation" (Goodman & Flaxman, 2017; Selbst & Powles, 2018; Ustun et al., 2019)¹. Being transparent also invites potential adaptations from the population, leading to potential distribution shifts. We are motivated by settings where deployed machine learning models interact with human agents and will ultimately face data distributions that reflect how human agents respond to the models. For instance, when a model is used to decide loan applications, candidates may adapt their features based on the model specification in order to maximize their chances of approval; thus the loan decision classifier observes a data distribution caused by its own deployment (e.g., see Figure 1 for an illustration). Similar observations can be made in the insurance sector (e.g., developing policies such that customers' behaviors might adapt to obtain lower premiums (Haghtalab et al., 2020)), the education sector (e.g., developing courses when students are less incentivized to cheat (Kleinberg & Raghavan, 2020)), and so on.

This paper investigates model transferability when the underlying distribution shift is induced by the deployed model. What we would like to have is some guarantee on the transferability of a classifier, that is, how training on the available source distribution D_S translates to performance on the induced domain D(h), which depends on the model h being deployed. A key concept in our setting is the induced risk, defined as the error a model incurs on the distribution induced by itself:

Induced Risk: Err_{D(h)}(h) := P_{D(h)}(h(X) ≠ Y).  (1)

Most relevant to the above formulation is the strategic classification literature (Hardt et al., 2016a; Chen et al., 2020b). In this literature, agents are modeled as rational utility maximizers, and game-theoretic solutions were proposed to characterize the induced risk. However, our results are motivated by the following challenges that arise in more general scenarios:

• Modeling assumptions being restrictive. In many practical situations, it is often hard to faithfully characterize agents' utilities.
Furthermore, agents might not be fully rational when they respond. All these uncertainties can lead to a far more complicated distribution change in (X, Y), as compared to the often-made assumption that agents only change X but not Y (Chen et al., 2020b).

• Lack of access to response data. Another literature relevant to our work is performative prediction (Perdomo et al., 2020), which often requires knowing D(h) or observing samples from D(h) through repeated experiments. We posit that machine learning practitioners may only have access to data from the source distribution during training; although they anticipate changes in the population due to human agents' responses, they cannot observe the new distribution until the model is actually deployed.

• Retraining being costly. Even when samples from the induced data distribution are available, retraining the model from scratch may be impractical due to computational constraints.

The above observations motivate us to understand the transferability of a model trained on the source data to the domain induced by its own deployment. We study several fundamental questions:

• Source risk ⇒ Induced risk. For a given model h, how different is Err_{D(h)}(h), the error on the distribution induced by h, from Err_{D_S}(h) := P_{D_S}(h(X) ≠ Y), the error on the source?

• Induced risk ⇒ Minimum induced risk. How much higher is Err_{D(h)}(h), the error on the induced distribution, than min_{h'} Err_{D(h')}(h'), the minimum achievable induced error?

• Induced risk of source-optimal model ⇒ Minimum induced risk. Of particular interest, and as a special case of the above, how does Err_{D(h*_S)}(h*_S), the induced error of the optimal model trained on the source distribution, h*_S := arg min_h Err_{D_S}(h), compare to Err_{D(h*_T)}(h*_T), where h*_T := arg min_h Err_{D(h)}(h)?
• Lower bound for learning trade-offs. What is the minimum error a model must incur on either the source distribution, Err_{D_S}(h), or its induced distribution, Err_{D(h)}(h)?

For the first three questions, we prove upper bounds on the additional error incurred when a model trained on a source distribution is transferred to its induced domain. We also provide lower bounds for the trade-offs a classifier has to suffer on either the source training distribution or the induced target distribution. We then show how to specialize our results to two popular domain adaptation settings: covariate shift (Shimodaira, 2000; Zadrozny, 2004; Sugiyama et al., 2007; 2008; Zhang et al., 2013b) and target shift (Lipton et al., 2018; Guo et al., 2020; Zhang et al., 2013b). All omitted proofs can be found in the Appendix.

1.1 RELATED WORKS

Most relevant to us are three topics: strategic classification (Hardt et al., 2016a; Chen et al., 2020b; Dekel et al., 2010; Dong et al., 2018; Chen et al., 2020a; Miller et al., 2020; Kleinberg & Raghavan, 2020), the recently proposed notion of performative prediction (Perdomo et al., 2020; Mendler-Dünner et al., 2020), and domain adaptation (Jiang, 2008; Ben-David et al., 2010; Sugiyama et al., 2008; Zhang et al., 2019; Kang et al., 2019; Zhang et al., 2020). Most of the existing strategic classification literature focuses on finding the optimal classifier by assuming fully rational agents (and by characterizing their equilibrium response). In contrast, we do not make these assumptions and primarily study transferability when only the source data is available.



See Appendix A.1 for more detailed discussions.
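To make the quantities studied above concrete, the following is a self-contained toy simulation. Every modeling choice here (the Gaussian source distribution, the family of threshold classifiers h_t(x) = 1[x ≥ t], and the "move-to-threshold" response that defines the induced distribution) is a hypothetical assumption made only for this sketch, not the paper's model. It empirically compares the source risk Err_{D_S}(h), the induced risk Err_{D(h)}(h), and the minimum induced risk over the classifier family:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical source distribution D_S: Y ~ Bernoulli(1/2), X | Y ~ N(2Y - 1, 1).
y = rng.integers(0, 2, size=n)
x = rng.normal(2 * y - 1, 1.0)

def induced_features(x, t, effort=1.0):
    """Toy response model: agents within `effort` below threshold t move to t
    (gaming the classifier); labels Y are assumed unchanged."""
    x_new = x.copy()
    gaming = (x >= t - effort) & (x < t)
    x_new[gaming] = t
    return x_new

def source_risk(t):                 # Err_{D_S}(h_t) for h_t(x) = 1[x >= t]
    return np.mean((x >= t) != (y == 1))

def induced_risk(t):                # Err_{D(h_t)}(h_t)
    return np.mean((induced_features(x, t) >= t) != (y == 1))

ts = np.linspace(-3.0, 3.0, 121)
t_src = ts[np.argmin([source_risk(t) for t in ts])]   # optimal threshold on D_S
t_ind = ts[np.argmin([induced_risk(t) for t in ts])]  # optimal on its induced domain

print(f"Err_DS(h*_S)      = {source_risk(t_src):.3f}")
print(f"Err_D(h*_S)(h*_S) = {induced_risk(t_src):.3f}")  # gap from the induced shift
print(f"min_t Err_D(h_t)(h_t) = {induced_risk(t_ind):.3f}")
```

In this toy setup the source-optimal threshold suffers a noticeably higher error on its own induced distribution (previously rejected negatives game their way across the threshold), while a retrained threshold recovers much of that loss, mirroring the transfer gaps the questions above quantify.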



Figure 1: An example of an agent who originally has both savings and debt, observes that the classifier penalizes debt (weight −10) more than it rewards savings (weight +5), and concludes that their most efficient adaptation is to use their savings to pay down their debt.
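The arithmetic behind Figure 1 can be sketched in a few lines (the weights −10 and +5 follow the figure; the linear scoring function and the specific starting amounts are illustrative assumptions):

```python
# Illustrative sketch of the Figure 1 adaptation, assuming a linear score:
#   score = +5 * savings - 10 * debt.

def score(savings: float, debt: float) -> float:
    return 5.0 * savings - 10.0 * debt

savings, debt = 4.0, 3.0                   # the agent holds both savings and debt
before = score(savings, debt)              # 5*4 - 10*3 = -10

pay = min(savings, debt)                   # spend savings to pay down debt
after = score(savings - pay, debt - pay)   # 5*1 - 10*0 = +5

# Each unit of savings spent on debt changes the score by -5 + 10 = +5,
# so paying down debt is the agent's most efficient adaptation.
assert after - before == 5.0 * pay
```

Note that the agent's label (their true creditworthiness) need not change under this adaptation, which is exactly the kind of induced shift in (X, Y) the paper is concerned with.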

Hardt et al. (2016a) pioneered the formalization of strategic behavior in classification as a sequential two-player game between agents and classifiers. Subsequently, Chen et al. (2020b) addressed the question of repeatedly learning linear classifiers against agents who strategically try to game the deployed classifiers.


