FAIRNESS AND ACCURACY UNDER DOMAIN GENERALIZATION

Abstract

As machine learning (ML) algorithms are increasingly used in high-stakes applications, concerns have arisen that they may be biased against certain social groups. Although many approaches have been proposed to make ML models fair, they typically rely on the assumption that the data distributions in training and deployment are identical. Unfortunately, this assumption is commonly violated in practice, and a model that is fair during training may lead to unexpected outcomes during deployment. Although the problem of designing robust ML models under dataset shifts has been widely studied, most existing works focus only on the transfer of accuracy. In this paper, we study the transfer of both fairness and accuracy under domain generalization, where the data at test time may be sampled from never-before-seen domains. We first develop theoretical bounds on the unfairness and expected loss at deployment, and then derive sufficient conditions under which fairness and accuracy can be perfectly transferred via invariant representation learning. Guided by this, we design a learning algorithm such that fair ML models learned with training data still have high fairness and accuracy when deployment environments change. Experiments on real-world data validate the proposed algorithm. Model implementation is available at https://github.com/pth1993.

1. INTRODUCTION

Machine learning (ML) algorithms trained with real-world data may have inherent bias and exhibit discrimination against certain social groups. To address unfairness in ML, existing studies have proposed many fairness notions and developed approaches to learning models that satisfy them. However, these works are based on an implicit assumption that the data distributions in training and deployment are the same, so that fair models learned from training data can be deployed to make fair decisions on testing data. Unfortunately, this assumption is commonly violated in real-world applications such as healthcare; e.g., it was shown that most US patient data for training ML models come from CA, MA, and NY, with almost no representation from the other 47 states (Kaushal et al., 2020). Because of the distribution shifts between training and deployment, a model that is accurate and fair during training may behave in an unexpected way and perform poorly during deployment. Therefore, it is critical to account for distribution shifts and learn fair models that are robust to potential changes in deployment environments. The problem of learning models under distribution shifts has been extensively studied in the literature and is typically referred to as domain adaptation/generalization, where the goal is to learn models on source domain(s) that can be generalized to a different (but related) target domain. Specifically, domain adaptation requires access to (unlabeled) data from the target domain at training time, and the learned model can only be used at that specific target domain. In contrast, domain generalization considers a more general setting where the target domain data are inaccessible during training; instead, it assumes there exists a set of source domains based on which the learned model can be generalized to an unseen, novel target domain.
For both problems, most studies focus only on the generalization of accuracy across domains without considering fairness, e.g., by theoretically examining the relations between accuracy at target and source domains (Mansour et al., 2008; 2009; Hoffman et al., 2018; Zhao et al., 2018; Phung et al., 2021; Deshmukh et al., 2019; Muandet et al., 2013; Blanchard et al., 2021; Albuquerque et al., 2019; Ye et al., 2021; Sicilia et al., 2021; Shui et al., 2022) and/or developing practical methods (Albuquerque et al., 2019; Zhao et al., 2020; Li et al., 2018a; Sun & Saenko, 2016; Ganin et al., 2016; Ilse et al., 2020; Nguyen et al., 2021). To the best of our knowledge, only Chen et al. (2022) has considered the transfer of fairness across domains.

In this paper, we study the transfer of both fairness and accuracy in domain generalization via invariant representation learning, where the data in the target domain are unknown and inaccessible during training. A motivating example is shown in Figure 1. Specifically, we first establish a new theoretical framework that develops interpretable bounds on accuracy/fairness at a target domain under domain generalization, and then identify sufficient conditions under which fairness/accuracy can be perfectly transferred to an unseen target domain. Importantly, our theoretical bounds are fundamentally different from the existing bounds: ours are better connected with practical algorithmic design, i.e., they are aligned with the objective of adversarial learning-based algorithms, a method widely used in domain generalization. Inspired by the theoretical findings, we propose Fairness and Accuracy Transfer by Density Matching (FATDM), a two-stage learning framework such that the representations and the fair model learned with source-domain data generalize well to an unseen target domain.
Last, we conduct experiments on real-world data; the empirical results show that fair ML models trained with our method still attain high accuracy and fairness when deployment environments differ from training. Our main contributions and findings are summarized as follows:
• We consider the transfer of both accuracy and fairness in domain generalization. To the best of our knowledge, this is the first work studying domain generalization with fairness considerations.
• We develop upper bounds for the expected loss (Thm. 1) and unfairness (Thm. 3) in target domains. Notably, our bounds are significantly different from existing bounds, as discussed in Appendix A. We also develop a lower bound for the expected loss (Thm. 2); it indicates an inherent tradeoff of existing methods that learn marginally invariant representations for domain generalization.
• We identify sufficient conditions under which fairness and accuracy can be perfectly transferred from source domains to target domains using invariant representation learning (Thm. 4).
• We propose a two-stage training framework (based on Thm. 5) for learning models in source domains (Sec. 4) that can generalize both accuracy and fairness to the target domain.
• We conduct experiments on real-world data to validate the effectiveness of the proposed method.
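As context for the adversarial learning-based algorithms our bounds are aligned with, the following is a minimal sketch (not the paper's FATDM implementation) of the standard min-max objective of adversarial invariant-representation learning: a feature map g is trained to keep task accuracy while making a domain discriminator unable to tell which source domain a representation came from. All names here (`losses`, `g_w`, `lam`, etc.) are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def losses(X, y, dom, g_w, c_w, d_w, lam=1.0):
    """Losses for a linear feature map g, task classifier c, and domain discriminator d."""
    Z = X @ g_w                               # representations z = g(x)
    p_y = sigmoid(Z @ c_w)                    # task prediction P(Y = 1 | z)
    p_d = sigmoid(Z @ d_w)                    # discriminator P(domain = 1 | z)
    eps = 1e-9
    task = -np.mean(y * np.log(p_y + eps) + (1 - y) * np.log(1 - p_y + eps))
    domain = -np.mean(dom * np.log(p_d + eps) + (1 - dom) * np.log(1 - p_d + eps))
    # g (and c) minimize the task loss while *maximizing* the discriminator's loss,
    # which pushes the representation distributions of the source domains together.
    return task - lam * domain, task, domain

# Two toy source domains with shifted feature distributions but a shared labeling rule.
X0 = rng.normal(0.0, 1.0, size=(50, 3))
X1 = rng.normal(1.0, 1.0, size=(50, 3))
X = np.vstack([X0, X1])
y = (X[:, 0] > 0.5).astype(float)
dom = np.array([0.0] * 50 + [1.0] * 50)       # source-domain index

g_w = rng.normal(size=(3, 2))
c_w = rng.normal(size=2)
d_w = rng.normal(size=2)
total, task, domain = losses(X, y, dom, g_w, c_w, d_w)
print(f"task={task:.3f} domain={domain:.3f} total={total:.3f}")
```

In practice this objective is optimized by alternating or gradient-reversal updates; the sketch only shows the loss computation for one step.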

2. PROBLEM FORMULATION

Notations. Let X , A, and Y denote the space of features, sensitive attribute (distinguishing different groups, e.g., race/gender), and label, respectively. Let Z be the representation space induced from X by a representation mapping g : X → Z. We use X, A, Y, Z to denote random variables that take values in X , A, Y, Z and x, a, y, z their realizations. A domain D is specified by a distribution P_D : X × A × Y → [0, 1] and a labeling function f_D : X → Y^∆, where ∆ is the probability simplex over Y. Similarly, let h_D : Z → Y^∆ be the labeling function from the representation space for domain D. Note that f_D, h_D, g are stochastic functions and f_D = h_D ∘ g.¹ For simplicity, we use P^V_D (or P^{V|U}_D) to denote the induced marginal (or conditional) distribution of variable V (given U) in domain D.

¹A deterministic labeling function is a special case in which the output follows a Dirac delta distribution in ∆.

Figure 1: An example of domain generalization in healthcare: a (fair) ML model trained with patient data in CA, NY, etc., can be deployed in other states while maintaining high accuracy/fairness.
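With this notation, the unfairness of a model in a domain D can be estimated from samples of P_D. As an illustration only (the choice of fairness notion is ours here, not a commitment of the paper), the sketch below computes the demographic parity gap |P_D(Ŷ = 1 | A = 0) − P_D(Ŷ = 1 | A = 1)| for a binary sensitive attribute; the helper name `dp_gap` is hypothetical.

```python
import numpy as np

def dp_gap(y_hat, a):
    """Demographic parity gap of binary predictions y_hat w.r.t. binary attribute a."""
    y_hat, a = np.asarray(y_hat, dtype=float), np.asarray(a)
    # Empirical estimate of |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)| on samples from P_D.
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())

# Toy sample from one domain: positive-prediction rates differ across groups,
# so the same model can be unfair under one domain's distribution.
rng = np.random.default_rng(1)
a = rng.integers(0, 2, size=1000)
y_hat = (rng.random(1000) < np.where(a == 0, 0.6, 0.5)).astype(int)
print("DP gap:", round(dp_gap(y_hat, a), 3))
```

Because the gap depends on the domain's distribution, a model with a small gap on source domains may exhibit a larger gap on a shifted target domain, which is exactly what the transfer bounds in the following sections control.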
