UNSUPERVISED ADAPTATION FOR FAIRNESS UNDER COVARIATE SHIFT

Abstract

Training fair models typically involves optimizing a composite objective that accounts for both prediction accuracy and a fairness measure. However, a shift in the distribution of the covariates at test time can invalidate the learnt fairness tradeoffs, which we verify experimentally. To address this, we consider the unsupervised adaptation problem of training fair classifiers when only a small set of unlabeled test samples is available along with a large labeled training set. We propose a novel modification to the traditional composite objective by adding a weighted entropy objective on the unlabeled test dataset. This involves a min-max optimization in which the weights are optimized to mimic importance weighting ratios, followed by classifier optimization. We demonstrate that, under mild conditions, our weighted entropy objective provides an upper bound on the standard importance-sampled training objective common in covariate shift formulations. Experimentally, we demonstrate that a Wasserstein-distance-based penalty for representation matching across protected subgroups, together with the above loss, outperforms existing baselines. Our method achieves the best accuracy-equalized odds tradeoff under the covariate shift setup: for the same accuracy, we obtain up to a 2× improvement in equalized odds on notable benchmarks.

1. INTRODUCTION

Moving beyond optimizing prediction accuracy alone, there is considerable interest in understanding and analyzing machine learning model performance along other dimensions such as robustness (Silva & Najafirad, 2020), model generalization (Wiles et al., 2021) and fairness (Oneto & Chiappa, 2020). In this work, we focus on algorithmic fairness. When the predictions of a machine learning classifier are used to make important decisions with societal impact, as in criminal justice or loan approvals, how those decisions affect different protected groups needs to be taken into account. Training datasets can be biased in the sense that some groups are underrepresented, skewing classifier decisions towards the over-represented group; alternatively, the bias can take the form of undesirable causal pathways between the sensitive attribute and the label in the real-world data-generating mechanism (Oneto & Chiappa, 2020). It has often been observed (Bolukbasi et al., 2016; Buolamwini & Gebru, 2018) that algorithms optimizing predictive accuracy on data containing pre-existing biases learn and then propagate those same biases. Among the various approaches to fair machine learning, a class of methods called in-processing methods has been shown to perform well (Wan et al., 2021). These methods regularize training through a composite loss objective that accounts for a specific fairness measure alongside predictive accuracy. Popular fairness measures are based on notions of demographic parity, equal opportunity, predictive rate parity and equalized odds. After regularized training, the model attains a specific fairness-accuracy tradeoff. When the test distribution is close or identical to the training distribution, these tradeoffs typically hold. In practical scenarios, however, non-trivial distributional shifts can mean that tradeoffs achieved during training no longer hold at test time.
For example, Ding et al. (2021) highlight how a classifier's fairness-accuracy tradeoff, obtained by training on input samples from one state of the Adult Income dataset, does not carry over when predicting income in other states. Similarly, Rezaei et al. (2021) and Mandal et al. (2020) demonstrate that the tradeoffs achieved by state-of-the-art fairness techniques do not generalize to test data under shifts. In Figure 1, we complement these claims by analyzing the under-performance of a state-of-the-art fairness method, Adversarial Debiasing (Zhang et al., 2018). We observe a similar drop in performance under covariate shift for the other baselines we consider, as highlighted in our experimental analysis.

1. We show that under asymmetric covariate shift, where one group exhibits a large covariate shift while the other does not, accuracy parity degrades despite perfect representation matching across protected groups, highlighting the need to tackle covariate shift explicitly. (Section 4)

2. We introduce a composite prediction objective that combines a novel weighted entropy objective on the set of unlabeled test samples with a standard ERM objective on the labeled training samples to tackle covariate shift. We optimize the weights via min-max optimization: the outer minimization trains the classifier with the composite objective, while the inner maximization finds per-sample weights that are related to importance sampling ratios, determined implicitly with no density estimation steps. We prove that, under mild conditions, our composite objective provides an upper bound on the standard importance-sampled training objective common in covariate shift formulations. We then combine this composite objective with a representation matching loss to train fair classifiers. (Section 5)

3. We experiment on four benchmark datasets: Adult, Arrhythmia, Communities and Drug. We demonstrate that incorporating our proposed weighted entropy objective together with the Wasserstein-based penalty for representation matching across protected subgroups outperforms existing fairness methods under covariate shift. In particular, we achieve the best accuracy-equalized odds tradeoff: for the same accuracy, we achieve up to ≈ 2× improvement in the equalized odds metric. (Section 6)

Techniques for imposing fairness: Pre-processing techniques that aim to transform the dataset (Calmon et al., 2017; Swersky et al., 2013; Feldman et al., 2015; Kamiran & Calders, 2012) before standard training have been studied. In-processing methods directly modify the learning
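The weighted entropy objective with its min-max weight optimization (contribution 2 above) can be pictured with a minimal NumPy sketch. Everything here, from the function names to the mirror-ascent-style inner step and the hyperparameters `eta` and `lam`, is our own illustrative assumption rather than the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(p, eps=1e-12):
    # Per-sample Shannon entropy of predicted class probabilities.
    return -(p * np.log(p + eps)).sum(axis=1)

def composite_loss(theta, X_tr, y_tr, X_te, w, lam=1.0):
    """Outer objective: ERM cross-entropy on the labeled training set
    plus a weighted entropy term on the unlabeled test set."""
    p_tr = softmax(X_tr @ theta)
    ce = -np.log(p_tr[np.arange(len(y_tr)), y_tr] + 1e-12).mean()
    p_te = softmax(X_te @ theta)
    return ce + lam * (w * entropy(p_te)).sum()

def inner_max_weights(theta, X_te, eta=5.0):
    """Inner maximization (sketched as one mirror-ascent step on the
    simplex): a softmax over per-sample entropies, up-weighting test
    points on which the classifier is most uncertain."""
    h = entropy(softmax(X_te @ theta))
    w = np.exp(eta * h)
    return w / w.sum()
```

A training loop would alternate the inner weight update with a gradient step on `theta`; the softmax form of the weights concentrates mass on uncertain (likely shifted) test points, mimicking importance weighting without any explicit density estimation.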

Different types of fairness criteria have been studied. Group fairness metrics were studied in Hardt et al. (2016b); Kleinberg et al. (2016), while individual fairness metrics were studied in Dwork et al. (2012); Sharifi-Malvajerdi et al. (2019). Causal fairness criteria, which leverage the causal mechanisms that generate the data, have been studied in Kilbertus et al. (2017); Kusner et al. (2017); Galhotra et al. (2022); Chiappa (2019); Nabi et al. (2019); Salimi et al. (2019). Our work addresses statistical group fairness metrics, where we study the effect of covariate shift on fairness-accuracy tradeoffs.
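As a concrete reference for the group fairness metric we report in our experiments, the following sketch computes an equalized-odds gap for a binary task with a binary protected attribute. The function name and the max-of-gaps aggregation are our own assumptions; other aggregations (e.g. sums or averages of the two gaps) also appear in the literature:

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, a):
    """Largest disparity in group-conditional true-positive or
    false-positive rates between protected groups a ∈ {0, 1}.
    A value of 0 means equalized odds holds exactly."""
    def rate(g, y):
        # P(y_pred = 1 | a = g, y_true = y)
        mask = (a == g) & (y_true == y)
        return y_pred[mask].mean()
    tpr_gap = abs(rate(0, 1) - rate(1, 1))  # true-positive rate gap
    fpr_gap = abs(rate(0, 0) - rate(1, 0))  # false-positive rate gap
    return max(tpr_gap, fpr_gap)
```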

While this question has not received much attention, some recent works such as Rezaei et al. (2021) have begun to address it. These prior works rely on explicit density estimation, which is then used to adapt to the test data. In our work, we avoid density estimation steps, as they do not scale well in high dimensions. We propose a novel, theoretically justified unsupervised adaptation training objective that depends on labeled training samples and unlabeled test samples, combined with a standard fairness objective involving representation matching across the groups on the test set. We report results on equalized odds in our experiments and use the related notion of accuracy parity, supported by empirical evidence, to motivate our algorithmic design.
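The representation matching term mentioned above can be illustrated with a minimal one-dimensional sketch. The quantile-based Wasserstein-1 estimator and the per-dimension summation are our simplifying assumptions; the actual penalty used in the method may differ:

```python
import numpy as np

def wasserstein1d(u, v, n_quantiles=100):
    """Empirical 1-D Wasserstein-1 distance between two samples,
    estimated by matching quantiles of the two empirical CDFs."""
    qs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.abs(np.quantile(u, qs) - np.quantile(v, qs)).mean()

def repr_matching_penalty(Z, a):
    """Sum of per-dimension W1 distances between the learned
    representations Z of the two protected groups a ∈ {0, 1}.
    Driving this toward 0 matches the group-conditional
    representation distributions dimension by dimension."""
    Z0, Z1 = Z[a == 0], Z[a == 1]
    return sum(wasserstein1d(Z0[:, j], Z1[:, j]) for j in range(Z.shape[1]))
```

In training, this penalty would be added to the composite prediction objective with a tradeoff coefficient, so the encoder is pushed toward representations whose distributions agree across protected groups.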

