THE WORLD IS CHANGING: IMPROVING FAIR TRAINING UNDER CORRELATION SHIFTS

Abstract

Model fairness is an essential element of Trustworthy AI. While many techniques for model fairness have been proposed, most of them assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the group fairness of the trained model is directly affected and can worsen. We make two contributions toward solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which explicitly captures the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key advantage of our approach lies in decoupling the roles of pre-processing and in-processing: correlation adjustment via pre-processing, and unfairness mitigation on the processed data via in-processing. Experiments show that our framework effectively improves existing in-processing fair algorithms w.r.t. accuracy and fairness on both synthetic and real datasets.

1. INTRODUCTION

Model fairness is becoming indispensable in many artificial intelligence (AI) applications to prevent discrimination against specific groups such as gender, race, or age (Feldman et al., 2015; Hardt et al., 2016) or against individuals (Dwork et al., 2012a). In this work, we focus on group fairness, for which there are three prominent classes of approaches: pre-processing, where the training data is debiased; in-processing, where model training is tailored for fairness; and post-processing, where the trained model's output is modified to satisfy fairness; see more related works in Sec. 6. While in-processing approaches are commonly used to mitigate unfairness, most of them make the limiting assumption that the training and deployment data distributions are the same (Zafar et al., 2017a; Zhang et al., 2018; Roh et al., 2021). However, the two distributions usually differ, especially in terms of data biases (Wick et al., 2019; Maity et al., 2021). For example, a recent work shows that the amount of bias likely differs between previously and recently collected data (Ding et al., 2021). Moreover, when the data bias changes, the fairness and accuracy of the trained model become unpredictable at deployment, as the above assumption is broken. In this work, we introduce the notion of correlation shifts between the label y and the group attribute z in the data to systematically address such data bias changes. Although several works have recently investigated fair training under different types of distribution shifts, including covariate and concept shifts (Singh et al., 2021; Mishler & Dalmasso, 2022), they usually do not explicitly consider bias changes between y and z. In comparison, our notion of correlation shifts enables us to theoretically analyze exactly how data bias changes affect fair training; see how correlation shifts compare with other types of distribution shifts in Sec. 6.
(b) A toy example for illustrating the impact of correlation shifts (i.e., bias changes) on the trained classifier.

Figure 1: The central axis in the left figure represents the correlation between the label y and the sensitive group attribute z. The correlation of the training data is usually higher than that of the deployment data. In Sec. 3, we show that the correlation determines the achievable performance of fair training. Thus, we first run our pre-processing and then apply existing fair algorithms on the processed data to address the correlation shift and improve the performance of fair training. Not addressing the correlation shift may result in reduced performance, as shown in the right figure; see details in Sec. 2.

For fair training under correlation shifts, we first 1) analyze the fundamental accuracy and fairness limits of in-processing approaches with the fixed distribution assumption, using the notion of correlation in the data, and then 2) design a novel pre-processing step to boost the performance of in-processing approaches under correlation shifts. We show that existing in-processing fair algorithms are indeed limited by the training distribution and may perform poorly on the deployment distribution. In particular, a high (y, z)-correlation results in a poor accuracy-fairness tradeoff for any fair training. Therefore, since most in-processing fair algorithms assume identical training and deployment distributions, there is no guarantee that their performance on the training data carries over to the deployment data. Based on this theoretical analysis, we propose a pre-processing step that reduces the shifted correlation by sampling the (y, z)-classes. Using a possible range of the shifted correlations, we solve an optimization problem that finds a new data ratio among the (y, z)-classes to adjust the correlation for the shift, which gives in-processing approaches a better opportunity to perform well. The new data is then used as the input of any fair algorithm.
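The correlation-adjusting pre-processing can be sketched in a simplified form. The snippet below is our own minimal approximation, not the paper's optimization: instead of solving for optimal (y, z)-class ratios, it greedily subsamples the over-represented "aligned" (y == z) cells until the empirical correlation drops to a target value (it handles only positive correlations; all function names are hypothetical).

```python
import numpy as np

def correlation(y, z):
    """Pearson (phi) correlation between binary arrays y and z."""
    return float(np.corrcoef(y, z)[0, 1])

def resample_to_target(y, z, target_corr, seed=0):
    """Greedily drop samples from the larger 'aligned' (y == z) cell
    until the empirical correlation is at most target_corr.
    Returns the indices of the kept samples."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    keep = np.ones(len(y), dtype=bool)
    while correlation(y[keep], z[keep]) > target_corr:
        # Count the two aligned (y, z) cells and pick the larger one;
        # dropping from an aligned cell lowers a positive correlation.
        counts = {c: int(((y == c) & (z == c) & keep).sum()) for c in (0, 1)}
        c = max(counts, key=counts.get)
        cand = idx[(y == c) & (z == c) & keep]
        if len(cand) == 0:  # nothing left to drop; give up
            break
        keep[rng.choice(cand)] = False
    return idx[keep]
```

The kept indices would then be fed to any in-processing fair algorithm, mirroring the decoupled pipeline described above.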
A key advantage of our framework is the decoupling of pre-processing and in-processing for unfairness mitigation: the pre-processing adjusts the correlation, while the in-processing performs the rest of the unfairness mitigation, as described in Figure 1a. We note that our pre-processing aims to boost the performance of in-processing approaches based on our theoretical analysis, whereas existing pre-processing approaches for fairness simply remove the bias in the data and are not designed to explicitly benefit in-processing approaches; see Sec. 5.1 for details. Our framework thus takes the best of both worlds of pre- and in-processing, where (1) pre-processing solves the data problems, and (2) in-processing performs its best on the improved data. Our framework is not only useful for improving fairness w.r.t. a single metric, but can also be extended to support multiple metrics. In our experiments, we verify our theoretical results and demonstrate how our framework outperforms state-of-the-art pre-processing and in-processing baselines. Experiments on both synthetic and real-world datasets (COMPAS (Angwin et al., 2016) and AdultCensus (Kohavi, 1996)) show that our framework effectively improves the accuracy and fairness of state-of-the-art in-processing approaches (Zafar et al., 2017a; Zhang et al., 2018; Roh et al., 2021) under correlation shifts. Our framework also performs better than two-step baselines that first run an existing pre-processing approach (Kamiran & Calders, 2011) and then an in-processing approach. We further show that our framework remains beneficial when the exact range of the shifted correlations is unknown.

Summary of Contributions. (1) We introduce the notion of correlation shifts, which connects data bias changes to the behavior of fair training. (2) Using the notion of correlation, we theoretically show that existing in-processing fair algorithms are limited by the training distribution and may perform poorly on the deployment distribution. (3) We propose a novel pre-processing step to boost the performance of fair in-processing approaches. (4) We demonstrate that our framework effectively improves the performance of state-of-the-art fair algorithms under correlation shifts.
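As a quick numeric illustration of the bias notion used throughout the paper (our own toy example, not taken from the paper): the (y, z)-correlation is high when positive labels concentrate in one group and near zero when labels are assigned independently of the group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.integers(0, 2, n)  # sensitive group attribute

# Biased data: y agrees with z 90% of the time, so positive labels
# concentrate in group z = 1.
y_biased = np.where(rng.random(n) < 0.9, z, 1 - z)
# Unbiased data: labels drawn independently of the group.
y_random = rng.integers(0, 2, n)

print(np.corrcoef(y_biased, z)[0, 1])  # close to 0.8
print(np.corrcoef(y_random, z)[0, 1])  # close to 0.0
```

A correlation shift then simply means that the deployment data exhibits a different value of this statistic than the training data did.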

2. LIMITATIONS OF FAIR TRAINING WITH FIXED DISTRIBUTION ASSUMPTION

Most in-processing approaches for fairness assume that the training and deployment distributions are the same, which means they also assume the same level of bias. However, data bias may shift over time, as confirmed by recent studies (Wick et al., 2019; Maity et al., 2021; Ding et al., 2021), which means the deployment data may actually have a different bias than the training data. In fair training, the data bias reflects the relationship between a label y and a sensitive group attribute z. For example, if all positive labels are in the same group, the data can be considered highly biased. Conversely, if the labels are randomly assigned across all groups, the data can be considered unbiased. A bias change in the deployment data may have an adverse effect on a trained model's performance. Figure 1b shows a toy example that illustrates how a fair classifier's performance is affected by a bias change during deployment. Here, the bias can be expressed via correlation, and we discuss their

