THE WORLD IS CHANGING: IMPROVING FAIR TRAINING UNDER CORRELATION SHIFTS

Abstract

Model fairness is an essential element of Trustworthy AI. While many techniques for model fairness have been proposed, most of them assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the group fairness of the trained model is directly affected and can worsen. We make two contributions toward solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which explicitly captures the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key advantage of our approach lies in decoupling the roles of pre-processing and in-processing: correlation adjustment via pre-processing and unfairness mitigation on the processed data via in-processing. Experiments show that our framework effectively improves existing in-processing fair algorithms w.r.t. accuracy and fairness, on both synthetic and real datasets.

1. INTRODUCTION

Model fairness is becoming indispensable in many artificial intelligence (AI) applications to prevent discrimination against specific groups such as gender, race, or age (Feldman et al., 2015; Hardt et al., 2016) or individuals (Dwork et al., 2012a). In this work, we focus on group fairness, where there are three prominent approaches: pre-processing, where training data is debiased; in-processing, where model training is tailored for fairness; and post-processing, where the trained model's output is modified to satisfy fairness; see further related work in Sec. 6. While fairness in-processing approaches are commonly used to mitigate unfairness, most of them make the limiting assumption that the training and deployment data distributions are the same (Zafar et al., 2017a; Zhang et al., 2018; Roh et al., 2021). However, the two distributions are usually different, especially in terms of data biases (Wick et al., 2019; Maity et al., 2021). For example, a recent work shows that the amount of bias likely differs between previously collected data and recently collected data (Ding et al., 2021). Moreover, when the data bias changes, the fairness and accuracy of the trained model become unpredictable at deployment, as the above assumption is broken. In this work, we introduce the notion of correlation shifts between the label y and group attribute z in the data to systematically address such data bias changes. Although several works have recently been proposed to investigate fair training on different types of distribution shifts, including covariate and concept shifts (Singh et al., 2021; Mishler & Dalmasso, 2022), they usually do not explicitly consider bias changes between y and z. In comparison, our notion of correlation shift enables us to theoretically analyze how exactly data bias changes affect fair training; see how correlation shift compares with other types of distribution shifts in Sec. 6.
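To make the notion concrete, the sketch below measures the correlation between a binary label y and a binary group attribute z, and then subsamples the four (y, z) subgroups toward given target proportions. This is only an illustration under simplifying assumptions: the helper names `yz_correlation` and `resample_to_target` are hypothetical, the target proportions are supplied directly (the paper instead derives the ratios by solving an optimization problem reflecting the shifted correlation), and the uniform target used here corresponds to the special case of removing the correlation entirely.

```python
import numpy as np

def yz_correlation(y, z):
    """Pearson correlation between binary label y and group attribute z,
    used here as a simple proxy for the label-group bias of a dataset."""
    return float(np.corrcoef(y, z)[0, 1])

def resample_to_target(y, z, target, rng):
    """Subsample so the empirical proportions of the (y, z) subgroups
    match `target`, a dict {(y, z): proportion}. Simplified sketch: the
    target ratios are given rather than optimized."""
    groups = {g: np.flatnonzero((y == g[0]) & (z == g[1])) for g in target}
    # Largest dataset size for which every subgroup quota is available.
    n = min(int(len(groups[g]) / p) for g, p in target.items() if p > 0)
    return np.concatenate([
        rng.choice(groups[g], size=int(round(p * n)), replace=False)
        for g, p in target.items()
    ])

rng = np.random.default_rng(0)

# Biased training data: y agrees with z 80% of the time, so corr(y, z) ~ 0.6.
z = rng.integers(0, 2, 20_000)
y = np.where(rng.random(20_000) < 0.8, z, 1 - z)
print(yz_correlation(y, z))

# Pre-processing toward a deployment distribution where y and z are
# uncorrelated: equal mass on all four (y, z) subgroups.
target = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
idx = resample_to_target(y, z, target, rng)
print(yz_correlation(y[idx], z[idx]))
```

After resampling, an in-processing fair algorithm can be trained on the adjusted data as usual, which is the decoupling of roles described above.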
For fair training under correlation shifts, we first 1) analyze the fundamental accuracy and fairness limits of in-processing approaches with the fixed-distribution assumption using the notion of correlation in the data and then 2) design a novel pre-processing step to boost the performance of in-processing approaches under correlation shifts. We show that existing in-processing fair algorithms are indeed limited by the training distribution and may perform poorly on the deployment

