FIFA: MAKING FAIRNESS MORE GENERALIZABLE IN CLASSIFIERS TRAINED ON IMBALANCED DATA

Abstract

Algorithmic fairness plays an important role in machine learning, and imposing fairness constraints during learning is a common approach. However, many datasets are imbalanced in certain label classes (e.g., "healthy") and sensitive subgroups (e.g., "older patients"). Empirically, this imbalance leads to a lack of generalizability not only of classification, but also of fairness properties, especially in over-parameterized models. For example, fairness-aware training may ensure equalized odds (EO) on the training data, yet EO is far from satisfied on new users. In this paper, we propose a theoretically principled, yet Flexible approach that is Imbalance-Fairness-Aware (FIFA). Specifically, FIFA encourages both classification and fairness generalization and can be flexibly combined with many existing fair learning methods with logits-based losses. While our main focus is on EO, FIFA can be directly applied to achieve equalized opportunity (EqOpt); and under certain conditions, it can also be applied to other fairness notions. We demonstrate the power of FIFA by combining it with a popular fair classification algorithm; the resulting algorithm achieves significantly better fairness generalization on several real-world datasets.

1. INTRODUCTION

Machine learning systems are becoming increasingly vital in our daily lives. The growing concern that they may inadvertently discriminate against minorities and other protected groups when identifying or allocating resources has attracted considerable attention from various communities. While significant effort has been devoted to understanding and correcting biases in classical models such as logistic regression and support vector machines (SVM), see, e.g., (Agarwal et al., 2018; Hardt et al., 2016), those derived tools are far less effective on modern over-parameterized models such as neural networks (NN). Furthermore, in large models it is also difficult for measures of fairness (such as equalized odds, to be introduced shortly) to generalize, as shown in Fig. 1. In other words, fairness-aware training (for instance, by imposing fairness constraints during training) may ensure measures of fairness on the training data, but those measures are far from satisfied on test data. Here we find that sufficiently trained ResNet-10 models generalize well on classification error but poorly on fairness constraints: the gap in equalized odds between the test and training data is more than ten times larger than the corresponding gap in classification error. In parallel, another outstanding challenge for generalization on real-world datasets is that they are often imbalanced across label and demographic groups (see Fig. 2 for the imbalance in three commonly used datasets across various domains).
This inherent nature of real-world data greatly hinders the generalization of classifiers that are unaware of the innate imbalance, especially when the performance measure places substantial emphasis on minority classes or subgroups without sufficient samples (e.g., when considering the average classification error for each label class). While our method is primarily motivated by over-parameterized models such as neural networks, it nonetheless also helps simpler models such as logistic regression. Experiments on both large datasets using over-parameterized models and smaller datasets using simpler models demonstrate the effectiveness and flexibility of our approach in ensuring better fairness generalization while preserving good classification generalization.

Related work. Supervised learning with imbalanced datasets has attracted significant interest in the machine learning community, where several methods including resampling, reweighting, and data augmentation have been developed and deployed in practice (Mani & Zhang, 2003; He & Garcia, 2009; An et al., 2021). Theoretical analyses of those methods include margin-based approaches (Li et al., 2002; Kakade et al., 2008; Khan et al., 2019; Cao et al., 2019). Somewhat tangentially, an outstanding and emerging problem faced by modern models on real-world data is algorithmic fairness (Dwork et al., 2012; Coley et al., 2021; Deng et al., 2023), where practical algorithms have been developed for pre-processing (Feldman et al., 2015), in-processing (Zemel et al., 2013; Edwards & Storkey, 2015; Zafar et al., 2017; Donini et al., 2018; Madras et al., 2018; Martinez et al., 2020; Lahoti et al., 2020; Deng et al., 2020), and post-processing (Hardt et al., 2016; Kim et al., 2019) steps. Nonetheless, several challenges remain when applying fairness algorithms in practice (Beutel et al., 2019; Saha et al., 2020; Deng et al., 2022; Holstein et al., 2019). Specifically, as hinted in Fig. 1, the fairness generalization guarantee, especially for over-parameterized models and large datasets, is not well understood, leading to various practical concerns. We remark that although Kini et al. (2021) claim it is necessary to use multiplicative instead of additive logit adjustments, their motivating example is different from ours, and they studied SVMs with fixed and specified budgets for all inputs. Cotter et al. (2019) investigate the generalization of optimization with data-dependent constraints, but they do not address the inherent imbalance in real datasets, and their experiments are not implemented with the large neural networks used in practice. To the best of our knowledge, this paper is the first to tackle the open challenge of fairness generalization with imbalanced data.
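The fairness generalization gap discussed above can be made concrete with a short sketch. The following is our own illustration, not code from the paper: it estimates the equalized-odds (EO) violation of a classifier's predictions on a given data split as the largest spread, over labels y ∈ {0, 1}, of the group-conditional positive-prediction rates; the gap between this quantity on test versus training data is the kind of fairness generalization gap plotted in Fig. 1. Function and variable names are hypothetical.

```python
# Hypothetical illustration of measuring an equalized-odds (EO) violation.
# EO requires P(Yhat = 1 | A = a, Y = y) to be equal across groups a for
# each true label y; we report the worst-case spread over y in {0, 1}.
import numpy as np

def eo_gap(y_true, y_pred, group):
    """Max over y in {0,1} of the spread in P(y_pred=1 | group, y_true=y)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for y in (0, 1):
        rates = []
        for g in np.unique(group):
            mask = (y_true == y) & (group == g)
            if mask.any():
                # group-conditional positive-prediction rate
                rates.append(y_pred[mask].mean())
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# A model can satisfy EO almost exactly on training data while violating it
# on held-out data; the fairness generalization gap is then the difference
# eo_gap(test split) - eo_gap(train split).
```

Analogous measures ship in fairness toolkits (e.g., fairlearn's `equalized_odds_difference`); the sketch above only serves to fix ideas.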

2. BACKGROUND

Notation. For any k ∈ N+, we use [k] to denote the set {1, 2, ..., k}. For a vector v, let v_i be the i-th coordinate of v. We use 1 to denote the indicator function. For a set S, we use |S| to denote its cardinality.
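For concreteness, the two fairness notions named in the abstract can be stated in this notation. The formulation below is the standard one of Hardt et al. (2016), written ahead of the paper's own formal setup; the symbols Yhat (prediction), Y (true label), and A (sensitive attribute) are assumed here.

```latex
% Equalized odds (EO): the prediction is independent of the sensitive
% attribute conditional on the true label, \hat{Y} \perp A \mid Y;
% equivalently, for all groups a, a' and each label y \in \{0, 1\},
\Pr\bigl[\hat{Y} = 1 \mid A = a,\; Y = y\bigr]
  \;=\; \Pr\bigl[\hat{Y} = 1 \mid A = a',\; Y = y\bigr].

% Equalized opportunity (EqOpt) relaxes EO to the positive class only:
\Pr\bigl[\hat{Y} = 1 \mid A = a,\; Y = 1\bigr]
  \;=\; \Pr\bigl[\hat{Y} = 1 \mid A = a',\; Y = 1\bigr].
```

Fig. 1 measures how far a trained model is from satisfying the first pair of equalities on training versus test data.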



Figure 1: Each marker corresponds to a sufficiently well-trained ResNet-10 model trained on the imbalanced image classification dataset CelebA (Liu et al., 2015). The generalization of fairness constraints (EqualizedOdds) is substantially worse than the generalization of classification error.

Although generalization with imbalanced data has been extensively studied and mitigation strategies have been proposed (Cao et al., 2019; Mani & Zhang, 2003; He & Garcia, 2009; An et al., 2021; He & Ma, 2013; Krawczyk, 2016), it is unclear how well fairness properties generalize. In this paper, we initiate the study of the following open challenge: how can we ensure fairness generalization of over-parameterized models for supervised classification tasks on imbalanced datasets?

