DR-FAIRNESS: DYNAMIC DATA RATIO ADJUSTMENT FOR FAIR TRAINING ON REAL AND GENERATED DATA Anonymous

Abstract

Fair visual recognition has become critical for preventing demographic disparity. A major cause of model unfairness is the imbalanced representation of different groups in training data. Recently, several works aim to alleviate this issue using generated data. However, these approaches often use generated data to obtain similar amounts of data across groups, which is not optimal for achieving high fairness due to differences in learning difficulty and generated-data quality across groups. To address this issue, we propose a novel adaptive sampling approach that leverages both real and generated data for fairness. We design a bilevel optimization that finds the optimal data sampling ratios among groups and between real and generated data while training a model. The ratios are dynamically adjusted considering both the model's accuracy and its fairness. To efficiently solve our non-convex bilevel optimization, we propose a simple approximation to the solution given by the implicit function theorem. Extensive experiments show that our framework achieves state-of-the-art fairness and accuracy on the CelebA and ImageNet People Subtree datasets. We also observe that our method adaptively relies less on the generated data when it has poor quality.

1. INTRODUCTION

Model fairness in visual recognition is becoming essential to prevent discriminatory predictions over demographics. Recently, numerous unfairness issues have been reported (Wang et al., 2020; Najibi, 2020), and several fair image classification approaches have been proposed that do not discriminate against specific groups such as gender, age, or skin color (Ramaswamy et al., 2021; Roh et al., 2021).

With the rapid progress in deep generative learning (Karras et al., 2020; Dhariwal & Nichol, 2021), a new research direction has emerged that improves fairness by augmenting training data with generated data. Recent breakthroughs in generative learning make generated data practical enough to use in real-world applications (OpenAI, 2022), and many high-quality pre-trained generative models are now open to the public (Rombach et al., 2022), which obviates the need to train such models ourselves. Thus, generated data is increasingly used to improve model performance, including fairness. From a fairness perspective, generated data complements real data by making it more diverse. For example, if a specific group's data is collected from a limited source that does not cover the full data distribution, that group may be discriminated against in model training due to this bias (Mehrabi et al., 2021). In this case, generated data can be used to supplement the underrepresented group.

However, most fair training approaches that use generated data simply generate similar amounts of data across groups (Ramaswamy et al., 2021; Choi et al., 2020), which may not be optimal for improving group fairness measures such as equalized odds (Hardt et al., 2016) and demographic parity (Feldman et al., 2015). Such suboptimality can originate from 1) differences in learning difficulty across groups and 2) potential bias (typically in the form of missing modes) and quality issues in the generated data that can hurt the accuracy and fairness of the model under training.
Therefore, it is essential to find the right mix of generated and real data for the best accuracy and fairness. In this paper, we harness the potential of both real and generated data via adaptive sampling to improve group fairness while minimizing accuracy degradation. To this end, we design a new sampling approach called Dr-Fairness (Dynamic Data Ratio Adjustment for Fairness) that adaptively adjusts data ratios among groups and between real and generated data over iterations, as in Figure 1a.

Table 1: Functionality comparison of algorithms.

Method        Generated data   Adaptive group ratio   Adaptive real:generated ratio   Accuracy-aware updates
1:1 ratio           ✓                  ✗                        ✗                              ✗
FairBatch           ✗                  ✓                        ✗                              ✗
Dr-Fairness         ✓                  ✓                        ✓                              ✓

In Table 1, we compare the unique properties of Dr-Fairness against two representative methods: 1) an equal ratio baseline (1:1 ratio) (Ramaswamy et al., 2021) that uses generated data and 2) a fairness-aware adaptive sampling baseline (FairBatch) (Roh et al., 2021) that finds the optimal group ratio for fairness using only real data. We can see that Dr-Fairness subsumes the two baselines and improves on them by also optimizing the ratio between real and generated data and by utilizing accuracy for ratio updates.

To perform adaptive sampling systematically, we design a novel bilevel optimization problem along with an efficient algorithm for solving it. Our bilevel optimization consists of 1) an outer optimization that adjusts data sampling ratios considering both fairness and accuracy and 2) an inner optimization that minimizes the standard empirical risk on both real and generated data, given the current sampling ratios. Although various exact algorithms have been proposed to solve bilevel optimizations (Maclaurin et al., 2015), they often scale poorly in our scenario with large models and data. We thus propose an approximate algorithm that uses the implicit function theorem (Krantz & Parks, 2002) and an identity-matrix approximation (Luketina et al., 2016) to efficiently compute the gradient of our bilevel optimization.
Specifically, instead of computing the expensive inverse Hessian matrix, we approximate it with the identity matrix. Experiments on CelebA (Liu et al., 2015) and ImageNet People Subtree (Yang et al., 2020) show that our approach achieves state-of-the-art fairness and accuracy. For instance, Figure 1b highlights our results on CelebA, where our framework largely outperforms both FairBatch, which uses only real data, and the 1:1 ratio baseline; see Sec. 3 for comparisons against more baselines and other fairness metrics, which show consistent results. On the ImageNet People Subtree classification problem, which represents a large-scale real-world scenario, we achieve better accuracy than the best baseline, with an absolute improvement of 5-9%, while obtaining similar fairness scores. We also observe that our framework adaptively relies less on the generated data when it has poor quality.
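To make the identity-matrix approximation concrete, the following toy scalar example (our own construction, not the paper's code) contrasts the exact implicit-function-theorem hypergradient with the approximated one. The inner loss L(θ, λ) = (θ − λ)², outer loss F(θ) = (θ − 1)², and the variable names are all illustrative assumptions.

```python
# Exact IFT hypergradient: dF/dlam = F'(theta*) * dtheta*/dlam, where
# dtheta*/dlam = -(d2L/dtheta^2)^(-1) * (d2L/dtheta dlam).
# The identity-matrix approximation replaces the inverse Hessian with I,
# avoiding a (in high dimensions, very expensive) matrix inversion.

def hypergradients(lam):
    theta_star = lam                        # argmin_theta (theta - lam)^2
    dF_dtheta = 2.0 * (theta_star - 1.0)    # outer gradient at theta*
    hess = 2.0                              # d2L/dtheta^2
    cross = -2.0                            # d2L/dtheta dlam
    exact = dF_dtheta * (-(1.0 / hess) * cross)
    approx = dF_dtheta * (-1.0 * cross)     # inverse Hessian -> identity
    return exact, approx

exact, approx = hypergradients(lam=0.0)
# Both point the same way (gradient descent moves lam toward 1);
# only the scale differs.
assert exact * approx > 0
```

In this toy case the approximation preserves the descent direction while changing only the step scale, which is what makes it usable inside an outer gradient update.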

Summary of Contributions:

(1) We propose Dr-Fairness, a novel adaptive sampling framework for fair training that enjoys the potential of both real and generated data.
(2) To perform adaptive sampling systematically, we formulate a bilevel optimization to train fair and accurate models on real and generated data.
(3) We design an approximate algorithm based on the implicit function theorem and an identity-matrix approximation to efficiently solve our optimization.
(4) We perform extensive experiments on CelebA and ImageNet People Subtree to show that Dr-Fairness achieves state-of-the-art accuracy and fairness.
(5) Finally, we believe our work reveals the importance of using generated data together with real data to improve model fairness.

2. FRAMEWORK

In this section, we first formulate a bilevel optimization problem for optimizing sampling ratios for real and generated data. We then design a new algorithm that efficiently solves this optimization problem. Throughout this paper, we use the following notations and fairness definitions.

Notations. Let x ∈ X be the input feature, and let y ∈ Y and ŷ ∈ Y be the true label and the predicted label, respectively. Let z ∈ Z be a sensitive group attribute, e.g., gender, age, or skin color.
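Before formalizing the optimization, the sampling mechanism itself can be sketched in a few lines. The following hypothetical implementation (our own sketch, not the authors' code; all names are illustrative) draws a minibatch according to per-(group, source) sampling weights, which is the quantity the outer optimization of Dr-Fairness would adjust over iterations:

```python
import random

def sample_batch(real_pools, gen_pools, ratios, batch_size, seed=0):
    """real_pools/gen_pools: dict group -> list of example ids.
    ratios: dict (group, "real" | "gen") -> sampling weight."""
    rng = random.Random(seed)
    keys = list(ratios)
    weights = [ratios[k] for k in keys]
    batch = []
    # Draw (group, source) pairs with the current ratios, then sample an
    # example from the chosen pool (with replacement).
    for group, source in rng.choices(keys, weights=weights, k=batch_size):
        pool = real_pools[group] if source == "real" else gen_pools[group]
        batch.append(rng.choice(pool))
    return batch

# Toy pools: group g1 is underrepresented in real data, so its generated
# pool gets a larger weight (the numbers are purely illustrative).
real = {"g0": list(range(100)), "g1": list(range(100, 120))}
gen = {"g0": list(range(1000, 1100)), "g1": list(range(1100, 1400))}
ratios = {("g0", "real"): 0.35, ("g0", "gen"): 0.15,
          ("g1", "real"): 0.15, ("g1", "gen"): 0.35}
batch = sample_batch(real, gen, ratios, batch_size=64)
```

The inner optimization would then take a standard gradient step on such a batch, while the outer optimization updates `ratios` based on the intermediate model's fairness and accuracy.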




Figure 1: (a) Our framework iteratively updates the data ratios among groups and between real and generated data based on the fairness and accuracy of the intermediate model. (b) Performances on CelebA, using gender as the group attribute and age as the label attribute. Compared to the original model, the 1:1 ratio baseline (Ramaswamy et al., 2021) does not significantly improve group fairness, measured through equalized odds (EO) disparity. FairBatch (Roh et al., 2021) shows high fairness by adaptively selecting real data only, but loses accuracy. In comparison, Dr-Fairness (ours) achieves high fairness, while not sacrificing accuracy.
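The equalized odds (EO) disparity reported in Figure 1b can be computed directly from predictions. The sketch below is our own minimal version for binary labels and predictions (not the paper's evaluation code): it takes the largest gap across groups in P(ŷ = 1 | y = y₀, z) over both label values y₀, i.e., the larger of the true-positive-rate gap and the false-positive-rate gap.

```python
from collections import defaultdict

def eo_disparity(y_true, y_pred, z):
    # (label, group) -> [count of positive predictions, total count]
    counts = defaultdict(lambda: [0, 0])
    for yt, yp, g in zip(y_true, y_pred, z):
        counts[(yt, g)][0] += int(yp == 1)
        counts[(yt, g)][1] += 1
    gap = 0.0
    for y0 in set(y_true):
        # Positive-prediction rate per group, conditioned on y = y0.
        rates = [pos / tot for (yt, _), (pos, tot) in counts.items() if yt == y0]
        if len(rates) > 1:
            gap = max(gap, max(rates) - min(rates))
    return gap

# Illustrative data: group "b" has TPR 1.0 vs. 0.5 for "a", and FPR 0.5
# vs. 0.0, so the EO disparity is 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "b", "b", "a", "a", "b", "b"]
```

A disparity of 0 means the classifier's error rates are identical across groups, which is the condition equalized odds (Hardt et al., 2016) requires.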


