RE-WEIGHTING BASED GROUP FAIRNESS REGULARIZATION VIA CLASSWISE ROBUST OPTIMIZATION

Abstract

Many existing group fairness-aware training methods aim to achieve group fairness either by re-weighting underrepresented groups based on certain rules or by using weakly approximated surrogates for the fairness metrics as regularization terms in the objective. Although each learning scheme has its own strength, in applicability or performance respectively, it is difficult to regard any method in either category as a gold standard, since their successful performances are typically limited to specific cases. To that end, we propose a principled method, dubbed FairDRO, which unifies the two learning schemes by incorporating a well-justified group fairness metric into the training objective via a classwise distributionally robust optimization (DRO) framework. We then develop an iterative optimization algorithm that minimizes the resulting objective by automatically producing the correct re-weights for each group. Our experiments show that FairDRO is scalable and easily adaptable to diverse applications, and that it consistently achieves state-of-the-art performance on several benchmark datasets in terms of the accuracy-fairness trade-off, compared to recent strong baselines.

1. INTRODUCTION

Machine learning algorithms are increasingly used in various decision-making applications that have societal impact; e.g., crime assessment (Julia Angwin & Kirchner, 2016), credit estimation (Khandani et al., 2010), facial recognition (Buolamwini & Gebru, 2018; Wang et al., 2019), automated filtering in social media (Fan et al., 2021), AI-assisted hiring (Nguyen & Gatica-Perez, 2016), and law enforcement (Garvie, 2016). A critical issue in such applications is the potential discrepancy of model performance, e.g., accuracy, across different sensitive groups (e.g., race or gender) (Buolamwini & Gebru, 2018), which is easily observed in models trained with vanilla empirical risk minimization (ERM) (Valiant, 1984) when the training data has unwanted bias. To address such issues, fairness-aware learning has recently drawn attention in the AI research community. One of its objectives is to achieve group fairness, which concerns the statistical parity of model predictions across sensitive groups.

The so-called in-processing methods typically employ additional machinery to achieve group fairness during training. Depending on the type of machinery used, recent in-processing methods can be divided into two categories (Caton & Haas, 2020): regularization based methods and re-weighting based methods. Regularization based methods incorporate fairness-promoting terms into their loss functions. They can often achieve a good balance between accuracy and fairness, but can be applied only to certain types of model architectures or tasks, such as DNNs (e.g., MFD (Jung et al., 2021) or FairHSIC (Quadrianto et al., 2019)) or binary classification tasks (e.g., Cov (Baharlouei et al., 2020)). On the other hand, re-weighting based methods are more flexible and applicable to a wider range of models and tasks, since they adopt the simpler strategy of assigning higher weights to samples from underrepresented groups.
However, most of them (e.g., LBC (Jiang & Nachum, 2020), RW (Kamiran & Calders, 2012), and FairBatch (Roh et al., 2020)) lack sound theoretical justification for enforcing group fairness and may perform poorly on some benchmark datasets.

In this paper, we devise a new in-processing method, dubbed Fairness-aware Distributionally Robust Optimization (FairDRO), which takes the advantages of both regularization and re-weighting based methods. The core of our method is to unify the two learning categories: namely, FairDRO incorporates a well-justified group fairness metric in the training objective as a regularizer, and optimizes the resulting objective with a re-weighting based learning method. More specifically, we first show that a group fairness metric, Difference of Conditional Accuracy (DCA) (Berk et al., 2021), which is a natural extension of Equalized Opportunity (Hardt et al., 2016) to the multi-class, multi-group label setting, is equivalent (up to a constant) to the average over classes of the standard deviations of the groupwise 0-1 losses. We then employ the Group DRO formulation, with a χ²-divergence ball that includes quasi-probabilities as the uncertainty set, for each class separately, to convert the DCA- (or variance-) regularized group-balanced empirical risk minimization (ERM) into a more tractable minimax optimization. The inner maximizer of the converted optimization problem then serves as the re-weights for the samples in each group, establishing a unified connection between re-weighting and regularization based fairness-aware learning methods. Lastly, we develop an efficient iterative optimization algorithm that automatically produces the correct (sometimes even negative) re-weights during the optimization process, in a more principled way than other re-weighting based methods.
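The two quantities above can be illustrated with a small numerical sketch. The code below computes the DCA proxy (the average over classes of the standard deviation of groupwise 0-1 losses) and the closed-form maximizer of a groupwise-loss-weighted sum over an L2 ball around the uniform weights that allows quasi-probabilities; the function names and the `radius` parameter are illustrative, and the L2-ball form is an assumption standing in for the paper's χ²-divergence uncertainty set, not the exact formulation.

```python
import numpy as np

def groupwise_losses(preds, labels, groups, y):
    # 0-1 loss of each sensitive group, restricted to samples of class y
    # (assumes every group has at least one sample in every class).
    mask = labels == y
    return np.array([np.mean(preds[mask & (groups == g)] != y)
                     for g in np.unique(groups)])

def dca_proxy(preds, labels, groups):
    # Average over classes of the standard deviation of the groupwise
    # 0-1 losses -- the quantity that DCA equals up to a constant.
    return float(np.mean([np.std(groupwise_losses(preds, labels, groups, y))
                          for y in np.unique(labels)]))

def chi2_ball_weights(losses, radius):
    # Closed-form maximizer of sum_g q_g * l_g over an L2 ball of the
    # given radius around the uniform weights, with quasi-probabilities
    # allowed: groups with above-average loss are up-weighted, and for a
    # large enough radius some weights become negative.
    l = np.asarray(losses, dtype=float)
    centered = l - l.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        return np.full(len(l), 1.0 / len(l))
    return 1.0 / len(l) + radius * centered / norm
```

Since the centered loss vector sums to zero, the returned weights always sum to one, which is how individual entries can cross below zero while remaining a valid quasi-probability vector.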
In our experiments, we empirically show that FairDRO is scalable and easily adaptable to diverse application scenarios, including tabular (Julia Angwin & Kirchner, 2016; Dua et al., 2017), vision (Zhang et al., 2017), and natural language text (Koh et al., 2021) datasets. We compare against several representative in-processing baselines that apply either re-weighting schemes or surrogate regularizations for group fairness, and show that FairDRO consistently achieves state-of-the-art performance on all datasets in terms of the accuracy-fairness trade-off, thanks to leveraging the benefits of both kinds of fairness-aware learning methods.

2.1. IN-PROCESSING METHODS FOR GROUP FAIRNESS

Regularization based methods add penalty terms to their objective functions to promote group fairness. Due to the non-differentiability of the desired group fairness metrics, they use weaker surrogate regularization terms; e.g., Cov (Zafar et al., 2017b), Rényi (Baharlouei et al., 2020), and FairHSIC (Quadrianto et al., 2019) employ a covariance approximation, Rényi correlation (Rényi, 1959), and the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005) between the group label and the model output, respectively, as fairness constraints. Jung et al. (2021) devised MFD, which uses a Maximum Mean Discrepancy (MMD) (Gretton et al., 2005) regularizer to distill knowledge from a teacher model and promote group fairness at the same time. Although these methods can achieve high performance when the hyperparameters controlling the regularization strength are well-tuned, they are sensitive to the choice of model architecture and task setting because they rely on surrogate regularization terms; see Sec. 5 and Appendix C.2 for more details. Meanwhile, some other works (Agarwal et al., 2018; Cotter et al., 2019) used an equivalent constrained optimization framework to enforce group fairness instead of the regularization formulation. Namely, they consider a minimax problem over the Lagrangian of the given constrained optimization problem and seek a saddle point through alternating updates of the model parameters and the Lagrange multipliers. By doing so, they can successfully control the degree of fairness while maximizing accuracy. However, their alternating optimization algorithms incur severe computational costs due to repeated full training of the model. Furthermore, we empirically observed that they fail to find a feasible solution when applied to complex tasks in vision or NLP domains. A more detailed discussion is in Appendix B.
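As a concrete illustration of such a surrogate, a covariance-style penalty in the spirit of Cov can be sketched as below, assuming a binary sensitive attribute and a signed decision-boundary margin per sample; this is a minimal sketch for intuition, not the exact formulation of Zafar et al. (2017b).

```python
import numpy as np

def cov_penalty(sensitive, margins):
    # |empirical covariance| between the binary sensitive attribute and
    # the signed distance to the decision boundary; driving this toward
    # zero encourages decisions that do not correlate with group
    # membership, serving as a differentiable fairness surrogate.
    s = np.asarray(sensitive, dtype=float)
    d = np.asarray(margins, dtype=float)
    return float(abs(np.mean((s - s.mean()) * d)))
```

In training, such a penalty would typically be added to the task loss with a tunable strength coefficient, which is exactly the hyperparameter sensitivity discussed above.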
As alternative in-processing methods, several re-weighting based methods (Kamiran & Calders, 2012; Jiang & Nachum, 2020; Roh et al., 2020; Agarwal et al., 2018) have also been proposed to address group fairness. Kamiran & Calders (2012) proposed a re-weighting scheme (RW) based on the number of data points in each group. More recently, Label Bias Correction (LBC) (Jiang & Nachum, 2020) and FairBatch (Roh et al., 2020) have been developed, which adaptively adjust per-sample weights and mini-batch compositions, respectively, based on the average loss per group. Perhaps the most similar to our work is Agarwal et al. (2018), which demonstrates that a Lagrangian formulation of a fairness-constrained optimization problem can be reduced to a cost-sensitive classification problem
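The count-based RW scheme mentioned above can be sketched as follows: each (group, class) cell receives a weight equal to the count expected if group and class were independent, divided by the observed count. This is a minimal sketch of the idea; the exact normalization in Kamiran & Calders (2012) may differ.

```python
from collections import Counter

def reweigh(groups, labels):
    # Weight for each (group, class) cell: P(group) * P(class) / P(group, class),
    # i.e. the count expected under independence of group and class divided by
    # the observed count. Over-represented cells get weight < 1, under-represented
    # cells get weight > 1.
    n = len(labels)
    g_cnt = Counter(groups)
    y_cnt = Counter(labels)
    gy_cnt = Counter(zip(groups, labels))
    return {gy: g_cnt[gy[0]] * y_cnt[gy[1]] / (n * c)
            for gy, c in gy_cnt.items()}
```

Unlike the adaptive schemes of LBC and FairBatch, these weights are fixed once from the data statistics, which is what makes RW simple but also insensitive to how the model actually errs on each group.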

