PUSHING THE ACCURACY-GROUP ROBUSTNESS FRONTIER WITH INTROSPECTIVE SELF-PLAY

Abstract

Standard empirical risk minimization (ERM) training can produce deep neural network (DNN) models that are accurate on average but underperform on underrepresented population subgroups, especially when the group distribution of the long-tailed training data is imbalanced. Approaches that improve the accuracy-group robustness trade-off frontier of a DNN model (i.e., improving worst-group accuracy without sacrificing average accuracy, or vice versa) are therefore of crucial importance. Uncertainty-based active learning (AL) can potentially improve the frontier by preferentially sampling underrepresented subgroups to create a more balanced training dataset. However, the quality of uncertainty estimates from modern DNNs tends to degrade in the presence of spurious correlations and dataset bias, compromising the effectiveness of AL for sampling tail groups. In this work, we propose Introspective Self-play (ISP), a simple approach to improve the uncertainty estimation of a deep neural network under dataset bias, by adding an auxiliary introspection task that requires the model to predict the bias for each data point in addition to the label. We show that ISP provably improves the bias-awareness of the model representation and the resulting uncertainty estimates. On two real-world tabular and language tasks, ISP serves as a simple "plug-in" for AL model training, consistently improving both the tail-group sampling rate and the final accuracy-fairness trade-off frontier of popular AL methods.

1. INTRODUCTION

Modern deep neural network (DNN) models are commonly trained on large-scale datasets (Deng et al., 2009; Raffel et al., 2020). These datasets often exhibit an imbalanced long-tail distribution with many small population subgroups, reflecting the nature of the physical and social processes generating the data (Zhu et al., 2014; Feldman & Zhang, 2020). This imbalance in the training data distribution, i.e., dataset bias, prevents DNN models from generalizing equitably to the underrepresented population groups (Hasnain-Wynia et al., 2007).

Accuracy-Group Robustness Frontier: In response, the existing bias-mitigation literature has focused on improving training procedures under a fixed and imbalanced training dataset, striving to balance model accuracy and fairness (e.g., average-case vs. worst-group performance) (Agarwal et al., 2018; Martinez et al., 2020; 2021). Formally, this goal corresponds to identifying an optimal model f* ∈ F that attains the Pareto efficiency frontier of the accuracy-group robustness trade-off (e.g., see Figure 1), so that under the same training data D = {(y_i, x_i)}_{i=1}^n, we cannot find another model f ∈ F that outperforms f* in both accuracy and worst-group performance. In the literature, this accuracy-group robustness frontier is often characterized by a trade-off objective (Martinez et al., 2021):

    f*_λ = argmin_{f ∈ F} F_λ(f | D), where F_λ(f | D) := R_acc(f | D) + λ R_robust(f | D),    (1)

where R_acc and R_robust are risk functions for the model's accuracy and group robustness (modeled herein as worst-group accuracy), and λ > 0 is a trade-off parameter. Then f*_λ cannot be outperformed by any other f ∈ F at the same trade-off level λ. The entire frontier under a dataset D can then be characterized by finding the f*_λ that minimizes the robustness-accuracy objective (1) at every trade-off level λ, and tracing out its (R_acc, R_robust) performance (Figure 1).
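The trade-off objective (1) and the frontier-tracing procedure can be sketched numerically. The following is a minimal illustration, assuming per-group empirical risks and group weights are given; the function and model names are hypothetical, not the paper's implementation:

```python
import numpy as np

def tradeoff_objective(group_risks, group_weights, lam):
    """F_lambda(f|D) = R_acc + lam * R_robust, where R_acc is the
    population-weighted average risk and R_robust is the worst-group risk."""
    group_risks = np.asarray(group_risks, dtype=float)
    group_weights = np.asarray(group_weights, dtype=float)
    r_acc = float(np.dot(group_weights, group_risks))  # average-case risk
    r_robust = float(group_risks.max())                # worst-group risk
    return r_acc + lam * r_robust

# Tracing the frontier: evaluate candidate models at several trade-off
# levels lambda and keep, for each lambda, the minimizer of F_lambda.
models = {"erm": [0.05, 0.40], "balanced": [0.10, 0.15]}  # per-group risks
weights = [0.9, 0.1]                                      # imbalanced groups
for lam in (0.0, 1.0):
    best = min(models, key=lambda m: tradeoff_objective(models[m], weights, lam))
    # lam = 0 favors the model with the best average risk;
    # larger lam favors the model with the better worst group.
    print(lam, best)
```

At λ = 0 the imbalanced-trained model wins on average risk alone, while at λ = 1 the model with a better worst group attains the lower objective, mirroring the frontier shift in Figure 1.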
Goal: However, the limited size of the tail-group examples restricts the DNN model's worst-group performance, leading to a compromised accuracy-group robustness frontier (Zhao & Gordon, 2019; Dutta et al., 2020). We thus ask: under a fixed learning algorithm, can we meaningfully push the model's accuracy-group robustness frontier by improving the training data distribution using active learning? That is, denoting by D_{α,n} = {(y_i, x_i)}_{i=1}^n a training dataset with K subgroups and group-size distribution α = [α_1, ..., α_K], we study whether a model's accuracy-group robustness performance F_λ can be improved by rebalancing the group distribution of the training data D_{α,n}, i.e., we seek to optimize the outer problem:

    min_{α ∈ Δ_K} min_{f ∈ F} F_λ(f | D_{α,n}),    (2)

where Δ_K is the simplex of all possible group distributions (Rolf et al., 2021). Our key observation is that, given a sampling model with well-calibrated uncertainty (i.e., model uncertainty that is well correlated with generalization error), active learning (AL) can preferentially acquire tail-group examples from unlabelled data without needing group annotations, and add them to the training data to reach a more balanced data distribution (Branchaud-Charron et al., 2021). Appendix A.5 discusses the connection between group robustness and fairness.

However, the quality of uncertainty estimates from modern DNNs tends to degrade under dataset bias. Prior work suggests that this failure mode in DNN uncertainty can be caused by an issue in representation learning known as feature collapse, where the DNN over-focuses on correlational features that help to distinguish between output classes on the training data, but ignores the non-predictive yet semantically meaningful input features that are important for uncertainty quantification (Figure 2). In this work, we show that this failure mode can be provably mitigated by a training procedure we term introspective training (Section 2). Briefly, introspective training adds an auxiliary introspection task to model training, asking the model to predict whether an example belongs to an underrepresented group. It comes with a guarantee of injecting bias-awareness into the model representation (Proposition 1), encouraging the model to learn diverse hidden features that distinguish minority-group examples from the majority, even if these features are not correlated with the training labels. Hence it can serve as a simple "plug-in" to the training procedure of any active learning method, leading to improved uncertainty quality for tail groups (Figure 2).

Contributions: In summary, our contributions are:

• We introduce Introspective Self-play (ISP), a simple training approach to improve a DNN model's uncertainty quality for underrepresented groups (Section 2). Using group annotations from the training data, ISP conducts introspective training to provably improve a DNN's representation and uncertainty quality for the tail groups. When group annotations are not available, ISP can be combined with a cross-validation-based self-play procedure that uses a noise-bias-variance decomposition of the model's generalization error (Domingos, 2000).

• Theoretical Analysis. We theoretically analyze the optimization problem in Equation (2) under a group-specific learning rate model (Rolf et al., 2021) (Section 3). Our result elucidates the dependence of the model's best-attainable accuracy-group robustness frontier F_λ on the group distribution α. In particular, it confirms the theoretical necessity of up-sampling the underrepresented groups for obtaining the optimal accuracy-group robustness frontier, and reveals that underrepresentation is in fact caused by an interplay of the subgroup's learning difficulty and its prevalence in the population.

• Empirical Effectiveness. Under two challenging real-world tasks (census income prediction and toxic comment detection), we empirically validate the effectiveness of ISP in improving the performance of AL with a DNN model under dataset bias (Section 4). For both classic and state-of-the-art uncertainty-based AL methods, ISP improves the tail-group sampling rate, meaningfully pushing the accuracy-group robustness frontier of the final model.

Figure 1: Example of accuracy-fairness frontier. Under a more balanced training data distribution, the model can attain a better accuracy-fairness frontier (red) compared to training under an imbalanced distribution (blue) at every trade-off level λ (Equation (1)).
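The core introspective-training idea, an auxiliary head that predicts group membership from a shared representation, can be sketched in a few lines of numpy. The toy data, two-head linear architecture, and hyperparameters below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bce(p, t):
    # Binary cross-entropy with a small epsilon for numerical safety.
    eps = 1e-9
    return -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
n = 256
x = rng.normal(size=(n, 2))
g = (rng.random(n) < 0.1).astype(float)             # ~10% minority group
y = (x[:, 0] + 0.5 * g * x[:, 1] > 0).astype(float)  # label depends on group

d = 8
W = rng.normal(scale=0.1, size=(2, d))  # shared representation
v_y, v_g = np.zeros(d), np.zeros(d)     # label head, introspection (bias) head
beta, lr = 1.0, 0.5                     # introspection-loss weight, step size

losses = []
for _ in range(300):
    h = np.tanh(x @ W)                                # shared features
    p_y, p_g = sigmoid(h @ v_y), sigmoid(h @ v_g)
    # Joint objective: L_label + beta * L_bias.
    losses.append(bce(p_y, y) + beta * bce(p_g, g))
    e_y = (p_y - y) / n                               # dL_label / d(logit)
    e_g = beta * (p_g - g) / n                        # beta * dL_bias / d(logit)
    dh = np.outer(e_y, v_y) + np.outer(e_g, v_g)      # gradient w.r.t. h
    W -= lr * x.T @ (dh * (1 - h ** 2))               # backprop through tanh
    v_y -= lr * h.T @ e_y
    v_g -= lr * h.T @ e_g
```

Because the bias head shares the representation W with the label head, its gradient pushes W to retain features that separate minority-group examples even when those features are only weakly predictive of y, which is the bias-awareness property the paper's Proposition 1 formalizes.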

