PUSHING THE ACCURACY-GROUP ROBUSTNESS FRONTIER WITH INTROSPECTIVE SELF-PLAY

Abstract

Standard empirical risk minimization (ERM) training can produce deep neural network (DNN) models that are accurate on average but underperform in underrepresented population subgroups, especially when the group distributions are imbalanced in the long-tailed training data. Therefore, approaches that improve the accuracy-group robustness trade-off frontier of a DNN model (i.e., improving worst-group accuracy without sacrificing average accuracy, or vice versa) are of crucial importance. Uncertainty-based active learning (AL) can potentially improve the frontier by preferentially sampling underrepresented subgroups to create a more balanced training dataset. However, the quality of uncertainty estimates from modern DNNs tends to degrade in the presence of spurious correlations and dataset bias, compromising the effectiveness of AL for sampling tail groups. In this work, we propose Introspective Self-play (ISP), a simple approach to improve the uncertainty estimation of a DNN under dataset bias by adding an auxiliary introspection task that requires the model to predict the bias for each data point in addition to the label. We show that ISP provably improves the bias-awareness of the model representation and the resulting uncertainty estimates. On two real-world tabular and language tasks, ISP serves as a simple "plug-in" for AL model training, consistently improving both the tail-group sampling rate and the final accuracy-fairness trade-off frontier of popular AL methods.

1. INTRODUCTION

Modern deep neural network (DNN) models are commonly trained on large-scale datasets (Deng et al., 2009; Raffel et al., 2020). These datasets often exhibit an imbalanced long-tail distribution with many small population subgroups, reflecting the nature of the physical and social processes generating the data distribution (Zhu et al., 2014; Feldman & Zhang, 2020). This imbalance in the training data distribution, i.e., dataset bias, prevents DNN models from generalizing equitably to the underrepresented population groups (Hasnain-Wynia et al., 2007).

Accuracy-Group Robustness Frontier: In response, the existing bias mitigation literature has focused on improving training procedures under a fixed and imbalanced training dataset, striving to balance model accuracy and fairness (e.g., the average-case vs. worst-group performance) (Agarwal et al., 2018; Martinez et al., 2020; 2021). Formally, this goal corresponds to identifying an optimal model f ∈ F that attains the Pareto efficiency frontier of the accuracy-group robustness trade-off (e.g., see Figure 1), so that under the same training data D = {(y_i, x_i)}_{i=1}^n, we cannot find another model f' ∈ F that outperforms f in both accuracy and worst-group performance. In the literature, this accuracy-group robustness frontier is often characterized by a trade-off objective (Martinez et al., 2021):

f_λ = argmin_{f ∈ F} F_λ(f | D),  where  F_λ(f | D) := R_acc(f | D) + λ · R_robust(f | D),     (1)

where R_acc and R_robust are risk functions for a model's accuracy and group robustness (modeled herein as worst-group accuracy), and λ > 0 is a trade-off parameter. Then, f_λ cannot be outperformed by any other f at the same trade-off level λ. The entire frontier under a dataset D can then be
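To make the trade-off objective in Equation (1) concrete, the sketch below computes an empirical version of F_λ(f | D) from a model's predictions, taking both risks as error rates: R_acc as the average error and R_robust as the worst-group error (i.e., one minus worst-group accuracy, consistent with the definition above). The function name and the use of error rates as risks are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tradeoff_objective(y_true, y_pred, groups, lam=1.0):
    """Empirical F_lambda(f|D) = R_acc + lambda * R_robust (Equation (1)),
    with risks modeled as error rates (assumption for illustration)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = (y_true == y_pred)
    r_acc = 1.0 - correct.mean()  # average error rate over all points
    # Worst-group risk: the highest error rate among population subgroups.
    group_errors = [1.0 - correct[groups == g].mean()
                    for g in np.unique(groups)]
    r_robust = max(group_errors)
    return r_acc + lam * r_robust
```

Sweeping `lam` over a grid and retraining (or reweighting) the model for each value traces out one point per λ on the accuracy-group robustness frontier shown in Figure 1.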



Figure 1: Example of the accuracy-fairness frontier. Under a more balanced training data distribution, the model can attain a better accuracy-fairness frontier (red) than under an imbalanced distribution (blue) at every trade-off level λ (Equation (1)).

