OUTLIER-ROBUST GROUP INFERENCE VIA GRADIENT SPACE CLUSTERING

Abstract

Traditional machine learning models focus on achieving good performance on the overall training distribution, but they often underperform on minority groups. Existing methods can improve worst-group performance, but they have several limitations: (i) they require group annotations, which are often expensive and sometimes infeasible to obtain, and/or (ii) they are sensitive to outliers. Most related works fail to solve these two issues simultaneously because the two goals conflict: methods that emphasize underrepresented data tend to amplify outliers, while methods that discard outliers risk removing minority-group data. We address the problem of learning group annotations in the presence of outliers by clustering the data in the space of gradients of the model parameters. We show that data in the gradient space has a simpler structure while preserving information about minority groups and outliers, making it suitable for standard clustering methods such as DBSCAN. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in terms of both group identification and downstream worst-group performance.
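The pipeline sketched in the abstract, per-example gradients followed by density-based clustering, can be illustrated on toy data. Everything below (the logistic-regression loss, the 2-d data, the model parameters, and the DBSCAN hyperparameters) is an illustrative assumption, not the paper's exact procedure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Toy data: a majority group, a minority group, and a few outliers.
X = np.vstack([
    rng.normal([2.0, 0.0], 0.15, size=(80, 2)),   # majority
    rng.normal([-2.0, 0.0], 0.15, size=(15, 2)),  # minority
    rng.normal([0.0, 6.0], 0.15, size=(3, 2)),    # outliers
])
y = np.concatenate([np.ones(80), np.ones(15), np.zeros(3)])

# Parameters of a (partially trained) linear model.
w = np.array([0.5, -0.2])

# Per-example gradient of the logistic loss w.r.t. w:
#   g_i = (sigmoid(w @ x_i) - y_i) * x_i
p = 1.0 / (1.0 + np.exp(-X @ w))
grads = (p - y)[:, None] * X

# Cluster in gradient space. DBSCAN labels low-density points -1 ("noise"),
# which serves as outlier detection; the remaining cluster ids can play the
# role of inferred group annotations.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(grads)
print(sorted(set(labels)))
```

On this toy example the two groups produce well-separated gradient clusters even though their raw features alone would also separate; in realistic settings the claim is that gradients expose group structure that the input space obscures.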

1. INTRODUCTION

Empirical Risk Minimization (ERM), i.e., the minimization of average training loss over the set of model parameters, is the standard training procedure in machine learning. It yields models with strong in-distribution performance* but does not guarantee satisfactory performance on minority groups that contribute relatively few data points to the training loss function (Sagawa et al., 2019; Koh et al., 2021). This effect is particularly problematic when the minority groups correspond to socially-protected groups. For example, in the toxic text classification task, certain identities are overwhelmingly abused in online conversations that form data for training models detecting toxicity (Dixon et al., 2018). Such data lacks sufficient non-toxic examples mentioning these identities, yielding problematic and unfair spurious correlations; as a result, ERM learns to associate these identities with toxicity (Dixon et al., 2018; Garg et al., 2019; Yurochkin & Sun, 2020). A related phenomenon is subpopulation shift (Koh et al., 2021), i.e., when the test distribution differs from the train distribution in terms of group proportions. Under subpopulation shift, poor performance on the minority groups in the train data translates into poor overall test distribution performance, where these groups are more prevalent or more heavily weighted. Subpopulation shift occurs in many application domains (Tatman, 2017; Beery et al., 2018; Oakden-Rayner et al., 2020; Santurkar et al., 2020; Koh et al., 2021). Prior work offers a variety of methods for training models robust to subpopulation shift and spurious correlations, including group distributionally robust optimization (gDRO) (Hu et al., 2018; Sagawa et al., 2019), importance weighting (Shimodaira, 2000; Byrd & Lipton, 2019), subsampling (Sagawa et al., 2020; Idrissi et al., 2022; Maity et al., 2022), and variations of tilted ERM (Li et al., 2020; 2021).
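To make the contrast concrete, ERM minimizes the average loss over all training points, while gDRO (Sagawa et al., 2019) minimizes the worst expected loss over groups. A standard formulation (the notation here is assumed for illustration, not quoted verbatim from the cited works) is:

```latex
% ERM: average loss over n training points, parameters \theta, loss \ell
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; x_i, y_i)

% gDRO: worst-case expected loss over groups g in a set \mathcal{G},
% where P_g is the data distribution of group g
\min_{\theta} \; \max_{g \in \mathcal{G}} \; \mathbb{E}_{(x, y) \sim P_g}\!\left[ \ell(\theta; x, y) \right]
```

The inner maximum is what requires group annotations: without knowing which group each training point belongs to, the per-group expectations cannot be estimated.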
These methods are successful in achieving comparable performance across groups in the data, but they require group annotations. The annotations can be expensive to obtain, e.g., labeling spurious backgrounds in image recognition (Beery et al., 2018) or labeling identity mentions in the toxicity example. It can also be challenging to anticipate all potential spurious correlations in advance: they could involve background, time of day, camera angle, or unanticipated identities subject to harassment.



* I.e., low loss on test data drawn from the same distribution as the training dataset.

