SELECTIVE CLASSIFICATION CAN MAGNIFY DISPARITIES ACROSS GROUPS

Abstract

Selective classification, in which models can abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior consistently across five vision and NLP datasets. Surprisingly, increasing abstentions can even decrease accuracies on some groups. To better understand this phenomenon, we study the margin distribution, which captures the model's confidences over all predictions. For symmetric margin distributions, we prove that whether selective classification monotonically improves or worsens accuracy is fully determined by the accuracy at full coverage (i.e., without any abstentions) and whether the distribution satisfies a property we call left-log-concavity. Our analysis also shows that selective classification tends to magnify full-coverage accuracy disparities. Motivated by our analysis, we train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group on these models. Altogether, our results suggest that selective classification should be used with care and underscore the importance of training models to perform equally well across groups at full coverage.

1. INTRODUCTION

Selective classification, in which models make predictions only when their confidence is above a threshold, is a natural approach when errors are costly but abstentions are manageable. For example, in medical and criminal justice applications, model mistakes can have serious consequences, whereas abstentions can be handled by backing off to the appropriate human experts. Prior work has shown that, across a broad array of applications, more confident predictions tend to be more accurate (Hanczar & Dougherty, 2008; Yu et al., 2011; Toplak et al., 2014; Mozannar & Sontag, 2020; Kamath et al., 2020). By varying the confidence threshold, we can select an appropriate trade-off between the abstention rate and the (selective) accuracy of the predictions made. In this paper, we report a cautionary finding: while selective classification improves average accuracy, it can magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior across five vision and NLP datasets and two popular selective classification methods: softmax response (Cordella et al., 1995; Geifman & El-Yaniv, 2017) and Monte Carlo dropout (Gal & Ghahramani, 2016). Surprisingly, we find that increasing the abstention rate can even decrease accuracies on the groups that have lower accuracies at full coverage: on those groups, the models are not only wrong more frequently, but their confidence can actually be anticorrelated with whether they are correct. Even on datasets where selective classification improves accuracies across all groups, we find that it preferentially helps groups that already have high accuracies, further widening group disparities.
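The threshold-based mechanism above can be made concrete with a minimal sketch. The helper `selective_metrics` and the two-group synthetic data below are illustrative assumptions, not the paper's experimental setup: a model abstains whenever its top softmax probability falls below a threshold, and we measure coverage (fraction of predictions made) and selective accuracy (accuracy on those predictions) per group. The toy minority group is constructed so that confidence is anticorrelated with correctness, reproducing the failure mode described.

```python
import numpy as np

def selective_metrics(confidences, correct, threshold):
    """Softmax-response selective classification: predict only when the
    top softmax probability is at least `threshold`, abstain otherwise.
    Returns (coverage, selective accuracy over the predictions made)."""
    accepted = confidences >= threshold
    coverage = accepted.mean()
    # Selective accuracy is undefined at zero coverage.
    sel_acc = correct[accepted].mean() if accepted.any() else float("nan")
    return coverage, sel_acc

# Toy illustration (hypothetical numbers): a majority group whose
# confidence tracks correctness, and a minority group whose confidence
# is anticorrelated with correctness.
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)              # 0 = majority, 1 = minority
correct = np.where(group == 0,
                   rng.random(n) < 0.9,         # majority: ~90% accurate
                   rng.random(n) < 0.6)         # minority: ~60% accurate
conf_majority = np.where(correct, 0.8, 0.6)     # confident when correct
conf_minority = np.where(correct, 0.6, 0.8)     # confident when WRONG
conf = np.where(group == 0, conf_majority, conf_minority) + 0.1 * rng.random(n)

for g in (0, 1):
    cov, acc = selective_metrics(conf[group == g], correct[group == g], 0.75)
    print(f"group {g}: coverage={cov:.2f}, selective accuracy={acc:.2f}")
```

At this threshold, the majority group's selective accuracy rises above its full-coverage accuracy, while the minority group's falls, since abstention preferentially discards its correct predictions.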
These group disparities are especially problematic in the same high-stakes areas where we might want to deploy selective classification, like medicine and criminal justice; there, poor performance on particular groups is already a significant issue (Chen et al., 2020; Hill, 2020). For example, we study a variant of CheXpert (Irvin et al., 2019), where the task is to predict if a patient has pleural

