LEARNING TO SPLIT FOR AUTOMATIC BIAS DETECTION

Abstract

Classifiers are biased when trained on biased datasets. As a remedy, we propose Learning to Split (ls), an algorithm for automatic bias detection. Given a dataset with input-label pairs, ls learns to split this dataset so that predictors trained on the training split cannot generalize to the testing split. This performance gap suggests that the testing split is under-represented in the dataset, which is a signal of potential bias. Identifying non-generalizable splits is challenging since we have no annotations about the bias. In this work, we show that the prediction correctness of each example in the testing split can be used as a source of weak supervision: generalization performance will drop if we move examples that are predicted correctly away from the testing split, leaving only those that are mispredicted. ls is task-agnostic and can be applied to any supervised learning problem, ranging from natural language understanding and image classification to molecular property prediction. Empirical results show that ls is able to generate astonishingly challenging splits that correlate with human-identified biases. Moreover, we demonstrate that combining robust learning algorithms (such as group DRO) with splits identified by ls enables automatic de-biasing. Compared to previous state-of-the-art, we substantially improve the worst-group performance (23.4% on average) when the source of biases is unknown during training and validation. Our code is included in the supplemental materials and will be publicly available.

1. INTRODUCTION

Recent work has shown promising results on de-biasing when the sources of bias (e.g., gender, race) are known a priori (Ren et al., 2018; Sagawa et al., 2019; Clark et al., 2019; He et al., 2019; Mahabadi et al., 2020; Kaneko & Bollegala, 2021). However, in the general case, identifying bias in an arbitrary dataset may be challenging even for domain experts: it requires expert knowledge of the task and of the details of the annotation protocols (Zellers et al., 2019; Sakaguchi et al., 2020). In this work, we study automatic bias detection: given a dataset with only input-label pairs, our goal is to detect biases that may hinder predictors' generalization performance.

We propose Learning to Split (ls), an algorithm that simulates generalization failure directly from the set of input-label pairs. Specifically, ls learns to split the dataset so that predictors trained on the training split cannot generalize to the testing split (Figure 1). This performance gap indicates that the testing split is under-represented among the set of annotations, which is a signal of potential bias.

The challenge in this seemingly simple formulation lies in the existence of many trivial splits. For example, poor testing performance can result from a training split that is much smaller than the testing split (Figure 2a). Classifiers will also fail if the training split contains all positive examples, leaving the testing split with only negative examples (Figure 2b). The poor generalization of these trivial solutions arises from the lack of training data or from label imbalance, and it does not reveal the hidden biases. To ensure that the learned splits are meaningful, we impose two regularity constraints on the splits. First, the size of the training split must be comparable to the size of the testing split. Second, the marginal distribution of the labels should be similar across the two splits.

Our algorithm ls consists of two components, a Splitter and a Predictor.
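The two regularity constraints can be checked mechanically for any candidate split. The following sketch is illustrative only: the thresholds `max_size_ratio` and `max_label_gap` are assumptions for exposition, not values specified by the paper.

```python
from collections import Counter


def is_regular_split(labels, assignment, max_size_ratio=2.0, max_label_gap=0.1):
    """Check the two regularity constraints on a candidate split.

    assignment[i] is True if example i goes to the training split.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    train = [y for y, a in zip(labels, assignment) if a]
    test = [y for y, a in zip(labels, assignment) if not a]
    if not train or not test:
        return False
    # Constraint 1: the split sizes must be comparable.
    ratio = max(len(train), len(test)) / min(len(train), len(test))
    if ratio > max_size_ratio:
        return False
    # Constraint 2: the label marginals must be similar across splits.
    train_freq, test_freq = Counter(train), Counter(test)
    for y in set(labels):
        gap = abs(train_freq[y] / len(train) - test_freq[y] / len(test))
        if gap > max_label_gap:
            return False
    return True
```

A split that puts all positive labels on one side, or nine-tenths of the data on one side, fails these checks, ruling out the trivial solutions described above.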
At each iteration, the Splitter first assigns each input-label pair to either the training split or the testing split. The Predictor then takes the training split and learns to predict the label from the input. Its prediction performance on the testing split is used to guide the Splitter towards a more challenging split (under the regularity constraints).
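The interaction between the Splitter and the Predictor, and the weak-supervision signal from prediction correctness, can be sketched as a simple loop. This is a deliberate simplification under stated assumptions: the real Splitter is a learned model, whereas here we use the correctness signal directly, moving correctly-predicted test examples into the training split so that only hard examples remain. The helper `nn_predictor` is a hypothetical stand-in for the Predictor.

```python
import random


def toy_ls(examples, train_predictor, n_iters=10, seed=0):
    """A minimal sketch of the ls loop (illustrative, not the paper's
    gradient-based Splitter).

    examples: list of (x, y) pairs.
    train_predictor: fits on a list of pairs, returns a function x -> y.
    Returns the final assignment: True = training split, False = testing split.
    """
    rng = random.Random(seed)
    assignment = [rng.random() < 0.5 for _ in examples]  # random initial split
    for _ in range(n_iters):
        train = [ex for ex, a in zip(examples, assignment) if a]
        test = [(i, ex) for i, (ex, a) in enumerate(zip(examples, assignment)) if not a]
        if not train or not test:
            break
        predict = train_predictor(train)
        # Weak supervision: correctly-predicted test examples are moved to
        # the training split, leaving only mispredicted ones behind.
        moved = 0
        for i, (x, y) in test:
            if predict(x) == y:
                assignment[i] = True
                moved += 1
        if moved == 0:  # converged: the test split is maximally hard
            break
    return assignment


def nn_predictor(train):
    """Hypothetical 1-nearest-neighbor Predictor for scalar inputs."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict
```

Note that this sketch omits the regularity constraints; the full algorithm enforces them while searching for the split, so the test split cannot degenerate into an empty or label-skewed set.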

