MITIGATING DATASET BIAS BY USING PER-SAMPLE GRADIENT

Abstract

The performance of deep neural networks is strongly influenced by how the training dataset is constructed. In particular, when attributes with a strong correlation with the target attribute are present, the trained model can make unintended prejudgments and show significant inference errors (i.e., the dataset bias problem). Various methods have been proposed to mitigate dataset bias, with an emphasis on weakly correlated samples, called bias-conflicting samples. Early methods relied on explicit bias labels provided by humans, which are costly to obtain. Recently, several studies have sought to reduce human intervention by utilizing output space values of neural networks, such as features, logits, loss, or accuracy. However, these output space values may be insufficient for the model to understand the bias attributes well. In this study, we propose a debiasing algorithm leveraging gradients, called Per-sample Gradient-based Debiasing (PGD). PGD comprises three steps: (1) training a model with uniform batch sampling, (2) setting the importance of each sample in proportion to the norm of its gradient, and (3) retraining the model using importance-weighted batch sampling with the probabilities obtained in step (2). Compared with existing baselines on various datasets, the proposed method achieves state-of-the-art classification accuracy. Furthermore, we provide a theoretical analysis of how PGD mitigates dataset bias.
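The three steps of PGD described above can be sketched as a toy NumPy example. The logistic-regression "model," the synthetic biased dataset, and all hyperparameters below are illustrative assumptions, not the paper's experimental setup:

```python
# Minimal sketch of the three PGD steps on a toy logistic-regression
# model (NumPy only).  Dataset, model, and hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy biased binary dataset: feature 0 is the bias attribute (strongly
# correlated with the label), feature 1 carries the true target signal.
n = 200
y = rng.integers(0, 2, n)
bias = np.where(rng.random(n) < 0.95, y, 1 - y)       # 95% bias-aligned
X = np.stack([bias + 0.1 * rng.standard_normal(n),
              0.3 * (2 * y - 1) + rng.standard_normal(n)], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd(w, probs, steps=500, lr=0.5, batch=16):
    """Train logistic regression, drawing batches with probabilities `probs`."""
    for _ in range(steps):
        idx = rng.choice(n, size=batch, p=probs)
        p = sigmoid(X[idx] @ w)
        w -= lr * (X[idx] * (p - y[idx])[:, None]).mean(axis=0)
    return w

# Step 1: train with uniform batch sampling (yields a biased model).
uniform = np.full(n, 1.0 / n)
w_biased = sgd(np.zeros(2), uniform)

# Step 2: per-sample gradient norm under the biased model;
# bias-conflicting samples tend to have large norms.
g = X * (sigmoid(X @ w_biased) - y)[:, None]          # per-sample gradients
norms = np.linalg.norm(g, axis=1)
probs = norms / norms.sum()                           # importance weights

# Step 3: retrain with importance-batch sampling.
w_debiased = sgd(np.zeros(2), probs)

conflict = bias != y                                  # bias-conflicting mask
print("mean grad norm (conflicting):", norms[conflict].mean())
print("mean grad norm (aligned):   ", norms[~conflict].mean())
```

In this sketch the bias-conflicting samples (where the bias feature disagrees with the label) receive larger sampling probabilities in step 3, which is the mechanism PGD relies on.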

1. INTRODUCTION

Dataset bias (Torralba & Efros, 2011; Shrestha et al., 2021) is a training-data problem that arises when unintended, easier-to-learn attributes (i.e., bias attributes) that are highly correlated with the target attribute are present (Shah et al., 2020; Ahmed et al., 2020). The model can then infer outputs by focusing on the bias features, which may lead to failures at test time. For example, most "camel" images include a "desert background," and this unintended correlation can provide a false shortcut for answering "camel" on the basis of the "desert." In Nam et al. (2020) and Lee et al. (2021), samples with a strong correlation (like the aforementioned desert/camel) are called "bias-aligned samples," while samples with a weak correlation (like "camel on the grass" images) are termed "bias-conflicting samples." To reduce dataset bias, initial studies (Kim et al., 2019; McDuff et al., 2019; Singh et al., 2020; Li & Vasconcelos, 2019) frequently assumed that labels for the bias attributes are provided, but such additional labels are expensive because they require human effort. Alternatively, the bias type, such as "background," is assumed to be known in (Lee et al., 2019; Geirhos et al., 2018; Bahng et al., 2020; Cadene et al., 2019; Clark et al., 2019). However, assuming such prior knowledge from humans is still unreasonable, since even humans cannot predict the types of bias that may exist in a large dataset (Schäfer, 2016); data for deep learning are typically collected by web crawling without thorough consideration of the dataset bias problem. Recent studies (Le Bras et al., 2020; Nam et al., 2020; Kim et al., 2021; Lee et al., 2021; Seo et al., 2022; Zhang et al., 2022b) have replaced human intervention with DNN outputs, identifying bias-conflicting samples via empirical metrics on the output space (e.g., training loss and accuracy). For example, Nam et al. (2020) suggested a "relative difficulty" score based on per-sample training loss and regarded samples with high "relative difficulty" as bias-conflicting.

Most previous research has focused on the output space, such as the feature space (penultimate-layer output) (Lee et al., 2021; Kim et al., 2021; Bahng et al., 2020; Seo et al., 2022; Zhang et al., 2022b), loss (Nam et al., 2020), and accuracy (Le Bras et al., 2020; Liu et al., 2021). However, this limited output space can restrict how precisely the data are described. Recently, as an alternative, the model parameter space (e.g., gradients (Huang et al., 2021; Killamsetty et al., 2021b; Mirzasoleiman et al., 2020)) has been used to obtain larger performance gains than output-space approaches on various target tasks. For example, Huang et al. (2021) used the gradient norm to detect out-of-distribution samples and showed that the gradient of the FC layer $\in \mathbb{R}^{h \times c}$ can capture joint information between the feature and the softmax output, where $h$ and $c$ are the dimensions of the feature and output vectors, respectively. Since the gradient of each data point $\in \mathbb{R}^{h \times c}$ constitutes high-dimensional information, it is much more informative than output-space quantities such as the logits $\in \mathbb{R}^{c}$ and features $\in \mathbb{R}^{h}$. However, no existing approach tackles the dataset bias problem using a gradient-norm-based metric.

In this paper, we present a resampling method based on the per-sample gradient norm to mitigate dataset bias. Furthermore, we theoretically justify that gradient-norm-based resampling is an effective debiasing approach. Our key contributions can be summarized as follows:

• We propose Per-sample Gradient-norm based Debiasing (PGD), a simple and efficient gradient-norm-based debiasing method.
PGD is motivated by prior research (Mirzasoleiman et al., 2020; Huang et al., 2021; Killamsetty et al., 2021b) demonstrating that gradients are effective at finding rare samples, and this property also applies to finding the bias-conflicting samples in the dataset bias problem (see Section 3 and Appendix E).

• PGD outperforms other debiasing methods on various benchmarks, including colored MNIST (CMNIST), multi-bias MNIST (MBMNIST), corrupted CIFAR (CCIFAR), biased action recognition (BAR), biased FFHQ (BFFHQ), CelebA, and CivilComments-WILDS. In particular, on colored MNIST, the proposed method improves unbiased test accuracy over the vanilla model and the best-performing baseline by 35.94% and 2.32%, respectively (see Section 4).

• We provide theoretical evidence for the superiority of PGD. To this end, we first show that minimizing the trace of the inverse Fisher information is a good objective for mitigating dataset bias. In particular, PGD, which resamples based on gradient norms computed by the biased model, is a practical optimizer for this objective (see Section 5).

2. PRELIMINARIES

Dataset bias. Suppose that a training set $D_n$ comprises images as shown in Figure 1, and that the objective is to classify the digits. Each image can be described by a set of attributes (e.g., the first image in Figure 1 can be described by {digit 0, red, thin, ...}). The goal of training a classifier is to find a model parameter $\theta$ that correctly predicts the target attribute (e.g., digit). Note that the target attributes are also interpreted as classes. However, we focus on the case where another attribute that is strongly correlated with the target exists; we call such attributes bias attributes. For example, in Figure 1, the bias attribute is color. Furthermore, samples



Classification model. We first describe the conventional supervised learning setting. Consider the classification problem with a training dataset $D_n = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is an input image and $y_i$ its corresponding label. Assuming there are $c \in \mathbb{N} \setminus \{1\}$ classes, $y_i$ takes a value in the set $C = \{1, \ldots, c\}$. Note that we focus on the situation where $D_n$ contains no noisy samples, e.g., noisy labels or out-of-distribution samples (such as SVHN samples when the task is CIFAR-10). Given an input $x_i$, $f(y_i|x_i, \theta)$ denotes the softmax output of the classifier for label $y_i$, derived from the model parameter $\theta \in \mathbb{R}^d$. The cross-entropy (CE) loss $L_{CE}$ is commonly used to train the classifier and, when the label is one-hot encoded, is defined as $L_{CE}(x_i, y_i; \theta) = -\log f(y_i|x_i, \theta)$.
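The CE loss above and the per-sample gradient with respect to the final FC layer, from which the gradient norm discussed earlier is computed, can be sketched as follows. This is a NumPy sketch under assumed dimensions $h$ and $c$; the closed form $(f - \text{onehot}(y))\, z^{\top}$ is the standard softmax-plus-CE gradient, not a quantity specific to this paper:

```python
# Per-sample cross-entropy and its closed-form gradient with respect to
# the final FC-layer weights W (shape c x h).  Dimensions and random
# inputs are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
h, c = 5, 3                       # feature and output dimensions
W = rng.standard_normal((c, h))   # final FC layer (bias term omitted)
z = rng.standard_normal(h)        # penultimate-layer feature of one sample
y = 1                             # ground-truth class index

logits = W @ z
f = np.exp(logits - logits.max())
f /= f.sum()                      # softmax output f(y|x, theta)

loss = -np.log(f[y])              # L_CE(x, y; theta) = -log f(y|x, theta)

# Closed form: dL/dW = (f - onehot(y)) z^T, an (h*c)-dimensional object,
# so its norm mixes feature and softmax information (cf. Huang et al., 2021).
onehot = np.eye(c)[y]
grad_W = np.outer(f - onehot, z)
grad_norm = np.linalg.norm(grad_W)

# The Frobenius norm factorizes: ||dL/dW||_F = ||f - onehot(y)|| * ||z||.
assert np.isclose(grad_norm,
                  np.linalg.norm(f - onehot) * np.linalg.norm(z))
```

The factorized norm makes explicit why the FC-layer gradient norm is larger for samples whose softmax output is far from the label, such as bias-conflicting samples under a biased model.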

Figure 1: Target attribute (digit shape) and bias attribute (color).

