L2B: LEARNING TO BOOTSTRAP FOR COMBATING LABEL NOISE

Abstract

Deep neural networks are powerful tools for representation learning, but can easily overfit to noisy labels, which are prevalent in many real-world scenarios. Generally, noisy supervision stems from variation among labelers, label corruption by adversaries, etc. To combat such label noise, one popular line of work applies customized weights to the training instances, so that corrupted examples contribute less to model learning. However, such learning mechanisms potentially erase important information about the data distribution and therefore yield suboptimal results. To leverage useful information from the corrupted instances, an alternative is the bootstrapping loss, which reconstructs new training targets on the fly by incorporating the network's own predictions (i.e., pseudo-labels). In this paper, we propose a more generic learnable loss objective that enables joint reweighting of instances and labels. Specifically, our method dynamically adjusts the per-sample importance weight between the real observed labels and the pseudo-labels, where the weights are efficiently determined in a meta process. Compared to previous instance reweighting methods, our approach concurrently conducts implicit relabeling, and thereby yields substantial improvements with almost no extra cost. Extensive experimental results demonstrate the strengths of our approach over existing methods on multiple natural and medical image benchmark datasets, including CIFAR-10, CIFAR-100, ISIC2019, and Clothing1M. Code will be made publicly available.

1. INTRODUCTION

Recent advances in deep learning have achieved great success on various computer vision applications where large-scale clean datasets are available. However, noisy labels or intentional label corruption by an adversarial rival can easily cause a dramatic performance drop (Nettleton et al., 2010). This problem is even more crucial in the medical field, where high-quality annotation requires great expertise. Therefore, understanding, modeling, and learning with noisy labels has gained great momentum in recent research efforts (Frénay & Verleysen, 2013; Natarajan et al., 2013; Han et al., 2019; Li et al., 2019; Liu et al., 2020; Jiang et al., 2018; Ren et al., 2018; Xue et al., 2019; Li et al., 2020; Wang et al., 2020; Zheng et al., 2021; Yao et al., 2021; Zhu et al., 2021; Wu et al., 2021; Zhou et al., 2021). Existing methods for learning with noisy labels primarily take a loss correction strategy. One popular direction is to first estimate the noise corruption matrix and then use it to correct the loss function (Patrini et al., 2017; Goldberger & Ben-Reuven, 2017). However, correctly estimating the noise corruption matrix is usually challenging and often involves assumptions about the noise generation process (Xia et al., 2019; Liu & Tao, 2015; Hendrycks et al., 2018). Other research efforts focus on selecting clean samples from the noisy data (Jiang et al., 2018; Han et al., 2018; Yu et al., 2019; Fang et al., 2020) by treating samples with small loss as clean ones (Arpit et al., 2017). Instead of directly discarding those "unclean" examples, an extension of this idea is to assign learnable weights to each example in the noisy training set (Ren et al., 2018; Shu et al., 2019), so that noisy samples receive low weights. However, discarding or attending less to a subset of the training data (e.g., noisy samples) can erase important information about the data distribution.
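As a concrete illustration of the loss correction strategy, the following is a minimal sketch of "forward" correction in the spirit of Patrini et al. (2017), assuming a known noise transition matrix `T` where `T[i, j]` is the probability that true class `i` is observed as class `j`; the function names and the toy 20%-symmetric-flip matrix are our own, not taken from the original work:

```python
import numpy as np

def cross_entropy(probs, label):
    """Standard cross-entropy for a single example with a hard label."""
    return -np.log(probs[label])

def forward_corrected_loss(probs, noisy_label, T):
    """Forward loss correction: push the model's clean-class posterior
    through the noise transition matrix T, then score the loss against
    the *observed* (possibly noisy) label."""
    noisy_probs = T.T @ probs  # P(observed label | x)
    return -np.log(noisy_probs[noisy_label])

# Toy example: 3 classes with symmetric 20% label flipping.
T = np.full((3, 3), 0.1)
np.fill_diagonal(T, 0.8)
probs = np.array([0.7, 0.2, 0.1])  # model's predicted clean-class posterior

# Sanity check: with an identity T (no noise), forward correction
# reduces to the plain cross-entropy loss.
assert np.isclose(forward_corrected_loss(probs, 0, np.eye(3)),
                  cross_entropy(probs, 0))
```

Note that the quality of this correction hinges entirely on how well `T` is estimated, which motivates the caveats about noise-modeling assumptions discussed above.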
To fully exploit the corrupted training samples, another direction is to leverage the network predictions (i.e., pseudo-labels (Lee et al., 2013)) to correct or reweight the original labels (Reed et al., 2014; Tanaka et al., 2018), so that the holistic data distribution information can be preserved during network training. One representative work is the bootstrapping loss (Reed et al., 2014), which introduces a perceptual consistency term in the learning objective that assigns a weight to the pseudo-labels to compensate for the erroneous guidance of noisy samples. However, in this strategy the weight for the pseudo-labels is manually selected and remains the same for all training samples, which does not prevent fitting the noisy ones and can even lead to low-quality label correction (Arazo et al., 2019). To tackle this challenge, Arazo et al. (2019) designed a dynamic bootstrapping strategy that adjusts the label weight by fitting a mixture model. Instead of separately reweighting labels or instances, in this paper we propose a more generic learning strategy that enables joint instance and label reweighting. We term our method Learning to Bootstrap (L2B), as we leverage the learner's own predictions to bootstrap itself up for combating label noise from a meta-learning perspective. During each training iteration, L2B learns to dynamically re-balance the importance between the real observed labels and the pseudo-labels, where the per-sample weights are determined by the validation performance on a separate clean set through a meta network. Unlike the bootstrapping loss used in (Reed et al., 2014; Arazo et al., 2019; Zhang et al., 2020), which explicitly conducts relabeling by taking a weighted sum of the pseudo- and real labels, L2B instead reweights the two losses associated with the pseudo- and real labels (where the weights need not sum to 1).
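The connection between the two views can be checked numerically: because cross-entropy is linear in the target distribution, taking a convex combination of the real and pseudo labels inside the target (bootstrapping) is identical to taking the same convex combination of the two separate losses. A small sketch with toy numbers of our own choosing:

```python
import numpy as np

def cross_entropy(target, probs):
    """CE between a (possibly soft) target distribution and predictions."""
    return -np.sum(target * np.log(probs))

probs = np.array([0.6, 0.3, 0.1])   # network predictions p
real = np.array([0.0, 1.0, 0.0])    # observed (possibly noisy) one-hot label
pseudo = probs                      # soft pseudo-label = model's own prediction
beta = 0.8                          # weight on the real label

# Bootstrapping loss: relabel first, then take a single CE.
bootstrap = cross_entropy(beta * real + (1 - beta) * pseudo, probs)

# Loss-reweighting view: one CE term per label source.
reweighted = (beta * cross_entropy(real, probs)
              + (1 - beta) * cross_entropy(pseudo, probs))

assert np.isclose(bootstrap, reweighted)  # identical by linearity of CE
```

L2B generalizes the right-hand form: the two coefficients become per-sample, learnable, and no longer constrained to sum to 1.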
In addition, we theoretically prove that our formulation, which reweights different loss terms, reduces to the original bootstrapping loss and therefore conducts an implicit relabeling. By learning these weights in a meta process, L2B yields substantial improvements (e.g., +8.9% on CIFAR-100 with 50% noise) over the instance reweighting baseline with almost no extra cost. We conduct extensive experiments on public natural image datasets (i.e., CIFAR-10, CIFAR-100, and Clothing1M) and a medical image dataset (i.e., ISIC2019), under different types of simulated noise and real-world noise. Our method outperforms various existing explicit label correction and instance reweighting approaches, demonstrating its strengths. Our main contributions are as follows:
• We propose a generic learnable loss objective that enables joint instance and label reweighting for combating label noise in deep learning models.
• We prove that our new objective is, in fact, a more general form of the bootstrapping loss, and propose L2B to efficiently solve for the weights in a meta-learning framework.
• Compared with previous instance reweighting methods, L2B exploits noisy examples more effectively, without discarding them, by jointly re-balancing the contributions of real and pseudo labels.
• We show theoretical convergence guarantees for L2B, and superior results on natural and medical image recognition tasks under both synthetic and real-world noise.
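To make the meta-learning ingredient concrete, here is a minimal sketch of how per-sample weights can be obtained from a one-step look-ahead on a clean validation set, in the spirit of meta-reweighting (Ren et al., 2018). For readability we use a logistic-regression "network" and finite differences in place of second-order automatic differentiation; all function names and simplifications are ours, not the paper's actual implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(w, X, y):
    """Mean binary cross-entropy of a logistic model with weights w."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def weighted_grad(w, X, y, eps):
    """Gradient of the eps-weighted BCE w.r.t. the model parameters w."""
    p = sigmoid(X @ w)
    return X.T @ (eps * (p - y))

def meta_reweight_step(w, X, y, Xv, yv, lr=0.5, h=1e-4):
    """One meta-step: perturb each sample's weight, take a virtual SGD
    step, and measure the clean-validation loss; samples whose
    up-weighting lowers the validation loss receive positive weight."""
    n = len(y)
    def val_loss(eps):
        w_virtual = w - lr * weighted_grad(w, X, y, eps)
        return bce(w_virtual, Xv, yv)
    eps0 = np.zeros(n)
    meta_grad = np.array([
        (val_loss(eps0 + h * np.eye(n)[i])
         - val_loss(eps0 - h * np.eye(n)[i])) / (2 * h)
        for i in range(n)
    ])
    weights = np.maximum(-meta_grad, 0.0)  # clip negative meta-gradients
    total = weights.sum()
    return weights / total if total > 0 else weights

# Toy data: sample 0 is cleanly labeled, sample 1 has a flipped label.
X = np.array([[2.0], [2.0]])
y = np.array([1.0, 0.0])                      # second label is corrupted
Xv, yv = np.array([[2.0]]), np.array([1.0])   # clean validation point
w = np.zeros(1)

weights = meta_reweight_step(w, X, y, Xv, yv)
assert weights[0] > weights[1]  # the clean sample dominates
```

L2B extends this recipe by attaching two such learnable weights to every sample, one for the loss under the observed label and one for the loss under the pseudo-label, rather than a single instance weight.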

2. RELATED WORKS

Learning through explicit relabeling. To effectively handle noisy supervision, many works propose to directly correct the training labels by estimating the noise transition matrix (Xia et al., 2019; Yao et al., 2020; Goldberger & Ben-Reuven, 2017; Patrini et al., 2017) or by modeling noise with graphical models or neural networks (Xiao et al., 2015; Vahdat, 2017; Veit et al., 2017; Lee et al., 2018). Patrini et al. (2017) estimate the label corruption matrix to directly correct the loss function. Hendrycks et al. (2018) further propose to improve the corruption matrix by using a clean set of data, which then enables training a corrected classifier. However, these methods usually require assumptions about noise modeling. For instance, Hendrycks et al. (2018) assume that the noisy label depends only on the true label and is independent of the data. Another line of approaches leverages the network prediction for explicit relabeling. Some methods (Tanaka et al., 2018; Yi & Wu, 2019) relabel the samples by directly using pseudo-labels in an iterative manner. Han et al. (2019) use generated prototypes as pseudo-labels to be more noise tolerant. Instead of assigning the pseudo-labels as supervision, Reed et al. (2014) propose to generate new training targets via a convex combination of the real and pseudo labels. In a recent study, Ortego et al. (2021) directly apply this strategy for classification refinement and combine it with contrastive learning for training noise-robust models. However,

