L2B: LEARNING TO BOOTSTRAP FOR COMBATING LABEL NOISE

Abstract

Deep neural networks are powerful tools for representation learning, but they can easily overfit to noisy labels, which are prevalent in many real-world scenarios. Generally, noisy supervision stems from variation among labelers, label corruption by adversaries, etc. To combat such label noise, one popular line of work applies customized weights to the training instances so that corrupted examples contribute less to model learning. However, such learning mechanisms potentially erase important information about the data distribution and therefore yield suboptimal results. To leverage useful information from the corrupted instances, an alternative is the bootstrapping loss, which reconstructs new training targets on the fly by incorporating the network's own predictions (i.e., pseudo-labels). In this paper, we propose a more generic learnable loss objective that enables joint reweighting of instances and labels at once. Specifically, our method dynamically adjusts the per-sample importance weight between the observed labels and the pseudo-labels, where the weights are efficiently determined in a meta process. Compared to previous instance reweighting methods, our approach concurrently performs implicit relabeling and thereby yields substantial improvements at almost no extra cost. Extensive experimental results demonstrate the strengths of our approach over existing methods on multiple natural and medical image benchmark datasets, including CIFAR-10, CIFAR-100, ISIC2019, and Clothing1M. Code will be made publicly available.

1. INTRODUCTION

Recent advances in deep learning have achieved great success on various computer vision applications where large-scale clean datasets are available. However, noisy labels or intentional label corruption by an adversary can easily cause a dramatic performance drop (Nettleton et al., 2010). This problem is even more critical in the medical field, given that high-quality annotation requires great expertise. Therefore, understanding, modeling, and learning with noisy labels have gained great momentum in recent research efforts (Frénay & Verleysen, 2013; Natarajan et al., 2013; Han et al., 2019; Li et al., 2019; Liu et al., 2020; Jiang et al., 2018; Ren et al., 2018; Xue et al., 2019; Li et al., 2020; Wang et al., 2020; Zheng et al., 2021; Yao et al., 2021; Zhu et al., 2021; Wu et al., 2021; Zhou et al., 2021). Existing methods for learning with noisy labels primarily take a loss correction strategy. One popular direction is to first estimate the noise corruption matrix and then use it to correct the loss function (Patrini et al., 2017; Goldberger & Ben-Reuven, 2017). However, correctly estimating the noise corruption matrix is usually challenging and often involves assumptions about the noise generation process (Xia et al., 2019; Liu & Tao, 2015; Hendrycks et al., 2018). Other research efforts focus on selecting clean samples from the noisy data (Jiang et al., 2018; Han et al., 2018; Yu et al., 2019; Fang et al., 2020) by treating samples with small loss as clean ones (Arpit et al., 2017). Instead of directly discarding those "unclean" examples, an extension of this idea assigns learnable weights to each example in the noisy training set (Ren et al., 2018; Shu et al., 2019), so that noisy samples receive low weights. However, discarding or attending less to a subset of the training data (e.g., noisy samples) can erase important information about the data distribution.
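To make the contrast concrete, the following is a minimal NumPy sketch of a bootstrapped cross-entropy of the kind discussed here: instead of down-weighting a suspect example entirely, its training target becomes a per-sample convex combination of the observed one-hot label and the network's own prediction. The function name and the per-sample weight `alpha` are illustrative choices, not the paper's actual implementation; in our method such weights would be determined by a meta process rather than fixed by hand.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def bootstrap_cross_entropy(logits, y_obs, alpha):
    """Cross-entropy against a bootstrapped target.

    logits : (n, k) raw model outputs
    y_obs  : (n,)   observed (possibly noisy) integer labels
    alpha  : (n,)   per-sample weight on the observed label;
                    1 - alpha goes to the model's pseudo-label.
    Returns the per-sample loss, shape (n,).
    """
    p = softmax(logits)
    n, k = p.shape
    onehot = np.eye(k)[y_obs]
    # Convex combination of observed label and pseudo-label.
    target = alpha[:, None] * onehot + (1.0 - alpha[:, None]) * p
    return -(target * np.log(p + 1e-12)).sum(axis=1)
```

With `alpha = 1` this reduces to the standard cross-entropy on the observed labels; with `alpha = 0` the target is the prediction itself, so the loss becomes the entropy of the model's output. Intermediate values softly relabel suspect examples while keeping them in the training signal.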
To fully exploit the corrupted training samples, another direction is to leverage the network predictions (i.e., pseudo-labels (Lee et al., 2013)) to correct or reweight the original labels (Reed et al.,

