A TIME-CONSISTENCY CURRICULUM FOR LEARNING FROM INSTANCE-DEPENDENT NOISY LABELS

Abstract

Many machine learning algorithms are known to be fragile even under simple instance-independent noisy labels. Noisy labels in real-world data are more devastating, however, since they are produced by more complicated mechanisms in an instance-dependent manner. In this paper, we target the practical challenge of instance-dependent noisy labels by jointly training (1) a model that reverse-engineers the noise-generating mechanism, producing an instance-dependent mapping between the clean label posterior and the observed noisy label; and (2) a robust classifier that produces clean label posteriors. Compared to previous methods, the former model is novel and enables end-to-end learning of the latter directly from noisy labels. An extensive empirical study indicates that the time-consistency of data is critical to the success of training both models, and motivates us to develop a curriculum that selects training data based on the dynamics of the two models' outputs over the course of training. We show that the curriculum-selected data provide both clean labels and high-quality input-output pairs for training the two models. This leads to promising and robust classification performance even in notably challenging settings of instance-dependent noisy labels where many state-of-the-art methods can easily fail. Extensive experimental comparisons and ablation studies further demonstrate the advantages and significance of the time-consistency curriculum in learning from instance-dependent noisy labels on multiple benchmark datasets.

1. INTRODUCTION

The training of neural networks can easily fail in the presence of even simple instance-independent noisy labels, since the networks quickly overfit the noise (Zhang et al., 2017). In practice, however, it is usually challenging to control the labeling quality of large-scale datasets because the labels are generated by complicated mechanisms such as non-expert workers (Han et al., 2020b). An average of 3.3% noisy labels has been identified in the test/validation sets of 10 of the most commonly used datasets in computer vision, natural language, and audio analysis (Northcutt et al., 2021). Moreover, real-world noisy labels are generated in an instance-dependent manner, which is significantly more challenging to address than the most widely studied but oversimplified instance-independent noise, which assumes the noise depends only on the class (Wei et al., 2022). Two principal methodologies have been developed to address label noise: (1) detecting samples (X, Ỹ) with correct labels Ỹ = Y (empirically, the ones with the smallest loss values) and using them to train a clean classifier (Han et al., 2018b; Yu et al., 2019); and (2) learning the noise-generating mechanism, i.e., a transition matrix T defining the mapping between the clean label Y and the noisy label Ỹ such that P(Ỹ | X) = T⊤P(Y | X), where P(· | X) denotes the posterior vector, and then using it to build statistically consistent classifiers (Liu & Tao, 2016; Patrini et al., 2017; Yang et al., 2021).
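As a minimal numerical sketch of the transition-matrix formulation P(Ỹ | X) = T⊤P(Y | X), the example below applies a hypothetical 3-class transition matrix (all values chosen purely for illustration, not taken from the paper) to a clean label posterior and recovers the induced noisy label posterior:

```python
import numpy as np

# Hypothetical 3-class transition matrix (illustrative values only).
# T[i, j] = P(Ỹ = j | Y = i): probability that clean class i is observed as j.
# Each row sums to 1, so T maps distributions to distributions.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
])

# Clean label posterior P(Y | X) for one instance X (also illustrative).
p_clean = np.array([0.9, 0.05, 0.05])

# Noisy label posterior: P(Ỹ | X) = T^T P(Y | X).
p_noisy = T.T @ p_clean

print(p_noisy)        # [0.735 0.13  0.135]
print(p_noisy.sum())  # 1.0 — probability mass is preserved
```

In the class-dependent (instance-independent) setting a single T is shared by all instances; the instance-dependent setting studied here replaces it with a per-instance T(X), which is what makes estimation so much harder.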
Although both methodologies have achieved promising results in the simplified instance-independent (class-dependent) setting, they have non-trivial drawbacks when applied to the more practical but complicated instance-dependent noise: (1) the "small loss" trick is no longer effective at detecting correct labels (Cheng et al., 2021), because the loss threshold varies drastically across instances and is determined by each transition matrix T(X); (2) the instance-dependent transition matrix T(X) is not identifiable given only the noisy sample, and its estimation heavily relies on an estimate of the clean label Y in the triple (X, Y, Ỹ) (Yang et al., 2021), which is precisely the unsolved challenge in (1).
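To make drawback (1) concrete, here is a minimal sketch of the "small loss" selection trick; the per-sample losses and the function name `small_loss_select` are illustrative, not from the paper. Note that the selection uses a single global cutoff, which is exactly the assumption that breaks when the appropriate threshold varies per instance with T(X):

```python
import numpy as np

# Hypothetical per-sample cross-entropy losses from the current model
# on a batch with noisy labels (values are illustrative only).
losses = np.array([0.12, 2.31, 0.08, 1.75, 0.25, 0.40, 3.10, 0.15])

def small_loss_select(losses, keep_ratio=0.5):
    """Keep the keep_ratio fraction of samples with the smallest loss,
    treating them as presumably clean. A single global cutoff is applied
    to all samples -- the assumption that fails under instance-dependent
    noise, where the right threshold depends on each instance's T(X)."""
    k = int(len(losses) * keep_ratio)
    return np.argsort(losses)[:k]

clean_idx = small_loss_select(losses)
print(sorted(clean_idx.tolist()))  # [0, 2, 4, 7]
```

Under class-dependent noise this heuristic works because low loss correlates with a correct label uniformly across the dataset; under instance-dependent noise, a hard instance with a correct label can easily incur a larger loss than an easy instance with a wrong one, so the global ranking mixes the two.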

