IMPROVING THE ESTIMATION OF INSTANCE-DEPENDENT TRANSITION MATRIX BY USING SELF-SUPERVISED LEARNING

Anonymous authors
Paper under double-blind review

Abstract

The transition matrix reveals the transition relationship between clean labels and noisy labels and plays an important role in building statistically consistent classifiers. In real-world applications, the transition matrix is usually unknown and has to be estimated. Estimating it accurately is challenging, especially when it depends on the instance. Given that both instances and noisy labels are available, the major difficulty of learning the transition matrix comes from the absence of clean-label information. Many methods have been proposed to infer clean-label information, and self-supervised learning has demonstrated great success at this: self-supervised methods can achieve performance comparable to supervised learning on some datasets without requiring any labels during training, which implies that they can efficiently infer clean labels. Motivated by this, in this paper, we propose a practical method that leverages self-supervised learning to obtain nearly clean labels and thereby help the learning of the instance-dependent transition matrix. Empirically, the proposed method achieves state-of-the-art performance on different datasets.

1. INTRODUCTION

Recently, more researchers in the deep learning community have placed emphasis on learning with noisy labels (Jiang et al., 2018; Liu, 2021; Yao et al., 2021b; Bai et al., 2021; Ciortan et al., 2021). This is because manually annotating large-scale datasets is labor-intensive and time-consuming, so cheap but imperfect methods, e.g., crowdsourcing and web crawling, have been used to collect large-scale datasets which usually contain label errors. Existing work shows that training deep learning models on these datasets can lead to performance degeneration, because deep models can easily memorize the noisy labels (Han et al., 2018; Bai et al., 2021). How to improve the robustness of deep models when the training data contain label errors has therefore become an important research topic. To learn a classifier robust to label noise, there are two streams of methods, i.e., statistically inconsistent methods and statistically consistent methods. The statistically inconsistent methods mainly focus on designing heuristics to reduce the negative effect of label noise (Nguyen et al., 2019; Li et al., 2019; 2020; Wei et al., 2020; Bai et al., 2021; Yao et al., 2021a). These methods have demonstrated strong empirical performance but usually require expensive hyper-parameter tuning and do not provide statistical guarantees. To address this limitation, another stream of methods focuses on designing classifier-consistent algorithms (Liu & Tao, 2015; Patrini et al., 2017; Xia et al., 2020; Li et al., 2021) by exploiting the noise transition matrix T(x) ∈ R^{C×C}, where T_ij(x) = P(Ỹ = j|Y = i, X = x), X denotes the random variable for instances or features, Ỹ denotes the noisy label, Y denotes the clean label, and C denotes the number of classes. When the transition matrix is given, the optimal classifier defined on the clean domain can be learned by utilizing noisy data only (Liu & Tao, 2015; Xia et al., 2019).
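As a concrete illustration of how a given transition matrix lets a classifier be trained on noisy data only, the following is a minimal NumPy sketch (not the authors' implementation) of the forward-correction idea of Patrini et al. (2017): the model's clean posterior is mapped through T(x)^⊤ to a noisy posterior, and the loss is the negative log-likelihood of the observed noisy label. The numbers below are illustrative toy values.

```python
import numpy as np

def forward_corrected_nll(clean_posterior, noisy_label, T):
    """Negative log-likelihood of the observed noisy label under the
    noise-corrupted prediction, where T[i, j] = P(Ytilde=j | Y=i, x)."""
    noisy_posterior = T.T @ clean_posterior  # P(Ytilde | x), rows of T sum to 1
    return -np.log(noisy_posterior[noisy_label])

# Toy 2-class example: the model is confident the clean label is 0,
# and the transition matrix says class 0 flips to class 1 with prob. 0.3.
clean_posterior = np.array([0.9, 0.1])  # hypothetical model output P(Y | x)
T = np.array([[0.7, 0.3],
              [0.2, 0.8]])
loss = forward_corrected_nll(clean_posterior, noisy_label=1, T=T)
```

Minimizing this corrected loss over noisy examples drives the uncorrected model output toward the clean posterior, which is the sense in which the resulting classifier is statistically consistent.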
In real-world applications, the instance-dependent transition matrix T(x) is usually unknown and has to be learned. Accurately learning T(x) remains a challenging task (Li et al., 2019; Yao et al., 2020), because doing so generally requires the instance X, the noisy label Ỹ, and the clean label Y, while for datasets containing label errors, clean labels are usually not available. In general, without any other assumptions, to learn the transition matrix for an instance, its clean-label information has to be given. Existing methods therefore try to infer some clean-label information with which T(x) can be learned (Xia et al., 2019; Yang et al., 2022; Li et al., 2021). We discuss the details in Section 2. Recently, classification models based on self-supervised learning have demonstrated performance comparable to supervised learning on some benchmark datasets (He et al., 2020; Niu et al., 2021), which implies that self-supervised learning has a strong ability to infer clean labels. Motivated by this, in this paper, we propose CoNL (Contrastive label-Noise Learning), which leverages self-supervised techniques to learn the instance-dependent transition matrix. CoNL contains two main stages, contrastive co-selecting and constraint T(x) revision, which are as follows:

• We propose contrastive co-selecting, which utilizes the visual representations learned by contrastive learning to select confident examples without employing noisy labels. In this way, the learned visual representations are less influenced by label errors. To select confident examples defined on the clean domain, we learn two classifiers estimating P(Y|X) and two transition matrices simultaneously by employing noisy labels and the learned representations. We also encourage the two classifiers to have different learning abilities by training them on representations obtained from strong and weak data augmentations, respectively. They can then select different types of confident examples and be robust to different noise rates; combining the two classifiers yields more confident examples.

• We propose constraint T(x) revision, which refines the learned transition matrix by employing the selected confident examples, based on the philosophy that a favorable transition matrix should make the classification risks on both clean data and noisy data small.

The empirical results for both transition-matrix learning and classification demonstrate strong performance under different types and levels of label noise on three synthetic instance-dependent noise datasets (MNIST, CIFAR-10, SVHN) and one real-world noisy dataset (CIFAR-10N).

The rest of this paper is organized as follows. In Sec. 2, we review related work on label-noise learning, especially modeling noisy labels, and on contrastive learning. In Sec. 3, we discuss how to leverage contrastive learning to better learn the instance-dependent transition matrix. In Sec. 4, we provide empirical evaluations of the proposed method. In Sec. 5, we conclude the paper.

2. LABEL-NOISE LEARNING AND CONTRASTIVE LEARNING

Problem setting. Let D be the distribution of a noisy example (X, Ỹ) ∈ X × {1, . . . , C}, where X denotes the variable of instances, Ỹ the variable of noisy labels, X the feature space, {1, . . . , C} the label space, and C the number of classes. In learning with noisy labels, clean labels are not available. Given a noisy training sample S = {(x_i, ỹ_i)}_{i=1}^N drawn independently from D, the aim is to learn a robust classifier from S.

The noise transition matrix T(x). The transition matrix T(x) has been widely used to model label-noise generation. Its ij-th entry T_ij(x) = P(Ỹ = j|Y = i, X = x) represents the probability that the clean label Y = i of instance x flips to the noisy label Ỹ = j. Existing methods can learn statistically consistent classifiers when the transition matrix is given (Liu & Tao, 2015; Goldberger & Ben-Reuven, 2017; Yu et al., 2018; Xia et al., 2019; 2020; Li et al., 2021). The reason is that the clean-class posterior P(Y|X) can be inferred from the transition matrix and the noisy-class posterior P(Ỹ|X) (Patrini et al., 2017), i.e.,

T(x)^⊤ [P(Y = 1|x), . . . , P(Y = C|x)]^⊤ = [P(Ỹ = 1|x), . . . , P(Ỹ = C|x)]^⊤.

In general, transition matrices are not given and need to be estimated. Without any other assumptions, to learn the transition matrix for an instance, its clean-label information has to be given (Xia et al., 2019; Yang et al., 2022).

Learning the transition matrix T(x). Clean-label information is crucial for learning the transition matrix. To learn transition matrices for all instances, existing methods 1) first learn the transition matrices for some instances in a training sample by inferring their clean-label information, and 2) then infer the transition matrices of the remaining instances from the learned ones.
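The linear relation between the clean and noisy class posteriors can be checked numerically. Below is a minimal NumPy sketch with a hypothetical 3-class transition matrix for a single instance x (the values are illustrative, not from the paper): with the convention T_ij(x) = P(Ỹ = j|Y = i, x), the noisy posterior is T(x)^⊤ applied to the clean posterior, so when T(x) is invertible the clean posterior can be recovered by solving the linear system.

```python
import numpy as np

# Hypothetical instance-dependent transition matrix for one instance x,
# with T_x[i, j] = P(Ytilde = j | Y = i, x); each row sums to one.
T_x = np.array([[0.8, 0.1, 0.1],
                [0.2, 0.7, 0.1],
                [0.1, 0.2, 0.7]])

clean_posterior = np.array([0.7, 0.2, 0.1])  # P(Y | x), toy values
noisy_posterior = T_x.T @ clean_posterior    # P(Ytilde | x) via the identity

# Given T(x) (e.g. after it has been estimated and revised), the clean
# posterior is recovered by inverting the linear relation.
recovered = np.linalg.solve(T_x.T, noisy_posterior)
```

In practice the noisy posterior is estimated from data rather than computed exactly, so the recovered clean posterior inherits that estimation error; this sketch only demonstrates the identity itself.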

