IMPROVING THE ESTIMATION OF INSTANCE-DEPENDENT TRANSITION MATRIX BY USING SELF-SUPERVISED LEARNING

Anonymous authors
Paper under double-blind review

Abstract

The transition matrix reveals the transition relationship between clean labels and noisy labels, and it plays an important role in building statistically consistent classifiers. In real-world applications, the transition matrix is usually unknown and has to be estimated. Accurately estimating the transition matrix is challenging, especially when it depends on the instance. Given that both instances and noisy labels are available, the major difficulty of learning the transition matrix comes from the absence of clean label information. Many methods have been proposed to infer clean information. Among them, self-supervised learning has demonstrated great success: on some datasets, such methods achieve performance comparable to supervised learning without requiring any labels during training, which implies that they can efficiently infer clean labels. Motivated by this, in this paper, we propose a practical method that leverages self-supervised learning to obtain nearly clean labels and thereby help the learning of the instance-dependent transition matrix. Empirically, the proposed method achieves state-of-the-art performance on different datasets.

1. INTRODUCTION

Recently, learning with noisy labels has received increasing attention in the deep learning community (Jiang et al., 2018; Liu, 2021; Yao et al., 2021b; Bai et al., 2021; Ciortan et al., 2021). This is because manually annotating large-scale datasets is labor-intensive and time-consuming, so cheap but imperfect methods, e.g., crowdsourcing and web crawling, have been used to collect large-scale datasets, which usually contain label errors. Existing work shows that training deep learning models on such datasets can lead to performance degeneration, because deep models can easily memorize noisy labels (Han et al., 2018; Bai et al., 2021). How to improve the robustness of deep models when the training data contain label errors has therefore become an important research topic.

To learn a classifier robust to label noise, there are two streams of methods, i.e., statistically inconsistent methods and statistically consistent methods. Statistically inconsistent methods mainly focus on designing heuristics to reduce the negative effect of label noise (Nguyen et al., 2019; Li et al., 2019; 2020; Wei et al., 2020; Bai et al., 2021; Yao et al., 2021a). These methods have demonstrated strong empirical performance but usually require expensive hyper-parameter tuning and do not provide statistical guarantees. To address this limitation, another stream of methods focuses on designing classifier-consistent algorithms (Liu & Tao, 2015; Patrini et al., 2017; Xia et al., 2020; Li et al., 2021) by exploiting the noise transition matrix T(x) ∈ R^{C×C}, where T_ij(x) = P(Ỹ = j | Y = i, X = x), X denotes the random variable for instances or features, Ỹ denotes the noisy label, Y denotes the clean label, and C denotes the number of classes. When the transition matrix is given, the optimal classifier defined on the clean domain can be learned by utilizing noisy data only (Liu & Tao, 2015; Xia et al., 2019).
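To make the role of the transition matrix concrete, the following sketch illustrates the relationship it encodes: by the law of total probability, the noisy class posterior is the clean class posterior multiplied by the transpose of T(x). The specific matrix and posterior values below are toy numbers for illustration only, not from any method in this paper.

```python
import numpy as np

# Toy transition matrix for C = 3 classes at some instance x.
# Row i is the distribution P(noisy label = . | clean label = i, x),
# so each row sums to one.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
])

# Hypothetical clean class posterior P(Y = . | X = x).
clean_posterior = np.array([0.6, 0.3, 0.1])

# Law of total probability:
# P(noisy = j | x) = sum_i T_ij(x) * P(clean = i | x),
# i.e. a matrix-vector product with T transposed.
noisy_posterior = T.T @ clean_posterior

print(noisy_posterior)        # [0.55 0.28 0.17]
print(noisy_posterior.sum())  # 1.0 -- still a valid distribution
```

This is why a known T(x) permits consistent learning from noisy data alone: a classifier can model the clean posterior and be trained against noisy labels through this "forward" mapping.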
In real-world applications, the instance-dependent transition matrix T(x) is usually unknown and has to be learned, and accurately learning it remains challenging (Li et al., 2019; Yao et al., 2020). The reason is that to accurately learn T(x), the instance X, the noisy label Ỹ, and the clean label Y generally have to be given; however, for datasets containing label errors, clean labels are usually unavailable. In general, without any other assumptions, to learn the transition matrix

