CLASS2SIMI: A NEW PERSPECTIVE ON LEARNING WITH LABEL NOISE

Abstract

Label noise is ubiquitous in the era of big data. Deep learning algorithms can easily fit the noise and thus cannot generalize well without properly modeling it. In this paper, we propose a new perspective on dealing with label noise called "Class2Simi". Specifically, we transform training examples with noisy class labels into pairs of examples with noisy similarity labels, and propose a deep learning framework to learn robust classifiers from the noisy similarity labels. Note that a class label indicates the class an instance belongs to, while a similarity label indicates whether or not two instances belong to the same class. The transformation is worthwhile: we prove that the noise rate of the noisy similarity labels is lower than that of the noisy class labels, because similarity labels are themselves robust to noise. For example, given two instances, even if both of their class labels are incorrect, their similarity label can still be correct. Owing to the lower noise rate, Class2Simi achieves remarkably better classification accuracy than baselines that deal directly with the noisy class labels.

1. INTRODUCTION

It is expensive to label large-scale data accurately, so cheap datasets with label noise are ubiquitous in the era of big data. However, label noise degrades the performance of trained deep models, because deep networks easily overfit label noise (Zhang et al., 2017; Zhong et al., 2019; Li et al., 2019; Yi & Wu, 2019; Zhang et al., 2019; 2018; Xia et al., 2019; 2020). In this paper, we propose a new perspective on handling label noise called "Class2Simi", i.e., transforming training examples with noisy class labels into pairs of examples with noisy similarity labels. A class label indicates the class an instance belongs to, while a similarity label indicates whether or not two instances belong to the same class. This transformation is motivated by the observation that it lowers the noise rate: for example, even if two instances both have incorrect class labels, their similarity label can still be correct. In the label-noise learning community, a lower noise rate usually leads to higher classification performance (Han et al., 2018b; Patrini et al., 2017). We illustrate the transformation and the robustness of similarity labels in Figure 1. Assume we have eight noisy examples {(x_1, ȳ_1), ..., (x_8, ȳ_8)}, as shown in the upper part of the middle column, whose labels come from four classes, i.e., {1, 2, 3, 4}; labels marked in red are incorrect. We transform the 8 examples into 8 × 8 example-pairs with noisy similarity labels, as shown in the bottom part of the middle column, where the similarity label 1 means the two instances have the same class label and 0 means they have different class labels. The latent clean class labels and similarity labels are presented in the left column. In the middle column, we can see that although the instances x_2 and x_4 both have incorrect class labels, the similarity label of the example-pair (x_2, x_4) is correct.
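To make the construction above concrete, here is a minimal sketch of the class-to-similarity transformation; the helper name `class_to_simi` is ours for illustration, not the paper's implementation:

```python
def class_to_simi(labels):
    """Turn a list of (possibly noisy) class labels into an n x n matrix of
    similarity labels: entry (i, j) is 1 if labels i and j agree, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

# Toy example mirroring the construction in Figure 1 on four noisy labels:
# examples 0 and 2 share class label 1, so that pair gets similarity label 1.
noisy_labels = [1, 2, 1, 3]
S = class_to_simi(noisy_labels)  # S[0][2] == 1, S[0][1] == 0
```

Note that the similarity labels inherit noise only indirectly: a pair's similarity label flips only when the corruption of the two class labels changes whether they agree.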
Similarity labels are robust because they additionally exploit information about the pairwise relationship. We prove that the noise rate of the noisy similarity labels is lower than that of the noisy class labels. For example, suppose the noisy class labels in Figure 1 are generated from the latent clean labels according to the transition matrix shown in the upper part of the right column (the ij-th entry of the matrix denotes the probability that the clean class label i flips into the noisy class label j). Then the noise rate of the noisy class labels is 0.5, whereas the noise rate of the corresponding noisy similarity labels is only 0.25. Note that the noise rate is the fraction of labels that are incorrect; it can be calculated from the noise transition matrix combined with the proportion of each class, i.e., 1/6 × 3/4 + 1/2 × 1/4 = 0.25.
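The same kind of computation can be sketched in general. The snippet below is an illustrative helper (our name, `simi_noise_rate`, not the paper's code) that computes the expected similarity noise rate from a transition matrix and class priors, under the simplifying assumption that the two class labels in a pair are corrupted independently; the numbers quoted above come from the specific matrix and class proportions in Figure 1, so the example here uses a generic symmetric matrix instead.

```python
import numpy as np

def simi_noise_rate(T, p):
    """Expected noise rate of pairwise similarity labels.

    T[a, c] = P(noisy class c | clean class a); p[a] = prior of clean class a.
    Assumes the two labels in a pair are flipped independently. For a pair
    with clean classes (a, b), the noisy similarity label is wrong whenever
    "the noisy labels agree" differs from "a == b".
    """
    agree = T @ T.T  # agree[a, b] = P(the two noisy labels coincide | a, b)
    err = np.where(np.eye(len(p), dtype=bool), 1.0 - agree, agree)
    return float(p @ err @ p)

# Example: 4 classes, symmetric noise with flip probability 0.2, uniform priors.
K, e = 4, 0.2
T_sym = np.full((K, K), e / (K - 1))
np.fill_diagonal(T_sym, 1 - e)
p_uniform = np.full(K, 1 / K)
rate = simi_noise_rate(T_sym, p_uniform)  # 13/75 ≈ 0.173, below the class noise rate 0.2
```

In this symmetric example the similarity noise rate (≈ 0.173) is indeed lower than the class noise rate (0.2), consistent with the claim above.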

