ROBUST LEARNING VIA GOLDEN SYMMETRIC LOSS OF (UN)TRUSTED LABELS

Abstract

Learning deep models that are robust to noisy labels becomes ever more critical as today's data is commonly collected from open platforms and subject to adversarial corruption. Information about the label corruption process, i.e., the corruption matrix, can greatly enhance the robustness of deep models but still falls short in combating hard classes. In this paper, we propose to construct a golden symmetric loss (GSL) based on the estimated corruption matrix so as to avoid overfitting to noisy labels and to learn effectively from hard classes. GSL is the weighted sum of the corrected regular cross entropy and the reverse cross entropy. By leveraging a small fraction of trusted clean data, we estimate the corruption matrix and use it both to correct the loss and to determine the weights of GSL. We theoretically prove the robustness of the proposed loss function in the presence of dirty labels. We provide a heuristic to adaptively tune the loss weights of GSL according to the noise rate and noise diversity measured from the dataset. We evaluate the proposed golden symmetric loss on both vision and natural language deep models subject to different types of label noise patterns. Empirical results show that GSL significantly outperforms existing robust training methods across noise patterns, with accuracy improvements of up to 18% on CIFAR-100 and 1% on the real-world noisy dataset Clothing1M.

1. INTRODUCTION

Diverse datasets collected from the public domain, which power today's deep learning models, present a new challenge: highly noisy labels. Collecting labels is not only time-consuming, but it is also difficult to ensure consistent label quality due to various annotation errors (Patrini et al., 2017) and adversarial attacks (Goodfellow et al., 2015). The large capacity of deep learning models enables effective learning from complex datasets, but it also makes them prone to overfitting the noise structure in the dataset. This memorization effect (Jiang et al., 2018) can degrade the accuracy of deep learning models in the presence of highly noisy labels. For example, in (Zhang et al., 2017) the accuracy of AlexNet on CIFAR-10 drops from 77% to 10% when labels are randomly flipped. Designing learning models that can be trained robustly on noisy labels is thus imperative. To distill the impact of noisy labels, related work either filters out suspiciously noisy data, derives robust loss functions, or proactively corrects labels. The symmetric cross entropy loss (SCL) is shown to be effective in combating label noise, especially for hard classes, by combining the regular with the reverse cross entropy: the former avoids overfitting and the latter is resilient to label noise. Despite its promising results, there is as yet no clear principle on how to weight the regular and reverse cross entropy terms, e.g., at different noise rates and patterns. In contrast, Distilling (Li et al., 2017) and Golden Loss Correction (GLC) (Hendrycks et al., 2018) advocate using a small set of trusted clean data to improve the estimated corruption matrix. Specifically, GLC trains the deep model on both a clean and a noisy set, where the loss on the noisy set is corrected through the corruption matrix. While the clean set is chosen evenly from all classes, corrupted labels may appear unevenly across classes depending on the noise pattern (Xiao et al., 2015).
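As a concrete illustration, the clean-data estimate used by GLC-style corrections can be sketched as follows: the corruption matrix row for true class i is the average softmax output of the noisily trained model over trusted examples of class i. This is a minimal NumPy sketch; the function name and the identity fallback for unseen classes are our illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def estimate_corruption_matrix(probs, true_labels, num_classes):
    """Estimate the corruption matrix C from a trusted clean set.

    C[i, j] approximates p(noisy label = j | true label = i), taken here
    as the mean softmax output of a model trained on noisy data,
    evaluated on trusted examples whose true class is i.
    probs       : (n, num_classes) array of model softmax outputs
    true_labels : (n,) array of trusted labels
    """
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        mask = true_labels == i
        if mask.any():
            C[i] = probs[mask].mean(axis=0)
        else:
            C[i, i] = 1.0  # no trusted examples: fall back to identity row
    # normalize rows so C is a valid row-stochastic matrix
    C /= C.sum(axis=1, keepdims=True)
    return C
```

The row-stochastic estimate can then be plugged into a forward loss correction, as done by GLC on the noisy portion of the training set.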
As the corrected loss of GLC does not differentiate the difficulty of classes, it may not learn hard classes effectively. We propose GSL, a golden symmetric loss that dynamically weights the regular and reverse cross entropy and corrects the label prediction based on the estimated corruption matrix. Similar to GLC, GSL leverages clean data to estimate the corruption matrix, which is used both to correct labels and to decide the weights of the golden symmetric loss. As such, GSL can effectively differentiate the difficulty level of classes by adjusting the weights, and it mitigates the impact of noise overfitting via the golden symmetric cross entropy. Specifically, we use the noise rate and noise diversity to adaptively tune the weights of the corrected cross entropy and the reverse cross entropy. We prove that the cross entropy corrected by the corruption matrix is noise tolerant, just as the reverse cross entropy is.

Motivating example. We demonstrate the advantages and disadvantages of GLC and SCL, and of their combination (the proposed GSL), through the example of learning convolutional networks on CIFAR-10 injected with 60% symmetric noise. The experimental setup is detailed in §6. Figure 1 shows the corruption matrix of the injected noise and the confusion matrices of the predictions of SCL, GLC, and GSL. Even though the injected noise is symmetric across all classes (see Figure 1a), prediction errors are distributed asymmetrically across the classes (see Figure 1b). Though GLC achieves a lower average error rate than SCL (reflected in darker diagonal elements on average), it performs worse on hard classes, e.g., class 4 (cat) and class 6 (dog) (see the difference in blue shades across the diagonal elements). By setting proper weights for the two types of cross entropy, GSL achieves both superior average and per-class accuracy.
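The weighted combination described above can be sketched for a single example as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's exact formulation: the forward-corrected term follows GLC (predicting the noisy-label distribution as C^T p), the reverse term follows SCL's clipped reverse cross entropy, and the weights alpha and beta are left as inputs (the paper tunes them from the noise rate and diversity).

```python
import numpy as np

def gsl_loss(p, noisy_label, C, alpha, beta, A=-4.0):
    """Sketch of a golden symmetric loss for a single example.

    p           : model softmax output, shape (num_classes,)
    noisy_label : observed (possibly corrupted) label index
    C           : estimated corruption matrix, C[i, j] ~ p(noisy j | true i)
    alpha, beta : weights of the corrected and reverse terms
    A           : constant replacing log(0) in the reverse term, as in SCL
    """
    eps = 1e-12
    # forward-corrected cross entropy: the model's prediction of the
    # *noisy* label distribution is C^T p
    corrected_ce = -np.log((C.T @ p)[noisy_label] + eps)
    # reverse cross entropy against the one-hot noisy label, with
    # log(0) clipped to A; this reduces to -A * (1 - p[noisy_label])
    reverse_ce = -A * (1.0 - p[noisy_label])
    return alpha * corrected_ce + beta * reverse_ce
```

With an identity corruption matrix and beta = 0, the loss reduces to the standard cross entropy, which makes the role of each term easy to check in isolation.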

2. RELATED WORK

Enhancing the robustness of deep models against noisy labels is an active research area. The massive datasets needed to train deep models are commonly found to be corrupted (Wang et al., 2018), severely degrading the achievable accuracy (Zhang et al., 2017). The impact of label noise on deep neural networks was first characterized by the theoretical testing accuracy over a limited set of noise patterns (Chen et al., 2019). (Vahdat, 2017) suggests an undirected graphical model for modeling label noise in deep neural networks and indicates that symmetric noise is more challenging than asymmetric noise. Solutions in the prior art can be categorized into three directions: (i) filtering out noisy labels (Malach & Shalev-Shwartz, 2017; Han et al., 2018b; Yu et al., 2019; Wang et al., 2018); (ii) correcting noisy labels (Patrini et al., 2017; Hendrycks et al., 2018; Li et al., 2017); and (iii) deriving noise-resilient loss functions (Ma et al., 2018; Konstantinov & Lampert, 2019). Noise Resilient Loss Function. The loss function is modified to enhance robustness to label noise by introducing new loss functions (Ghosh et al., 2017; Wang et al., 2019), or by adjusting the weights of noisy data instances (Ren et al., 2018b; Konstantinov & Lampert, 2019; Ma et al., 2018)



Figure 1: Noise corruption matrix and confusion matrices of predictions for CIFAR-10 with 60% symmetric label noise.

based on the trustworthiness level of data sources. (Wang et al., 2019) propose the symmetric cross-entropy loss, which combines a new reverse cross entropy term with the traditional cross entropy via constant weights on both terms. Meta-Weight-Net (Shu et al., 2019) re-weights samples while optimizing the loss function during training, using a multi-layer perceptron to predict the weight of each sample. From the same perspective, (Ren et al., 2018a) use the similarity of samples to the clean instances in the validation set to re-weight them in the loss function.

