Under review as a conference paper at ICLR 2021

ROBUST LOSS FUNCTIONS FOR COMPLEMENTARY-LABEL LEARNING

Abstract

In ordinary-label learning, the correct label is given for each training sample. Similarly, in complementary-label learning, a complementary label is provided for each training sample; a complementary label indicates a class that the example does not belong to. Robust learning of classifiers under label noise has been investigated from many viewpoints, but little attention has been paid to complementary-label learning. In this paper, we present a new complementary-label learning algorithm built on the robustness of the loss function. We also provide two sufficient conditions on a loss function under which the minimizer of the risk for complementary labels is theoretically guaranteed to be consistent with the minimizer of the risk for ordinary labels. Finally, empirical results validate our method's superiority over current state-of-the-art techniques. In particular, on CIFAR-10 our algorithm achieves much higher test accuracy than the gradient ascent algorithm, and our model has fewer than half the parameters of the ResNet-34 they used.

Introduction

Deep neural networks have exhibited excellent performance in many real-world applications. Yet, this superior performance rests on correctly labeled, large-scale training sets, and labeling such datasets is time-consuming and expensive. For example, crowd-workers need to select the correct label for each sample from 100 candidate labels for CIFAR-100. To mitigate this problem, researchers have proposed many ways to learn from weak supervision: noisy-label learning Li et al. (2017); Hu et al. (2019); Lee et al. (2018); Xia et al. (2019), semi-supervised learning Zhai et al. (2019); Berthelot et al. (2019); Rasmus et al. (2015); Miyato et al. (2019); Sakai et al. (2017), similar-unlabeled learning Tanha (2019); Bao et al. (2018); Zelikovitz & Hirsh (2000), unlabeled-unlabeled learning Lu et al. (2018); Chen et al. (2020a;b), positive-unlabeled learning Elkan & Noto (2008); du Plessis et al. (2014); Kiryo et al. (2017), contrastive learning Chen et al. (2020a;b), partial-label learning Cour et al. (2011); Feng & An (2018); Wu & Zhang (2018), and others.

We investigate complementary-label learning Ishida et al. (2017) in this paper. A complementary label only indicates that a given class label of a sample is incorrect. From the viewpoint of label noise, complementary labels can be viewed as noisy labels, but without any true labels in the training set. Our task is to learn a classifier from the given complementary labels that predicts the correct label for a given sample. Collecting complementary labels is much easier and more efficient than precisely choosing the true class from many candidates. For example, if a labeling system chooses a label for a sample uniformly at random among k classes, the chosen label is the ordinary label with probability 1/k but a complementary label with probability (k-1)/k. Another potential application of complementary labels is data privacy: for some privacy-sensitive tasks, it is much easier to collect complementary labels than ordinary labels.

Robust learning of classifiers has been investigated from many viewpoints in the presence of label noise Ghosh et al. (2017), but little attention has been paid to complementary-label learning. We call a loss function robust if the minimizer of the risk under that loss function with complementary labels is the same as that with ordinary labels. The robustness of risk minimization relies on the loss function used during training.

This paper presents a general risk formulation under which categorical cross-entropy (CCE) can be used to learn with complementary labels and achieve robustness. We then offer some new analytical results on robust loss functions under complementary labels. Robustness of risk minimization also helps select the best hyper-parameters via the empirical risk, since there are no ordinary labels in the validation set. We establish two sufficient conditions for a loss function to be robust for learning with complementary labels. We then examine some popular loss functions used for ordinary-label learning, such as CCE, mean squared error (MSE), and mean absolute error (MAE), and show that CCE and MAE satisfy our sufficient conditions. Finally, we present a learning algorithm for learning with complementary labels, named the exclusion algorithm. The empirical results demonstrate the advantage of our theoretical results and verify our algorithm's superiority over current state-of-the-art methods. The contributions of this paper can be summarized as:


• We present a general risk formulation that can be viewed as a framework for employing a loss function that satisfies our sufficient conditions for robustness to learn from complementary labels.


• We establish two sufficient conditions for a loss function to be robust for learning with complementary labels.


• We prove that the minimizer of the risk for complementary labels is theoretically guaranteed to be consistent with the minimizer of the risk for ordinary labels.


• The empirical results validate the superiority of our method over current state-of-the-art methods.

… estimator is not necessarily unbiased, and proved that learning with complementary labels can theoretically converge to the optimal classifier learned from ordinary labels based on the estimated transition matrix. However, the key to the forward loss-correction technique is to estimate the transition matrix correctly. Hence, one needs to assess the transition matrix beforehand, which is relatively tricky without strong assumptions. Moreover, such a setup restricts a small complementary-label space to provide more information. Thus, it is necessary to encourage the worker to provide more challenging complementary labels, for example, by giving higher rewards to specific classes. Otherwise, the complementary label given by the worker may be too evident and
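The forward loss-correction setup discussed above can be illustrated numerically. Under the uniform assumption, the transition matrix Q has zeros on its diagonal and 1/(k-1) elsewhere, and the complementary-label distribution implied by ordinary-label probabilities p is Qᵀp. This is a minimal sketch of the general idea, not the estimation procedure of any particular method; the function names are ours.

```python
import numpy as np

def uniform_transition_matrix(k: int) -> np.ndarray:
    """Q[i, j] = P(complementary label = j | true label = i):
    zero on the diagonal, 1/(k-1) elsewhere (uniform assumption)."""
    Q = np.full((k, k), 1.0 / (k - 1))
    np.fill_diagonal(Q, 0.0)
    return Q

def forward_correct(p: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Map ordinary-label probabilities p to the implied
    complementary-label probabilities via the transition matrix."""
    return Q.T @ p

k = 5
Q = uniform_transition_matrix(k)
p = np.array([0.7, 0.1, 0.1, 0.05, 0.05])  # a model's ordinary-label posterior
p_bar = forward_correct(p, Q)
# each row of Q sums to 1, and p_bar is again a probability distribution
assert np.allclose(Q.sum(axis=1), 1.0)
assert np.isclose(p_bar.sum(), 1.0)
```

Note that when Q must instead be estimated from data, as the text points out, errors in the estimate propagate directly into the corrected probabilities.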

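The introduction's claim that a uniformly chosen label equals the ordinary label with probability 1/k (and is a complementary label with probability (k-1)/k) is easy to verify by simulation; the snippet below is illustrative only and the helper name is ours.

```python
import random

def simulate_uniform_labeling(k: int, trials: int, seed: int = 0) -> float:
    """Fraction of uniformly chosen labels that equal the true label."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        true_label = rng.randrange(k)
        chosen = rng.randrange(k)  # labeler picks uniformly among k classes
        hits += (chosen == true_label)
    return hits / trials

rate = simulate_uniform_labeling(k=10, trials=100_000)
# rate should be close to 1/k = 0.1 for k = 10
assert abs(rate - 0.1) < 0.01
```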

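As a rough illustration of how an MAE-style loss can be applied to complementary labels, the sketch below penalizes the probability the model assigns to the complementary class: since the target for that class is 0, its absolute error is simply p_ȳ, and driving it to zero "excludes" that class for each sample. This is our own toy construction under the uniform assumption, not the exclusion algorithm as specified in the paper.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def complementary_mae_loss(logits: np.ndarray, comp_labels: np.ndarray) -> float:
    """Mean probability assigned to each sample's complementary class
    (equals the MAE against a target of 0 on that class)."""
    p = softmax(logits)
    return float(p[np.arange(len(comp_labels)), comp_labels].mean())

# toy batch: 2 samples, 3 classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 3.0, 0.2]])
comp = np.array([2, 0])  # classes the samples do NOT belong to
loss = complementary_mae_loss(logits, comp)
assert 0.0 < loss < 1.0  # small when the complementary class is unlikely
```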