SEMI-SUPERVISED LEARNING BY SELECTIVE TRAINING WITH PSEUDO LABELS VIA CONFIDENCE ESTIMATION

Anonymous

Abstract

We propose a novel semi-supervised learning (SSL) method that adopts selective training with pseudo labels. In our method, we generate hard pseudo-labels and also estimate their confidence, which represents how likely each pseudo-label is to be correct. Then, we explicitly select which pseudo-labeled data should be used to update the model. Specifically, assuming that the loss on incorrectly pseudo-labeled data increases sharply under data augmentation, we select the data whose loss remains relatively small after data augmentation is applied. The confidence is used not only to screen candidate pseudo-labeled data but also to automatically decide how many pseudo-labeled data should be selected within each mini-batch. Since accurate estimation of the confidence is crucial in our method, we also propose a new data augmentation method, called MixConf, that enables us to obtain confidence-calibrated models even when the number of training data is small. Experimental results on several benchmark datasets validate the advantage of our SSL method as well as MixConf.

1. INTRODUCTION

Semi-supervised learning (SSL) is a powerful technique to draw out the full potential of complex models, such as deep neural networks, by utilizing unlabeled data as well as labeled data to train the model. It is especially useful in practical situations where obtaining labeled data is costly, for example, because expert knowledge is required for annotation. Since deep neural networks are known to be "data-hungry" models, SSL for deep neural networks has been intensively studied and has achieved surprisingly good performance in recent works (Van Engelen & Hoos, 2020). In this paper, we focus on SSL for a classification task, which is the most commonly tackled setting in the literature. Many recent SSL methods adopt a common approach in which two processes are conducted iteratively: generating pseudo labels of unlabeled data by using the model currently being trained, and updating the model by using both labeled and pseudo-labeled data. In the pioneering work (Lee, 2013), pseudo labels are hard ones, represented by one-hot vectors, but recent methods (Tarvainen & Valpola, 2017; Miyato et al., 2018; Berthelot et al., 2019; 2020; Verma et al., 2019; Wang et al., 2019; Zhang & Qi, 2020) often utilize soft pseudo-labels, which may contain several nonzero elements within each label vector. One simple reason to adopt soft pseudo-labels is to alleviate the confirmation bias caused by training with incorrectly pseudo-labeled data, and this attempt seems to contribute substantially to the excellent performance of those methods. However, since soft pseudo-labels provide only weak supervision, those methods often show slow convergence in training (Lokhande et al., 2020). For example, MixMatch (Berthelot et al., 2019), one of the state-of-the-art SSL methods, requires nearly 1,000,000 iterations for training on the CIFAR-10 dataset. In this paper, by contrast, we aim to utilize hard pseudo-labels to design an SSL method that is easy to try in terms of computational efficiency.
Obviously, the largest problem to be tackled in this approach is how to alleviate the negative impact of training with incorrect pseudo-labels. In this work, we propose a novel SSL method that adopts selective training with pseudo labels. To avoid training the model with incorrect pseudo-labels, we explicitly select which pseudo-labeled data should be used to update the model. Specifically, assuming that the loss on incorrectly pseudo-labeled data increases sharply under data augmentation, we select the data whose loss remains relatively small after data augmentation is applied. To conduct this selective training effectively, we estimate the confidence of the pseudo labels and utilize it not only to screen candidate pseudo-labeled data but also to automatically decide how many pseudo-labeled data should be selected within each mini-batch. For accurate estimation of the confidence, we also propose a new data augmentation method, called MixConf, that enables us to obtain confidence-calibrated models even when the number of training data is small. Experimental results on several benchmark datasets validate the advantage of our SSL method as well as MixConf.

2. PROPOSED METHOD

Figure 1 shows an overview of our method. Given a mini-batch from the labeled data and one from the unlabeled data, we first generate pseudo labels of the unlabeled data based on the predictions of the current model. Let x ∈ R^m, y ∈ {1, 2, ..., C}, and f : R^m → R^C denote input data, labels, and the classifier to be trained, respectively. Given unlabeled input data x^U, the pseudo label ŷ^U is generated by simply taking the arg max of the classifier's output f(x^U). Then, we conduct selective training using both the labeled data and the pseudo-labeled data. In this training, to alleviate the negative effect of training with incorrect pseudo-labels, we explicitly select which data should be used to update the model. Below, we describe the details of this selective training.
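The pseudo-label generation step above can be sketched as follows. This is a minimal illustration, not the authors' implementation; it assumes the classifier produces raw logits of shape (N, C), and all function names are hypothetical.

```python
import numpy as np

def generate_pseudo_labels(logits):
    """Hard pseudo-labels and confidences from classifier logits.

    logits: array of shape (N, C), raw outputs of f on unlabeled data.
    Returns the arg-max class per example (the hard pseudo-label ŷ^U)
    and the max softmax probability per example (used later as confidence).
    """
    # numerically stable softmax over the class dimension
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pseudo_y = probs.argmax(axis=1)  # ŷ^U = arg max_j f(x^U)[j]
    conf = probs.max(axis=1)         # c_i = max_j f(x^U_i)[j]
    return pseudo_y, conf
```

In practice the same forward pass yields both the hard label and its confidence, so pseudo-labeling adds essentially no extra computation per mini-batch.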

2.1. SELECTIVE TRAINING WITH PSEUDO LABELS BASED ON CONFIDENCE

As described previously, the pseudo labels are generated based on the predictions of the current model, and we assume that the confidence of those predictions can also be computed in addition to the pseudo labels. With a standard deep neural network architecture, the confidence can be obtained by simply taking the max of the classifier's output (Hendrycks & Gimpel, 2016): c_i = max_{j∈{1,2,...,C}} f(x^U_i)[j], where c_i is the confidence of the classifier's prediction on the i-th unlabeled data point x^U_i, and f(x)[j] is the j-th element of f(x). When the model is sufficiently confidence-calibrated, the confidence c_i is expected to match the accuracy of the corresponding prediction f(x^U_i) (Guo et al., 2017), which means it also matches the probability that the pseudo label ŷ^U_i is correct. To avoid training with incorrect pseudo-labels, we explicitly select the data used to train the model based on this confidence. The data selection comprises two steps: thresholding the confidence, and selecting the data with relatively small loss computed on augmented pseudo-labeled data. The first step is



Figure 1: An overview of the proposed semi-supervised learning method.
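The two-step selection of Section 2.1 can be sketched as below. This is a hedged illustration of the idea, not the paper's exact rule: the threshold value and, in particular, the use of the candidates' mean confidence to set the number of selected samples are assumptions made for the sketch (the excerpt only states that the confidence decides how many pseudo-labeled data are selected per mini-batch).

```python
import numpy as np

def select_pseudo_labeled(losses_aug, conf, threshold=0.9):
    """Select indices of pseudo-labeled samples to train on.

    losses_aug: per-sample loss computed AFTER data augmentation, shape (N,)
    conf: per-sample pseudo-label confidence c_i, shape (N,)
    Step 1: keep candidates whose confidence exceeds the threshold.
    Step 2: among candidates, keep the n samples with the smallest
            augmented loss, where n is the expected number of correct
            pseudo-labels (mean confidence x candidate count).
    """
    candidates = np.where(conf >= threshold)[0]
    if candidates.size == 0:
        return candidates  # nothing confident enough in this mini-batch
    n_select = int(round(conf[candidates].mean() * candidates.size))
    order = np.argsort(losses_aug[candidates])  # small loss first
    return candidates[order[:n_select]]
```

The small-loss criterion reflects the paper's assumption that the loss on incorrectly pseudo-labeled data increases sharply under augmentation, so correctly labeled samples tend to sort to the front.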

