INPL: PSEUDO-LABELING THE INLIERS FIRST FOR IMBALANCED SEMI-SUPERVISED LEARNING

Abstract

Recent state-of-the-art methods in imbalanced semi-supervised learning (SSL) rely on confidence-based pseudo-labeling with consistency regularization. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, and thus, the pseudo-labels for even high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective on pseudo-labeling for imbalanced SSL. Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data. To decide whether an unlabeled sample is "in-distribution" or "out-of-distribution", we adopt the energy score from the out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data can better approximate the true class distribution to improve the model. Experiments demonstrate that our energy-based pseudo-labeling method, InPL, albeit conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks. For example, it produces around a 3% absolute accuracy improvement on CIFAR10-LT. When combined with state-of-the-art long-tailed SSL methods, further improvements are attained. In particular, in one of the most challenging scenarios, InPL achieves a 6.9% accuracy improvement over the best competitor.

1. INTRODUCTION

In recent years, the frontier of semi-supervised learning (SSL) has seen significant advances through pseudo-labeling (Rosenberg et al., 2005; Lee et al., 2013) combined with consistency regularization (Laine & Aila, 2017; Tarvainen & Valpola, 2017; Berthelot et al., 2020; Sohn et al., 2020; Xie et al., 2020a). Pseudo-labeling, a type of self-training (Scudder, 1965; McLachlan, 1975) technique, converts model predictions on unlabeled samples into soft or hard labels as optimization targets, while consistency regularization (Laine & Aila, 2017; Tarvainen & Valpola, 2017; Berthelot et al., 2019; 2020; Sohn et al., 2020; Xie et al., 2020a) trains a model to produce the same outputs for two different views (e.g., strong and weak augmentations) of an unlabeled sample. However, most methods are designed for the balanced SSL setting where each class has a similar number of training samples, whereas most real-world data are naturally imbalanced, often following a long-tailed distribution. To better serve real-world scenarios, imbalanced SSL has recently received increasing attention. State-of-the-art imbalanced SSL methods (Kim et al., 2020; Wei et al., 2021; Lee et al., 2021) are built upon the pseudo-labeling and consistency regularization frameworks (Sohn et al., 2020; Xie et al., 2020a) by augmenting them with additional modules that tackle specific imbalance issues (e.g., using per-class balanced sampling (Lee et al., 2021; Wei et al., 2021)). Critically, these methods still rely on confidence-based thresholding (Lee et al., 2013; Sohn et al., 2020; Xie et al., 2020a; Zhang et al., 2021) for pseudo-labeling, in which only the unlabeled samples whose predicted class confidence surpasses a very high threshold (e.g., 0.95) are pseudo-labeled for training. Confidence-based pseudo-labeling, despite its success in balanced SSL, faces two major drawbacks in the imbalanced, long-tailed setting.
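The confidence-based thresholding rule described above can be sketched in a few lines. This is a minimal illustration, not the exact implementation of any cited method: the function name `confidence_pseudo_label` and the use of raw per-sample logits are our own simplifications, and the 0.95 threshold follows the example value mentioned in the text.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_pseudo_label(logits, threshold=0.95):
    # Keep the argmax prediction as a hard pseudo-label only when its
    # softmax confidence surpasses a high threshold; otherwise the sample
    # is ignored in the unlabeled loss for this iteration.
    probs = softmax(logits)
    conf = max(probs)
    if conf < threshold:
        return None
    return probs.index(conf)
```

A confident prediction (one logit much larger than the rest) is pseudo-labeled, while a flat prediction is discarded, which is the behavior whose recall/precision trade-offs the next paragraph analyzes.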
First, applying a high confidence threshold yields significantly lower recall of pseudo-labels for minority classes (Wei et al., 2021), exacerbating the class imbalance. Lowering the threshold can improve the recall for tail classes, but at the cost of reduced precision for other classes (see analysis in Section 4.4). Second, prior studies (Szegedy et al., 2014; Nguyen et al., 2015; Hein et al., 2019) show that softmax-based confidence scores in deep networks can be arbitrarily high even on out-of-distribution samples. Thus, in long-tailed scenarios where the model is generally biased towards the majority classes, the model can predict high confidence scores for the head classes even when the instances actually belong to the tail classes, resulting in low precision for head classes. Given these drawbacks of using the confidence score as the pseudo-labeling criterion, we seek to design a better approach to determine whether an unlabeled sample should be pseudo-labeled. In this work, we present a novel approach for pseudo-labeling that addresses the drawbacks of confidence-based pseudo-labeling in imbalanced SSL. Instead of relying on a model's prediction confidence to decide whether or not to pseudo-label an unlabeled instance, we propose to view the pseudo-labeling decision as an evolving in-distribution vs.
out-of-distribution classification problem¹. Initially, only the ground-truth human-labeled samples are considered "in-distribution" because they are the only training examples. In each ensuing training iteration, the unlabeled samples that are close to the current "in-distribution" samples are pseudo-labeled and contribute to training, which in turn gradually expands the "in-distribution". Thus, any "out-of-distribution" unlabeled samples from previous iterations may become "in-distribution" in later iterations, as the distribution of the pseudo-labeled training data is continuously updated and expanded. An illustrative example of this process can be found in Figure 1. To identify the "inliers", we leverage the energy score (LeCun et al., 2006) for its simplicity and good empirical performance. The energy score is a non-probabilistic scalar derived from a model's logits and theoretically aligned with the probability density of a data sample: lower/higher energy reflects data with higher/lower likelihood of occurrence under the training distribution, and it has been shown to be useful for conventional out-of-distribution (OOD) detection (Liu et al., 2020). In our imbalanced SSL setting, at each training iteration, we compute the energy score for each unlabeled sample. If an unlabeled sample's energy is below a certain threshold, we pseudo-label it with the class predicted by the model. To the best of our knowledge, our work is the first to consider pseudo-labeling in imbalanced SSL from an in-distribution vs. out-of-distribution perspective and is also the first work that performs pseudo-labeling without using softmax scores. We refer to our method as Inlier Pseudo-Labeling (InPL) in the rest of this paper.
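The energy-based rule just described can be sketched as follows, using the standard free-energy form E(x) = -T log Σᵢ exp(fᵢ(x)/T) from Liu et al. (2020). This is an illustrative sketch only: the function names, the temperature T = 1, and the example threshold of -8.0 are our own assumptions, not values prescribed by the method.

```python
import math

def energy_score(logits, T=1.0):
    # E(x) = -T * log(sum_i exp(f_i(x) / T)).
    # Lower energy corresponds to a more "in-distribution" sample.
    m = max(logits)  # subtract the max for numerical stability
    return -T * (m / T + math.log(sum(math.exp((z - m) / T) for z in logits)))

def inpl_pseudo_label(logits, energy_threshold=-8.0, T=1.0):
    # Pseudo-label with the model's predicted class only when the sample's
    # energy falls below the threshold; otherwise treat it as
    # out-of-distribution at this stage and skip it.
    if energy_score(logits, T) > energy_threshold:
        return None
    return max(range(len(logits)), key=lambda i: logits[i])
```

Note the contrast with confidence thresholding: the decision to pseudo-label depends on the (unnormalized) logit magnitudes rather than on a softmax probability, so a sample far from the training distribution can be rejected even if its softmax confidence is high.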
To evaluate the proposed InPL, we integrate it into the classic FixMatch (Sohn et al., 2020) framework and the recent state-of-the-art imbalanced SSL framework ABC (Lee et al., 2021) by replacing their vanilla confidence-based pseudo-labeling with our energy-based pseudo-labeling. InPL significantly



¹ Note that our definition of "out-of-distribution" differs from the typical definition in the out-of-distribution detection literature, which refers to unseen classes.



Figure 1: We illustrate the idea of InPL with a toy example with one head class (green) and one tail class (red). (a) At the beginning of training, only a few unlabeled samples are close enough to the training distribution formed by the initial labeled data. Note that with a confidence-based approach, the diamond unlabeled sample would be added as a pseudo-label for the green class since the model's confidence for it is very high (0.97). Our InPL instead ignores it since its energy score is too high and is thus considered out-of-distribution at this stage. (b) As training progresses, the training distribution is evolved by both the initial labeled data and the pseudo-labeled "in-distribution" unlabeled data, and more unlabeled data can be included in training. In this example, with our approach InPL, the diamond sample would eventually be pseudo-labeled as the red class.

