TOWARDS SEMI-SUPERVISED LEARNING WITH NON-RANDOM MISSING LABELS

Abstract

Semi-supervised learning (SSL) tackles the missing-label problem by enabling the effective use of unlabeled data. While existing SSL methods focus on the traditional setting, a practical and challenging scenario called label Missing Not At Random (MNAR) is usually ignored. In MNAR, the labeled and unlabeled data fall into different class distributions, resulting in biased label imputation, which deteriorates the performance of SSL models. In this work, class transition tracking based Pseudo-Rectifying Guidance (PRG) is devised for MNAR. We explore the class-level guidance information obtained by a Markov random walk, which is modeled on a dynamically created graph built over the class tracking matrix. PRG unifies the history information of each class transition caused by the pseudo-rectifying procedure to activate the model's enthusiasm for neglected classes, so that the quality of pseudo-labels on both popular and rare classes in MNAR can be improved. We show the superior performance of PRG across a variety of MNAR scenarios, outperforming the latest SSL solutions by a large margin. Checkpoints and evaluation code are available at the anonymous link https://anonymous.4open.science/r/PRG4SSL-MNAR-8DE2, while the source code will be released upon paper acceptance.



Semi-supervised learning (SSL) yields promising results in alleviating the shortage of large-scale labeled data (Chapelle et al., 2009; Zhou, 2021; Van Engelen & Hoos, 2020). Current prevailing SSL methods (Lee et al., 2013; Berthelot et al., 2020; Sohn et al., 2020; Tai et al., 2021; Zhang et al., 2021) utilize the model trained on the labeled data to impute pseudo-labels for the unlabeled data, thereby boosting model performance. Although these methods have made exciting advances in SSL, they only work well in the conventional setting, i.e., when the labeled and unlabeled data fall into the same (balanced) class distribution. Once this setting is not guaranteed, the gap between the class distributions of the labeled and unlabeled data leads to a significant accuracy drop of the pseudo-labels, resulting in strong confirmation bias (Arazo et al., 2019), which ultimately corrupts the performance of SSL models. The work of Hu et al. (2022) originally terms the scenario in which the labeled and unlabeled data belong to mismatched class distributions label Missing Not At Random (MNAR) and proposes a unified doubly robust framework to train an unbiased SSL model in MNAR. It is easy to see that in MNAR, either the labeled or the unlabeled data has an imbalanced class distribution; otherwise, the problem degrades to the conventional SSL setting. A typical MNAR scenario is shown in Fig. 1, in which the popular classes of the labeled data cause the model to ignore the rare classes, increasingly magnifying the bias in label imputation on the unlabeled data.
It is worth noting that although some recent SSL methods (Kim et al., 2020; Wei et al., 2021) are proposed to deal with class imbalance, they are still built upon the assumption of matched class distributions between the labeled and unlabeled data, and their performance inevitably declines in MNAR. MNAR is a more realistic scenario than the conventional SSL setting. In the practical labeling process, labeling all classes uniformly is usually not affordable because some classes are more difficult to recognize (Rosset et al., 2005; Misra et al., 2016; Colléony et al., 2017). Meanwhile, most automatic data collection methods also have difficulty ensuring that the collected labeled data is balanced (Mahajan et al., 2018; Hu et al., 2022). In a nutshell, MNAR is almost inevitable in SSL. In MNAR, the tricky troublemaker is the mismatch between the class distributions of the labeled and unlabeled data. Trained under MNAR, the model increasingly favors some classes, seriously affecting the pseudo-rectifying procedure. Pseudo-rectifying is defined as the change of the label assignment decision made by the SSL model for the same sample according to the knowledge learned at each new epoch. This process may cause a class transition, i.e., given a sample, its class prediction at the current epoch differs from that at the last epoch. In the self-training process of the SSL model driven by the labeled data, the model is expected to gradually rectify the pseudo-labels mispredicted for the unlabeled data in previous epochs. With pseudo-rectifying, a model trapped in learning from extremely noisy pseudo-labels can be rescued by its ability to correct these labels. Unfortunately, the pseudo-rectifying ability of the SSL model could be severely perturbed in MNAR.
Take the setting in Fig. 1 as an example. The model's "confidence" in predicting pseudo-labels for the labeled rare classes is attenuated by over-learning the samples of the labeled popular classes. Thus, the model fails to rectify those pseudo-labels mispredicted as the popular classes to the correct rare classes (even if the class distribution is balanced in the unlabeled data). As shown in Fig. 2b, compared with FixMatch (Sohn et al., 2020) trained in the conventional setting (Fig. 2a), FixMatch trained in MNAR (Fig. 1) significantly deteriorates in its pseudo-rectifying ability. Even after many iterations, the error rates of the pseudo-labels predicted for the labeled rare classes remain high. This phenomenon hints at the necessity of providing additional guidance to the rectifying procedure to address MNAR. Meanwhile, as observed in Fig. 2c, we notice that the mispredicted pseudo-labels for each class are often concentrated in a few classes, rather than scattered across all other classes. Intuitively, a class can easily be confused with classes similar to it. For example, as shown in Fig. 2c, the "automobile" samples are massively mispredicted as the most similar class: "truck". Inspired by this, we argue that it is feasible to guide pseudo-rectifying at the class level, i.e., pointing out the latent direction of class transition based only on the current class prediction. For instance, given a sample classified as "truck", the model could be given a chance to classify it as "automobile" sometimes, and vice versa. Notably, our approach does not require predefined semantically similar classes. We believe that two classes are conceptually similar only if they are frequently misclassified as each other by the classifier. In this sense, we develop a novel definition of the similarity of two classes, which is directly determined by the model's output.
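To make the class transition tracking concrete, the following is a minimal sketch of how transitions between the pseudo-labels of two consecutive epochs could be accumulated into a class tracking matrix. The function name `update_class_tracking_matrix` and the toy data are our own illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def update_class_tracking_matrix(C, prev_preds, curr_preds):
    """Accumulate class transitions between two consecutive epochs.

    C[i, j] counts how often a sample predicted as class i in the
    previous epoch is predicted as class j in the current epoch.
    """
    for i, j in zip(prev_preds, curr_preds):
        C[i, j] += 1
    return C

# Toy run with 4 classes: most predictions are stable (diagonal entries),
# but some samples transition between class 1 and class 3, marking these
# two classes as frequently confused with each other.
num_classes = 4
C = np.zeros((num_classes, num_classes), dtype=np.int64)
prev = [0, 1, 1, 2, 3, 3]
curr = [0, 3, 1, 2, 1, 3]
C = update_class_tracking_matrix(C, prev, curr)
print(C)
```

Off-diagonal mass in `C` then encodes which pairs of classes the classifier tends to confuse, which is exactly the model-output-based notion of class similarity described above.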
Even if there are no semantically similar classes, as long as the model makes incorrect predictions during training, class transitions still occur, which has seldom been investigated before. Our intuition can be regarded as perturbing some confident class predictions to preserve the pseudo-rectifying ability of the model. Such a strategy does not rely on the assumption of matched class distributions and is therefore amenable to MNAR. Given the motivations above, we propose class transition tracking based Pseudo-Rectifying Guidance (PRG) to address SSL in MNAR, which is shown in Fig. 3. Our main idea can be presented as
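The class-level guidance can be illustrated as one step of a Markov random walk over the row-normalized class tracking matrix: probability mass on a confident class is softly redistributed toward the classes it is frequently confused with. This is a minimal sketch under our own assumptions (the blending weight `alpha` and function names are hypothetical), not the paper's exact PRG formulation.

```python
import numpy as np

def transition_probabilities(C, eps=1e-8):
    """Row-normalize the class tracking matrix into a Markov transition
    matrix: H[i, j] approximates P(next class = j | current class = i)."""
    row_sums = C.sum(axis=1, keepdims=True)
    return C / np.maximum(row_sums, eps)

def guide_pseudo_label(probs, H, alpha=0.5):
    """Blend the model's class probabilities with one random-walk step,
    nudging mass toward frequently confused classes."""
    guided = (1 - alpha) * probs + alpha * (probs @ H)
    return guided / guided.sum()

# Toy example: classes 0 ("automobile") and 1 ("truck") are often
# misclassified as each other; class 2 is rarely involved.
C = np.array([[8., 4., 0.],
              [4., 8., 0.],
              [0., 0., 12.]])
H = transition_probabilities(C)
probs = np.array([0.9, 0.08, 0.02])  # a confident "automobile" prediction
g = guide_pseudo_label(probs, H)
print(g)
```

After guidance, some probability mass moves from "automobile" to "truck", giving the model a chance to rectify a misprediction between these two similar classes while leaving unrelated classes essentially untouched.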



Figure 1: An example of the MNAR scenarios on CIFAR-10 (see Sec. 4 for details). The class distribution of total data is balanced whereas labeled data is unevenly distributed across classes. For better illustration, the y-axis has different scaling for labeled (blue) and unlabeled data (green).

Figure 2: Results of FixMatch (Sohn et al., 2020) in MNAR and the conventional setting. The models are trained on CIFAR-10 with WRN-28-2 backbone (Zagoruyko & Komodakis, 2016). (a) and (b): Class-wise pseudo-label error rate. (c): Confusion matrix of pseudo-labels. In (b) and (c), experiments are conducted with the setting of Fig. 1, whereas in (a) with the conventional setting (i.e., balanced labeled and unlabeled data). The label amount used in (a) is the same as that in (b) and (c).

