ROPAWS: ROBUST SEMI-SUPERVISED REPRESENTATION LEARNING FROM UNCURATED DATA

Abstract

Semi-supervised learning aims to train a model using limited labels. State-of-the-art semi-supervised methods for image classification such as PAWS rely on self-supervised representations learned with large-scale unlabeled but curated data. However, PAWS is often less effective when using real-world unlabeled data that is uncurated, e.g., contains out-of-class data. We propose RoPAWS, a robust extension of PAWS that can work with real-world unlabeled data. We first reinterpret PAWS as a generative classifier that models densities using kernel density estimation. From this probabilistic perspective, we calibrate its prediction based on the densities of labeled and unlabeled data, which leads to a simple closed-form solution from Bayes' rule. We demonstrate that RoPAWS significantly improves PAWS for uncurated Semi-iNat by +5.3% and curated ImageNet by +0.4%.

1. INTRODUCTION

Semi-supervised learning aims to address the fundamental challenge of training models with limited labeled data by leveraging large-scale unlabeled data. Recent works exploit the success of self-supervised learning (He et al., 2020; Chen et al., 2020a) in learning representations from unlabeled data for training large-scale semi-supervised models (Chen et al., 2020b; Cai et al., 2022). Instead of self-supervised pre-training followed by semi-supervised fine-tuning, PAWS (Assran et al., 2021) proposed a single-stage approach that combines supervised and self-supervised learning and achieves state-of-the-art accuracy and convergence speed.

While PAWS can leverage curated unlabeled data, we empirically show that it is not robust to real-world uncurated data, which often contains out-of-class data. A common approach to tackle uncurated data in semi-supervised learning is to filter unlabeled data using out-of-distribution (OOD) classification (Chen et al., 2020d; Saito et al., 2021; Liu et al., 2022). However, OOD filtering methods do not fully utilize OOD data, which could be beneficial for learning representations, especially on large-scale realistic datasets. Furthermore, filtering OOD data can be ineffective since in-class and out-of-class data are often hard to discriminate in practical scenarios.

To this end, we propose RoPAWS, a robust semi-supervised learning method that can leverage uncurated unlabeled data. PAWS predicts out-of-class data overconfidently in the known classes since it assigns pseudo-labels based on nearby labeled data. To handle this, RoPAWS regularizes the pseudo-labels by measuring the similarities between labeled and unlabeled data. These pseudo-labels are further calibrated by label propagation between unlabeled data. Figure 1 shows the conceptual illustration of RoPAWS and Figure 4 visualizes the learned representations. More specifically, RoPAWS calibrates the prediction of PAWS from a probabilistic view.
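To make the PAWS mechanism described above concrete, the following is a minimal sketch (with hypothetical names, not the authors' implementation) of PAWS-style soft pseudo-labeling: an unlabeled embedding is compared to a labeled support set, and its pseudo-label is a similarity-weighted average of the support labels. Because the label comes entirely from the nearest labeled points, an out-of-class sample still receives a confident in-class pseudo-label, which is the failure mode RoPAWS addresses.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def paws_pseudo_label(z_unlabeled, z_support, y_support, tau=0.1):
    """z_unlabeled: (n, d) and z_support: (m, d) L2-normalized embeddings,
    y_support: (m, c) one-hot labels. Returns (n, c) soft pseudo-labels."""
    sim = z_unlabeled @ z_support.T / tau   # cosine similarity / temperature
    weights = softmax(sim, axis=1)          # attention over the labeled support
    return weights @ y_support              # similarity-weighted label average
```

Note that each row of the output sums to 1 by construction, so the prediction is always fully confident in the known classes, even for data far from every class.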
We first introduce a new interpretation of PAWS as a generative classifier, modeling densities over representations by kernel density estimation (KDE) (Rosenblatt, 1956). The calibrated prediction is given by a closed-form solution from Bayes' rule, which implicitly computes the fixed point of an iterative propagation formula of labels and priors of unlabeled data. In addition, RoPAWS explicitly controls out-of-class data by modeling a prior distribution and computing a reweighted loss, making the model robust to uncurated data. Unlike OOD filtering methods, RoPAWS leverages all of the unlabeled (and labeled) data for representation learning.
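The KDE-plus-Bayes idea can be illustrated with a simplified sketch (not the paper's exact derivation): class-conditional densities p(z|y) are Gaussian-kernel KDE estimates over labeled embeddings of each class, and a constant density `p_out` is an assumed stand-in for the prior mass of out-of-class data. Applying Bayes' rule then yields a posterior that becomes uncertain, rather than overconfident, for points far from all labeled data.

```python
import numpy as np

def kde_density(z, z_class, bandwidth=0.5):
    """Gaussian-kernel density of query points z (n, d) under
    labeled samples z_class (m, d) of a single class."""
    d2 = ((z[:, None, :] - z_class[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)

def calibrated_posterior(z, labeled_by_class, p_out=1e-3):
    """Posterior over known classes via Bayes' rule; `p_out` is a
    hypothetical constant out-of-class density (an assumption of this
    sketch), so far-away points keep probability mass outside all classes."""
    dens = np.stack([kde_density(z, zc) for zc in labeled_by_class], axis=1)
    total = dens.sum(axis=1, keepdims=True) + p_out
    return dens / total  # rows sum to < 1; the gap is out-of-class mass
```

In this toy version the out-of-class prior is a fixed constant; the paper instead derives the calibration in closed form, including the propagation of labels and priors among unlabeled data.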

2. RELATED WORK

Semi-supervised learning has been an active area of research for decades (Chapelle et al., 2006; Van Engelen & Hoos, 2020). Classic works directly regularize the classifier's prediction, which we call the prediction-based approach. Recent works focus more on the representation that stems from the final classifier, which we call the representation-based approach. Leveraging the recent progress of self-supervised learning (He et al., 2020), the representation-based approach shows more promise in large-scale scenarios and can be more robust under uncurated data. In the following subsections, we briefly review the general idea of each approach and how it handles uncurated data.

2.1. PREDICTION-BASED SEMI-SUPERVISED LEARNING

General approach. Prediction-based approaches regularize the classifier's prediction of unlabeled data. Two objectives are popularly used: pseudo-labeling and consistency regularization. Pseudo-labeling (Lee et al., 2013; Yalniz et al., 2019; Xie et al., 2020b; Rizve et al., 2021; Cascante-Bonilla et al., 2021; Hu et al., 2021) predicts labels of unlabeled data and retrains the classifier using the predicted labels, minimizing the prediction entropy of unlabeled data (Grandvalet & Bengio, 2004). Consistency regularization (Sajjadi et al., 2016; Laine & Aila, 2017; Tarvainen & Valpola, 2017; Miyato et al., 2018; Xie et al., 2020a) enforces that the predictions for two views of the same image are similar. Combining the two objectives, prediction-based approaches have shown notable results (Berthelot et al., 2019; 2020; Sohn et al., 2020; Kuo et al., 2020; Li et al., 2021). However, they underperform representation-based approaches in large-scale scenarios (Chen et al., 2020b). Moreover, most prior works assume that unlabeled data are curated, i.e., follow the same distribution as labeled data, and often fail when unlabeled data are uncurated (Oliver et al., 2018; Su et al., 2021).

Handling uncurated data. Numerous works have attempted to make the prediction-based approach robust to uncurated data. Most prior works assume the unlabeled data are composed of in-domain and out-of-domain (OOD) data and filter OOD data by training an OOD classifier (Chen et al., 2020d; Guo et al., 2020; Yu et al., 2020; Huang et al., 2021a; b; Saito et al., 2021; Killamsetty et al., 2021; Nair et al., 2019; Augustin & Hein, 2020; Park et al., 2021). From the perspective of the prediction-based approach, it is natural to filter OOD (particularly out-of-class) data since they are irrelevant to in-
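The two objectives named above are often combined as in FixMatch-style methods. The following is a hypothetical sketch (not any specific paper's code): hard pseudo-labels are taken from a weakly augmented view, kept only above a confidence threshold, and used as cross-entropy targets for a strongly augmented view, which enforces consistency between the two views.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_label_consistency_loss(logits_weak, logits_strong, threshold=0.95):
    """logits_weak/strong: (n, c) predictions for two views of the same
    unlabeled images. Returns a scalar FixMatch-style loss: cross-entropy
    of the strong view against confident hard pseudo-labels from the weak view."""
    p_weak = softmax(logits_weak)
    confidence = p_weak.max(axis=1)
    pseudo = p_weak.argmax(axis=1)
    mask = confidence >= threshold           # keep only confident pseudo-labels
    log_p_strong = np.log(softmax(logits_strong) + 1e-12)
    ce = -log_p_strong[np.arange(len(pseudo)), pseudo]
    return (ce * mask).mean()                # masked consistency objective
```

The confidence threshold implements the entropy-minimization effect: only predictions the model is already sure about are turned into training targets.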



Besides the deep learning approaches, SSKDE (Wang et al., 2009) proposed semi-supervised kernel density estimation, which technically resembles RoPAWS. The detailed discussion and comparison are in Appendix H.



Figure 1: Conceptual illustration of the proposed RoPAWS. PAWS assigns the pseudo-label of unlabeled data from the nearby labeled data; however, this makes the prediction of out-of-class data overconfident. In the uncurated setting, unlabeled data contain out-of-class data, for which the model should have uncertain (not confident) predictions. Therefore, RoPAWS regularizes the pseudo-labels by comparing the similarities between unlabeled data and labeled data.

