IN DEFENSE OF PSEUDO-LABELING: AN UNCERTAINTY-AWARE PSEUDO-LABEL SELECTION FRAMEWORK FOR SEMI-SUPERVISED LEARNING

Abstract

Recent research in semi-supervised learning (SSL) is dominated by consistency-regularization-based methods, which achieve strong performance. However, they rely heavily on domain-specific data augmentations, which are not easy to generate for all data modalities. Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation. We argue that PL underperforms because poorly calibrated models produce erroneous high-confidence predictions; these predictions generate many incorrect pseudo-labels, leading to noisy training. We propose an uncertainty-aware pseudo-label selection (UPS) framework that improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process. Furthermore, UPS generalizes the pseudo-labeling process, allowing for the creation of negative pseudo-labels; these negative pseudo-labels can be used for multi-label classification as well as for negative learning to improve single-label classification. We achieve strong performance compared to recent SSL methods on the CIFAR-10 and CIFAR-100 datasets. We also demonstrate the versatility of our method on the video dataset UCF-101 and the multi-label dataset Pascal VOC.

1. INTRODUCTION

The recent extraordinary success of deep learning methods can be mostly attributed to advancements in learning algorithms and the availability of large-scale labeled datasets. However, constructing large labeled datasets for supervised learning tends to be costly and is often infeasible. Several approaches have been proposed to overcome this dependency on huge labeled datasets; these include semi-supervised learning (Berthelot et al., 2019; Tarvainen & Valpola, 2017; Miyato et al., 2018; Lee, 2013), self-supervised learning (Doersch et al., 2015; Noroozi & Favaro, 2016; Chen et al., 2020a), and few-shot learning (Finn et al., 2017; Snell et al., 2017; Vinyals et al., 2016). Semi-supervised learning (SSL) is one of the dominant approaches for solving this problem, where the goal is to leverage a large unlabeled dataset alongside a small labeled dataset. One common assumption in SSL is that decision boundaries should lie in low-density regions (Chapelle & Zien, 2005). Consistency-regularization-based methods achieve this by making the network outputs invariant to small input perturbations (Verma et al., 2019). However, these methods often rely on a rich set of augmentations, such as affine transformations, cutout (DeVries & Taylor, 2017), and color jittering for images, which limits their applicability in domains where these augmentations are less effective (e.g., videos and medical images). Pseudo-labeling-based methods select unlabeled samples with high-confidence predictions as training targets (pseudo-labels); this can be viewed as a form of entropy minimization, which reduces the density of data points at the decision boundaries (Grandvalet & Bengio, 2005; Lee, 2013). One advantage of pseudo-labeling over consistency regularization is that it does not inherently require augmentations and can be applied to most domains. However, recent consistency regularization approaches tend to outperform pseudo-labeling on SSL benchmarks.
This work is in defense of pseudo-labeling: we demonstrate that pseudo-labeling based methods can perform on par with consistency regularization methods.
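
To make the conventional pseudo-labeling step described above concrete, the following is a minimal sketch of confidence-based selection. It is an illustration only, not the UPS framework proposed in this paper: the function names `softmax` and `select_pseudo_labels`, the threshold value of 0.9, and the example logits are all assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Convert one sample's raw scores into class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(batch_logits, threshold=0.9):
    """Return (sample_index, predicted_class) pairs for confident samples.

    A sample is kept only if its highest class probability reaches the
    threshold (a hypothetical value here); all other samples are left
    without a pseudo-label.
    """
    selected = []
    for index, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Hypothetical network outputs for three unlabeled samples, three classes.
unlabeled_logits = [
    [4.0, 0.0, 0.0],  # confident prediction for class 0
    [0.5, 0.4, 0.3],  # near-uniform prediction, rejected
    [0.0, 0.0, 5.0],  # confident prediction for class 2
]
pseudo_labels = select_pseudo_labels(unlabeled_logits, threshold=0.9)
print(pseudo_labels)
```

Note that this rule trusts the softmax confidence alone. If the network is poorly calibrated, a sample can pass the threshold with an incorrect class, which is the source of noisy pseudo-labels discussed in the abstract.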

