ON THE IMPORTANCE OF CALIBRATION IN SEMI-SUPERVISED LEARNING

Anonymous

Abstract

State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining techniques of consistency regularization and pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training, and thus model calibration is important in mitigating confirmation bias. Yet, many SOTA methods are optimized for model performance, with little focus directed toward improving model calibration. In this work, we empirically demonstrate that model calibration is strongly correlated with model performance and propose to improve calibration via approximate Bayesian techniques. We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across the standard vision benchmarks of CIFAR-10, CIFAR-100 and ImageNet, giving up to 16.2% improvement in test accuracy on the CIFAR-100-400-labels benchmark. Furthermore, we also demonstrate their effectiveness on additional realistic and challenging problems, such as class-imbalanced datasets and in photonics science.

1. INTRODUCTION

While deep learning has achieved unprecedented success in recent years, its reliance on vast amounts of labeled data remains a long-standing challenge. Semi-supervised learning (SSL) aims to mitigate this by leveraging unlabeled samples in combination with a limited set of annotated data. In computer vision, two powerful techniques that have emerged are pseudo-labeling (also known as self-training) (Rosenberg et al., 2005; Xie et al., 2019b) and consistency regularization (Bachman et al., 2014; Sajjadi et al., 2016). Broadly, pseudo-labeling is the technique where artificial labels are assigned to unlabeled samples, which are then used to train the model. Consistency regularization enforces that random perturbations of the unlabeled inputs produce similar predictions. These two techniques are typically combined by minimizing the cross-entropy between pseudo-labels and predictions that are derived from differently augmented inputs, and have led to strong performance on vision benchmarks (Sohn et al., 2020; Assran et al., 2021). Intuitively, given that pseudo-labels (i.e. the model's predictions for unlabeled data) are used to drive training objectives, the calibration of the model should be of paramount importance. Model calibration (Guo et al., 2017) is a measure of how truthfully a model's output quantifies its predictive uncertainty, i.e. it can be understood as the alignment between the model's prediction confidence and its ground-truth accuracy. In some SSL methods, the model's confidence is used as a selection metric (Lee, 2013; Sohn et al., 2020) to determine pseudo-label acceptance, further highlighting the need for proper confidence estimates. Even outside this family of methods, the use of cross-entropy minimization objectives common in SSL implies that models will naturally be driven to output high-confidence predictions (Grandvalet & Bengio, 2004).
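The combination of confidence-based pseudo-labeling and consistency regularization described above can be illustrated with a minimal NumPy sketch in the style of Sohn et al. (2020). The function names, the two-view setup, and the 0.95 threshold are illustrative assumptions, not the implementation of any specific method discussed here:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the class dimension."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label_loss(weak_logits, strong_logits, threshold=0.95):
    """Unlabeled-data loss combining pseudo-labeling and consistency.

    Pseudo-labels come from predictions on weakly augmented views; they
    supervise predictions on strongly augmented views of the same inputs,
    but only for samples whose maximum confidence exceeds the threshold.
    Returns the masked cross-entropy and the fraction of accepted samples.
    """
    probs = softmax(weak_logits)
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    mask = confidence >= threshold            # confidence-based selection
    log_p = np.log(softmax(strong_logits) + 1e-12)
    per_sample_ce = -log_p[np.arange(len(pseudo_labels)), pseudo_labels]
    # Average over the whole batch, zeroing out rejected samples.
    return (per_sample_ce * mask).mean(), mask.mean()
```

The masking step is exactly where calibration matters: a miscalibrated model accepts wrong pseudo-labels with high confidence, feeding its own errors back into training.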
Having high-confidence predictions is highly desirable in SSL since we want the decision boundary to lie in low-density regions of the data manifold, i.e. away from labeled data points (Murphy, 2022). However, without proper calibration, a model would easily become over-confident. This is highly detrimental as the model would be encouraged to reinforce its mistakes, resulting in the phenomenon commonly known as confirmation bias (Arazo et al., 2019). Despite the fundamental importance of calibration in SSL, many state-of-the-art (SOTA) methods have thus far been empirically driven and optimized for performance, with little focus on techniques that specifically target improving calibration to mitigate confirmation bias. In this work, we explore the generality of the importance of calibration in SSL by focusing on two broad families of SOTA
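Miscalibration of the kind described above is commonly quantified with the expected calibration error (ECE) of Guo et al. (2017): predictions are binned by confidence, and the gaps between per-bin confidence and per-bin accuracy are averaged, weighted by bin occupancy. A minimal sketch (the equal-width 10-bin scheme is one common choice among several):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: array of per-sample maximum predicted probabilities.
    correct:     array of 0/1 indicators of whether each prediction is right.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of samples falling in the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

A perfectly calibrated model (e.g. 75% accuracy at 0.75 confidence) yields an ECE of zero, while an over-confident one (e.g. 50% accuracy at 0.95 confidence) yields a large ECE.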

