SEMPPL: PREDICTING PSEUDO-LABELS FOR BETTER CONTRASTIVE REPRESENTATIONS

Abstract

Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SEMPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning, in which representations are shaped by distinguishing whether two samples represent the same underlying datum (positives) or not (negatives), with a novel approach to selecting positives. To enrich the set of positives, we leverage the few existing ground-truth labels to predict the missing ones through a k-nearest-neighbours classifier applied to the learned embeddings of the labelled data. We thus extend the set of positives with datapoints that share the same pseudo-label and call these semantic positives. We jointly learn the representation and predict bootstrapped pseudo-labels. This creates a reinforcing cycle: strong initial representations enable better pseudo-label predictions, which then improve the selection of semantic positives and lead to even better representations. SEMPPL outperforms competing semi-supervised methods, setting new state-of-the-art performance of 68.5% and 76% top-1 accuracy when using a ResNet-50 and training on 1% and 10% of labels on ImageNet, respectively. Furthermore, when using selective kernels, SEMPPL significantly outperforms the previous state of the art, achieving 72.3% and 78.3% top-1 accuracy on ImageNet with 1% and 10% of labels, respectively, an absolute improvement of +7.8% and +6.2% over previous work. SEMPPL also exhibits state-of-the-art performance with larger ResNet models, as well as strong robustness, out-of-distribution and transfer performance. We release the checkpoints and the evaluation code at https://github.com/deepmind/semppl.

1. INTRODUCTION

In recent years, self-supervised learning has made significant strides in learning useful visual features from large unlabelled datasets [Oord et al., 2018; Chen et al., 2020a; Mitrovic et al., 2021; Grill et al., 2020; Caron et al., 2021]. Moreover, self-supervised representations have matched the performance of historical supervised baselines on the ImageNet-1k benchmark [Russakovsky et al., 2015] in like-for-like comparisons and have outperformed supervised learning in many transfer settings [Tomasev et al., 2022]. While such results show exciting progress in the field, many real-world applications come with a small amount of ground-truth labelled datapoints, making the problem of representation learning semi-supervised.

In this work we propose a novel approach to semi-supervised learning, Semantic Positives via Pseudo-Labels (SEMPPL), which incorporates supervised information during the representation learning stage within a self-supervised loss. Unlike previous work, which uses the available supervision as targets within a cross-entropy objective, we use the supervised information to help inform which points should have similar representations. We learn representations contrastively: the representation of a datapoint (the anchor) is learned by maximizing the similarity of its embedding with the embeddings of a set of similar points (positives), while simultaneously minimizing its similarity with the embeddings of a set of dissimilar points (negatives). As such, the appropriate construction of these sets of positives and negatives is crucial to the success of contrastive learning.
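To make the pseudo-labelling step concrete, the following is a minimal JAX sketch of how missing labels might be predicted with a k-nearest-neighbours vote over the embeddings of the labelled data. This is an illustrative reconstruction, not the authors' implementation: the function name, the use of cosine similarity, and the default value of k are assumptions.

```python
# Minimal sketch of k-NN pseudo-label prediction over learned embeddings.
# Assumptions (not taken from the paper): cosine similarity as the metric,
# majority vote over k = 5 neighbours, and these function/argument names.
import jax
import jax.numpy as jnp


def knn_pseudo_labels(unlabelled_emb, labelled_emb, labels, num_classes, k=5):
    """Predict pseudo-labels for unlabelled embeddings via a k-NN vote.

    unlabelled_emb: [M, D] embeddings of unlabelled datapoints.
    labelled_emb:   [N, D] embeddings of the few labelled datapoints.
    labels:         [N] integer ground-truth labels of the labelled data.
    """
    # Cosine similarity between every unlabelled and labelled embedding.
    u = unlabelled_emb / jnp.linalg.norm(unlabelled_emb, axis=1, keepdims=True)
    lab = labelled_emb / jnp.linalg.norm(labelled_emb, axis=1, keepdims=True)
    sims = u @ lab.T                                  # [M, N]

    # Indices of the k most similar labelled datapoints for each query.
    _, nn_idx = jax.lax.top_k(sims, k)                # [M, k]
    nn_labels = labels[nn_idx]                        # [M, k]

    # Majority vote over the k neighbours' labels.
    votes = jax.nn.one_hot(nn_labels, num_classes).sum(axis=1)  # [M, C]
    return jnp.argmax(votes, axis=1)                  # [M] pseudo-labels
```

Datapoints that receive the same pseudo-label as the anchor can then be added to the anchor's positive set; this is the "semantic positives" construction described above, with the pseudo-labels re-predicted as the embeddings improve over training.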

