RETHINKING POSITIVE SAMPLING FOR CONTRASTIVE LEARNING WITH KERNEL

Anonymous authors
Paper under double-blind review

Abstract

Data augmentation is a crucial component in unsupervised contrastive learning (CL). It determines how positive samples are defined and, ultimately, the quality of the learned representation. Although efforts have been made to find efficient augmentations for ImageNet, CL still underperforms compared to supervised methods, and finding good augmentations remains an open problem in other applications, such as medical imaging, or in datasets with easy-to-learn but irrelevant imaging features. In this work, we propose a new way to define positive samples using kernel theory, along with a novel loss called decoupled uniformity. We propose to integrate prior information, either learned from generative models viewed as feature extractors or given as auxiliary attributes, into contrastive learning, making it less dependent on data augmentation. We draw a connection between contrastive learning and conditional mean embedding theory to derive tight bounds on the downstream classification loss. In the unsupervised setting, we empirically demonstrate that CL benefits from generative models, such as VAEs and GANs, to rely less on data augmentations. We validate our framework on vision and medical datasets including CIFAR10, CIFAR100, STL10, ImageNet100, CheXpert, and a brain MRI dataset. In the weakly supervised setting, we demonstrate that our formulation provides state-of-the-art results.

1. INTRODUCTION

Figure 1: Illustration of the proposed method. Each point is an original image x. Two points are connected if they can be transformed into the same augmented image under a distribution of augmentations A. Colors represent (unknown) semantic classes, and light disks represent the support of augmentations A(•|x) for each sample x. Starting from an incomplete augmentation graph (1), in which intra-class samples are not connected (e.g. because augmentations are insufficient or not adapted), we reconnect them using a kernel defined on prior information (either learned with a generative model, viewed as a feature extractor, or given as auxiliary attributes). The extended augmentation graph (3) is the union of the (incomplete) augmentation graph (1) and the kernel graph (2). In (2), the gray disk indicates the set of points x′ that are close to the anchor (blue star) in the kernel space.

Contrastive Learning (CL) (44; 3; 4; 7; 10) is a paradigm designed for representation learning which has been applied to unsupervised (10; 13), weakly supervised (55; 20) and supervised problems (37). It has gained popularity over recent years by achieving impressive results in the unsupervised setting.
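The graph union described in the caption can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the RBF kernel, the threshold `tau`, and the binary adjacency representation are all assumptions made for clarity; the prior features `z` stand in for embeddings from a generative model or auxiliary attributes.

```python
import numpy as np

def rbf_kernel(z, sigma=1.0):
    # Pairwise RBF kernel on prior representations z of shape (n, d),
    # e.g. embeddings from a pretrained VAE/GAN encoder (an assumed choice).
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def extended_augmentation_graph(aug_adj, z, sigma=1.0, tau=0.9):
    """Union of the (possibly incomplete) augmentation graph and a
    kernel graph built from prior information.

    aug_adj : (n, n) binary adjacency; 1 if two images can be mapped to
              the same augmented view under the augmentation distribution A.
    z       : (n, d) prior features (learned or auxiliary attributes).
    tau     : hypothetical threshold above which two samples are
              considered kernel-connected.
    """
    K = rbf_kernel(z, sigma)
    kernel_adj = (K >= tau).astype(int)
    np.fill_diagonal(kernel_adj, 0)          # no self-loops
    return np.maximum(aug_adj, kernel_adj)   # graph union
```

For example, if augmentations connect only samples 0 and 1, but samples 1 and 2 are close in the kernel space, the extended graph reconnects the intra-class pair (1, 2) that augmentations alone missed.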

