HOW WEAKLY SUPERVISED INFORMATION HELPS CONTRASTIVE LEARNING

Abstract

Contrastive learning has shown outstanding performance in both supervised and unsupervised learning. However, little is known about when and how weakly supervised information helps improve contrastive learning, especially from a theoretical perspective. The major challenge is that the existing theory of contrastive learning, which is based on supervised learning frameworks, fails to distinguish between supervised and unsupervised contrastive learning. Therefore, we turn to unsupervised learning frameworks and, based on the posterior probability of labels, translate the weakly supervised information into a similarity graph under the framework of spectral clustering. In this paper, we investigate two typical weakly supervised learning problems, noisy label learning and semi-supervised learning, and analyze their influence on contrastive learning within a unified framework. Specifically, we analyze the effect of weakly supervised information on the augmentation graph of unsupervised contrastive learning, and consequently on its corresponding error bound. Numerical experiments are carried out to verify the theoretical findings.

1. INTRODUCTION

Contrastive learning has shown state-of-the-art empirical performance in both supervised and unsupervised learning. In unsupervised learning, contrastive learning algorithms (Chen et al., 2020; He et al., 2020; Chen et al., 2021; Chen and He, 2021) learn good representations of high-dimensional observations from a large amount of unlabeled data by pulling together an anchor and its augmented views in the embedding space. On the other hand, supervised contrastive learning (Khosla et al., 2020) uses same-class examples and their corresponding augmentations as positives, and achieves significantly better performance than the standard cross-entropy loss, especially on large-scale datasets. Recently, contrastive learning has been introduced to solve weakly supervised learning problems such as noisy label learning (Tan et al., 2021; Wang et al., 2022) and semi-supervised learning. For noisy label learning, most methodological studies use contrastive learning as a tool to select confident samples based on the learned representations (Yao et al., 2021; Ortego et al., 2021; Li et al., 2022), whereas the theoretical studies focus on proving the robustness of downstream classifiers with features learned by self-supervised contrastive learning (Cheng et al., 2021; Xue et al., 2022). For semi-supervised learning, the contrastive loss is often used as a regularizer to improve the precision of pseudo-labeling (Lee et al., 2022; Yang et al., 2022). However, none of the existing studies use weakly supervised information to improve contrastive learning. Perhaps the closest attempt is Yan et al. (2022), which leverages the negative correlations in the noisy data to avoid same-class negatives for contrastive learning. Nonetheless, only empirical results are presented there, without showing when and how the weakly supervised information helps improve contrastive learning.
Moreover, a proper theoretical framework for weakly supervised contrastive learning is especially lacking. The major challenge lies in the fact that the existing theoretical frameworks compatible with both supervised and unsupervised contrastive learning (Arora et al., 2019; Nozawa and Sato, 2021; Ash et al., 2022; Bao et al., 2022) fail to distinguish between the two settings. To be specific, in order to build a relationship with supervised learning losses, such studies assume that the positive pairs for unsupervised contrastive learning are generated from the same latent class, which is exactly how positive samples for supervised contrastive learning are selected. Consequently, such mathematical modeling cannot tell the difference between supervised and unsupervised contrastive learning. Therefore, in this paper, we instead base our theoretical analysis on an unsupervised learning framework. Based on the posterior probability of labeled samples, we translate the weakly supervised information into a similarity graph under the framework of spectral clustering. This enables us to analyze the effect of the label information on the augmentation graph of unsupervised spectral clustering (HaoChen et al., 2021), and consequently on its corresponding error bound. The contributions of this paper are summarized as follows.

• We establish, for the first time, a theoretical framework for weakly supervised contrastive learning, which is compatible with both noisy label learning and semi-supervised learning.

• By formulating the label information into a similarity graph based on the posterior probability of labels, we derive the downstream error bound of contrastive learning from both weakly supervised labels and feature information. We show that both noisy labels and semi-supervised labels can improve the error bound of unsupervised contrastive learning under certain constraints on the noise rate and labeled sample size.

• We empirically verify our theoretical results.
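To make the construction concrete, the following is a minimal NumPy sketch of the general idea: a label-based similarity graph is built from posterior class probabilities (the probability that two samples share a label), mixed with an unsupervised augmentation-style adjacency, and fed to a standard spectral embedding. The function names, the mixing weight `beta`, and the toy data are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def label_similarity(posterior):
    # posterior: (n, K) array, row i = P(y_i = k | x_i).
    # Entry (i, j) is the probability that samples i and j share
    # the same label: sum_k P(y_i = k) * P(y_j = k).
    return posterior @ posterior.T

def combined_graph(aug_adj, posterior, beta=0.5):
    # Convex combination of an (unsupervised) augmentation-style
    # adjacency and the label-based similarity graph; `beta` is an
    # illustrative mixing weight, not a quantity from the paper.
    W = (1.0 - beta) * aug_adj + beta * label_similarity(posterior)
    return (W + W.T) / 2.0  # enforce exact symmetry

def spectral_embedding(W, dim):
    # Standard spectral clustering step: top eigenvectors of the
    # symmetrically normalized adjacency D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    W_norm = D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(W_norm)     # eigenvalues ascending
    return vecs[:, -dim:]                    # top-`dim` eigenvectors

# Toy example: two clusters of four samples with noisy posteriors.
n, K = 8, 2
aug_adj = np.full((n, n), 0.05)
aug_adj[:4, :4] += 0.9
aug_adj[4:, 4:] += 0.9
posterior = np.vstack([np.tile([0.9, 0.1], (4, 1)),
                       np.tile([0.1, 0.9], (4, 1))])
Z = spectral_embedding(combined_graph(aug_adj, posterior), dim=2)
```

Here the labeled information simply reweights graph edges before the usual eigendecomposition, which is the mechanism the error-bound analysis studies.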

2. RELATED WORKS

Theoretical Frameworks of Contrastive Learning. The theoretical frameworks of unsupervised contrastive learning can be divided into two major categories. The first category is devoted to building the relationship between unsupervised contrastive learning and supervised downstream classification. Arora et al. (2019) first introduces the concept of latent classes, hypothesizes that semantically similar points are sampled from the same latent class, and proves that the unsupervised contrastive loss serves as an upper bound on the downstream supervised learning loss. Nozawa and Sato (2021); Ash et al. (2022); Bao et al. (2022) further investigate the effect of negative samples, and establish surrogate bounds for the downstream classification loss that better match the empirical observations on the negative sample size. However, studies in this category have to assume the existence of supervised latent classes, and that the positive pairs are conditionally independently drawn from the same latent class. This assumption fails to distinguish between supervised and unsupervised contrastive learning, and cannot be used to analyze the weakly supervised setting. The second category analyzes contrastive learning by modeling the feature similarity. HaoChen et al. (2021) first introduces the concept of the augmentation graph to represent the feature similarity of the augmented samples, and analyzes contrastive learning from the perspective of spectral clustering. Shen et al. (2022) uses a stochastic block model to analyze spectral contrastive learning for the problem of unsupervised domain adaptation. Similarly, Wang et al. (2021) proposes the concept of augmentation overlap to formulate how the positive samples are aligned. Moreover, contrastive learning has also been understood through other existing theoretical frameworks of unsupervised learning, such as nonlinear independent component analysis (ICA) (Zimmermann et al., 2021), neighborhood component analysis (NCA) (Ko et al., 2022), and the variational autoencoder (VAE) (Aitchison, 2021). In this paper, we follow the second category of approaches, and formulate the weakly supervised information into a similarity graph based on both label and feature information.

Contrastive Learning for Noisy Label Learning. Ghosh and Lan (2021) first finds, through empirical evidence, that pretraining with contrastive learning improves robustness to label noise. Many methodological studies have been carried out for noisy label learning with the help of contrastive learning. Yao et al. (2021); Ortego et al. (2021); Li et al. (2022) use representations learned from unsupervised contrastive learning to filter out confident samples from all noisy ones, and in turn use the confident samples to conduct supervised contrastive learning and generate better representations. By contrast, Yan et al. (2022) follows the idea of negative learning (Kim et al., 2019; 2021), and leverages the negative correlations in the noisy data to avoid same-class negatives in contrastive learning. On the theoretical side, Cheng et al. (2021) analyzes the robustness of cross-entropy with self-supervised (SSL) features, and Xue et al. (2022) proves the robustness of the downstream classifier in contrastive learning.

Contrastive Learning for Semi-supervised Learning. Lee et al. (2022); Yang et al. (2022) use contrastive regularization to enhance the reliability of pseudo-labeling in semi-supervised learning.
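For reference, the spectral clustering view of contrastive learning discussed above centers on the spectral contrastive loss of HaoChen et al. (2021), whose empirical form on a batch of positive pairs can be sketched as follows. The NumPy phrasing and variable names here are ours; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def spectral_contrastive_loss(z1, z2):
    # z1, z2: (n, d) embeddings of two augmented views of the same
    # n anchors. Minimizing this loss is shown by HaoChen et al.
    # (2021) to recover spectral clustering on the augmentation graph.
    n = z1.shape[0]
    # Alignment term: pull each positive pair together.
    pos = -2.0 * np.sum(z1 * z2) / n
    # Uniformity term: penalize squared inner products of the
    # n * (n - 1) cross-pairings (i != j), which act as negatives.
    inner = z1 @ z2.T
    neg = (np.sum(inner ** 2) - np.sum(np.diag(inner) ** 2)) / (n * (n - 1))
    return pos + neg
```

The alignment term corresponds to edge weights of the augmentation graph, and the uniformity term keeps distinct samples' embeddings near-orthogonal; weak supervision, in the framework of this paper, enters by modifying the graph behind the alignment term.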

