IS SELF-SUPERVISED CONTRASTIVE LEARNING MORE ROBUST THAN SUPERVISED LEARNING?

Abstract

Prior work on self-supervised contrastive learning has primarily focused on evaluating recognition accuracy, but has overlooked other behavioral aspects. In addition to accuracy, distributional robustness plays a critical role in the reliability of machine learning models. We design and conduct a series of robustness tests to quantify how contrastive learning and supervised learning differ in their responses to downstream and pre-training data distribution changes. These tests leverage data corruptions at multiple levels, ranging from pixel-level distortion to patch-level shuffling to dataset-level distribution shift, covering both natural and unnatural corruptions. Our tests unveil intriguing robustness behaviors of contrastive and supervised learning: while we generally observe that contrastive learning is more robust than supervised learning under downstream corruptions, we find, surprisingly, that contrastive learning is vulnerable to pixel- and patch-level corruptions during pre-training. Furthermore, we observe that contrastive learning depends more heavily on spatial image coherence during pre-training; for example, it is particularly sensitive to global patch shuffling. We explain these results by connecting them to feature space uniformity and data augmentation. Our analysis has implications for improving the downstream robustness of supervised learning, and calls for more studies on understanding contrastive learning.

1. INTRODUCTION

In recent years, self-supervised contrastive learning (CL) has demonstrated tremendous potential in learning generalizable representations from unlabeled datasets (Chen et al., 2020b; He et al., 2020; Grill et al., 2020; Caron et al., 2020; Chen & He, 2021; Zhong et al., 2021b). Current state-of-the-art CL algorithms learn representations from ImageNet (Deng et al., 2009) that match or even exceed the accuracy of their supervised learning (SL) counterparts on ImageNet and downstream tasks. However, beyond accuracy, little attention has been paid to comparing other behavioral differences between contrastive learning and supervised learning, and even less work investigates robustness during pre-training. Robustness is an important criterion for evaluating machine learning algorithms. For example, robustness to long-tail or noisy training data allows a learning algorithm to work well in a wide variety of imperfect real-world scenarios (Wang et al., 2017). Robustness of the model output across training iterations enables anytime early stopping (Hu et al., 2019) and smoother continual learning.



Figure 1: We conduct a series of robustness tests based on data distribution corruptions from micro to macro levels, to study the behavior of contrastive and supervised learning beyond accuracy. Our results reveal that contrastive learning is usually more robust than supervised learning to downstream corruptions (∆^D_CL < ∆^D_SL), while showing the opposite behavior under pre-training pixel- and patch-level corruptions (∆^P_CL > ∆^P_SL) and pre-training dataset-level corruptions (∆^P_CL < ∆^P_SL), where ∆ is the accuracy drop from the uncorrupted setting.
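To make the patch-level corruption concrete, the following is a minimal sketch of a global patch shuffling operation of the kind the tests describe: an image is divided into non-overlapping square patches, which are then permuted uniformly at random. The function name, the default patch size, and the divisibility assumption are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def global_patch_shuffle(image, patch_size=56, rng=None):
    """Corrupt an image by randomly permuting its non-overlapping patches.

    `image` is an (H, W, C) array; H and W are assumed divisible by
    `patch_size`. Pixel values are preserved, only spatial coherence
    across patches is destroyed.
    """
    rng = np.random.default_rng(rng)
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size
    # Split into a flat list of (patch_size, patch_size, C) patches.
    patches = (image
               .reshape(gh, patch_size, gw, patch_size, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch_size, patch_size, c))
    # Shuffle patches globally, then reassemble into the original grid.
    patches = patches[rng.permutation(gh * gw)]
    return (patches
            .reshape(gh, gw, patch_size, patch_size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(h, w, c))
```

Because the corruption only rearranges patches, per-patch statistics (and hence low-level texture cues) are untouched; what is removed is exactly the global spatial layout that, per the results above, contrastive pre-training appears to rely on more heavily.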

