A THEORETICAL STUDY OF INDUCTIVE BIASES IN CONTRASTIVE LEARNING

Abstract

Understanding self-supervised learning is important but challenging. Previous theoretical works study the role of pretraining losses and view neural networks as general black boxes. However, the recent work of Saunshi et al. (2022) argues that the model architecture, a component largely ignored by previous works, also has a significant influence on the downstream performance of self-supervised learning. In this work, we provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class. In particular, we focus on contrastive learning, a popular self-supervised learning method that is widely used in the vision domain. We show that when the model has limited capacity, contrastive representations recover certain special clustering structures that are compatible with the model architecture, but ignore many other clustering structures in the data distribution. As a result, our theory captures the more realistic setting where contrastive representations have much lower dimensionality than the number of clusters in the data distribution. We instantiate our theory on several synthetic data distributions and provide empirical evidence to support the theory.

1. INTRODUCTION

Recent years have witnessed the effectiveness of pre-trained representations, which are learned on unlabeled data with self-supervised losses and then adapted to a wide range of downstream tasks (Chen et al., 2020a;b; He et al., 2020; Caron et al., 2020; Chen et al., 2020c; Gao et al., 2021; Su et al., 2021; Chen & He, 2020; Brown et al., 2020; Radford et al., 2019). However, understanding the empirical success of this emergent pre-training paradigm is still challenging. It requires novel mathematical frameworks and analyses beyond classical statistical learning theory. The prevalent use of deep neural networks in self-supervised learning also adds to the mystery.

Many theoretical works focus on isolating the roles of self-supervised losses, showing that they encourage the representations to capture certain structures of the unlabeled data that are helpful for downstream tasks (Arora et al., 2019; HaoChen et al., 2021; 2022; Wei et al., 2021; Xie et al., 2021; Saunshi et al., 2020). However, these works oftentimes operate in the regime of sufficient (polynomial in the dimensionality) or even infinite pre-training data, and view the neural network as a black box. The only relevant property of neural networks in these works is that they form a parameterized model class with a finite complexity measure (e.g., Rademacher complexity).

Recently, Saunshi et al. (2022) argue that the pre-training loss is not the only contributor to the performance of self-supervised learning, and that previous works which view neural networks as a black box cannot tell apart the differences in downstream performance between architectures (e.g., ResNet (He et al., 2015) vs. vision transformers (Dosovitskiy et al., 2020)). Furthermore, self-supervised learning with an appropriate architecture can possibly work under more general conditions and/or with less pre-training data than predicted by these results for general architectures. Therefore, a more comprehensive and realistic theory needs to take into consideration the inductive biases of the architecture.

This paper provides the first theoretical analyses of the inductive biases of nonlinear architectures in self-supervised learning. Our theory follows the setup of the recent work by HaoChen et al. (2021).
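For concreteness, the pre-training objective in that setup is a contrastive loss over augmentation-based positive pairs; a minimal sketch, using the spectral contrastive loss of HaoChen et al. (2021) with notation introduced here only for illustration, is

\[
\mathcal{L}(f) \;=\; -2\,\mathbb{E}_{(x,\,x^{+})}\!\left[ f(x)^{\top} f(x^{+}) \right] \;+\; \mathbb{E}_{x,\,x'}\!\left[ \left( f(x)^{\top} f(x') \right)^{2} \right],
\]

where $(x, x^{+})$ denotes a positive pair formed by two random augmentations of the same underlying datapoint, $x$ and $x'$ are sampled independently, and $f$ ranges over the representation functions realizable by the model architecture under consideration. The restriction of $f$ to a limited model class is precisely where the architecture's inductive bias can enter the analysis.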

