A MESSAGE PASSING PERSPECTIVE ON LEARNING DYNAMICS OF CONTRASTIVE LEARNING

Abstract

In recent years, contrastive learning has achieved impressive results on self-supervised visual representation learning, but a rigorous understanding of its learning dynamics is still lacking. In this paper, we show that if we cast a contrastive objective equivalently into the feature space, its learning dynamics admits an interpretable form. Specifically, we show that its gradient descent corresponds to a specific message passing scheme on the corresponding augmentation graph. Based on this perspective, we theoretically characterize how contrastive learning gradually learns discriminative features through the alignment update and the uniformity update. Meanwhile, this perspective also establishes an intriguing connection between contrastive learning and Message Passing Graph Neural Networks (MP-GNNs). This connection not only provides a unified understanding of many techniques independently developed in each community, but also enables us to borrow techniques from MP-GNNs to design new contrastive learning variants, such as graph attention, graph rewiring, and jumping knowledge. We believe that our message passing perspective not only provides a new theoretical understanding of contrastive learning dynamics, but also bridges two seemingly independent areas, which could inspire more interleaving studies that benefit from each other. The code is available at https://github.

1. INTRODUCTION

Contrastive Learning (CL) has become arguably the most effective approach to learning visual representations from unlabeled data (Chen et al., 2020b; He et al., 2020; Chen et al., 2020c; Wang et al., 2021a; Chen et al., 2020d; 2021; Caron et al., 2021). However, we still know little about how CL gradually learns meaningful features from unlabeled data. Recently, there has been a burst of interest in the theory of CL, but despite the remarkable progress that has been made, existing theories of CL are established either for an arbitrary function $f$ in the function class $\mathcal{F}$ (Saunshi et al., 2019; Wang et al., 2022) or for the optimal $f^*$ with minimal contrastive loss (Wang & Isola, 2020; HaoChen et al., 2021; Wang et al., 2022). A theoretical characterization of the learning dynamics, in contrast, is largely overlooked, and it is the focus of this work.

Perhaps surprisingly, we find that the optimization dynamics of contrastive learning corresponds to a specific message passing scheme among different samples. Specifically, by reformulating the alignment and uniformity parts of the contrastive loss in the feature space, we show that the derived alignment and uniformity updates correspond to message passing on two different graphs: the alignment update operates on the augmentation graph defined by data augmentations, while the uniformity update operates on the affinity graph defined by feature similarities. The combined contrastive update is therefore a competition between two message passing rules. Based on this perspective, we further show that contrastive learning reaches its equilibrium when the two message passing rules balance each other, i.e., when the learned distribution $P_\theta$ matches the ground-truth data distribution $P_d$, which provides a clear picture for understanding the dynamics of contrastive learning.
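To make the two updates concrete, the following is a minimal sketch assuming the standard alignment/uniformity decomposition of the contrastive loss (Wang & Isola, 2020); here $f_x$ denotes the feature of sample $x$, $\eta$ a step size, and the conditionals $P_d(x' \mid x)$ (augmentation graph) and $P_\theta(x' \mid x) \propto \exp(f_x^\top f_{x'})$ (affinity graph) are notational assumptions for illustration rather than the exact derivation developed later in the paper:
$$
\mathcal{L}_{\mathrm{align}}(f) = -\,\mathbb{E}_{(x,x^+)}\, f_x^\top f_{x^+},
\qquad
\mathcal{L}_{\mathrm{unif}}(f) = \mathbb{E}_x \log \mathbb{E}_{x'} \exp\!\left(f_x^\top f_{x'}\right).
$$
Taking feature-space gradient steps on each term yields two message passing rules:
$$
f_x \;\leftarrow\; f_x + \eta \sum_{x'} P_d(x' \mid x)\, f_{x'}
\quad \text{(alignment: smoothing over augmentation-graph neighbors)},
$$
$$
f_x \;\leftarrow\; f_x - \eta \sum_{x'} P_\theta(x' \mid x)\, f_{x'}
\quad \text{(uniformity: repulsion along the affinity graph)}.
$$
Under this sketch, the combined update vanishes exactly when $P_\theta(\cdot \mid x) = P_d(\cdot \mid x)$, which matches the equilibrium condition stated above.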

