A MESSAGE PASSING PERSPECTIVE ON LEARNING DYNAMICS OF CONTRASTIVE LEARNING

Abstract

In recent years, contrastive learning has achieved impressive results on self-supervised visual representation learning, but a rigorous understanding of its learning dynamics is still lacking. In this paper, we show that if we cast a contrastive objective equivalently into the feature space, then its learning dynamics admits an interpretable form. Specifically, we show that its gradient descent corresponds to a specific message passing scheme on the corresponding augmentation graph. Based on this perspective, we theoretically characterize how contrastive learning gradually learns discriminative features with the alignment update and the uniformity update. Meanwhile, this perspective also establishes an intriguing connection between contrastive learning and Message Passing Graph Neural Networks (MP-GNNs). This connection not only provides a unified understanding of many techniques independently developed in each community, but also enables us to borrow techniques from MP-GNNs to design new contrastive learning variants, such as graph attention, graph rewiring, and jumping knowledge techniques. We believe that our message passing perspective not only provides a new theoretical understanding of contrastive learning dynamics, but also bridges two seemingly independent areas, which could inspire more interleaving studies that benefit from each other. The code is available at https://github.

1. INTRODUCTION

Contrastive Learning (CL) has become arguably the most effective approach to learning visual representations from unlabeled data (Chen et al., 2020b; He et al., 2020; Chen et al., 2020c; Wang et al., 2021a; Chen et al., 2020d; 2021; Caron et al., 2021). However, until now, we have known little about how CL gradually learns meaningful features from unlabeled data. Recently, there has been a burst of interest in the theory of CL. Despite the remarkable progress that has been made, existing theories of CL are established either for an arbitrary function f in the function class F (Saunshi et al., 2019; Wang et al., 2022) or for the optimal f* with minimal contrastive loss (Wang & Isola, 2020; HaoChen et al., 2021; Wang et al., 2022). A theoretical characterization of the learning dynamics, by contrast, has been largely overlooked, and it is the focus of this work. Perhaps surprisingly, we find that the optimization dynamics of contrastive learning corresponds to a specific message passing scheme among different samples. Specifically, based on a reformulation of the alignment and uniformity losses of the contrastive loss into the feature space, we show that the derived alignment and uniformity updates actually correspond to message passing on two different graphs: the alignment update operates on the augmentation graph defined by data augmentations, while the uniformity update operates on the affinity graph defined by feature similarities. The combined contrastive update is therefore a competition between two message passing rules. Based on this perspective, we further show that the equilibrium of contrastive learning is achieved when the two message passing rules are balanced, i.e., when the learned distribution P_θ matches the ground-truth data distribution P_d, which provides a clear picture for understanding the dynamics of contrastive learning.
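For intuition, the competition between the two message passing schemes can be sketched as follows. This is a schematic NumPy sketch under toy assumptions, not the paper's exact update rules: the block-structured augmentation graph A, the step size eta, and the softmax affinity weights are all illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 6, 4                   # number of samples, feature dimension
F = rng.normal(size=(n, m))   # current feature matrix (one row per sample)

# Augmentation graph: A[i, j] > 0 when samples i and j can arise as
# augmentations of the same natural image (here: a toy block structure
# with two groups of three mutually connected samples).
A = np.kron(np.eye(2), np.ones((3, 3)))
A_norm = A / A.sum(axis=1, keepdims=True)   # row-normalized adjacency

eta = 0.5                                   # illustrative step size

# Alignment update: average over neighbors on the augmentation graph,
# pulling positive samples toward each other (attractive message passing).
F_align = (1 - eta) * F + eta * (A_norm @ F)

# Affinity graph: edge weights computed from current feature similarities.
S = np.exp(F @ F.T)
W = S / S.sum(axis=1, keepdims=True)

# Uniformity update: push each sample away from its neighbors on the
# affinity graph (repulsive message passing).
F_unif = (1 + eta) * F - eta * (W @ F)
```

Note how the alignment step contracts features within each block of the augmentation graph, while the uniformity step spreads features apart according to their current similarities; the contrastive update runs both at once.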
Meanwhile, since message passing is a general paradigm that appears in many scenarios, the message passing perspective on contrastive learning above also allows us to establish intriguing connections to seemingly different areas. One particular example is graph representation learning. Message Passing Graph Neural Networks (MP-GNNs) are the prevailing design in modern Graph Neural Networks (GNNs), with numerous variants such as GCN (Kipf & Welling, 2017), GAT (Veličković et al., 2018), and even Transformers (Vaswani et al., 2017). There is a vast literature studying their diffusion dynamics and representation power (Li et al., 2018; Oono & Suzuki, 2020; Wang et al., 2021b; Li et al., 2022; Dong et al., 2021; Xu et al., 2019; 2018; Chen et al., 2022). Therefore, establishing a connection between contrastive learning (CL) and MP-GNNs can hopefully bring new theoretical and empirical insights for understanding and designing contrastive learning methods. In this work, we illustrate this benefit from three aspects: 1) we establish formal connections between the basic message passing mechanisms in the two domains; 2) based on this connection, we discover close analogies among representative techniques independently developed in each domain; and 3) borrowing techniques from MP-GNNs, we design two new contrastive learning variants and demonstrate their effectiveness on benchmark datasets. We summarize our contributions as follows:

• Learning Dynamics. We reformulate contrastive learning in the feature space and develop a new decomposition of the contrastive loss into alignment and uniformity losses. Based on this framework, we show that the alignment and uniformity updates correspond to two different message passing schemes, and we characterize the equilibrium states under the combined update. This message passing perspective provides a new understanding of contrastive learning dynamics.

• Connecting CL and MP-GNNs. Through the message passing perspective on contrastive learning (CL), we establish an intriguing connection between CL and MP-GNNs. We not only formally establish the equivalence between the alignment update and graph convolution, and between the uniformity update and self-attention, but also point out inherent analogies between important techniques independently developed in each domain.

• New Designs Inspired by MP-GNNs. We also demonstrate the empirical benefits of this connection by designing two new contrastive learning variants that borrow techniques from MP-GNNs: one avoids the feature collapse of the alignment update via multi-stage aggregation, and the other adaptively aligns different positive samples by incorporating the attention mechanism. Empirically, we show that both techniques lead to clear benefits on benchmark datasets. In turn, their empirical success also helps verify the validity of our established connection between CL and MP-GNNs.
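The attention-based variant mentioned above can be illustrated with a minimal sketch: instead of averaging all positive views of an anchor equally, we weight them by a softmax over feature similarities, in the spirit of self-attention in MP-GNNs. This is a hypothetical NumPy sketch of ours, not the paper's exact design; the names (anchor, views, tau) and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

m, K = 8, 4                       # feature dimension, number of positive views
anchor = rng.normal(size=(m,))    # anchor feature
views = rng.normal(size=(K, m))   # K positive views of the anchor

tau = 0.5                         # temperature for the attention softmax

# Attention weights over the positive views, from feature similarity.
scores = views @ anchor / tau     # similarity logits, shape (K,)
attn = np.exp(scores - scores.max())
attn /= attn.sum()                # softmax attention weights

# Attention-weighted alignment target replaces the uniform average of views.
target = attn @ views             # (m,)
eta = 0.5                         # illustrative step size
anchor_new = (1 - eta) * anchor + eta * target
```

The design choice mirrors GAT-style aggregation: views that are more consistent with the anchor receive larger weights, so noisy augmentations contribute less to the alignment step.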

2. A MESSAGE PASSING PERSPECTIVE ON CONTRASTIVE LEARNING

In this section, we develop a message passing perspective for understanding the dynamics of contrastive learning. We begin by reformulating the contrastive loss into the feature space with a new decomposition. We then study the update rules derived from the alignment and uniformity losses, and explain their behaviors from a message passing perspective. When the two are combined, we also characterize how the updates strike a balance at the equilibrium states. We finish this section with a proof of concept to illustrate the effectiveness of our derived message passing rules.

2.1. BACKGROUND, REFORMULATION, AND DECOMPOSITION

We begin our discussion by introducing the canonical formulation of contrastive learning methods in the parameter space, and then present their equivalent formulation in the feature space.

Contrastive Learning (CL). Given two positive samples (x, x^+) generated by data augmentations, and an independently sampled negative sample x', we can learn an encoder f_θ : R^d → R^m with the widely adopted InfoNCE loss (Oord et al., 2018):

L_nce(θ) = -E_{x,x^+}[f_θ(x)^⊤ f_θ(x^+)] + E_x log E_{x'}[exp(f_θ(x)^⊤ f_θ(x'))],

where the former term pulls the positive samples (x, x^+) together by encouraging their similarity, and the latter term pushes the negative pairs (x, x') apart. In practice, we typically draw M negative samples at random to approximate the second term. In contrastive learning, the encoder is parameterized by deep neural networks, making it hardly amenable to formal analysis. Wen & Li (2021) resort to single-layer networks with strong assumptions on the data distribution, which is far from practice. Instead, in this work, we focus on the dynamics
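As a concrete illustration of the loss above, its finite-sample estimate with M negatives per anchor can be sketched as follows. This is a minimal NumPy sketch of ours; the function name info_nce and all variable names are illustrative, and we assume L2-normalized features as is common in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(z, z_pos, z_neg):
    """Finite-sample InfoNCE estimate with M negatives per anchor.

    z:     (n, m) anchor features f_theta(x)
    z_pos: (n, m) positive features f_theta(x+)
    z_neg: (n, M, m) negative features f_theta(x')
    """
    # Alignment term: negative similarity of each anchor to its positive.
    align = -np.mean(np.sum(z * z_pos, axis=1))
    # Uniformity term: log of the mean exp-similarity to the M negatives.
    sim_neg = np.einsum('nm,nkm->nk', z, z_neg)          # (n, M)
    unif = np.mean(np.log(np.mean(np.exp(sim_neg), axis=1)))
    return align + unif

def normalize(v, axis):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

n, m, M = 8, 16, 32
z = normalize(rng.normal(size=(n, m)), axis=1)
z_pos = normalize(z + 0.1 * rng.normal(size=(n, m)), axis=1)  # nearby positives
z_neg = normalize(rng.normal(size=(n, M, m)), axis=2)         # random negatives

loss = info_nce(z, z_pos, z_neg)
```

The first term decreases as positives align, while the second term grows when the anchor is similar to its negatives, matching the pull/push reading of the two terms above.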

