LOCALIZED GRAPH CONTRASTIVE LEARNING

Abstract

Contrastive learning methods based on the InfoNCE loss are popular for node representation learning tasks on graph-structured data. However, their reliance on data augmentation and their quadratic computational complexity can lead to inconsistency and inefficiency. To mitigate these limitations, in this paper we introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (LOCAL-GCL for short). LOCAL-GCL consists of two key designs: 1) we fabricate the positive examples for each node directly from its first-order neighbors, which frees our method from reliance on carefully designed graph augmentations; 2) to improve the efficiency of contrastive learning on graphs, we devise a kernelized contrastive loss that can be computed approximately in time and space linear in the graph size. We provide theoretical analysis to justify the effectiveness and rationality of the proposed methods. Despite its simplicity, experiments demonstrate that LOCAL-GCL achieves highly competitive performance in self-supervised node representation learning tasks on graphs of various scales and properties.

1. INTRODUCTION

Self-supervised learning has achieved remarkable success in learning informative representations without using costly handcrafted labels (van den Oord et al., 2018; Devlin et al., 2019; Banville et al., 2021; He et al., 2020; Chen et al., 2020; Grill et al., 2020; Zhang et al., 2021; Gao et al., 2021). Among current self-supervised learning paradigms, multi-view contrastive methods based on the InfoNCE loss (van den Oord et al., 2018; He et al., 2020; Chen et al., 2020; Gao et al., 2021) are the most widely adopted, owing to their solid theoretical foundations and strong empirical results. Generally, contrastive learning aims to maximize the agreement between the latent representations of two views (e.g., obtained through data augmentation) of the same input, which essentially maximizes the mutual information between the two representations (Poole et al., 2019). Inheriting the spirit of contrastive learning on vision tasks, similar methods have been developed for graphs and deliver promising results on common node-level classification benchmarks (Velickovic et al., 2019; Hassani & Ahmadi, 2020; Zhu et al., 2020b; 2021). The challenge, however, is that prevailing contrastive learning methods rely on predefined augmentation techniques to generate positive pairs as informative training supervision. Unlike grid-structured data (e.g., images or sequences), it is non-trivial to define well-posed augmentation approaches for graph-structured data (Zhu et al., 2021; Zhang et al., 2021). The common practice adopted by current methods resorts to random perturbation of input node features and graph structures (You et al., 2020), which might unexpectedly violate the underlying data generation process and change the semantic information (Lee et al., 2021). This issue acts as a bottleneck that limits the practical efficacy of contrastive methods on graphs.
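For concreteness, the InfoNCE objective referenced above can be written in its generic per-anchor form (notation here is the standard one and may differ from the symbols used later in this paper): given an anchor representation $z_i$, its positive $z_i^{+}$, a similarity function $\mathrm{sim}(\cdot,\cdot)$, and temperature $\tau$,

```latex
\ell_i \;=\; -\log \frac{\exp\!\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big)}
{\sum_{j=1}^{N} \exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)},
```

where the $N-1$ non-positive terms in the denominator act as negatives. Minimizing $\ell_i$ pulls the positive pair together while pushing the anchor away from all other in-batch samples.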
Apart from this, the InfoNCE loss function computes all-pair distances among in-batch nodes to serve as negative pairs for contrastive signals (Zhu et al., 2020b; 2021), which induces quadratic memory and time complexity with respect to the batch size. Since the model is preferably trained in a full-graph manner (i.e., batch size = graph size), as graph structure information might be partially lost through mini-batch partitioning, this property heavily constrains contrastive methods from scaling to large graphs. Some recent works pursue negative-sample-free methods to resolve the scalability issue by harnessing asymmetric architectures (Thakoor et al., 2021) or feature-level decorrelation objectives (Zhang et al., 2021). However, these methods either lack sufficient theoretical justification (Thakoor et al., 2021) or
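The quadratic cost described above comes from materializing an $N \times N$ similarity matrix. The following minimal NumPy sketch (a generic illustration, not this paper's kernelized objective; function and variable names are ours) makes the bottleneck explicit:

```python
import numpy as np

def info_nce_full(z1, z2, tau=0.5):
    """Generic full-batch InfoNCE over two views z1, z2 of shape [N, d].

    The N x N similarity matrix below is exactly the quadratic-memory
    term that prevents full-graph training on large graphs.
    """
    # L2-normalize so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.exp(z1 @ z2.T / tau)      # [N, N]: quadratic in batch size
    pos = np.diag(sim)                  # aligned pairs are the positives
    # each row: one positive against N-1 in-batch negatives
    return float(np.mean(-np.log(pos / sim.sum(axis=1))))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 4))
z2 = z1 + 0.01 * rng.normal(size=(8, 4))  # two slightly perturbed "views"
print(info_nce_full(z1, z2))
```

With batch size equal to the graph size $N$, the `sim` matrix alone costs $O(N^2)$ memory, which is the motivation for the linear-complexity kernelized loss proposed in this paper.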

