EFFICIENT BLOCK CONTRASTIVE LEARNING VIA PARAMETER-FREE META-NODE APPROXIMATION

Abstract

Contrastive learning has recently achieved remarkable success in many domains, including graphs. However, contrastive loss, especially for graphs, requires a large number of negative samples, which is unscalable and computationally prohibitive with its quadratic time complexity. Sub-sampling is not optimal, and incorrect negative sampling leads to sampling bias. In this work, we propose a meta-node based approximation technique that can (a) proxy all negative combinations, (b) in time quadratic in the number of clusters, (c) at graph level rather than node level, and (d) exploit graph sparsity. By replacing node-pairs with additive cluster-pairs, we compute the negatives in cluster-time at graph level. The resulting Proxy approximated meta-node Contrastive (PamC) loss, based on simple optimized GPU operations, captures the full set of negatives, yet is efficient with a linear time complexity. By avoiding sampling, we effectively eliminate sample bias. We meet the criterion of a larger number of samples, thus achieving block-contrastiveness, which is proven to outperform pair-wise losses. We use learnt soft cluster assignments for the meta-node construction, and avoid possible heterophily and noise added during edge creation. Theoretically, we show that real-world graphs easily satisfy the conditions necessary for our approximation. Empirically, we show promising accuracy gains over state-of-the-art graph clustering on 6 benchmarks. Importantly, we gain substantially in efficiency: up to 2x faster training time and over 5x reduction in GPU memory. The code is publicly available.

1. INTRODUCTION

Discriminative approaches based on contrastive learning have been outstandingly successful in practice (Guo et al., 2017; Wang & Isola, 2020), achieving state-of-the-art results (Chen et al., 2020a) or at times outperforming even supervised learning (Logeswaran & Lee, 2018; Chen et al., 2020b). Specifically in graph clustering, contrastive learning can outperform traditional convolution and attention-based Graph Neural Networks (GNN) on speed and accuracy (Kulatilleke et al., 2022). While traditional objective functions encourage similar nodes to be closer in embedding space, their penalties do not guarantee separation of unrelated graph nodes (Zhu et al., 2021a). Differently, many modern graph embedding models (Hamilton et al., 2017; Kulatilleke et al., 2022) use contrastive objectives. These encourage representations of positive pairs to be similar, while pushing features of negatives apart in embedding space (Wang & Isola, 2020). A typical deep model consists of a trainable encoder that generates positive and negative node embeddings for the contrastive loss (Zhu et al., 2021a). It has been shown that convolution is computationally expensive and may not be necessary for representation learning (Chen et al., 2020a). As the only requirement for contrastive loss is an encoder, researchers have recently been able to produce state-of-the-art results using simpler and more efficient MLP-based contrastive loss implementations (Hu et al., 2021; Kulatilleke et al., 2022). Thus, there is a rapidly expanding interest in and scope for contrastive loss based models. We consider the following specific but popular (Hu et al., 2021; Kulatilleke et al., 2022) form of contrastive loss, where τ is the temperature parameter, γ_ij is the relationship between nodes i and j, and the loss for the i-th node is:

$$\ell_i = -\log \frac{\sum_{j=1}^{B} \mathbb{1}_{[j \neq i]}\, \gamma_{ij} \exp\left(\mathrm{sim}(z_i, z_j) \cdot \tau\right)}{\sum_{k=1}^{B} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k) \cdot \tau\right)} \tag{1}$$
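To make the pairwise form of this loss concrete, the following is a minimal numpy sketch of Eq. (1). It assumes B is the number of nodes in the batch, sim(·,·) is cosine similarity, and γ is given as a dense B×B relationship matrix (all assumptions for illustration; the function name is hypothetical). Note that it materializes the full B×B similarity matrix, which is exactly the quadratic cost that motivates the meta-node approximation:

```python
import numpy as np

def pairwise_contrastive_loss(Z, gamma, tau=1.0):
    """Sketch of the per-node contrastive loss in Eq. (1).

    Z     : (B, d) array of node embeddings z_i.
    gamma : (B, B) relationship matrix; gamma[i, j] > 0 marks a positive pair.
    tau   : temperature parameter (multiplied, following the paper's form).
    Returns the (B,) vector of per-node losses l_i.
    """
    # Cosine similarity sim(z_i, z_j) for all pairs -- the O(B^2) step.
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Zn @ Zn.T                                   # (B, B)

    # exp(sim * tau), with the diagonal masked out (indicators j != i, k != i).
    e = np.exp(sim * tau)
    np.fill_diagonal(e, 0.0)

    # Numerator: positive pairs weighted by gamma_ij; denominator: all pairs.
    num = (gamma * e).sum(axis=1)
    den = e.sum(axis=1)
    return -np.log(num / den)
```

As a sanity check, if γ marks every off-diagonal pair as positive with weight 1, the numerator equals the denominator and every l_i is exactly zero; any γ that excludes some pairs yields strictly positive losses.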

