CLEP: EXPLOITING EDGE PARTITIONING FOR GRAPH CONTRASTIVE LEARNING

Abstract

Generative and contrastive are two fundamental unsupervised approaches to model graph information. The graph generative models extract intra-graph information whereas the graph contrastive learning methods focus on inter-graph information. Combining these complementary sources of information can potentially enhance the expressiveness of graph representations, which, nevertheless, is underinvestigated by existing methods. In this work, we introduce a probabilistic framework called contrastive learning with edge partitioning (CLEP) that integrates generative modeling and graph contrastive learning. CLEP models edge generation by aggregating latent node interactions over multiple overlapping hidden communities. Inspired by the assembling behavior of communities in graph generation, CLEP learns community-specific graph embeddings, which are assembled together to represent the entire graph and further used to predict the graph's identity via a contrastive objective. To relate each embedding to one hidden community, we define a set of community-specific weighted edges for node feature aggregation by partitioning the observed edges according to the latent node interactions associated with the corresponding hidden community. With these unique designs, CLEP is able to model the statistical dependency among hidden communities, graph structures, as well as the identity of each graph; it can also be trained end-to-end via variational inference. We evaluate CLEP on real-world benchmarks under self-supervised and semi-supervised settings and achieve promising results, which demonstrate the effectiveness of our method. Various exploratory studies are also conducted to highlight the characteristics of the inferred hidden communities and the potential benefits they bring to representation learning.
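To make the edge-partitioning idea above concrete, the following is a minimal NumPy sketch, not the authors' implementation: it assumes toy random node features and hypothetical per-node community affiliations `Z` (stand-ins for the latent interactions that CLEP actually infers variationally), softmax-splits each observed edge into K community-specific weighted copies, and aggregates node features along each community's edges to form community-specific graph embeddings that are assembled by concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: N nodes, undirected edge list, D-dimensional node features.
N, D, K = 6, 4, 3            # K = number of hypothetical hidden communities
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
X = rng.normal(size=(N, D))  # node features

# Hypothetical latent community affiliations per node (illustrative only;
# CLEP infers these via variational inference rather than sampling them).
Z = rng.normal(size=(N, K))

def partition_edges(edges, Z):
    """Split each observed edge into K weighted copies, one per community,
    using a softmax over per-community interaction scores z_i * z_j."""
    A = np.zeros((K, N, N))
    for i, j in edges:
        scores = Z[i] * Z[j]                       # per-community interaction
        w = np.exp(scores) / np.exp(scores).sum()  # responsibilities sum to 1
        for k in range(K):
            A[k, i, j] = A[k, j, i] = w[k]
    return A

A = partition_edges(edges, Z)

# Community-specific graph embeddings: aggregate node features along each
# community's weighted edges, then mean-pool over nodes and concatenate.
H = np.stack([A[k] @ X for k in range(K)])   # shape (K, N, D)
graph_emb = H.mean(axis=1).reshape(-1)       # assembled graph embedding, (K*D,)
```

In the full model, `graph_emb` would feed a contrastive objective predicting the graph's identity; here it merely illustrates how the observed edges are shared across overlapping communities rather than hard-assigned to one.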

1. INTRODUCTION

Generative modeling and contrastive learning are both commonly employed to learn graph representations without label supervision. Both types of methods learn the embedding space by leveraging some ground-truth information from the observed graphs, but they differ in which aspects of the data they choose to fit. Graph generative models (Kipf & Welling, 2016; Mehta et al., 2019; Wang et al., 2020) prioritize intra-graph information, i.e., the information contained in each individual graph. The representations they provide are usually related to the formation of each graph's own edges. In contrast, graph contrastive learning methods (You et al., 2020; 2021; 2022; Xie et al., 2022) focus on capturing inter-graph information: they put graphs under comparison to highlight the inherent similarities and differences among a group of graphs. This difference in the graph information being modeled leads to complementary strengths and weaknesses of graph generative and contrastive learning methods. The advantage of graph generative models is their ability to recover the structural information of latent factors, which is lost during graph generation. These latent factors, relevant to each graph's own formation, usually preserve valuable information for various graph-analytic tasks. However, the quality of the embeddings provided by graph generative models is questionable, because the encoded information is limited to the "expression levels" of these latent factors, which may be insufficient for downstream tasks other than graph generation. Unlike generative models, graph contrastive learning methods cannot automatically discover meaningful latent factors in a graph, but they are well recognized for producing high-quality feature representations once the raw structural information is given. An integration of graph generative modeling and graph contrastive learning potentially combines the

