MOTIF-DRIVEN CONTRASTIVE LEARNING OF GRAPH REPRESENTATIONS

Abstract

Graph motifs are significant subgraph patterns that occur frequently in graphs and play important roles in representing the characteristics of whole graphs. For example, in the chemical domain, functional groups are motifs that can determine molecular properties. Mining and utilizing motifs, however, is a non-trivial task for large graph datasets. Traditional motif discovery approaches rely on exact counting or statistical estimation, which are hard to scale to large datasets with continuous and high-dimensional features. In light of the significance and challenges of motif mining, we propose MICRO-Graph: a framework for MotIf-driven Contrastive leaRning Of Graph representations, which 1) pre-trains Graph Neural Networks (GNNs) in a self-supervised manner to automatically extract motifs from large graph datasets; and 2) leverages the learned motifs to guide contrastive learning of graph representations, which further benefits various downstream tasks. Specifically, given a graph dataset, a motif learner clusters similar and significant subgraphs into corresponding motif slots. Based on the learned motifs, a motif-guided subgraph segmenter is trained to generate more informative subgraphs, which are used to conduct graph-to-subgraph contrastive learning of GNNs. By pre-training on the ogbg-molhiv molecule dataset with our proposed MICRO-Graph, the pre-trained GNN model enhances various chemical property prediction downstream tasks with scarce labels by 2.0%, significantly outperforming other state-of-the-art self-supervised learning baselines.

1. INTRODUCTION

Graph-structured data, such as molecules and social networks, is ubiquitous in many scientific research areas and real-world applications. To represent graph characteristics, graph motifs were proposed by Milo et al. (2002) as significant subgraph patterns that occur frequently in graphs and uncover graph structural principles. For example, functional groups are important motifs that can determine molecular properties: a hydroxide group (-OH) usually implies higher water solubility, and in proteins, the Zif268 motif can mediate protein-protein interactions in sequence-specific DNA-binding proteins (Pabo et al., 2001). Graph motifs have been studied for years, and meaningful motifs can benefit many important applications such as quantum chemistry and drug discovery (Ramsundar et al., 2019). However, extracting motifs from large graph datasets remains challenging. Traditional motif discovery approaches (Milo et al., 2002; Kashtan et al., 2004; Chen et al., 2006; Wernicke, 2006) rely on discrete counting or statistical estimation, which are hard to generalize to large-scale graph datasets with continuous and high-dimensional features, as is often the case in real-world applications. Recently, Graph Neural Networks (GNNs) have shown great expressive power for learning graph representations without explicit feature engineering (Kipf & Welling, 2016; Hamilton et al., 2017; Veličković et al., 2017; Xu et al., 2018). In addition, GNNs can be trained in a self-supervised manner without human annotations to capture important graph structural and semantic properties (Veličković et al., 2018; Hu et al., 2020c; Qiu et al., 2020; Bai et al., 2019; Navarin et al., 2018; Wang et al., 2020; Sun et al., 2019; Hu et al., 2020b). This motivates us to rethink motifs as more general representations than exact structural matches and ask the following research questions:

• Can we use GNNs to automatically extract graph motifs from large graph datasets?
• Can we leverage the learned graph motifs to benefit self-supervised GNN learning?

In this paper, we propose MICRO-Graph: a framework for MotIf-driven Contrastive leaRning Of Graph representations. The key idea of this framework is to learn graph motifs as prototypical cluster centers of subgraph embeddings encoded by GNNs. In this way, the discrete counting problem is transferred into a fully differentiable framework that can generalize to large-scale graph datasets with continuous and high-dimensional features. In addition, the learned motifs help generate more informative subgraphs for graph-to-subgraph contrastive learning. Motif learning and contrastive learning mutually reinforce each other to pre-train a more generalizable GNN encoder.

For motif learning, given a graph dataset, a motif-guided subgraph segmenter generates subgraphs from each graph, and a GNN encoder turns these subgraphs into vector representations. We then learn graph motifs through clustering, keeping the K prototypical cluster centers as motif representations. Similar and significant subgraphs are assigned to the same motif and pulled closer to their corresponding motif representation. We train our model in an Expectation-Maximization (EM) fashion, alternately updating the motif assignment of each subgraph and the motif representations.

For leveraging the learned motifs, we propose a graph-to-subgraph contrastive learning framework for GNN pre-training. One of the key components of contrastive learning is generating semantically meaningful views of each instance, such as a continuous span within a sentence (Joshi et al., 2020) or a random crop of an image (Chen et al., 2020). For graph data, previous approaches leverage node-level views, which are not sufficient to capture high-level graph structural information (Sun et al., 2019). As motifs by nature represent key graph properties, we propose to leverage the learned motifs to generate more informative subgraph views.
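The EM-style alternation described above can be sketched as a soft clustering step over subgraph embeddings. The following is a minimal illustration, not the paper's actual implementation: the function name `em_motif_step`, the use of cosine similarity, and the temperature value are all assumptions made for the sketch.

```python
import numpy as np

def normalize(x):
    # L2-normalize rows so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def em_motif_step(subgraph_emb, prototypes, temperature=0.1):
    """One illustrative EM-style update (hypothetical sketch).
    E-step: softly assign each subgraph embedding to the K motif slots.
    M-step: move each prototype toward its assigned subgraph embeddings.
    Shapes: subgraph_emb is (N, d), prototypes is (K, d)."""
    z = normalize(subgraph_emb)
    c = normalize(prototypes)
    # E-step: soft assignment via temperature-scaled cosine similarity
    logits = z @ c.T / temperature                     # (N, K)
    assign = np.exp(logits - logits.max(axis=1, keepdims=True))
    assign /= assign.sum(axis=1, keepdims=True)        # rows sum to 1
    # M-step: each prototype becomes the assignment-weighted mean
    new_prototypes = normalize(assign.T @ z)           # (K, d)
    return new_prototypes, assign
```

In a full pipeline, such a step would be iterated jointly with GNN encoder updates, so that prototypes (motif slots) and subgraph embeddings co-adapt.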
For example, an alpha helix and a beta sheet can come together as a simple ββα fold to form a zinc finger protein with unique properties. By learning such subgraph co-occurrence via contrastive learning, the pre-trained GNN can capture higher-level information about the graph that node-level contrastive learning cannot. Pre-trained with MICRO-Graph on the ogbg-molhiv molecule dataset, the GNN successfully learns meaningful motifs, including benzene rings, nitro groups, and acetate groups. Meanwhile, fine-tuning this GNN on seven chemical property prediction benchmarks yields a 2.0% average improvement over non-pre-trained GNNs and outperforms other self-supervised pre-training baselines. Extensive ablation studies further show the significance of the learned motifs for contrastive learning.
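The graph-to-subgraph contrastive objective can be illustrated with a standard InfoNCE-style loss, where a subgraph's positive is the whole-graph embedding it was segmented from, and subgraphs paired with other graphs in the batch act as negatives. This is a generic sketch under those assumptions, not the paper's exact loss; the function name and temperature are hypothetical.

```python
import numpy as np

def graph_subgraph_nce(graph_emb, subgraph_emb, temperature=0.2):
    """Illustrative InfoNCE loss for graph-to-subgraph contrast.
    Row i of graph_emb and subgraph_emb come from the same graph
    (the positive pair); all other rows serve as negatives.
    Shapes: both inputs are (N, d)."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    s = subgraph_emb / np.linalg.norm(subgraph_emb, axis=1, keepdims=True)
    sim = s @ g.T / temperature                        # (N, N) similarities
    # cross-entropy with the diagonal (matching pairs) as targets
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls each subgraph embedding toward its own graph's embedding while pushing it away from other graphs, which is how subgraph co-occurrence within a graph gets encoded.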

2. RELATED WORK

The goal of self-supervised learning is to train a model to capture significant characteristics of data without human annotations. This paper studies whether such an approach can automatically extract graph motifs, i.e., the significant subgraph patterns, and leverage the learned motifs to benefit self-supervised learning. In the following, we first review graph motifs, especially the challenges of motif mining, and then discuss approaches for pre-training GNNs in a self-supervised manner.

Graph motifs are building blocks of complex graphs. They reveal the interconnections of graphs and represent graph characteristics. Mining motifs can benefit many tasks, from exploratory analysis to transfer learning (Henderson et al., 2012). Over the years, various motif mining algorithms have been proposed, generally falling into two categories: exact counting, as in Milo et al. (2002); Kashtan et al. (2004); Schreiber & Schwöbbermeyer (2005); Chen et al. (2006), and sampling with statistical estimation, as in Wernicke (2006). However, neither approach scales to large graph datasets with high-dimensional and continuous features, which are common in real-world applications. In this paper, we propose to turn the discrete motif mining problem into a GNN-based differentiable cluster learning problem that generalizes to large-scale datasets. Another GNN-based work related to graph motifs is GNNExplainer, which focuses on post-hoc model interpretation (Ying et al., 2019). It can identify substructures, e.g. motifs, that are important for graph property prediction. The difference between GNNExplainer and MICRO-Graph is that the former identifies motifs at the level of a single graph, while the latter learns motifs across the whole dataset.

Contrastive learning is one of the state-of-the-art self-supervised representation learning algorithms, achieving great results for visual representation learning (Chen et al., 2020; He et al., 2019).
Contrastive learning forces views generated from the same instance (e.g., different crops of the same image) to become closer, while pushing views from different instances apart. One key component in contrastive learning is to generate informative and diverse views from each data instance. In computer vision, researchers use various techniques, including cropping, color distortion, and Gaussian

