SELF-SUPERVISED GRAPH-LEVEL REPRESENTATION LEARNING WITH LOCAL AND GLOBAL STRUCTURE

Anonymous authors

Abstract

This paper focuses on unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks including drug and material discovery. Current methods can effectively model the local structure between different graph instances, but they fail to discover the global semantic structure of the entire dataset. In this work, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Specifically, besides preserving the local instance-level structure, GraphLoG leverages a nonparametric strategy to learn hierarchical prototypes of the data. These prototypes capture the semantic clusters in the latent space, and the number of prototypes can automatically adapt to different feature distributions. We evaluate GraphLoG by pre-training it on massive unlabeled graphs followed by fine-tuning on downstream tasks. Extensive experiments on both chemical and biological benchmark datasets demonstrate the effectiveness of our approach.

1. INTRODUCTION

Learning informative representations of whole graphs is a fundamental problem in a variety of domains and tasks, such as molecular property prediction in drug and material discovery (Gilmer et al., 2017; Wu et al., 2018), protein function prediction in biological networks (Alvarez & Yan, 2012; Jiang et al., 2017), and predicting the properties of circuits in circuit design (Zhang et al., 2019). Recently, Graph Neural Networks (GNNs) have attracted a surge of interest and shown their effectiveness in learning graph representations. These methods are usually trained in a supervised fashion, which requires a large amount of labeled data. Nevertheless, in many scientific domains, labeled data are very limited and expensive to obtain. Therefore, it is becoming increasingly important to learn graph representations in an unsupervised or self-supervised fashion. Self-supervised learning has recently achieved profound success in both natural language processing, e.g. GPT (Radford et al., 2018) and BERT (Devlin et al., 2019), and image understanding, e.g. MoCo (He et al., 2019) and SimCLR (Chen et al., 2020). However, how to effectively learn graph representations in a self-supervised way remains an open problem.

Intuitively, a desirable graph representation should preserve the local-instance structure, so that similar graphs are embedded close to each other and dissimilar ones stay far apart. In addition, the representations of a set of graphs should also reflect the global-semantic structure of the data, so that graphs with similar semantic properties are compactly embedded, which benefits various downstream tasks, e.g. graph classification or regression. Such structure can be sufficiently captured by semantic clusters (Caron et al., 2018; Ji et al., 2019), especially in a hierarchical fashion (Li et al., 2020).
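To make the local-instance intuition concrete, the "similar graphs close, dissimilar graphs apart" criterion is commonly instantiated as a margin-based triplet objective over graph embeddings. The sketch below is illustrative only and is not the GraphLoG objective itself; the function name `triplet_local_loss` and the toy 2-D embeddings are assumptions for the example.

```python
import numpy as np

def triplet_local_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on graph embeddings: pull a similar
    (positive) graph toward the anchor and push a dissimilar (negative)
    graph at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D "graph embeddings": the positive already sits much closer to
# the anchor than the negative, so the margin is satisfied and the loss
# vanishes; swapping the roles yields a positive penalty.
anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([5.0, 0.0])
satisfied_loss = triplet_local_loss(anchor, positive, negative)
violated_loss  = triplet_local_loss(anchor, negative, positive)
```

In practice the embeddings would come from a GNN encoder and the loss would be averaged over mini-batches of sampled triplets; this toy version only shows the geometry of the criterion.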
There are some recent works that learn graph representations in a self-supervised manner, such as local-global mutual information maximization (Velickovic et al., 2019; Sun et al., 2019), structural-similarity/context prediction (Navarin et al., 2018; Hu et al., 2019; You et al., 2020) and contrastive multi-view learning (Hassani & Ahmadi, 2020). However, all these methods are capable of modeling only the local structure between different graph instances but fail to discover the global-semantic structure. To address this shortcoming, we seek an approach that can model both the local and global structure of a given set of graphs.

To attain this goal, we propose a Local-instance and Global-semantic Learning (GraphLoG) framework for self-supervised graph representation learning. Specifically, for preserving the local similarity
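The abstract's claim that "the number of prototypes can automatically adapt to different feature distributions" describes a nonparametric behavior that one simple baseline, sequential leader clustering, also exhibits: a new prototype is spawned whenever a point lies too far from all existing ones. The sketch below illustrates that adaptive behavior only; it is not the paper's actual prototype-learning strategy, and the function name `adaptive_prototypes` and the `radius` threshold are assumptions for the example.

```python
import numpy as np

def adaptive_prototypes(embeddings, radius=1.0):
    """Sequential leader clustering: create a new prototype whenever an
    embedding is farther than `radius` from every existing prototype.
    The prototype count thus grows with the spread of the data rather
    than being fixed in advance."""
    prototypes = []
    for x in embeddings:
        dists = [np.linalg.norm(x - p) for p in prototypes]
        if not prototypes or min(dists) > radius:
            prototypes.append(x.copy())  # x becomes a new prototype
    return prototypes

# Two well-separated toy clusters yield exactly two prototypes.
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
protos = adaptive_prototypes(embeddings, radius=1.0)
```

A hierarchical variant would re-run such clustering on the prototypes themselves to obtain coarser semantic levels, which is the flavor of structure the hierarchical-prototype idea targets.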

