IMPROVING SUBGRAPH REPRESENTATION LEARNING VIA MULTI-VIEW AUGMENTATION

Abstract

Subgraph representation learning based on Graph Neural Networks (GNNs) has found broad application in scientific problems such as predicting molecular structure-property relationships and collective cellular function. In particular, graph augmentation techniques have shown promising results on graph-level and node-level classification tasks, yet they have rarely been explored in existing GNN-based subgraph representation learning studies. In this study, we develop a novel multi-view augmentation mechanism that improves subgraph representation learning models and, in turn, the accuracy of downstream prediction tasks. Our augmentation technique creates multiple variants of each subgraph and embeds these variants into the original graph, achieving substantially improved training efficiency, scalability, and accuracy. Benchmark experiments on several real-world biological and physiological datasets demonstrate the superiority of the proposed multi-view augmentation technique for subgraph representation learning.

1. INTRODUCTION

Subgraph representation learning using Graph Neural Networks (GNNs) can be broadly applied to various subgraph-related tasks in many fields of science and technology. As an outstanding example, the PPI (Protein-Protein Interaction) network (Zitnik et al., 2018) uses nodes, edges, and subgraphs to represent single proteins, their interactions, and sets of interacting proteins, respectively. GNNs can be used to predict biological processes (PPI-BP), cellular components (PPI-CC), and molecular functions (PPI-MF) by classifying the functionality of a subgraph (i.e., a group of proteins) in the PPI network. Another example is to apply GNNs to fragment-based quantum chemical theory, where each fragment in a crystal or aggregate is a subgraph and subgraph representation learning can predict the quantitative interactions between different fragments. Although applying GNNs to subgraph-related tasks (Alsentzer et al., 2020; Kim & Oh, 2022; Wang & Zhang, 2021) has started to draw attention, none of these studies has employed graph augmentation techniques to improve task accuracy.

This work presents a novel multi-view approach to graph augmentation for improving the accuracy of subgraph classification tasks. Inspired by the effectiveness of graph contrastive learning (Hassani & Khasahmadi, 2020; Zhu et al., 2020; You et al., 2020), our basic idea is to create multiple views of a subgraph through augmentation, learn an embedding for each view, and then combine the representations to predict the label of the subgraph. The rationale is that the augmented subgraphs (i.e., the multiple views) essentially form an ensemble, which can provide a more robust signal for determining the properties of the subgraph.

This basic idea poses a fundamental challenge: how to create augmented subgraphs efficiently. Augmenting the entire graph to produce different views of the same subgraph is not scalable, because the size of the augmented graph grows linearly with the number of views.
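The ensemble intuition above can be sketched as follows. Mean pooling is used here as one simple way to combine per-view embeddings; the function name and the choice of combination operator are illustrative assumptions, not the paper's exact method:

```python
def ensemble_readout(view_embeddings):
    """Combine per-view subgraph embeddings into one representation.

    view_embeddings: list of equal-length float lists, one per augmented
    view (the original view included). Element-wise mean pooling is an
    assumption; other operators (sum, attention) would also fit the idea.
    """
    dim = len(view_embeddings[0])
    k = len(view_embeddings)
    return [sum(v[i] for v in view_embeddings) / k for i in range(dim)]
```

The combined representation would then be fed to a classifier to predict the subgraph label, so that noise introduced by any single augmented view is averaged out.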
Figure 1(c) illustrates the problem. With only one additional view, GNNs must conduct forward and backward propagation on two independent graphs (the original graph and the augmented graph) during training, doubling the training cost. We address this efficiency issue by embedding the augmented subgraphs in the original graph, significantly decreasing the demand for GPU resources: the computation of the embeddings for the augmented subgraphs can then share intermediate representations within their neighborhood. Figure 1(d) illustrates an alternative efficient design in which the augmented subgraphs are embedded into an augmented graph instead of the original graph. We empirically validate that preserving the original view of subgraphs is essential for multi-view augmentation to improve task accuracy.

In summary, this work makes the following contributions:

• This work proposes a novel multi-view augmentation strategy to improve the accuracy of subgraph-based learning tasks. This study is the first to explore the benefits of graph augmentation techniques in subgraph representation learning.

• The proposed multi-view augmentation strategy dynamically binds augmented subgraph views to the whole graph, avoiding excessive GPU resource consumption and achieving substantially improved training efficiency and task accuracy.

• Empirical evaluations on three subgraph datasets demonstrate that our augmentation approach improves existing subgraph representation learning methods by 0.3%-2.9% in accuracy, which is on average 1.1% higher than the general graph augmentation techniques DropEdge and GAug-M.
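The construction described in the introduction can be sketched with plain edge lists. This is a minimal illustration under stated assumptions: all function names are ours, edge dropping is used as the perturbation, and each view copies the subgraph's nodes while reusing the original boundary edges so that the views share neighborhood context with the rest of the graph:

```python
import random

def drop_edges(edges, drop_rate, rng):
    """Randomly drop a fraction of edges (DropEdge-style perturbation)."""
    return [e for e in edges if rng.random() >= drop_rate]

def build_multi_view_graph(graph_edges, subgraph_nodes, num_views, drop_rate, seed=0):
    """Embed augmented copies of a subgraph into the original graph.

    Each view clones the subgraph's nodes under fresh node ids, perturbs
    the internal edges by random dropping, and reattaches the clone via
    the original boundary edges. Returns the combined edge list and the
    node sets of all views (original view first). Hypothetical sketch,
    not the paper's implementation.
    """
    rng = random.Random(seed)
    sub = set(subgraph_nodes)
    internal = [(u, v) for (u, v) in graph_edges if u in sub and v in sub]
    boundary = [(u, v) for (u, v) in graph_edges if (u in sub) != (v in sub)]
    next_id = max(max(u, v) for (u, v) in graph_edges) + 1

    edges = list(graph_edges)          # original graph kept intact
    views = [sorted(sub)]              # original view preserved
    for _ in range(num_views):
        remap = {}
        for n in sorted(sub):
            remap[n] = next_id
            next_id += 1
        # perturbed internal structure for this view
        for (u, v) in drop_edges(internal, drop_rate, rng):
            edges.append((remap[u], remap[v]))
        # reuse original boundary edges so views share neighborhood context
        for (u, v) in boundary:
            edges.append((remap.get(u, u), remap.get(v, v)))
        views.append(sorted(remap.values()))
    return edges, views
```

Because all views live in one graph, a single forward pass produces embeddings for every view, instead of one pass per independently augmented graph as in Figure 1(c).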

2. RELATED WORKS

Subgraph Representation Learning Subgraph representation learning using GNNs has gained substantial attention in recent years (Meng et al., 2018)



Figure 1: Illustration of graph augmentation approaches. (a) The original graph G contains two subgraphs (colored orange and blue). (b) Augmented subgraphs are created by randomly dropping some edges of G; the new graph G′ is called the augmented graph. (c) A graph with two independent components, where G is the original graph and G′ is the augmented graph; learning on this graph doubles the training cost. (d) Augmented subgraphs G″ are embedded into an augmented graph G′.

due to its broad applications in scientific domains. An outstanding example is SubGNN (SubGraph Neural Network) (Alsentzer et al., 2020), which routes messages for internal and border properties within the sub-channels of each channel: neighborhood, structure, and position. Anchor patches are then sampled, and their features are aggregated into the connected components of the subgraph through the six sub-channels. GLASS (Wang & Zhang, 2021) employs a labeling trick (Zhang et al., 2021), labeling the nodes that belong to any subgraph, to boost plain GNNs on subgraph tasks. S2N (Subgraph-To-Node) (Kim & Oh, 2022) translates subgraphs into nodes and thus reduces the scale of the input graph. These approaches focus on developing novel subgraph-based GNNs to improve task accuracy, but none of them employs graph augmentation techniques.

Graph Augmentation Data augmentation is a vital part of deep learning, and many general graph augmentation techniques have recently been proposed to improve task accuracy. For node classification tasks, Rong et al. (2020) propose DropEdge, which randomly drops edges in a graph to enlarge the support of the training distribution. DGI (Deep Graph Infomax) (Veličković et al., 2019) perturbs nodes by performing a row-wise shuffle of the input feature matrix while the adjacency matrix remains unchanged, generating negative samples for contrastive learning and maximizing the mutual information between input and output. GAug (Zhao et al., 2021) adds and removes edges of the graph by training an edge predictor, ultimately promoting connectivity between nodes that are likely to be linked.
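The DropEdge operation described above amounts to an independent Bernoulli keep/drop decision per edge. A minimal sketch on an edge-list representation (the function name is ours; DropEdge as published operates on the adjacency matrix, with optional re-normalization omitted here):

```python
import random

def drop_edge(edge_list, p, seed=None):
    """Keep each edge independently with probability 1 - p.

    Returns a new edge list; the input list is left unchanged. A fresh
    sample is typically drawn at every training epoch so the model sees
    a different sparsified graph each time.
    """
    rng = random.Random(seed)
    return [e for e in edge_list if rng.random() >= p]
```

For undirected graphs stored as symmetric edge pairs, both directions of an edge would need to be dropped together, which the sketch above does not handle.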

