IMPROVING SUBGRAPH REPRESENTATION LEARNING VIA MULTI-VIEW AUGMENTATION

Abstract

Subgraph representation learning based on Graph Neural Networks (GNNs) has exhibited broad applications in scientific advancements, such as predictions of molecular structure-property relationships and collective cellular function. In particular, graph augmentation techniques have shown promising results in improving graph-based and node-based classification tasks, yet they have rarely been explored in existing GNN-based subgraph representation learning studies. In this study, we develop a novel multi-view augmentation mechanism to improve subgraph representation learning models and thus the accuracy of downstream prediction tasks. Our augmentation technique creates multiple variants of each subgraph and embeds these variants into the original graph, achieving highly improved training efficiency, scalability, and accuracy. Benchmark experiments on several real-world biological and physiological datasets demonstrate the superiority of our proposed multi-view augmentation technique in subgraph representation learning.

1. INTRODUCTION

Subgraph representation learning using Graph Neural Networks (GNNs) can be broadly applied to various subgraph-related tasks in many fields of science and technology. As an outstanding example, the PPI (Protein-Protein Interaction) network (Zitnik et al., 2018) uses nodes, edges, and subgraphs to represent single proteins, their interactions, and sets of interacting proteins, respectively. GNNs can be used to predict the biological process (PPI-BP), cellular component (PPI-CC), and molecular function (PPI-MF) by classifying the functionality of a subgraph (i.e., a group of proteins) in the PPI network. Another example is to apply GNNs to fragment-based quantum chemical theory, where each fragment in a crystal or aggregate is a subgraph and subgraph representation learning can predict the quantitative interactions between different fragments. Although applying GNNs to subgraph-related tasks (Alsentzer et al., 2020; Kim & Oh, 2022; Wang & Zhang, 2021) has started to draw attention, none of these studies have implemented graph augmentation techniques to improve task accuracy. This work presents a novel multi-view approach to augmenting graphs for improving the accuracy of subgraph classification tasks. Inspired by the effectiveness of graph contrastive learning (Hassani & Khasahmadi, 2020; Zhu et al., 2020; You et al., 2020), our basic idea is to create multiple views of a subgraph by augmenting it, learn an embedding for each view, and then combine the representations to predict the label of the subgraph. The rationale is that the augmented subgraphs (i.e., the multiple views) essentially form an ensemble, which can provide a more robust signal for determining the properties of the subgraph. This idea poses a fundamental challenge: how to efficiently create augmented subgraphs. Augmenting the entire graph to produce different views of the same subgraph is not scalable, because the size of the augmented graph grows linearly with the number of views. Figure 1(c) illustrates the problem: with only one additional view, GNNs must conduct forward and backward propagation on two independent graphs (i.e., the original graph and the augmented graph) during training, doubling the training cost.

We address this efficiency issue by embedding the augmented subgraphs in the original graph, significantly decreasing the demand for GPU resources. In this case, the computation of the embeddings for the augmented subgraphs can share intermediate representations within their neighborhood. Figure 1(d) illustrates an alternative efficient design where the augmented subgraphs are embedded into an augmented graph instead of the original graph. We empirically validate
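The embedding-in-the-original-graph idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the graph is assumed to be undirected and stored as a dict of neighbor sets, the augmentation is simple random intra-subgraph edge dropping, and the function name `embed_augmented_views` is hypothetical. Each view duplicates the subgraph's nodes while keeping their edges to the rest of the graph, so all views share the surrounding neighborhood (and hence intermediate GNN representations) instead of requiring a full graph copy per view.

```python
import random

def embed_augmented_views(adj, subgraph_nodes, num_views=2, drop_prob=0.2, seed=0):
    """Embed augmented copies of a subgraph into the original graph.

    adj: dict mapping node id -> set of neighbor ids (undirected graph).
    subgraph_nodes: nodes of the target subgraph.
    Returns (new_adj, views): the enlarged adjacency, and per view the
    list of new node ids corresponding to subgraph_nodes.
    """
    rng = random.Random(seed)
    # Copy the original adjacency; original nodes and edges are kept intact.
    new_adj = {u: set(vs) for u, vs in adj.items()}
    next_id = max(adj) + 1
    sub = set(subgraph_nodes)
    views = []
    for _ in range(num_views):
        # Allocate fresh node ids for this view's copy of the subgraph.
        mapping = {}
        for u in subgraph_nodes:
            mapping[u] = next_id
            new_adj[next_id] = set()
            next_id += 1
        for u in subgraph_nodes:
            for v in adj[u]:
                if v in sub:
                    # Intra-subgraph edge: keep it with prob. 1 - drop_prob
                    # (u < v avoids handling the same undirected edge twice).
                    if u < v and rng.random() >= drop_prob:
                        new_adj[mapping[u]].add(mapping[v])
                        new_adj[mapping[v]].add(mapping[u])
                else:
                    # Boundary edge: always keep, so every view stays
                    # attached to the shared surrounding neighborhood.
                    new_adj[mapping[u]].add(v)
                    new_adj[v].add(mapping[u])
        views.append([mapping[u] for u in subgraph_nodes])
    return new_adj, views
```

A single GNN forward pass over `new_adj` then yields node embeddings for every view at once; a subgraph-level prediction could, for example, average a pooled readout over the original subgraph and its views, realizing the ensemble effect without duplicating the whole graph.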

