NON-LOCAL GRAPH NEURAL NETWORKS

Abstract

Modern graph neural networks (GNNs) learn node embeddings through multilayer local aggregation and achieve great success in applications on assortative graphs. However, tasks on disassortative graphs usually require non-local aggregation. In addition, we find that local aggregation is even harmful for some disassortative graphs. In this work, we propose a simple yet effective non-local aggregation framework with an efficient attention-guided sorting for GNNs. Based on this framework, we develop various non-local GNNs. We perform thorough experiments to analyze disassortative graph datasets and evaluate our non-local GNNs. Experimental results demonstrate that our non-local GNNs significantly outperform previous state-of-the-art methods on six benchmark datasets of disassortative graphs, in terms of both model performance and efficiency.

1. INTRODUCTION

Graph neural networks (GNNs) process graphs and map each node to an embedding vector (Zhang et al., 2018b; Wu et al., 2019). These node embeddings can be directly used for node-level applications, such as node classification (Kipf & Welling, 2017) and link prediction (Schütt et al., 2017). In addition, they can be used to learn the graph representation vector with graph pooling (Ying et al., 2018; Zhang et al., 2018a; Lee et al., 2019; Yuan & Ji, 2020), in order to fit graph-level tasks (Yanardag & Vishwanathan, 2015). Many variants of GNNs have been proposed, such as ChebNets (Defferrard et al., 2016), GCNs (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017), GATs (Veličković et al., 2018), LGCN (Gao et al., 2018), and GINs (Xu et al., 2019). Their advantages have been shown on various graph datasets and tasks (Errica et al., 2020). However, these GNNs share a multilayer local aggregation framework, which is similar to convolutional neural networks (CNNs) (LeCun et al., 1998) on grid-like data such as images and texts.

In recent years, the importance of non-local aggregation has been demonstrated in many applications in the fields of computer vision (Wang et al., 2018; 2020) and natural language processing (Vaswani et al., 2017). In particular, the attention mechanism has been widely explored to achieve non-local aggregation and capture long-range dependencies from distant locations. Basically, the attention mechanism measures the similarity between every pair of locations and enables information to be communicated among distant but similar locations.

In terms of graphs, non-local aggregation is also crucial for disassortative graphs, while previous studies of GNNs focus on assortative graph datasets (Section 2.2). In addition, we find that local aggregation is even harmful for some disassortative graphs (Section 4.3). The recently proposed Geom-GCN (Pei et al., 2020) attempts to capture long-range dependencies in disassortative graphs. It contains an attention-like step that computes the Euclidean distance between every pair of nodes. However, this step is computationally prohibitive for large-scale graphs, as its complexity is quadratic in the number of nodes. In addition, Geom-GCN employs pre-trained node embeddings (Tenenbaum et al., 2000; Nickel & Kiela, 2017; Ribeiro et al., 2017) that are not task-specific, limiting its effectiveness and flexibility.

In this work, we propose a simple yet effective non-local aggregation framework for GNNs. At the heart of the framework lies an efficient attention-guided sorting, which enables non-local aggregation through classic local aggregation operators in general deep learning. The proposed framework can be flexibly used to augment common GNNs with low computational costs. Based on the framework, we build various efficient non-local GNNs. In addition, we perform detailed analysis of existing disassortative graph datasets, and apply different non-local GNNs accordingly. Experimental results show that our non-local GNNs significantly outperform previous state-of-the-art methods on node classification tasks on six benchmark datasets of disassortative graphs.
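To make the complexity contrast concrete, the following sketch compares dense pairwise attention (quadratic in the number of nodes, as in the attention-like step discussed above) with a sorting-based alternative that scores each node once, sorts the nodes, and applies a local aggregation over the sorted sequence so that distant but similar nodes become neighbors in the ordering. This is only an illustrative sketch; the function names, the dot-product scoring, the calibration vector `c`, and the mean-pooling window are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pairwise_attention(h):
    """Dense pairwise attention: every node attends to every other node.
    Builds an (n, n) similarity matrix, so time and memory are O(n^2)."""
    scores = h @ h.T                                   # (n, n) similarities
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)  # row-normalize
    return attn @ h                                    # aggregate from all nodes

def sorted_local_aggregation(h, c, window=3):
    """Illustrative sorting-based alternative (hypothetical formulation):
    score each node against a single calibration vector c in O(n d),
    sort the nodes by score in O(n log n), then run a local mean over the
    sorted sequence, so similar nodes aggregate as sequence neighbors."""
    scores = h @ c                         # one scalar score per node
    order = np.argsort(scores)             # attention-guided ordering
    h_sorted = h[order]
    n = len(h_sorted)
    half = window // 2
    out = np.empty_like(h_sorted)
    for i in range(n):                     # local (1D) aggregation
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out[i] = h_sorted[lo:hi].mean(axis=0)
    inv = np.empty(n, dtype=int)           # undo the sort
    inv[order] = np.arange(n)
    return out[inv]
```

The key point is that the second variant never materializes an n-by-n matrix: sorting reduces the non-local step to an ordering, after which any standard local operator (here a simple sliding mean, in practice e.g. a 1D convolution) performs the aggregation.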

