MEMORY-AUGMENTED DESIGN OF GRAPH NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

The expressive power of graph neural networks (GNNs) has drawn much interest recently. Most existing work has measured the expressiveness of GNNs through the task of distinguishing between graphs. In this paper, we inspect the representation limits of the locally unordered message passing (LUMP) GNN architecture through the lens of node classification. For GNNs based on permutation-invariant local aggregators, we characterize graph-theoretic conditions under which such GNNs fail to discriminate simple instances, regardless of the underlying architecture or network depth. To overcome this limitation, we propose memory augmentation, a novel framework that augments GNNs with global graph information. Specifically, we allow every node in the original graph to interact with a group of memory nodes; for each node, information from all the other nodes in the graph can be gleaned through the relay of the memory nodes. For suitable backbone architectures such as GAT and GCN, memory-augmented GNNs are theoretically shown to be more expressive than LUMP GNNs. Empirical evaluations demonstrate the significant improvement brought by memory augmentation. In particular, memory-augmented GAT and GCN are shown to either outperform or closely match state-of-the-art performance across various benchmark datasets.

1. INTRODUCTION

Graph neural networks (GNNs) are a powerful tool for learning with graph-structured data, and have achieved great success on problems such as node classification (Kipf & Welling, 2016), graph classification (Duvenaud et al., 2015), and link prediction (Grover & Leskovec, 2016). GNNs typically follow a recursive neighborhood aggregation (or message passing) scheme (Xu et al., 2019): within each aggregation step, each node collects information (usually feature vectors) from its neighborhood, then applies aggregation and combination mechanisms to compute its new feature vector. GNN architectures typically differ in their design of these aggregation and combination mechanisms. Popular architectures such as GCN (Kipf & Welling, 2016), GraphSAGE (Hamilton et al., 2017), and GAT (Veličković et al., 2018) fall into this paradigm.

Despite their empirical success, GNNs that update node features based only on local information suffer from several limitations. One important issue is their limited expressive power: in the graph classification setting (Xu et al., 2019), it was shown that message passing neural networks are at most as powerful as the Weisfeiler-Lehman graph isomorphism test. A more recent line of work has suggested variants of the message passing scheme that incorporate the layout of local neighborhoods (Sato et al., 2019; Klicpera et al., 2020) or spatial information of the graph (You et al., 2019). Another problem is that the performance of GNNs does not improve, or even degrades, as the number of layers increases (Kipf & Welling, 2016; Xu et al., 2018; Li et al., 2018; Oono & Suzuki, 2020); this phenomenon, known as over-smoothing, makes extending the receptive field of message passing GNNs a difficult task.
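To make the recursive neighborhood aggregation scheme concrete, the following is a minimal sketch of one message passing round with a permutation-invariant (mean) aggregator. It is an illustrative example, not any specific published architecture; the function name, the adjacency-dictionary input format, and the choice of mean aggregation with a ReLU combination are all assumptions made for clarity.

```python
import numpy as np

def message_passing_layer(X, adj, W_self, W_neigh):
    """One round of neighborhood aggregation (illustrative sketch).

    X        : (n, d) node feature matrix
    adj      : dict mapping each node index to a list of neighbor indices
               (a hypothetical input format chosen for readability)
    W_self   : (d, d') transform applied to a node's own features
    W_neigh  : (d, d') transform applied to the aggregated neighbor message
    """
    n, d = X.shape
    out = np.zeros((n, W_self.shape[1]))
    for v in range(n):
        neigh = adj.get(v, [])
        # Permutation-invariant aggregator: the mean of neighbor features.
        # Any reordering of `neigh` yields the same result.
        agg = X[neigh].mean(axis=0) if neigh else np.zeros(d)
        # Combine own features with the aggregated message, then apply ReLU.
        out[v] = np.maximum(0.0, X[v] @ W_self + agg @ W_neigh)
    return out
```

Stacking k such layers gives each node a receptive field of its k-hop neighborhood, which is exactly why purely local schemes struggle with information that lies further away.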
Many successful GNN architectures stack only a small number of layers, e.g., 2 or 3 (Kipf & Welling, 2016), which can be viewed as an implicit inductive bias that node labels are determined by neighborhoods at most a few hops away. However, this assumption may not hold for much real-world data; for example, structurally similar nodes may offer strong predictive power even for very distant node pairs (Donnat et al., 2018). Several techniques have been proposed for aggregating node information from a wider range (Xu et al., 2018; Klicpera et al., 2019a;b).

