ON GRAPH NEURAL NETWORKS VERSUS GRAPH-AUGMENTED MLPS

Abstract

From the perspectives of expressive power and learning, this work compares multi-layer Graph Neural Networks (GNNs) with a simplified alternative that we call Graph-Augmented Multi-Layer Perceptrons (GA-MLPs), which first augments node features with certain multi-hop operators on the graph and then applies learnable node-wise functions. From the perspective of graph isomorphism testing, we show both theoretically and numerically that GA-MLPs with suitable operators can distinguish almost all non-isomorphic graphs, just like the Weisfeiler-Lehman (WL) test and GNNs. However, by viewing them as node-level functions and examining the equivalence classes they induce on rooted graphs, we prove a separation in expressive power between GA-MLPs and GNNs that grows exponentially in depth. In particular, unlike GNNs, GA-MLPs are unable to count the number of attributed walks. We also demonstrate via community detection experiments that GA-MLPs can be limited by their choice of operator family, whereas GNNs have higher flexibility in learning.

1. INTRODUCTION

While multi-layer Graph Neural Networks (GNNs) have gained popularity for their applications in various fields, authors have recently started to investigate what their true advantages over baselines are, and whether they can be simplified. On one hand, GNNs based on neighborhood aggregation allow the combination of information present at different nodes, and by increasing the depth of such GNNs, we increase the size of the receptive field. On the other hand, it has been pointed out that deep GNNs can suffer from issues including over-smoothing, exploding or vanishing gradients in training, as well as bottleneck effects (Kipf & Welling, 2016; Li et al., 2018; Luan et al., 2019; Oono & Suzuki, 2020; Rossi et al., 2020; Alon & Yahav, 2020). Recently, a series of models have attempted to relieve these issues of deep GNNs while retaining their benefit of combining information across nodes, by first augmenting the node features by propagating the original node features through powers of graph operators such as the (normalized) adjacency matrix, and then applying a node-wise function to the augmented node features, usually realized by a Multi-Layer Perceptron (MLP) (Wu et al., 2019; NT & Maehara, 2019; Chen et al., 2019a; Rossi et al., 2020). Because of the use of graph operators for augmenting the node features, we refer to such models as Graph-Augmented MLPs (GA-MLPs). These models have achieved competitive performance on various tasks, and moreover enjoy better scalability, since the augmented node features can be computed during preprocessing (Rossi et al., 2020). It thus becomes natural to ask what advantages GNNs have over GA-MLPs. In this work, we ask whether GA-MLPs sacrifice expressive power compared to GNNs while gaining these advantages. A popular measure of the expressive power of GNNs is their ability to distinguish non-isomorphic graphs (Hamilton et al., 2017; Xu et al., 2019; Morris et al., 2019).
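The two-step GA-MLP recipe described above (operator-based feature augmentation followed by a node-wise learnable function) can be sketched as follows. This is a minimal, illustrative implementation, assuming the symmetrically normalized adjacency matrix as the graph operator; the function and argument names are our own, not from any specific model in the literature.

```python
import numpy as np

def ga_mlp_features(adj, feats, num_hops=3):
    """Augment node features with multi-hop operator powers.

    Propagates the raw features through successive powers of the
    symmetrically normalized adjacency matrix and concatenates the
    results, per the GA-MLP preprocessing step described above.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    blocks, h = [feats], feats
    for _ in range(num_hops):
        h = a_norm @ h          # one more hop of propagation
        blocks.append(h)
    # A learnable node-wise function (an MLP) is then applied to each
    # row; crucially, everything here can be computed in preprocessing.
    return np.concatenate(blocks, axis=1)

# 4-node path graph with 2-dimensional one-hot-style features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)[:, :2]
Z = ga_mlp_features(A, X, num_hops=2)
print(Z.shape)  # (4, 6): original features plus two propagated copies
```

Because the augmented features depend only on the fixed operator family and the input graph, they are computed once up front, which is the source of the scalability advantage noted above.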
In our work, besides studying the expressive power of GA-MLPs from the viewpoint of graph isomorphism tests, we propose a new perspective that better suits the setting of node-prediction tasks: we analyze the expressive power of models including GNNs and GA-MLPs as node-level functions, or equivalently, as functions on rooted graphs. Under this perspective, we prove an exponential-in-depth gap between the expressive powers of GNNs and GA-MLPs. We illustrate this gap by finding a broad family of simple functions that can be provably approximated by GNNs but not GA-MLPs, based on counting attributed walks on the graph. Moreover, via the task of community detection, we show a lack of flexibility of GA-MLPs, compared to GNNs, to learn the best operators to use. In summary, our main contributions are:

• Finding graph pairs that several GA-MLPs cannot distinguish while GNNs can, but also proving that there exist simple GA-MLPs that distinguish almost all non-isomorphic graphs.
• From the perspective of approximating node-level functions, proving an exponential gap between the expressive power of GNNs and GA-MLPs in terms of the equivalence classes on rooted graphs that they induce.
• Showing that functions that count a particular type of attributed walk among nodes can be approximated by GNNs but not GA-MLPs, both in theory and numerically.
• Through community detection tasks, demonstrating that GNNs have higher flexibility in learning than GA-MLPs due to the fixed choice of the operator family in the latter.

2. RELATED WORK

Expressive power of GNNs. Xu et al. (2019) and Morris et al. (2019) show that GNNs based on neighborhood aggregation are no more powerful than the Weisfeiler-Lehman (WL) test for graph isomorphism (Weisfeiler & Leman, 1968), in the sense that these GNNs cannot distinguish between any pair of non-isomorphic graphs that the WL test cannot distinguish. They also propose models that match the expressive power of the WL test. Since then, many attempts have been made to build GNNs whose expressive power goes beyond that of the WL test.

Depth in GNNs. Kipf & Welling (2016) observe that the performance of Graph Convolutional Networks (GCNs) degrades as the depth grows too large, with the best performance achieved at 2 or 3 layers. Along the spectral perspective on GNNs (Bruna et al., 2013; Defferrard et al., 2016; Bronstein et al., 2017; NT & Maehara, 2019), Li et al. (2018) and Wu et al. (2019) explain the failure of deep GCNs by the over-smoothing of the node features. Oono & Suzuki (2020) show an exponential loss of expressive power as the depth of GCNs increases, in the sense that the hidden node states tend to converge to Laplacian sub-eigenspaces as the depth grows to infinity. Alon & Yahav (2020) show an over-squashing effect in deep GNNs, in the sense that the width of the hidden states needs to grow exponentially in the depth in order to retain all information about long-range interactions. In comparison, our work focuses on more general GNNs based on neighborhood aggregation that are not limited in hidden state width, and demonstrates their advantage in expressive power over GA-MLP models at finite depth, in terms of distinguishing rooted graphs for node-prediction tasks. On the other hand, there exist examples of useful deep GNNs. Chen et al. (2019b) apply 30-layer GNNs to community detection problems, using a family of multi-scale operators as well as normalization steps (Ioffe & Szegedy, 2015; Ulyanov et al., 2016). Recently, Li et al. (2019; 2020a) and Chen et al. (2020a) build deeper GCN architectures with the help of various residual connections (He et al., 2016) and normalization steps to achieve impressive results on standard datasets, which further highlights the need to study the role of depth in GNNs. Gong et al. (2020) propose geometrically principled connections, which improve upon vanilla residual connections on graph- and mesh-based tasks.

Existing GA-MLP-type models. Motivated by better understanding GNNs as well as by enhancing computational efficiency, several models of the GA-MLP type have been proposed, achieving competitive performance on various datasets. Wu et al. (2019) propose the Simple Graph Convolution (SGC), which removes the intermediary weights and nonlinearities in GCNs. Chen et al. (2019a) propose the Graph Feature Network (GFN), which further adds intermediary powers of the normalized adjacency matrix to the operator family and is applied to graph-prediction tasks. NT & Maehara (2019) propose the Graph Filter Neural Network (gfNN), which enhances the SGC in the final MLP step. Rossi et al. (2020) propose the Scalable Inception Graph Neural Network (SIGN), which augments the operator family with Personalized-PageRank-based (Klicpera et al., 2018; 2019) and triangle-based (Monti et al., 2018; Chen et al., 2019b) adjacency matrices.
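The WL test invoked throughout the expressive-power discussion above is, in its one-dimensional form, an iterative color-refinement procedure. Below is a minimal sketch under our own naming conventions (graphs as adjacency lists, colors compressed to small integers each round); it is illustrative, not a reference implementation.

```python
def wl_colors(adj_list, num_iters):
    """One-dimensional Weisfeiler-Lehman color refinement.

    Every node starts with the same color; each iteration re-colors a
    node by the pair (own color, sorted multiset of neighbor colors),
    then compresses these signatures back to small integers. Graphs
    whose final color histograms differ are non-isomorphic; equal
    histograms are inconclusive, which is exactly where GNNs based on
    neighborhood aggregation also fail to distinguish graphs.
    """
    colors = {v: 0 for v in adj_list}
    for _ in range(num_iters):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj_list[v])))
            for v in adj_list
        }
        # Canonical compression: relabel distinct signatures as 0, 1, ...
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj_list}
    return sorted(colors.values())

# Triangle vs. 3-node path: distinguished after one refinement round.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
print(wl_colors(triangle, 2) != wl_colors(path3, 2))  # True
```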

