EXPRESSIVE POWER OF INVARIANT AND EQUIVARIANT GRAPH NEURAL NETWORKS

Abstract

Various classes of Graph Neural Networks (GNNs) have been proposed and shown to be successful in a wide range of applications with graph-structured data. In this paper, we propose a theoretical framework able to compare the expressive power of these GNN architectures. The current universality theorems only apply to intractable classes of GNNs. Here, we prove the first approximation guarantees for practical GNNs, paving the way for a better understanding of their generalization. Our theoretical results are proved for invariant GNNs computing a graph embedding (a permutation of the nodes of the input graph does not affect the output) and equivariant GNNs computing an embedding of the nodes (a permutation of the input permutes the output). We show that Folklore Graph Neural Networks (FGNN), which are tensor-based GNNs augmented with matrix multiplication, are the most expressive architectures proposed so far for a given tensor order. We illustrate our results on the Quadratic Assignment Problem (an NP-hard combinatorial problem) by showing that FGNNs are able to learn how to solve the problem, leading to much better average performance than existing algorithms (based on spectral, SDP or other GNN architectures). On the practical side, we also implement masked tensors to handle batches of graphs of varying sizes.

1. INTRODUCTION

Graph Neural Networks (GNNs) are designed to deal with graph-structured data. Since a graph is not changed by a permutation of its nodes, GNNs should be either invariant, if they return a result that must not depend on the representation of the input (typically when building a graph embedding), or equivariant, if the output must be permuted when the input is permuted (typically when building an embedding of the nodes). More fundamentally, incorporating symmetries into machine learning is a fundamental problem, as it reduces the number of degrees of freedom to be learned.

Deep learning on graphs. This paper focuses on learning deep representations of graphs with network architectures, namely GNNs, designed to be invariant or equivariant to permutation. From a practical perspective, various message-passing GNNs have been proposed; see Dwivedi et al. (2020) for a recent survey and benchmarking on learning tasks. In this paper, we study three architectures: message-passing GNNs (MGNN), probably the most popular architecture used in practice; order-k linear GNNs (k-LGNN), proposed in Maron et al. (2018); and order-k folklore GNNs (k-FGNN), first introduced by Maron et al. (2019a). MGNN layers are local and thus highly parallelizable on GPUs, which makes them scalable to large sparse graphs. k-LGNNs and k-FGNNs deal with representations of graphs as tensors of order k, which makes them of little practical use for k ≥ 3. In order to compare these architectures, their separating power has been measured against a hierarchy of graph invariants developed for the graph isomorphism problem. Namely, for k ≥ 2, k-WL(G) are invariants based on the Weisfeiler-Lehman tests (described in Section 4.1). For each k ≥ 2, (k + 1)-WL has strictly more separating power than k-WL (in the sense that there is a pair of non-isomorphic graphs distinguishable by (k + 1)-WL but not by k-WL). GINs (which are invariant MGNNs), introduced in Xu et al. (2018), are shown to be as powerful as 2-WL. In Maron et al. (2019a), Geerts (2020b) and Geerts (2020a), k-LGNN is shown to be as powerful as k-WL, and 2-FGNN as powerful as 3-WL. In this paper, we extend this last result about k-FGNN to general values of k. So in terms of separating power, when restricted to tensors of order k, k-FGNN is the most powerful architecture among those considered in this work. This means that, for a given pair of graphs G and G', if (k + 1)-WL(G) ≠ (k + 1)-WL(G'), then there exists a k-FGNN, say GNN_{G,G'}, such that GNN_{G,G'}(G) ≠ GNN_{G,G'}(G').

Approximation results for GNNs. Results on the separating power of GNNs only deal with pairwise comparisons of graphs: a priori, we need a different GNN for each pair of graphs in order to distinguish them. Such results are of little help in a practical learning scenario. Our main contribution in this paper overcomes this issue: we show that a single GNN can give a meaningful representation for all graphs. More precisely, we characterize the sets of functions that can be approximated by MGNNs, k-LGNNs and k-FGNNs respectively. The standard Stone-Weierstrass theorem shows that if an algebra A of real continuous functions separates points, then A is dense in the set of continuous functions on a compact set. Here we extend such a theorem to general functions with symmetries and apply it to invariant and equivariant functions to obtain our main results for GNNs. As a consequence, we show that k-FGNNs have the best approximation power among architectures dealing with tensors of order k.

Universality results for GNNs. Universal approximation theorems (similar to Cybenko (1989) for multi-layer perceptrons) have been proved for linear GNNs in Maron et al. (2019b); Keriven & Peyré (2019); Chen et al. (2019). They show that some classes of GNNs can approximate any function defined on graphs. To be able to approximate any invariant function, they require the use of very complex networks, namely k-LGNNs where k tends to infinity with n, the number of nodes.
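For concreteness, the first level of the Weisfeiler-Lehman hierarchy discussed above can be sketched in a few lines. The function below implements classical color refinement, which corresponds to 2-WL in the indexing used here (often called 1-WL elsewhere); the function names and toy graphs are illustrative, not part of the paper's implementation.

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """Color refinement (2-WL in this paper's indexing).

    adj: adjacency lists, e.g. {0: [1], 1: [0, 2], 2: [1]}.
    Returns the multiset of node colors after refinement, as a Counter.
    """
    colors = {v: 0 for v in adj}  # start from a uniform coloring
    for _ in range(rounds):
        # a node's signature is its own color plus the multiset of its
        # neighbors' colors
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        # compress signatures back into small integer colors
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colors = {v: palette[sig[v]] for v in adj}
    return Counter(colors.values())

# The test separates a triangle from a path on three nodes...
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}

# ...but not two disjoint triangles from a 6-cycle: both graphs are
# 2-regular, so refinement never leaves the uniform coloring.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

The second pair is the standard example of why higher-order tests (3-WL and beyond) are needed, which is precisely where architectures such as k-FGNNs gain their extra separating power.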
Since we prove that any invariant function less powerful than (k + 1)-WL can be approximated by a k-FGNN, letting k tend to infinity directly implies universality. Universality of k-FGNNs is another contribution of our work.

Equivariant GNNs. Our second set of results extends the previous analysis from invariant functions to equivariant functions. There are far fewer results about equivariant GNNs: Keriven & Peyré (2019) prove the universality of linear equivariant GNNs, and Maehara & Hoang (2019) show the universality of a new class of networks they introduce. Here, we consider a natural equivariant extension of k-WL and prove that equivariant (k + 1)-LGNNs and k-FGNNs can approximate any equivariant function less powerful than this equivariant (k + 1)-WL, for k ≥ 1. At this stage, we should note that all universality results for GNNs by Maron et al. (2019b); Keriven & Peyré (2019); Chen et al. (2019) are easily recovered from our main results. Moreover, our analysis is valid for graphs of varying sizes.

Empirical results for the Quadratic Assignment Problem (QAP). To validate our theoretical contributions, we empirically show that 2-FGNNs outperform classical MGNNs. Indeed, Maron et al. (2019a) already demonstrated state-of-the-art results for the invariant version of 2-FGNNs (for graph classification and graph regression). Here we consider the graph alignment problem and show that the equivariant 2-FGNN is able to learn a node embedding which beats other algorithms (based on spectral methods, SDP or GNNs) by a large margin.

Outline and contribution. After reviewing further related work and notation in the next section, we define the various classes of GNNs studied in this paper in Section 3: message-passing GNNs, linear GNNs and folklore GNNs. Section 4 contains our main theoretical results for GNNs. First, in Section 4.2, we describe the separating power of each GNN architecture with respect to the Weisfeiler-Lehman test.
In Section 4.3, we give approximation guarantees for MGNNs, LGNNs and FGNNs at a fixed tensor order. They cover both the invariant and equivariant cases and are our main theoretical contributions. For these, we develop in Section D a fine-grained Stone-Weierstrass approximation theorem for vector-valued functions with symmetries. Our theorem handles both the invariant and equivariant cases and is inspired by recent works in approximation theory. In Section 6, we illustrate our theoretical results on a practical application: the graph alignment problem, a well-known NP-hard problem. We highlight a previously overlooked implementation question: the handling of batches of graphs of varying sizes. A PyTorch implementation of the code necessary to reproduce the results is available at https://github.com/mlelarge/graph_neural_net
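To make the invariant/equivariant distinction used throughout this paper concrete, the toy check below runs one sum-aggregation message-passing step (a minimal MGNN-style layer; all names are illustrative stand-ins, not the architectures defined in Section 3) and verifies that permuting the input nodes permutes the node embeddings, while a sum-pooled graph embedding is unchanged.

```python
def mp_layer(adj, h):
    """One message-passing step: each node keeps its own feature and
    adds the sum of its neighbors' features."""
    return [h[v] + sum(h[u] for u in adj[v]) for v in sorted(adj)]

def readout(h):
    """Sum-pooling: an invariant graph embedding."""
    return sum(h)

def permute(adj, h, perm):
    """Relabel node v as perm[v] in both the graph and the features."""
    new_adj = {perm[v]: [perm[u] for u in nbrs] for v, nbrs in adj.items()}
    new_h = [0.0] * len(h)
    for v, x in enumerate(h):
        new_h[perm[v]] = x
    return new_adj, new_h

# A path 0-1-2 with distinct node features and an arbitrary relabeling.
adj = {0: [1], 1: [0, 2], 2: [1]}
h = [1.0, 2.0, 3.0]
perm = [2, 0, 1]

out = mp_layer(adj, h)        # embeddings of nodes 0, 1, 2
adj_p, h_p = permute(adj, h, perm)
out_p = mp_layer(adj_p, h_p)  # embeddings after relabeling

# Equivariance: node v's embedding moved to position perm[v].
equivariant = all(out_p[perm[v]] == out[v] for v in range(3))
# Invariance: the pooled graph embedding did not change.
invariant = readout(out) == readout(out_p)
```

The same two properties, stated abstractly, are exactly what the invariant and equivariant approximation results of Section 4.3 quantify over.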
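The batching issue mentioned above can be illustrated with a small padding-and-mask sketch. This is a pure-Python stand-in with hypothetical names, not the masked-tensor PyTorch implementation from the repository: adjacency matrices are padded to the largest size in the batch, and a boolean node mask records which rows are real so that padded entries can be zeroed out after every layer.

```python
def batch_graphs(adjs):
    """Pad a list of adjacency matrices (lists of lists) to a common
    size and return the padded batch together with per-graph masks."""
    n_max = max(len(a) for a in adjs)
    batch, masks = [], []
    for a in adjs:
        n = len(a)
        padded = [[a[i][j] if i < n and j < n else 0.0
                   for j in range(n_max)] for i in range(n_max)]
        batch.append(padded)
        masks.append([i < n for i in range(n_max)])
    return batch, masks

def mask_features(h, mask):
    """Zero out the features of padding nodes, e.g. after each layer."""
    return [x if keep else 0.0 for x, keep in zip(h, mask)]

# A 2-node graph and a 3-node graph batched together.
g_small = [[0.0, 1.0],
           [1.0, 0.0]]
g_big = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
batch, masks = batch_graphs([g_small, g_big])
```

With the mask in hand, sums over nodes stay correct and layer outputs are independent of the amount of padding, which is what makes mixed-size batches safe.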

2. RELATED WORK

The pioneering works that applied neural networks to graphs are Gori et al. (2005) and Scarselli et al. (2009), which learn node representations with recurrent neural networks. More recent message-passing architectures make use of non-linear functions of the adjacency matrix (Kipf & Welling, 2016),




