GRAPH NEURAL NETWORKS ARE MORE POWERFUL THAN WE THINK

Abstract

Graph Neural Networks (GNNs) are powerful convolutional architectures that have shown remarkable performance in various node-level and graph-level tasks. Despite their success, the common belief is that the expressive power of standard GNNs is limited and that they are at most as discriminative as the Weisfeiler-Lehman (WL) algorithm. In this paper we argue the opposite and show that standard GNNs, with anonymous inputs, produce more discriminative representations than the WL algorithm. To this end, we derive an alternative analysis that employs linear algebraic tools and characterizes the representational power of GNNs with respect to the eigenvalue decomposition of the graph operators. We prove that GNNs can generate distinctive outputs from white, uninformative inputs for at least all graphs that have different eigenvalues. We also show that simple convolutional architectures with white inputs produce features that count the closed paths in the graph and are provably more expressive than the WL representations. A thorough experimental analysis on graph isomorphism and graph classification datasets corroborates our theoretical results and demonstrates the effectiveness of the proposed approach.

1. INTRODUCTION

Graph Neural Networks (GNNs) have emerged in the field of machine learning and artificial intelligence as powerful tools that process network structures and network data. Their convolutional architecture allows them to inherit all the favorable properties of convolutional neural networks (CNNs), while they also exploit the graph structure. Despite their remarkable performance, the success of GNNs has yet to be fully explained. A lot of research has been conducted to theoretically support the experimental developments, focusing on understanding the functionality of GNNs and analyzing their properties. In particular, permutation invariance-equivariance (Maron et al., 2018), stability to perturbations (Gama et al., 2020) and transferability (Ruiz et al., 2020a; Levie et al., 2021) are properties central to the success of GNNs. Lately, the research focus has shifted towards analyzing the expressive power of GNNs, since their universality depends on their ability to produce different outputs for different graphs. The common belief is that standard anonymous GNNs have limited expressive power (Xu et al., 2019) and that it is upper bounded by the expressive power of the Weisfeiler-Lehman (WL) algorithm (Weisfeiler & Leman, 1968). This belief has spurred increased research activity towards building more expressive GNNs, either by increasing their complexity or by employing independent graph algorithms to design expressive inputs. In this work we argue the opposite. We prove that standard anonymous graph convolutional structures are able to generate more expressive representations than the WL algorithm. Therefore, resorting to handcrafted features or complex GNNs to break the WL limits is not necessary.

Our work is motivated by the following research problem:

Problem definition: Given a pair of different graphs G, Ĝ and anonymous inputs X, X̂, is there a GNN ϕ with parameter tensor H such that ϕ(X; G, H) and ϕ(X̂; Ĝ, H) are nonisomorphic?

As anonymous inputs, we define inputs that are identity and structure agnostic, i.e., they cannot distinguish graphs or nodes of the graph before processing. Why anonymous? Because if the inputs are discriminative prior to processing, concrete conclusions on the discriminative power of GNNs cannot be drawn. Analyzing GNNs with powerful input features only indicates whether GNNs will maintain or ignore valuable information, not whether they can produce this information. This study does not underestimate the importance of designing powerful input features, which is crucial for most tasks. However, it underscores the need for an alternative analysis.

This paper gives an affirmative answer to the above research question. Our analysis utilizes spectral decomposition tools to show that the WL test appears as a limit on the expressive power of GNNs only because of the use of the all-one input. This is expected, since analyzing the representational capacity of ϕ(X; G, H) by studying ϕ(1; G, H) cannot lead to definitive conclusions. For this reason we study GNNs with white random inputs and show that they generate discriminative outputs for at least all graphs with different eigenvalues. In particular, we prove that ϕ(X; G, H) and ϕ(X̂; Ĝ, H) belong to nonisomorphic distributions, even though the inputs X and X̂ are drawn from the same distribution. This implies that standard anonymous GNNs are provably more expressive than the WL algorithm, as they produce discriminative representations for graphs that fail the WL test yet have different eigenvalues. In fact, having different eigenvalues is a very mild condition that is rarely violated in practice. From a practical viewpoint, using white noise as an input to a GNN may be computationally intractable.
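To make the eigenvalue condition concrete, the following sketch (a hypothetical illustration, not code from the paper; it assumes the adjacency matrix as the graph operator) compares C6 against two disjoint triangles, a standard pair of 2-regular graphs that the 1-WL test cannot distinguish, and verifies that their spectra differ:

```python
import numpy as np

def cycle_adj(n):
    """Adjacency matrix of the n-node cycle C_n."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

# C6 vs. two disjoint triangles: both graphs are 2-regular on 6 nodes,
# so 1-WL color refinement assigns every node the same color and the
# test fails to separate them.
A_c6 = cycle_adj(6)
A_2c3 = np.block([[cycle_adj(3), np.zeros((3, 3))],
                  [np.zeros((3, 3)), cycle_adj(3)]])

# Their adjacency spectra differ, so the paper's mild eigenvalue
# condition holds and a GNN with white random inputs separates them.
eig_c6 = np.sort(np.linalg.eigvalsh(A_c6))    # ≈ [-2, -1, -1, 1, 1, 2]
eig_2c3 = np.sort(np.linalg.eigvalsh(A_2c3))  # ≈ [-1, -1, -1, -1, 2, 2]
print(np.allclose(eig_c6, eig_2c3))  # → False
```

The pair fails the WL test but satisfies the eigenvalue condition, so it falls exactly in the gap between the two notions of expressivity.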
We show, however, that there are two alternative architectures that are equivalent to a GNN with white random inputs: (i) a GNN that operates directly on graph representations without requiring any input; (ii) a GNN whose input features are the number of closed paths each node participates in. Note that these features can be viewed as the output of the first GNN layer, i.e., they can be generated by a GNN. These results also imply that ϕ(X; G, H) is more powerful than the WL algorithm even if we restrict our attention to countable inputs X. Our numerical results show that the proposed GNNs are better anonymous discriminators in some graph classification problems. Our contribution is summarized as follows: (C1) We provide a meaningful definition to characterize the representational power of GNNs and develop spectral decomposition tools to study their expressivity. (C2) We show that the WL algorithm is not the true limit on the expressive power of anonymous GNNs; the apparent limit is an artifact of using the all-one vector as input. (C3) We study standard GNNs with white random inputs and show that they produce discriminative representations for any pair of graphs with different eigenvalues. This implies that standard anonymous GNNs are provably more expressive than the WL algorithm. (C4) We prove that standard GNNs with white random inputs can count the number of closed paths of each node, which enables the design of equivalent architectures that circumvent the use of random input features. (C5) We demonstrate the effectiveness of using GNNs with white random inputs, or the proposed alternatives, versus all-one inputs on graph isomorphism and graph classification datasets.
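As a sketch of alternative (ii), closed-path counts can be read off the diagonals of powers of the graph operator. The snippet below is an illustration under the assumption that the operator is the raw adjacency matrix (the paper's operator could equally be a normalized variant), reusing the C6 / two-triangle pair:

```python
import numpy as np

def closed_path_features(A, K):
    """Return an n x K feature matrix whose (i, k) entry is the number
    of closed paths of length k+1 starting and ending at node i,
    i.e., the diagonal of A^(k+1)."""
    n = A.shape[0]
    X = np.zeros((n, K))
    Ak = np.eye(n)
    for k in range(K):
        Ak = Ak @ A
        X[:, k] = np.diag(Ak)
    return X

def cycle_adj(n):
    """Adjacency matrix of the n-node cycle C_n."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

A_c6 = cycle_adj(6)                                  # one 6-cycle
A_2c3 = np.block([[cycle_adj(3), np.zeros((3, 3))],  # two triangles
                  [np.zeros((3, 3)), cycle_adj(3)]])

# Length-3 closed paths count triangle memberships: C6 has no
# triangles, while every node of the two-triangle graph lies on two
# such paths (one per orientation).
print(closed_path_features(A_c6, 3)[:, 2])   # → all 0.0
print(closed_path_features(A_2c3, 3)[:, 2])  # → all 2.0
```

Because these input features already differ between the two graphs, a GNN fed with them distinguishes the pair even though the all-one input, and hence the WL test, cannot.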

Related work:

The first work to study the approximation properties of GNNs was (Scarselli et al., 2008a). Along the same lines, (Maron et al., 2019b; Keriven & Peyré, 2019) discuss the universality of GNNs for permutation invariant or equivariant functions. The scientific attention then focused on the ability of GNNs to distinguish between nonisomorphic graphs. The works of (Morris et al., 2019; Xu et al., 2019) relate the expressive power of GNNs to that of the WL algorithm and prompted various follow-up works in the area. Specifically, (Abboud et al., 2021; Sato et al., 2021) use random features to increase the separation capabilities of GNNs, whereas (Tahmasebi et al., 2020; You et al., 2021; Bouritsas et al., 2022) compute features related to subgraph information. (Ishiguro et al., 2020) uses label features in WL settings, and (Corso et al., 2020; Beaini et al., 2021) use multiple and directional aggregators, respectively, to increase GNN expressivity. GNNs that use k-tuple and k-subgraph information have been designed by (Maron et al., 2019a; Murphy et al., 2019; Azizian et al., 2020; Morris et al., 2020; Geerts & Reutter, 2021; Giusti et al., 2022). These works use a tensor framework and employ more expressive structures compared to simple GNNs. However, they are usually computationally heavier to implement and also prone to overfitting. Moreover, (Balcilar et al., 2021) design convolutions in the spectral domain to produce powerful GNNs, whereas (Loukas, 2019) studies the learning capabilities of a GNN with respect to its width and depth. Finally, (Chen et al., 2019) reveal a connection between the universal approximation capabilities of GNNs and their ability to distinguish nonisomorphic graphs.

