N-WL: A NEW HIERARCHY OF EXPRESSIVITY FOR GRAPH NEURAL NETWORKS

Abstract

The expressive power of Graph Neural Networks (GNNs) is fundamental for understanding their capabilities and limitations, i.e., what graph properties can or cannot be learnt by a GNN. Since the expressive power of standard GNNs has been shown to be upper-bounded by the Weisfeiler-Lehman (1-WL) algorithm, recent attempts have concentrated on developing more expressive GNNs in terms of the k-WL hierarchy, a well-established framework for graph isomorphism tests. In this work we show that, contrary to the widely accepted view, the k-WL hierarchy is not well-suited for measuring the expressivity of GNNs. This is due to limitations that are inherent to high-dimensional WL algorithms, such as the lack of a natural interpretation and high computational costs, which make it difficult to draw any firm conclusions about the expressive power of GNNs beyond 1-WL. Thus, we propose a novel hierarchy of graph isomorphism tests, namely Neighbourhood WL (N-WL), and also establish a new theorem on the equivalence of expressivity between induced connected subgraphs and induced subgraphs within this hierarchy. Further, we design a GNN model based on N-WL, the Graph Neighbourhood Neural Network (G3N), and empirically verify its expressive power on synthetic and real-world benchmarks.

1. INTRODUCTION

Graph-theoretic algorithms are a powerful source of inspiration for Graph Neural Networks (GNNs). The best-known result is that the expressive power of standard GNNs is upper-bounded by the Weisfeiler-Lehman (1-WL) algorithm (Weisfeiler & Leman, 1968; Xu et al., 2019; Morris et al., 2019). In pursuit of more expressive GNNs, various attempts have been made to leverage existing results in graph theory such as high-dimensional WL algorithms (Azizian & Lelarge, 2021; Maron et al., 2019a; Morris et al., 2020b), substructure counting (Bouritsas et al., 2022; Barceló et al., 2021), and individualisation (Dupty et al., 2022). The expressivity of these GNNs is measured in terms of the k-WL hierarchy, a well-established framework for graph isomorphism testing (Grohe, 2017).

However, the k-WL hierarchy exhibits several theoretical and practical limitations as a measure of expressivity for GNNs. Theoretically, it is a highly non-trivial problem to tell if and when k-WL algorithms can distinguish two particular graphs (Kiefer, 2020). Deciding which graph properties are important for distinguishing graphs is even harder, if not impossible. A complete description of all subgraph patterns whose counts and occurrence are k-WL invariant is only available for k = 1 (Arvind et al., 2020). Even at high computational cost, the power of k-WL algorithms in recognising graph properties still appears limited, and some negative results are known, e.g., 3-WL cannot identify any k-cliques with k > 3 (Fürer, 2017). These issues hamper the practical applicability of high-dimensional WL algorithms for solving real-world tasks on graph-structured data (Chen et al., 2020; Garg et al., 2020).

A question arises from this: is the k-WL hierarchy a good yardstick for the expressivity of GNNs? In the search for an answer to this question, we observe several disparities between (standard) GNNs and the k-WL hierarchy.
First, GNNs encode structural information into nodes as an efficient and practical way for graph learning. This, however, runs against the spirit of the k-WL hierarchy, which increases expressive power by moving to higher-order objects, i.e., k-tuples, rather than just nodes (Cai et al., 1992; Grohe, 2017). Second, GNNs are built upon a natural notion of local neighbourhood, i.e., the nodes within a certain distance of a node. In contrast, the k-WL hierarchy defines the neighbourhood of a k-tuple based on "adjacency". This notion of adjacency involves the enumeration of all nodes of a graph in each dimension, which is not local and raises concerns about computational efficiency (Morris et al., 2020b). Last but not least, GNNs learn node representations by aggregating the information from their neighbouring nodes, assuming "birds of a feather flock together" from real-world perception (Zhu et al., 2020; McPherson et al., 2001). The k-WL hierarchy updates the representation of a k-tuple by aggregating the information from its adjacent neighbours, which does not have a natural interpretation and thus makes it difficult to understand its real-world implications.

In light of these observations, we explore a hierarchy of expressivity that is grounded on a new class of graph isomorphism algorithms, called Neighbourhood WL (N-WL) algorithms. This hierarchy overcomes the aforementioned limits of the k-WL hierarchy. More importantly, it enables a new paradigm for designing expressive GNNs while still remaining intuitive and computationally efficient. We integrate the following novel insights into the algorithmic design of this hierarchy:

(1) Instead of imposing a rigid condition that an object and its neighbours use the same structure (e.g., k-tuples and their variants), why can we not separate them by colouring nodes in a lower-dimensional space based on information from induced subgraphs in a high-dimensional space?
(2) Can we build a hierarchy of expressivity for GNNs upon a natural choice of neighbourhood, i.e., the d-hop neighbourhood? On the one hand, this ensures the locality of neighbourhoods and thus brings computational efficiency; on the other hand, it allows high-dimensional neighbours to capture intricate graph properties in node representations for distinguishing graphs.
(3) Unlike the k-WL hierarchy, which increases expressivity only through one dimension k, the hierarchy in our work enables two independent ways of controlling expressive power: the size t of induced subgraphs and the size d of neighbourhoods, i.e., enumerating all subgraphs of order t within a d-hop neighbourhood. This helps strike a balance between the computational complexity and the expressivity of algorithms, which is often highly sought after in real-world applications.

Figure 1 shows pairs of simple graphs on eight vertices that are indistinguishable by 1-WL but can be distinguished by our proposed hierarchy N-WL under different t and d parameters. Under the k-WL hierarchy, we know only that 312 pairs of simple graphs lie between 1-WL and 3-WL, as none of them can be distinguished by 1-WL but all of them can be distinguished by 3-WL. Rather than "None" or "All", our N-WL hierarchy can distinguish these graphs in a more refined way under varied t and d values, i.e., each point (t, d) in Figure 1 indicates the number of pairs of simple graphs that remain indistinguishable under these parameters, and all pairs are distinguishable when t ≥ 3 and d ≥ 2. Further details and example graphs are provided in Appendix A.

With a hierarchy of expressivity, it is natural to ask whether the hierarchy is strict. We thus further explore whether the N-WL hierarchy is strictly more expressive when considering induced subgraphs or neighbourhoods of larger sizes.
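To make the roles of t and d concrete, the following Python sketch colours each node by the multiset of isomorphism types of the t-node induced subgraphs found in its d-hop neighbourhood, and then runs ordinary 1-WL colour refinement on top of these initial colours. It is only a toy illustration of the idea described above, not the N-WL algorithm itself, whose formal definition is not given in this section. All function names are hypothetical, and two choices are assumptions made for the example: the d-hop neighbourhood includes the node itself, and isomorphism types are computed by brute force (feasible only for small t).

```python
from collections import Counter, deque
from itertools import combinations, permutations


def d_hop(adj, v, d):
    """Nodes within distance d of v (v included; an assumption), found by BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond distance d
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sorted(dist)


def iso_type(adj, nodes):
    """Canonical edge list of the subgraph induced by `nodes` (brute force over relabellings)."""
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        key = tuple(sorted(
            tuple(sorted((relabel[a], relabel[b])))
            for a, b in combinations(nodes, 2)
            if b in adj[a]
        ))
        if best is None or key < best:
            best = key
    return best


def local_colour(adj, v, t, d):
    """Multiset of isomorphism types of t-node induced subgraphs in the d-hop ball of v."""
    ball = d_hop(adj, v, d)
    types = Counter(iso_type(adj, s) for s in combinations(ball, t))
    return tuple(sorted(types.items()))


def signature(adj, rounds=3, t=None, d=None):
    """Colour histogram after 1-WL refinement; with t and d set, start from local colours."""
    if t is None:
        colours = {v: 0 for v in adj}  # plain 1-WL: uniform initial colour
    else:
        colours = {v: local_colour(adj, v, t, d) for v in adj}
    for _ in range(rounds):
        # Colours are nested tuples, so they are comparable across different graphs.
        colours = {
            v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
            for v in adj
        }
    return Counter(colours.values())


def cycle(n, offset=0):
    """Adjacency dict of a cycle on n nodes labelled offset..offset+n-1."""
    return {offset + i: {offset + (i - 1) % n, offset + (i + 1) % n} for i in range(n)}


hexagon = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}

same_under_1wl = signature(hexagon) == signature(two_triangles)
same_with_t3_d1 = signature(hexagon, t=3, d=1) == signature(two_triangles, t=3, d=1)
```

On this classic pair (a 6-cycle versus two disjoint triangles), plain 1-WL produces identical colour histograms because every node has degree 2, whereas the local colours with t = 3 and d = 1 differ: the 1-hop neighbourhood of a node induces a path in the 6-cycle and a triangle in the other graph.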
To show the strictness, we construct counterexample graphs such that, for any fixed d ∈ N and every t ∈ N, there exist non-isomorphic graphs which N-WL with (t, d) fails to distinguish but which can be distinguished by N-WL with (t+1, d); on the other hand, for any fixed t ∈ N and every d ∈ N, there also exist non-isomorphic graphs which N-WL with (t, d) fails to distinguish but which can be distinguished by N-WL with (t, d+1). Not surprisingly, constructing such counterexample graphs turns out to be difficult, due to the intricate interaction between t and d as well as the combinatorial nature of graph structure. We present such a construction, which can produce families of non-isomorphic graph pairs with O(t) or O(d) vertices.

To understand how graph connectivity may affect the expressivity of N-WL, we go on to examine the relation between induced subgraphs and their connectivity. Inspired by the Algebra of Subgraphs (Kocay, 1982), we discover a previously unknown connection between induced subgraphs of size t, for any t ∈ N, and induced connected subgraphs whose sizes are less than or equal to t. This surprisingly leads to the finding that these two families of subgraphs have equivalent expressive power for distinguishing graphs. Hence, when graphs are sparse, instead of considering all induced subgraphs, we may consider only induced connected subgraphs, improving efficiency considerably.
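A small, elementary instance of this kind of relation can be checked by hand for t = 3. The sketch below is not the theorem established in this paper; it only verifies a standard global counting identity in the same spirit, namely that the counts of the two disconnected 3-node induced subgraph types (the empty graph and an edge plus an isolated node) are determined by the counts of connected induced subgraphs of size at most 3 (nodes, edges, induced paths, triangles). The function names and the example graph are hypothetical.

```python
from itertools import combinations
from math import comb


def induced_counts_3(adj):
    """Brute-force counts of 3-node induced subgraphs, indexed by their number of edges (0..3)."""
    counts = [0, 0, 0, 0]
    for trio in combinations(sorted(adj), 3):
        edges = sum(1 for a, b in combinations(trio, 2) if b in adj[a])
        counts[edges] += 1
    return counts


def disconnected_from_connected(n, m, paths, triangles):
    """Recover the disconnected 3-node counts from connected counts of size <= 3.

    Pairing each edge with a third node gives m * (n - 2) pairs; a 3-node set with
    exactly one edge is hit once, an induced path twice, and a triangle three times.
    """
    edge_plus_isolated = m * (n - 2) - 2 * paths - 3 * triangles
    empty = comb(n, 3) - edge_plus_isolated - paths - triangles
    return empty, edge_plus_isolated


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 and an isolated node 4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}

n = len(graph)
m = sum(len(nbrs) for nbrs in graph.values()) // 2
empty, one_edge, paths, triangles = induced_counts_3(graph)
recovered = disconnected_from_connected(n, m, paths, triangles)
```

Here `recovered` agrees with the brute-force pair `(empty, one_edge)`, so for this example the disconnected counts carry no information beyond the connected ones.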



Figure 1: Pairs of simple graphs on eight vertices that are indistinguishable by 1-WL but distinguishable by N-WL under different values of t and d.

