GOING BEYOND 1-WL EXPRESSIVE POWER WITH 1-LAYER GRAPH NEURAL NETWORKS

Anonymous

Abstract

Graph neural networks (GNNs) have become the de facto standard for representation learning on graphs and have achieved state-of-the-art performance in many graph-related tasks such as node classification, graph classification, and link prediction. However, it has been shown that their expressive power is at most that of the Weisfeiler-Lehman (WL) test. Recently, a line of work has aimed to enhance the expressive power of graph neural networks. In this work, we propose a more general variant of the neural Weisfeiler-Lehman test that enhances the structural representation of each node in a graph, lifting the expressive power of any graph neural network. We show theoretically that our method is strictly more powerful than the 1- and 2-WL tests. Numerical experiments further show that our proposed method outperforms standard GNNs on almost all benchmark datasets, by a large margin in most cases, with significantly lower running time and memory consumption than other more powerful GNNs.

1. INTRODUCTION

Graph-structured data is ubiquitous in real-world applications ranging from social network analysis Fan et al. (2019), drug discovery Jiang et al. (2020), and personalized recommendation He et al. (2020) to bioinformatics Gasteiger et al. (2021). In recent years, Graph Neural Networks (GNNs) have attracted increasing attention due to their powerful expressiveness and have become the dominant approach for graph-related tasks. Message Passing Graph Neural Networks (MPGNNs) are the most common type of GNN due to their efficiency and expressivity. MPGNNs can be viewed as a neural version of the 1-Weisfeiler-Lehman (1-WL) algorithm Weisfeiler & Leman (1968), where colors are replaced by continuous feature vectors and neural networks are used to aggregate over node neighborhoods Morris et al. (2019). By iteratively aggregating neighboring node features into the center node, MPGNNs learn node representations that encode local structure and feature information. A graph readout function can then be used to pool a whole-graph representation for downstream tasks such as graph classification. Despite the success of MPGNNs, recent literature has proved that their expressive power is bounded by the 1-WL isomorphism test (Morris et al., 2019; Xu et al., 2018a); i.e., standard MPGNNs (1-WL GNNs) cannot distinguish any (sub-)graph structure that 1-WL cannot distinguish. For example, given any two n-node r-regular graphs, standard MPGNNs output identical node representations. Since then, several works have been proposed to enhance the expressivity of MPGNNs. Methods proposed by (Morris et al., 2019; Chen et al., 2019; Maron et al., 2019) aim to approximate high-dimensional WL tests. However, these methods require learning over all node tuples, which is computationally expensive and does not scale well to large graphs. Another line of work augments node features to enhance the expressive power of GNNs.
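The message-passing scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not any specific published architecture: the mean aggregator and the concatenation-style update are assumptions made for the sketch.

```python
# Minimal message-passing sketch: one round of neighborhood aggregation.
# The graph is given as adjacency lists; features are plain lists of floats.
# The mean aggregator and the concatenation update are illustrative choices.

def mpgnn_layer(adj, features):
    """One message-passing step: each node aggregates its neighbors'
    features (element-wise mean) and combines them with its own."""
    new_features = {}
    for v, neighbors in adj.items():
        dim = len(features[v])
        if neighbors:
            # AGGREGATE: element-wise mean over neighbor features
            agg = [sum(features[u][d] for u in neighbors) / len(neighbors)
                   for d in range(dim)]
        else:
            agg = [0.0] * dim
        # UPDATE: concatenate the node's own features with the message
        new_features[v] = features[v] + agg
    return new_features

# Tiny star graph: node 0 connected to nodes 1 and 2
adj = {0: [1, 2], 1: [0], 2: [0]}
x = {0: [1.0], 1: [2.0], 2: [4.0]}
h = mpgnn_layer(adj, x)
# node 0 aggregates mean(2.0, 4.0) = 3.0, giving representation [1.0, 3.0]
```

In an actual MPGNN, the aggregate and update steps would be learnable (e.g., an MLP applied to the concatenation), and the layer would be stacked several times to grow the receptive field.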
E.g., works proposed by (Loukas, 2019; Sato et al., 2021) inject one-hot or random features into each node of a graph, while other works incorporate structural features to enhance the expressivity of GNNs, such as distance-based features (Zhang & Chen, 2019; Li et al., 2020) and counts of certain substructures Bouritsas et al. (2022). More recently, (Zhang & Li, 2021; Zhao et al., 2021) proposed to leverage subgraph information that cannot be captured by the 1-WL test to infer node representations. Concretely, instead of hashing the direct neighborhood information as in the 1-WL test, these methods hash subgraph information and thereby inject additional structural information into the learning process. These methods strike a balance between effectiveness and running-time complexity. However, scalability and memory consumption remain an issue, as these methods need to materialize all subgraphs in GPU memory. Accordingly, in this paper we tackle the above defects by proposing a lightweight module, an extension of the neural Weisfeiler-Lehman test, that extracts meaningful structural representations and can be used alone or plugged into any MPGNN to enhance its expressive power. Our proposed method generalizes the rooted subtree by encoding a multi-hop, multi-color rooted subtree, which induces a different message-passing function. We show theoretically and empirically that our method is strictly more powerful than the 1- and 2-WL tests, with a significant reduction in computational complexity and memory consumption, and with predictive performance comparable or even superior to previous methods. Our main contributions are summarized as follows: (1) New methodology. We develop a more general variant of the neural WL test, in which the message-passing function induces a multi-hop, multi-color rooted subtree instead of a rooted subtree. Our proposed method enjoys high flexibility and can be used alone or equipped with any graph neural network.
(2) Theoretical justification. We show that our method is provably more expressive than 1-WL GNNs with only one iteration of message passing. (3) High efficiency. Our method can be equipped with any base graph neural network, incurring almost no additional memory consumption while significantly boosting the performance of the base GNN. (4) Superior performance. We conduct extensive experiments on a wide variety of datasets with different tasks. Empirically, our approach outperforms all baseline GNNs, by a large margin in most cases.

2. PRELIMINARY

We begin by introducing our notation, then present the WL test and the message passing graph neural network framework.

2.1. NOTATION

A graph can be represented as G = (V, E), where V = {v_1, . . . , v_n} is the node set and E ⊆ V × V is the edge set. X = {x_v | v ∈ V} is the node feature matrix, and F = {e_uv | e_uv ∈ E} denotes the edge feature matrix. The k-hop neighborhood of a node v ∈ V, denoted N_{≤k}(v), is the set of nodes whose shortest-path distance to v is no greater than k; we further write N_k(v) for the set of nodes at distance exactly k from v. Given a set of nodes S ⊆ V, the subgraph induced by S is the graph whose node set is S and whose edges are exactly those edges with both endpoints in S. The k-hop neighborhood of node v thus induces a subgraph, denoted G_v^k. We further denote by D and A the diagonal degree matrix and the adjacency matrix of G, respectively, and by Â_k the k-hop neighborhood matrix of G: the i-th row Â_k(i) has non-zero entries exactly at the nodes whose distance to node i equals k.
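The distance-based neighborhoods N_k(v) and N_{≤k}(v) defined above can be computed with a bounded breadth-first search. The sketch below is a plain-Python illustration of these definitions (function and variable names are our own, not part of the paper's method):

```python
from collections import deque

def k_hop_neighbors(adj, v, k):
    """Return (N_k(v), N_{<=k}(v)): nodes at shortest-path distance
    exactly k from v, and nodes at distance at most k (excluding v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] >= k:       # no need to expand beyond distance k
            continue
        for w in adj[u]:
            if w not in dist:  # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    exactly_k = {u for u, d in dist.items() if d == k}
    up_to_k = {u for u, d in dist.items() if 0 < d <= k}
    return exactly_k, up_to_k

# Path graph 0-1-2-3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
n2, n_le2 = k_hop_neighbors(adj, 0, 2)
# n2 == {2}; n_le2 == {1, 2}
```

The node set N_{≤k}(v) returned here, together with the edges among those nodes, is exactly what induces the subgraph G_v^k from the definition.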

2.2. WEISFEILER-LEHMAN TEST

The WL test is a family of very successful algorithmic heuristics for the graph isomorphism problem. The 1-WL test, the simplest member of the family, works as follows: every node is initially assigned the same color, which is refined in each iteration by aggregating the states of its neighbors. The refinement stabilizes after a few iterations, and the algorithm outputs a representation of the graph; two graphs with different representations are not isomorphic. The test can uniquely identify a large class of graphs up to isomorphism (Babai & Kucera, 1979), but there are simple examples where it fails: for instance, two regular graphs with the same number of nodes and the same degree cannot be distinguished by the test. A natural extension of the 1-WL test is therefore the k-WL test, which provides a hierarchical testing process by maintaining the states of k-tuples of nodes.
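The refinement procedure and its failure mode on regular graphs can be made concrete with a short sketch. The code below (an illustrative implementation, using integer relabeling in place of a hash function) runs 1-WL color refinement on two non-isomorphic 2-regular graphs with six nodes each, a 6-cycle and a disjoint pair of triangles, and shows that they receive identical color histograms:

```python
def wl_refine(adj, rounds=3):
    """1-WL color refinement: start from a uniform coloring, then repeatedly
    relabel each node by (own color, sorted multiset of neighbor colors)."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Map each distinct signature to a small integer (new color class)
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return sorted(colors.values())  # graph-level color histogram

# Two non-isomorphic 2-regular graphs on 6 nodes:
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_refine(cycle6) == wl_refine(two_triangles))  # True: indistinguishable
```

Every node in both graphs always sees the same signature (its color plus two identical neighbor colors), so the refinement never splits the color classes; this is precisely the regular-graph failure noted above, and the same limitation carries over to standard MPGNNs.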

