YOUR NEIGHBORS ARE COMMUNICATING: TOWARDS POWERFUL AND SCALABLE GRAPH NEURAL NETWORKS

Abstract

Message passing graph neural networks (GNNs) are known to have their expressiveness upper-bounded by the 1-dimensional Weisfeiler-Lehman (1-WL) algorithm. To achieve more powerful GNNs, existing attempts either require ad hoc features or involve operations that incur high time and space complexities. In this work, we propose a general and provably powerful GNN framework that preserves the scalability of the message passing scheme. In particular, we first propose to empower 1-WL for graph isomorphism testing by considering edges among neighbors, giving rise to NC-1-WL. We show theoretically that the expressiveness of NC-1-WL is strictly above 1-WL and below 3-WL. Further, we propose the NC-GNN framework as a differentiable neural version of NC-1-WL. Our simple implementation of NC-GNN is provably as powerful as NC-1-WL. Experiments demonstrate that NC-GNN achieves remarkable performance on various benchmarks.

1. INTRODUCTION

Graph Neural Networks (GNNs) (Gori et al., 2005; Scarselli et al., 2008) have been demonstrated to be effective for various graph tasks. In general, modern GNNs employ a message passing mechanism, where the representation of each node is recursively updated by aggregating representations from its neighbors (Atwood & Towsley, 2016; Li et al., 2016; Kipf & Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018; Xu et al., 2019; Gilmer et al., 2017). Such message passing GNNs, however, have been shown to be at most as powerful as the 1-dimensional Weisfeiler-Lehman (1-WL) algorithm (Weisfeiler & Lehman, 1968) in distinguishing non-isomorphic graphs (Xu et al., 2019; Morris et al., 2019). Thus, message passing GNNs cannot distinguish some simple graphs and cannot detect certain important structural concepts (Chen et al., 2020; Arvind et al., 2020). Recently, many efforts have been made to improve the expressiveness of message passing GNNs, for example by considering high-dimensional WL algorithms (e.g., Morris et al. (2019); Maron et al. (2019)), exploiting subgraph information (e.g., Bodnar et al. (2021a); Zhang & Li (2021)), or adding more distinguishable features (e.g., Murphy et al. (2019)). As thoroughly discussed in Section 5, these existing methods either rely on handcrafted/predefined/domain-specific features or require high computational cost and memory budget. In contrast, our goal in this work is to develop a general GNN framework with provably expressive power, while maintaining the scalability of the message passing scheme.

Specifically, we first propose an extension of the 1-WL algorithm, namely NC-1-WL, that considers the edges among neighbors. In other words, we incorporate into the graph isomorphism test the information of which two neighbors are communicating (i.e., connected). To achieve this, we mathematically model the edges among neighbors as a multiset of multisets, in which each edge is represented as a multiset of two elements. We theoretically show that the expressiveness of NC-1-WL in distinguishing non-isomorphic graphs is strictly above 1-WL and below 3-WL.
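To make the idea concrete, one refinement step of NC-1-WL can be sketched as below. This is an illustrative hash-based sketch under our own naming and encoding choices, not the paper's exact specification; the point is only that the update sees, in addition to the multiset of neighbor colors, the multiset of edges among neighbors.

```python
from itertools import combinations

def nc_1wl_step(colors, adj):
    """One refinement step of NC-1-WL (illustrative sketch).

    colors: dict mapping each node to its current color (hashable).
    adj: dict mapping each node to the set of its neighbors.

    Each node's new color hashes together (i) its own color, (ii) the
    multiset of its neighbors' colors (exactly as in 1-WL), and (iii) the
    multiset of edges among its neighbors, where each such edge is encoded
    as the multiset {{c_u, c_w}} of its two endpoint colors.
    """
    new_colors = {}
    for v in adj:
        neighbor_colors = tuple(sorted(colors[u] for u in adj[v]))
        # Edges among neighbors: unordered pairs {u, w} with u, w in N(v)
        # that are themselves connected in the graph.
        neighbor_edges = tuple(sorted(
            tuple(sorted((colors[u], colors[w])))
            for u, w in combinations(sorted(adj[v]), 2)
            if w in adj[u]
        ))
        new_colors[v] = hash((colors[v], neighbor_colors, neighbor_edges))
    return new_colors

# Example: 1-WL cannot distinguish two disjoint triangles from a 6-cycle
# (both graphs are 2-regular), but a single NC-1-WL step separates them,
# because triangle nodes see a connected neighbor pair and cycle nodes do not.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
hexagon = {0: {1, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {0, 4}}
init = {v: 0 for v in range(6)}
```

Within each graph every node is structurally identical, so all nodes in a graph receive the same new color; across the two graphs the colors differ, which is exactly the separation 1-WL fails to achieve.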
Further, based on NC-1-WL, we propose a general GNN framework, termed NC-GNN, which can be viewed as a differentiable neural version of NC-1-WL. We provide a simple implementation of NC-GNN that is provably as powerful as NC-1-WL. Compared to existing expressive GNNs, NC-GNN is a general, provably powerful, and, more importantly, scalable framework.

The main question addressed in our work is how to make the best use of information in the one-hop neighborhood to improve expressive power while preserving scalability. In the one-hop neighborhood of each node, the local patterns we can consider are (A) what the neighbors are and (B) how the neighbors are connected to each other. Previous message passing GNNs only consider (A). We move a significant step forward by considering (B), modeling edges among neighbors as a multiset of multisets, thereby achieving provably improved expressive power with preserved scalability. From this perspective, our method is fundamentally different from existing methods that encode triangle features, such as MotifNet (Monti et al., 2018) and SIGN (Rossi et al., 2020). Specifically, these methods employ triangle-related motif-induced adjacency matrices in their convolution and diffusion operators, respectively. The edge weight in a motif-induced adjacency matrix is obtained by multiplying the original edge weight by the frequency with which the edge participates in triangle motifs. Compared to this handcrafted approach, our method is a general framework for encoding how the neighbors are connected to each other, and its expressiveness can be rigorously characterized.

We perform experiments on graph classification and node classification to evaluate NC-GNN comprehensively. NC-GNN consistently outperforms GIN, which is as powerful as 1-WL, by significant margins on various tasks. Remarkably, NC-GNN outperforms GIN by an absolute margin of over 12.0 in test accuracy on CLUSTER.
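For contrast with such handcrafted features, the triangle-motif-induced edge weight described above can be computed in a few lines: for a binary adjacency matrix $A$, $(AA)_{uv}$ counts the common neighbors of $u$ and $v$, i.e., the number of triangles edge $(u, v)$ participates in. The sketch below is our own illustration of this construction, not code from MotifNet or SIGN.

```python
import numpy as np

def triangle_motif_adjacency(A):
    """Triangle-motif-induced adjacency (illustrative sketch).

    For a binary adjacency matrix A, (A @ A)[u, v] counts the common
    neighbors of u and v; for an existing edge (u, v) this is exactly the
    number of triangles the edge participates in.  The motif-induced weight
    multiplies the original edge weight by this frequency, so non-edges and
    triangle-free edges get weight 0.
    """
    A = np.asarray(A)
    return A * (A @ A)

# Example: a 4-cycle 0-1-2-3-0 with chord 0-2, giving triangles (0,1,2)
# and (0,2,3).  The chord lies in two triangles, every other edge in one.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])
M = triangle_motif_adjacency(A)
```

Note that this construction fixes the motif in advance, which is precisely the handcrafted aspect our framework avoids.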
In addition, NC-GNN performs competitively with, and often outperforms, existing expressive GNNs, while being much more efficient.

2. PRELIMINARIES

We start by introducing notation. We represent an undirected graph as $G = (V, E, X)$, where $V$ is the set of nodes and $E \subseteq V \times V$ denotes the set of edges. We write an edge $\{v, u\} \in E$ as $(v, u)$ or $(u, v)$ for simplicity. $X = [x_1, \cdots, x_n]^T \in \mathbb{R}^{n \times d}$ is the node feature matrix, where $n = |V|$ is the number of nodes and $x_v \in \mathbb{R}^d$ is the $d$-dimensional feature of node $v$. $N_v = \{u \in V \mid (v, u) \in E\}$ is the set of neighboring nodes of node $v$. A multiset is denoted as $\{\{\cdots\}\}$ and formally defined as follows.

Definition 1 (Multiset). A multiset is a generalization of a set that allows repeated elements. A multiset $X$ can be formally represented by a 2-tuple $X = (S_X, m_X)$, where $S_X$ is the underlying set formed by the distinct elements of the multiset and $m_X : S_X \to \mathbb{Z}^+$ gives the multiplicity (i.e., the number of occurrences) of each element. If the elements of the multiset are drawn from a set $\mathcal{X}$ (i.e., $S_X \subseteq \mathcal{X}$), then $\mathcal{X}$ is the universe of $X$ and we write $X \subseteq \mathcal{X}$ for ease of notation.

Message passing GNNs. Modern GNNs usually follow a message passing scheme to learn node representations in graphs (Gilmer et al., 2017). Specifically, the representation of each node is updated iteratively by aggregating the multiset of representations formed by its neighbors. In general, the $\ell$-th layer of a message passing GNN can be expressed as
$$a_v^{(\ell)} = f_{\text{aggregate}}^{(\ell)}\left(\{\{h_u^{(\ell-1)} \mid u \in N_v\}\}\right), \quad h_v^{(\ell)} = f_{\text{update}}^{(\ell)}\left(h_v^{(\ell-1)}, a_v^{(\ell)}\right),$$
where $f_{\text{aggregate}}^{(\ell)}$ and $f_{\text{update}}^{(\ell)}$ are the parameterized functions of the $\ell$-th layer, $h_v^{(\ell)}$ is the representation of node $v$ at the $\ell$-th layer, and $h_v^{(0)}$ can be initialized as $x_v$. After applying $L$ such layers, the final representation $h_v^{(L)}$ can be used for prediction tasks on each node $v$. For graph-level problems, a graph representation $h_G$ can be obtained by applying a readout function, $h_G = f_{\text{readout}}\left(\{\{h_v^{(L)} \mid v \in V\}\}\right)$.

Definition 2 (Isomorphism).
Two graphs $G = (V, E, X)$ and $H = (P, F, Y)$ are isomorphic, denoted $G \simeq H$, if there exists a bijective mapping $g : V \to P$ such that $x_v = y_{g(v)}$ for all $v \in V$ and $(v, u) \in E$ iff $(g(v), g(u)) \in F$. Graph isomorphism is still an open problem with no known polynomial-time solution.

Weisfeiler-Lehman algorithm. The Weisfeiler-Lehman algorithm (Weisfeiler & Lehman, 1968) provides a hierarchy for the graph isomorphism testing problem. Its 1-dimensional form (a.k.a. 1-WL or color refinement) is a heuristic that can efficiently distinguish a broad class of non-isomorphic graphs (Babai & Kucera, 1979). 1-WL assigns a color $c_v^{(0)}$ to each node $v$ according to its initial label
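The refinement then iteratively updates each node's color from its own color and the multiset of its neighbors' colors until the node partition stops changing. The sketch below is our own illustrative code for this standard procedure (names are ours); it also exhibits the limitation NC-1-WL targets, since 1-WL assigns identical stable colorings to two disjoint triangles and a 6-cycle.

```python
def wl_1_colors(adj, init_colors, max_iters=None):
    """1-WL color refinement, run until the coloring stabilizes.

    adj: dict mapping each node to the set of its neighbors.
    init_colors: dict mapping each node to its initial color/label.

    Each round, a node's signature is its own color together with the
    sorted multiset of its neighbors' colors; identical signatures are
    relabeled to identical small integers so rounds stay comparable.
    """
    n = len(adj)
    colors = dict(init_colors)
    for _ in range(max_iters or n):  # n rounds always suffice to stabilize
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Canonical relabeling: equal signatures get equal new colors.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if new_colors == colors:  # partition is stable; stop refining
            break
        colors = new_colors
    return colors

# Both graphs below are 2-regular, so 1-WL colors every node identically
# in each graph and cannot tell the two graphs apart.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
hexagon = {0: {1, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {0, 4}}
```

On non-regular inputs the refinement does separate nodes; a star graph, for instance, stabilizes with the center and leaves in different color classes.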




