ORDERED GNN: ORDERING MESSAGE PASSING TO DEAL WITH HETEROPHILY AND OVER-SMOOTHING

Abstract

Most graph neural networks follow the message passing mechanism. However, this mechanism faces the over-smoothing problem when message passing is applied to a graph multiple times: node representations become indistinguishable, preventing the model from effectively learning dependencies between distant nodes. On the other hand, features of neighboring nodes with different labels are likely to be falsely mixed, resulting in the heterophily problem. In this work, we propose to order the messages passed into the node representation, with specific blocks of neurons targeted for message passing within specific hops. This is achieved by aligning the hierarchy of the rooted-tree of a central node with the ordered neurons in its node representation. Experimental results on an extensive set of datasets show that our model can simultaneously achieve state-of-the-art performance in both homophily and heterophily settings, without any targeted design. Moreover, its performance holds up well as the model becomes very deep, effectively preventing the over-smoothing problem. Finally, visualizing the gating vectors shows that our model learns to behave differently in homophily and heterophily settings, providing an explainable graph neural model.

1. INTRODUCTION

Graph neural networks (GNNs) (Scarselli et al., 2008; Bruna et al., 2013; Defferrard et al., 2016; Kipf & Welling, 2016; Veličković et al., 2017; Hamilton et al., 2017; Gilmer et al., 2017; Xu et al., 2018a) have become the prominent approach to learning representations for graphs, such as social networks (Li & Goldwasser, 2019), biomedical information networks (Yan et al., 2019), communication networks (Suárez-Varela et al., 2021), n-body systems (Kipf et al., 2018), etc. Most GNNs rely on the message passing mechanism (Gilmer et al., 2017) to implement interactions between neighbouring nodes. Despite its huge success, message passing GNNs still face two fundamental but fatal drawbacks. First, they fail to generalize to heterophily, where neighboring nodes have dissimilar features or labels, and a simple multilayer perceptron can outperform many GNNs in this setting (Zhu et al., 2020b); this limits the extension of GNNs to many real-world networks with heterophily. Second, node representations are observed to become indistinguishable when multiple layers are stacked, and performance drops sharply, resulting in the so-called "over-smoothing" problem (Li et al., 2018), which prevents GNNs from utilizing high-order neighborhood information effectively. To address these two drawbacks, numerous approaches have been proposed. Most of them concentrate on the aggregation stage of message passing. Some design signed messages to distinguish neighbors belonging to different classes (Yang et al., 2021; Bo et al., 2021; Luan et al., 2021; Yan et al., 2021), allowing GNNs to capture high-frequency signals; Min et al. (2020) design specific filters to capture band-pass signals; some apply personalized aggregation with reinforcement learning (Lai et al., 2020) or neural architecture search (Wang et al., 2022b); others attempt to aggregate messages not only from the direct neighbors, but also from the embedding space (Pei et al., 2020) or from higher-order neighbors (Zhu et al., 2020b).
These aggregator designs have achieved good performance; however, they primarily focus on the single-round message passing process and ignore the integration of messages from multiple hops. Another line of work focuses on the effective utilization of multi-hop information, mainly accomplished by designing various skip-connections. Klicpera et al. (2018); Chen et al. (2020b) propose initial connections to prevent ego or local information from being "washed out" when stacking multiple GNN layers; inspired by ResNet (He et al., 2016), some works (Li et al., 2019; Chen et al., 2020b; Cong et al., 2021) explore the application of residual connections to GNNs to improve gradient flow; others combine the outputs of intermediate GNN layers with well-designed components, such as concatenation (Xu et al., 2018b; Zhu et al., 2020b), learnable weights (Zhu et al., 2020b; Abu-El-Haija et al., 2019; Liu et al., 2020), signed weights (Chien et al., 2020), or RNN-like architectures (Xu et al., 2018b; Sun et al., 2019). These works are simple yet effective; however, they can only model information within a few hops and cannot isolate the information exactly at a given order, which leads to a mixing of features at different orders. Besides, many of these approaches (Li et al., 2019; Abu-El-Haija et al., 2019; Chen et al., 2020b; Zhu et al., 2020b; Chien et al., 2020; Cong et al., 2021) are unable to make personalized decisions for each node. These deficiencies result in suboptimal performance. In addition to the model side, other approaches focus on modifying the graph structure. These methods are called "graph rewiring", and include randomly removing edges (Rong et al., 2019) or nodes (Feng et al., 2020), or computing a new graph with heuristic algorithms (Suresh et al., 2021; Zeng et al., 2021). In general, these algorithms are not learnable and thus only applicable to certain graphs.
Unlike the previous works, we address both problems by designing the combine stage of message passing and emphasize its importance. The key idea is to integrate an inductive bias from the rooted-tree hierarchy, letting the GNN encode neighborhood information exactly at certain orders and avoid feature mixing across hops. The combine stage has rarely been the focus before; most works simply implement it as a self-loop, which results in an unreasonable mixing of node features (Zhu et al., 2020b). To avoid this "mixing", Hamilton et al. (2017); Xu et al. (2018b); Zhu et al. (2020b) concatenate the node representation and the aggregated message, which has been identified as an effective design for dealing with heterophily (Zhu et al., 2020b). However, with the embedding dimension kept constant across layers, the local information is squeezed at an exponential rate. The work most related to ours is Gated GNN (Li et al., 2015), which applies a GRU in the combine stage to strengthen expressivity, but it fails to prevent feature mixing, limiting its performance. In this paper, we present the message passing mechanism in an ordered form. That is, the neurons in the embedding of a node are aligned with the hierarchy of the node's rooted-tree. By the rooted-tree of a node, we refer to the tree with the node itself as the root and its neighbors as its children; recursively, the children of each child are again the neighboring nodes of that child (c.f. Figure 1). We achieve this alignment by proposing a novel ordered gating mechanism, which controls the assignment of neurons to encode subtrees of different depths. Experimental results on an extensive set of datasets show that our model alleviates the heterophily and over-smoothing problems at the same time.
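The ordered combine step can be sketched as follows. This is an illustrative toy, not the exact architecture: the particular gate construction (a cumulative sum over a softmax, which yields a monotonically non-decreasing gate acting as a soft split point over the ordered neurons) and the function names are assumptions of this sketch.

```python
import numpy as np

def ordered_gate(scores):
    """Soft split point over ordered neurons: the cumulative sum of a
    softmax is monotonically non-decreasing and ends at 1.0.
    (Gate construction is an assumption of this sketch.)"""
    p = np.exp(scores - scores.max())  # numerically stable softmax
    p /= p.sum()
    return np.cumsum(p)

def ordered_combine(h_prev, msg, scores):
    """Combine stage: neurons before the soft split point (gate ~ 0)
    keep the previous representation, i.e. lower-hop information;
    neurons after it (gate ~ 1) take the newly aggregated message."""
    g = ordered_gate(scores)
    return (1.0 - g) * h_prev + g * msg
```

With a gate that saturates near a split point, the prefix of the embedding is left untouched by later rounds of message passing, which is exactly what prevents ego and local information from being mixed with higher-hop messages.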
Our model provides the following practical benefits:
• We design the combine stage guided by the rooted-tree hierarchy, a very general topological inductive bias with minimal assumptions about the distribution of neighbors. This allows a flexible integration of information at different orders, leading to superior performance under both heterophily and homophily.
• The ordered gating mechanism prevents the mixing of node features across hops, enabling us to model information exactly at different orders. This opens a door to extracting similar neighborhood patterns for each node under heterophily; it also makes it easy to preserve ego and local information, effectively alleviating over-smoothing.
• Our model aligns neighboring structures with blocks in node embeddings through an explicit gating mechanism; the gating mechanism can therefore provide visualizations that reveal the connectivity type of the data and offer explainability.



Figure 1: Aligning the hierarchy of a rooted-tree T_v^(k) underlying the graph with the node embedding of the root node v. Neighboring nodes within k hops of v naturally form a depth-k subtree. Messages passed to v from nodes within this subtree are restricted to the first P_v^(k) neurons in the node embedding of v.
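The restriction in Figure 1 can be illustrated with a hard-masked toy variant (the actual model uses soft gates; the function name, the explicit block boundaries p_prev and p_k, and the hard write are hypothetical simplifications for this sketch):

```python
import numpy as np

def hard_ordered_update(h, msg, p_prev, p_k):
    """Write the hop-k aggregated message only into the neuron block
    [p_prev, p_k); the earlier prefix, holding lower-hop information,
    is left untouched. Hard-masked stand-in for the soft gating."""
    out = h.copy()
    out[p_prev:p_k] = msg[p_prev:p_k]
    return out
```

Because each hop writes into its own block of neurons, information at order k never overwrites the neurons encoding orders below k, which is the alignment between the rooted-tree hierarchy and the ordered embedding.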


