FOSR: FIRST-ORDER SPECTRAL REWIRING FOR ADDRESSING OVERSQUASHING IN GNNS

Abstract

Graph neural networks (GNNs) are able to leverage the structure of graph data by passing messages along the edges of the graph. While this allows GNNs to learn features depending on the graph structure, for certain graph topologies it leads to inefficient information propagation and a problem known as oversquashing. This has recently been linked with the curvature and spectral gap of the graph. On the other hand, adding edges to the message-passing graph can lead to increasingly similar node representations and a problem known as oversmoothing. We propose a computationally efficient algorithm that prevents oversquashing by systematically adding edges to the graph based on spectral expansion. We combine this with a relational architecture, which lets the GNN preserve the original graph structure and provably prevents oversmoothing. We find experimentally that our algorithm outperforms existing graph rewiring methods in several graph classification tasks.

1. INTRODUCTION

Graph neural networks (GNNs) (Gori et al., 2005; Scarselli et al., 2008) are a broad class of models which process graph-structured data by passing messages between nodes of the graph. Due to the versatility of graphs, GNNs have been applied to a variety of domains, such as chemistry, social networks, knowledge graphs, and recommendation systems (Zhou et al., 2020; Wu et al., 2020). GNNs broadly follow a message-passing framework, meaning that each layer of the GNN aggregates the representations of a node and its neighbors, and transforms these features into a new representation for that node. The aggregation function used by the GNN layer is taken to be locally permutation-invariant, since the ordering of the neighbors of a node is arbitrary; its specific form is a key component of the GNN architecture, and varying it gives rise to several common GNN variants (Kipf and Welling, 2017; Veličković et al., 2018; Li et al., 2015; Hamilton et al., 2017; Xu et al., 2019). The output of a GNN can be used for tasks such as graph classification or node classification.

Although GNNs are successful in computing dependencies between nodes of a graph, they have been found to suffer from a limited capacity to capture long-range interactions. For a fixed graph, this limitation stems from different problems depending on the number of layers in the GNN. Since graph convolutions are local operations, a GNN with a small number of layers can only provide a node with information from nodes close to itself. For a GNN with l layers, the receptive field of a node (the set of nodes it receives messages from) is exactly the ball of radius l about the node. For small values of l, this results in "underreaching", which directly limits which functions the GNN can represent. Relatedly, the functions representable by GNNs with l layers are limited to those computable by l steps of the Weisfeiler-Lehman (WL) graph isomorphism test (Morris et al., 2019; Xu et al., 2019; Barceló et al., 2020).
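The l-hop receptive field described above can be made concrete with a short breadth-first search. The following sketch is illustrative only; the adjacency-list representation and function name are our own, not part of the paper.

```python
from collections import deque

def receptive_field(adj, node, num_layers):
    """Return the set of nodes within num_layers hops of `node`.

    adj: dict mapping each node to a list of its neighbors.
    A message-passing GNN with num_layers layers can only propagate
    information to `node` from this set (its l-hop ball).
    """
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == num_layers:
            continue  # do not expand past the hop limit
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, depth + 1))
    return seen
```

On a path graph, for instance, the 2-hop receptive field of an endpoint contains only the three nearest nodes, which is exactly the "underreaching" regime when the task depends on more distant nodes.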
On the other hand, increasing the number of layers leads to its own set of problems. In contrast to other architectures that benefit from the expressivity of deeper networks, GNNs experience a decrease in accuracy as the number of layers increases (Li et al., 2018; Chen et al., 2020). This phenomenon has partly been attributed to "oversmoothing", where repeated graph convolutions eventually render node features indistinguishable (Li et al., 2018; Oono and Suzuki, 2020; Cai and Wang, 2020; Zhao and Akoglu, 2020; Rong et al., 2020; Di Giovanni et al., 2022).

Separate from oversmoothing is the problem of "oversquashing", first pointed out by Alon and Yahav (2021). As the number of layers of a GNN increases, information from (potentially) exponentially growing receptive fields needs to be concurrently propagated at each message-passing step. This creates a bottleneck that causes oversquashing: an exponential amount of information is squashed into fixed-size node vectors (Alon and Yahav, 2021). Consequently, the GNN can fail on prediction tasks that rely on long-range interactions. Oversquashing usually occurs when there are enough layers in the GNN to reach any node (the receptive fields are large enough), but too few for the GNN to process all of the necessary relations between nodes. Hence, for a fixed graph, the problems of underreaching, oversquashing, and oversmoothing occur in three different regimes, depending on the number of layers of the GNN.

A common approach to addressing oversquashing is to rewire the input graph, making changes to its edges so that it has fewer structural bottlenecks. A simple approach to rewiring is to make the last layer of the GNN fully adjacent, allowing all nodes to interact with one another (Alon and Yahav, 2021). Alternatively, one can make changes to the edges of the input graph, feeding the modified graph into all layers of the GNN (Topping et al., 2022; Banerjee et al., 2022).
The latter approaches can be viewed as optimizing the spectral gap of the input graph to alleviate structural bottlenecks and improve the overall quality of signal propagation across nodes (see Figure 1). While these rewiring methods improve the connectivity of the graph, there are drawbacks to making too many modifications to the input. The most obvious problem is the loss of topological information about the original graph: if the structure of the original graph is relevant to the task, adding and removing edges diminishes that benefit. Another issue arises from the smoothing effects of adding edges: if we add too many edges to the input graph, an ordinary GCN will suffer from oversmoothing (Li et al., 2018). In other words, this natural approach to rewiring faces a trade-off between oversquashing and oversmoothing. This observation, which does not seem to have been pointed out in earlier works, is the main motivation for the approach we develop in this work.
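The spectral gap referenced above can be computed directly from the graph's Laplacian. The sketch below uses the unnormalized Laplacian for simplicity (FoSR itself, introduced in Section 1.1, works with the spectrum of the normalized adjacency matrix); the function name and dense-matrix representation are illustrative.

```python
import numpy as np

def spectral_gap(adj_matrix):
    """Spectral gap of a graph via its (unnormalized) Laplacian L = D - A.

    adj_matrix: symmetric 0/1 numpy array. The gap is the second-smallest
    Laplacian eigenvalue (the algebraic connectivity); larger values
    indicate fewer structural bottlenecks in the graph.
    """
    degrees = adj_matrix.sum(axis=1)
    laplacian = np.diag(degrees) - adj_matrix
    eigvals = np.linalg.eigvalsh(laplacian)  # sorted ascending
    return eigvals[1]
```

For example, the complete graph on 4 nodes has gap 4, while a path on 3 nodes has gap 1, reflecting that the path is far easier to disconnect.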

1.1. MAIN CONTRIBUTIONS

This paper presents a new framework for rewiring a graph to reduce oversquashing in GNNs while preventing oversmoothing. Our main contributions are as follows:

• We introduce a framework for graph rewiring which can be used with any rewiring method that sequentially adds edges. In contrast to previous approaches that only modify the input graph (e.g., Topping et al., 2022; Banerjee et al., 2022; Bober et al., 2022), our solution gives special labels to the added edges. We then use a relational GNN on this new graph, with the relations encoding whether an edge was originally in the input graph or added during rewiring. This allows us to preserve the input graph topology while using the new edges to improve its connectivity. In Theorem 3 we show that this approach also prevents oversmoothing.

• We introduce a new rewiring method, FoSR (First-order Spectral Rewiring), aimed at optimizing the spectral gap of the graph input to the GNN (Algorithm 1). This algorithm computes the first-order change in the spectral gap from adding each candidate edge, and then adds the edge which maximizes this change (Theorem 4 and Proposition 5).

• We empirically demonstrate that the proposed method results in faster spectral expansion (a marker of reduced oversquashing) and improved test accuracy against several baselines on several graph classification tasks.
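The greedy first-order strategy can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 1: it uses the unnormalized Laplacian, where standard first-order eigenvalue perturbation gives that adding edge (u, v) increases the second-smallest eigenvalue by approximately (x[u] - x[v])**2 for x the corresponding unit eigenvector, whereas FoSR derives the analogous first-order criterion for the normalized adjacency spectrum.

```python
import numpy as np

def greedy_spectral_rewire(adj, num_edges):
    """Greedily add edges maximizing a first-order estimate of the
    spectral-gap increase (illustrative sketch of the FoSR idea).

    adj: symmetric 0/1 numpy array. At each step we recompute the
    Fiedler vector x and add the non-edge (u, v) with the largest
    estimated gain (x[u] - x[v])**2.
    """
    adj = adj.copy().astype(float)
    n = adj.shape[0]
    added = []
    for _ in range(num_edges):
        laplacian = np.diag(adj.sum(axis=1)) - adj
        _, eigvecs = np.linalg.eigh(laplacian)
        x = eigvecs[:, 1]  # eigenvector for the second-smallest eigenvalue
        best, best_gain = None, -1.0
        for u in range(n):
            for v in range(u + 1, n):
                if adj[u, v] == 0:
                    gain = (x[u] - x[v]) ** 2
                    if gain > best_gain:
                        best, best_gain = (u, v), gain
        if best is None:
            break  # graph is already complete
        u, v = best
        adj[u, v] = adj[v, u] = 1.0
        added.append(best)
    return adj, added
```

On a path graph the Fiedler vector is monotone along the path with its extreme values at the two endpoints, so the first edge added joins the endpoints, closing the path into a cycle and sharply increasing the gap, which matches the intuition that rewiring should bridge the worst bottleneck first.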



Figure 1: Top: Schematic showing different rewiring methods, FoSR (ours), SDRF (Topping et al., 2022), and G-RLEF (Banerjee et al., 2022) for alleviating structural bottlenecks in the input graph. Our method adds new edges that are labeled differently from the existing ones so that the GNN can distinguish them in training. Bottom: Normalized spectral gap and training accuracy as functions of the number of rewiring iterations for a learning task modeled on the NEIGHBORSMATCH problem for a path-of-cliques input (for details, see Appendix B.1.1).

