GRADIENT GATING FOR DEEP MULTI-RATE LEARNING ON GRAPHS

Abstract

We present Gradient Gating (G²), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message-passing information across the nodes of the underlying graph. Local gradients are harnessed to further modulate message-passing updates. Our framework flexibly allows the use of any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G² alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs.

1. INTRODUCTION

Learning tasks involving graph-structured data arise in a wide variety of problems in science and engineering. Graph Neural Networks (GNNs) (Sperduti, 1994; Goller & Kuchler, 1996; Sperduti & Starita, 1997; Frasconi et al., 1998; Gori et al., 2005; Scarselli et al., 2008; Bruna et al., 2014; Defferrard et al., 2016; Kipf & Welling, 2017; Monti et al., 2017; Gilmer et al., 2017) are a popular deep learning architecture for graph-structured and relational data. GNNs have been successfully applied in domains including computer vision and graphics (Monti et al., 2017), recommender systems (Ying et al., 2018), transportation (Derrow-Pinion et al., 2021), computational chemistry (Gilmer et al., 2017), drug discovery (Gaudelet et al., 2021), particle physics (Shlomi et al., 2020) and social networks. See Zhou et al. (2019); Bronstein et al. (2021) for extensive reviews.

Despite the widespread success of GNNs and a plethora of different architectures, several fundamental problems still impede their efficiency on realistic learning tasks. These include the bottleneck (Alon & Yahav, 2021), oversquashing (Topping et al., 2021), and oversmoothing (Nt & Maehara, 2019; Oono & Suzuki, 2020) phenomena. Oversmoothing refers to the observation that all node features in a deep (multi-layer) GNN converge to the same constant value as the number of layers increases. Thus, in contrast to standard machine learning frameworks, oversmoothing inhibits the use of very deep GNNs for learning tasks. These phenomena are likely responsible for the unsatisfactory empirical performance of traditional GNN architectures on heterophilic datasets, where the features or labels of a node tend to differ from those of its neighbors (Zhu et al., 2020).

Given this context, our main goal is to present a novel framework that alleviates the oversmoothing problem and allows one to implement very deep multi-layer GNNs that significantly improve performance in the setting of heterophilic graphs. Our starting point is the observation that in standard message-passing GNN architectures (MPNNs), such as GCN (Kipf & Welling, 2017) or GAT (Velickovic et al., 2018), every node is updated at exactly the same rate within each hidden layer. Yet, realistic learning tasks might benefit from different rates of propagation (flow) of information on the underlying graph.

This insight leads to a novel multi-rate message-passing scheme capable of learning these underlying rates. Moreover, we also propose a novel procedure that harnesses graph gradients to ameliorate the oversmoothing problem. Combining these elements leads to a new architecture described in this paper, which we term Gradient Gating (G²).
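The oversmoothing phenomenon can be observed directly in a minimal numerical experiment (illustrative only, not part of the paper's exposition): iterating a row-normalized neighborhood-averaging operator, the simplest form of message passing, collapses all node features to a common value as depth grows.

```python
import numpy as np

# Small undirected graph (path on 4 nodes) as an adjacency matrix
# with self-loops, as in GCN-style aggregation.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)  # row-normalized averaging operator

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # random initial node features

for _ in range(100):  # 100 "layers" of pure neighborhood averaging
    X = P @ X

# After many layers, the feature spread across nodes has collapsed:
spread = X.max(axis=0) - X.min(axis=0)
print(spread)  # all entries close to zero -> features nearly constant
```

Since P is row-stochastic with self-loops, its second-largest eigenvalue has modulus strictly below one, so repeated application contracts all node features toward a shared constant, which is exactly the failure mode that G² is designed to avoid.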


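The multi-rate gating idea can be sketched in a few lines of NumPy. The sketch below is one plausible instantiation under stated assumptions, not the paper's definitive implementation: the gate is built from graph-gradient magnitudes |H_i - H_j|^p aggregated over neighbors, giving one learning rate per node and per feature channel, and the names g2_layer, base_gnn, and mean_agg are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def g2_layer(X, A, base_gnn, p=2.0):
    """One gradient-gated update step (sketch).

    X: (n, d) node features; A: (n, n) 0/1 adjacency matrix;
    base_gnn: any function mapping (X, A) -> (n, d) candidate update.
    """
    H = base_gnn(X, A)  # candidate update from the wrapped GNN layer
    # Graph-gradient magnitudes |H_i - H_j|^p aggregated over neighbors:
    diff = np.abs(H[:, None, :] - H[None, :, :]) ** p   # (n, n, d)
    tau_hat = (A[:, :, None] * diff).sum(axis=1)        # (n, d)
    tau = sigmoid(tau_hat)                              # gates in (0, 1)
    # Gated convex-combination update: small local gradients -> slow rate,
    # so features stop changing only where the graph gradient vanishes.
    return (1.0 - tau) * X + tau * H

# Example with a simple mean-aggregation base layer (illustrative only):
def mean_agg(X, A):
    deg = A.sum(axis=1, keepdims=True)
    return A @ X / np.maximum(deg, 1.0)

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(g2_layer(X, A, mean_agg).shape)  # (3, 2)
```

Because the output is an elementwise convex combination of X and the candidate update H, each node-channel pair moves at its own learned rate, which is the multi-rate behavior described above; the wrapped base_gnn can be replaced by any message-passing layer (e.g. GCN or GAT).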