ωGNNS: DEEP GRAPH NEURAL NETWORKS ENHANCED BY MULTIPLE PROPAGATION OPERATORS

Abstract

Graph Neural Networks (GNNs) are limited by their propagation operators. These operators often contain non-negative elements only and are shared across channels and layers, limiting the expressiveness of GNNs. Moreover, some GNNs suffer from over-smoothing, limiting their depth. On the other hand, Convolutional Neural Networks (CNNs) can learn diverse propagation filters, and phenomena like over-smoothing are typically not apparent in CNNs. In this paper, we bridge this gap by incorporating trainable channel-wise weighting factors ω to learn and mix multiple smoothing and sharpening propagation operators at each layer. Our generic method is called ωGNN, and we study two variants: ωGCN and ωGAT. For ωGCN, we theoretically analyse its behaviour and the impact of ω on the obtained node features. Our experiments confirm these findings, demonstrating and explaining how both variants do not over-smooth. Additionally, we experiment with 15 real-world datasets on node- and graph-classification tasks, where our ωGCN and ωGAT perform better than or on par with state-of-the-art methods.

1. INTRODUCTION

Graph Neural Networks (GNNs) are useful for a wide array of fields, from computer vision and graphics (Monti et al., 2017; Wang et al., 2018; Eliasof & Treister, 2020) and social network analysis (Kipf & Welling, 2016; Defferrard et al., 2016) to bio-informatics (Hamilton et al., 2017; Jumper et al., 2021). Most GNNs are defined by applications of propagation and point-wise operators, where the former is often fixed and based on the graph Laplacian (e.g., GCN (Kipf & Welling, 2016)), or is defined by an attention mechanism (Veličković et al., 2018; Kim & Oh, 2021; Brody et al., 2022). Most recent GNNs follow a general structure that involves two main ingredients: the propagation operator, denoted by S^(l), and a 1 × 1 convolution, denoted by K^(l), as follows: f^(l+1) = σ(S^(l) f^(l) K^(l)), where f^(l) denotes the feature tensor at the l-th layer. The main limitation of the above formulation is that the propagation operators in most common architectures are constrained to be non-negative. This leads to two drawbacks. First, it limits the expressiveness of GNNs. For example, the gradient of given graph node features cannot be expressed by a non-negative operator, while a mixed-sign operator as in our proposed method can (see demonstrations in Fig. 1 and Fig. 2). Second, the utilization of strictly non-negative propagation operators yields a smoothing process that may lead GNNs to suffer from over-smoothing: the phenomenon where node features become indistinguishable from one another as more GNN layers are stacked, causing severe performance degradation in deep GNNs (Li et al., 2018; Wu et al., 2019; Wang et al., 2019). Both of the drawbacks mentioned above are not evident in Convolutional Neural Networks (CNNs), which can be interpreted as structured versions of GNNs (i.e., GNNs operating on a regular grid).
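To make the generic layer structure and the role of mixed-sign propagation concrete, the following is a minimal NumPy sketch. The layer function implements f^(l+1) = σ(S^(l) f^(l) K^(l)) as stated above; the specific channel-wise parametrization P_c = (1 − ω_c)I + ω_c Â is an illustrative choice for mixing smoothing (ω_c > 0) and sharpening (ω_c < 0) behaviour, not necessarily the exact operator used in ωGCN. All function names here are hypothetical.

```python
import numpy as np

def gnn_layer(f, S, K, sigma=np.tanh):
    """One generic GNN layer: f_{l+1} = sigma(S f K).

    f : (n, c) node features, S : (n, n) propagation operator,
    K : (c, c_out) 1x1 convolution (channel mixing)."""
    return sigma(S @ f @ K)

def omega_propagation(A_hat, omega):
    """Channel-wise propagation operators P_c = (1 - w_c) I + w_c A_hat.

    A_hat : (n, n) symmetric normalized adjacency (non-negative entries).
    omega : (c,) per-channel weights. For w_c > 0 the operator is a
    non-negative smoothing average; for w_c < 0 the off-diagonal entries
    become negative, giving a mixed-sign (sharpening) operator.
    """
    n = A_hat.shape[0]
    I = np.eye(n)
    return np.stack([(1.0 - w) * I + w * A_hat for w in omega])

# Toy example: a path graph on 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))  # symmetric normalization

f = np.random.randn(4, 2)                 # 2 feature channels
omega = np.array([0.8, -0.5])             # smooth channel 0, sharpen channel 1
P = omega_propagation(A_hat, omega)       # (2, 4, 4): one operator per channel
# Apply each channel's operator to its own feature column.
f_new = np.stack([P[c] @ f[:, c] for c in range(2)], axis=1)
```

Here `P[0]` is entirely non-negative and averages neighbouring features, while `P[1]` has negative off-diagonal entries and amplifies differences between neighbours, which is exactly the kind of operator a strictly non-negative propagation cannot express.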
The structured convolutions in CNNs make it possible to learn diverse propagation operators; in particular, it is known that mixed-sign kernels like sharpening filters are useful feature extractors in CNNs (Krizhevsky et al., 2012), and such operators cannot be obtained from non-negative (smoothing) kernels alone. In the context of GNNs, Eliasof et al. (2022) have shown the significance and benefit of employing mixed-sign propagation operators as well. In addition, the over-smoothing phenomenon is typically not evident in standard CNNs where the propagation (spatial) filters are

