ωGNNS: DEEP GRAPH NEURAL NETWORKS ENHANCED BY MULTIPLE PROPAGATION OPERATORS

Abstract

Graph Neural Networks (GNNs) are limited in their propagation operators: these operators often contain only non-negative elements and are shared across channels and layers, limiting the expressiveness of GNNs. Moreover, some GNNs suffer from over-smoothing, limiting their depth. On the other hand, Convolutional Neural Networks (CNNs) can learn diverse propagation filters, and phenomena like over-smoothing are typically not apparent in CNNs. In this paper, we bridge this gap by incorporating trainable channel-wise weighting factors ω to learn and mix multiple smoothing and sharpening propagation operators at each layer. Our generic method is called ωGNN, and we study two variants: ωGCN and ωGAT. For ωGCN, we theoretically analyse its behaviour and the impact of ω on the obtained node features. Our experiments confirm these findings, demonstrating and explaining how both variants avoid over-smoothing. Additionally, we experiment with 15 real-world datasets on node- and graph-classification tasks, where our ωGCN and ωGAT perform better than or on par with state-of-the-art methods.

1. INTRODUCTION

Graph Neural Networks (GNNs) are useful in a wide array of fields, from computer vision and graphics (Monti et al., 2017; Wang et al., 2018; Eliasof & Treister, 2020) and social network analysis (Kipf & Welling, 2016; Defferrard et al., 2016) to bio-informatics (Hamilton et al., 2017; Jumper et al., 2021). Most GNNs are defined by applications of propagation and point-wise operators, where the former is often fixed and based on the graph Laplacian (e.g., GCN (Kipf & Welling, 2016)), or is defined by an attention mechanism (Veličković et al., 2018; Kim & Oh, 2021; Brody et al., 2022). Most recent GNNs follow a general structure that involves two main ingredients, the propagation operator, denoted by S^{(l)}, and a 1 × 1 convolution, denoted by K^{(l)}, as follows:

f^{(l+1)} = σ(S^{(l)} f^{(l)} K^{(l)}),   (1)

where f^{(l)} denotes the feature tensor at the l-th layer.

The main limitation of the above formulation is that the propagation operators in most common architectures are constrained to be non-negative. This leads to two drawbacks. First, it limits the expressiveness of GNNs. For example, the gradient of given graph node features cannot be expressed by a non-negative operator, while a mixed-sign operator as in our proposed method can (see demonstrations in Fig. 1 and Fig. 2). Second, the utilization of strictly non-negative propagation operators yields a smoothing process that may lead GNNs to suffer from over-smoothing: the phenomenon where node features become indistinguishable from one another as more GNN layers are stacked, causing severe performance degradation in deep GNNs (Li et al., 2018; Wu et al., 2019; Wang et al., 2019). Both of these drawbacks are not evident in Convolutional Neural Networks (CNNs), which can be interpreted as structured versions of GNNs (i.e., GNNs operating on a regular grid).
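In this notation, a layer of the generic form in equation (1) can be sketched in a few lines of NumPy. The GCN-style normalized adjacency used for S below is one common choice of propagation operator; all variable names are illustrative:

```python
import numpy as np

def gnn_layer(S, f, K):
    """One generic GNN layer, equation (1): f_next = sigma(S f K).

    S: (n, n) propagation operator, f: (n, c) node features,
    K: (c, c_out) trainable 1x1 convolution (channel mixing).
    sigma is taken to be ReLU here.
    """
    return np.maximum(S @ f @ K, 0.0)

# GCN-style propagation operator: symmetrically normalized
# adjacency with self-loops, S = D^{-1/2} (A + I) D^{-1/2}.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # a 3-node path graph
A_hat = A + np.eye(3)
d = A_hat.sum(axis=1)
S = A_hat / np.sqrt(np.outer(d, d))

f = np.random.default_rng(0).normal(size=(3, 4))   # 4 input channels
K = np.random.default_rng(1).normal(size=(4, 8))   # 8 output channels
out = gnn_layer(S, f, K)
print(out.shape)   # (3, 8)
```

Note that every entry of this S is non-negative, which is exactly the constraint discussed above.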
The structured convolutions in CNNs allow the network to learn diverse propagation operators; in particular, it is known that mixed-sign kernels like sharpening filters are useful feature extractors in CNNs (Krizhevsky et al., 2012), and such operators cannot be obtained by non-negative (smoothing) kernels only. In the context of GNNs, Eliasof et al. (2022) have shown the significance and benefit of employing mixed-sign propagation operators in GNNs as well. In addition, the over-smoothing phenomenon is typically not evident in standard CNNs, where the propagation (spatial) filters are learnt.

A third gap between GNNs and CNNs is the ability of the latter to learn and mix multiple propagation operators. In the scope of separable convolutions, CNNs typically learn a distinct kernel per channel, known as a depth-wise convolution (Sandler et al., 2018), a key element in modern CNNs (Tan & Le, 2019; Liu et al., 2022). On the contrary, the propagation operator S^{(l)} from equation 1 acts on all channels (Chen et al., 2020b; Veličković et al., 2018), and in some cases on all layers (Kipf & Welling, 2016; Wu et al., 2019). We note that one exception is the multi-head GAT (Veličković et al., 2018), where several attention heads are learnt per layer. However, this approach typically employs only a few heads due to the high computational cost, and it is still limited to learning non-negative propagation operators only.

In this paper we propose an effective modification to GNNs that directly addresses the three shortcomings discussed above, by introducing a parameter ω to control the contribution and type of the propagation operator. We call our general approach ωGNN, and utilize GCN (Kipf & Welling, 2016) and GAT (Veličković et al., 2018) to construct two variants, ωGCN and ωGAT. We theoretically prove and empirically demonstrate that our ωGNN can prevent over-smoothing.
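To make the mixed-sign argument concrete: the graph gradient (the edge-node incidence operator) necessarily contains negative entries, so no non-negative propagation operator can reproduce it. A minimal NumPy illustration on a 3-node path graph:

```python
import numpy as np

# Graph gradient (edge-node incidence matrix) of the path 0-1-2:
# row e has +1 / -1 on the endpoints of edge e -- inherently mixed-sign.
G = np.array([[-1.0,  1.0, 0.0],
              [ 0.0, -1.0, 1.0]])

f = np.array([2.0, 5.0, 5.0])    # strictly positive node features
print(G @ f)                      # edge-wise differences: [3. 0.]
# Any non-negative operator with a nonzero row maps strictly positive
# features to strictly positive outputs, so it can never produce the
# zero difference across the constant edge (1, 2).
```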
Secondly, we show that by learning ω, our ωGNNs can yield propagation operators with mixed signs, ranging from smoothing to sharpening operators, which do not exist in current GNNs (see Fig. 1 for an illustration). This approach enhances the expressiveness of the network, as demonstrated in Fig. 2, and to the best of our knowledge was not considered in the GNNs mentioned above, which employ non-negative propagation operators only. Lastly, we propose and demonstrate that by learning a different ω per layer and channel, similarly to a depth-wise convolution in CNNs, our ωGNNs obtain state-of-the-art accuracy.

Our contributions are summarized as follows:
• We propose ωGNN, an effective and computationally light modification to GNNs of a common and generic structure, that directly avoids over-smoothing and enhances the expressiveness of GNNs. Our method is demonstrated by ωGCN and ωGAT.
• We provide a theoretical analysis and experimental validation of the behaviour of ωGNN, exposing its improved expressiveness compared to standard propagation operators in GNNs.
• We propose to learn multiple propagation operators by learning ω per layer and per channel, and to mix them using a 1 × 1 convolution, to enhance the performance of GNNs.
• Our experiments with 15 real-world datasets across numerous applications and settings, from semi- and fully-supervised node classification to graph classification, show that our ωGCN and ωGAT achieve performance on par with or better than current state-of-the-art methods.
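A per-layer, per-channel ω as described above can be sketched as follows. The interpolation form (1 − ω)I + ωS used here is an assumption for illustration only (the exact parameterization of ωGCN and ωGAT is given in Sec. 2); the point is that each channel gets its own blended operator, depth-wise-convolution style, before a 1 × 1 mixing step:

```python
import numpy as np

def omega_propagate(S, f, omega):
    """Depthwise-style propagation: channel j is propagated by its own
    blended operator (1 - omega[j]) I + omega[j] S (assumed form).
    omega: (c,) one trainable scalar per channel; NumPy broadcasting
    applies it column-wise to the (n, c) feature matrix f."""
    return (1.0 - omega) * f + omega * (S @ f)

def omega_layer(S, f, omega, K):
    """Propagate each channel with its own omega, then mix the channels
    with a 1x1 convolution K, followed by ReLU."""
    return np.maximum(omega_propagate(S, f, omega) @ K, 0.0)

rng = np.random.default_rng(0)
S = np.full((4, 4), 0.25)                 # a toy smoothing operator
f = rng.normal(size=(4, 3))
omega = np.array([0.5, 1.0, 1.5])         # smoothing ... sharpening
K = rng.normal(size=(3, 3))
out = omega_layer(S, f, omega, K)
print(out.shape)   # (4, 3)
```

With omega set to all zeros the layer propagates nothing (identity), and with all ones it reduces to the standard form of equation (1); intermediate and larger values interpolate and extrapolate between these regimes per channel.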

2. METHOD

We start by providing the notations that will be used throughout this paper and presenting our general ωGNN in Sec. 2.1. Then we consider two popular GNNs that adhere to the structure presented in equation 1, namely GCN and GAT, and formulate and analyse the behaviour of their counterparts, ωGCN and ωGAT, in Sec. 2.2 and 2.3, respectively.



Figure 1: The impulse response of ωGCN's propagation operator for different values of ω. For ω = 0.5 and ω = 1.0, non-negative values are obtained, while for ω = 1.5 we see mixed-sign values. The dashed node's feature is initialized to 1 and those of all other nodes to 0.
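The qualitative behaviour in the caption can be reproduced numerically under the assumed interpolation form S_ω = (1 − ω)I + ωŜ, with Ŝ the normalized adjacency with self-loops (an illustrative parameterization, not necessarily the exact one of Sec. 2). On a small star graph, the impulse response stays non-negative for ω ≤ 1 and becomes mixed-sign for ω = 1.5:

```python
import numpy as np

# Star graph: node 0 (playing the role of the dashed node) is
# connected to nodes 1..3.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0
A_hat = A + np.eye(4)
d = A_hat.sum(axis=1)
S = A_hat / np.sqrt(np.outer(d, d))   # normalized adjacency with self-loops

e = np.zeros(4)
e[0] = 1.0                            # impulse at the dashed node
responses = {w: (1.0 - w) * e + w * (S @ e) for w in (0.5, 1.0, 1.5)}
for w, r in responses.items():
    print(w, r.round(3))
# omega = 0.5, 1.0: all entries non-negative (smoothing)
# omega = 1.5: the centre entry is negative (sharpening, mixed-sign)
```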

