DEEPGRAND: DEEP GRAPH NEURAL DIFFUSION

Abstract

We propose Deep Graph Neural Diffusion (DeepGRAND), a class of continuous-depth graph neural networks based on the diffusion process on graphs. DeepGRAND leverages a data-dependent scaling term and a perturbation to the graph diffusivity to ensure that the real parts of all eigenvalues of the diffusivity matrix are negative, which yields two favorable theoretical properties: (i) the node representation does not exponentially converge to a constant vector as the model depth increases, thus alleviating the over-smoothing issue; (ii) the stability of the model is guaranteed by controlling the norm of the node representation. Compared to the baseline GRAND, DeepGRAND mitigates the accuracy drop-off with increasing depth and improves the overall accuracy of the model. We empirically corroborate the advantage of DeepGRAND over many existing graph neural networks on various graph deep learning benchmark tasks.

1. INTRODUCTION

Graph neural networks (GNNs) and machine learning on graphs (Bronstein et al., 2017; Scarselli et al., 2008) have been successfully applied in a wide range of applications including physical modeling (Duvenaud et al., 2015; Gilmer et al., 2017; Battaglia et al., 2016), recommender systems (Monti et al., 2017; Ying et al., 2018), and social networks (Zhang & Chen, 2018; Qiu et al., 2018). Recently, more advanced GNNs have been developed to further improve the performance of the models and extend their application beyond machine learning; these include graph convolutional networks (GCNs) (Kipf & Welling, 2017), ChebyNet (Defferrard et al., 2016), GraphSAGE (Hamilton et al., 2017), neural graph fingerprints (Duvenaud et al., 2015), message passing neural networks (Gilmer et al., 2017), graph attention networks (GATs) (Veličković et al., 2018), and hyperbolic GNNs (Liu et al., 2019). A well-known problem of GNNs is that the performance of the model decreases significantly with increasing depth. This phenomenon is a common plight of most GNN architectures and is referred to as the over-smoothing issue of GNNs (Li et al., 2018; Oono & Suzuki, 2020; Chen et al., 2020).

1.1. MAIN CONTRIBUTIONS AND OUTLINE

In this paper, we propose Deep Graph Neural Diffusion (DeepGRAND), a class of continuous-depth graph neural networks based on the diffusion process on graphs that improves on various aspects of the baseline Graph Neural Diffusion (GRAND) (Chamberlain et al., 2021b). At its core, DeepGRAND introduces a data-dependent scaling term and a perturbation to the diffusion dynamic. With this design, DeepGRAND attains the following advantages:

1. DeepGRAND inherits the diffusive characteristic of GRAND while significantly mitigating the over-smoothing issue.

2. DeepGRAND achieves remarkably greater performance than existing GNNs and other GRAND variants when fewer nodes are labelled as training data, meriting its use in low-labelling-rate situations.

3. Feature representation under the dynamic of DeepGRAND is guaranteed to remain bounded, ensuring numerical stability.

Organization. In Section 2, we give a concise description of the over-smoothing issue of general graph neural networks and the GRAND architecture. A rigorous treatment and further discussion of the inherent over-smoothing issue in GRAND is given in Section 3. The formulation of DeepGRAND is given in Section 4, where we also provide theoretical guarantees on its stability and its ability to mitigate over-smoothing. Finally, we demonstrate the practical advantages of DeepGRAND in Section 5, showing reduced accuracy drop-off at higher depth and improved overall accuracy compared to variants of GRAND and other popular GNNs.

2. BACKGROUND

Notation. Let G = (V, E) denote a graph, where V is the vertex set with |V| = n and E is the edge set with |E| = e. For a vertex u ∈ V, let N_u denote the set of neighbors of u in G. We denote by d the dimensionality of the features, i.e., the number of features for each node. Following the convention of Goodfellow et al. (2016), we denote scalars by lower- or upper-case letters, and vectors and matrices by lower- and upper-case boldface letters, respectively.

2.1. GRAPH NEURAL NETWORKS AND THE OVER-SMOOTHING ISSUE

As noted by Bronstein et al. (2021), the vast majority of the literature on graph neural networks can be derived from just three basic flavours: convolutional, attentional and message-passing. The cornerstone architectures for these flavours are GCN (Kipf & Welling, 2017), GAT (Veličković et al., 2018) and MPNN (Gilmer et al., 2017), the last of which forms an overarching design over the other two. An update rule for all message passing neural networks can be written in the form
$$H_u = \xi\Big(X_u, \bigoplus_{v \in N_u} \mu(X_u, X_v)\Big),$$
where $\mu$ is a learnable message function, $\bigoplus$ is a permutation-invariant aggregation function, and $\xi$ is the update function. In the case of GCN or GAT, this can be further simplified to
$$H_u = \sigma\Big(\sum_{v \in N_u \cup \{u\}} a_{uv}\, \mu(X_v)\Big),$$
where $a$ is either given by the normalized augmented adjacency matrix (GCN) or the attention mechanism (GAT), $\mu$ is a linear transformation, and $\sigma$ is an activation function. The learning mechanism behind GCN was first analyzed by Li et al. (2018), who showed that graph convolution is essentially a type of Laplacian smoothing. It was also noted that as deeper layers are stacked, the architecture risks suffering from over-smoothing. This issue makes features indistinguishable and hurts the classification accuracy, which goes against the common understanding that deeper models should be more expressive.
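The over-smoothing effect of repeated graph convolutions can be illustrated numerically. The following is a minimal NumPy sketch (our illustration, not the paper's implementation): iterating the linear part of the GCN update, $H \leftarrow \hat{A} H$ with $\hat{A} = D^{-1/2} A D^{-1/2}$ the normalized augmented adjacency matrix, collapses node features onto a single direction, so nodes become indistinguishable as depth grows. The toy graph and the `spread` measure are our own choices for the demonstration.

```python
import numpy as np

# Toy graph: a path 0-1-2-3 with self-loops added (augmented adjacency).
A = np.array([[1., 1., 0., 0.],
              [1., 1., 1., 0.],
              [0., 1., 1., 1.],
              [0., 0., 1., 1.]])
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A @ D_inv_sqrt  # symmetrically normalized adjacency

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))  # random 2-dimensional node features

def spread(H):
    # A_hat^k X converges to the span of D^{1/2}·1, so after rescaling by
    # D^{-1/2} over-smoothed features become constant across nodes; the max
    # feature range measures how distinguishable the nodes still are.
    Z = H / np.sqrt(deg)[:, None]
    return np.ptp(Z, axis=0).max()

spreads = []
for depth in (1, 8, 64):
    H = np.linalg.matrix_power(A_hat, depth) @ X
    spreads.append(spread(H))
    print(f"depth={depth:3d}  feature spread={spreads[-1]:.2e}")
```

The spread shrinks geometrically with depth (at the rate of the subdominant eigenvalue of $\hat{A}$), which is exactly the exponential convergence to an uninformative representation discussed above.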



2.2. RELATED WORK

Neural ODEs. Chen et al. (2018) introduced Neural ODEs, a class of continuous-depth neural networks with inherent residual connections. Follow-up works extended this framework through techniques such as augmentation (Dupont et al., 2019), regularization (Finlay et al., 2020), and momentum (Xia et al., 2021). Continuous-depth GNNs were first proposed by Xhonneux et al. (2020).
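The continuous-depth idea underlying Neural ODEs can be sketched as follows. This is a hypothetical toy example, not Chen et al.'s implementation: the vector field `f` is an arbitrary small two-layer map, and we use fixed-step Euler integration, whereas Neural ODE implementations typically use adaptive solvers with the adjoint method for training.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 4)) * 0.1
W2 = rng.standard_normal((4, 4)) * 0.1

def f(h, t):
    # Vector field dh/dt = f(h, t) defining the continuous-depth dynamics;
    # t is unused in this autonomous toy example.
    return np.tanh(h @ W1) @ W2

def odeint_euler(h0, t0=0.0, t1=1.0, steps=100):
    # Each Euler step is a residual update h <- h + dt * f(h, t), which is
    # why Neural ODEs are viewed as the continuum limit of residual networks.
    h, dt = h0, (t1 - t0) / steps
    for k in range(steps):
        h = h + dt * f(h, t0 + k * dt)
    return h

h0 = rng.standard_normal((1, 4))
h1 = odeint_euler(h0)
print(h1)
```

Depth is replaced by the integration horizon `t1 - t0`: evaluating the model "deeper" means integrating the same dynamics for longer, which is the view GRAND (and DeepGRAND) adopt for graph diffusion.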

