DEEPGRAND: DEEP GRAPH NEURAL DIFFUSION

Abstract

We propose Deep Graph Neural Diffusion (DeepGRAND), a class of continuous-depth graph neural networks based on the diffusion process on graphs. DeepGRAND leverages a data-dependent scaling term and a perturbation to the graph diffusivity to ensure that all eigenvalues of the diffusivity matrix have negative real parts, which yields two favorable theoretical properties: (i) the node representations do not exponentially converge to a constant vector as the model depth increases, thus alleviating the over-smoothing issue; (ii) the stability of the model is guaranteed by bounding the norm of the node representations. Compared to the baseline GRAND, DeepGRAND mitigates the accuracy drop-off with increasing depth and improves the overall accuracy of the model. We empirically corroborate the advantage of DeepGRAND over many existing graph neural networks on various graph deep learning benchmark tasks.

1. INTRODUCTION

Graph neural networks (GNNs) and machine learning on graphs (Bronstein et al., 2017; Scarselli et al., 2008) have been successfully applied in a wide range of applications, including physical modeling (Duvenaud et al., 2015; Gilmer et al., 2017; Battaglia et al., 2016), recommender systems (Monti et al., 2017; Ying et al., 2018), and social networks (Zhang & Chen, 2018; Qiu et al., 2018). Recently, more advanced GNNs have been developed to further improve model performance and extend their application beyond machine learning; these include graph convolutional networks (GCNs) (Kipf & Welling, 2017), ChebyNet (Defferrard et al., 2016), GraphSAGE (Hamilton et al., 2017), neural graph fingerprints (Duvenaud et al., 2015), message passing neural networks (Gilmer et al., 2017), graph attention networks (GATs) (Veličković et al., 2018), and hyperbolic GNNs (Liu et al., 2019). A well-known problem of GNNs is that model performance decreases significantly with increasing depth. This phenomenon is a common plight of most GNN architectures and is referred to as the over-smoothing issue of GNNs (Li et al., 2018; Oono & Suzuki, 2020; Chen et al., 2020).

1.1. MAIN CONTRIBUTIONS AND OUTLINE

In this paper, we propose Deep Graph Neural Diffusion (DeepGRAND), a class of continuous-depth graph neural networks based on the diffusion process on graphs that improves on various aspects of the baseline Graph Neural Diffusion (GRAND) (Chamberlain et al., 2021b). At its core, DeepGRAND introduces a data-dependent scaling term and a perturbation to the diffusion dynamic. With this design, DeepGRAND attains the following advantages:

1. DeepGRAND inherits the diffusive characteristic of GRAND while significantly mitigating the over-smoothing issue.

2. DeepGRAND achieves remarkably better performance than existing GNNs and other GRAND variants when fewer nodes are labelled as training data, meriting its use in low-labelling-rate situations.

3. Feature representations under the dynamic of DeepGRAND are guaranteed to remain bounded, ensuring numerical stability.

Organization. In Section 2, we give a concise description of the over-smoothing issue of general graph neural networks and the GRAND architecture. A rigorous treatment and further discussions
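To make the role of the perturbation concrete, the following is a minimal numerical sketch of a perturbed linear graph diffusion, dX/dt = (A − (1 + ε)I)X, integrated with forward Euler steps. This is an illustration of the eigenvalue-shifting idea only, not the exact DeepGRAND formulation: the data-dependent scaling term is omitted, the attention matrix A(X) is replaced by a fixed row-normalized adjacency matrix, and all names (`euler_diffusion`, `eps`, etc.) are ours.

```python
import numpy as np

def row_normalized_attention(adj):
    # Row-normalize an adjacency matrix so each row sums to 1,
    # a fixed stand-in for the learned attention matrix A(X).
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1e-12)

def euler_diffusion(A, X0, eps=0.0, step=0.1, n_steps=200):
    # Integrate dX/dt = (A - (1 + eps) I) X with forward Euler.
    # For a row-stochastic A (spectrum inside the closed unit disk),
    # eps > 0 shifts every eigenvalue of A - (1 + eps) I so that its
    # real part is at most -eps, i.e. strictly negative.
    n = A.shape[0]
    M = A - (1.0 + eps) * np.eye(n)
    X = X0.copy()
    for _ in range(n_steps):
        X = X + step * (M @ X)
    return X

# Toy graph: a 4-cycle.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
A = row_normalized_attention(adj)

eps = 0.1
# All eigenvalues of A - (1 + eps) I have strictly negative real part.
eigs = np.linalg.eigvals(A - (1.0 + eps) * np.eye(4))
print(eigs.real.max())  # < 0

# Node features stay bounded (here, their norm decays) under the flow.
X0 = np.random.RandomState(0).randn(4, 3)
X = euler_diffusion(A, X0, eps=eps)
print(np.linalg.norm(X) <= np.linalg.norm(X0))  # True
```

With ε = 0 and a row-stochastic A, the matrix A − I has a zero eigenvalue (constant vectors lie in its kernel), so features can collapse toward a constant vector at depth; the perturbation removes this zero mode, which is the stability property the abstract refers to.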

