GRAPH NEURAL NETWORKS AS GRADIENT FLOWS: UNDERSTANDING GRAPH CONVOLUTIONS VIA ENERGY Anonymous

Abstract

Gradient flows are differential equations that minimize an energy functional and constitute the main descriptors of physical systems. We apply this formalism to Graph Neural Networks (GNNs) to develop new frameworks for learning on graphs as well as to provide a better theoretical understanding of existing ones. We derive GNNs as a gradient flow equation of a parametric energy, which provides a physics-inspired interpretation of GNNs as learning particle dynamics in feature space. In particular, we show that in graph convolutional models (GCN), the positive/negative eigenvalues of the channel-mixing matrix correspond to attractive/repulsive forces between adjacent features. We rigorously prove how the channel-mixing can learn to steer the dynamics towards low or high frequencies, which makes it possible to deal with heterophilic graphs. We show that the same class of energies is decreasing along a larger family of GNNs; although these are not gradient flows, they retain the same inductive bias. We experimentally evaluate an instance of the gradient flow framework that is principled, more efficient than GCN, and achieves competitive performance on graph datasets of varying homophily, often outperforming recent baselines specifically designed to target heterophily.

1. INTRODUCTION

Graph neural networks (GNNs) (Sperduti, 1993; Goller & Kuchler, 1996; Gori et al., 2005; Scarselli et al., 2008; Bruna et al., 2014; Defferrard et al., 2016; Kipf & Welling, 2017; Battaglia et al., 2016; Gilmer et al., 2017) have become the standard ML tool for dealing with different types of relations and interactions. One limitation of GNNs that has recently attracted attention in the literature is over-smoothing: node features becoming increasingly similar with the depth of the model (see Nt & Maehara (2019)).

General motivations and contributions. In the spirit of neural ODEs (Haber & Ruthotto, 2018; Chen et al., 2018), we regard (residual) GNNs as discrete dynamical systems. A fundamental idea in physics is that particles evolve by minimizing an energy: one can then study the dynamics through the functional expression of the energy. The differential equations that minimize an energy are called gradient flows, and their extension and analysis in the context of GNNs are the main focus of this work. We study two ways of understanding the dynamics induced by GNNs: starting from the energy functional or from the evolution equations.

From energy to evolution equations: a new conceptual approach to GNNs. We propose a general framework in which one parameterises an energy functional and then takes the GNN equations to follow the direction of steepest descent of that energy. We introduce a class of energy functionals that extend those adopted for label propagation (Zhou & Schölkopf, 2005) and whose gradient flow equations consist of generalized graph convolutions (GCN-type architectures (Kipf & Welling, 2017)) with symmetric weights. We provide a physical interpretation of GNNs as multi-particle dynamics: this new framework sheds light on the role of the 'channel-mixing' matrix used in graph convolutional models as an edge-wise potential inducing attraction (repulsion) via its positive (negative) eigenvalues.
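This energy-descent view can be illustrated with a toy numerical sketch. The setup below is our own simplified choice, not the paper's full parameterisation: we take the external and pairwise potentials equal (Ω = W) with a small symmetric random W, and check that a forward-Euler discretisation of the resulting gradient flow decreases the energy.

```python
import numpy as np

# Toy sketch (illustrative setup, not the paper's exact parameterisation):
# a 4-node path graph and the simplification Omega = W for the energy
#   E(F) = 1/2 tr(F Omega F^T) - 1/2 tr(F^T A_bar F W),
# whose gradient flow is F_dot = A_bar F W - F Omega.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
A_bar = A / np.sqrt(np.outer(deg, deg))  # normalized adjacency D^{-1/2} A D^{-1/2}

rng = np.random.default_rng(0)
d = 3
W = rng.standard_normal((d, d))
W = 0.5 * (W + W.T)          # symmetric channel-mixing matrix
Omega = W                    # simplifying assumption for this sketch

def energy(F):
    return 0.5 * np.trace(F @ Omega @ F.T) - 0.5 * np.trace(F.T @ A_bar @ F @ W)

def euler_step(F, tau=0.05):
    # forward Euler on the gradient flow: F <- F + tau * (A_bar F W - F Omega)
    return F + tau * (A_bar @ F @ W - F @ Omega)

F = rng.standard_normal((4, d))
energies = [energy(F)]
for _ in range(30):
    F = euler_step(F)
    energies.append(energy(F))

# the energy is non-increasing along the discretised flow (for small enough tau)
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:]))
```

Since W is indefinite, the energy can become negative: its negative eigenvalues act as repulsive edge-wise forces rather than smoothing ones, which is exactly the attraction/repulsion reading above.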
We conduct a theoretical analysis of the dynamics, including explicit expansions of the learned GNN features, showing that, differently from other continuous models, the gradient flow can learn to magnify either the low or the high frequencies. This also establishes new links to techniques such as residual connections and negative edge weights that have previously been used in heterophilic settings. We experimentally evaluate our framework using gradient flow equations, yielding a principled variant of GCN that is also more efficient thanks to weight symmetry and sharing across layers. Our experiments demonstrate competitive performance on homophilic and heterophilic graphs of varying size.

From evolution equations to energy: understanding graph convolutions via multi-particle dynamics. Recent works of Cai & Wang (2020) and Bodnar et al. (2022) studied the behaviour of the Dirichlet energy in graph convolutional models in order to determine whether (over-)smoothing of the features is occurring. The key idea is that the monotonicity of an energy functional along a system of equations conveys significant information about the dynamics, both in terms of its dominating effects and its limit points. However, these results are restricted to the classical Dirichlet energy and assume (non-residual) graph convolutions activated by the ReLU nonlinearity. We extend this approach by proving that a much more general multi-particle energy is in fact decreasing along residual graph convolutions with symmetric weights and with respect to a more general class of nonlinear activation functions. Our result sheds light on the dynamics of non-linear graph convolutions, showing that the 'channel-mixing' matrix used in GCN-type models can be interpreted as a potential in feature space that promotes alignment (repulsion) of adjacent node features depending on its spectrum.

Outline. In Section 2 we review non-parametric instances of gradient flows on graphs: the heat equation and label propagation.
In Section 3 we extend this approach to the parametric case by introducing a class of energies that generalize the one used for label propagation and whose associated gradient flows are continuous graph convolutions. We provide a physical interpretation of the energy, showing that it can induce attraction and repulsion along edges. In Section 4 we discretize the gradient flow into GNN update equations and derive explicit expansions of the learned node representations, highlighting how the spectrum of the channel-mixing matrix W controls whether the dynamics is dominated by the low or the high frequencies of the graph Laplacian. To our knowledge, ours is the first analysis of the interplay between the spectral properties of the graph Laplacian and those of the channel-mixing matrix. In Section 5 we extend the theory by showing that the same multi-particle energy introduced in Section 3 still decreases along more general graph convolutions with symmetric weights, meaning that the physical interpretation is preserved. In Section 6 we evaluate the framework on node classification over a broad range of datasets.

Related work. Our analysis is related to studying GNNs as filters (Defferrard et al., 2016; Hammond et al., 2019; Balcilar et al., 2020; He et al., 2021) and adopts techniques similar to Oono & Suzuki (2020); Cai & Wang (2020). Gradient flows have been adapted from geometry (Eells & Sampson, 1964) to image processing (Kimmel et al., 1997), label propagation (Zhou & Schölkopf, 2005), and recently in ML (Sander et al., 2022) for the analysis of Transformers (Vaswani et al., 2017). Our work follows the spirit of GNNs as continuous dynamical systems (Xhonneux et al., 2020; Zang & Wang, 2020; Chamberlain et al., 2021a; Eliasof et al., 2021; Chamberlain et al., 2021b; Bodnar et al., 2022; Rusch et al., 2022).

Notations. Let G = (V, E) be an undirected graph with n nodes. Its adjacency matrix A is defined by a_ij = 1 if (i, j) ∈ E and zero otherwise.
We let D = diag(d_i) be the degree matrix and define the normalized adjacency Ā := D^{-1/2} A D^{-1/2}. We denote by F ∈ R^{n×d} the matrix of d-dimensional node features, by f_i ∈ R^d its i-th row (transposed), by f^r ∈ R^n its r-th column, and by vec(F) ∈ R^{nd} the vectorization of F obtained by stacking its columns. Given a symmetric matrix B, we let λ^B_+ and λ^B_- denote its most positive and most negative eigenvalues, respectively, and ρ_B its spectral radius. Ḟ(t) denotes the temporal derivative, ⊗ is the Kronecker product, and 'a.e.' means almost everywhere w.r.t. the Lebesgue measure. Proofs and additional results appear in the Appendix.
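These conventions can be checked numerically. The sketch below (a toy example with illustrative shapes of our own choosing) builds Ā from a small adjacency matrix and verifies the standard column-stacking identity vec(ĀFW) = (Wᵀ ⊗ Ā) vec(F), which connects the Kronecker product to the vectorization convention above.

```python
import numpy as np

# Toy check of the notation (shapes are illustrative).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)      # 3-node star graph
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_bar = D_inv_sqrt @ A @ D_inv_sqrt         # A_bar = D^{-1/2} A D^{-1/2}

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 2))             # n = 3 nodes, d = 2 channels
W = rng.standard_normal((2, 2))

vec = lambda M: M.flatten(order="F")        # vec(.) stacks columns

# vec(A_bar F W) = (W^T kron A_bar) vec(F)
lhs = vec(A_bar @ F @ W)
rhs = np.kron(W.T, A_bar) @ vec(F)
assert np.allclose(lhs, rhs)
```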



Figure 1: Gradient flow dynamics: attractive and repulsive forces lead to a process able to separate heterophilic labels.
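The caption's claim can be illustrated numerically in a toy sketch of our own (not the paper's experiment): on a path graph, evolving a signal under the flow ḟ = Āf (channel-mixing with positive spectrum, attraction) drives it toward the low frequencies of the Laplacian, while ḟ = -Āf (negative spectrum, repulsion) drives it toward the high frequencies, as measured by the Rayleigh quotient of the normalized Laplacian.

```python
import numpy as np

# Toy sketch: attraction (W = +I) vs repulsion (W = -I) on a path graph.
n = 8
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path graph
deg = A.sum(axis=1)
A_bar = A / np.sqrt(np.outer(deg, deg))
L = np.eye(n) - A_bar                  # normalized Laplacian, spectrum in [0, 2]

def rayleigh(f):
    # smoothness of f: near 0 for low-frequency signals, near 2 for high-frequency
    return (f @ L @ f) / (f @ f)

def evolve(f, sign, tau=0.2, steps=100):
    # Euler steps of f_dot = sign * A_bar f (i.e. channel-mixing W = sign * I)
    for _ in range(steps):
        f = f + tau * sign * (A_bar @ f)
    return f

rng = np.random.default_rng(2)
f0 = rng.standard_normal(n)
r_attract = rayleigh(evolve(f0, +1.0))  # positive eigenvalues: smoothing
r_repulse = rayleigh(evolve(f0, -1.0))  # negative eigenvalues: sharpening
assert r_attract < r_repulse
```

With a single channel and W = ±I the spectral mechanism is transparent: the repeated update amplifies the Laplacian eigenvectors whose eigenvalues maximize |1 ± τλ|, so the sign of the channel-mixing spectrum selects which end of the frequency range dominates.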

