LIMITLESS STABILITY FOR GRAPH CONVOLUTIONAL NETWORKS

Abstract

This work establishes rigorous, novel and widely applicable stability guarantees and transferability bounds for graph convolutional networks, without reference to any underlying limit object or statistical distribution. Crucially, the utilized graph shift operators (GSOs) are not assumed to be normal, which allows for the treatment of networks on undirected and, for the first time, also directed graphs. Stability to node-level perturbations is related to an 'adequate (spectral) covering' property of the filters in each layer. Stability to edge-level perturbations is related to Lipschitz constants and newly introduced semi-norms of filters. Results on stability to topological perturbations are obtained through recently developed mathematical-physics based tools. As an important and novel example, it is shown that graph convolutional networks are stable under graph-coarse-graining procedures (replacing strongly connected sub-graphs by single nodes) precisely if the GSO is the graph Laplacian and filters are regular at infinity. These new theoretical results are supported by corresponding numerical investigations.

1. INTRODUCTION

Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017; Hammond et al., 2011; Defferrard et al., 2016) generalize Euclidean convolutional networks to the graph setting by replacing convolutional filters with functional calculus filters; i.e. scalar functions applied to a suitably chosen graph shift operator capturing the geometry of the underlying graph. A key concept in trying to understand the reasons for the superior numerical performance of such networks on graph learning tasks (as well as a guiding principle for the design of new architectures) is stability. In the Euclidean setting, investigating stability essentially amounts to exploring the variation of the output of a network under non-trivial changes of its input (Mallat, 2012; Wiatowski & Bölcskei, 2018). In the graph setting, additional complications are introduced: Not only input signals, but also the graph shift operators facilitating the convolutions on the graphs may vary. Even worse, there might also be changes in the topology or vertex sets of the investigated graphs (e.g. when two dissimilar graphs describe the same underlying phenomenon), under which graph convolutional networks should also remain stable. This last stability property is often also referred to as transferability (Levie et al., 2019a). Previous works investigated stability under changes in graph shift operators for specific filters (Levie et al., 2019b; Gama et al., 2020) or the effect of graph rewiring when choosing a specific graph shift operator (Kenlay et al., 2021). Stability to topological perturbations has been established for (large) graphs discretising the same underlying topological space (Levie et al., 2019a), the same graphon (Ruiz et al., 2020; Maskey et al., 2021) or for graphs drawn from the same statistical distribution (Keriven et al., 2020; Gao et al., 2021).
Common among all these previous works are two themes limiting practical applicability: First and foremost, the class of filters to which results are applicable is often severely restricted. The same is true for the class of considered graph shift operators, with non-normal operators (describing directed graphs) either explicitly or implicitly excluded. Furthermore, when investigating transferability properties, results are almost exclusively available under the assumption that graphs are large and either discretize the same underlying 'continuous' limit object sufficiently well, or are drawn from the same statistical distribution. While these are of course relevant regimes, they do not allow conclusions to be drawn beyond such asymptotic settings: they are for example unable to deal with certain spatial graphs, inapplicable to small-to-medium sized social networks and incapable of capturing the inherent multi-scale nature of molecular graphs (as further discussed below). Finally, hardly any work has been done on relating the stability to input-signal perturbations to network properties such as the interplay of utilized filters or employed non-linearities. The main focus of this work is to remedy this situation and develop a 'general theory of stability' for GCNs that is agnostic to the types of utilized filters, graph shift operators and non-linearities, with practically relevant transferability guarantees not contingent on potentially underlying limit objects. To this end, Section 2 recapitulates the fundamentals of GCNs in a language adapted to our endeavour. Sections 3 and 4 discuss stability to node- and edge-level perturbations. Section 5 discusses stability to structural perturbations. Section 6 discusses feature aggregation and Section 7 provides numerical evidence.

2. GCNS VIA COMPLEX ANALYSIS AND OPERATOR THEORY
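
As a concrete orientation for the objects introduced in this section, the following is a minimal NumPy sketch of a graph Laplacian, the normality condition, a spectrally defined (generic) filter and a power-series (entire) filter together with its norm bound. Everything specific in it is an assumption made for illustration and not taken from this work: the 3-node example graph, unit node weights (so that $M$ is the identity), the edge-direction and row-sum degree conventions, the filter $g(\lambda) = 1/(1+\lambda)$, the truncated series of $\exp(-z)$, and all function names.

```python
import math
import numpy as np

# Hypothetical 3-node directed graph with unit node weights (mu_i = 1, hence M = I).
# Assumed convention: W[i, j] is the weight of the edge pointing from node j to node i.
W_dir = np.array([[0.0, 0.0, 1.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
W_sym = W_dir + W_dir.T                      # undirected counterpart (symmetrized weights)
M = np.eye(3)                                # node-weight matrix diag(mu_i)

def laplacian(W, M):
    """Graph Laplacian M^{-1}(D - W); D holds the row sums of W (an assumed convention)."""
    D = np.diag(W.sum(axis=1))
    return np.linalg.inv(M) @ (D - W)

def is_normal(T, tol=1e-10):
    """Check T^* T = T T^*; for M = I the adjoint is the conjugate transpose."""
    return np.allclose(T.conj().T @ T, T @ T.conj().T, atol=tol)

def generic_filter(g, T):
    """g(T) = sum_i g(lambda_i) <phi_i, .> phi_i, implemented only for self-adjoint T."""
    lam, phi = np.linalg.eigh(T)             # orthonormal eigenvectors as columns
    return (phi * g(lam)) @ phi.conj().T

def entire_filter(coeffs, T):
    """g(T) = sum_k a_k T^k for finitely many coefficients a_k (Horner scheme)."""
    out = np.zeros_like(T)
    for a in reversed(coeffs):
        out = out @ T + a * np.eye(T.shape[0])
    return out

T_sym = laplacian(W_sym, M)                  # self-adjoint, hence normal
T_dir = laplacian(W_dir, M)                  # not normal for this example graph

# Generic filter g(lambda) = 1 / (1 + lambda) on the self-adjoint Laplacian.
g_T = generic_filter(lambda lam: 1.0 / (1.0 + lam), T_sym)
norm_generic = np.linalg.norm(g_T, 2)        # operator norm = max_i |g(lambda_i)|

# Entire filter: degree-29 Taylor polynomial of exp(-z) on the non-normal Laplacian.
coeffs = [(-1.0) ** k / math.factorial(k) for k in range(30)]
lhs = np.linalg.norm(entire_filter(coeffs, T_dir), 2)
rhs = sum(abs(a) * np.linalg.norm(T_dir, 2) ** k for k, a in enumerate(coeffs))

print(is_normal(T_sym), is_normal(T_dir))
print(norm_generic, lhs, rhs)                # lhs should not exceed rhs
```

With unit node weights the weighted inner product reduces to the standard one, which is why the conjugate transpose and `np.linalg.norm(..., 2)` can stand in for the adjoint and the operator norm here; for non-unit node weights both would have to be taken with respect to the weighted inner product instead.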

Throughout this work, we use the label $G$ to denote both a graph and its associated vertex set. Taking a signal processing approach, we consider signals on graphs as opposed to graph embeddings:

Node-Signals: Node-signals on a graph are functions from $G$ to the complex numbers; i.e. elements of $\mathbb{C}^{|G|}$ (with $|G|$ the cardinality of $G$). We allow nodes $i \in G$ in a given graph to have weights $\mu_i$ not necessarily equal to one and equip the space $\mathbb{C}^{|G|}$ with the inner product $\langle f, g \rangle = \sum_{i \in G} \overline{f(i)}\, g(i)\, \mu_i$ to account for this. We denote the resulting Hilbert space by $\ell^2(G)$.

Characteristic Operators: Fixing an indexing of the vertices, information about connectivity within the graph is encapsulated in the set of edge weights, collected into the adjacency matrix $W$ and the (diagonal) degree matrix $D$. Together with the weight matrix $M := \mathrm{diag}\left(\{\mu_i\}_{i=1}^{|G|}\right)$, various standard geometry-capturing characteristic operators, such as the weighted adjacency matrix $M^{-1}W$, the graph Laplacian $\Delta := M^{-1}(D - W)$ and the normalized graph Laplacian $\mathcal{L} := M^{-1} D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}}$, can then be constructed. For undirected graphs, all of these operators are self-adjoint. On directed graphs, they need not even be normal (i.e. satisfy $T^*T = TT^*$). We shall remain agnostic to the choice of characteristic operator, differentiating only between normal and general operators in our results.

Functional Calculus Filters: A crucial component of GCNs are functional calculus filters, which arise from applying a function $g$ to an underlying characteristic operator $T$, creating a new operator $g(T)$. Various methods of implementation exist, all of which agree whenever more than one is applicable:

GENERIC FILTERS: If (and only if) $T$ is normal, we may apply generic complex-valued functions $g$ to $T$: Writing normalized eigenvalue-eigenvector pairs of $T$ as $(\lambda_i, \phi_i)_{i=1}^{|G|}$, one defines $g(T)\psi = \sum_{i=1}^{|G|} g(\lambda_i) \langle \phi_i, \psi \rangle_{\ell^2(G)}\, \phi_i$ for any $\psi \in \ell^2(G)$. One has $\|g(T)\|_{op} = \sup_{\lambda \in \sigma(T)} |g(\lambda)|$, with $\sigma(T)$ denoting the spectrum of $T$. If $g$ is bounded, one obtains the $T$-independent bound $\|g(T)\|_{op} \leq \|g\|_{\infty}$. Keeping in mind that $g$ being defined on all of $\sigma(T)$ (as opposed to all of $\mathbb{C}$) is clearly sufficient, we define a space of filters which will harmonize well with our concept of transferability discussed in Section 5. The introduced semi-norm will quantify the stability to perturbations in coming sections.

Definition 2.1. Fix $\omega \in \mathbb{C}$ and $C > 0$. Define the space $\mathcal{F}^{cont}_{\omega,C}$ of continuous filters on $\mathbb{C} \setminus \{\omega, \overline{\omega}\}$ to be the space of multilinear power series $g(z) = \sum_{\mu,\nu=0}^{\infty} a_{\mu\nu} (\omega - z)^{-\mu} (\overline{\omega} - z)^{-\nu}$ for which the semi-norm $\|g\|_{\mathcal{F}^{cont}_{\omega,C}} := \sum_{\mu,\nu>0}^{\infty} |\mu + \nu|\, C^{\mu+\nu-1} |a_{\mu\nu}|$ is finite.

Denoting by $B_\epsilon(\omega) \subseteq \mathbb{C}$ the open ball of radius $\epsilon$ around $\omega$, one can show that for arbitrary $\delta > 0$ and every continuous function $g$ defined on $\mathbb{C} \setminus (B_\epsilon(\omega) \cup B_\epsilon(\overline{\omega}))$ which is regular at infinity, i.e.
satisfies $\lim_{r \to +\infty} g(rz) = c \in \mathbb{C}$ independent of which $z \neq 0$ is chosen, there is a function $f \in \mathcal{F}^{cont}_{\omega,C}$ so that $|f(z) - g(z)| \leq \delta$ for all $z \in \mathbb{C} \setminus (B_\epsilon(\omega) \cup B_\epsilon(\overline{\omega}))$. In other words, functions in $\mathcal{F}^{cont}_{\omega,C}$ can approximate a wide class of filters to arbitrary precision. More details are presented in Appendix B.

ENTIRE FILTERS: If $T$ is not necessarily normal, one may still consistently apply entire (i.e. everywhere complex differentiable) functions to $T$. Details on the mathematical background are given in Appendix C. Here we simply note that such a function $g$ is representable as an (everywhere convergent) power series $g(z) := \sum_{k=0}^{\infty} a^g_k z^k$, so that we may simply set $g(T) = \sum_{k=0}^{\infty} a^g_k \cdot T^k$. For the norm of the derived operator one easily finds $\|g(T)\|_{op} \leq \sum_{k=0}^{\infty} |a^g_k| \|T\|^k_{op}$ using the triangle

