LIMITLESS STABILITY FOR GRAPH CONVOLUTIONAL NETWORKS

Abstract

This work establishes rigorous, novel and widely applicable stability guarantees and transferability bounds for graph convolutional networks, without reference to any underlying limit object or statistical distribution. Crucially, utilized graph-shift operators (GSOs) are not necessarily assumed to be normal, allowing for the treatment of networks on both undirected and, for the first time, also directed graphs. Stability to node-level perturbations is related to an 'adequate (spectral) covering' property of the filters in each layer. Stability to edge-level perturbations is related to Lipschitz constants and newly introduced semi-norms of filters. Results on stability to topological perturbations are obtained through recently developed mathematical-physics based tools. As an important and novel example, it is showcased that graph convolutional networks are stable under graph-coarse-graining procedures (replacing strongly-connected sub-graphs by single nodes) precisely if the GSO is the graph Laplacian and filters are regular at infinity. These new theoretical results are supported by corresponding numerical investigations.

1. INTRODUCTION

Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017; Hammond et al., 2011; Defferrard et al., 2016) generalize Euclidean convolutional networks to the graph setting by replacing convolutional filters by functional calculus filters; i.e. scalar functions applied to a suitably chosen graph-shift operator capturing the geometry of the underlying graph. A key concept in trying to understand the underlying reasons for the superior numerical performance of such networks on graph learning tasks (as well as a guiding principle for the design of new architectures) is the concept of stability. In the Euclidean setting, investigating stability essentially amounts to exploring the variation of the output of a network under non-trivial changes of its input (Mallat, 2012; Wiatowski & Bölcskei, 2018). In the graph setting, additional complications are introduced: Not only input signals, but now also the graph-shift operators facilitating the convolutions on the graphs may vary. Even worse, there might also occur changes in the topology or vertex sets of the investigated graphs (e.g. when two dissimilar graphs describe the same underlying phenomenon) under which graph convolutional networks should also remain stable. This last stability property is often also referred to as transferability (Levie et al., 2019a). Previous works investigated stability under changes in graph-shift operators for specific filters (Levie et al., 2019b; Gama et al., 2020) or the effect of graph-rewiring when choosing a specific graph-shift operator (Kenlay et al., 2021). Stability to topological perturbations has been established for (large) graphs discretising the same underlying topological space (Levie et al., 2019a), the same graphon (Ruiz et al., 2020; Maskey et al., 2021) or for graphs drawn from the same statistical distribution (Keriven et al., 2020; Gao et al., 2021).
Common among all these previous works are two themes limiting practical applicability: First and foremost, the class of filters to which results are applicable is often severely restricted. The same is true for the class of considered graph-shift operators, with non-normal operators (describing directed graphs) either explicitly or implicitly excluded. Furthermore, when investigating transferability properties, results are almost exclusively available under the assumption that graphs are large and either discretize the same underlying 'continuous' limit object sufficiently well, or are drawn from the same statistical distribution. While these are of course relevant regimes, they do not allow conclusions to be drawn beyond such asymptotic settings; they are for example unable to deal with certain spatial graphs, inapplicable to small-to-medium sized social networks and incapable of capturing the inherent multi-scale nature of molecular graphs (as further discussed below). Finally, hardly any work has been done on relating the stability to input-signal perturbations to network properties such as the interplay of utilized filters or employed non-linearities.

The main focus of this work is to remedy this situation and develop a 'general theory of stability' for GCNs that is agnostic to the types of utilized filters, graph-shift operators and non-linearities, with practically relevant transferability guarantees not contingent on potentially underlying limit objects. To this end, Section 2 recapitulates the fundamentals of GCNs in a language adapted to our endeavour. Sections 3 and 4 discuss stability to node- and edge-level perturbations. Section 5 discusses stability to structural perturbations. Section 6 discusses feature aggregation and Section 7 provides numerical evidence.

2. GCNS VIA COMPLEX ANALYSIS AND OPERATOR THEORY

Throughout this work, we will use the label $G$ to denote both a graph and its associated vertex set. Taking a signal processing approach, we consider signals on graphs as opposed to graph embeddings:

Node-Signals: Node-signals on a graph are functions from $G$ to the complex numbers; i.e. elements of $\mathbb{C}^{|G|}$ (with $|G|$ the cardinality of $G$). We allow nodes $i \in G$ in a given graph to have weights $\mu_i$ not necessarily equal to one and equip the space $\mathbb{C}^{|G|}$ with the inner product $\langle f, g\rangle = \sum_{i \in G} \overline{f(i)}\, g(i)\, \mu_i$ to account for this. We denote the resulting Hilbert space by $\ell^2(G)$.

Characteristic Operators: Fixing an indexing of the vertices, information about connectivity within the graph is encapsulated into the set of edge weights, collected into the adjacency matrix $W$ and the (diagonal) degree matrix $D$. Together with the weight matrix $M := \mathrm{diag}\left(\{\mu_i\}_{i=1}^{|G|}\right)$, various standard geometry-capturing characteristic operators, such as the weighted adjacency matrix $M^{-1}W$, the graph Laplacian $\Delta := M^{-1}(D - W)$ and the normalized graph Laplacian $\mathscr{L} := M^{-1}D^{-\frac{1}{2}}(D - W)D^{-\frac{1}{2}}$, can then be constructed. For undirected graphs, all of these operators are self-adjoint. On directed graphs, they need not even be normal (i.e. satisfy $T^*T = TT^*$). We shall remain agnostic to the choice of characteristic operator, differentiating only between normal and general operators in our results.

Functional Calculus Filters: A crucial component of GCNs are functional calculus filters, which arise from applying a function $g$ to an underlying characteristic operator $T$, creating a new operator $g(T)$. Various methods of implementation exist, all of which agree if multiple are applicable:

GENERIC FILTERS: If (and only if) $T$ is normal, we may apply generic complex-valued functions $g$ to $T$: Writing normalized eigenvalue-eigenvector pairs of $T$ as $(\lambda_i, \phi_i)_{i=1}^{|G|}$, one defines $g(T)\psi = \sum_{i=1}^{|G|} g(\lambda_i)\langle \phi_i, \psi\rangle_{\ell^2(G)}\, \phi_i$ for any $\psi \in \ell^2(G)$.
One has $\|g(T)\|_{op} = \sup_{\lambda \in \sigma(T)} |g(\lambda)|$, with $\sigma(T)$ denoting the spectrum of $T$. If $g$ is bounded, one may obtain the $T$-independent bound $\|g(T)\|_{op} \le \|g\|_\infty$. Keeping in mind that $g$ being defined on all of $\sigma(T)$ (as opposed to all of $\mathbb{C}$) is clearly sufficient, we define a space of filters which will harmonize well with our concept of transferability discussed in Section 5. The introduced semi-norm will quantify the stability to perturbations in coming sections.

Definition 2.1. Fix $\omega \in \mathbb{C}$ and $C > 0$. Define the space $\mathscr{F}^{cont}_{\omega,C}$ of continuous filters on $\mathbb{C} \setminus \{\omega, \overline{\omega}\}$ to be the space of multilinear power series $g(z) = \sum_{\mu,\nu=0}^{\infty} a_{\mu\nu}\, (\omega - z)^{-\mu} (\overline{\omega} - z)^{-\nu}$ for which the semi-norm $\|g\|_{\mathscr{F}^{cont}_{\omega,C}} := \sum_{\mu,\nu > 0}^{\infty} |\mu + \nu|\, C^{\mu+\nu-1} |a_{\mu\nu}|$ is finite.

Denoting by $B_\epsilon(\omega) \subseteq \mathbb{C}$ the open ball of radius $\epsilon$ around $\omega$, one can show that for arbitrary $\delta > 0$ and every continuous function $g$ defined on $\mathbb{C} \setminus (B_\epsilon(\omega) \cup B_\epsilon(\overline{\omega}))$ which is regular at infinity (i.e. satisfies $\lim_{r \to +\infty} g(rz) = c \in \mathbb{C}$ independent of which $z \neq 0$ is chosen) there is a function $f \in \mathscr{F}^{cont}_{\omega,C}$ so that $|f(z) - g(z)| \le \delta$ for all $z \in \mathbb{C} \setminus (B_\epsilon(\omega) \cup B_\epsilon(\overline{\omega}))$. In other words, functions in $\mathscr{F}^{cont}_{\omega,C}$ can approximate a wide class of filters to arbitrary precision. More details are presented in Appendix B.

ENTIRE FILTERS: If $T$ is not necessarily normal, one might still consistently apply entire (i.e. everywhere complex differentiable) functions to $T$. Further details on the mathematical background are given in Appendix C. Here we simply note that such a function $g$ is representable as an (everywhere convergent) power series $g(z) := \sum_{k=0}^{\infty} a^g_k z^k$, so that we may simply set $g(T) = \sum_{k=0}^{\infty} a^g_k \cdot T^k$. For the norm of the derived operator one easily finds $\|g(T)\|_{op} \le \sum_{k=0}^{\infty} |a^g_k|\, \|T\|^k_{op}$ using the triangle inequality.
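The two constructions above can be compared numerically. The following is a minimal sketch, assuming a small undirected path graph with unit node weights and a hypothetical polynomial filter (neither is taken from the paper); it evaluates the filter once through the eigendecomposition and once as a power series, and computes the two norm quantities mentioned in the text.

```python
import numpy as np

def spectral_filter(T, g):
    """Apply g to a Hermitian matrix T via its eigendecomposition."""
    lam, phi = np.linalg.eigh(T)            # T = phi diag(lam) phi^*
    return (phi * g(lam)) @ phi.conj().T    # sum_i g(lam_i) |phi_i><phi_i|

def power_series_filter(T, coeffs):
    """Evaluate sum_k coeffs[k] * T^k by Horner's rule."""
    out = np.zeros_like(T, dtype=float)
    for a in reversed(coeffs):
        out = out @ T + a * np.eye(T.shape[0])
    return out

# Laplacian of the undirected path graph on 4 nodes (assumed example graph).
W = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
L = np.diag(W.sum(axis=1)) - W

coeffs = [1.0, -0.5, 0.25]                  # hypothetical g(z) = 1 - z/2 + z^2/4
g = lambda z: coeffs[0] + coeffs[1] * z + coeffs[2] * z**2

G_spec = spectral_filter(L, g)
G_poly = power_series_filter(L, coeffs)

op_norm = np.linalg.norm(G_spec, 2)                          # ||g(T)||_op
sup_on_spectrum = np.max(np.abs(g(np.linalg.eigvalsh(L))))   # sup over sigma(T)
series_bound = sum(abs(a) * np.linalg.norm(L, 2) ** k for k, a in enumerate(coeffs))
```

Since the Laplacian is symmetric here, both evaluations coincide and `op_norm` equals `sup_on_spectrum`, while `series_bound` is the coarser triangle-inequality estimate that grows with $\|T\|_{op}$.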
While entire filters have the advantage that they are easily and efficiently implementable, making use only of matrix multiplication and addition, they suffer from the fact that it is impossible to give a $\|T\|_{op}$-independent bound for $\|g(T)\|_{op}$ as for continuous filters. This behaviour can be traced back to the fact that no non-constant bounded entire function exists (Bak & Newman, 2017).

HOLOMORPHIC FILTERS: To define functional calculus filters that are both applicable to non-normal $T$ and boundable somewhat more controllably in terms of $T$, one may relax the condition that $g$ be entire to demanding that $g$ be complex differentiable (i.e. holomorphic) only on an open subset $U \subseteq \mathbb{C}$ of the complex plane. Here we assume that $U$ extends to infinity in each direction (i.e. is the complement of a closed and bounded subset of $\mathbb{C}$). For any $g$ holomorphic on $U$ and regular at infinity we set (with $(z\mathrm{Id} - T)^{-1}$ the so-called resolvent of $T$ at $z$)

$$g(T) := g(\infty) \cdot \mathrm{Id} + \frac{1}{2\pi i} \oint_{\partial D} g(z) \cdot (z\mathrm{Id} - T)^{-1}\, dz \qquad (1)$$

for any $T$ whose spectrum $\sigma(T)$ is completely contained in $U$. Here we have used the notation $g(\infty) = \lim_{r \to +\infty} g(rz)$ and taken $D$ to be an open set with nicely behaved boundary $\partial D$ (more precisely a Cauchy domain; c.f. Appendix C). We assume that $D$ completely contains $\sigma(T)$ and that its closure $\overline{D}$ is completely contained in $U$. The orientation of the boundary $\partial D$ is the usual positive orientation on $D$ (such that $D$ 'is on the left' of $\partial D$; c.f. Fig. 1).

Figure 1: Set-Visualisations

Using elementary facts from complex analysis it can be shown that the resulting operator $g(T)$ in (1) is independent of the specific choice of $D$ (Gindler, 1966). While we will present results below in terms of this general definition, remaining agnostic to numerical implementation methods for the most part, it is instructive to consider a specific exemplary setting with a definite and simple numerical implementation of such filters: To this end, choose an arbitrary point $\omega \in \mathbb{C}$ and set $U = \mathbb{C} \setminus \{\omega\}$ in the definitions above.
Any function $g$ that is holomorphic on $U$ and regular at $\infty$ may then be represented by its Laurent series, which is of the form $g(z) = \sum_{k=0}^{\infty} b^g_k (z - \omega)^{-k}$ (Bak & Newman, 2017). For any $T$ with $\sigma(T) \subseteq U$ (i.e. $\omega \notin \sigma(T)$), evaluating the integral in (1) yields (c.f. Appendix C):

$$g(T) = \sum_{k=0}^{\infty} b^g_k \cdot (T - \omega\mathrm{Id})^{-k} \qquad (2)$$

Such filters have already been employed successfully, e.g. in the guise of Cayley filters (Levie et al., 2019c), which are polynomials in $\frac{z+i}{z-i} = 1 + \frac{2i}{z-i}$. We collect them into a designated filter space:

Definition 2.2. For a function $g(z) = \sum_{k=0}^{\infty} b^g_k (z - \omega)^{-k}$ on $U := \mathbb{C} \setminus \{\omega\}$ define the semi-norm $\|g\|_{\mathscr{F}^{hol}_{\omega,C}} := \sum_{k=1}^{\infty} |b^g_k|\, k\, C^{k-1}$ for $C > 0$. Denote the set of such $g$ for which $\|g\|_{\mathscr{F}^{hol}_{\omega,C}} < \infty$ by $\mathscr{F}^{hol}_{\omega,C}$.

In order to derive $\|T\|_{op}$-independent bounds for $\|g(T)\|_{op}$, we will need to norm-bound the resolvents appearing in (1) and (2). If $T$ is normal, we simply have $\|(z\mathrm{Id} - T)^{-1}\|_{op} = 1/\mathrm{dist}(z, \sigma(T))$. In the general setting, following Post (2012), we call any positive function $\gamma_T$ satisfying $\|(z\mathrm{Id} - T)^{-1}\|_{op} \le \gamma_T(z)$ on $\mathbb{C} \setminus \sigma(T)$ a resolvent profile of $T$. Various methods exist to find resolvent profiles (e.g. Szehr (2014); Michael Gil (2012)). Most notably, Bandtlow (2004b) gives a resolvent profile solely in terms of $1/\mathrm{dist}(z, \sigma(T))$ and the departure from normality of $T$. We then find the following result:

Lemma 2.3. For holomorphic $g$ and generic $T$ we have $\|g(T)\|_{op} \le |g(\infty)| + \frac{1}{2\pi} \oint_{\partial D} |g(z)|\, \gamma_T(z)\, d|z|$. Furthermore we have for any $T$ with $\gamma_T(\omega) \le C$ that $\|g(T)\|_{op} \le \|g\|_{\mathscr{F}^{hol}_{\omega,C}}$ as long as $g \in \mathscr{F}^{hol}_{\omega,C}$.

Lemma 2.3 (proved in Appendix D) finally bounds $\|g(T)\|_{op}$ independently of $T$, as long as the appearing resolvents are suitably bounded; which, importantly, does not force $\|T\|_{op}$ to be bounded.
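As an illustration of equation (2), the following minimal sketch evaluates a resolvent-type filter on a small directed graph whose out-degree Laplacian is not normal. The graph, the coefficients `b` and the choice $\omega = -1$ are assumptions made for this example only. The bound computed at the end is the elementary triangle-inequality estimate $|b_0| + \sum_{k \ge 1} |b_k|\, \|(T - \omega\mathrm{Id})^{-1}\|^k_{op}$, not the sharper statement of Lemma 2.3.

```python
import numpy as np

def resolvent_filter(T, omega, b):
    """g(T) = sum_k b[k] * (T - omega*Id)^(-k), cf. equation (2)."""
    n = T.shape[0]
    R = np.linalg.inv(T - omega * np.eye(n))    # (T - omega Id)^{-1}
    out = np.zeros((n, n), dtype=complex)
    power = np.eye(n, dtype=complex)            # R^0
    for bk in b:
        out += bk * power
        power = power @ R
    return out, R

# Directed 3-cycle with one extra edge (assumed example graph).
W = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.]])
T = np.diag(W.sum(axis=1)) - W                  # out-degree Laplacian
is_normal = np.allclose(T @ T.T, T.T @ T)       # False for this graph

omega = -1.0             # outside the spectrum, since Re(lambda) >= 0 by Gershgorin
b = [0.5, 1.0, -0.25]    # hypothetical g(z) = 0.5 + (z - omega)^-1 - 0.25 (z - omega)^-2
gT, R = resolvent_filter(T, omega, b)

r = np.linalg.norm(R, 2)                        # resolvent norm at omega
bound = sum(abs(bk) * r**k for k, bk in enumerate(b))
```

The filter norm is controlled by the resolvent norm at $\omega$ alone, so rescaling the edge weights (and hence $\|T\|_{op}$) upwards does not blow up the estimate.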

Non-Linearities & Connecting Operators:

To each layer of our GCN, we associate a (possibly) non-linear and $L_n$-Lipschitz-continuous function $\rho_n : \mathbb{C} \to \mathbb{C}$ satisfying $\rho_n(0) = 0$, which acts point-wise on signals in $\ell^2(G_n)$. This definition allows choosing $\rho_n = |\cdot|$, ReLU, Id or any sigmoid function shifted to preserve zero. To account for recently proposed networks where input- and 'processing' graphs are decoupled (Alon & Yahav, 2021; Topping et al., 2021), and for graph pooling layers (Lee et al., 2019), we also allow signal representations in the hidden network layers $n$ to live in varying graph signal spaces $\ell^2(G_n)$. Connecting operators are then (not necessarily linear) operators $P_n : \ell^2(G_{n-1}) \to \ell^2(G_n)$ connecting the signal spaces of subsequent layers. We assume them to be $R_n$-Lipschitz-continuous ($\|P_n(f) - P_n(g)\|_{\ell^2(G_n)} \le R_n \|f - g\|_{\ell^2(G_{n-1})}$) and triviality preserving ($P_n(0) = 0$). For our original node-signal space we also write $\ell^2(G) = \ell^2(G_0)$.

Graph Convolutional Networks: A GCN with $N$ layers is then constructed as follows: Denoting the width of the network at layer $n$ by $K_n$, the collection of hidden signals in this layer is an element of the direct sum

$$\mathscr{L}_n := \bigoplus_{i \in K_n} \ell^2(G_n). \qquad (3)$$

Further let us write the collection of functional calculus filters utilized to generate the representation of this layer as $\{g^n_{ij}(\cdot) : 1 \le j \le K_{n-1};\ 1 \le i \le K_n\}$. Further denoting the characteristic operator of this layer by $T_n$, the update rule (c.f. also Fig. 2) from the representation in $\mathscr{L}_n$ to $\mathscr{L}_{n+1}$ is then defined on each constituent of the direct sum $\mathscr{L}_{n+1}$ as

$$f^{n+1}_i = \rho_{n+1}\left(\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})\, P_{n+1}(f^n_j)\right), \quad \forall\, 1 \le i \le K_{n+1}.$$

We also denote the initial signal space by $\mathscr{L}_{in} := \mathscr{L}_0$ and the final one by $\mathscr{L}_{out} := \mathscr{L}_N$. The resulting map from the initial to the final space is denoted by $\Phi : \mathscr{L}_{in} \to \mathscr{L}_{out}$.
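The update rule can be sketched in a few lines. The example below is a hedged illustration, not the implementation used in the paper: the graph, the random filter coefficients, the helper names `gcn_forward` and `random_filters`, and the choice of identity connecting operators are all assumptions. Filters are stored as precomputed matrices $g_{ij}(T_n)$.

```python
import numpy as np

def gcn_forward(f, layers):
    """f: array (K_0, |G|) of input channels. Each layer is a dict with
    'filters' (array (K_n, K_{n-1}, |G|, |G|) holding the matrices g_ij(T_n)),
    'P' (connecting operator, a callable) and 'rho' (point-wise non-linearity)."""
    for layer in layers:
        Pf = np.stack([layer["P"](fj) for fj in f])             # P_n(f_j)
        pre = np.einsum("ijab,jb->ia", layer["filters"], Pf)    # sum_j g_ij(T_n) P_n f_j
        f = layer["rho"](pre)
    return f

rng = np.random.default_rng(0)
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)          # assumed undirected example graph
T = np.diag(W.sum(axis=1)) - W                     # graph Laplacian, unit node weights
R = np.linalg.inv(T + np.eye(4))                   # building block (T + Id)^{-1}

def random_filters(K_out, K_in):
    # Hypothetical filters g_ij(z) = c0 + c1 * (z + 1)^{-1} with random coefficients.
    c = rng.normal(size=(K_out, K_in, 2))
    return c[..., 0, None, None] * np.eye(4) + c[..., 1, None, None] * R

layers = [
    {"filters": random_filters(2, 1), "P": lambda x: x, "rho": np.abs},
    {"filters": random_filters(1, 2), "P": lambda x: x,
     "rho": lambda x: np.maximum(x, 0.0)},         # ReLU
]
f_in = rng.normal(size=(1, 4))
f_out = gcn_forward(f_in, layers)
```

Both non-linearities preserve zero, so the zero signal is mapped to the zero signal, in line with the assumptions $\rho_n(0) = 0$ and $P_n(0) = 0$.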

3. STABILITY TO INPUT SIGNAL PERTURBATIONS

In order to produce meaningful signal representations, a small change in the input signal should produce only a small variation in the output of our GCN. This property is quantified by the Lipschitz constant of the map $\Phi$ associated to the network, which is estimated by our first result below.

Theorem 3.1. With the notation of Section 2 let $\Phi_N : \mathscr{L}_{in} \to \mathscr{L}_{out}$ be the map associated to an $N$-layer GCN. With $B_n := \sqrt{\sup_{\lambda \in \sigma(T_n)} \sum_{j \in K_{n-1}} \sum_{i \in K_n} |g^n_{ij}(\lambda)|^2}$ we have for all $f, h \in \mathscr{L}_{in}$ that

$$\|\Phi_N(f) - \Phi_N(h)\|_{\mathscr{L}_{out}} \le \left(\prod_{n=1}^{N} L_n R_n B_n\right) \cdot \|f - h\|_{\mathscr{L}_{in}}$$

if $T_n$ is normal. For general $T_n$ we have for all $\{g_{ij}\}$ entire, holomorphic and in $\mathscr{F}^{hol}_{\omega,C}$ respectively:

$$B_n := \begin{cases} \sum\limits_{k=0}^{\infty} \sqrt{\sum_{j \in K_{n-1}} \sum_{i \in K_n} |(a^{g^n_{ij}})_k|^2} \cdot \|T_n\|^k_{op} \\[2mm] \sqrt{\sum_{j \in K_{n-1}} \sum_{i \in K_n} |g^n_{ij}(\infty)|^2} + \frac{1}{2\pi} \oint_{\partial D} \gamma_T(z) \sqrt{\sum_{j \in K_{n-1}} \sum_{i \in K_n} |g^n_{ij}(z)|^2}\; d|z| \\[2mm] \sqrt{\sum_{j \in K_{n-1}} \sum_{i \in K_n} \|g^n_{ij}\|^2_{\mathscr{F}^{hol}_{\omega,C}}} \end{cases}$$

Appendix E contains the corresponding proof and discusses how the derived bounds are not necessarily tight for sparsely connected layers. After Lipschitz constants of connecting operators and non-linearities are fixed, the stability constant of the network is completely controlled by the $\{B_n\}$, which for normal $T_n$ are in turn controlled by the interplay of the utilized filters on the spectrum of $T_n$. This allows combining filters with $\sup_{\lambda \in \sigma(T_n)} |g^n_{ij}(\lambda)| = \mathcal{O}(1)$ that are supported on complementary parts of the spectrum of $T_n$ while still maintaining $B_n = \mathcal{O}(1)$ instead of $\mathcal{O}(\sqrt{K_n \cdot K_{n-1}})$. In practice one might thus penalize a 'multiple covering' of the spectrum by more than one filter at a time during training in order to increase stability to input signal perturbations. If $T_n$ is not normal but filters are holomorphic, an interplay persists, with filters now evaluated on a curve and at infinity.
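The normal-operator case of Theorem 3.1 can be checked numerically. The sketch below assumes a cycle graph with unit node weights, identity connecting operators ($R_n = 1$), the modulus as non-linearity ($L_n = 1$) and hypothetical filters of the form $a\,e^{-\lambda} + b/(1+\lambda)$; all of these are illustrative choices. It also compares the layer constant for two band-pass filters with disjoint spectral support against two fully overlapping filters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric (hence normal) Laplacian of a cycle on 6 nodes, unit node weights.
n = 6
W = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
T = np.diag(W.sum(axis=1)) - W
lam, phi = np.linalg.eigh(T)

def filter_values(K_out, K_in):
    """Values g_ij(lambda_l) of hypothetical filters a*exp(-z) + b/(1+z)."""
    a, b = rng.normal(size=(2, K_out, K_in))
    return a[..., None] * np.exp(-lam) + b[..., None] / (1.0 + lam)

def layer_constant(g_vals):
    """B_n for normal T_n; g_vals[i, j, l] = g_ij(lambda_l)."""
    return np.sqrt(np.max(np.sum(np.abs(g_vals) ** 2, axis=(0, 1))))

def apply_layer(f, g_vals):
    """f: (K_in, n) -> |sum_j g_ij(T) f_j|, i.e. rho = |.| and P = Id."""
    coeffs = f @ phi                                   # <phi_l, f_j> (phi is real)
    out = np.einsum("ijl,jl->il", g_vals, coeffs)      # filter in the eigenbasis
    return np.abs(out @ phi.T)

g1, g2 = filter_values(3, 2), filter_values(2, 3)
bound = layer_constant(g1) * layer_constant(g2)        # product of L_n R_n B_n

f, h = rng.normal(size=(2, n)), rng.normal(size=(2, n))
Phi = lambda x: apply_layer(apply_layer(x, g1), g2)
ratio = np.linalg.norm(Phi(f) - Phi(h)) / np.linalg.norm(f - h)

# Disjoint spectral supports keep B at 1; full overlap gives sqrt(2).
low = (lam < 2.0).astype(float)
B_disjoint = layer_constant(np.stack([low, 1.0 - low])[:, None, :])
B_overlap = layer_constant(np.ones((2, 1, n)))
```

The observed `ratio` stays below `bound`, and the comparison of `B_disjoint` with `B_overlap` mirrors the remark that a 'multiple covering' of the spectrum inflates the stability constant.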

4. STABILITY TO EDGE PERTURBATIONS

Operators capturing graph-geometries might only be known approximately in real world tasks; e.g. if edge weights are only known to a certain level of precision. Hence it is important that graph convolutional networks be insensitive to small changes in the characteristic operators $\{T_n\}$. Since we consider graphs with arbitrary vertex weights $\{\mu_g\}_{g \in G}$, we also have to consider the possibility that these weights are only known to a certain level of precision. In this case, not only do the characteristic operators $T_n$, $\tilde{T}_n$ differ, but also the spaces $\ell^2(G)$, $\ell^2(\tilde{G})$ on which they act. To capture this setting mathematically, we assume in this section that there is a linear operator $J : \ell^2(G) \to \ell^2(\tilde{G})$ facilitating contact between signal spaces (of not necessarily the same dimension). We then measure closeness of characteristic operators in the respective spaces by considering the generalized norm-difference $\|JT - \tilde{T}J\|$, with $J$ translating between the respective spaces. Before investigating the stability of entire networks we first comment on single-filter stability. For normal operators we then find the following result, proved in Appendix A building on ideas first developed in (Wihler, 2009). Each $K_g$ itself is interpretable as a semi-norm. For GCNs we find the following (c.f. Appendix F):

Theorem 4.3. Let $\Phi_N$, $\tilde{\Phi}_N$ be the maps associated to $N$-layer graph convolutional networks with the same non-linearities and filters, but based on different graph signal spaces $\ell^2(G)$, $\ell^2(\tilde{G})$, characteristic operators $T_n$, $\tilde{T}_n$ and connecting operators $P_n$, $\tilde{P}_n$. Assume $B_n, \tilde{B}_n \le B$ as well as $R_n, \tilde{R}_n \le R$ and $L_n \le L$ for some $B, R, L > 0$ and all $n \ge 0$. Assume that there are identification operators $J_n : \ell^2(G_n) \to \ell^2(\tilde{G}_n)$ ($0 \le n \le N$) commuting with non-linearities and connecting operators in the sense of $\|\tilde{P}_n J_{n-1} f - J_n P_n f\|_{\ell^2(\tilde{G}_n)} = 0$ and $\|\rho_n(J_n f) - J_n \rho_n(f)\|_{\ell^2(\tilde{G}_n)} = 0$.
Depending on whether normal or arbitrary characteristic operators are used, define $D_n^2 := \sum_{j \in K_{n-1}} \sum_{i \in K_n} D^2_{g^n_{ij}}$ or $D_n^2 := \sum_{j \in K_{n-1}} \sum_{i \in K_n} K^2_{g^n_{ij}}$. Choose $D$ such that $D_n \le D$ for all $n$. Finally assume that $\|J_n T_n - \tilde{T}_n J_n\|_* \le \delta$, with $* = F$ if both operators are normal and $* = op$ otherwise. Then we have for all $f \in \mathscr{L}_{in}$, and with $\mathscr{J}_n$ the operator that the $K_n$ copies of $J_n$ induce through concatenation, that

$$\|\tilde{\Phi}_N(\mathscr{J}_0 f) - \mathscr{J}_N \Phi_N(f)\|_{\tilde{\mathscr{L}}_{out}} \le N \cdot DRL \cdot (BRL)^{N-1} \cdot \|f\|_{\mathscr{L}_{in}} \cdot \delta.$$

The result persists with slightly altered constants if identification operators only almost commute with non-linearities and/or connecting operators, as Appendix G further elucidates. Since we estimated various constants ($B_n$, $D_n$, ...) of the individual layers by global ones, the derived stability constant is clearly not tight. However, it portrays requirements for stability to edge-level perturbations well: While the (spectral) interplay of Section 3 remains important, it is now especially large single-filter stability constants in the sense of Lemmata 4.1 and 4.2 that should be penalized during training.
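Single-filter stability in the Frobenius norm can be illustrated with a classical estimate for symmetric matrices: if $g$ is Lipschitz on an interval containing both spectra, then $\|g(\tilde{T}) - g(T)\|_F \le \mathrm{Lip}(g)\, \|\tilde{T} - T\|_F$. The sketch below assumes $J = \mathrm{Id}$ (both graphs share their node set), unit node weights, a random weighted graph and the heat-kernel filter $g(\lambda) = e^{-\lambda}$; these choices, and reading the Lipschitz constant as the stability constant, are assumptions of the example rather than statements taken from the lemmata.

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def heat_filter(T):
    """g(T) for g(lambda) = exp(-lambda), via the eigendecomposition of symmetric T."""
    lam, phi = np.linalg.eigh(T)
    return (phi * np.exp(-lam)) @ phi.T

rng = np.random.default_rng(2)
A = rng.uniform(0.5, 1.5, size=(5, 5))
W = np.triu(A, 1) + np.triu(A, 1).T              # symmetric edge weights
E = rng.uniform(-1.0, 1.0, size=(5, 5))
dW = 1e-2 * (np.triu(E, 1) + np.triu(E, 1).T)    # small symmetric perturbation

T, T_pert = laplacian(W), laplacian(W + dW)      # both positive semi-definite
lhs = np.linalg.norm(heat_filter(T_pert) - heat_filter(T), "fro")
# exp(-x) has Lipschitz constant 1 on [0, inf), which contains both spectra.
rhs = 1.0 * np.linalg.norm(T_pert - T, "fro")
```

Here `lhs` does not exceed `rhs`; filters with large Lipschitz constants would weaken this guarantee accordingly, which is the quantity the text suggests penalizing during training.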

5. STABILITY TO STRUCTURAL PERTURBATIONS: TRANSFERABILITY

While the demand that $\|\tilde{T}J - JT\|$ be small in some norm is well adapted to capture some notions of closeness of graphs and characteristic operators, it is too stringent to capture others. As an illustrative example, further developed in Section 5.2 and numerically investigated in Section 7 below, suppose we are given a connected undirected graph with all edge weights of order $\mathcal{O}(1/\delta)$. With the Laplacian as characteristic operator (governing heat-flow in physics (Cole, 2011)), we may think of this graph as modelling an array of coupled heat reservoirs with edge weights corresponding to heat-conductivities. As $1/\delta \to \infty$, the conductivities between respective nodes tend to infinity, heat exchange is instantaneous and all nodes act as if they are fused together into a single large entity, with the graph together with its characteristic operator behaving as an effective one-dimensional system. This 'convergent' behaviour is however not reflected in our characteristic operator, the graph Laplacian $\Delta_\delta$: Clearly $\|\Delta_\delta\|_{op} = 1/\delta \cdot \|\Delta_1\|_{op} \to \infty$ as $1/\delta \to \infty$. Moreover, we would also expect a Cauchy-like behaviour from a 'convergent system', in the sense that if we for example keep $1/\delta_a - 1/\delta_b = 1$ constant but let $(1/\delta_a), (1/\delta_b) \to \infty$, we would expect $\|\Delta_{\delta_a} - \Delta_{\delta_b}\|_{op} \to 0$ by a triangle-inequality argument. However, we clearly have $\|\Delta_{\delta_a} - \Delta_{\delta_b}\|_{op} = |1/\delta_a - 1/\delta_b| \cdot \|\Delta_1\|_{op} = \|\Delta_1\|_{op}$, which does not decay. The situation is different, however, when considering resolvents of the graph Laplacian. An easy calculation (c.f. Appendix H) yields $\|(\omega\mathrm{Id} - \Delta_{\delta_b})^{-1} - (\omega\mathrm{Id} - \Delta_{\delta_a})^{-1}\|_{op} = \mathcal{O}(\delta_a \cdot \delta_b)$, so that we recover the expected Cauchy behaviour. What is more, we also find the convergence $(\omega\mathrm{Id} - \Delta_\delta)^{-1} \to P_0 \cdot (\omega - 0)^{-1}$, where $P_0$ denotes the projection onto the one-dimensional lowest lying eigenspace of the $\Delta_\delta$s (spanned by the vectors with constant entries).
We may interpret $(\omega - 0)^{-1}$ as the resolvent of the graph Laplacian of a singleton (since such a Laplacian is identically zero) and thus now indeed find our physical intuition about convergence to a one-dimensional system reflected in our formulae. Motivated by this example, Section 5.1 develops a general theory for the difference in outputs of networks evaluated on graphs for which the resolvents $R_\omega := (\omega\mathrm{Id} - T)^{-1}$ and $\tilde{R}_\omega := (\omega\mathrm{Id} - \tilde{T})^{-1}$ of the respective characteristic operators are close in some sense. Subsequently, Section 5.2 further develops our initial example while also considering an additional setting.
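The contrast between Laplacian differences and resolvent differences is easy to reproduce. The following minimal sketch assumes the complete graph on four nodes with unit node weights and $\omega = -1$ (illustrative choices); it keeps $1/\delta_a - 1/\delta_b = 1$ fixed while both inverse scales grow.

```python
import numpy as np

W = np.ones((4, 4)) - np.eye(4)          # complete graph on 4 nodes, O(1) weights
L1 = np.diag(W.sum(axis=1)) - W          # Delta_1, so that Delta_delta = Delta_1 / delta
omega = -1.0
I = np.eye(4)
P0 = np.full((4, 4), 0.25)               # projection onto the constant vectors

def resolvent(delta):
    return np.linalg.inv(omega * I - L1 / delta)   # (omega Id - Delta_delta)^{-1}

rows = []
for inv_a in [10.0, 100.0, 1000.0]:
    da, db = 1.0 / inv_a, 1.0 / (inv_a - 1.0)      # 1/delta_a - 1/delta_b = 1
    lap_diff = np.linalg.norm(L1 / da - L1 / db, 2)
    res_diff = np.linalg.norm(resolvent(da) - resolvent(db), 2)
    lim_diff = np.linalg.norm(resolvent(da) - P0 / omega, 2)
    rows.append((lap_diff, res_diff, lim_diff))
```

Each entry of `rows` holds the Laplacian difference (which stays at $\|\Delta_1\|_{op}$), the resolvent difference (which decays like $\delta_a \delta_b$) and the distance to the limiting operator $P_0 \cdot (\omega - 0)^{-1}$.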

5.1. GENERAL THEORY

Throughout this section we fix a complex number $\omega \in \mathbb{C}$ and for each operator $T$ assume $\omega, \overline{\omega} \notin \sigma(T)$. This is always true for $\omega$ with $|\omega| > \|T\|_{op}$; if $T$ is additionally self-adjoint one could set $\omega = i$, and if $T$ is non-negative one might choose $\omega = -1$. As a first step, we then note that the conclusion of Lemma 4.1 can always be satisfied if we choose $J = 0$. To exclude this case, where the application of $J$ corresponds to losing too much information, we follow Post (2012) in making the following definition:

Definition 5.1. Let $J : \ell^2(G) \to \ell^2(\tilde{G})$ and $\tilde{J} : \ell^2(\tilde{G}) \to \ell^2(G)$ be linear, and let $T$ ($\tilde{T}$) be operators on $\ell^2(G)$ ($\ell^2(\tilde{G})$). We say that $J$ and $\tilde{J}$ are $\epsilon$-quasi-unitary with respect to $T$, $\tilde{T}$ and $\omega$ if

$$\|Jf\|_{\ell^2(\tilde{G})} \le 2\|f\|_{\ell^2(G)}, \qquad \|(J - \tilde{J}^*)f\|_{\ell^2(\tilde{G})} \le \epsilon \|f\|_{\ell^2(G)},$$

$$\|(\mathrm{Id} - \tilde{J}J) R_\omega f\|_{\ell^2(G)} \le \epsilon \|f\|_{\ell^2(G)}, \qquad \|(\mathrm{Id} - J\tilde{J}) \tilde{R}_\omega u\|_{\ell^2(\tilde{G})} \le \epsilon \|u\|_{\ell^2(\tilde{G})}. \qquad (4)$$

The motivation to include the resolvents in the norm estimates (4) comes from the setting where $T = \Delta$ is the graph Laplacian and $\omega = -1$. In that case, the left equation in (4) is for example automatically fulfilled when demanding $\|(\mathrm{Id} - \tilde{J}J)f\|_{\ell^2(G)} \le \epsilon \left(\|f\|^2_{\ell^2(G)} + E_\Delta(f)\right)^{\frac{1}{2}}$, with $E_\Delta(\cdot) = \langle \cdot, \Delta \cdot \rangle_{\ell^2(G)}$ the (positive) energy form induced by the Laplacian $\Delta$ (Post, 2012). This can thus be interpreted as a relaxation of the standard demand $\|\mathrm{Id} - \tilde{J}J\|_{op} \le \epsilon$. Relaxing the demands of Section 4, we now demand closeness of resolvents instead of closeness of operators:

Definition 5.2. If, for $\omega \in \mathbb{C}$ and linear $J : \ell^2(G) \to \ell^2(\tilde{G})$, the resolvents $R_\omega$ and $\tilde{R}_\omega$ satisfy $\|(\tilde{R}_\omega J - J R_\omega) f\|_{\ell^2(\tilde{G})} \le \epsilon \|f\|_{\ell^2(G)}$ for all $f \in \ell^2(G)$, then $T$ and $\tilde{T}$ are called $\omega$-$\epsilon$-close with identification operator $J$. If additionally $\|(\tilde{R}^*_\omega J - J R^*_\omega) f\|_{\ell^2(\tilde{G})} \le \epsilon \|f\|_{\ell^2(G)}$, they are doubly $\omega$-$\epsilon$-close.

Our first result establishes that operators being (doubly-)$\omega$-$\epsilon$-close indeed has useful consequences:

Lemma 5.3. Let $T$ ($\tilde{T}$) be operators on $\ell^2(G)$ ($\ell^2(\tilde{G})$).
If these operators are $\omega$-$\epsilon$-close with identification operator $J$, and $\|R_\omega\|_{op}, \|\tilde{R}_\omega\|_{op} \le C$, we have

$$\|J g(T) - g(\tilde{T}) J\|_{op} \le K_g \cdot \|\tilde{R}_\omega J - J R_\omega\|_{op}$$

with $K_g = \frac{1}{2\pi} \oint_{\partial D} (1 + |z - \omega| \gamma_T(z))(1 + |z - \omega| \gamma_{\tilde{T}}(z))\, |g(z)|\, d|z|$ for holomorphic $g$, $K_g = \|g\|_{\mathscr{F}^{hol}_{\omega,C}}$ if $g \in \mathscr{F}^{hol}_{\omega,C}$, and $K_g = \|g\|_{\mathscr{F}^{cont}_{\omega,C}}$ for $T$, $\tilde{T}$ normal and doubly $\omega$-$\epsilon$-close.

This result may then be extended to entire networks, as detailed in Theorem 5.4 below, whose statement persists with slightly altered stability constants if identification operators only almost commute with non-linearities and/or connecting operators. Proofs are contained in Appendix I.

Theorem 5.4. Let $\Phi_N$, $\tilde{\Phi}_N$ be the maps associated to $N$-layer graph convolutional networks with the same non-linearities and functional calculus filters, but based on different graph signal spaces $\ell^2(G_n)$, $\ell^2(\tilde{G}_n)$, characteristic operators $T_n$, $\tilde{T}_n$ and connecting operators $P_n$, $\tilde{P}_n$. Assume $B_n, \tilde{B}_n \le B$ as well as $R_n, \tilde{R}_n \le R$ and $L_n \le L$ for some $B, R, L > 0$ and all $n \ge 0$. Assume that there are identification operators $J_n : \ell^2(G_n) \to \ell^2(\tilde{G}_n)$ ($0 \le n \le N$) commuting with non-linearities and connecting operators in the sense of $\|\tilde{P}_n J_{n-1} f - J_n P_n f\|_{\ell^2(\tilde{G}_n)} = 0$ and $\|\rho_n(J_n f) - J_n \rho_n(f)\|_{\ell^2(\tilde{G}_n)} = 0$. Define $D_n^2 := \sum_{j \in K_{n-1}} \sum_{i \in K_n} K^2_{g^n_{ij}}$ with $K_{g^n_{ij}}$ as in Lemma 5.3. Choose $D$ such that $D_n \le D$ for all $n$. Finally assume that $\|J_n (\omega\mathrm{Id} - T_n)^{-1} - (\omega\mathrm{Id} - \tilde{T}_n)^{-1} J_n\|_{op} \le \epsilon$. If filters in $\mathscr{F}^{cont}_{\omega,C}$ are used, assume additionally that $\|J_n ((\omega\mathrm{Id} - T_n)^{-1})^* - ((\omega\mathrm{Id} - \tilde{T}_n)^{-1})^* J_n\|_{op} \le \epsilon$. Then we have for all $f \in \mathscr{L}_{in}$, and with $\mathscr{J}_n$ the operator that the $K_n$ copies of $J_n$ induce through concatenation, that

$$\|\tilde{\Phi}_N(\mathscr{J}_0 f) - \mathscr{J}_N \Phi_N(f)\|_{\tilde{\mathscr{L}}_{out}} \le N \cdot DRL \cdot (BRL)^{N-1} \cdot \|f\|_{\mathscr{L}_{in}} \cdot \epsilon.$$
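For filters in $\mathscr{F}^{hol}_{\omega,C}$ the single-filter estimate of Lemma 5.3 can be checked directly, since it follows from telescoping powers of the resolvents. The sketch below is an illustration under assumptions: two directed graphs on the same node set (so that $J = \mathrm{Id}$), out-degree Laplacians as characteristic operators, $\omega = -1$, hypothetical coefficients `b`, and $C$ taken as the larger of the two numerically computed resolvent norms.

```python
import numpy as np

def resolvent(T, omega):
    return np.linalg.inv(omega * np.eye(T.shape[0]) - T)   # R_omega = (omega Id - T)^{-1}

def hol_filter(T, omega, b):
    """g(T) = sum_k b[k] (T - omega Id)^{-k} = sum_k b[k] (-R_omega)^k."""
    S = -resolvent(T, omega)
    return sum(bk * np.linalg.matrix_power(S, k) for k, bk in enumerate(b))

# Two directed graphs with slightly different edge weights (assumed example).
W = np.array([[0., 2., 0., 1.],
              [0., 0., 3., 0.],
              [1., 0., 0., 2.],
              [0., 1., 0., 0.]])
dW = 0.05 * np.array([[0., 1., 0., -1.],
                      [0., 0., 1., 0.],
                      [-1., 0., 0., 1.],
                      [0., 1., 0., 0.]])
lap = lambda A: np.diag(A.sum(axis=1)) - A      # out-degree Laplacian, not normal
T, T_tilde = lap(W), lap(W + dW)

omega, b = -1.0, [0.3, 1.0, -0.5, 0.2]
R, R_tilde = resolvent(T, omega), resolvent(T_tilde, omega)
C = max(np.linalg.norm(R, 2), np.linalg.norm(R_tilde, 2))
K_g = sum(abs(bk) * k * C ** (k - 1) for k, bk in enumerate(b) if k >= 1)

lhs = np.linalg.norm(hol_filter(T, omega, b) - hol_filter(T_tilde, omega, b), 2)
rhs = K_g * np.linalg.norm(R_tilde - R, 2)      # K_g * ||R~ J - J R|| with J = Id
```

The constant term $b_0$ drops out of the difference, which is why only the semi-norm (summing from $k = 1$) enters `K_g`.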

5.2. EXEMPLARY APPLICATIONS

Collapsing Strong Edges: We first pick our example from the beginning of Section 5 up again and generalize it significantly: We now consider the graph that we collapse to a single node to be a sub-graph (of strong edges) embedded into a larger graph. Apart from coupled heat reservoirs, this setting also captures e.g. the grouping of close-knit communities within social networks into single entities, the scale-transition of changing the description of (the graph of) a molecule from individual atoms interacting via the Coulomb potential $Z_1 Z_2 / R$ (with $R$ the distance and $Z_1, Z_2$ atomic charges) to the interaction of (functional) groups comprised of closely co-located atoms, or spatial networks if weights are set to e.g. inverse distances. In what follows, we shall consider two graphs with vertex sets $G$ and $\tilde{G}$. We consider $G$ to be a subset of the vertex set $\tilde{G}$ and think of the graph corresponding to $G$ as arising in a collapsing procedure from the 'larger' graph $\tilde{G}$. More precisely, we assume that the vertex set $\tilde{G}$ can be split into three disjoint subsets $\tilde{G} = \tilde{G}_{Latin} \cup \tilde{G}_{Greek} \cup \{\star\}$ (c.f. also Fig. 3). We assume that the adjacency matrix $\tilde{W}$, when restricted to Latin vertices or to a Latin vertex and the exceptional node '$\star$', is of order unity ($\tilde{W}_{ab}, \tilde{W}_{a\star} = \mathcal{O}(1)$ for all $a, b \in \tilde{G}_{Latin}$). For Greek indices, we assume that we may write $\tilde{W}_{\alpha\beta} = \frac{\omega_{\alpha\beta}}{\delta}$ and $\tilde{W}_{\alpha\star} = \frac{\omega_{\alpha\star}}{\delta}$ such that $\omega_{\alpha\beta}, \omega_{\alpha\star} = \mathcal{O}(1)$ for all $\alpha, \beta \in \tilde{G}_{Greek}$. We also assume that the sub-graph corresponding to vertices in $\tilde{G}_{Greek} \cup \{\star\}$ is connected. We then take $G = \tilde{G}_{Latin} \cup \{\star\}$ (c.f. again Fig. 3). The adjacency matrix $W$ on this graph is constructed by defining $W_{ab} = \tilde{W}_{ab}$ for all $a, b \in \tilde{G}_{Latin}$ and setting (with $W_{a\star} = W_{\star a}$)

$$W_{\star a} := \tilde{W}_{a\star} + \sum_{\beta \in \tilde{G}_{Greek}} \tilde{W}_{a\beta} \qquad (\forall a \in \tilde{G}_{Latin}).$$

We also allow our graph $\tilde{G}$ to possess node-weights $\{\tilde{\mu}_g\}_{g \in \tilde{G}}$ ... (5)

Given the boundary conditions, what is left to determine in the above optimization program are the 'Greek entries' $\psi^\delta_g(\alpha)$ of each $\psi^\delta_g$.
As Appendix J further elucidates, these can be calculated explicitly and purely in terms of the inverse of $\Delta_{\tilde{G}}$ restricted to Greek indices as well as (sub-)columns of the adjacency matrix $\tilde{W}$. Node-weights on $G$ are then defined as $\mu^\delta_g := \sum_{h \in \tilde{G}} \psi^\delta_g(h) \cdot \tilde{\mu}_h$. We denote the corresponding signal space by $\ell^2(G)$. Importantly, one has $\mu^\delta_a \to \tilde{\mu}_a$ for any Latin index and $\mu^\delta_\star \to \tilde{\mu}_\star + \sum_{\alpha \in \tilde{G}_{Greek}} \tilde{\mu}_\alpha$ as $\delta \to 0$, which recovers our physical intuition about heat reservoirs. To translate signals from $\ell^2(G)$ to $\ell^2(\tilde{G})$ and back, we define two identification operators $J : \ell^2(G) \to \ell^2(\tilde{G})$ and $\tilde{J} : \ell^2(\tilde{G}) \to \ell^2(G)$ via $Jf := \sum_{g \in G} f(g) \cdot \psi^\delta_g$ and $(\tilde{J}u)(g) := \langle u, \psi^\delta_g \rangle_{\ell^2(\tilde{G})} / \mu^\delta_g$ for all $f \in \ell^2(G)$, $u \in \ell^2(\tilde{G})$ and $g \in G$. Our main theorem then states the following:

Theorem 5.6. With definitions and notation as above, there are constants $K_1, K_2 \ge 0$ such that the operators $J$ and $\tilde{J}$ are $(K_1\sqrt{\delta})$-quasi-unitary with respect to $\Delta_{\tilde{G}}$, $\Delta_G$ and $\omega = -1$. Furthermore, the operators $\Delta_{\tilde{G}}$ and $\Delta_G$ are $(-1)$-$(K_2\sqrt{\delta})$-close with identification operator $J$.

Appendix J presents the (fairly involved) proof of this result. Importantly, the size of the constants $K_1, K_2$ is independent of the cardinality (or more precisely the total weight) of $\tilde{G}_{Latin}$, implying that Theorem 5.6 also remains applicable in the realm of large graphs. Finally we note that this stability result is contingent on the use of the (un-normalized) graph Laplacian (c.f. Appendix K):

Theorem 5.7. In the setting of Theorem 5.6 denote by $T$ ($\tilde{T}$) adjacency matrices or normalized graph Laplacians on $\ell^2(G)$ ($\ell^2(\tilde{G})$). There are no functions $\eta_1, \eta_2 : [0, 1] \to \mathbb{R}_{\ge 0}$ with $\eta_i(\delta) \to 0$ as $\delta \to 0$ ($i = 1, 2$), families of identification operators $J^\delta$, $\tilde{J}^\delta$ and $\omega \in \mathbb{C}$ so that $J^\delta$ and $\tilde{J}^\delta$ are $\eta_1(\delta)$-quasi-unitary with respect to $\tilde{T}$, $T$ and $\omega$ while the operators $\tilde{T}$ and $T$ remain $\omega$-$\eta_2(\delta)$-close.
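The construction of the collapsed adjacency matrix can be written down directly. The sketch below builds $W$ on $G = \tilde{G}_{Latin} \cup \{\star\}$ from $\tilde{W}$ and, as a simplification, returns only the $\delta \to 0$ limit of the node weights quoted above; the $\delta$-dependent weights $\mu^\delta_g$ require the functions $\psi^\delta_g$ and are not computed here. The function name `collapse`, the example graph and the index conventions are assumptions of this illustration.

```python
import numpy as np

def collapse(W_tilde, mu_tilde, latin, greek, star):
    """Adjacency matrix on G = Latin + {star} and limiting node weights.
    Index order of the result: Latin nodes first, the exceptional node last."""
    keep = list(latin) + [star]
    W = W_tilde[np.ix_(keep, keep)].copy()
    # W_{star a} := W~_{a star} + sum over Greek beta of W~_{a beta}
    extra = W_tilde[np.ix_(latin, greek)].sum(axis=1)
    W[-1, :-1] += extra
    W[:-1, -1] += extra
    mu = mu_tilde[keep].copy()
    mu[-1] += mu_tilde[greek].sum()     # mu~_star + sum of Greek weights (delta -> 0)
    return W, mu

delta = 1e-2
# Nodes 0, 1: Latin; node 2: star; nodes 3, 4: Greek (strong edges of size 1/delta).
W_tilde = np.array([[0., 1., 1., 2., 0.],
                    [1., 0., 0., 0., 1.],
                    [1., 0., 0., 1 / delta, 1 / delta],
                    [2., 0., 1 / delta, 0., 1 / delta],
                    [0., 1., 1 / delta, 1 / delta, 0.]])
mu_tilde = np.ones(5)
W, mu = collapse(W_tilde, mu_tilde, latin=[0, 1], greek=[3, 4], star=2)
```

The strong $1/\delta$ edges disappear from $W$ entirely; only the $\mathcal{O}(1)$ couplings of Latin nodes to the collapsed cluster are accumulated onto the exceptional node.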
The Realm of Large Graphs: In order to relate our transferability framework to the literature, we consider an 'increasing' sequence of graphs ($G_n \subseteq G_{n+1}$) approximating a limit object, so that the transferability framework of Levie et al. (2019a) is also applicable. We choose the limit object to be the circle of circumference $2\pi$ and our approximating graphs to be the closed path-graph on $N$ vertices, equidistantly embedded into the circle (c.f. Fig. 4). With $h = 2\pi/N$ the node-distance, we set weights to $1/h^2$, ensuring consistency with the 'continuous' Laplacian in the limit $N \to \infty$. More details are presented in Appendix L, which also contains the proof of the corresponding transferability result:

Figure 4: Closed Path-Graphs

Theorem 5.8. In the above setting choose all node-weights equal to one and $N$ to be odd for definiteness. There exist constants $K_1, K_2 = \mathcal{O}(1)$ so that for each $N \ge 1$, there exist identification operators $J$, $\tilde{J}$ mapping between $\ell^2(G_N)$ and $\ell^2(G_{N+1})$ so that $J$ and $\tilde{J}$ are $(K_1/N)$-quasi-unitary with respect to $\Delta_{G_N}$, $\Delta_{G_{N+1}}$ and $\omega = -1$. Furthermore, the operators $\Delta_{G_N}$ and $\Delta_{G_{N+1}}$ are $(-1)$-$(K_2/N)$-close with identification operator $J$.

Lemma 5.3 then implies an $\mathcal{O}(\frac{1}{N})$-decay of $\|g(T)J - Jg(\tilde{T})\|_{op}$ for fixed $g$. This reduces to an O(

6. GRAPH LEVEL STABILITY

To solve tasks such as graph classification or regression over multiple graphs, graphs of varying sizes need to be represented in a common feature space. Here we show that aggregating node-level features into such graph-level features via $p$-norms ($\|f\|_{\ell^p(G)} := (\sum_{g \in G} |f_g|^p \mu_g)^{1/p}$) preserves stability. To this end, let $\mathscr{L}_{out}$ be a target space of a GCN in the sense of (3). On each of the (in total $K_{out}$) $\ell^2(G_{out})$ summands of $\mathscr{L}_{out}$, we may apply the map $f_i \mapsto \|f_i\|_{\ell^p(G_{out})}$. Stacking these maps, we build a map from $\mathscr{L}_{out}$ to $\mathbb{R}^{K_{out}}$. Composing the map $\Phi_N$ associated to an $N$-layer GCN with this map yields a map from $\mathscr{L}_{in}$ to $\mathbb{R}^{K_{out}}$. We denote it by $\Psi^p_N$ and find:

Figure 5: Graph Level Aggregation

Theorem 6.1. For $p \ge 2$ we have in the setting of Theorem 3.1 that $\|\Psi^p_N(f) - \Psi^p_N(h)\|_{\mathbb{R}^{K_{out}}} \le \left(\prod_{n=1}^{N} L_n R_n B_n\right) \cdot \|f - h\|_{\mathscr{L}_{in}}$. In the setting of Theorem 4.3 or 5.4 and under the additional assumption that the 'final' identification operator $J_N$ satisfies $\left|\, \|J_N f_i\|_{\ell^p(\tilde{G}_N)} - \|f_i\|_{\ell^p(G_N)} \right| \le \delta \cdot K \cdot \|f_i\|_{\ell^2(G_N)}$ for all $f_i \in \ell^2(G_N)$, we have $\|\Psi^p_N(f) - \tilde{\Psi}^p_N(\mathscr{J}_0 f)\|_{\mathbb{R}^{K_{out}}} \le (N \cdot DRL + K (BRL)) \cdot (BRL)^{N-1} \cdot \|f\|_{\mathscr{L}_{in}} \cdot \delta$.

Derived stability results thus persist (under mild assumptions) if graph-level features are aggregated via $p$-norms. Appendix M contains the corresponding proof.
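The aggregation step itself is non-expansive for $p \ge 2$ when all node weights equal one, because $|\,\|f_i\|_p - \|h_i\|_p| \le \|f_i - h_i\|_p \le \|f_i - h_i\|_2$ on each channel. The following minimal sketch checks this on random signals; the unit weights, the signal sizes and the helper name `aggregate` are assumptions of the example (for weights below one the second inequality can fail).

```python
import numpy as np

def aggregate(f, p, mu=None):
    """Map channels f of shape (K, |G|) to the vector of weighted p-norms in R^K."""
    mu = np.ones(f.shape[1]) if mu is None else mu
    return np.sum(np.abs(f) ** p * mu, axis=1) ** (1.0 / p)

rng = np.random.default_rng(3)
f, h = rng.normal(size=(4, 7)), rng.normal(size=(4, 7))

# Distance of the aggregated feature vectors in R^K for several p >= 2.
gaps = {p: np.linalg.norm(aggregate(f, p) - aggregate(h, p)) for p in (2, 3, 6)}
base = np.linalg.norm(f - h)     # norm of f - h in the direct sum of l^2 spaces
```

Every value in `gaps` is bounded by `base`, so composing a GCN with this aggregation does not increase its Lipschitz constant in this setting.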

7. NUMERICAL RESULTS

We focus on investigating structural perturbations, as corresponding results are most involved and novel: We first consider a graph on 5 nodes with an adjacency matrix $A$ with $\mathcal{O}(1)$-entries (c.f. (30) in Appendix N). We then scale $A$ by $1/\delta_a$ and $1/\delta_b$ (with $\frac{1}{\delta_a} - \frac{1}{\delta_b} = 1$) respectively and consider the norm-difference between associated Laplacians and resolvents. Fig. 6 (a) then illustrates the theoretical result (c.f. Section 5) that resolvent differences, instead of Laplacian differences, capture the convergence behaviour. Embedding the considered graph into a larger graph ($\tilde{W} \in \mathbb{R}^{8 \times 8}$; c.f. (31) in Appendix N), we consider the collapsing edge setting of Section 5.2 in Fig. 6 (b). As expected, the corresponding resolvents do approach each other as $\delta \to 0$. Contrary to the theoretical bound in Lemma 5.3, differences of resolvent-monomials decrease as their power $k$ increases. Beyond small graphs, which are inaccessible to traditional asymptotic methods, our method is also applicable to the large-graph setting: Fig. 7 picks up the example of an 'increasing' graph sequence 'approximating' the circle again. As predicted in Section 5.2, the difference in resolvents decays ($\propto \frac{1}{N}$).

$\tilde{W}_{ij} = Z_i Z_j / \|x_i - x_j\|$ with $Z_i$ ($x_i$) the atomic charge (equilibrium position) of atom $i$. We choose node-weights as $\tilde{\mu}_i = Z_i$ and the Laplacian as characteristic operator. Leading up to Fig. 8 we consider the graph of methane (5 nodes; one Carbon ($Z_1 = 6$) and four Hydrogen nodes ($Z_{i>1} = 1$)) and deflect one of the Hydrogen atoms ($i = 2$) out of equilibrium and along a straight line towards the Carbon atom. We then consider the transferability of the entire GCN between the resulting graph and an effective graph combining Carbon and deflected Hydrogen into a single node '$\star$' with weight $\mu_\star = Z_1 + Z_2 = 7$ located at the equilibrium position of Carbon.
With $J$ translating from effective to original description, we consider $\|\Psi^p_2(f) - \Psi^p_2(Jf)\|_{\mathbb{R}^{16}}$ (averaged over 100 random unit-norm choices of $f$) as a function of $\|x_1 - x_2\|^{-1}$. At equilibrium the transferability error is $\mathcal{O}(1)$. It decreases fast with decreasing carbon-hydrogen distance, with the choice of representation (effective vs. original) quickly becoming insignificant for generated feature vectors.

Figure 8: GCN Transferability
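The first experiment of this section (scaling an adjacency matrix by $1/\delta_a$ and $1/\delta_b$ and comparing Laplacians with resolvents) can be reproduced in spirit with the following sketch. The adjacency matrix below is a hypothetical 5-node cycle, not the matrix (30) of Appendix N, and the evaluation point `omega = -1` is likewise an assumption; any point away from the non-negative Laplacian spectrum would serve.

```python
import numpy as np

# Hypothetical 5-node cycle graph (a stand-in for the matrix in Appendix N).
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
omega = -1.0  # assumption: resolvent evaluated off the Laplacian spectrum

def laplacian(W):
    """Combinatorial graph Laplacian D - W."""
    return np.diag(W.sum(axis=1)) - W

def resolvent(L, omega):
    """(omega * Id - L)^{-1}."""
    return np.linalg.inv(omega * np.eye(len(L)) - L)

def differences(delta_a):
    """Spectral-norm differences of Laplacians and of resolvents."""
    delta_b = delta_a / (1.0 - delta_a)   # enforces 1/delta_a - 1/delta_b = 1
    La, Lb = laplacian(A / delta_a), laplacian(A / delta_b)
    lap_diff = np.linalg.norm(La - Lb, 2)
    res_diff = np.linalg.norm(resolvent(La, omega) - resolvent(Lb, omega), 2)
    return lap_diff, res_diff

deltas = [0.5, 0.2, 0.1, 0.05, 0.01]
results = [differences(d) for d in deltas]
for d, (lap_diff, res_diff) in zip(deltas, results):
    print(f"delta_a={d:5.2f}  Laplacian diff={lap_diff:.4f}  resolvent diff={res_diff:.2e}")
```

In this toy setting the Laplacian difference stays fixed at the norm of the unscaled Laplacian, while the resolvent difference shrinks as $\delta_a$ decreases, in line with the qualitative behaviour described for Fig. 6 (a).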

8. DISCUSSION

A theoretically well-founded framework capturing stability properties of GCNs was developed. We related node-level stability to (spectral) covering properties and edge-level stability to newly introduced semi-norms of the employed filters. For non-normal characteristic operators, tools from complex analysis provided the grounds for the derived stability properties. We introduced a new notion of stability to structural perturbations, highlighted the importance of the resolvent and detailed how the developed line of thought captures relevant settings of structural changes such as the collapse of a strongly connected sub-graph to a node. There, precisely if the graph Laplacian was employed, the transferability error could be bounded in terms of the inverse characteristic coupling strength on the sub-graph.

Hilbert Spaces: To us, a Hilbert space (often denoted by $H$) is a vector space over the complex numbers which also has an inner product, often denoted by $\langle\cdot,\cdot\rangle_H$. [...]

[...] $\oplus_{i=1}^\infty H_i$ is made up of those elements $a = (a_1, a_2, a_3, \dots)$ with $a_i \in H_i$ for which the norm
$$\|a\|^2_{\oplus_{i=1}^\infty H_i} := \sum_{i=1}^\infty \|a_i\|^2_{H_i}$$
is finite. This means for example that the vector $(1, 0, 0, 0, \dots)$ is in $\oplus_{i=1}^\infty \mathbb{C}$, while $(1, 1, 1, 1, \dots)$ is not.

Direct Sums of Maps: Suppose we have two collections of Hilbert spaces $\{H_i\}_{i=1}^\Gamma$, $\{\widetilde H_i\}_{i=1}^\Gamma$ with $\Gamma \in \mathbb{N}$ or $\Gamma = \infty$. Suppose further that for each $i \leq \Gamma$ (resp. $i < \Gamma$) we have a (not necessarily linear) map $J_i : H_i \to \widetilde H_i$. Then the collection $\{J_i\}_{i=1}^\Gamma$ of these 'component' maps induces a 'composite' map
$$\mathscr{J} : \oplus_{i=1}^\Gamma H_i \longrightarrow \oplus_{i=1}^\Gamma \widetilde H_i$$
between the direct sums. Its value on an element $a = (a_1, a_2, a_3, \dots) \in \oplus_{i=1}^\Gamma H_i$ is defined by
$$\mathscr{J}(a) = (J_1(a_1), J_2(a_2), J_3(a_3), \dots) \in \oplus_{i=1}^\Gamma \widetilde H_i.$$
Strictly speaking, one has to be a bit more careful in the case where $\Gamma = \infty$ to ensure that $\|\mathscr{J}(a)\|_{\oplus_{i=1}^\infty \widetilde H_i} \neq \infty$.
This can however be ensured if we have $\|J_i(a_i)\|_{\widetilde H_i} \leq C\|a_i\|_{H_i}$ for all $1 \leq i$ and some $C$ independent of $i$, since then $\|\mathscr{J}(a)\|_{\oplus_{i=1}^\infty \widetilde H_i} \leq C\|a\|_{\oplus_{i=1}^\infty H_i} < \infty$. If each $J_i$ is a linear operator, such a $C$ exists precisely if the operator norms (defined below) of all $J_i$ are bounded by a common constant.

[...] $U^*\Delta U = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, with eigenvalues in $\mathbb{C}$. We call the collection of eigenvalues the spectrum $\sigma(\Delta)$ of $\Delta$. If $\dim H = d$, we may write $\sigma(\Delta) = \{\lambda_i\}_{i=1}^d$. It is a standard exercise to verify that each eigenvalue satisfies $|\lambda_i| \leq \|\Delta\|_{\mathrm{op}}$. Associated to each eigenvalue $\lambda_i$ is an eigenvector $\phi_i$. The collection of all (normalized) eigenvectors forms an orthonormal basis of $H$. We may then write
$$\Delta f = \sum_{i=1}^d \lambda_i \langle\phi_i, f\rangle_H\, \phi_i.$$

Resolvent of an Operator: Given an operator $T$ on some Hilbert space $H$, we have by definition that the operator $(T - z) : H \to H$ is invertible precisely if $z \notin \sigma(T)$. In this case we write $R_z(T) = (z\,\mathrm{Id} - T)^{-1}$ and call this operator the resolvent of $T$ at $z$. If $T$ is normal, it can be proved that the norm of the resolvent satisfies
$$\|R_z(T)\|_{\mathrm{op}} = \frac{1}{\mathrm{dist}(z, \sigma(T))},$$
where $\mathrm{dist}(z, \sigma(T))$ denotes the minimal distance between $z$ and any eigenvalue of $T$. For non-normal operators, one can prove $\|R_z(T)\|_{\mathrm{op}} \leq \gamma_T(z)$ with
$$\gamma_T(z) = \exp\big[2\|T\|_1/d(z, \sigma(T))\big]\big/d(z, \sigma(T)),$$
as is proved in Bandtlow (2004a).

Frobenius Norm: Given two finite-dimensional Hilbert spaces $H_1$ and $H_2$ with orthonormal bases $\{\phi^1_i\}_{i=1}^{d_1}$ and $\{\phi^2_i\}_{i=1}^{d_2}$, the Frobenius norm $\|\cdot\|_F$ of an operator $A : H_1 \to H_2$ may be defined as
$$\|A\|_F^2 := \sum_{i=1}^{d_2}\sum_{j=1}^{d_1} |A_{ij}|^2,$$
with $A_{ij}$ the matrix representation of $A$ with respect to the bases $\{\phi^1_i\}_{i=1}^{d_1}$ and $\{\phi^2_i\}_{i=1}^{d_2}$. It is a standard exercise to verify that this norm is independent of the choice of orthonormal bases and hence invariant under multiplying $A$ with a unitary on either the left or the right side. More precisely, if $U : H_2 \to H_2$ and $V : H_1 \to H_1$ are unitary, we have $\|UAV\|_F^2 = \|A\|_F^2$.
Frobenius norms can be used to transfer Lipschitz continuity properties of complex functions to the setting of functions applied to normal operators:

Lemma A.1. Let $g : \mathbb{C} \to \mathbb{C}$ be Lipschitz continuous with Lipschitz constant $D_g$. This implies
$$\|g(X)J - Jg(Y)\|_F \leq D_g\cdot\|XJ - JY\|_F$$
for normal operators $X$ on $H_2$, $Y$ on $H_1$ and any linear map $J : H_1 \to H_2$.

Proof. This proof is a modified version of the proof in Wihler (2009). Let $V, W$ be unitary (with respect to the respective inner products) operators diagonalizing the normal operators $X$ and $Y$ as
$$V^*XV = \mathrm{diag}(\lambda_1, \dots, \lambda_{d_2}) =: D(X), \qquad W^*YW = \mathrm{diag}(\mu_1, \dots, \mu_{d_1}) =: D(Y).$$
Since the Frobenius norm is invariant under unitary transformations, we find
$$\begin{aligned}
\|g(X)J - Jg(Y)\|_F^2 &= \|g(VD(X)V^*)J - Jg(WD(Y)W^*)\|_F^2\\
&= \|Vg(D(X))V^*J - JWg(D(Y))W^*\|_F^2\\
&= \|g(D(X))V^*JW - V^*JWg(D(Y))\|_F^2\\
&= \sum_{i,j}\Big|\sum_k [g(D(X))]_{ik}[V^*JW]_{kj} - [V^*JW]_{ik}[g(D(Y))]_{kj}\Big|^2\\
&= \sum_{i,j} |[V^*JW]_{ij}|^2\,|g(\lambda_i) - g(\mu_j)|^2\\
&\leq \sum_{i,j} |[V^*JW]_{ij}|^2\, D_g^2\,|\lambda_i - \mu_j|^2\\
&= D_g^2\,\|D(X)V^*JW - V^*JWD(Y)\|_F^2 = D_g^2\,\|XJ - JY\|_F^2.
\end{aligned}$$
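As a numerical sanity check of Lemma A.1, the following sketch compares both sides of the estimate in its intertwined form $\|g(X)J - Jg(Y)\|_F \leq D_g\|XJ - JY\|_F$, which reduces to a bound by $\|X - Y\|_F$ when $J$ is the identity. The matrices are random Hermitian (hence normal) matrices and $g = |\cdot|$ with $D_g = 1$ is chosen purely for illustration; the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(d):
    """Random Hermitian (hence normal) d x d matrix."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

def apply_function(g, X):
    """g(X) for Hermitian X via the spectral decomposition X = V diag(lam) V^*."""
    lam, V = np.linalg.eigh(X)
    return (V * g(lam)) @ V.conj().T      # scales column k of V by g(lam_k)

g, D_g = np.abs, 1.0                      # |.| is Lipschitz with constant 1
d1, d2 = 4, 6
X, Y = random_hermitian(d2), random_hermitian(d1)
J = rng.normal(size=(d2, d1))             # arbitrary linear map H_1 -> H_2

lhs = np.linalg.norm(apply_function(g, X) @ J - J @ apply_function(g, Y), "fro")
rhs = D_g * np.linalg.norm(X @ J - J @ Y, "fro")
print(f"{lhs:.4f} <= {rhs:.4f}")
```

A single random draw is of course no proof; the sketch only makes the roles of $X$, $Y$ and $J$ in the lemma concrete.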

B APPROXIMATING BOUNDED CONTINUOUS FILTERS

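Let us recall Definition 2.1:

Definition B.1. Fix $\omega \in \mathbb{C}$ and $C > 0$. Define the space $F^{\mathrm{cont}}_{\omega,C}$ of continuous filters on $\mathbb{C}\setminus\{\omega, \overline{\omega}\}$ to be the space of multilinear power series
$$g(z) = \sum_{\mu,\nu=0}^\infty a_{\mu\nu}\,(\omega - z)^{-\mu}\,(\overline{\omega} - \overline{z})^{-\nu}$$
for which the norm $\|g\|_{F^{\mathrm{cont}}_{\omega,C}} := \sum_{\mu,\nu=0}^\infty |\mu + \nu|\, C^{\mu+\nu}\, |a_{\mu\nu}|$ is finite.

Denote by $B_\epsilon(\omega) \subseteq \mathbb{C}$ the open ball of radius $\epsilon$ around $\omega$. We now show that for arbitrary $\delta > 0$ and every continuous function $g$ defined on $\mathbb{C}\setminus(B_\epsilon(\omega)\cup B_\epsilon(\overline{\omega}))$ which is regular at infinity (i.e. satisfies $\lim_{r\to+\infty} g(rz) = c \in \mathbb{C}$ independent of which $z \neq 0$ is chosen), there is a function $f \in F^{\mathrm{cont}}_{\omega,C}$ so that $|f(z) - g(z)| \leq \delta$ for all $z \in \mathbb{C}\setminus(B_\epsilon(\omega)\cup B_\epsilon(\overline{\omega}))$. Making use of the Stone-Weierstrass theorem for complex functions, it suffices to prove that the functions in $F^{\mathrm{cont}}_{\omega,C}$ separate points, i.e. that for any two distinct points $z, w$ in $\mathbb{C}\setminus(B_\epsilon(\omega)\cup B_\epsilon(\overline{\omega}))$ there is a function $f \in F^{\mathrm{cont}}_{\omega,C}$ with $f(z) \neq f(w)$. But this is obvious since $z \mapsto (\omega - z)^{-1}$ is injective on $\mathbb{C}\setminus(B_\epsilon(\omega)\cup B_\epsilon(\overline{\omega}))$.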

C COMPLEX ANALYSIS

A general reference for topics discussed in this section is Bak & Newman (2017). For a complex-valued function $f$ of a single complex variable, the derivative of $f$ at a point $z_0 \in \mathbb{C}$ in its domain of definition is defined as the limit
$$f'(z_0) := \lim_{z\to z_0}\frac{f(z) - f(z_0)}{z - z_0}.$$
For this limit to exist, it needs to be independent of the 'direction' in which $z$ approaches $z_0$, which is a stronger requirement than being real-differentiable. A function is called holomorphic on an open set $U$ if it is complex differentiable at every point in $U$. It is called entire if it is complex differentiable at every point in $\mathbb{C}$. Every entire function has an everywhere convergent power series representation
$$g(z) = \sum_{k=0}^\infty a^g_k z^k. \tag{6}$$
If a function $g$ is analytic (i.e. can be expanded into a power series), we have
$$g(\lambda) = -\frac{1}{2\pi i}\oint_S \frac{g(z)}{\lambda - z}\,dz \tag{7}$$
for any circle $S \subseteq \mathbb{C}$ encircling $\lambda$, by Cauchy's integral formula. In fact, the integration contour need not be a circle $S$, but may be the boundary of any so-called Cauchy domain containing $\lambda$:

Definition C.1. A subset $D$ of the complex plane $\mathbb{C}$ is called a Cauchy domain if $D$ is open, has a finite number of components (the closures of any two of which are disjoint) and the boundary $\partial D$ of $D$ is composed of a finite number of closed rectifiable Jordan curves, no two of which intersect.

Equation (7) forms the backbone of complex analysis. Since the integral
$$I := \frac{1}{2\pi i}\oint_{\partial D} g(z)\,(z\,\mathrm{Id} - T)^{-1}\,dz \tag{8}$$
is well defined for holomorphic $g(\cdot)$ and any operator $T$ for which $\sigma(T)$ and $\partial D$ are disjoint (cf. e.g. Post (2012) for details), we can essentially take (8) as a defining equation through which one might apply holomorphic functions to operators. While functions that are everywhere complex differentiable have a series representation according to (6), complex functions that are holomorphic only on $\mathbb{C}\setminus\{\omega\}$ have a series representation (called Laurent series) according to
$$g(z) = \sum_{k=-\infty}^\infty a_k (z - \omega)^k.$$
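The defining contour integral can be checked numerically on a small example. The sketch below assumes a positively oriented circle as boundary $\partial D$ and the normalization $\frac{1}{2\pi i}\oint g(z)(z\,\mathrm{Id} - T)^{-1}dz$; the function name `contour_calculus`, the test matrix and the trapezoidal discretization are illustrative choices, not part of the paper. It compares the integral with direct matrix powers, once for a monomial and once for a resolvent power $(\omega\,\mathrm{Id} - T)^{-k}$ with $\omega$ outside the contour.

```python
import numpy as np

def contour_calculus(g, T, center, radius, num_points=400):
    """Trapezoidal approximation of (1/(2 pi i)) * oint g(z) (z Id - T)^{-1} dz
    over the positively oriented circle |z - center| = radius."""
    n = T.shape[0]
    total = np.zeros((n, n), dtype=complex)
    for theta in 2 * np.pi * np.arange(num_points) / num_points:
        z = center + radius * np.exp(1j * theta)
        # dz = i * radius * exp(i theta) dtheta; the factor i cancels against 1/(2 pi i).
        total += g(z) * np.linalg.inv(z * np.eye(n) - T) * radius * np.exp(1j * theta)
    return total / num_points

# Non-normal test matrix with spectrum {0.5, -0.3, 0.2}, enclosed by the circle |z| = 1.5.
T = np.array([[0.5, 1.0, 0.0],
              [0.0, -0.3, 2.0],
              [0.0, 0.0, 0.2]])
omega, k = 3.0, 2                         # omega lies outside the contour

cube = contour_calculus(lambda z: z ** 3, T, center=0.0, radius=1.5)
res_power = contour_calculus(lambda z: (omega - z) ** (-k), T, center=0.0, radius=1.5)
expected_power = np.linalg.matrix_power(np.linalg.inv(omega * np.eye(3) - T), k)

print(np.max(np.abs(cube - np.linalg.matrix_power(T, 3))))
print(np.max(np.abs(res_power - expected_power)))
```

The trapezoidal rule is used because it converges very quickly for periodic analytic integrands; any other quadrature on the contour would serve the same illustrative purpose.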
If these functions are assumed to be regular at infinity, no terms with positive exponent are permitted and (changing the indexing) we may thus write
$$g(z) = \sum_{k=0}^\infty a_k (z - \omega)^{-k}.$$
Motivated by this, we now prove the following consistency result:

Lemma C.2. With the notation of Section 2 we have for any $k \geq 1$ and $\omega \notin \sigma(T)$ that
$$(\omega\cdot\mathrm{Id} - T)^{-k} = \frac{1}{2\pi i}\oint_{\partial D} (\omega - z)^{-k}\cdot(z\,\mathrm{Id} - T)^{-1}\,dz,$$
where we interpret the left hand side of the equation in terms of inversion and matrix powers.

Proof. We first note that we may write
$$R_\lambda(T) = \sum_{n=0}^\infty (\lambda - \omega)^n (-1)^n R_\omega(T)^{n+1}$$
for $|\lambda - \omega| < \|R_\omega(T)\|_{\mathrm{op}}^{-1}$, using standard results in matrix analysis (namely the 'Neumann Characterisation of the Resolvent', which is obtained by repeated application of a resolvent identity; cf. Post (2012) for more details). We thus find
$$\frac{1}{2\pi i}\oint_{\partial D}\Big(\frac{1}{\omega - z}\Big)^k \frac{1}{z\,\mathrm{Id} - T}\,dz = \frac{1}{2\pi i}\oint_{\partial D}\Big(\frac{1}{\omega - z}\Big)^k \sum_{n=0}^\infty (\omega - z)^n R_\omega(T)^{n+1}\,dz.$$
Using the fact that $\frac{1}{2\pi i}$ [...] as long as $g \in F_{C,\omega}$.

Proof. We first note
$$\Big\|g(\infty)\cdot\mathrm{Id} + \frac{1}{2\pi i}\oint_{\partial D} g(z)\cdot(z\,\mathrm{Id} - T)^{-1}dz\Big\|_{\mathrm{op}} \leq \|g(\infty)\cdot\mathrm{Id}\|_{\mathrm{op}} + \Big\|\frac{1}{2\pi i}\oint_{\partial D} g(z)\cdot(z\,\mathrm{Id} - T)^{-1}dz\Big\|_{\mathrm{op}} \leq |g(\infty)| + \frac{1}{2\pi}\oint_{\partial D} |g(z)|\,\big\|(z\,\mathrm{Id} - T)^{-1}\big\|_{\mathrm{op}}\,d|z|.$$
The first claim thus follows together with $\|R_z(T)\|_{\mathrm{op}} \leq \gamma_T(z)$. The second claim can be derived as follows:
$$\|g(T)\|_{\mathrm{op}} = \Big\|\sum_{k=0}^\infty b^g_k (T - \omega)^{-k}\Big\|_{\mathrm{op}} \leq \sum_{k=0}^\infty |b^g_k|\,\big\|(T - \omega)^{-k}\big\|_{\mathrm{op}} \leq \sum_{k=0}^\infty |b^g_k|\,\gamma_T(\omega)^k \leq \sum_{k=0}^\infty |b^g_k|\,C^k.$$

E PROOF OF THEOREM 3.1 AND TIGHTNESS OF RESULTS

We want to prove the following:

Theorem E.1. With the notation of Section 2 let $\Phi_N : \mathscr{L}_{\mathrm{in}} \to \mathscr{L}_{\mathrm{out}}$ be the map associated to an $N$-layer GCN. We have
$$\|\Phi_N(f) - \Phi_N(h)\|_{\mathscr{L}_{\mathrm{out}}} \leq \Big(\prod_{n=1}^N L_n R_n B_n\Big)\cdot\|f - h\|_{\mathscr{L}_{\mathrm{in}}}$$
with $B_n := \sqrt{\sup_{\lambda\in\sigma(T_n)}\sum_{j\in K_{n-1}}\sum_{i\in K_n} |g^n_{ij}(\lambda)|^2}$ if $T_n$ is normal. For general $T_n$ we have, for all $\{g_{ij}\}$ entire, holomorphic and in $F_{\omega,C}$ respectively:
$$B_n := \begin{cases}
\sum\limits_{k=0}^\infty \sqrt{\sum_{j\in K_{n-1}}\sum_{i\in K_n} |(a^{g^n_{ij}})_k|^2}\cdot\|T_n\|^k_{\mathrm{op}}\\[2mm]
\sqrt{\sum_{j\in K_{n-1}}\sum_{i\in K_n} \|g^n_{ij}(\infty)\|^2} + \frac{1}{2\pi}\oint_\Gamma \gamma_T(z)\sqrt{\sum_{j\in K_{n-1}}\sum_{i\in K_n} |g^n_{ij}(z)|^2}\,d|z|\\[2mm]
\sqrt{\sum_{j\in K_{n-1}}\sum_{i\in K_n} \|g^n_{ij}\|^2_{\omega,C}}
\end{cases}$$

Proof. Given input signals $f, h \in \mathscr{L}_{\mathrm{in}}$, let us (sticking to the notation introduced in Section 2) denote the signal representations in the intermediate layers $\mathscr{L}_n$ by $f^n, h^n \in \mathscr{L}_n$. With the update rule described in Section 2 and the norm induced on each $\mathscr{L}_n$ as described in Appendix A, we then have
$$\begin{aligned}
\|f^{n+1} - h^{n+1}\|^2_{\mathscr{L}_{n+1}} &= \sum_{i=1}^{K_{n+1}}\Big\|\rho_{n+1}\Big(\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j)\Big) - \rho_{n+1}\Big(\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(h^n_j)\Big)\Big\|^2_{\ell^2(G_{n+1})}\\
&\leq L^2_{n+1}\sum_{i=1}^{K_{n+1}}\Big\|\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j) - \sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(h^n_j)\Big\|^2_{\ell^2(G_{n+1})}\\
&= L^2_{n+1}\sum_{i=1}^{K_{n+1}}\Big\|\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})\big[P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big]\Big\|^2_{\ell^2(G_{n+1})}.
\end{aligned}$$

We next note
$$\begin{aligned}
\sum_{i=1}^{K_{n+1}}\Big\|\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})\big[P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big]\Big\|^2_{\ell^2(G_{n+1})}
&\leq \sum_{i=1}^{K_{n+1}}\Big(\sum_{j=1}^{K_n} \|g^{n+1}_{ij}(T_{n+1})\|_{\mathrm{op}}\,\big\|P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big\|_{\ell^2(G_{n+1})}\Big)^2\\
&\leq \Big(\sum_{i=1}^{K_{n+1}}\sum_{j=1}^{K_n} \|g^{n+1}_{ij}(T_{n+1})\|^2_{\mathrm{op}}\Big)\sum_{j=1}^{K_n}\big\|P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big\|^2_{\ell^2(G_{n+1})}\\
&\leq R^2_{n+1}\Big(\sum_{i=1}^{K_{n+1}}\sum_{j=1}^{K_n} \|g^{n+1}_{ij}(T_{n+1})\|^2_{\mathrm{op}}\Big)\|f^n - h^n\|^2_{\mathscr{L}_n},
\end{aligned}$$
where the second to last step is an application of the Cauchy-Schwarz inequality. Proceeding inductively and using our previously established estimates, this proves the claim for all settings in which $T_n$ is not normal (using an additional application of the triangle inequality for the case of holomorphic filters). To prove the claim for normal $T_n$ as well, we note that in this setting we have (writing $(\phi_\alpha, \lambda_\alpha)_{\alpha=1}^{|G|}$ for a normalized eigenvector-eigenvalue sequence of $T_{n+1}$)
$$\begin{aligned}
\sum_{i=1}^{K_{n+1}}\Big\|\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})\big[P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big]\Big\|^2_{\ell^2(G_{n+1})}
&= \sum_{i=1}^{K_{n+1}}\Big\|\sum_{j=1}^{K_n}\sum_\alpha g^{n+1}_{ij}(\lambda_\alpha)\big\langle\phi_\alpha, P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big\rangle_{\ell^2(G_{n+1})}\phi_\alpha\Big\|^2_{\ell^2(G_{n+1})}\\
&= \sum_{i=1}^{K_{n+1}}\sum_\alpha\Big|\sum_{j=1}^{K_n} g^{n+1}_{ij}(\lambda_\alpha)\big\langle\phi_\alpha, P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big\rangle_{\ell^2(G_{n+1})}\Big|^2\\
&\leq \sum_\alpha\Big(\sum_{i,j} |g^{n+1}_{ij}(\lambda_\alpha)|^2\Big)\sum_{j=1}^{K_n}\big|\big\langle\phi_\alpha, P_{n+1}(f^n_j) - P_{n+1}(h^n_j)\big\rangle_{\ell^2(G_{n+1})}\big|^2\\
&\leq B^2_{n+1}R^2_{n+1}\,\|f^n - h^n\|^2_{\mathscr{L}_n}.
\end{aligned}$$
Here we applied Cauchy-Schwarz once more in the second to last step and bounded
$$\sum_{i,j} |g_{ij}(\lambda_\alpha)|^2 \leq \sup_{\lambda\in\sigma(T)}\sum_{i,j} |g_{ij}(\lambda)|^2.$$
To see that these bounds are not necessarily tight, we may simply note that for a simple one-layer network as depicted in Fig. 9 below, the stability estimate can be tightened to $\|\Phi_N(f) - \Phi_N(h)\|_{\mathscr{L}_{\mathrm{out}}} \leq LRB\cdot\|f - h\|_{\mathscr{L}_{\mathrm{in}}}$ with $B := \max_{i=a,b}\big(\sup_{\lambda\in\sigma(T)} |g_i(\lambda)|\big)$, as opposed to $B := \sqrt{\sup_{\lambda\in\sigma(T)}\sum_{i=a,b} |g_i(\lambda)|^2}$, if $T$ is normal, as an easy calculation shows.
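The single-layer estimate for normal characteristic operators can be illustrated numerically. The sketch below is a toy setting rather than the architecture of the paper: it assumes unit node weights, an identity connecting operator ($R = 1$), a ReLU non-linearity ($L = 1$), a random weighted graph Laplacian as $T$, and hypothetical resolvent-type filters $g_{ij}(z) = c_{ij}/(z + 1)$ with random coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K_in, K_out = 6, 3, 2

# Random weighted graph; its Laplacian is symmetric and therefore normal.
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)
T = np.diag(A.sum(axis=1)) - A
lam, phi = np.linalg.eigh(T)

# Hypothetical resolvent-type filters g_ij(z) = c_ij / (z + 1).
c = rng.normal(size=(K_out, K_in))

def g(i, j, z):
    return c[i, j] / (z + 1.0)

def layer(f):
    """One layer: ReLU(sum_j g_ij(T) f_j), with identity connecting operator."""
    out = np.zeros((K_out, n))
    for i in range(K_out):
        for j in range(K_in):
            # g_ij(T) applied through the eigendecomposition of T.
            out[i] += (phi * g(i, j, lam)) @ phi.T @ f[j]
    return np.maximum(out, 0.0)

# B = sqrt( sup over the spectrum of sum_{i,j} |g_ij(lambda)|^2 ).
B = np.sqrt(max(sum(g(i, j, l) ** 2 for i in range(K_out) for j in range(K_in))
                for l in lam))

f, h = rng.normal(size=(K_in, n)), rng.normal(size=(K_in, n))
lhs = np.linalg.norm(layer(f) - layer(h))
rhs = B * np.linalg.norm(f - h)           # L = R = 1 in this toy setting
print(f"{lhs:.4f} <= {rhs:.4f}")
```

The gap between the two printed numbers gives a feeling for how conservative the bound is on a random instance.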

F PROOF OF LEMMA 4.2

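We want to prove Lemma 4.2.

Proof. Let us first verify the claim for entire $g$. We first note that
$$\widetilde T^k J - JT^k = \widetilde T^{k-1}(\widetilde TJ - JT) + (\widetilde T^{k-1}J - JT^{k-1})T = \widetilde T^{k-1}(\widetilde TJ - JT) + \widetilde T^{k-2}(\widetilde TJ - JT)T + (\widetilde T^{k-2}J - JT^{k-2})T^2.$$
Iterating this decomposition, with $\|T\|_{\mathrm{op}}, \|\widetilde T\|_{\mathrm{op}} \leq C$, we find
$$\|\widetilde T^k J - JT^k\|_{\mathrm{op}} \leq kC^{k-1}\|\widetilde TJ - JT\|_{\mathrm{op}}.$$
The claim now follows from applying the triangle inequality to the power series representation of $g$. Now let us prove the bound for holomorphic $g$. We first note the following:
$$\frac{1}{\widetilde T - z}(\widetilde TJ - JT)\frac{1}{T - z} = \frac{1}{\widetilde T - z}\widetilde TJ\frac{1}{T - z} - \frac{1}{\widetilde T - z}JT\frac{1}{T - z} = \Big[\frac{1}{\widetilde T - z}(\widetilde T - z)J + \frac{z}{\widetilde T - z}\Big]\frac{1}{T - z} - \frac{1}{\widetilde T - z}\Big[\frac{1}{T - z}(T - z)J + \frac{z}{T - z}\Big] = z\Big(J\frac{1}{T - z} - \frac{1}{\widetilde T - z}J\Big).$$
Thus we have
$$\|g(\widetilde T)J - Jg(T)\|_{\mathrm{op}} \leq \frac{1}{2\pi}\oint_{\partial D}\frac{1}{|z|}\|R_z(T)\|_{\mathrm{op}}\|R_z(\widetilde T)\|_{\mathrm{op}}|g(z)|\,d|z|\cdot\|\widetilde TJ - JT\|_{\mathrm{op}} \leq \frac{1}{2\pi}\oint_{\partial D}\frac{1}{|z|}\gamma_T(z)\gamma_{\widetilde T}(z)|g(z)|\,d|z|\cdot\|\widetilde TJ - JT\|_{\mathrm{op}}.$$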
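The monomial estimate $\|\widetilde T^kJ - JT^k\|_{\mathrm{op}} \leq kC^{k-1}\|\widetilde TJ - JT\|_{\mathrm{op}}$ used for entire filters can be checked on random matrices. In the sketch below, `T`, `T_tilde` and `J` are arbitrary random matrices of compatible sizes (purely hypothetical stand-ins for characteristic and identification operators), and $C$ is taken as the larger of the two operator norms.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 4
T = rng.normal(size=(n, n))               # stand-in for T on the original graph
T_tilde = rng.normal(size=(m, m))         # stand-in for T on the second graph
J = rng.normal(size=(m, n))               # hypothetical identification operator

def op(M):
    """Operator (spectral) norm."""
    return np.linalg.norm(M, 2)

C = max(op(T), op(T_tilde))
base = op(T_tilde @ J - J @ T)

def commutator_gap(k):
    """|| T_tilde^k J - J T^k ||_op."""
    Tk = np.linalg.matrix_power(T, k)
    Ttk = np.linalg.matrix_power(T_tilde, k)
    return op(Ttk @ J - J @ Tk)

# Pairs (left hand side, right hand side) of the estimate for k = 1, ..., 6.
checks = [(commutator_gap(k), k * C ** (k - 1) * base) for k in range(1, 7)]
for k, (lhs, rhs) in enumerate(checks, start=1):
    print(f"k={k}: {lhs:.3e} <= {rhs:.3e}")
```

For $k = 1$ both sides coincide; for larger $k$ the right hand side typically overestimates considerably, since it bounds every power of $T$ and $\widetilde T$ by the same constant $C$.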

G PROOF OF THEOREM 4.3

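We prove the following generalization of Theorem 4.3:

Theorem G.1. Let $\Phi_N, \widetilde\Phi_N$ be the maps associated to $N$-layer graph convolutional networks with the same non-linearities and functional calculus filters, but based on different graph signal spaces $\ell^2(G_n)$, $\ell^2(\widetilde G_n)$, characteristic operators $T_n, \widetilde T_n$ and connecting operators $P_n, \widetilde P_n$. Assume $B_n, \widetilde B_n \leq B$ as well as $R_n, \widetilde R_n \leq R$ and $L_n \leq L$ for some $B, R, L > 0$ and all $n \geq 0$. Assume that there are identification operators $J_n : \ell^2(G_n) \to \ell^2(\widetilde G_n)$ ($0 \leq n \leq N$) almost commuting with non-linearities and connecting operators in the sense of
$$\|\widetilde P_n J_{n-1} f - J_n P_n f\|_{\ell^2(\widetilde G_n)} \leq \delta_2\|f\|_{\ell^2(G_n)} \quad\text{and}\quad \|\rho_n(J_n f) - J_n\rho_n(f)\|_{\ell^2(\widetilde G_n)} \leq \delta_1\|f\|_{\ell^2(G_n)}.$$
Depending on whether normal or arbitrary characteristic operators are used, define $D_n^2 := \sum_{j\in K_{n-1}}\sum_{i\in K_n} D^2_{g^n_{ij}}$ or $D_n^2 := \sum_{j\in K_{n-1}}\sum_{i\in K_n} K^2_{g^n_{ij}}$. Choose $D$ such that $D_n \leq D$ for all $n$. Finally assume that $\|J_nT_n - \widetilde T_nJ_n\|_* \leq \delta$, with $* = F$ if both operators are normal and $* = \mathrm{op}$ otherwise. Then we have for all $f \in \mathscr{L}_{\mathrm{in}}$, and with $\mathscr{J}_N$ the operator that the $K_N$ copies of $J_N$ induce through concatenation, that
$$\|\widetilde\Phi(J_0 f) - \mathscr{J}_N\Phi(f)\|_{\widetilde{\mathscr{L}}_{\mathrm{out}}} \leq N\cdot[RLD\delta + \delta_1BR + \delta_2BL]\cdot(BRL)^{N-1}\cdot\|f\|_{\mathscr{L}_{\mathrm{in}}}.$$

Proof. For simplicity in notation, let us denote the hidden representation of $J_0 f$ in $\widetilde{\mathscr{L}}_n$ by $\widetilde f^n$. We then note the following:
$$\begin{aligned}
\|\mathscr{J}_{n+1}f^{n+1} - \widetilde f^{n+1}\|_{\widetilde{\mathscr{L}}_{n+1}}
&= \Big(\sum_{i=1}^{K_{n+1}}\Big\|J_{n+1}\rho_{n+1}\Big(\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j)\Big) - \rho_{n+1}\Big(\sum_{j=1}^{K_n} g^{n+1}_{ij}(\widetilde T_{n+1})\widetilde P_{n+1}(\widetilde f^n_j)\Big)\Big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12}\\
&\leq \Big(\sum_{i=1}^{K_{n+1}}\Big\|J_{n+1}\rho_{n+1}\Big(\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j)\Big) - \rho_{n+1}\Big(J_{n+1}\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j)\Big)\Big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12}\\
&\quad + L\Big(\sum_{i=1}^{K_{n+1}}\Big\|J_{n+1}\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j) - \sum_{j=1}^{K_n} g^{n+1}_{ij}(\widetilde T_{n+1})\widetilde P_{n+1}(\widetilde f^n_j)\Big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12}.
\end{aligned}$$
We can bound the first term by $\delta_1 B\cdot R\cdot(BRL)^n\cdot\|f\|_{\mathscr{L}_{\mathrm{in}}}$. For the second term we find
$$\begin{aligned}
&L\Big(\sum_{i=1}^{K_{n+1}}\Big\|J_{n+1}\sum_{j=1}^{K_n} g^{n+1}_{ij}(T_{n+1})P_{n+1}(f^n_j) - \sum_{j=1}^{K_n} g^{n+1}_{ij}(\widetilde T_{n+1})\widetilde P_{n+1}(\widetilde f^n_j)\Big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12}\\
&\quad\leq L\Big(\sum_{i=1}^{K_{n+1}}\Big\|\sum_{j=1}^{K_n}\big(J_{n+1}g^{n+1}_{ij}(T_{n+1}) - g^{n+1}_{ij}(\widetilde T_{n+1})J_{n+1}\big)P_{n+1}(f^n_j)\Big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12} + LB\Big(\sum_{j=1}^{K_n}\big\|J_{n+1}P_{n+1}(f^n_j) - \widetilde P_{n+1}(\widetilde f^n_j)\big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12}.
\end{aligned}$$
Arguing as in the proof of Theorem 3.1, we can bound the first term by $LD\cdot\delta R\cdot(BRL)^n\|f\|_{\mathscr{L}_{\mathrm{in}}}$. For the second term we find
$$LB\Big(\sum_{j=1}^{K_n}\big\|J_{n+1}P_{n+1}(f^n_j) - \widetilde P_{n+1}(\widetilde f^n_j)\big\|^2_{\ell^2(\widetilde G_{n+1})}\Big)^{\frac12} \leq LB\delta_2(BRL)^n\|f\|_{\mathscr{L}_{\mathrm{in}}} + LBR\,\|\mathscr{J}_nf^n - \widetilde f^n\|_{\widetilde{\mathscr{L}}_n},$$
arguing as above and using $\|\widetilde P_{n+1}\|_{\mathrm{op}} \leq R$. Iterating from $n = N$ to $n = 0$ then yields the claim.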
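The final iteration can be made explicit as follows. The shorthand $e_n$, $q$ and $a$ is introduced here only for illustration and does not appear in the main text; the factor $q = BRL$ in front of $e_n$ collects the Lipschitz constant of the non-linearity, the filter bound and the bound on the connecting operator acting on the previous-layer error.

```latex
% Shorthand (introduced for this illustration only):
%   e_n := \| \mathscr{J}_n f^n - \tilde f^n \|_{\tilde{\mathscr{L}}_n},
%   q   := BRL,
%   a   := RLD\delta + \delta_1 BR + \delta_2 BL.
\begin{align*}
e_{n+1} &\le a\, q^{n}\, \|f\|_{\mathscr{L}_{\mathrm{in}}} + q\, e_n,
  \qquad e_0 = 0 \quad (\text{since } \tilde f^0 = J_0 f),\\
e_N &\le \sum_{n=0}^{N-1} q^{\,N-1-n}\, a\, q^{n}\, \|f\|_{\mathscr{L}_{\mathrm{in}}}
  = N\, a\, q^{N-1}\, \|f\|_{\mathscr{L}_{\mathrm{in}}}.
\end{align*}
```

Each of the $N$ layers thus contributes the same amount $a\,q^{N-1}\|f\|_{\mathscr{L}_{\mathrm{in}}}$, which explains the linear dependence on $N$ in front of the bracket in the theorem.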

H TRANSFERABILITY: GENERAL CONSIDERATIONS

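We first prove the statement made at the beginning of Section 5 that $\|(\omega\,\mathrm{Id} - \Delta_{\delta_b})^{-1} - (\omega\,\mathrm{Id} - \Delta_{\delta_a})^{-1}\|_{\mathrm{op}} = \mathcal{O}(\delta_a\cdot\delta_b)$. To this end denote the increasing sequence of eigenvalues (counted without multiplicity) of $\Delta_1$ by $\{\lambda_i\}_{i=0}^M$. Recall that $\lambda_0 = 0$. Denote the sequence of projections on the corresponding eigenspaces by $\{P_i\}_{i=0}^M$. We have for the resolvent that
$$\frac{1}{\omega\,\mathrm{Id} - \Delta_\delta} = \frac{1}{\omega\,\mathrm{Id} - \frac{1}{\delta}\cdot\Delta_1} = \sum_{i=0}^M \frac{1}{\omega - \frac{1}{\delta}\lambda_i}P_i.$$
Thus we have for $\delta_a, \delta_b$ small enough that
$$\Big\|\frac{1}{\omega\,\mathrm{Id} - \Delta_{\delta_a}} - \frac{1}{\omega\,\mathrm{Id} - \Delta_{\delta_b}}\Big\|_{\mathrm{op}} = \Big|\frac{1}{\omega - \frac{1}{\delta_a}\lambda_1} - \frac{1}{\omega - \frac{1}{\delta_b}\lambda_1}\Big| = \Big|\lambda_1\frac{\frac{1}{\delta_a} - \frac{1}{\delta_b}}{(\omega - \frac{1}{\delta_a}\lambda_1)(\omega - \frac{1}{\delta_b}\lambda_1)}\Big| = \lambda_1\frac{1}{|(\omega - \frac{1}{\delta_a}\lambda_1)(\omega - \frac{1}{\delta_b}\lambda_1)|} = \mathcal{O}(\delta_a\cdot\delta_b).$$
Next we note the convergence $(\omega\,\mathrm{Id} - \Delta_\delta)^{-1} \to P_0\cdot(\omega - 0)^{-1}$. But this is obvious, since for $\lambda_i \neq 0$ we have $\frac{1}{\omega - \lambda_i/\delta} \to 0$ as $\delta \to 0$.

I PROOFS OF LEMMA 5.3 AND THEOREM 5.4

Lemma I.1. Let $T$ and $\widetilde T$ be characteristic operators on $\ell^2(G)$ and $\ell^2(\widetilde G)$ respectively. If these operators are $\omega$-$\delta$-close with identification operator $J$, and $\|R_\omega\|_{\mathrm{op}}, \|\widetilde R_\omega\|_{\mathrm{op}} \leq C$, we have
$$\|Jg(T) - g(\widetilde T)J\|_{\mathrm{op}} \leq K_g\cdot\|\widetilde R_\omega J - JR_\omega\|_{\mathrm{op}}$$
with $K_g = \oint_{\partial D}(1 + |z - \omega|\gamma_T(z))(1 + |z - \omega|\gamma_{\widetilde T}(z))|g(z)|\,d|z|$ if $g$ is holomorphic and $K_g = \|g\|_{F^{\mathrm{hol}}_{\omega,C}}$ if $g \in F^{\mathrm{hol}}_{\omega,C}$. If $T$ and $\widetilde T$ are normal as well as doubly $\omega$-$\delta$-close and $g \in F^{\mathrm{cont}}_{\omega,C}$, we have $K_g = \|g\|_{F^{\mathrm{cont}}_{\omega,C}}$.

Proof. We first deal with the statement concerning holomorphic $g$. To this end we note that Lemma 4.5.9 of Post (2012) proves
$$\|\widetilde R_zJ - JR_z\|_{\mathrm{op}} \leq (1 + |z - \omega|\gamma_T(z))(1 + |z - \omega|\gamma_{\widetilde T}(z))\cdot\|\widetilde R_\omega J - JR_\omega\|_{\mathrm{op}}.$$
The claim then follows from
$$\|Jg(T) - g(\widetilde T)J\|_{\mathrm{op}} \leq \frac{1}{2\pi}\oint_{\partial D}|g(z)|\,\|\widetilde R_zJ - JR_z\|_{\mathrm{op}}\,d|z|.$$
For $g \in F^{\mathrm{hol}}_{\omega,C}$ the claim is proved exactly as in the proof of Lemma 2.3. For $g \in F^{\mathrm{cont}}_{\omega,C}$ we note that
$$(\widetilde R_\omega)^\mu(\widetilde R^*_\omega)^\nu J - J(R_\omega)^\mu(R^*_\omega)^\nu = (\widetilde R_\omega)^\mu\big[(\widetilde R^*_\omega)^\nu J - J(R^*_\omega)^\nu\big] + \big[(\widetilde R_\omega)^\mu J - J(R_\omega)^\mu\big](R^*_\omega)^\nu.$$
Together with the result $\|\widetilde T^kJ - JT^k\|_{\mathrm{op}} \leq kC^{k-1}\|\widetilde TJ - JT\|_{\mathrm{op}}$ established in the proof of Lemma 4.2, the claim then follows from the triangle inequality together with the definition of the semi-norm $\|g\|_{F^{\mathrm{cont}}_{\omega,C}}$.

As in the previous section, we state a slightly more general version of the main theorem of this section:

Theorem I.2. Let $\Phi, \widetilde\Phi$ be the maps associated to $N$-layer graph convolutional networks with the same non-linearities and functional calculus filters, but based on different graph signal spaces $\ell^2(G_n)$, $\ell^2(\widetilde G_n)$, characteristic operators $T_n, \widetilde T_n$ and connecting operators $P_n, \widetilde P_n$. Assume $B_n, \widetilde B_n \leq B$ as well as $R_n, \widetilde R_n \leq R$ and $L_n \leq L$ for some $B, R, L > 0$ and all $n \geq 0$. Assume that there are identification operators $J_n : \ell^2(G_n) \to \ell^2(\widetilde G_n)$ ($0 \leq n \leq N$) almost commuting with non-linearities and connecting operators in the sense of
$$\|\widetilde P_nJ_{n-1}f - J_nP_nf\|_{\ell^2(\widetilde G_n)} \leq \delta_2\|f\|_{\ell^2(G_n)} \quad\text{and}\quad \|\rho_n(J_nf) - J_n\rho_n(f)\|_{\ell^2(\widetilde G_n)} \leq \delta_1\|f\|_{\ell^2(G_n)}.$$
Define $D_n^2 := \sum_{j\in K_{n-1}}\sum_{i\in K_n} K^2_{g^n_{ij}}$ with $K_{g^n_{ij}}$ as in Lemma 5.3. Choose $D$ such that $D_n \leq D$ for all $n$. Finally assume that $\|J_n(\omega\,\mathrm{Id} - T_n)^{-1} - (\omega\,\mathrm{Id} - \widetilde T_n)^{-1}J_n\|_{\mathrm{op}} \leq \delta$. If filters in $F^{\mathrm{cont}}_{\omega,C}$ are used, assume additionally that $\|J_n((\omega\,\mathrm{Id} - T_n)^{-1})^* - ((\omega\,\mathrm{Id} - \widetilde T_n)^{-1})^*J_n\|_{\mathrm{op}} \leq \delta$. Then we have for all $f \in \mathscr{L}_{\mathrm{in}}$, and with $\mathscr{J}_N$ the operator that the $K_N$ copies of $J_N$ induce through concatenation, that
$$\|\widetilde\Phi(J_0f) - \mathscr{J}_N\Phi(f)\|_{\widetilde{\mathscr{L}}_{\mathrm{out}}} \leq N\cdot[RLD\delta + \delta_1BR + \delta_2BL]\cdot(BRL)^{N-1}\cdot\|f\|_{\mathscr{L}_{\mathrm{in}}}.$$

Proof. The proof proceeds in complete analogy to the one of Theorem 4.3.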

J COLLAPSING STRONG EDGES: PROOFS AND FURTHER DETAILS

We utilize the notation introduced in Section 5.2. Beyond this, we denote the positive semi-definite form induced by the energy functional E As a first step we note that all entries of ψ g are real and non-negative, which follows since each summand in ( 9) is non-increasing under the map u Þ Ñ |u| due to the reverse triangle ||a|´|b|| ď |a´b|. To find the explicit form of ψ g , fix g P r G Latin Ť t‹u and denote by χ g P 2 p r Gq the signal defined by setting it to χ η phq " δ hg for h P r G Latin Ť t‹u and η g pαq " η α g with tη α g u αP r G Greek a set of | r G Greek | free parameters in R ď0 . We then have E r G pχ g q "2 ÿ aP r GLatin Ă W ag `2 ÿ αP r GGreek Ă W αg |1 ´ηα g | 2 `2 ÿ αP r GGreek bP r GLatin Ť t‹u Ă W αb |η α g | 2 `ÿ α,βP r GGreek Ă W αβ |η α g ´ηβ g | 2 . By definition, χ g depends smoothly on the parameters tη α g u αP r G Greek . Finding the minimizer of the convex optimization program ( 5) is then equivalent to finding the values tη α g u αP r G Greek at which we have BE r G pχ g q Bη α g " 0. We note 1 4 BE r G pχ g q Bη ξ g " ¨Ă W gξ `ÿ aP r GLatin a‰g Ť t‹u Ă W gξ `ÿ αP r GGreek Ă W αg ‹ ‹ ‚ η g ξ ´ÿ αP r GGreek Ă W αg η g α ´Ă W gξ Collecting these equations for all parameters into a matrix equation, we find that the 'Greek entries' of the vector ψ g are given explicitly by ¨ψg pαq ψ g pβq . . . ‹ ‚" ¨r d α ´Ă W αβ . . . ´Ă W βα r d β . . . . . . . . . . . . ‹ ‹ ‚ ´1 ¨¨Ă W gα Ă W gβ . . . ‹ ‚, with degrees in r G denoted by r d α . Let us denote the restriction of ψ δ g to Greek entries, thought of as a vector in C | r G Greek | by η δ g . Given the degree r d α corresponding to a Greek index, we decompose it as r d α " r d r α `Ă W α‹ `Vα with r d r α accounting for edges from α to other greek vertices r d r α " ÿ βP r G Greek Ă W αβ " 1 δ ÿ βP r G Greek ω αβ , and V α accounting for edges from α to Latin vertices V α " ÿ aP r GLatin Ă W aα . Recall that we also may write Ă W α‹ " 1 δ ω α‹ . We may then write ¨r d α ´Ă W αβ . . . 
´Ă W βα r d β . . . . . . . . . . . . ‹ ‹ ‚ " ¨r d r α ´Ă W αβ . . . ´Ă W βα r d r β . . . . . . . . . . . . ‹ ‹ ‚ `1 δ ¨ωα‹ 0 . . . 0 ω β‹ . . . . . . . . . . . . ‹ ‹ ‚ `¨V α 0 . . . 0 V β . . . . . . . . . . . . ‹ ‹ ‚ ": 1 δ L `1 δ diagp ω ‹ q `V, where we made the obvious definitions for the matrices L and V and denoted by ω ‹ the vector with entries ω α‹ . Let us also use the notation h :" L `diagpω ‹ q. Next we want to establish that h is invertible. For this we first note that that L is the graph Laplacian of the subgraph r G Greek ; which we assume to be connected. Hence L is positive semi-definite with the eigenspace corresponding to the eigenvalue zero being spanned by (entry-wise) constant vectors. Since all entries of ω ‹ are non-negative, the operator h is also positive semi-definite. Since we assume that the vertex ‹ is connected to at least one other vertex in r G Greek , there is at least one entry in ω ‹ that is strictly greater than zero. We show that this already implies that h is in fact also positive definite and hence invertible. Indeed, for any v P C | r G Greek | we have x v, L ¨ vy C | Ă G Greek | " x v, h ¨ vy C | Ă G Greek | `x v, diagp ω ‹ q ¨ vy C | Ă G Greek | . Both terms on the right hand side are non-negative. If v is a constant (non-zero) vector, the first term vanishes, but since at least one entry of ω ‹ is strictly positive, with all others being non-negative, the second term on the right hand side is strictly positive. If v is non-constant, the first term on the right hand side is larger than zero. Hence h is positive definite and thus invertible. Similarly one proves that (for any δ ě 0) the operator h `δV is positive definite and hence invertible. Thus we now know that the operator 10) is indeed invertible. We note (again with the restriction of ψ δ g to Greek entries thought of as a vector in C | r G Greek | denoted by η δ g ) that we may equivalently write (10) as 1 δ ph `δV q " ¨r d α ´Ă W αβ . . . ´Ă W βα r d β . . . . . . 
. . . . . . ‹ ‹ ‚ utilized in ( ph `δV q ´1 η δ g " δ Ă W g and Ă W g :" ¨Ă W gα Ă W gβ . . . ‹ ‚ thought of as an element of C | r G Greek | . To proceed, we now first focus on the case g " ‹, for which we may write (11) equivalently as ph `δV q ´1 η δ ‹ " ω ‹ . (12) Since ω ‹ is independent of δ, we may take the limit δ Ñ 0 and arrive at pL `diagp ω ‹ qq η 0 ‹ " ω ‹ which is uniquely solved by η 0 ‹ " p1, 1, 1, ....q " 1 Greek . Since we assume δ ! 1, we can now investigate the solution η δ g for non-zero δ through perturbation theory. We write η δ ‹ " 1 r GGreek ´ ζ δ ‹ with ζ 0 ‹ " 0 and find from (12) -using h ¨1Greek " η δ ‹ -the defining equation ζ δ ‹ " δph `δV q ´1 ¨V ¨1 r GGreek . From this we obtain the estimate } ζ δ ‹ } 2 p r GGreekq ď }ph `δV q} op ¨}V ¨1 r GGreek } 2 p r GGreekq ¨δ, where we denote by 2 p r G Greek q the space graph signal space C | r G Greek | equipped with node weights tr µ g u gP r G Greek . We note that both h and V are positive semi-definite and we thus obtain λ min phq ď λ min ph `δV q for the minimal eigenvalues of the respective operators. Hence }ph `δV q ´1} op ď }h ´1} op , and thus also } ζ δ ‹ } 2 p r GGreekq ď }h ´1} op ¨}V ¨1 r GGreek } 2 p r GGreekq looooooooooooooooomooooooooooooooooon ":K ¨δ. Since }h ´1} op " 1{λ min phq we may write K " }V ¨1Greek } 2 p r GGreekq λ min phq . From ( 11) we know that for g ‰ ‹ we have η δ g " 0. We now also want to bound } η δ g } 2 p r GGreekq in terms of δ. We will do this by establishing the relationship ÿ gP r GLatin η δ g " ζ δ ‹ . and then utilizing our estimate on } ζ δ ‹ } 2 p r GGreekq established above. To prove (15), we will need the concept of harmonic extensions: Definition J.2. Denote by 2 p r G Latin Y t‹uq the graph signal space C | r GLatinYt‹u| equipped with the node weights tr µ g u gP r GLatinYt‹u . 
Given an arbitrary signal u P 2 p r G Latin Y t‹uq a harmonic extension of u to all of 2 p r Gq is a signal u P 2 p r Gq satisfying p∆ r G uqpαq " 0 @α P r G Greek and uphq " uphq @ h P r G Latin ď t‹u. We first note that the concept of harmonic extensions is both well-defined an well-behaved: Lemma J.3. Fix u P 2 p r G Latin Y t‹uq. There exists a unique harmonic extension u P 2 p r Gq of u. It is given as the solution to the convex optimization program min E r G puq subject to uphq " δ hg for all h P r G Latin ď t‹u. Furthermore if u and v are the harmonic extensions of u and v, then pu `vq is the (unique) harmonic extension of pu `vq. Proof. We write a signal ψ P 2 p r Gq as ψ " pψ, ηq with ψ P 2 p r G Latin Y t‹uq and η P 2 p r G Greek q. We then notice ψ " argminE r G puq subject to ψphq " ψphq for all h P r G Latin ď t‹u ô BE r G pψq Bη α " 0 @α P r G Greek and ψphq " ψphq for all h P r G Latin ď t‹u ô ÿ yP r G Ă W αy pψpαq ´ψpyqq " 0 @α P r G Greek and ψphq " ψphq for all h P r G Latin ď t‹u ôp∆ r G ψqpαq " 0 @α P r G Greek and ψphq " ψphq for all h P r G Latin ď t‹u. Here, we treated η α and its complex conjugate as independent variables and used that E r G p¨q is a real-valued functional for the first equivalence. As harmonic extensions are thus equivalently characterised as the solutions of convex minimization programs, they are unique. To prove the last statement, we note that by linearity of the graph Laplacian, pu `vq certainly is a harmonic extension of pu `vq. Since harmonic extensions are unique, it is the only one. After this preparatory effort, we are now ready to prove (15): Lemma J.4. For any δ ě 0 the signals t η δ g u gP r GLatin Ť t‹u form a partition of unity of 2 p r G Greek q: ÿ gP r GLatin Ť t‹u η δ g " 1 r GGreek Equivalently we have ÿ gP r GLatin η δ g " ζ δ ‹ . As an immediate Corollary we obtain Corollary J.5. 
For any δ ě 0 the signals tψ δ g u gP r GLatin Ť t‹u form a partition of unity of 2 p r Gq: ÿ gP r GLatin Ť t‹u η δ g " 1 r G . Proof. Using the 'boundary conditions' in (5), it is straightforward to verify that ( 16) is equivalent to (17). From Lemma J.3 we now know that ψ δ g , originally characterised as the solution of the problem min E r G puq subject to uphq " δ hg for all h P r G Latin ď t‹u, is equivalently characterised as the harmonic extension of uphq " δ hg . From the last statement of Lemma J.3, we know that ř gP r GLatin Ť t‹u η δ g is the unique harmonic extension of ÿ gP r GLatin Ť t‹u δ hg " 1 r G Ă G Latin Ť t‹u . But this -in turn -is the unique solution of the problem min E r G puq subject to uphq " 1 for all h P r G Latin ď t‹u. Since we have E r G p1 r G q " 0, which is the lowest possible attainable value of E r G p¨q, and setting u " 1 r G is compatible with the 'boundary condition' uphq " 1 for all h P r G Latin Ť t‹u, we know that is the (unique) harmonic extension of 1 r GLatin Ť t‹u . By the last statement of Lemma J.3 we thus have ÿ gP r GLatin Ť t‹u η δ g " 1 r G . Having established that we may write ÿ gP r GLatin η δ g " ζ δ ‹ , together with the fact that every entry of each η δ g is non-negative, we now know that 0 ď η δ g pαq, ζ δ ‹ ď 1. Furthermore -using our earlier estimate (13) -we now easily obtain › › › › › › ÿ gP r GLatin η δ g › › › › › › 2 p r GGreekq ď K ¨δ. Hence -by positivity of the entries -we also have for each individual g P r G Latin that › › η δ g › › 2 p r GGreekq ď K ¨δ. f pgq xψ δ g , ψ δ y y 2 p r Gq µ δ y ˇˇˇˇˇˇ. 
We thus find }f ´r JJf } 2 pGq ď g f f f f e ÿ yPG y‰‹ ¨ˆ1 ´r µ y µ δ y ˙|f pyq| `ˇˇˇˇˇˇÿ gPG g‰y f pgq xψ δ g , ψ δ y y 2 p r Gq µ δ y ˇˇˇˇˇˇ‹ ‚ 2 `ˇˇˇˇf p‹q ´ÿ gPG f pgq xψ δ g , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇď g f f f e ÿ yPG y‰‹ ˆˆ1 ´r µ y µ δ y ˙|f pyq| ˙2 `g f f f f e ÿ yPG y‰‹ ¨ˇˇˇˇˇˇÿ gPG g‰y f pgq xψ δ g , ψ δ y y 2 p r Gq µ δ y ˇˇˇˇˇˇ‹ ‚ 2 `ˇˇˇˇf p‹q ´ÿ gPG f pgq xψ δ g , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇŤ o bound the first term of the estimate, we note (for y ‰ ‹) and δ small enough: ˆ1 ´r µ y µ δ y ˙ď ˜1 ´r µ y r µ y `δK r µp r G Greek q ¸" δK r µp r G Greek q δK r µ y `r µp r G Greek q ď δ K r µp r G Greek q min gP r G Latin r µ g . We also note (for y ‰ ‹) |f pyq| ď 1 min gP r G Latin ? µ g |f pyq| ? µ y ď 1 min gP r G Latin a r µ y |f pyq| ? µ y Thus we find g f f f e ÿ yPG y‰‹ ˆˆ1 ´r µ y µ δ y ˙|f pyq| ˙2 ď δ ¨K r µp r G Greek q min gP r G Latin r µ 3 2 g ‹ ‹ ‚ g f f e ÿ yPG y‰‹ |f pyq| 2 µ y ď δ ¨K r µp r G Greek q min gP r G Latin r µ 3 2 g ‹ ‹ ‚ }f } 2 pGq . To estimate the second term, we estimate |f pgq| ď 1 min gP r G Latin Yt‹u a r µ g }f } 2 pGq to obtain ˇˇˇˇˇˇÿ  x η δ g , η δ y y 2 p r GGreekq ď 1 min gP r G Latin Yt‹u r µ 3 2 g }f } 2 pGq ¨ÿ yPG y‰‹ ÿ gPG x η δ g , η δ y y 2 p r GGreekq ď 1 min gP r G Latin Yt‹u r µ 3 2 g }f } 2 pGq ¨x1 r GGreek , ζ δ ‹ y 2 p r GGreekq ď 1 min gP r G Latin Yt‹u r µ 3 2 g }f } 2 pGq ¨}1 r GGreek } 2 p r GGreekq ¨} ζ δ ‹ } 2 p r GGreekq ď δ ¨¨K ¨br µp r G Greek q min gP r G Latin Yt‹u r µ 3 2 g ‹ ‹ ‚ }f } 2 pGq Let us thus turn to the remaining term; corresponding to y " ‹: We have ˇˇˇˇf p‹q ´ÿ gPG f pgq xψ δ g , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇˇď ˇˇˇˇ1 ´xψ δ ‹ , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇˇ| f p‹q| `ˇˇˇˇˇˇÿ gPG g‰‹ f pgq xψ δ g , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇˇˇˇ(

22)

We first deal with the left summand. We note ˇˇˇˇ1 ´xψ δ ‹ , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇˇ" ˇˇˇˇµ δ ‹ ´r µ ‹ ´x1 r GGreek ´ ζ δ ‹ , 1 r GGreek ´ ζ δ ‹ y 2 p r GGreekq µ δ ‹ ˇˇˇď ˇˇˇˇµ δ ‹ ´r µ ‹ ´x1 r GGreek ´ ζ δ ‹ , 1 r GGreek ´ ζ δ ‹ y 2 p r GGreekq r µ ‹ `r µp r G Greek q ´δK r µp r G Greek q ˇˇˇď ˇˇˇˇˇ´µ δ ‹ ´r µ ‹ ´x1 r GGreek , 1 r GGreek y 2 p r GGreekq ¯`´x ζ δ ‹ , ζ δ ‹ y 2 p r GGreekq ´2x1 r GGreek , ζ δ ‹ y 2 p r GGreekq r µ ‹ `r µp r G Greek q ´δK r µp r G Greek q ˇˇˇˇď pδKq `ˇˇx ζ δ ‹ , ζ δ ‹ y 2 p r GGreekq ´2x1 r GGreek , ζ δ ‹ y 2 p r GGreekq ˇř µ ‹ `r µp r G Greek q ´δK r µp r G Greek q ď pδKq `δ2 K 2 `2}1 r GGreek } 2 p r GGreekq ¨} ζ δ ‹ } 2 p r GGreekq r µ ‹ `r µp r G Greek q ´δK r µp r G Greek q ď pδKq `ˇˇx ζ δ ‹ , ζ δ ‹ y 2 p r GGreekq ´2x1 r GGreek , ζ δ ‹ y 2 p r GGreekq ˇř µ ‹ `r µp r G Greek q ´δK r µp r G Greek q ď pδKq `δ2 K 2 `2b r µp r G Greek qKδ r µ ‹ `r µp r G Greek q ´δK r µp r G Greek q ď pδKq `δ2 K 2 `2b r µp r G Greek qKδ r µ ‹ Thus, under the assumption δ ď 1 (implying δ 2 ď δ), we have ˇˇˇˇ1 ´xψ δ ‹ , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇˇď K `K2 `2b r µp r G Greek qK r µ ‹ ¨δ. This implies that we have ˇˇˇˇf p‹q ´ÿ gPG f pgq xψ δ g , ψ δ ‹ y 2 p r Gq µ δ ‹ ˇˇˇˇď δ ¨K `K2 `2b r µp r G Greek qK r µ 3 2 ‹ ¨}f } 2 pGq . For the right-hand-side summand of the estimate in ( 22  G Greek q min gP r G Latin Yt‹u r µ 3 2 g ‹ ‹ ‚ }f } 2 pGq . Putting it all together, we find for δ ď 1 that }f ´r JJf } 2 pGq ď δ ¨KA ¨}f } 2 pGq with K A :" ¨K r µp r G Greek q min gP r G Latin r µ 3 2 g ‹ ‹ ‚ `2 ¨K ¨br µp r G Greek q min gP r G Latin Yt‹u r µ 3 2 g ‹ ‹ ‚ `K `K2 `2b r µp r G Greek qK r µ 3 2 ‹ . Thus the left hand side of (19) holds with " K A ¨δ. Right-hand-side of ( 19): Hence let us now check the right hand side of (19). We note pu ´J r Juq " u ´ÿ xPG xψ δ x , uy 2 p r Gq µ δ x ψ δ x . Let us denote by M the matrix representation M δ " Id ´r JJ " Id ´ÿ xPG xψ δ x , ¨y 2 p r Gq µ δ x ψ δ x . 
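We use the triangle inequality to arrive at

$$\|(u-J\tilde Ju)\|_{\ell^2(\tilde G)} \le \|M^0\cdot u\|_{\ell^2(\tilde G)} + \|M^\delta-M^0\|_{\mathrm{op}}\cdot\|u\|_{\ell^2(\tilde G)}.$$

Using the fact that for $g \neq \star$ we have $\eta^\delta_g \to 0$ and $\eta^0_\star = 1_{\tilde G_{\mathrm{Greek}}}$, we find in the $(\delta\to 0)$-limit that

$$M^0 = \begin{pmatrix} 0_{|\tilde G_{\mathrm{Latin}}|\times|\tilde G_{\mathrm{Latin}}|} & 0_{|\tilde G_{\mathrm{Latin}}|\times|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\\ 0_{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|\times|\tilde G_{\mathrm{Latin}}|} & \underline{M}^0 \end{pmatrix}$$

with

Proof. Fix $i$ and $j$. Let $\{i, g_1, \ldots, g_n, j\}$ be the vertices traversed by a path of minimal length determining $C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}(i,j)$. We then have

$$|u(i)-u(j)| \le |u(i)-u(g_1)| + |u(g_1)-u(g_2)| + \ldots + |u(g_n)-u(j)|$$

$$\le \delta^{\frac12}\frac{1}{\sqrt{\Omega}}\left(\sqrt{\widetilde W_{ig_1}|u(i)-u(g_1)|^2} + \sqrt{\widetilde W_{g_1g_2}|u(g_1)-u(g_2)|^2} + \ldots + \sqrt{\widetilde W_{g_nj}|u(g_n)-u(j)|^2}\right)$$

$$\le \delta^{\frac12}\frac{1}{\sqrt{\Omega}}\left(\sqrt{E_{\tilde G}(u)} + \sqrt{E_{\tilde G}(u)} + \ldots + \sqrt{E_{\tilde G}(u)}\right) = \delta^{\frac12}\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}(i,j)}{\sqrt{\Omega}}\sqrt{E_{\tilde G}(u)} \le \delta^{\frac12}\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}}{\sqrt{\Omega}}\sqrt{E_{\tilde G}(u)}.$$

With the help of this Lemma we then find

$$\|M^0\cdot u\|_{\ell^2(\tilde G)} \le \delta^{\frac12}\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}}{\sqrt{\Omega}}\sqrt{E_{\tilde G}(u)}\cdot\sqrt{\sum_{i,j\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\left(\frac{\tilde\mu_i\tilde\mu_j}{\tilde\mu(\tilde G_{\mathrm{Greek}})+\tilde\mu_\star}\right)} = \delta^{\frac12}\cdot\left(\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}\cdot\sqrt{\tilde\mu(\tilde G_{\mathrm{Greek}})+\tilde\mu_\star}}{\sqrt{\Omega}}\right)\cdot\sqrt{E_{\tilde G}(u)}.$$

To derive a bound for $\|M^\delta-M^0\|_{\mathrm{op}}$ in the second term of the estimate (23), we write

$$M^\delta-M^0 = \begin{pmatrix} B & A\\ A^\dagger & D\end{pmatrix}.$$

Here we denote by $A^\dagger : \ell^2(\tilde G_{\mathrm{Latin}}) \to \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})$ the adjoint of the operator $A : \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\}) \to \ell^2(\tilde G_{\mathrm{Latin}})$. Clearly $\|A\|_{\mathrm{op}} = \|A^\dagger\|_{\mathrm{op}}$, so that we have

$$\|M^\delta-M^0\|_{\mathrm{op}} \le \|B\|_{\mathrm{op}} + 2\|A\|_{\mathrm{op}} + \|D\|_{\mathrm{op}}.$$

To bound $\|B\|_{\mathrm{op}}$ we note that $B$ is diagonal and we have

$$B = \begin{pmatrix} \tilde\mu_a\left(\frac{1}{\mu^\delta_a}-\frac{1}{\mu^0_a}\right) & & \\ & \tilde\mu_b\left(\frac{1}{\mu^\delta_b}-\frac{1}{\mu^0_b}\right) & \\ & & \ddots \end{pmatrix}$$

so that

$$\|B\|_{\mathrm{op}} \le \max_{a\in\tilde G_{\mathrm{Latin}}}\left[\tilde\mu_a\left|\frac{1}{\mu^\delta_a}-\frac{1}{\mu^0_a}\right|\right] = \max_{a\in\tilde G_{\mathrm{Latin}}}\left[\tilde\mu_a\left|\frac{\mu^\delta_a-\mu^0_a}{\mu^\delta_a\cdot\mu^0_a}\right|\right] \le \max_{a\in\tilde G_{\mathrm{Latin}}}\left[\tilde\mu_a\left|\frac{\mu^\delta_a-\mu^0_a}{\tilde\mu_a^2}\right|\right] \le \max_{a\in\tilde G_{\mathrm{Latin}}}\left[\tilde\mu_a\left|\frac{K\delta\tilde\mu(\tilde G_{\mathrm{Greek}})}{\tilde\mu_a^2}\right|\right] \le \delta\cdot\left[\frac{K\cdot\tilde\mu(\tilde G_{\mathrm{Greek}})}{\min_{a\in\tilde G_{\mathrm{Latin}}}\mu_a}\right].$$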
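To estimate $\|A\|_{\mathrm{op}}$ we note

$$A = \begin{pmatrix} 0 & \frac{\eta^\delta_a(\alpha)}{\mu^\delta_a} & \frac{\eta^\delta_a(\beta)}{\mu^\delta_a} & \cdots\\ 0 & \frac{\eta^\delta_b(\alpha)}{\mu^\delta_b} & \frac{\eta^\delta_b(\beta)}{\mu^\delta_b} & \cdots\\ 0 & \frac{\eta^\delta_c(\alpha)}{\mu^\delta_c} & \frac{\eta^\delta_c(\beta)}{\mu^\delta_c} & \cdots\\ \vdots & \vdots & \vdots & \end{pmatrix}.$$

We can consider the map $A : \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\}) \to \ell^2(\tilde G_{\mathrm{Latin}})$ as a composition of maps

$$A : \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\}) \xrightarrow{\ \mathrm{Id}\ } \mathbb{C}^{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|} \xrightarrow{\ A\ } \mathbb{C}^{|\tilde G_{\mathrm{Latin}}|} \xrightarrow{\ \mathrm{Id}\ } \ell^2(\tilde G_{\mathrm{Latin}}).$$

For the map $\mathrm{Id}$ :

$$\ldots\ \mu^\delta_g\Bigr)\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}}\left[\zeta^\delta_\star(\alpha)\right] = \delta\cdot K\cdot\sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\left(\frac{1}{\min_{g\in\tilde G_{\mathrm{Latin}}}\mu^\delta_g\cdot\min_{\alpha\in\tilde G_{\mathrm{Greek}}}\sqrt{\tilde\mu_\alpha}}\right) \le \delta\cdot K\cdot\sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\left(\frac{1}{\min_{g\in\tilde G_{\mathrm{Latin}}}\tilde\mu_g\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}}\sqrt{\tilde\mu_\alpha}}\right).$$

Here we estimated

$$\max_{\alpha\in\tilde G_{\mathrm{Greek}}}\left[\zeta^\delta_\star(\alpha)\right] \le \frac{1}{\min_{\alpha\in\tilde G_{\mathrm{Greek}}}\sqrt{\tilde\mu_\alpha}}\,\|\zeta^\delta_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}})}.$$

In total, we find for the operator-norm of $A : \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\}) \to \ell^2(\tilde G_{\mathrm{Latin}})$

Thus let us now investigate $\|D\|_{\mathrm{op}}$. As before, let us denote by $\underline{u} \in \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})$ the restriction of an element $u \in \ell^2(\tilde G)$ to $\tilde G_{\mathrm{Greek}}\cup\{\star\}$. We have

$$\|D\|_{\mathrm{op}} = \left\|\sum_{x\in\tilde G_{\mathrm{Latin}}\cup\{\star\}}\frac{\langle\psi^\delta_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_x}\psi^\delta_x - \sum_{x\in\tilde G_{\mathrm{Latin}}\cup\{\star\}}\frac{\langle\psi^0_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^0_x}\psi^0_x\right\|$$

$$\le \left\|\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{\langle\psi^\delta_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_x}\psi^\delta_x - \sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{\langle\psi^0_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^0_x}\psi^0_x\right\| + \left\|\frac{\langle\psi^\delta_\star,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_\star}\psi^\delta_\star - \frac{\langle\psi^0_\star,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^0_\star}\psi^0_\star\right\|$$

$$= \left\|\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{\langle\psi^\delta_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_x}\psi^\delta_x\right\| + \left\|\frac{\langle\psi^\delta_\star,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_\star}\psi^\delta_\star - \frac{\langle\psi^0_\star,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^0_\star}\psi^0_\star\right\|.$$

We note for the matrix representation of the first term that (with $\alpha, \beta \in \tilde G_{\mathrm{Greek}}\cup\{\star\}$) we have

$$\left(\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{\langle\psi^\delta_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_x}\psi^\delta_x\right)_{\alpha\beta} = \left(\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{1}{\mu^\delta_x}\,\eta^\delta_x(\alpha)\,\eta^\delta_x(\beta)\,\tilde\mu_\beta\right).$$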
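Using the 'maximal row sum trick' complementary to the 'maximal column sum trick' already used for $A$ above and recalling the definition of the weights

$$\mu^\delta_g := \sum_{h\in\tilde G}\psi^\delta_g(h)\cdot\tilde\mu_h$$

we find

$$\left\|\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{\langle\psi^\delta_x,\cdot\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_x}\psi^\delta_x\right\| \le \sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\frac{\max_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}{\min_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}\cdot\max_{\beta\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\left(\sum_{\alpha\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\left(\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{1}{\mu^\delta_x}\,\eta^\delta_x(\alpha)\,\eta^\delta_x(\beta)\,\tilde\mu_\beta\right)\right)$$

$$\le \sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\frac{\max_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}{\min_{y\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_y}}\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\left(\sum_{x\in\tilde G_{\mathrm{Latin}}}\frac{1}{\mu^\delta_x}\,\eta^\delta_x(\alpha)\right)$$

$$\le \sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\frac{\max_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}{\min_{y\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_y}}\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\left(\sum_{x\in\tilde G_{\mathrm{Latin}}}\eta^\delta_x(\alpha)\right)$$

$$\le \sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\frac{\max_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}{\min_{y\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_y}}\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\zeta^\delta_\star(\alpha)$$

$$\le \sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\frac{\max_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}{\min_{y\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\tilde\mu_y^{3/2}}\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\|\zeta^\delta_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}})}$$

$$\le \sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\frac{\max_{x\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\sqrt{\tilde\mu_x}}{\min_{y\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\tilde\mu_y^{3/2}}\cdot K\cdot\delta.$$

It remains to bound the second term.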
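We find (using $\|\psi^\delta_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})} \le \|\psi^0_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}$):

$$\left\|\frac{\langle\psi^\delta_\star,\underline{u}\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^\delta_\star}\psi^\delta_\star - \frac{\langle\psi^0_\star,\underline{u}\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}}{\mu^0_\star}\psi^0_\star\right\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}$$

$$\le \left\|\left(\frac{1}{\mu^\delta_\star}-\frac{1}{\mu^0_\star}\right)\langle\psi^\delta_\star,\underline{u}\rangle\,\psi^\delta_\star\right\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})} + \frac{1}{\mu^0_\star}\left\|\langle\psi^\delta_\star,\underline{u}\rangle\,\psi^\delta_\star - \langle\psi^0_\star,\underline{u}\rangle\,\psi^0_\star\right\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}$$

$$\le \left|\frac{1}{\mu^\delta_\star}-\frac{1}{\mu^0_\star}\right|\cdot\|\psi^\delta_\star\|^2_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}\cdot\|\underline{u}\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})} + \frac{1}{\mu^0_\star}\left\|\left(\langle\psi^\delta_\star,\underline{u}\rangle-\langle\psi^0_\star,\underline{u}\rangle\right)\psi^0_\star + \langle\psi^\delta_\star,\underline{u}\rangle\left(\psi^\delta_\star-\psi^0_\star\right)\right\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}$$

$$\le \left|\frac{1}{\mu^\delta_\star}-\frac{1}{\mu^0_\star}\right|\cdot\|\psi^0_\star\|^2_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}\cdot\|\underline{u}\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})} + 2\,\frac{1}{\mu^0_\star}\,\|\psi^\delta_\star-\psi^0_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}\cdot\|\psi^0_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}\cdot\|\underline{u}\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}$$

$$\le \left(\frac{\delta\cdot K\cdot\tilde\mu(\tilde G_{\mathrm{Greek}})}{\left(\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})\right)\tilde\mu_\star}\right)\cdot\left(\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})\right)\cdot\|\underline{u}\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})} + 2\,\frac{1}{\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})}\cdot\left(\delta\cdot K\cdot\tilde\mu(\tilde G_{\mathrm{Greek}})\right)\cdot\sqrt{\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})}\cdot\|\underline{u}\|_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}.$$

Thus we find

$$\|D\|_{\mathrm{op}} \le \delta\cdot K\cdot\tilde\mu(\tilde G_{\mathrm{Greek}})\cdot\left(\frac{1}{\tilde\mu_\star}+2\,\frac{1}{\sqrt{\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})}}\right).$$

In total, using (23) and (24), we find

$$\|(u-J\tilde Ju)\|_{\ell^2(\tilde G)} \le \delta^{\frac12}\cdot\left(\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}\cdot\sqrt{\tilde\mu(\tilde G_{\mathrm{Greek}})+\tilde\mu_\star}}{\sqrt{\Omega}}\right)\cdot\sqrt{E_{\tilde G}(u)} + \delta\cdot\left[\frac{K\cdot\tilde\mu(\tilde G_{\mathrm{Greek}})}{\min_{a\in\tilde G_{\mathrm{Latin}}}\mu_a}\right]\cdot\|u\|_{\ell^2(\tilde G)}$$

$$+\ 2\cdot\delta\cdot K\cdot\sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}\cdot\left(\frac{\max_{g\in\tilde G_{\mathrm{Latin}}}\tilde\mu_g}{\min_{g\in\tilde G_{\mathrm{Latin}}}\tilde\mu_g\cdot\max_{\alpha\in\tilde G_{\mathrm{Greek}}\cup\{\star\}}\tilde\mu_\alpha^{3/2}}\right)\cdot\|u\|_{\ell^2(\tilde G)} + \delta\cdot K\cdot\tilde\mu(\tilde G_{\mathrm{Greek}})\cdot\left(\frac{1}{\tilde\mu_\star}+2\,\frac{1}{\sqrt{\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})}}\right)\cdot\|u\|_{\ell^2(\tilde G)}.$$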

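Here we applied Lemma J.7. Comparing the $\delta > 0$ and $\delta = 0$ terms, we find

$$\left|\frac{1}{\mu^\delta_\star}\langle\psi^\delta_\star,u\rangle_{\ell^2(\tilde G)}-\frac{1}{\mu^0_\star}\langle\psi^0_\star,u\rangle_{\ell^2(\tilde G)}\right| \le \frac{1}{\mu^\delta_\star}\left|\langle\psi^\delta_\star-\psi^0_\star,u\rangle_{\ell^2(\tilde G)}\right| + \left|\frac{1}{\mu^\delta_\star}-\frac{1}{\mu^0_\star}\right|\cdot\left|\langle\psi^0_\star,u\rangle_{\ell^2(\tilde G)}\right|$$

$$\le \frac{1}{\tilde\mu_\star}\|u\|_{\ell^2(\tilde G)}\cdot\|\zeta^\delta_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}})} + \left|\frac{1}{\mu^\delta_\star}-\frac{1}{\mu^0_\star}\right|\cdot\left(\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})\right)\|u\|_{\ell^2(\tilde G)}$$

$$\le \frac{K\delta}{\tilde\mu_\star}\|u\|_{\ell^2(\tilde G)} + \left(\frac{K\delta}{\tilde\mu_\star\left(\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})\right)}\right)\cdot\left(\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})\right)\|u\|_{\ell^2(\tilde G)} = \delta\,\frac{2K}{\tilde\mu_\star}\|u\|_{\ell^2(\tilde G)}.$$

Thus we have

$$\left|\frac{1}{\mu^\delta_\star}\langle u,\psi^\delta_\star\rangle-u(\star)\right|\sqrt{\mu^\delta_\star} \le \delta^{\frac12}\cdot|\tilde G_{\mathrm{Greek}}\cup\{\star\}|\sqrt{\tilde\mu_\star+\tilde\mu(\tilde G_{\mathrm{Greek}})}\left(\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}}{\sqrt{\Omega}}\right)\sqrt{E_{\tilde G}(u)} + \delta\,\frac{2K}{\tilde\mu_\star}\|u\|_{\ell^2(\tilde G)}.$$

For the remaining term in (25) we note

$$\sqrt{\sum_{x\in\tilde G_{\mathrm{Latin}}}\left|\frac{1}{\mu_x}\langle\psi^\delta_x,u\rangle_{\ell^2(\tilde G)}-u(x)\right|^2} \le \sqrt{\sum_{x\in\tilde G_{\mathrm{Latin}}}\left|1-\frac{\tilde\mu_x}{\mu^\delta_x}\right|^2\cdot|u(x)|^2\mu^\delta_x} + \sum_{x\in\tilde G_{\mathrm{Latin}}}\left|\langle\psi^\delta_x,\underline{u}\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}\right|\sqrt{\mu^\delta_x}$$

$$\le \frac{K\delta}{\tilde\mu_\star}\cdot\|u\|_{\ell^2(\tilde G)} + \sum_{x\in\tilde G_{\mathrm{Latin}}}\langle\psi^\delta_x,|\underline{u}|\rangle_{\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})}\sqrt{\mu^\delta_x}$$

$$\le \frac{K\delta}{\tilde\mu_\star}\cdot\|u\|_{\ell^2(\tilde G)} + \|\zeta^\delta_\star\|_{\ell^2(\tilde G_{\mathrm{Greek}})}\cdot\left[\max_{x\in\tilde G_{\mathrm{Latin}}}\sqrt{\mu^\delta_x}\right]\|u\|_{\ell^2(\tilde G)}$$

$$\le \frac{K\delta}{\tilde\mu_\star}\cdot\|u\|_{\ell^2(\tilde G)} + \delta K\tilde\mu(\tilde G_{\mathrm{Greek}})\cdot\left[\max_{x\in\tilde G_{\mathrm{Latin}}}\sqrt{\tilde\mu_x+\delta K\tilde\mu(\tilde G_{\mathrm{Greek}})}\right]\|u\|_{\ell^2(\tilde G)}$$

$$\le \frac{K\delta}{\tilde\mu_\star}\cdot\|u\|_{\ell^2(\tilde G)} + \delta K\tilde\mu(\tilde G_{\mathrm{Greek}})\cdot\left[\sqrt{\max_{x\in\tilde G_{\mathrm{Latin}}}\tilde\mu_x}+\sqrt{\delta K\tilde\mu(\tilde G_{\mathrm{Greek}})}\right]\|u\|_{\ell^2(\tilde G)}.$$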

Equation (21):

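It finally only remains to prove the energy differences of (21) and establish

$$|E_{\tilde G}(J^1f,u)-E_G(f,\tilde J^1u)| \le \epsilon\cdot\sqrt{\|f\|^2+E_G(f)}\cdot\sqrt{\|u\|^2+E_{\tilde G}(u)}.$$

We note that the (unique) operator associated to the energy $E_G$ via $E_G(g,f) = \langle g,\Delta_Gf\rangle_{\ell^2(G)}$ is given by

$$(\Delta_Gf)(x) = \frac{1}{\mu_x}\sum_{y\sim_Gx}W_{xy}\,(f(x)-f(y)).$$

Here the notation "$y\sim_Gx$" signifies that nodes $x$ and $y$ are connected within $G$ through edges with positive edge-weights $W_{xy} > 0$. Similarly the operator associated to $E$

Remembering that we have

$$J^1f = Jf = \sum_{x\in G}f(x)\,\psi_x \quad\text{and}\quad (\tilde J^1u)(x) = u(x),$$

we note

$$\left|E_{\tilde G}(J^1f,u)-E_G(f,\tilde J^1u)\right| \le \left|\sum_{x\in\tilde G_{\mathrm{Latin}}\cup\{\star\}}f(x)\left[E_{\tilde G}(\psi_x,u)-E_G(\psi_x,u)\right]\right| \le \left(\frac{1}{\sqrt{\min_{x\in\tilde G_{\mathrm{Latin}}\cup\{\star\}}\tilde\mu_x}}\right)\cdot\|f\|_{\ell^2(G)}\cdot\sum_{x\in\ldots}$$

For $I_x$ we find, using Lemma J.7, that

$$|I_x| \le \left(\sum_{\alpha\in\tilde G_{\mathrm{Greek}}}\widetilde W_{x\alpha}\right)\cdot\delta^{\frac12}\left(\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}}{\sqrt{\Omega}}\right)\sqrt{E_{\tilde G}(u)}$$

and hence

$$\sum_{\substack{x\in G\\ x\neq\star}}|I_x| \le \left(\sum_{\substack{x\in G\\ x\neq\star}}\sum_{\alpha\in\tilde G_{\mathrm{Greek}}}\widetilde W_{x\alpha}\right)\cdot\delta^{\frac12}\left(\frac{C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}}{\sqrt{\Omega}}\right)\sqrt{E_{\tilde G}(u)}.$$

Denote the orthogonal projections onto the corresponding eigenspaces by $\{P_-, P_+\}$. Take the function $g$ to be defined as

$$g(\lambda) := 1-\frac{i}{i-\lambda}.$$

Then since $g(0) = 0$ we have $g(W) = 0$. Furthermore we have

$$g(\widetilde W) = \left[1-\frac{i}{i-\frac1\delta}\right]P_+ + \left[1-\frac{i}{i+\frac1\delta}\right]P_- = P_+ + P_- - \delta\,\frac{1}{\delta+i}\,P_+ - \delta\,\frac{1}{\delta-i}\,P_- = \mathrm{Id} - \delta\,\frac{1}{\delta+i}\,P_+ - \delta\,\frac{1}{\delta-i}\,P_-$$

$$= \mathrm{Id}\left[1-\delta\,\frac{1}{\delta+i}\right] + \left[\delta\,\frac{1}{\delta+i}-\delta\,\frac{1}{\delta-i}\right]P_- = \mathrm{Id}\left[1-\delta\,\frac{1}{\delta+i}\right] - \left[\delta\,\frac{2i}{\delta^2+1}\right]P_-.$$

We are interested in

$$\left\|g(\widetilde W)J^\delta-J^\delta g(W)\right\|_{\mathrm{op}} = \left\|g(\widetilde W)J^\delta\right\|_{\mathrm{op}} = \left\|J^\delta-\delta\left[\frac{1}{\delta+i}P_++\frac{1}{\delta-i}P_-\right]J^\delta\right\|_{\mathrm{op}}.$$

Assuming $\|g(\widetilde W)J^\delta-J^\delta g(W)\|_{\mathrm{op}} = \|g(\widetilde W)J^\delta\|_{\mathrm{op}} \le \eta_1(\delta)$ we also find

$$\left|\,\|J^\delta\|_{\mathrm{op}}\left(\frac{i}{\delta+i}\right)-\|J^\delta P_-\|_{\mathrm{op}}\left(\delta\,\frac{2i}{\delta^2+1}\right)\right| \le \eta_1(\delta).$$

Thus also

$$\|J^\delta\|_{\mathrm{op}}\left(\frac{i}{\delta+i}\right) \le \eta_1(\delta)+\|J^\delta P_-\|_{\mathrm{op}}\left(\delta\,\frac{2i}{\delta^2+1}\right).$$

Taking the limit and using the condition $\|J^\delta\|_{\mathrm{op}} \le 2$, we find that $\|J^\delta\| \to 0$ as $\delta \to 0$. Since we demand $\|(J-\tilde J^*)\|_{\mathrm{op}} \le \eta_2(\delta)$ with $\lim_{\delta\to0}\eta_2(\delta) = 0$, we also find $\|\tilde J\|_{\mathrm{op}} = \|\tilde J^*\|_{\mathrm{op}} \to 0$.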
Next we note that we have $R_\omega = \frac{1}{\omega}$ and demand $\|(\mathrm{Id}-\tilde J^\delta J^\delta)R_\omega\|_{\mathrm{op}} \to 0$. However

$$\|(\mathrm{Id}-\tilde J^\delta J^\delta)R_\omega\|_{\mathrm{op}} = \frac{1}{|\omega|}\,\|\mathrm{Id}-\tilde J^\delta J^\delta\|_{\mathrm{op}} \ge \frac{1}{|\omega|}\left(1-\|\tilde J^*\|_{\mathrm{op}}\|J\|_{\mathrm{op}}\right) \to \frac{1}{|\omega|} > 0.$$

Thus we have our contradiction.

Hence let us now choose $T$ ($\tilde T$) as the normalized graph Laplacians associated to the adjacency matrices $W$ ($\widetilde W$) from above. We thus have

$$\mathcal{L} = 0 \quad\text{and}\quad \widetilde{\mathcal{L}} = \begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}.$$

The eigenvalues and eigenvectors of $\widetilde{\mathcal{L}}$ are given by $\{0, 2\}$ and

$$v_0 = \begin{pmatrix}1\\1\end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix}1\\-1\end{pmatrix}.$$

Denote the orthogonal projections onto the corresponding eigenspaces by $\{P_0, P_2\}$. Then $\widetilde{\mathcal{L}} = 2P_2$. Choose a function $g$ such that $g(0) = 0$ and without loss of generality assume $g(2) = 1$. Then

$$0 \longleftarrow \left\|g(\widetilde{\mathcal{L}})J^\delta-J^\delta g(\mathcal{L})\right\|_{\mathrm{op}} = \|P_2J^\delta\|_{\mathrm{op}}.$$

Next we consider the demand $\|(\mathrm{Id}-J^\delta\tilde J^\delta)\tilde R_\omega u\| \le \eta_3\cdot\|u\|$.

L PROOF OF THEOREM 5.8

We first note how the graph Laplacian $\Delta_{G_N}$, as we have defined it, is consistent with the underlying positive (in the sense of non-negative eigenvalues) Laplacian $-\Delta_{S^1} = -\frac{\partial^2}{\partial\theta^2}$ on the unit circle $S^1$. To this end, fix $0 < h \ll 1$. Fix a point $x \in S^1$. For any suitable function $f$, by means of Taylor expansions, we may write

$$f(x+h) = f(x)+h\cdot[\partial_\theta f](x)+\frac{h^2}{2}\cdot[\Delta_{S^1}f](x)+O(h^3),$$

$$f(x-h) = f(x)-h\cdot[\partial_\theta f](x)+\frac{h^2}{2}\cdot[\Delta_{S^1}f](x)+O(h^3).$$

Adding these two terms, we find

$$[-\Delta_{S^1}f](x) = \frac{2f(x)-f(x+h)-f(x-h)}{h^2}+O(h).$$

This motivates setting our edge weights on $G_N$ to $1/h^2$ with $h = 2\pi/N$ the distance between evenly spaced nodes on the unit circle $S^1$.

Remark L.1. It should be noted that this consistency property, while giving a heuristic to choose weights, does not (immediately) imply 'convergence' of $\Delta_{G_N}$ to $-\Delta_{S^1}$ in the sense needed to e.g. apply Levie et al. (2019a). As our proof of Theorem L.2 proceeds completely without reference to the limit-circle, we do not proceed beyond the above heuristic in investigating in what (relevant) sense $\Delta_{G_N}$ approximates $-\Delta_{S^1}$.

We thus now want to prove the following result:

Theorem L.2.
In the large graph setting of Section 5.2 choose all node-weights equal to one and $N$ to be odd for definiteness. There exist constants $K_1, K_2 = O(1)$ so that for each $N \ge 1$, there exist identification operators $J, \tilde J$ mapping between $\ell^2(G_N)$ and $\ell^2(G_{N+1})$ so that $J$ and $\tilde J$ are $(K_1/N)$-quasi-unitary with respect to $\Delta_{G_N}$, $\Delta_{G_{N+1}}$ and $\omega = (-1)$. Furthermore, the operators $\Delta_{G_N}$ and $\Delta_{G_{N+1}}$ are $(-1)$-$(K_2/N)$ close with identification operator $J$.

Proof. We first note that the normalized eigenvectors of $G_N$ are given by

$$\phi^N_k(x) = \frac{1}{\sqrt N}\,e^{i\frac{2\pi k}{N}x}, \qquad 0 \le k < N.$$

The corresponding eigenvalues are easily found to be

is bounded on the rectangle $[0,1]\times[0,\frac12]$. We change variables $y = \pi x/(1+a)$ and consider

$$F(a,y) = \left|\frac{\sin(y(a+2))}{\sin(y)}\right|\cdot\left|\frac{\sin(ya)}{a\cdot\sin(y)}\right|$$

on $[0,1]\times[0,\frac\pi2]$ instead. Away from $y = 0$ boundedness is obvious. Close to $y = 0$ we may Taylor expand numerators and denominators and then (formally) divide each of them by $y$ to see that the function $F(a,y)$ is indeed regular at $y = 0$ too and hence on the entire compact set $[0,1]\times[0,\frac\pi2]$. As a continuous function, $F$ attains its supremum on this set. Denote it by $K$. Hence we now know

$$\|JR_{-1}-\tilde R_{-1}J\|_{\mathrm{op}} \le [2+K]\cdot a = [2+K]\cdot\frac1N.$$

Thus we have established the desired $O(1/N)$-decay.

M PROOF OF THEOREM 6.1

Theorem M.1. For $p \ge 2$ we have in the setting of Theorem 3.1 that

$$\|\Psi^p_N(f)-\Psi^p_N(h)\|_{\mathbb{R}^{K_{\mathrm{out}}}} \le \left(\prod_{n=1}^N L_nR_nB_n\right)\cdot\|f-h\|_{L_{\mathrm{in}}}.$$

In the setting of Theorem 4.3 or 5.4 and under the additional assumption that the 'final' identification operator $J_N$ satisfies $\left|\,\|J_Nf_i\|_{\ell^k(\tilde G_N)}-\|f_i\|_{\ell^k(G_N)}\right| \le$

O NOTATIONAL CONVENTIONS

We provide a summary of employed notational conventions:

Coulomb interaction between atoms $i$ and $j$

$\|x_i-x_j\|$ Euclidean distance between $x_i$ and $x_j$



Figure 2: Update Rule for a GCN

Figure 3: Collapsed (left) and original (right) Graphs

decay for Levie et al. (2019a) (ibid. Theorem 5, pt. 3) assuming a similar decay of operator distances. Our framework might thus capture transferability properties other approaches could miss.

Figure 7: The Large-N Regime

Figure 9: Sparsely connected Layer

$(u,v) := \langle u,\Delta_{\tilde G}v\rangle_{\ell^2(\tilde G)}$. We further use the notation $E_{\tilde G}(u) := E_{\tilde G}(u,u)$, with $E_G$ defined analogously when $\tilde G$ is replaced by $G$. Let us next solve the convex optimization program (5) introduced in Definition 5.5, restated here for convenience:

Definition J.1. For each $g \in G$, define the signal $\psi^\delta_g \in \ell^2(\tilde G)$ as the unique solution to the convex optimization program

$$\min E_{\tilde G}(u) \quad\text{subject to}\quad u(h) = \delta_{hg} \ \text{ for all } h \in \tilde G_{\mathrm{Latin}}\cup\{\star\}.$$
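The program in Definition J.1 is a quadratic minimisation with prescribed values on $\tilde G_{\mathrm{Latin}}\cup\{\star\}$, so its solution can be written as a linear solve on the remaining (Greek) nodes. The following is a minimal sketch under stated assumptions, not the implementation used for the experiments: it assumes real signals and that the energy reduces to $E_{\tilde G}(u) = u^\top L u$ with $L = D - \widetilde W$ (the node weights cancel between the weighted inner product and the $1/\mu_x$ in the Laplacian). The names `harmonic_extensions` and `toy_graph`, and the four-node graph itself, are hypothetical and only serve as illustration.

```python
import numpy as np


def harmonic_extensions(W, fixed):
    """Return psi with psi[:, k] minimising u^T L u subject to
    u[fixed[j]] = 1 if j == k else 0 (L = D - W, an assumption of this sketch)."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    fixed = np.asarray(fixed)
    free = np.setdiff1d(np.arange(n), fixed)      # the 'Greek' nodes
    L = np.diag(W.sum(axis=1)) - W                # combinatorial Laplacian
    psi = np.zeros((n, len(fixed)))
    psi[fixed, np.arange(len(fixed))] = 1.0       # constraint u(h) = delta_{hg}
    if free.size:
        # Stationarity on the free nodes: L_ff u_f = -L_fb u_b
        rhs = -L[np.ix_(free, fixed)]
        psi[free] = np.linalg.solve(L[np.ix_(free, free)], rhs)
    return psi


def toy_graph(delta):
    """Hypothetical example: node 0 is 'Latin', node 1 is the exceptional
    node, nodes 2 and 3 are 'Greek' and coupled with weight 1/delta."""
    W = np.zeros((4, 4))
    W[0, 2] = W[2, 0] = 1.0
    for i, j in [(1, 2), (2, 3), (1, 3)]:
        W[i, j] = W[j, i] = 1.0 / delta
    return W


psi = harmonic_extensions(toy_graph(0.01), fixed=[0, 1])
print(psi)
```

In this toy example the columns sum to one at every node, and the column belonging to the exceptional node tends to one on the Greek nodes as `delta` shrinks, which is the qualitative behaviour the estimates in this appendix quantify.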

(using that $\langle\eta^\delta_g,\eta^\delta_y\rangle_{\ell^2(\tilde G_{\mathrm{Greek}})}$ is a non-negative number and we have $\|\cdot\|_2 \le \|\cdot\|_1$)

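$\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\}) \to \mathbb{C}^{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}$ we find $\|\mathrm{Id}\|_{\mathrm{op}} = \bigl(\min\ldots$ find for the map $\mathrm{Id} : \ell^2(\tilde G_{\mathrm{Latin}}) \to \mathbb{C}^{|\tilde G_{\mathrm{Latin}}|}$ that $\|\mathrm{Id}\|_{\mathrm{op}} = \left(\max_{g\in\tilde G_{\mathrm{Latin}}}\tilde\mu_g\right)$. To bound the operator norm of the map $A : \mathbb{C}^{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|} \to \mathbb{C}^{|\tilde G_{\mathrm{Latin}}|}$, we use that the operator norm is smaller than the maximal column sum times $\sqrt{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}$. Hence for $A$ as a map from $\mathbb{C}^{|\tilde G_{\mathrm{Greek}}\cup\{\star\}|}$ to $\mathbb{C}^{|\tilde G_{\mathrm{Latin}}|}$ we find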

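$-u(y))$ with the equivalence relation $\sim_{\tilde G}$ precisely signifying that $\widetilde W_{xy} > 0$. As before, let us denote by $\underline{u} \in \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})$ the restriction of an element $u \in \ell^2(\tilde G)$ to $\tilde G_{\mathrm{Greek}}\cup\{\star\}$. We note

$$E_G(\psi_x,u) = \langle\psi_x,\Delta_Gu\rangle_{\ell^2(G)} = \sum_{y\sim_Gx}W_{xy}\,(u(x)-u(y))$$

on the smaller graph $G$. For the graph $\tilde G$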

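$(\psi_x,u)-E_G(\psi_x,u)\bigr|$. Let us first bound the terms corresponding to $x \neq \star$. We have

$$E_G(\psi_x,u) = \sum_{\substack{y\sim_Gx\\ y\neq\star}}W_{xy}\,(u(x)-u(y))+W_{x\star}\,(u(x)-u(\star)) = \sum_{\substack{y\sim_Gx\\ y\neq\star}}\widetilde W_{xy}\,(u(x)-u(y))+W_{x\star}\,(u(x)-u(\star)),$$

$x \neq \star$)

$$E_G(\psi_x,u)-E_{\tilde G}(\psi_x,u) = W_{x\star}\,(u(x)-u(\star))-\widetilde W_{x\star}\,(u(x)-u(\star))-\sum_{\alpha\in\ldots}$$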

L ´ωIdq is bijective, (28) is implies }pId ´Jδ r J δ qv} ď η 3 pδq ¨r|ω|}v} `} Ă L } ¨}v}s " η 3 pδq ¨r|ω| `2s ¨}v}. not yet know the behaviour of f p¨q, a δ , b δ as δ Ñ 0.With the above notation, we find from (29) that}pId ´Jδ r J δ qv} " › › › › ˆva ´f pδqa δ v a ´f pδqb δ v b v b ´f pδqa δ v a ´f pδqb δ v b ˙´η 4 pδq Bˆv a ´f pδqa δ v a ´f pδqb δ v b v b ´f pδqa δ v a ´f pδqb δ v b ˙› › › › ´η4 pδq ¨4 ¨}v}.Thus, combining this result with (29), we know that› › › › ˆva ´f pδqa δ v a ´f pδqb δ v b v b ´f pδqa δ v a ´f pδqb δ v b˙› › › › ÝÑ 0. Thus, since both entries of the above vector need to tend to zero, we need both f pδq ¨aδ Ñ 1 and f pδq ¨bδ Ñ 0 as well as f pδq ¨aδ Ñ 0 and f pδq ¨bδ Ñ 1 which yields the desired contradiction.
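The two-vertex example underlying this proof is small enough to check numerically. The sketch below is an illustration only, with hypothetical function names: it assumes the adjacency matrix $\widetilde W$ of a single edge of weight $1/\delta$, evaluates the filter $g(\lambda) = 1 - \frac{i}{i-\lambda}$ through an eigendecomposition, compares it with the closed form $\mathrm{Id}\,[1-\frac{\delta}{\delta+i}] - \frac{2i\delta}{\delta^2+1}P_-$ derived in the proof, and confirms that the normalized Laplacian of this graph does not depend on $\delta$.

```python
import numpy as np


def g(lam):
    """Filter used in the proof: g(lambda) = 1 - i / (i - lambda), so g(0) = 0."""
    return 1.0 - 1j / (1j - lam)


def two_node_check(delta):
    """Return g(W~) via functional calculus, the closed form from the proof,
    and the normalized Laplacian of the two-node graph with edge weight 1/delta."""
    W = np.array([[0.0, 1.0 / delta], [1.0 / delta, 0.0]])
    evals, evecs = np.linalg.eigh(W)                 # eigenvalues -1/delta, +1/delta
    gW = (evecs * g(evals)) @ evecs.conj().T         # sum_k g(lambda_k) P_k
    P_minus = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])  # projection for -1/delta
    closed = (1.0 - delta / (delta + 1j)) * np.eye(2) \
        - (2j * delta / (delta ** 2 + 1.0)) * P_minus
    deg = W.sum(axis=1)
    L_norm = np.eye(2) - W / np.sqrt(np.outer(deg, deg))  # Id - D^{-1/2} W D^{-1/2}
    return gW, closed, L_norm


gW, closed, L_norm = two_node_check(0.1)
print(np.abs(gW - closed).max())
print(L_norm)
```

As $\delta \to 0$ the matrix $g(\widetilde W)$ approaches the identity while $g(W) = 0$ on the collapsed graph, which is the mismatch the contradiction argument exploits.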

Lemma 4.1. Denote by $\|\cdot\|_F$ the Frobenius norm and let $T$ and $\tilde T$ be normal on $\ell^2(G)$ and $\ell^2(\tilde G)$ respectively. Let $g$ be Lipschitz continuous with Lipschitz constant $D_g$. For any linear $J : \ell^2(G) \to \ell^2(\tilde G)$ we have

$$\|g(\tilde T)J-Jg(T)\|_F \le D_g\,\|\tilde TJ-JT\|_F.$$

$\ldots|\,kC^{k-1}$ for $g$ entire, we have

$$\|g(T)J-Jg(\tilde T)\|_{\mathrm{op}} \le K_g\cdot\|JT-\tilde TJ\|_{\mathrm{op}}.$$
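The Frobenius-norm inequality of Lemma 4.1 can be probed numerically. The sketch below is a sanity check under stated assumptions rather than a proof: it assumes unit node weights (so that the Hilbert-space norms are plain Euclidean ones), builds random normal operators from a random unitary and a random complex spectrum, and uses the hypothetical filter $g(z) = 1/(1+|z|)$, which is Lipschitz with constant $D_g = 1$. All function names are illustrative.

```python
import numpy as np


def random_unitary(n, rng):
    """Random unitary matrix from the QR decomposition of a complex Gaussian."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q


def filter_commutator_norms(n=5, m=7, seed=0):
    """Return (||g(T~)J - J g(T)||_F, ||T~ J - J T||_F) for random normal T, T~."""
    rng = np.random.default_rng(seed)
    U, V = random_unitary(n, rng), random_unitary(m, rng)
    lam = rng.normal(size=n) + 1j * rng.normal(size=n)      # spectrum of T
    lam_t = rng.normal(size=m) + 1j * rng.normal(size=m)    # spectrum of T~

    def g(z):
        return 1.0 / (1.0 + np.abs(z))                      # Lipschitz, D_g = 1

    # Normal operators and their filters via functional calculus
    T = U @ np.diag(lam) @ U.conj().T
    T_t = V @ np.diag(lam_t) @ V.conj().T
    gT = U @ np.diag(g(lam)) @ U.conj().T
    gT_t = V @ np.diag(g(lam_t)) @ V.conj().T

    J = rng.normal(size=(m, n))                             # arbitrary linear map
    lhs = np.linalg.norm(gT_t @ J - J @ gT, "fro")
    rhs = np.linalg.norm(T_t @ J - J @ T, "fro")
    return lhs, rhs


lhs, rhs = filter_commutator_norms()
print(lhs, rhs)
```

Expressing $J$ in the two eigenbases turns both sides into entrywise weighted sums, with weights $|g(\tilde\lambda_i)-g(\lambda_j)|$ and $|\tilde\lambda_i-\lambda_j|$ respectively, which is why the Lipschitz constant is the only property of $g$ that enters.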

$\langle\cdot,\cdot\rangle_H$. Prototypical examples are given by the Euclidean spaces $\mathbb{C}^d$ with inner product $\langle x,y\rangle_{\mathbb{C}^d} := \sum_{i=1}^d\overline{x}_iy_i$. Associated to an inner product is a norm, denoted by $\|\cdot\|_H$ and defined by $\|x\|_H := \sqrt{\langle x,x\rangle_H}$ for $x \in H$.

Direct Sums of Spaces: Given two potentially different Hilbert spaces $H$ and $\hat H$, one can form their direct sum $H\oplus\hat H$. Elements of $H\oplus\hat H$ are vectors of the form $(a,b)$, with $a \in H$ and $b \in \hat H$. Addition and scalar multiplication are defined in the obvious way by $(a,b)+\lambda(c,d) := (a+\lambda c,\,b+\lambda d)$ for $a,c \in H$, $b,d \in \hat H$ and $\lambda \in \mathbb{C}$. The inner product on the direct sum is defined by $\langle(a,b),(c,d)\rangle_{H\oplus\hat H} := \langle a,c\rangle_H+\langle b,d\rangle_{\hat H}$

Operator Norm: Let $J : H \to \tilde H$ be a linear operator between Hilbert spaces. We measure its 'size' by what is called the operator norm, denoted by $\|\cdot\|_{\mathrm{op}}$ and defined by $\|J\|_{\mathrm{op}} := \sup_{\psi\in H,\ \|\psi\|_H=1}\|J\psi\|_{\tilde H}$.

Normal Operators: If a linear operator $\Delta : H \to H$ maps from and to the same Hilbert space, we can compare it directly with its adjoint. If $\Delta\Delta^* = \Delta^*\Delta$, we say that the operator $\Delta$ is normal. Special instances of normal operators are self-adjoint operators, for which we have the stronger property $\Delta = \Delta^*$. If an operator is normal, there are unitary maps $U : H \to H$ diagonalizing $\Delta$ as

PROOF OF LEMMA 2.3

We want to prove the following:

Lemma D.1. For holomorphic $g$ and generic $T$ we have

$$\|g(T)\|_{\mathrm{op}} \le |g(\infty)|+\frac{1}{2\pi}\oint_{\partial D}|g(z)|\,\gamma_T(z)\,d|z|.$$

Furthermore we have for any $T$ with $\gamma_T(\omega) \le C$, that $\|g(T)\|_{\mathrm{op}} \le \|g\|_{F^{\mathrm{hol}}}$

pπaq 2 ˇˇˇˇˇp 1 `aq 2 sin 2 ´πx 1 1`a ¯´sin 2 pπxq rpπaq 2 `sin 2 pπxqs ¨rpπaq 2 `p1 `aq 2 sin 2 ´πx 1 pπaq 2 ˇˇˇˇˇs in 2 ´πx 1 1`a ¯´sin 2 pπxq `a sin 2 ´πx 1 1`a ¯`a 2 sin 2 ´πx 1 1`a rpπaq 2 `sin 2 pπxqs ¨rpπaq 2 `p1 `aq 2 sin 2 ´πx 1 pπaq 2 ˇˇˇˇˇs in ´πx a 1`a ¯¨sin ´πx a`2 a`1 ¯`a sin 2 ´πx 1 1`a ¯`a 2 sin 2 ´πx 1 1`a rpπaq 2 `sin 2 pπxqs ¨rpπaq 2 `p1 `aq 2 sin 2 ´πx 1

Classification Accuracies on Social Network Datasets ´1, R ω the resolvent of T at ω γ


For the weights $\{\mu^\delta_g\}_{g\in G}$ we then find

$$\tilde\mu_g \le \mu^\delta_g \le \tilde\mu_g+\delta K.$$

We also write $\tilde\mu(\tilde G_{\mathrm{Greek}}) := \sum_{\alpha\in\tilde G_{\mathrm{Greek}}}\tilde\mu_\alpha$. If $g = \star$, we have

$$\tilde\mu^\delta_\star+(1-\delta)\,\tilde\mu(\tilde G_{\mathrm{Greek}}) \le \mu^\delta_\star \le \tilde\mu^\delta_\star+\tilde\mu(\tilde G_{\mathrm{Greek}}).$$

Having set the scene, we are now ready to prove Theorem 5.4. Following Post & Simmer (2017), instead of checking the conditions of Definition 5.1 and Definition 5.2 it is sufficient to check the following, with $J$, $\tilde J$ as defined in Section 5.2, to establish Theorem 5.6:

Lemma J.6. In addition to identification operators $J$, $\tilde J$, assume that there exist additional operators $J^1 : \ell^2(G) \to \ell^2(\tilde G)$ and $\tilde J^1 : \ell^2(\tilde G) \to \ell^2(G)$ so that the following set of equations is satisfied with $\epsilon = O(\delta^{\frac12})$:

$$\|Jf\| \le (1+\epsilon')\|f\|, \qquad |\langle Jf,u\rangle-\langle f,\tilde Ju\rangle| \le \epsilon'\|f\| \qquad (18)$$

Then the (normal) operators $\Delta$ and $\tilde\Delta$ are (doubly) $(-1)$-$(\epsilon = 12\epsilon')$-close with identification operator $J$. Here, we always have $u \in \ell^2(\tilde G)$ and $f \in \ell^2(G)$.

Proof. This follows immediately after combining Proposition 4.4.12 with Theorem 4.4.15 of Post (2012).

We set $J^1f = Jf$ and $(\tilde J^1u)(x) = u(x)$ and now determine the individual $\epsilon = \epsilon(\delta)$ values for which these equations are satisfied:

Left-hand-side of (18):

For the left hand side of (18) we note (using $2ab \le a^2+b^2$ and the fact that the $\psi_g$ form a partition of unity):

Here the second to last inequality follows from the definition of the weights $\mu^\delta_g$. Thus the left hand side of (18) holds with $\epsilon = 0$.

Right-hand-side of (18):

The right hand side of (18) holds trivially with $\epsilon = 0$ since we have chosen $J^* = \tilde J$.

Left-hand-side of (19):

Now let us check the l.h.s. of (19). We have:

Using the constant $K$ defined in (14) we have

We also write $\tilde\mu($

We next note

Thus for $y \neq \star$ we find

acting on $\ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})$. For any element $v \in \ell^2(\tilde G)$, let us denote its restriction to $\tilde G_{\mathrm{Greek}}\cup\{\star\}$ by $\underline{v} \in \ell^2(\tilde G_{\mathrm{Greek}}\cup\{\star\})$. We thus find

To proceed, we prove the following Lemma:

Lemma J.7. Let $i, j \in \tilde G_{\mathrm{Greek}}\cup\{\star\}$.
Denote by $C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}(i,j)$ the minimum number of edges with $\omega_{ij} > 0$ needed to connect $i$ and $j$ by a path. Set

Furthermore set

We have

We call $C_{\tilde G_{\mathrm{Greek}}\cup\{\star\}}$ the connectivity constant of the sub-graph $\tilde G_{\mathrm{Greek}}\cup\{\star\}$ and note that it is well-defined since we assume $\tilde G_{\mathrm{Greek}}\cup\{\star\}$ to be connected.

and may hence set

The left hand side of (20) is true with $\epsilon = 0$ by definition.

Right-hand-side of (20):

Let us thus check the right hand side of (20): We have

We note

We first deal with the left hand term of the estimate and note that for $x = \star$ we have

Thus we find, using Cauchy-Schwarz, that

Here we denoted by $\tilde d_\alpha$ the degree of the node $\alpha$. We further note

It remains to bound the $x = \star$ term in (26). To this end we note

For the difference of the energy forms we thus find

We have

with the last term vanishing by symmetry. This implies

Continuing, we find

This, in turn, we can write as

For the first term, we find

For the second term we note

K PROOF OF THEOREM 5.7

We prove the following theorem:

Theorem K.1. In the setting of Theorem 5.6 denote by $T$ ($\tilde T$) adjacency matrices or normalized graph Laplacians on $\ell^2(G)$ ($\ell^2(\tilde G)$). There are no functions $\eta_1, \eta_2 : [0,1] \to \mathbb{R}_{\ge0}$ with $\eta_i(\delta) \to 0$ as $\delta \to 0$ ($i = 1, 2$), families of identification operators $J^\delta$, $\tilde J^\delta$ and $\omega \in \mathbb{C}$ so that $J^\delta$ and $\tilde J^\delta$ are $\eta_1(\delta)$-quasi-unitary with respect to $\tilde T$, $T$ and $\omega$ while the operators $\tilde T$ and $T$ remain $\omega$-$\eta_2(\delta)$ close.

Proof. We prove these two results through contradiction on a graph with two vertices and one edge with weight $1/\delta$, which we collapse. First fix $T$ ($\tilde T$) to be the adjacency matrices

The eigenvectors and eigenvalues of $\widetilde W$ are given by $\{-\frac1\delta, \frac1\delta\}$ and

For definiteness, we have assumed $N$ to be odd, so that $(N+1)$ is even. We define the identification operator $J : \ell^2(G_N) \to \ell^2(G_{N+1})$ via

on the orthonormal basis $\{\phi^N_k\}_{0\le k<N}$ and extend it to all of $\ell^2(G_N)$ via linearity. This implies that precisely the eigenspace spanned by $\phi^{N+1}$ does not lie in the image of $J$.
We set $\tilde J$ to be the adjoint $J^*$ of $J$. Choosing $\omega = -1$, we shall now first check the equations of Definition 5.1. Since $J$ is isometric, we have $\|Jf\| = \|f\| \le 2\|f\|$ as desired. Since $\tilde J = J^*$, we have $\|\tilde J-J^*\| = 0$. Since $\tilde JJ = \mathrm{Id}_{\ell^2(G_N)}$, what remains to be checked is the demand

We have

Thus let us now check that the conditions of Definition 5.2 are fulfilled. We note that with our identification operator and by symmetry ($\lambda^N_k = \lambda^N_{N-k}$), we have

$$\ldots\,\sin^2\!\left(\frac{\pi}{N+1}\,k\right)\Bigr|.$$

We now need to bound the right hand side uniformly in $k$ as $N \to \infty$. To this end we write $a := 1/N$ (which implies $\frac{N+1}{N} = 1+a$) and $x = \frac kN$ (which for our allowed values of $k$ implies $0 \le x < \frac12$). With this we have

$\delta\cdot K\cdot\|f_i\|_{\ell^2(G_N)}$ for all $f_i \in \ell^2(G_N)$, we have

$$\|\Psi^p_N(f)-\tilde\Psi^p_N(J_0f)\|_{\mathbb{R}^{K_{\mathrm{out}}}} \le \left(N\cdot DRL+K(BRL)\right)\cdot(BRL)^{N-1}\cdot\|f\|_{L_{\mathrm{in}}}\cdot\delta.$$

Proof. To prove the first claim, we note

$= \|\Phi^p_N(f)-\Phi^p_N(g)\|_{\mathbb{R}^{K_{\mathrm{out}}}}$, where we used the reverse triangle inequality and the fact that $\|\cdot\|_{\ell^p(\tilde G_{\mathrm{out}})} \le \|\cdot\|_{\ell^2(\tilde G_{\mathrm{out}})}$ for $2 \le p$. To finish the proof we now only need to apply Theorem 3.1.

To prove the second claim we note

and the claim follows as before.

The proof of the third claim proceeds in complete analogy.

N ADDITIONAL DETAILS ON EXPERIMENTAL SETUP

Scaling Operators: The adjacency matrix of the given graph is given by

The exceptional vertex $\star$ here carries index "4" ("$\star = 4$"). Node weights are set to unity.

The Realm of Large Graphs: We also plot the difference in characteristic operators as opposed to their resolvents:

Experiments on Molecules: The dataset we consider is the QM7 dataset, introduced in Blum & Reymond (2009); Rupp et al. (2012). This dataset contains descriptions of 7165 organic molecules, each with up to seven heavy atoms, with all non-hydrogen atoms being considered heavy. A molecule is represented by its Coulomb matrix $C^{\mathrm{Clmb}}$, whose off-diagonal elements

correspond to the Coulomb repulsion between atoms $i$ and $j$, while diagonal elements encode a polynomial fit of atomic energies to nuclear charge (Rupp et al., 2012):

For each atom in any given molecular graph, the individual Cartesian coordinates $R_i$ and the atomic charge $Z_i$ are also accessible individually. To each molecule an atomization energy, calculated via density functional theory, is associated. The objective is to predict this quantity; the performance metric is mean absolute error. Numerically, atomization energies are negative numbers in the range $-600$ to $-2200$. The associated unit is [kcal/mol].
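For orientation, the Coulomb matrix of a molecule can be assembled directly from the charges $Z_i$ and coordinates $R_i$. The sketch below assumes the standard definition from Rupp et al. (2012), namely $C_{ij} = Z_iZ_j/\|R_i-R_j\|$ for $i \neq j$ and $C_{ii} = 0.5\,Z_i^{2.4}$; since the explicit formulas are not reproduced in the text above, this choice should be read as an assumption, and the function name `coulomb_matrix` is hypothetical. Coordinates are taken to be in atomic units.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix, assuming the Rupp et al. (2012) convention:
    off-diagonal Z_i Z_j / |R_i - R_j|, diagonal 0.5 * Z_i ** 2.4."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = Z.shape[0]
    # Pairwise Euclidean distances between atom positions
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    C = np.zeros((n, n))
    off = ~np.eye(n, dtype=bool)                 # mask of off-diagonal entries
    C[off] = np.outer(Z, Z)[off] / dist[off]     # pairwise Coulomb repulsion
    C[np.diag_indices(n)] = 0.5 * Z ** 2.4       # polynomial fit on the diagonal
    return C


# Hypothetical example: two unit charges at distance 2
C = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
print(C)
```

The off-diagonal entries can then serve as edge weights of a molecular graph, with the inverse-distance scaling making nearby atoms strongly coupled.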

