SIGN AND BASIS INVARIANT NETWORKS FOR SPECTRAL GRAPH REPRESENTATION LEARNING

Abstract

We introduce SignNet and BasisNet, new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if v is an eigenvector then so is -v; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. We prove that under certain conditions our networks are universal, i.e., they can approximate any continuous function of eigenvectors with the desired invariances. When used with Laplacian eigenvectors, our networks are provably more expressive than existing spectral methods on graphs; for instance, they subsume all spectral graph convolutions, certain spectral graph invariants, and previously proposed graph positional encodings as special cases. Experiments show that our networks significantly outperform existing baselines on molecular graph regression, learning expressive graph representations, and learning neural fields on triangle meshes. Our code is available at https://github.com/cptq/SignNet-BasisNet.

1. INTRODUCTION

Numerous machine learning models process eigenvectors, which arise in various settings including principal component analysis, matrix factorizations, and operators associated to graphs or manifolds. An important example is the use of Laplacian eigenvectors to encode information about the structure of a graph or manifold (Belkin & Niyogi, 2003; Von Luxburg, 2007; Lévy, 2006). Positional encodings that involve Laplacian eigenvectors have recently been used to generalize Transformers to graphs (Kreuzer et al., 2021; Dwivedi & Bresson, 2021), and to improve the expressive power and empirical performance of graph neural networks (GNNs) (Dwivedi et al., 2022). Furthermore, these eigenvectors are crucial for defining spectral operations on graphs that are foundational to graph signal processing and spectral GNNs (Ortega et al., 2018; Bruna et al., 2014). However, there are nontrivial symmetries that should be accounted for when processing eigenvectors, as has been noted in many fields (Eastment & Krzanowski, 1982; Rustamov et al., 2007; Bro et al., 2008; Ovsjanikov et al., 2008). For instance, if $v$ is an eigenvector, then so is $-v$, with the same eigenvalue. More generally, if an eigenvalue has higher multiplicity, then there are infinitely many unit-norm eigenvectors that can be chosen. Indeed, a full set of linearly independent eigenvectors is only defined up to a change of basis in each eigenspace. In the case of sign invariance, for any $k$ eigenvectors there are $2^k$ possible choices of sign. Accordingly, prior works on graph positional encodings randomly flip eigenvector signs during training in order to approximately learn sign invariance (Kreuzer et al., 2021; Dwivedi et al., 2020; Kim et al., 2022). However, learning all $2^k$ invariances is challenging and limits the effectiveness of Laplacian eigenvectors for encoding positional information.
Sign invariance is a special case of basis invariance when all eigenvalues are distinct, but general basis invariance is even more difficult to deal with. In Appendix C.2, we show that higher dimensional eigenspaces are abundant in real datasets; for instance, 64% of molecule graphs in the ZINC dataset have a higher dimensional eigenspace. In this work, we address the sign and basis ambiguity problems by developing new neural networks: SignNet and BasisNet. Under certain conditions, our networks are universal and can approximate any continuous function of eigenvectors with the proper invariances. Moreover, our networks are theoretically powerful for graph representation learning: they can provably approximate and go beyond both spectral graph convolutions and powerful spectral invariants, which allows our networks to express graph properties like subgraph counts that message passing neural networks cannot. Laplacian eigenvectors with SignNet and BasisNet can provably approximate many previously proposed graph positional encodings, so our networks are general and remove the need for choosing one of the many positional encodings in the literature. Experiments on molecular graph regression tasks, learning expressive graph representations, and texture reconstruction on triangle meshes illustrate the empirical benefits of our models' approximation power and invariances.

Figure 1: A neural network applied to the eigenvector matrix (middle) should be invariant or equivariant to permutation of the rows (left product with a permutation matrix $P$) and invariant to the choice of eigenvectors in each eigenbasis (right product with a block diagonal orthogonal matrix $\mathrm{Diag}(Q_1, Q_2, Q_3)$).

2. SIGN AND BASIS INVARIANT NETWORKS

For an $n \times n$ symmetric matrix, let $\lambda_1 \leq \ldots \leq \lambda_n$ be the eigenvalues and $v_1, \ldots, v_n$ the corresponding eigenvectors, which we may assume to form an orthonormal basis. For instance, we could consider the normalized graph Laplacian $L = I - D^{-1/2} A D^{-1/2}$, where $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix and $D$ is the diagonal degree matrix of some underlying graph. For undirected graphs, $L$ is symmetric. Nonsymmetric matrices can be handled very similarly, as we show in Appendix B.1.

Motivation. Our goal is to parameterize a class of models $f(v_1, \ldots, v_k)$ taking $k$ eigenvectors as input in a manner that respects the eigenvector symmetries. This is because eigenvectors capture much information about data; for instance, Laplacian eigenvectors of a graph capture clusters, subgraph frequencies, connectivity, and many other useful properties (Von Luxburg, 2007; Cvetković et al., 1997). A major motivation for processing eigenvector input is for graph positional encodings, which are additional features appended to each node in a graph that give information about the position of that node in the graph. These additional features are crucial for generalizing Transformers to graphs, and also have been found to improve performance of GNNs (Dwivedi et al., 2020; 2022). Figure 2 illustrates a standard pipeline and the use of our SignNet within it: the input adjacency, node features, and eigenvectors of a graph are used to compute a prediction about the graph. Laplacian eigenvectors are processed before being fed into this prediction model. Laplacian eigenvectors have been widely used as positional encodings, and many works have noted that sign and/or basis invariance should be addressed in this case (Dwivedi & Bresson, 2021; Beaini et al., 2021; Dwivedi et al., 2020; Kreuzer et al., 2021; Mialon et al., 2021; Dwivedi et al., 2022; Kim et al., 2022).

Sign invariance.
For any eigenvector $v_i$, the sign flipped $-v_i$ is also an eigenvector, so a function $f : \mathbb{R}^{n \times k} \to \mathbb{R}^{d_{\mathrm{out}}}$ (where $d_{\mathrm{out}}$ is an arbitrary output dimension) should be sign invariant: $f(v_1, \ldots, v_k) = f(s_1 v_1, \ldots, s_k v_k)$ for all sign choices $s_i \in \{-1, 1\}$. That is, we want $f$ to be invariant to the product group $\{-1, 1\}^k$. This captures all eigenvector symmetries if the eigenvalues $\lambda_i$ are distinct and the eigenvectors are unit-norm.

Basis invariance. If the eigenvalues have higher multiplicity, then there are further symmetries. Let $V_1, \ldots, V_l$ be bases of eigenspaces, i.e., $V_i = [v_{i_1} \; \cdots \; v_{i_{d_i}}] \in \mathbb{R}^{n \times d_i}$ has orthonormal columns and spans the eigenspace associated with the shared eigenvalue $\mu_i = \lambda_{i_1} = \ldots = \lambda_{i_{d_i}}$. Any other orthonormal basis that spans the eigenspace is of the form $V_i Q$ for some orthogonal $Q \in O(d_i) \subseteq \mathbb{R}^{d_i \times d_i}$ (see Appendix F.2). Thus, a function $f : \mathbb{R}^{n \times \sum_{i=1}^{l} d_i} \to \mathbb{R}^{d_{\mathrm{out}}}$ that is invariant to changes of basis in each eigenspace satisfies $f(V_1, \ldots, V_l) = f(V_1 Q_1, \ldots, V_l Q_l)$ for all $Q_i \in O(d_i)$. In other words, $f$ is invariant to the product group $O(d_1) \times \ldots \times O(d_l)$. The number of eigenspaces $l$ and the dimensions $d_i$ may vary between matrices; we account for this in Section 2.2. As $O(1) = \{-1, 1\}$, sign invariance is a special case of basis invariance when all eigenvalues are distinct.

Permutation equivariance. For GNN models that output node features or node predictions, one typically further desires $f$ to be invariant or equivariant to permutations of nodes, i.e., along the rows of each vector. Thus, for $f : \mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d_{\mathrm{out}}}$, we typically require $f(P V_1, \ldots, P V_l) = P f(V_1, \ldots, V_l)$ for any permutation matrix $P \in \mathbb{R}^{n \times n}$. Figure 1 illustrates all of the symmetries.

Universal Approximation. We desire universal models, which can approximate any function in a target class.
Formally, we say that a class of functions $\mathcal{F}_{\mathrm{model}}$ of domain $\mathcal{X}$ and output space $\mathcal{Y}$ universally approximates a class of functions $\mathcal{F}_{\mathrm{target}}$ if for any $\epsilon > 0$, any compact $\Omega \subseteq \mathcal{X}$, and any target function $f \in \mathcal{F}_{\mathrm{target}}$, there exists an $\hat{f} \in \mathcal{F}_{\mathrm{model}}$ such that $\|f(x) - \hat{f}(x)\| < \epsilon$ for all $x \in \Omega$.

Regarding (b), permuting the rows of $V$ permutes rows and columns of $V V^\top \in \mathbb{R}^{n \times n}$. Hence, we desire the function $\phi : \mathbb{R}^{n \times n} \to \mathbb{R}^{n}$ on $V V^\top$ to be equivariant to simultaneous row and column permutations: $\phi(P V V^\top P^\top) = P \phi(V V^\top)$. To parameterize such a mapping from matrices to vectors, we use an invariant graph network (IGN) (Maron et al., 2018), a neural network mapping to and from tensors of arbitrary order, $\mathbb{R}^{n^{d_1}} \to \mathbb{R}^{n^{d_2}}$, that has the desired permutation equivariance. We thus parameterize a family with the requisite invariance and equivariance as follows: $h(V) = \mathrm{IGN}(V V^\top)$. Proposition 2 states that this architecture universally approximates $O(d)$ invariant and permutation equivariant functions. The full approximation power requires high order tensors to be used for the IGN; in practice, we restrict the tensor dimensions for efficiency, as discussed in the next section.

Proposition 2. Any continuous, $O(d)$ invariant $h : \mathbb{R}^{n \times d} \to \mathbb{R}^{d_{\mathrm{out}}}$ is of the form $h(V) = \phi(V V^\top)$ for a continuous $\phi$. For a compact $\mathcal{Z} \subseteq \mathbb{R}^{n \times d}$, maps of the form $V \mapsto \mathrm{IGN}(V V^\top)$ universally approximate continuous $h : \mathcal{Z} \subseteq \mathbb{R}^{n \times d} \to \mathbb{R}^{n}$ that are $O(d)$ invariant and permutation equivariant.
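The symmetries above and the form $h(V) = \mathrm{IGN}(V V^\top)$ can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: `toy_ign` is a hypothetical stand-in for a 2-IGN that mixes a few simple permutation-equivariant linear maps from matrices to vectors, and the 4-cycle graph, rotation angle, and weights are illustrative assumptions.

```python
import numpy as np

# 4-cycle graph: its normalized Laplacian has eigenvalues 0, 1, 1, 2,
# so the eigenvalue 1 has a two-dimensional eigenspace.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(deg, deg))  # I - D^{-1/2} A D^{-1/2}
lam, V = np.linalg.eigh(L)                        # eigenvalues in ascending order
V_mid = V[:, 1:3]                                 # orthonormal basis for eigenvalue 1

# Basis symmetry: V_mid @ Q is another orthonormal basis of the same eigenspace.
t = 0.7
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
rotated_is_eigenbasis = np.allclose(L @ (V_mid @ Q), V_mid @ Q)


def toy_ign(M, w):
    """Hypothetical stand-in for a 2-IGN from matrices to vectors.

    Mixes simple permutation-equivariant linear maps R^{n x n} -> R^n
    (diagonal, row sums, column sums, trace, total sum) with weights w,
    then applies an elementwise nonlinearity.
    """
    n = M.shape[0]
    ones = np.ones(n)
    feats = np.stack([np.diag(M), M.sum(axis=1), M.sum(axis=0),
                      np.trace(M) * ones, M.sum() * ones], axis=1)
    return np.tanh(feats @ w)


def h(V, w):
    """h(V) = IGN(V V^T): depends on V only through the projector V V^T."""
    return toy_ign(V @ V.T, w)


rng = np.random.default_rng(0)
w = rng.standard_normal(5)
perm = np.array([2, 0, 3, 1])

basis_invariant = np.allclose(h(V_mid, w), h(V_mid @ Q, w))
perm_equivariant = np.allclose(h(V_mid[perm], w), h(V_mid, w)[perm])
```

Because $(VQ)(VQ)^\top = V V^\top$ for orthogonal $Q$, basis invariance holds for any choice of the map applied to $V V^\top$; only permutation equivariance constrains that map.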

2.2. NEURAL NETWORKS ON MULTIPLE EIGENSPACES

To develop a method for processing multiple eigenvectors (or eigenspaces), we first prove a general decomposition theorem (see Appendix A for more details). Our result reduces invariance for a large product group $G_1 \times \ldots \times G_k$ to the much simpler invariances for the smaller constituent groups $G_i$.

Theorem 1 (Informal). Let a product of groups $G = G_1 \times \ldots \times G_k$ act on $\mathcal{X}_1 \times \ldots \times \mathcal{X}_k$. Under mild conditions, any continuous $G$-invariant function $f$ can be written $f(x_1, \ldots, x_k) = \rho(\phi_1(x_1), \ldots, \phi_k(x_k))$, where $\phi_i$ is $G_i$ invariant, and $\phi_i$ and $\rho$ are continuous. If $\mathcal{X}_i = \mathcal{X}_j$ and $G_i = G_j$, then we can take $\phi_i = \phi_j$.

The key consequence of this result is that if we know how to design invariant models for the smaller groups $G_i$ (of size 2 for sign invariance), then we can combine them in a simple way to get invariant models for the larger and more complex $G$ (of size $2^k$ for sign invariance), without losing any expressive power. For eigenvector data, the $i$th eigenvector (or eigenspace) is in $\mathcal{X}_i$, and its symmetries are described by $G_i$. Thus, we can reduce the multiple-eigenspace case to the single-eigenspace case, and leverage the models we developed in the previous section.

SignNet. We parameterize our sign invariant network $f : \mathbb{R}^{n \times k} \to \mathbb{R}^{d_{\mathrm{out}}}$ on eigenvectors $v_1, \ldots, v_k$ as: $f(v_1, \ldots, v_k) = \rho\left([\phi(v_i) + \phi(-v_i)]_{i=1}^{k}\right)$, where $\phi$ and $\rho$ are unrestricted neural networks, and $[\cdot]_i$ denotes concatenation of vectors. The form $\phi(v_i) + \phi(-v_i)$ induces sign invariance for each eigenvector. Since we do not yet impose permutation equivariance here, we term this model Unconstrained-SignNet. To obtain a sign invariant and permutation equivariant $f$ that outputs vectors in $\mathbb{R}^{n \times d_{\mathrm{out}}}$, we restrict $\phi$ and $\rho$ to be permutation equivariant networks from vectors to vectors, such as elementwise MLPs, DeepSets (Zaheer et al., 2017), Transformers (Vaswani et al., 2017), or most standard GNNs.
We name this permutation equivariant version SignNet. If desired, we can use eigenvalues $\lambda_i$, an adjacency matrix $A \in \mathbb{R}^{n \times n}$, and node features $X \in \mathbb{R}^{n \times d_{\mathrm{feat}}}$ by adding them as arguments to $\phi$: $f(v_1, \ldots, v_k, \lambda_1, \ldots, \lambda_k, X) = \rho\left([\phi(v_i, \lambda_i, A, X) + \phi(-v_i, \lambda_i, A, X)]_{i=1}^{k}\right)$.

BasisNet. For basis invariance, let $V_i \in \mathbb{R}^{n \times d_i}$ be an orthonormal basis of a $d_i$ dimensional eigenspace. Then we parameterize our Unconstrained-BasisNet $f$ by $f(V_1, \ldots, V_l) = \rho\left([\phi_{d_i}(V_i V_i^\top)]_{i=1}^{l}\right)$, where each $\phi_{d_i}$ is shared amongst all subspaces of the same dimension $d_i$, and $l$ is the number of eigenspaces (i.e., number of distinct eigenvalues, which can differ from the number of eigenvectors $k$). As $l$ differs between graphs, we may use zero-padding or a sequence model like a Transformer to parameterize $\rho$. Again, $\phi_{d_i}$ and $\rho$ are generally unrestricted neural networks. To obtain permutation equivariance, we make $\rho$ permutation equivariant and let $\phi_{d_i} = \mathrm{IGN}_{d_i} : \mathbb{R}^{n^2} \to \mathbb{R}^{n}$ be IGNs from matrices to vectors. For efficiency, we will only use matrices and vectors in the IGNs (that is, no tensors in $\mathbb{R}^{n^p}$ for $p > 2$), i.e., we use 2-IGN (Maron et al., 2018). Our resulting BasisNet is $f(V_1, \ldots, V_l) = \rho\left([\mathrm{IGN}_{d_i}(V_i V_i^\top)]_{i=1}^{l}\right)$.

Expressive-BasisNet. While we restrict SignNet to only use vectors and BasisNet to only use vectors and matrices, higher order tensors are generally required for universally approximating permutation equivariant or invariant functions (Keriven & Peyré, 2019; Maron et al., 2019; Maehara & NT, 2019). Thus, we will consider a theoretically powerful but computationally impractical variant of our model, in which we replace $\rho$ and $\mathrm{IGN}_{d_i}$ in BasisNet with IGNs of arbitrary tensor order. We call this variant Expressive-BasisNet. Universal approximation requires $O(n^n)$ sized intermediate tensors (Ravanbakhsh, 2020).
We study Expressive-BasisNet due to its theoretical interest, and to juxtapose with the computational efficiency and strong expressive power of SignNet and BasisNet. In the multiple subspace case, we can prove universality for some instances of our models through our decomposition theorem; see Appendix A for details. For a summary of properties and more details about our models, see Appendix B.
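The SignNet form $\rho([\phi(v_i) + \phi(-v_i)]_{i=1}^{k})$ can be sketched in a few lines. This is a minimal NumPy illustration with random, untrained weights, assuming $\phi$ is an elementwise MLP and $\rho$ is an MLP applied independently at each node; the helper names, hidden sizes, and inputs are hypothetical and are not taken from the released code.

```python
import numpy as np


def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP applied along the last axis of x."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2


def signnet(V, phi_params, rho_params):
    """Map n x k eigenvectors to n x d_out node features.

    phi acts on each entry of each eigenvector (elementwise MLP), and
    rho acts on each node's concatenated features, so the whole map is
    permutation equivariant along the node axis.
    """
    n, k = V.shape
    x = V[:, :, None]                                    # (n, k, 1)
    z = mlp(x, *phi_params) + mlp(-x, *phi_params)       # sign invariant, (n, k, hid)
    return mlp(z.reshape(n, -1), *rho_params)            # concatenate over i, then rho


rng = np.random.default_rng(0)
n, k, hid, d_out = 5, 3, 8, 4
phi_params = (rng.standard_normal((1, hid)), rng.standard_normal(hid),
              rng.standard_normal((hid, hid)), rng.standard_normal(hid))
rho_params = (rng.standard_normal((k * hid, hid)), rng.standard_normal(hid),
              rng.standard_normal((hid, d_out)), rng.standard_normal(d_out))

V = rng.standard_normal((n, k))          # stand-in for k eigenvectors
signs = np.array([1.0, -1.0, -1.0])      # one sign choice per eigenvector
perm = rng.permutation(n)                # a relabeling of the nodes

out = signnet(V, phi_params, rho_params)
sign_invariant = np.allclose(out, signnet(V * signs, phi_params, rho_params))
perm_equivariant = np.allclose(out[perm], signnet(V[perm], phi_params, rho_params))
```

Sign invariance here is structural: it holds for any weights, so nothing about the $2^k$ sign configurations has to be learned from data.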

3. THEORETICAL POWER FOR GRAPH REPRESENTATION LEARNING

Next, we establish that our SignNet and BasisNet can go beyond useful basis invariant and permutation equivariant functions on Laplacian eigenvectors for graph representation learning, including: spectral graph convolutions, spectral invariants, and existing graph positional encodings. Expressive-BasisNet can of course compute these functions, but this section shows that the practical invariant architectures SignNet and BasisNet can compute them as well.

3.1. SIGNNET AND BASISNET STRICTLY GENERALIZE SPECTRAL GRAPH CONVOLUTION

For node features $X \in \mathbb{R}^{n \times d_{\mathrm{feat}}}$ and an eigendecomposition $V \Lambda V^\top$, a spectral graph convolution takes the form $f(V, \Lambda, X) = \sum_{i=1}^{n} \theta_i v_i v_i^\top X = V \mathrm{Diag}(\theta) V^\top X$, for some parameters $\theta_i$, that may optionally be continuous functions $h(\lambda_i) = \theta_i$ of the eigenvalues (Bruna et al., 2014; Defferrard et al., 2016). This family includes important functions like heat kernels and generalized PageRanks on graphs (Li et al., 2019). A spectral GNN is defined as multiple layers of spectral graph convolutions and node-wise linear maps, e.g. $V \mathrm{Diag}(\theta_2) V^\top \sigma\left(V \mathrm{Diag}(\theta_1) V^\top X W_1\right) W_2$ is a two layer spectral GNN. It can be seen (in Appendix H.1) that spectral graph convolutions are permutation equivariant and sign invariant, and if $\theta_i = h(\lambda_i)$ (i.e. the transformation applied to the diagonal elements is parametric) they are additionally invariant to a change of bases in each eigenspace.

Our SignNet and BasisNet can be viewed as generalizations of spectral graph convolutions, as our networks universally approximate all spectral graph convolutions of the above form. For instance, SignNet with $\rho(a_1, \ldots, a_k) = \sum_{i=1}^{k} a_i$ and $\phi(v_i, \lambda_i, X) = \frac{1}{2} \theta_i v_i v_i^\top X$ directly yields the spectral graph convolution. This is captured in Theorem 2, which we prove in Appendix H.1. In fact, we may expect SignNet to learn spectral graph convolutions well, according to the principle of algorithmic alignment (Xu et al., 2020) (see Appendix H.1); this is supported by numerical experiments in Appendix J.3, in which our networks outperform baselines in learning spectral graph convolutions.

Theorem 2. SignNet universally approximates all spectral graph convolutions. BasisNet universally approximates all parametric spectral graph convolutions.

In fact, SignNet and BasisNet are strictly stronger than spectral graph convolutions; there are functions computable by SignNet and BasisNet that cannot be approximated by spectral graph convolutions or spectral GNNs.
This is captured in Proposition 3: our networks can distinguish bipartite graphs from non-bipartite graphs, but spectral GNNs cannot for certain choices of graphs and node signals.

Proposition 3. There exist infinitely many pairs of non-isomorphic graphs that SignNet and BasisNet can distinguish, but spectral graph convolutions or spectral GNNs cannot distinguish.
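The claim that SignNet with a summing $\rho$ and $\phi(v_i, \lambda_i, X) = \frac{1}{2}\theta_i v_i v_i^\top X$ reproduces a spectral graph convolution can be verified directly. The sketch below uses a random symmetric matrix in place of a graph Laplacian and random filter coefficients; both are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_feat = 5, 2

# Random symmetric matrix standing in for a graph Laplacian (assumption).
S = rng.standard_normal((n, n))
S = (S + S.T) / 2
lam, V = np.linalg.eigh(S)

X = rng.standard_normal((n, d_feat))   # node features
theta = rng.standard_normal(n)         # one filter coefficient per eigenvector


def phi(v, theta_i, X):
    """phi(v_i, lambda_i, X) = (1/2) * theta_i * v_i v_i^T X (lambda_i unused here)."""
    return 0.5 * theta_i * np.outer(v, v) @ X


def rho(terms):
    """rho(a_1, ..., a_k) = sum_i a_i."""
    return sum(terms)


signnet_out = rho([phi(V[:, i], theta[i], X) + phi(-V[:, i], theta[i], X)
                   for i in range(n)])
direct = V @ np.diag(theta) @ V.T @ X  # V Diag(theta) V^T X

matches = np.allclose(signnet_out, direct)
```

The factor $\frac{1}{2}$ compensates for the two identical terms $\phi(v_i)$ and $\phi(-v_i)$, since $v_i v_i^\top$ is unchanged by a sign flip.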

3.2. BASISNET CAN COMPUTE SPECTRAL INVARIANTS

Many works measure the expressive power of graph neural networks by comparing their power for testing graph isomorphism (Xu et al., 2019; Sato, 2020), or by comparing their ability to compute certain functions on graphs like subgraph counts (Chen et al., 2020; Tahmasebi et al., 2020). These works often compare GNNs to combinatorial invariants on graphs, especially the $k$-Weisfeiler-Leman ($k$-WL) tests of graph isomorphism (Morris et al., 2021). While we may also compare with these combinatorial invariants, as other GNN works that use spectral information have done (Beaini et al., 2021), we argue that it is more natural to analyze our networks in terms of spectral invariants, which are computed from the eigenvalues and eigenvectors of graphs. There is a rich literature of spectral invariants from the fields of spectral graph theory and complexity theory (Cvetković et al., 1997). For a spectral invariant to be well-defined, it must be invariant to permutations and changes of basis in each eigenspace, a characteristic shared by our networks. The simplest spectral invariant is the multiset of eigenvalues, which we give as input to our networks. Another widely studied, powerful spectral invariant is the collection of graph angles, which are defined as the values $\alpha_{ij} = \|V_i V_i^\top e_j\|_2$, where $V_i \in \mathbb{R}^{n \times d_i}$ is an orthonormal basis for the $i$th adjacency matrix eigenspace, and $e_j$ is the $j$th standard basis vector, which is zero besides a one in the $j$th component. These are easily computed by our networks (Appendix H.3), so our networks inherit the strength of these invariants. We capture these results in the following theorem, which also lists a few properties that graph angles determine (Cvetković, 1991).

Theorem 3. BasisNet universally approximates the graph angles $\alpha_{ij}$.
The eigenvalues and graph angles (and thus BasisNet) can determine the number of length 3, 4, or 5 cycles, whether a graph is connected, and the number of length k closed walks from any vertex to itself. Relation to WL and message passing. In contrast to this result, message passing GNNs are not able to express any of these properties (see (Arvind et al., 2020; Garg et al., 2020) and Appendix H.3). Although spectral invariants are strong, Fürer (2010) shows that the eigenvalues and graph angles-as well as some strictly stronger spectral invariants-are not stronger than the 3-WL test (or, equivalently, the 2-Folklore-WL test). Using our networks for node positional encodings in message passing GNNs allows us to go beyond graph angles, as message passing can distinguish all trees, but there exist non-isomorphic trees with the same eigenvalues and graph angles (Fürer, 2010; Cvetković, 1988) .
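To make the graph angles concrete, the sketch below computes $\alpha_{ij} = \|V_i V_i^\top e_j\|_2$ from the adjacency eigenspaces of the complete graph $K_4$ and uses them to count closed walks. It relies on the standard identity $(A^k)_{jj} = \sum_i \mu_i^k \alpha_{ij}^2$, which follows from $A^k = \sum_i \mu_i^k V_i V_i^\top$; the function name, the tolerance used to group equal eigenvalues, and the example graph are assumptions made for illustration.

```python
import numpy as np


def graph_angles(A, tol=1e-8):
    """Return distinct adjacency eigenvalues mu (length l) and angles alpha (l x n)."""
    lam, V = np.linalg.eigh(A)
    mus, angles = [], []
    start = 0
    for end in range(1, len(lam) + 1):
        # Close the current group when the next eigenvalue differs (or at the end).
        if end == len(lam) or lam[end] - lam[start] > tol:
            Vi = V[:, start:end]                      # orthonormal basis of one eigenspace
            P = Vi @ Vi.T                             # basis invariant projector
            mus.append(lam[start:end].mean())
            angles.append(np.linalg.norm(P, axis=0))  # norm of column j is ||P e_j||_2
            start = end
    return np.array(mus), np.array(angles)


# Complete graph K4: adjacency eigenvalues are -1 (multiplicity 3) and 3.
A = np.ones((4, 4)) - np.eye(4)
mus, alpha = graph_angles(A)

k = 3
closed_walks = (mus[:, None] ** k * alpha ** 2).sum(axis=0)  # one count per vertex
walks_match = np.allclose(closed_walks, np.diag(np.linalg.matrix_power(A, k)))

# Each triangle contributes 6 closed walks of length 3 (3 start vertices, 2 directions).
num_triangles = int(round(closed_walks.sum() / 6))
```

Only the projectors $V_i V_i^\top$ enter the computation, so the result does not depend on which orthonormal basis `eigh` happens to return for the repeated eigenvalue.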

3.3. SIGNNET AND BASISNET GENERALIZE EXISTING GRAPH POSITIONAL ENCODINGS

Many graph positional encodings have been proposed, without any clear criteria on which to choose for a particular task. We prove (in Appendix H.2) that our efficient SignNet and BasisNet can approximate many previously used graph positional encodings, as we unify these positional encodings by expressing them as either a spectral graph convolution matrix or the diagonal of a spectral graph convolution matrix. Proposition 4. SignNet and BasisNet can approximate node positional encodings based on heat kernels (Feldman et al., 2022) and random walks (Dwivedi et al., 2022) . BasisNet can approximate diffusion and p-step random walk relative positional encodings (Mialon et al., 2021) , and generalized PageRank and landing probability distance encodings (Li et al., 2020) .
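As an illustration of writing a positional encoding as the diagonal of a spectral graph convolution matrix, the sketch below computes random walk return probabilities and a heat kernel diagonal from normalized Laplacian eigenvectors. It uses the identity $D^{-1}A = D^{-1/2}(I - L)D^{1/2}$, so the diagonal of $(D^{-1}A)^p$ equals the diagonal of $V \mathrm{Diag}((1 - \lambda_i)^p) V^\top$. The constructions in Appendix H.2 are not reproduced in this section, so the helper `spectral_diag`, the small example graph, and the values of $p$ and $t$ are assumptions.

```python
import numpy as np


def spectral_diag(V, lam, h):
    """Diagonal of V Diag(h(lam)) V^T: one scalar feature per node."""
    return np.einsum("ij,j,ij->i", V, h(lam), V)


# Triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(deg, deg))   # normalized Laplacian
lam, V = np.linalg.eigh(L)

p, t = 3, 0.5
rw_pe = spectral_diag(V, lam, lambda x: (1.0 - x) ** p)    # p-step return probabilities
heat_pe = spectral_diag(V, lam, lambda x: np.exp(-t * x))  # heat kernel diagonal

# Direct computation of the random walk encoding for comparison.
rw_direct = np.diag(np.linalg.matrix_power(A / deg[:, None], p))
rw_matches = np.allclose(rw_pe, rw_direct)

# Both encodings are unchanged when eigenvector signs are flipped.
signs = np.array([1.0, -1.0, 1.0, -1.0])
heat_sign_invariant = np.allclose(
    heat_pe, spectral_diag(V * signs, lam, lambda x: np.exp(-t * x)))
```

Since the filter here depends only on the eigenvalue, the encoding is a function of the eigenspace projectors and is therefore also invariant to the choice of basis within a repeated eigenvalue.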

4. EXPERIMENTS

We demonstrate the strength of our networks in various experiments. Appendix B shows simple pseudo-code and Figure 2 is a diagram detailing the use of SignNet as a node positional encoding.

4.1. GRAPH REGRESSION

We study the effectiveness of SignNet for learning positional encodings (PEs) from the eigenvectors of the graph Laplacian on the ZINC dataset of molecule graphs (Irwin et al., 2012) (using the subset of 12,000 graphs from Dwivedi et al. (2020)). We primarily consider three settings: 1) No positional encoding, 2) Laplacian PE (LapPE), in which the $k$ eigenvectors of the graph Laplacian with smallest eigenvalues are concatenated with existing node features, 3) SignNet positional features, obtained by passing the eigenvectors through a SignNet and concatenating the output with node features. We parameterize SignNet by taking $\phi$ to be a GIN (Xu et al., 2019) and $\rho$ to be an MLP. We sum over $\phi$ outputs before the MLP when handling variable numbers of eigenvectors, so the SignNet is of the form $\mathrm{MLP}\left(\sum_{i=1}^{l} \phi(v_i) + \phi(-v_i)\right)$ (see Appendix K.2 for further details). We consider four different base models that process the graph data and positional encodings: GatedGCN (Bresson & Laurent, 2017), a Transformer with sparse attention only over neighbours (Kreuzer et al., 2021), PNA (Corso et al., 2020), and GIN (Xu et al., 2019) with edge features (i.e. GINE) (Hu et al., 2020b). The total number of parameters of the SignNet and the base model is kept within a 500k budget.

Although the resulting architecture is no longer sign invariant, $\phi$ still processes eigenvectors independently, meaning that only two invariances ($\pm 1$) need be learned, significantly fewer than the $2^k$ total sign flip configurations. Accordingly, this non-sign-invariant learned positional encoding achieves a test MAE of 0.148, improving over the Laplacian PE (0.198) but falling short of the fully sign invariant SignNet (0.121). In all cases, using all available eigenvectors in SignNet significantly improves performance over using a fixed number of eigenvectors; this is notable as other works typically truncate to a fixed number of eigenvectors.

Efficiency.
These significant performance improvements from SignNet come with only a slightly higher computational cost. For example, GatedGCN with no PE takes about 8.2 seconds per training iteration on ZINC, while GatedGCN with 8 eigenvectors and SignNet takes about 10.6 seconds.

Substructure counts (e.g. of cycles) and global graph properties (e.g. connectedness, diameter, radius) are important graph features that are known to be informative for problems in biology, chemistry, and social networks (Chen et al., 2020; Holland & Leinhardt, 1977). Following the setting of Zhao et al. (2022), we show that SignNet with Laplacian positional encodings boosts the ability of simple GNNs to count substructures and regress graph properties. We take a 4-layer GIN as the base model for all settings, and for SignNet we use GIN as $\phi$ and a Transformer as $\rho$ to handle variable numbers of eigenvectors (see Appendix K.4 for details). As shown in Figure 3, Laplacian PEs with sign-flip data augmentation improve performance for counting substructures but not for regressing graph properties, while Laplacian PEs processed by SignNet significantly boost performance on all tasks.

4.3. NEURAL FIELDS ON MANIFOLDS

Discrete approximations to the Laplace-Beltrami operator on manifolds have proven useful for processing data on surfaces, such as triangle meshes (Lévy, 2006). Recently, Koestler et al. (2022) propose intrinsic neural fields, which use eigenfunctions of the Laplace-Beltrami operator as positional encodings for learning neural fields on manifolds. For generalized eigenfunctions $v_1, \ldots, v_k$, at a point $p$ on the surface, they parameterize functions $f(p) = \mathrm{MLP}(v_1(p), \ldots, v_k(p))$. As these eigenfunctions have sign ambiguity, we use our SignNet to parameterize $f(p) = \mathrm{MLP}\left(\rho\left([\phi(v_i(p)) + \phi(-v_i(p))]_{i=1,\ldots,k}\right)\right)$, with $\rho$ and $\phi$ being MLPs. Table 3 shows our results for texture reconstruction experiments on all models from Koestler et al. (2022). The total number of parameters in our SignNet-based model is kept below that of the original model. We see that the SignNet architecture improves over the original Intrinsic NF model and over other baselines, especially in the LPIPS metric, which is often a better perceptual metric than PSNR or DSSIM (Zhang et al., 2018a). While we have not yet tested this, we believe that SignNet would allow even more improvement when learning over eigenfunctions of different models, as it could improve transfer and generalization. See Appendix D.1 for visualizations and Appendix K.5 for more details.

5. RELATED WORK

In this section, we review selected related work. A more thorough review is deferred to Appendix E. Laplacian eigenvectors in GNNs. Various recently proposed methods in graph deep learning have directly used Laplacian eigenvectors as node positional encodings that are input to a message passing GNN (Dwivedi et al., 2020; 2022) , or some variant of a Transformer that is adapted to graphs (Dwivedi & Bresson, 2021; Kreuzer et al., 2021; Mialon et al., 2021; Dwivedi et al., 2022; Kim et al., 2022) . None of these methods address basis invariance, and they only partially address sign invariance for node positional encodings by randomly flipping eigenvector signs during training. Graph positional encodings. Other recent methods use positional encodings besides Laplacian eigenvectors. These include positional encodings based on random walks (Dwivedi et al., 2022; Mialon et al., 2021; Li et al., 2020) , diffusion kernels on graphs (Mialon et al., 2021; Feldman et al., 2022) , shortest paths (Ying et al., 2021; Li et al., 2020) , and unsupervised node embedding methods (Wang et al., 2022) . In particular, Wang et al. (2022) use Laplacian eigenvectors for relative positional encodings in an invariant way, but they focus on robustness, so they have stricter invariances that significantly reduce expressivity (see Appendix E.2 for more details). These previously used positional encodings are mostly ad-hoc, less general since they can be provably expressed by SignNet and BasisNet (see Section 3.3), and/or are expensive to compute (e.g., all pairs shortest paths).

6. CONCLUSION AND DISCUSSION

SignNet and BasisNet are novel architectures for processing eigenvectors that are invariant to sign flips and choices of eigenspace bases, respectively. Both architectures are provably universal under certain conditions. When used with Laplacian eigenvectors as inputs they provably go beyond spectral graph convolutions, spectral invariants, and a number of other graph positional encodings. 

A UNIVERSALITY FOR MULTIPLE SPACES

While the networks introduced in Section 2.2 possess the desired invariances, it is not immediately obvious whether they are powerful enough to express all functions with these invariances. Under certain conditions, the universality of our architectures follows as a corollary of the following general decomposition result, which may enable construction of universal architectures for other invariances as well.

Theorem 4 (Decomposition Theorem). Let $\mathcal{X}_1, \ldots, \mathcal{X}_k$ be topological spaces, and let $G_i$ be a group acting on $\mathcal{X}_i$ for each $i$. We assume mild topological conditions on $\mathcal{X}_i$ and $G_i$ hold. For any continuous $f : \mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_k \to \mathbb{R}^{d_{\mathrm{out}}}$ that is invariant to the action of $G = G_1 \times \ldots \times G_k$, there exist continuous $\phi_i$ and a continuous $\rho : \mathcal{Z} \subseteq \mathbb{R}^{a} \to \mathbb{R}^{d_{\mathrm{out}}}$ such that $f(v_1, \ldots, v_k) = \rho(\phi_1(v_1), \ldots, \phi_k(v_k))$. Furthermore: (1) each $\phi_i$ can be taken to be invariant to $G_i$, (2) the domain $\mathcal{Z}$ of $\rho$ is compact if each $\mathcal{X}_i$ is compact, (3) if $\mathcal{X}_i = \mathcal{X}_j$ and $G_i = G_j$, then $\phi_i$ can be taken to be equal to $\phi_j$.

This result says that when a product of groups $G$ acts on a product of spaces $\mathcal{X}$, for invariance to the product group $G$ it suffices to individually process each smaller group $G_i$ on $\mathcal{X}_i$ and then aggregate the results. Along with the proof of Theorem 4, the mild topological assumptions are explained in Appendix G.1. The assumptions hold for sign invariance and basis invariance, when not enforcing permutation equivariance. By applying this theorem, we can prove universality of some instances of our networks:

Corollary 1. Unconstrained-SignNet can represent any sign invariant function and Unconstrained-BasisNet can represent any basis invariant function. Expressive-BasisNet is a universal approximator of functions that are both basis invariant and permutation equivariant.
This result shows that Unconstrained-SignNet, Unconstrained-BasisNet, and Expressive-BasisNet take the correct functional form for their respective invariances (proofs in Appendix G.2). Note that Expressive-BasisNet approximates all sign invariant functions as a special case, by treating all inputs as one dimensional eigenspaces. Further, note that we require Expressive-BasisNet's high order tensors to achieve universality when enforcing permutation equivariance. Universality under permutation equivariance is generally difficult to achieve when dealing with matrices with permutation symmetries (Maron et al., 2019; Keriven & Peyré, 2019), but it may be possible that more efficient architectures can achieve it in our setting. Accompanying the decomposition result, we show a corresponding universal approximation result (proof in Appendix G.3). Similarly to Theorem 4, the problem of approximating $G = G_1 \times \ldots \times G_k$ invariant functions is reduced to approximating several $G_i$-invariant functions.
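A small worked instance may help make the decomposition concrete. For $k = 2$ and $G_1 = G_2 = \{-1, 1\}$, the function $f(v_1, v_2) = (v_1^\top v_2)^2$ is invariant to independent sign flips, and it factors as $\rho(\phi(v_1), \phi(v_2))$ with the shared sign invariant encoder $\phi(v) = v v^\top$ and $\rho(Z_1, Z_2) = \mathrm{tr}(Z_1 Z_2)$. This particular $f$, $\phi$, and $\rho$ are chosen for illustration and do not come from the proof in Appendix G.1.

```python
import numpy as np


def f(v1, v2):
    """Target function: invariant to flipping the signs of v1 and v2 independently."""
    return float(v1 @ v2) ** 2


def phi(v):
    """Shared sign invariant encoder: phi(v) = v v^T = phi(-v)."""
    return np.outer(v, v)


def rho(z1, z2):
    """Aggregator: trace(z1 z2) equals (v1 . v2)^2 when z_i = v_i v_i^T."""
    return float(np.trace(z1 @ z2))


rng = np.random.default_rng(0)
v1, v2 = rng.standard_normal(4), rng.standard_normal(4)

decomposition_holds = np.isclose(f(v1, v2), rho(phi(v1), phi(v2)))
encoder_sign_invariant = np.allclose(phi(v1), phi(-v1))
target_sign_invariant = np.isclose(f(v1, v2), f(-v1, v2))
```

The same encoder is used for both inputs, matching part (3) of Theorem 4 for the case where the spaces and groups coincide.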

B MORE DETAILS ON SIGNNET AND BASISNET

In Figure 2, we show a diagram that describes how SignNet is used as a node positional encoding for a graph machine learning task. In Table 4, we compare and contrast properties of the neural architectures that we introduce. In Figure 5, we give pseudo-code of SignNet for learning node positional encodings with a GNN prediction model.

[Table 4, partially recovered; column headers are missing. Rows: (label missing) ✓ × ✓ ✓; Universal ✓ × ✓ × ✓; Tractable ✓ ✓ ✓ ✓ ×]

B.1 GENERALIZATION BEYOND SYMMETRIC MATRICES

In the main paper, we assume that the eigenspaces come from a symmetric matrix. This holds for many cases of practical interest, as e.g. the Laplacian matrix of an undirected graph is symmetric. However, we may also want to process directed graphs, or other data that have associated nonsymmetric matrices. Our SignNet and BasisNet generalize in a straightforward way to handle nonsymmetric diagonalizable matrices, as we detail here. Let $A \in \mathbb{R}^{n \times n}$ be a matrix with a diagonalization $A = V \Lambda V^{-1}$, where $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$ contains the eigenvalues $\lambda_i$, and the columns of $V = [v_1 \; \ldots \; v_n]$ are eigenvectors. Suppose we want to learn a function on the eigenvectors $v_1, \ldots, v_k$. Unlike in the symmetric matrix case, the eigenvectors are not necessarily orthonormal, and both the eigenvalues and eigenvectors can be complex.

Real eigenvectors. First, we assume the eigenvectors $v_i$ are all real vectors in $\mathbb{R}^n$. We can take the eigenvectors to be real if $A$ is symmetric, or if $A$ has real eigenvalues (see Horn & Johnson (2012) Theorem 1.3.29). Also, suppose that we choose the real numbers $\mathbb{R}$ as our base field for the vector space in which eigenvectors lie. Note that for any scaling factor $c \in \mathbb{R} \setminus \{0\}$ and eigenvector $v$, we have that $cv$ is an eigenvector of the same eigenvalue. If the eigenvalues are distinct, then the eigenvectors of the form $cv$ are the only other eigenvectors in the same eigenspace as $v$. Thus, we want a function to be invariant to scalings: $f(v_1, \ldots, v_k) = f(c_1 v_1, \ldots, c_k v_k)$ for all $c_i \in \mathbb{R} \setminus \{0\}$. This can be handled by SignNet, by giving unit normalized vector inputs: $f(v_1, \ldots, v_k) = \rho\left([\phi(v_i / \|v_i\|) + \phi(-v_i / \|v_i\|)]_{i=1,\ldots,k}\right)$.

Now, say we have bases of eigenspaces $V_1, \ldots, V_l$ with dimensions $d_1, \ldots, d_l$. For a basis $V_i$, we have that any other basis of the same space can be obtained as $V_i W$ for some $W \in \mathrm{GL}_{\mathbb{R}}(d_i)$, the set of real invertible matrices in $\mathbb{R}^{d_i \times d_i}$.
Indeed, the orthogonal projector onto the space spanned by the columns of $V_i$ is given by $V_i (V_i^\top V_i)^{-1} V_i^\top$. Thus, if $Z \in \mathbb{R}^{n \times d_i}$ is another basis for the column space of $V_i$, we have that $V_i (V_i^\top V_i)^{-1} V_i^\top = Z (Z^\top Z)^{-1} Z^\top$, so

$$V_i (V_i^\top V_i)^{-1} V_i^\top Z = Z (Z^\top Z)^{-1} Z^\top Z = Z,$$

so let $W = (V_i^\top V_i)^{-1} V_i^\top Z \in \mathbb{R}^{d_i \times d_i}$. Note that $W$ is invertible, because it has inverse $(Z^\top Z)^{-1} Z^\top V_i$, so indeed $V_i W = Z$ for $W \in GL_{\mathbb{R}}(d_i)$. Thus, basis invariance in this case is of the form

$$f(V_1, \ldots, V_l) = f(V_1 W_1, \ldots, V_l W_l), \qquad W_i \in GL_{\mathbb{R}}(d_i).$$

Note that the distinct eigenvalue invariance is a special case of this invariance, as $GL_{\mathbb{R}}(1) = \mathbb{R} \setminus \{0\}$. We can again achieve this basis invariance by using a BasisNet, where the inputs to the $\phi_{d_i}$ are orthogonal projectors of the corresponding eigenspaces:

$$f(V_1, \ldots, V_l) = \rho\left( [\phi_{d_i}(V_i (V_i^\top V_i)^{-1} V_i^\top)]_{i=1,\ldots,l} \right).$$

Recall that if $V_i$ is an orthonormal basis, then the orthogonal projector is just $V_i V_i^\top$, so this is a direct generalization of BasisNet in the symmetric case.

Complex eigenvectors. More generally, suppose $V \in \mathbb{C}^{n \times n}$ are complex eigenvectors, and we take the base field of the vector space to be $\mathbb{C}$. The above arguments generalize to the complex case; in the case of distinct eigenvalues, we want

$$f(v_1, \ldots, v_k) = f(c_1 v_1, \ldots, c_k v_k), \qquad c_i \in \mathbb{C} \setminus \{0\}.$$

However, this symmetry cannot be reduced as easily to a unit normalization and a discrete sign invariance as it can be in the real case. Nonetheless, the basis invariant architecture directly generalizes, so we can handle the case of distinct eigenvalues by a more general basis invariant architecture as well. The basis invariance is

$$f(V_1, \ldots, V_l) = f(V_1 W_1, \ldots, V_l W_l), \qquad W_i \in GL_{\mathbb{C}}(d_i).$$

The orthogonal projector onto the image of $V_i$ is $V_i (V_i^* V_i)^{-1} V_i^*$, where conjugate transposes now replace the transposes. Thus, BasisNet takes the form:

$$f(V_1, \ldots, V_l) = \rho\left( [\phi_{d_i}(V_i (V_i^* V_i)^{-1} V_i^*)]_{i=1,\ldots,l} \right).$$
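The invariance of the projector under a change of basis can be checked numerically. The following is a minimal sketch, not code from the paper: the helper name `orthogonal_projector`, the matrix sizes, and the particular change-of-basis matrices are all assumptions chosen for illustration.

```python
import numpy as np

def orthogonal_projector(V):
    """Orthogonal projector V (V^* V)^{-1} V^* onto the column space of V.

    The columns of V must be linearly independent but need not be orthonormal.
    Using the conjugate transpose lets the same code cover real and complex bases.
    """
    Vh = V.conj().T
    # Solve (V^* V) X = V^* instead of forming the inverse explicitly.
    return V @ np.linalg.solve(Vh @ V, Vh)

rng = np.random.default_rng(0)
n, d = 6, 3

# Real case: V is a non-orthonormal basis, W an invertible change of basis.
V = rng.standard_normal((n, d))
W = np.array([[2.0, 1.0, 0.0],
              [0.0, -1.0, 3.0],
              [0.0, 0.0, 0.5]])  # upper triangular, nonzero diagonal => invertible
P_real = orthogonal_projector(V)
P_real_changed = orthogonal_projector(V @ W)

# Complex case: same check with a complex basis and a complex change of basis.
Vc = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
Wc = W + 1j * np.eye(d)  # still upper triangular with nonzero diagonal
P_complex = orthogonal_projector(Vc)
P_complex_changed = orthogonal_projector(Vc @ Wc)

print(np.allclose(P_real, P_real_changed))
print(np.allclose(P_complex, P_complex_changed))
```

Since the projector is unchanged by $V_i \mapsto V_i W$, any network applied to it inherits the $GL(d_i)$ invariance.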

B.2 BROADER IMPACTS

We believe that our models and future sign invariant or basis invariant networks could be useful in a wide variety of applications. As eigenvectors arise in many domains, it is difficult to predict the uses of these models. We test on several molecular property prediction tasks, which have the potential for much positive impact, such as in drug discovery (Stokes et al., 2020) . However, recent work has found that the same models that we use for finding beneficial drugs can also be used to design biochemical weapons (Urbina et al., 2022) . Another major application of graph machine learning is in social network analysis, where positive (e.g. malicious node detection (Pandit et al., 2007) ) and negative (e.g. deanonymization (Narayanan & Shmatikov, 2009 )) uses of machine learning are possible. Even if there is no negative intent, bias in learned models can differentially impact particular subgroups of people. Thus, academia, industry, and policy makers must be aware of such potential negative uses, and work towards reducing the likelihood of them.

B.3 COMPLEXITY OF SIGNNET AND BASISNET

Here, we give a simplified but intuitive analysis of the complexity of SignNet and BasisNet. Suppose we have a graph of $n$ nodes, with $k$ eigenvectors $v_1, \ldots, v_k$. A standard GNN that naively inputs the eigenvectors as node features forms tensors of size $O(nk + nd)$, where $d$ is the hidden dimension of the learned node features. SignNet forms tensors of size $O(nkd)$, where $d$ is the hidden dimension or output dimension of $\phi$. This is because each of the $2k$ vectors $v_i$ and $-v_i$ must be passed through our $\phi$ network. Similarly, BasisNet forms tensors of size $O(n^2 l d)$, where $l$ is the number of eigenspaces and $d$ is the hidden dimension or output dimension of the $\phi_{d_i}$. Thus, there is an extra multiplicative factor of $n$ when compared with SignNet. If we instead use $p$-IGNs with order $p$ tensors, then the complexity is $O(n^p l d)$. Moreover, note that a naive version of BasisNet requires a separate IGN to be learned for each multiplicity $d_i$. This may be intractable for datasets with eigenspaces of many sizes. One way to get around this would be to parameterize a single IGN, and define $\phi_{d_i}(V_i V_i^\top) = \mathrm{IGN}(V_i V_i^\top, d_i)$; in other words, we simply input the dimension to the shared IGN. We have not tested the learning capabilities of this more efficient model in this work, but it could be promising for future work.

B.4 OTHER ARCHITECTURAL NOTES

There are several alternatives available in the design of SignNet and BasisNet that we now discuss. Our approach, as outlined in Figure 2, processes the eigenvectors independently to compute learned positional encodings and then uses these learned positional encodings along with the node features $X$ in a final base model (say, a GNN) to get a prediction. Another possibility is to process eigenvectors and node features jointly. One way to do this is to add $X$ as input to $\phi$, so for instance SignNet would include $\phi(v_i, X) + \phi(-v_i, X)$. However, this requires processing $X$ $2k$ times with $\phi$, which may be inefficient.

Another possibility to parameterize a sign invariant architecture is through taking elementwise absolute values of eigenvectors, and then composing with arbitrary functions, e.g. $\mathrm{MLP}(|v_1|, \ldots, |v_k|)$, where the MLP acts independently on each node. Empirically, this often does not work well (see our results on ZINC as well as those of Dwivedi et al. (2020)). Intuitively, these elementwise absolute values remove distance information, since for instance nodes $i$ and $j$ for which $v_2^{(i)} = -v_2^{(j)}$ are typically far apart in the graph, but they will have the same value in this eigenvector under the absolute value mapping. Nonetheless, if the $\phi$ in SignNet is taken to be an elementwise function, meaning $\phi : \mathbb{R}^n \to \mathbb{R}^{n \times d}$ satisfies $\phi(v)_i = \psi(v_i)$ for some $\psi$ applied independently to each node, then SignNet is equivalent in expressiveness to $\mathrm{MLP}(|v_1|, \ldots, |v_k|)$, where the MLP acts independently on each node.
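As an illustration of the sign invariant form $\rho([\phi(v_i) + \phi(-v_i)]_i)$ discussed in this section, the following is a minimal NumPy sketch. It is a toy under stated assumptions, not the paper's implementation: `phi` is a single random tanh layer (not a GNN, and not permutation equivariant), `rho` is a random linear map, and all weights and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d_hidden, d_out = 5, 3, 8, 4

# Hypothetical toy weights; a real SignNet would learn these.
W1 = rng.standard_normal((n, d_hidden))
b1 = rng.standard_normal(d_hidden)
W2 = rng.standard_normal((k * d_hidden, d_out))

def phi(v):
    """One tanh layer R^n -> R^{d_hidden}; on its own it is NOT sign invariant."""
    return np.tanh(v @ W1 + b1)

def rho(z):
    """Linear readout on the concatenated per-eigenvector features."""
    return z @ W2

def signnet(vs):
    """vs has shape (k, n), one eigenvector per row."""
    feats = [phi(v) + phi(-v) for v in vs]  # each summand pair is sign invariant
    return rho(np.concatenate(feats))

# Unit-norm inputs and an arbitrary pattern of sign flips.
vs = rng.standard_normal((k, n))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)
signs = np.array([1.0, -1.0, -1.0])

out = signnet(vs)
out_flipped = signnet(signs[:, None] * vs)
print(np.allclose(out, out_flipped))
```

The invariance comes entirely from the symmetrization $\phi(v) + \phi(-v)$, so $\phi$ and $\rho$ can be arbitrary networks.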

C MORE ON EIGENVALUE MULTIPLICITIES

In this section, we study the properties of eigenvalues and eigenvectors computed by numerical algorithms on real-world data.

C.1 SIGN AND BASIS AMBIGUITIES IN NUMERICAL EIGENSOLVERS

When processing real-world data, we use eigenvectors that are computed by numerical algorithms. These algorithms return specific eigenvectors for each eigenspace, so there is some choice of sign or basis of each eigenspace. The general symmetric matrix eigensolvers numpy.linalg.eigh and scipy.linalg.eigh both call LAPACK routines. They both proceed as follows: for a symmetric matrix $A$, they first decompose it as $A = Q T Q^\top$ for orthogonal $Q$ and tridiagonal $T$, then they compute the eigendecomposition $T = W \Lambda W^\top$, so the eigendecomposition of $A$ is $A = (QW) \Lambda (W^\top Q^\top)$. There are multiple ambiguities here: for diagonal sign matrices $S = \mathrm{Diag}(s_1, \ldots, s_n)$ and $S' = \mathrm{Diag}(s'_1, \ldots, s'_n)$, where $s_i, s'_i \in \{-1, 1\}$, we have that $A = QS(STS)SQ^\top$ is also a valid tridiagonalization, as $QS$ is still orthogonal, $SS = I$, and $STS$ is still tridiagonal. Also, $T = (WS') \Lambda (S' W^\top)$ is a valid eigendecomposition of $T$, as $WS'$ is still orthogonal.

In practice, we find that the outputs of the general symmetric matrix eigensolvers numpy.linalg.eigh and scipy.linalg.eigh differ between frameworks but are consistent within the same framework. More specifically, for a symmetric matrix $A$, we find that the eigenvectors computed with the default settings in numpy tend to differ by a choice of sign or basis from those that are computed with the default settings in scipy. On the other hand, the called LAPACK routines are deterministic, so the eigenvectors returned by numpy are the same in each call, and the eigenvectors returned by scipy are likewise the same in each call.

Eigensolvers for sparse symmetric matrices like scipy.sparse.linalg.eigsh are required for large scale problems. This function calls ARPACK, which uses an iterative method that starts with a randomly sampled initial vector. Due to this stochasticity, the sign and basis of the returned eigenvectors differ between calls. Bro et al.
(2008) develop a data-dependent method to choose signs for each singular vector of a singular value decomposition. Still, in the worst case the signs chosen will be arbitrary, and they do not handle basis ambiguities in higher dimensional eigenspaces. Other works have made choices of sign, such as by picking the sign so that the eigenvector's entries are in the largest lexicographic order (Tam & Dunson, 2022). This choice of sign may work poorly for learning on graphs, as it is sensitive to permutations on nodes. For some graph regression experiments in Section 4.1, we try a choice of sign that is permutation invariant, but we find it to work poorly.

Here, we investigate the normalized Laplacian eigenspace statistics of real-world graph data. For any graph that has distinct Laplacian eigenvalues, only sign invariance is required in processing eigenvectors. However, we find that graph data tends to have higher multiplicity eigenvalues, so basis invariance would be required for learning symmetry-respecting functions on eigenvectors. Indeed, we show statistics for multi-graph datasets in Table 5 and for single-graph datasets with more nodes per graph in Table 6.

For multi-graph datasets, we consider:
• Molecule graphs: ZINC (Irwin et al., 2012; Dwivedi et al., 2020), ogbg-molhiv (Wu et al., 2018; Hu et al., 2020a)
• Social networks: IMDB-M, COLLAB (Yanardag & Vishwanathan, 2015; Morris et al., 2020a)
• Bioinformatics graphs: PROTEINS (Morris et al., 2020a)
• Computer vision graphs: COIL-DEL (Riesen & Bunke, 2008; Morris et al., 2020a)

For single-graph datasets, we consider:
• The 32 × 32 image grid as in Section J.3
• Citation networks: Cora, Citeseer (Sen et al., 2008)
• Co-purchasing graphs with Amazon Photo (McAuley et al., 2015; Shchur et al., 2018)

We see that these datasets all contain higher multiplicity eigenspaces, so sign invariance is insufficient for fully respecting symmetries.
The majority of graphs in each multi-graph dataset besides COIL-DEL contain higher multiplicity eigenspaces. Also, the dimension of these eigenspaces can be quite large compared to the size of the graphs in the dataset. The single-graph datasets have a large proportion of their eigenvectors belonging to higher dimensional eigenspaces. Thus, basis invariance may play a large role in processing spectral information from these graph datasets.
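Eigenspace dimensions of the kind summarized above can be estimated by grouping numerically equal eigenvalues of the normalized Laplacian. The sketch below is one hypothetical way to do this and is not the procedure used to produce the tables: the function names and the tolerance `tol` are assumptions, and the 6-cycle is only a small test graph.

```python
import numpy as np

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2} for a dense adjacency matrix A without isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def eigenspace_dimensions(A, tol=1e-8):
    """Group sorted eigenvalues that differ by less than tol (an assumed threshold)."""
    evals = np.linalg.eigvalsh(normalized_laplacian(A))  # ascending order
    dims = [1]
    for prev, cur in zip(evals[:-1], evals[1:]):
        if cur - prev < tol:
            dims[-1] += 1   # same eigenspace as the previous eigenvalue
        else:
            dims.append(1)  # start a new eigenspace
    return dims

# Test graph: the cycle on 6 nodes, whose symmetry forces repeated eigenvalues.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

dims = eigenspace_dimensions(A)
print(dims)
```

Any entry of `dims` larger than 1 marks an eigenspace where sign invariance alone does not capture the full symmetry.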

C.3 RELATIONSHIP TO GRAPH AUTOMORPHISMS

Higher multiplicity eigenspaces are related to automorphism symmetries in graphs. For an adjacency matrix A, the permutation matrix P is an automorphism of the graph associated to A if P AP ⊤ = A. If P is an automorphism, then for any eigenvector v of A with eigenvalue λ, we have AP v = P AP ⊤ P v = P Av = P λv = λP v, so P v is an eigenvector of A with the same eigenvalue λ. If P v and v are linearly independent, then λ has a higher dimensional eigenspace. Thus, under certain additional conditions, automorphism symmetries of graphs lead to repeated eigenvalues (Sachs & Stiebitz, 1983; Teranishi, 2009) .
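The argument above can be checked on a small example. The following sketch uses the 5-cycle with its cyclic shift as the automorphism; the choice of graph and all variable names are assumptions made for illustration. It also shows why the additional conditions matter: for the all ones eigenvector, $Pv$ and $v$ coincide, so no repeated eigenvalue is implied.

```python
import numpy as np

n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0  # adjacency matrix of the 5-cycle

P = np.roll(np.eye(n), 1, axis=0)  # cyclic shift, a permutation matrix
is_automorphism = np.allclose(P @ A @ P.T, A)

evals, evecs = np.linalg.eigh(A)  # eigenvalues in ascending order

# Smallest eigenvalue: P v is again an eigenvector and is independent of v.
lam, v = evals[0], evecs[:, 0]
Pv = P @ v
Pv_is_eigenvector = np.allclose(A @ Pv, lam * Pv)
rank_low = np.linalg.matrix_rank(np.column_stack([v, Pv]))

# Largest eigenvalue: v is constant, so P v = v and the eigenspace stays 1-dimensional.
v_top = evecs[:, -1]
rank_top = np.linalg.matrix_rank(np.column_stack([v_top, P @ v_top]))

print(is_automorphism, Pv_is_eigenvector, rank_low, rank_top)
```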

C.4 MULTIPLICITIES IN RANDOM GRAPHS

It is known that almost all random graphs under the Erdős-Rényi model have no repeated eigenvalues in the limit of infinitely many nodes (Tao & Vu, 2017). Likewise, almost all random graphs under the Erdős-Rényi model are asymmetric in the sense of having no nontrivial automorphism symmetries (Erdos & Rényi, 1963). These results contrast sharply with the high eigenvalue multiplicities that we see in real-world data in Section C.2. Likewise, many types of real-world graph data have been found to possess nontrivial automorphism symmetries (Ball & Geyer-Schulz, 2018). This demonstrates a potential downside of using random graph models to study real-world data: the eigenspace dimensions and automorphism symmetries of random graphs may not agree with those of real-world data.

In Figure 6, we plot the eigenvectors of the cotangent Laplacian on a cat model, as well as the first principal component of the corresponding learned $\phi(v) + \phi(-v)$ from our SignNet model that was trained on the texture reconstruction task. Interestingly, this portion of our SignNet encodes bilateral symmetry; for instance, while some eigenvectors differ between left feet and right feet, this portion of our SignNet gives similar values for the left and right feet. This is useful for the texture reconstruction task, as the texture regression target has bilateral symmetry. We also show principal components of outputs for the full SignNet model in Figure 7. This is not as interpretable, as the outputs are high frequency and appear to be close to the texture that is the regression target. If instead we trained the network on a task involving eigenvectors of multiple models, then we may expect the SignNet to learn more structurally interpretable mappings (as in the case of the molecule tasks).

D.2 MOLECULE VISUALIZATION

To better understand SignNet, in Figure 9 we visualize the learned positional encodings of a SignNet trained with a GatedGCN base model on ZINC, as in Section 4.1. This SignNet uses GIN as ϕ and an MLP as ρ (with a sum before it to handle variable numbers of eigenvectors), and takes in all eigenvectors of each graph. SignNet learns interesting structural information such as cut nodes (PC 3) and appendage atoms (PC 2) that qualitatively differ from any single eigenvector of the graph. See Figure 8 for all of the eigenvectors of fluorescein.

E MORE RELATED WORK

E.1 GRAPH POSITIONAL ENCODINGS

Various graph positional encodings have been proposed, motivated by increasing the expressive power or practical performance of graph neural networks, and by generalizing Transformers to graphs. Positional encodings are related to so-called position-aware network embeddings (Chami et al., 2020), which capture distances between nodes in graphs. These include network embedding methods like Deepwalk (Perozzi et al., 2014) and node2vec (Grover & Leskovec, 2016), which have been recently integrated into GNNs that respect their invariances by Wang et al. (2022). While positional encodings in sequences as used for Transformers (Vaswani et al., 2017) […] 2021) develop higher-order transformers (that generalize invariant graph networks), which interestingly perform well on graph regression using sparse higher-order transformers without positional encodings.

E.2 EIGENVECTOR SYMMETRIES IN GRAPH REPRESENTATION LEARNING

Many works that attempt to respect the invariances of eigenvectors focus solely on sign invariance (by using data augmentation) (Dwivedi et al., 2020; Dwivedi & Bresson, 2021; Dwivedi et al., 2022; Kreuzer et al., 2021). This may be reasonable for continuous data, where eigenvalues of associated matrices may usually be distinct and separated (e.g. Puny et al. (2022) find that this empirically holds for covariance matrices of n-body problems). However, discrete graph Laplacians are known to have higher multiplicity eigenvalues in many cases, and in Appendix C.2 we find this to be true in various types of real-world graph data. Graphs without higher multiplicity eigenspaces are easier to deal with; in fact, graph isomorphism can be tested in polynomial time on graphs of bounded multiplicity for adjacency matrix eigenvalues (Babai et al., 1982; Leighton & Miller, 1979), with a time complexity that is lower for graphs with lower maximum multiplicities.

A recent work of Wang et al. (2022) proposes full orthogonal group invariance for functions that process positional encodings. In particular, for positional encodings $Z \in \mathbb{R}^{n \times k}$, they parameterize functions $f(Z)$ such that $f(Z) = f(ZQ)$ for all $Q \in O(k)$. This indeed makes sense for network embeddings like node2vec (Grover & Leskovec, 2016), as their objective functions are based on inner products and are thus orthogonally invariant. While they prove stability results when enforcing full orthogonal invariance for eigenvectors, this is a very strict constraint compared to our basis invariance. For instance, when $k = n$ and all eigenvectors are used in $V$, the condition $f(V) = f(VQ)$ implies that $f$ is a constant function on orthogonal matrices, since any orthogonal matrix $W$ can be obtained as $W = VQ$ for $Q = V^\top W \in O(n)$. In other words, for bases of eigenspaces $V_1, \ldots, V_l$ and $V = [V_1 \ldots V_l]$, Wang et al. (2022) enforce $VQ \cong V$, while we enforce $V \mathrm{Diag}(Q_1, \ldots, Q_l) \cong V$.
While the columns of V Diag(Q 1 , . . . , Q l ) are still eigenvectors, the columns of V Q generally are not.
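The difference between the two symmetry groups can be seen numerically. In the following sketch, the spectrum `eigvals`, the random orthogonal matrices, and the helper names are all hypothetical; the check simply tests whether each column is still an eigenvector of the original matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(m):
    """Orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    return Q

def columns_are_eigenvectors(A, M, tol=1e-8):
    """True if every unit-norm column x of M satisfies A x = (x^T A x) x."""
    rayleigh = np.sum(M * (A @ M), axis=0)  # one Rayleigh quotient per column
    residual = np.linalg.norm(A @ M - M * rayleigh, axis=0)
    return bool(np.all(residual < tol))

# Symmetric matrix with eigenspaces of dimensions 1, 2, 1.
eigvals = np.array([0.0, 1.0, 1.0, 3.0])
V = random_orthogonal(4)
A = V @ np.diag(eigvals) @ V.T

# Basis symmetry: one orthogonal block per eigenspace.
Q_block = np.zeros((4, 4))
Q_block[0, 0] = -1.0
Q_block[1:3, 1:3] = random_orthogonal(2)
Q_block[3, 3] = 1.0

# Full orthogonal symmetry: mixes different eigenspaces.
Q_full = random_orthogonal(4)

print(columns_are_eigenvectors(A, V @ Q_block))
print(columns_are_eigenvectors(A, V @ Q_full))
```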

E.3 GRAPH SPECTRA AND LEARNING ON GRAPHS

More generally, graph spectra are widely used in analyzing graphs, and spectral graph theory (Chung, 1997) studies the connection between graph properties and graph spectra. Different graph kernels have been defined based on graph spectra, which use robust and discriminative notions of generalized spectral distance (Verma & Zhang, 2017) , the spectral density of states (Huang et al., 2021) , random walk return probabilities (Zhang et al., 2018b) , or the trace of the heat kernel (Tsitsulin et al., 2018) . Graph signal processing relies on spectral operations to define Fourier transforms, frequencies, convolutions, and other useful concepts for processing data on graphs (Ortega et al., 2018) . The closely related spectral graph neural networks (Wu et al., 2020; Balcilar et al., 2020) parameterize neural architectures that are based on similar spectral operations.

F DEFINITIONS, NOTATION, AND BACKGROUND

F.1 BASIC TOPOLOGY AND ALGEBRA DEFINITIONS

We will use some basic topology and algebra for our theoretical results. A topological space $(\mathcal{X}, \tau)$ is a set $\mathcal{X}$ along with a family of subsets $\tau \subseteq 2^{\mathcal{X}}$ satisfying certain properties, which gives useful notions like continuity and compactness. From now on, we will omit mention of $\tau$, and refer to a topological space as the set $\mathcal{X}$ itself. For topological spaces $\mathcal{X}$ and $\mathcal{Y}$, we write $\mathcal{X} \cong \mathcal{Y}$ and say that $\mathcal{X}$ is homeomorphic to $\mathcal{Y}$ if there exists a continuous bijection with continuous inverse from $\mathcal{X}$ to $\mathcal{Y}$. We will say $\mathcal{X} = \mathcal{Y}$ if the underlying sets and topologies are equal as sets (we will often use this notion of equality for simplicity, even though it can generally be substituted with homeomorphism). For a function $f : \mathcal{X} \to \mathcal{Y}$ between topological spaces $\mathcal{X}$ and $\mathcal{Y}$, the image $\mathrm{im} f$ is the set of values that $f$ takes, $\mathrm{im} f = \{f(x) : x \in \mathcal{X}\}$. This is also denoted $f(\mathcal{X})$. A function $f : \mathcal{X} \to \mathcal{Y}$ is called a topological embedding if it is a homeomorphism from $\mathcal{X}$ to its image.

A group $G$ is a set along with a multiplication operation $G \times G \to G$, such that multiplication is associative, there is a multiplicative identity $e \in G$, and each $g \in G$ has a multiplicative inverse $g^{-1}$. A topological group is a group that is also a topological space such that the multiplication and inverse operations are continuous. A group $G$ may act on a set $\mathcal{X}$ by a function $\cdot : G \times \mathcal{X} \to \mathcal{X}$. We usually denote $g \cdot x$ as $gx$. A topological group is said to act continuously on a topological space $\mathcal{X}$ if $\cdot$ is continuous. For any group $G$ and topological space $\mathcal{X}$, we define the coset $Gx = \{gx : g \in G\}$, which can be viewed as an equivalence class of elements that can be transformed from one to another by a group element. The quotient space $\mathcal{X}/G = \{Gx : x \in \mathcal{X}\}$ is the set of all such equivalence classes, with a topology induced by that of $\mathcal{X}$. The quotient map $\pi : \mathcal{X} \to \mathcal{X}/G$ is a surjective continuous map that sends $x$ to its coset, $\pi(x) = Gx$.
For $x \in \mathbb{R}^d$, $\|x\|_2$ denotes the standard Euclidean norm. By the $\infty$ norm of functions $f : \mathcal{Z} \to \mathbb{R}^d$ from a compact $\mathcal{Z}$ to a Euclidean space $\mathbb{R}^d$, we mean $\|f\|_\infty = \sup_{z \in \mathcal{Z}} \|f(z)\|_2$.

F.2 BACKGROUND ON EIGENSPACE INVARIANCES

Let $V = [v_1 \ldots v_d]$ and $W = [w_1 \ldots w_d] \in \mathbb{R}^{n \times d}$ be two orthonormal bases for the same $d$ dimensional subspace of $\mathbb{R}^n$. Since $V$ and $W$ span the same space, their orthogonal projectors are the same, so $V V^\top = W W^\top$. Also, since $V$ and $W$ have orthonormal columns, we have $V^\top V = W^\top W = I \in \mathbb{R}^{d \times d}$. Define $Q = V^\top W$. Then $Q$ is orthogonal because

$$Q^\top Q = W^\top V V^\top W = W^\top W W^\top W = I.$$

Moreover, we have that

$$V Q = V V^\top W = W W^\top W = W.$$

Thus, for any orthonormal bases $V$ and $W$ of the same subspace, there exists an orthogonal $Q \in O(d)$ such that $VQ = W$.

For another perspective on this, define the Grassmannian $\mathrm{Gr}(d, n)$ as the smooth manifold consisting of all $d$ dimensional subspaces of $\mathbb{R}^n$. Further define the Stiefel manifold $\mathrm{St}(d, n)$ as the set of all orthonormal tuples $[v_1 \ldots v_d] \in \mathbb{R}^{n \times d}$ of $d$ vectors in $\mathbb{R}^n$. Letting $O(d)$ act by right multiplication, it holds that $\mathrm{St}(d, n)/O(d) \cong \mathrm{Gr}(d, n)$. This implies that any $O(d)$ invariant function on $\mathrm{St}(d, n)$ can be viewed as a function on subspaces. See e.g. Gallier & Quaintance (2020) Chapter 5 for more information on this. We will use this relationship in our proofs of universal representation.

When we consider permutation invariance or equivariance, the permutation acts on dimensions of size $n$. Then a tensor $X \in \mathbb{R}^{n^k \times d}$ is called an order $k$ tensor with respect to this permutation symmetry, where order 0 tensors are called scalars, order 1 tensors are called vectors, and order 2 tensors are called matrices. Note that this does not depend on $d$; in this work, we only ever consider vectors and scalars with respect to the $O(d)$ action.
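The orthonormal-basis argument of Section F.2 (that $Q = V^\top W$ is orthogonal and $VQ = W$) can be verified with a few lines of NumPy. This is a sketch under assumed sizes; the matrix `M` is a hypothetical invertible change of basis used only to produce a second orthonormal basis of the same subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 7, 3

# Two orthonormal bases of the same d-dimensional subspace, obtained via reduced QR.
B = rng.standard_normal((n, d))
M = np.array([[2.0, 1.0, 0.0],
              [0.0, -1.0, 3.0],
              [0.0, 0.0, 0.5]])  # invertible, so B @ M spans the same subspace as B
V, _ = np.linalg.qr(B)
W, _ = np.linalg.qr(B @ M)

Q = V.T @ W  # the change of basis from the derivation

print(np.allclose(V @ V.T, W @ W.T))     # same orthogonal projector
print(np.allclose(Q.T @ Q, np.eye(d)))   # Q is orthogonal
print(np.allclose(V @ Q, W))             # V Q recovers W
```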

G PROOFS OF UNIVERSALITY

We begin by proving the two propositions for the single subspace case from Section 2.1.

Proposition 1. A continuous function $h : \mathbb{R}^n \to \mathbb{R}^{d_{\mathrm{out}}}$ is sign invariant if and only if $h(v) = \phi(v) + \phi(-v)$ for some continuous $\phi : \mathbb{R}^n \to \mathbb{R}^{d_{\mathrm{out}}}$. A continuous $h : \mathbb{R}^n \to \mathbb{R}^n$ is sign invariant and permutation equivariant if and only if (3) holds for a continuous permutation equivariant $\phi : \mathbb{R}^n \to \mathbb{R}^n$.

Proof. If $h(v) = \phi(v) + \phi(-v)$, then $h$ is obviously sign invariant. On the other hand, if $h$ is sign invariant, then letting $\phi(v) = h(v)/2$ gives that $h(v) = \phi(v) + \phi(-v)$, and $\phi$ is of course continuous. If $h(v) = \phi(v) + \phi(-v)$ for a permutation equivariant $\phi$, then $h(-Pv) = \phi(-Pv) + \phi(Pv) = P\phi(-v) + P\phi(v) = P(\phi(v) + \phi(-v)) = Ph(v)$, so $h$ is permutation equivariant and sign invariant. If $h$ is permutation equivariant and sign invariant, then define $\phi(v) = h(v)/2$ again; it is clear that $\phi$ is continuous and permutation equivariant.

Proposition 2. Any continuous, $O(d)$ invariant $h : \mathbb{R}^{n \times d} \to \mathbb{R}^{d_{\mathrm{out}}}$ is of the form $h(V) = \phi(V V^\top)$ for a continuous $\phi$. For a compact domain $\mathcal{Z} \subseteq \mathbb{R}^{n \times d}$, maps of the form $V \mapsto \mathrm{IGN}(V V^\top)$ universally approximate continuous functions $h : \mathcal{Z} \subseteq \mathbb{R}^{n \times d} \to \mathbb{R}^n$ that are $O(d)$ invariant and permutation equivariant.

Proof. The case without permutation equivariance holds by the First Fundamental Theorem of $O(d)$ (Lemma 2). For the permutation equivariant case, let $\mathcal{Z}' = \{V V^\top : V \in \mathcal{Z}\}$ and let $\epsilon > 0$. Note that $\mathcal{Z}'$ is compact, as it is the continuous image of a compact set. Since $h$ is $O(d)$ invariant, the First Fundamental Theorem of $O(d)$ shows that there exists a continuous function $\phi : \mathcal{Z}' \subseteq \mathbb{R}^{n \times n} \to \mathbb{R}^n$ such that $h(V) = \phi(V V^\top)$. Since $h$ is permutation equivariant, for any permutation matrix $P$ we have that $h(PV) = P \cdot h(V)$, and hence $\phi(P V V^\top P^\top) = P \cdot \phi(V V^\top)$, so $\phi$ is a continuous permutation equivariant function from matrices to vectors.
Then note that Keriven & Peyré (2019) show that invariant graph networks (of generally high tensor order in hidden layers) universally approximate continuous permutation equivariant functions from matrices to vectors on compact sets of matrices. Thus, an IGN can $\epsilon$-approximate $\phi$, and hence $V \mapsto \mathrm{IGN}(V V^\top)$ can $\epsilon$-approximate $h$.

G.1 PROOF OF DECOMPOSITION THEOREM

[Figure: commutative diagram relating $X_1 \times \ldots \times X_k$, the quotient $(X_1/G_1) \times \ldots \times (X_k/G_k)$, the embedded image $\mathcal{Z} = \mathrm{im}(\psi) \subseteq \mathbb{R}^a$, and $\mathbb{R}^{d_{\mathrm{out}}}$, via the maps $\pi = \pi_1 \times \ldots \times \pi_k$, $\psi = \psi_1 \times \ldots \times \psi_k$, $\phi = \psi \circ \pi$, $f = \tilde{f} \circ \pi$, and $\rho = \tilde{f} \circ \psi^{-1}$.]

Here, we give the formal statement of Theorem 4, which provides the necessary topological assumptions for the theorem to hold. In particular, we only require that each $G_i$ be a topological group that acts continuously on $X_i$, and that there exists a topological embedding of each quotient space into some Euclidean space. That the group action is continuous is a very mild assumption, and it holds for any finite or compact matrix group; all of the invariances we consider in this paper can be represented by such groups.

A topological embedding of the quotient space into a Euclidean space is desired, as we know how to parameterize neural networks with Euclidean outputs and inputs, whereas dealing with a quotient space is generally difficult. Many different conditions can guarantee existence of such an embedding. For instance, if the quotient space is a smooth manifold, then the Whitney Embedding Theorem

G.2.1 SIGN INVARIANT UNIVERSAL REPRESENTATION

Recall that S n-1 denotes the unit sphere in R n . As we normalize eigenvectors to unit norm, the domain of our functions on k eigenvectors are on the compact space (S n-1 ) k . Corollary 2 (Universal Representation for SignNet). A continuous function f : (S n-1 ) k → R dout is sign invariant, i.e. f (s 1 v 1 , . . . , s k v k ) = f (v 1 , . . . , v k ) for any s i ∈ {-1, 1}, if and only if there exists a continuous ϕ : R n → R 2n-2 and a continuous ρ : R (2n-2)k → R dout such that f (v 1 , . . . , v k ) = ρ [ϕ(v i ) + ϕ(-v i )] k i=1 . Proof. It can be directly seen that any f of the above form is sign invariant. Thus, we show that any sign invariant f can be expressed in the above form. First, we show that we can apply the general Theorem 4. The group G i = {1, -1} acts continuously and satisfies that S n-1 /{1, -1} = RP n-1 , where RP n-1 is the real projective space of dimension n -1. Since RP n-1 is a smooth manifold of dimension n -1, Whitney's embedding theorem states that there exists a (smooth) topological embedding ψ i : RP n-1 → R 2n-2 (Lemma 5). Thus, we can apply the general theorem to see that f = ρ • φk for some continuous ρ and φk . Note that each φi = φ is the same, as each X i = S n-1 and G i = {1, -1} is the same. Also, Theorem 4 says that we may assume that φ is sign invariant, so φ(x) = φ(-x). Letting ϕ(x) = φ(x)/2, we are done with the proof.

G.2.2 SIGN INVARIANT UNIVERSAL REPRESENTATION WITH EXTRA FEATURES

Recall that we may want our sign invariant functions to process other data besides eigenvectors, such as eigenvalues or node features associated to a graph. Here, we show universal representation for when we have this other data that does not possess sign symmetry. The proof is a simple extension of Corollary 2, but we provide the technical details for completeness.

Corollary 3 (Universal Representation for SignNet with features). For a compact space of features $\Omega \subseteq \mathbb{R}^d$, let $f(v_1, \ldots, v_k, x_1, \ldots, x_k)$ be a continuous function $f : (S^{n-1} \times \Omega)^k \to \mathbb{R}^{d_{\mathrm{out}}}$. Then $f$ is sign invariant for the inputs on the sphere, i.e.

$$f(s_1 v_1, \ldots, s_k v_k, x_1, \ldots, x_k) = f(v_1, \ldots, v_k, x_1, \ldots, x_k), \qquad s_i \in \{1, -1\},$$

if and only if there exists a continuous $\phi : \mathbb{R}^{n+d} \to \mathbb{R}^{2n-2+d}$ and a continuous $\rho : \mathbb{R}^{(2n-2+d)k} \to \mathbb{R}^{d_{\mathrm{out}}}$ such that

$$f(v_1, \ldots, v_k, x_1, \ldots, x_k) = \rho\left( \phi(v_1, x_1) + \phi(-v_1, x_1), \ldots, \phi(v_k, x_k) + \phi(-v_k, x_k) \right).$$

Proof. Once again, the sign invariance of any $f$ in the above form is clear. We follow very similar steps to the proof of Corollary 2 to show that we may apply Theorem 4. We can view $\Omega$ as a quotient space, after quotienting by the trivial group that does nothing, $\Omega \cong \Omega/\{1\}$. The corresponding quotient map is $\mathrm{id}_\Omega$, the identity map. Also, $\Omega$ trivially topologically embeds in $\mathbb{R}^d$ by the inclusion map. As $G_i = \{-1, 1\} \times \{1\}$ acts continuously, by Lemma 3 we have that

$$(S^{n-1} \times \Omega)/(\{1, -1\} \times \{1\}) \cong (S^{n-1}/\{1, -1\}) \times (\Omega/\{1\}) \cong \mathbb{RP}^{n-1} \times \Omega,$$

with corresponding quotient map $\pi \times \mathrm{id}_\Omega$, where $\pi$ is the quotient map to $\mathbb{RP}^{n-1}$. Letting $\tilde{\psi}$ be the embedding of $\mathbb{RP}^{n-1} \to \mathbb{R}^{2n-2}$ guaranteed by Whitney's embedding theorem (Lemma 5), we have that $\psi = \tilde{\psi} \times \mathrm{id}_\Omega$ is an embedding of $\mathbb{RP}^{n-1} \times \Omega \to \mathbb{R}^{2n-2+d}$. Thus, we can apply Theorem 4 to write $f = \rho \circ \varphi^k$ for $\varphi = (\tilde{\psi} \times \mathrm{id}_\Omega) \circ (\pi \times \mathrm{id}_\Omega)$, so $\varphi(v_i, x_i) = (\tilde{\psi}(\pi(v_i)), x_i)$, where $\varphi(v_i, x_i) = \varphi(-v_i, x_i)$. Letting $\phi(v_i, x_i) = \varphi(v_i, x_i)/2$, we are done.
(Segol & Lipman, 2019) (note that we can pass $\lambda_i$ as a vector in $\mathbb{R}^n$ by instead passing $\lambda_i \mathbf{1}$, where $\mathbf{1}$ is the all ones vector). Then $\rho = \sum_{i=1}^n$ is a linear permutation equivariant operation that can be exactly expressed by DeepSets, so the total error is within $\epsilon$. The same argument applies when $\theta_i = h(\lambda_i)$ for some continuous function $h$.

For the basis invariant case, consider a parametric spectral graph convolution $f(V, \Lambda, X) = \sum_{i=1}^n h(\lambda_i) v_i v_i^\top X$. Note that if the eigenspace bases are $V_1, \ldots, V_l$ with eigenvalues $\mu_1, \ldots, \mu_l$, we can write $f(V, \Lambda, X) = \sum_{j=1}^l h(\mu_j) V_j V_j^\top X$. Again, we will let $\rho = \sum_{j=1}^l$ be a sum function, which can be expressed exactly by DeepSets. Thus, it suffices to show that $h(\mu_j) V_j V_j^\top X$ can be $\epsilon/n$ approximated by a 2-IGN (i.e. an IGN that only uses vectors and matrices).

Note that since $h$ is continuous, we can use an elementwise MLP (which IGNs can learn) to approximate $f_1(\mu \mathbf{1}\mathbf{1}^\top, V V^\top, X) = (h(\mu) \mathbf{1}\mathbf{1}^\top, V V^\top, X)$ to arbitrary precision (note that we represent the eigenvalue $\mu$ as a constant matrix $\mu \mathbf{1}\mathbf{1}^\top$). Also, since a 2-IGN can learn matrix vector multiplication (Cai & Wang (2022) Lemma 10), we can approximate $f_2(h(\mu) \mathbf{1}\mathbf{1}^\top, V V^\top, X) = (h(\mu) \mathbf{1}\mathbf{1}^\top, V V^\top X)$, as $V_i V_i^\top \in \mathbb{R}^{n^2}$ is a matrix and $X \in \mathbb{R}^{n \times d_{\mathrm{feat}}}$ is a vector with respect to permutation symmetries. Finally, we use an elementwise MLP to approximate the scalar-vector multiplication $f_3(h(\mu) \mathbf{1}\mathbf{1}^\top, V V^\top X) = h(\mu) V V^\top X$. Since $f_3 \circ f_2 \circ f_1(\mu \mathbf{1}\mathbf{1}^\top, V V^\top, X) = h(\mu) V V^\top X$, and since 2-IGNs universally approximate each $f_i$, applying Lemma 6 shows that a 2-IGN can approximate $h(\mu) V V^\top X$ to $\epsilon/n$ accuracy, so we are done. Since Expressive-BasisNet is stronger than BasisNet, it can also universally approximate these functions.

From the proof, we can see that SignNet and BasisNet need only learn simple functions for the $\rho$ and $\phi$ when $h$ is simple, or when the filter is non-parametric and we need only learn $\theta_i$. Xu et al.
(2020) propose the principle of algorithmic alignment, and show that if separate modules of a neural network each need only learn simple functions (that is, functions that are well-approximated by low-order polynomials with small coefficients), then the network may be more sample efficient. If we do not require permutation equivariance, and parameterize SignNet and BasisNet with simple MLPs, then algorithmic alignment may suggest that our models are sample efficient. Indeed, $\rho = \sum$ is a simple linear function with coefficients 1, and $\phi(V, \lambda, X) = h(\lambda) V V^\top X$ is quadratic in $V$ and linear in $X$, so it is simple if $h$ is simple.

Proposition 3. There exist infinitely many pairs of non-isomorphic graphs that SignNet and BasisNet can distinguish, but spectral graph convolutions or spectral GNNs cannot distinguish.

Proof. The idea is as follows: we will take graphs $G$ and give them the node feature matrix $X_G = D^{1/2} \mathbf{1}$, i.e. each node has as feature the square root of its degree. Then any spectral graph convolution (or, the first layer of any spectral GNN) will output $V \mathrm{Diag}(\theta) V^\top X$, which only depends on the degree sequence and number of nodes. Thus, any spectral graph convolution or spectral GNN will have the same output (up to permutation) for any such graphs $G$ with node features $X_G$ and the same number of nodes and same degree sequence. On the other hand, SignNet and BasisNet can distinguish between infinitely many pairs of graphs $(G^{(1)}, G^{(2)})$ with node features $(X_{G^{(1)}}, X_{G^{(2)}})$ and the same number of nodes and degree sequence; this is because SignNet and BasisNet can tell when a graph is bipartite.

For each $n \geq 5$, we will define $G^{(1)}$ and $G^{(2)}$ as connected graphs with $n$ nodes, with the same degree sequence. Also, we define $G^{(1)}$ to have node features $X^{(1)}_i = \sqrt{d^{(1)}_i}$, where $d^{(1)}_i$ is the degree of node $i$ in $G^{(1)}$, and similarly $G^{(2)}$ has node features $X^{(2)}_i = \sqrt{d^{(2)}_i}$.
Now, note that $X^{(1)}$ is an eigenvector of the normalized Laplacian of $G^{(1)}$, and it has eigenvalue 0. As we take the eigenvectors to be orthonormal (since the normalized Laplacian is symmetric), for any spectral graph convolution we have that

$$\sum_{i=1}^n \theta_i v_i v_i^\top X^{(1)} = \theta_1 v_1 v_1^\top X^{(1)} = \theta_1 D_1^{1/2} \mathbf{1} (D_1^{1/2} \mathbf{1})^\top D_1^{1/2} \mathbf{1} = \theta_1 \Big( \sum_{j=1}^n d^{(1)}_j \Big) D_1^{1/2} \mathbf{1},$$

where $D_1$ is the diagonal degree matrix of $G^{(1)}$. (Here $v_1$ is written as $D_1^{1/2} \mathbf{1}$ without its normalizing constant; normalizing $v_1$ to unit length only changes the scalar factor, which still depends only on the degree sequence.) Likewise, any spectral graph convolution outputs $\theta_1 \big( \sum_j d^{(2)}_j \big) D_2^{1/2} \mathbf{1}$ for $G^{(2)}$. Since $D_1$ and $D_2$ are the same up to a permutation, any spectral graph convolution has the same output for $G^{(1)}$ and $G^{(2)}$, up to a permutation. In fact, this also holds for spectral GNNs, as the first layer will always have the same output (up to a permutation) on $G^{(1)}$ and $G^{(2)}$, so the later layers will also have the same output up to a permutation.

Figure 11: Illustration of our constructed $G^{(1)}$ and $G^{(2)}$ for n = 5, as used in the proof of Proposition 3.

Figure 12: Illustration of our constructed $G^{(1)}$ and $G^{(2)}$ for n = 6, as used in the proof of Proposition 3.

Now, we concretely define $G^{(1)}$ and $G^{(2)}$. This is illustrated in Figure 11 and Figure 12. For $n = 5$, let $G^{(1)}$ contain a triangle with nodes $w_1, w_2, w_3$, and have a path of length 2 coming out of one of the nodes in the triangle, say $w_1$ connects to $w_4$, and $w_4$ connects to $w_5$. This is not bipartite, as there is a triangle. Let $G^{(2)}$ be a bipartite graph that has 2 nodes on the left ($v_1, v_2$) and 3 nodes on the right ($v_3, v_4, v_5$). Connect $v_1$ with all nodes on the right, and connect $v_2$ with $v_3$ and $v_4$. Note that both $G^{(1)}$ and $G^{(2)}$ have the same number of nodes and the same degree sequence $\{3, 2, 2, 2, 1\}$. Thus, spectral graph convolutions or spectral GNNs cannot distinguish them.
However, SignNet and BasisNet can distinguish them, as they can tell whether a graph is bipartite by checking the highest eigenvalue of the normalized Laplacian. This is because the multiplicity of the eigenvalue 2 is the number of bipartite components. In particular, SignNet can approximate the function $\phi(v_i, \lambda_i, X) = \lambda_i$ and $\rho \approx \max_{i=1}^n$. Likewise, BasisNet can approximate the function $\phi_{d_i}(V_i V_i^\top, \lambda_i) = \lambda_i$ and $\rho \approx \max_{i=1}^l$.

This in fact gives an infinite family of graphs that SignNet / BasisNet can distinguish, but spectral graph convolutions or spectral GNNs cannot. To see why, suppose we have $G^{(1)}$ and $G^{(2)}$ for some $n \geq 5$. Then we construct a pair of graphs on $n + 1$ nodes with the same degree sequence. To do this, we add another node to the path of $G^{(1)}$, thus giving it degree sequence $\{3, 2, \ldots, 2, 1\}$. For $G^{(2)}$, we add a node $v_{n+1}$ to the side that $v_n$ is not contained on (e.g. for $n = 5$, we add $v_6$ to the left side, as $v_5$ was on the right), then connect $v_n$ to $v_{n+1}$ to also give a degree sequence $\{3, 2, \ldots, 2, 1\}$. Note that the non-bipartiteness of $G^{(1)}$ and bipartiteness of $G^{(2)}$ are preserved.
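The n = 5 construction in the proof of Proposition 3 can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the helper names `adjacency`, `normalized_laplacian`, and `spectral_conv` are hypothetical, and the filter coefficients `theta` are arbitrary random values chosen for illustration.

```python
import numpy as np

def adjacency(n, edges):
    """Dense symmetric adjacency matrix from an undirected edge list."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2} for a graph without isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_conv(A, theta):
    """Apply sum_i theta_i v_i v_i^T to the features X = D^{1/2} 1."""
    lam, V = np.linalg.eigh(normalized_laplacian(A))
    X = np.sqrt(A.sum(axis=1))
    return V @ (theta * (V.T @ X)), lam

# G1: triangle w1, w2, w3 with the path w1 - w4 - w5 (0-indexed nodes).
A1 = adjacency(5, [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)])
# G2: bipartite; v1 joined to v3, v4, v5 and v2 joined to v3, v4.
A2 = adjacency(5, [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3)])

theta = np.random.default_rng(0).normal(size=5)  # arbitrary filter coefficients
out1, lam1 = spectral_conv(A1, theta)
out2, lam2 = spectral_conv(A2, theta)

# Same degree sequence, and the convolution outputs agree up to a permutation.
same_degrees = sorted(A1.sum(axis=1)) == sorted(A2.sum(axis=1))
same_conv_output = np.allclose(np.sort(out1), np.sort(out2))

# The largest eigenvalue equals 2 only for the bipartite graph.
g1_is_bipartite = np.isclose(lam1.max(), 2.0)
g2_is_bipartite = np.isclose(lam2.max(), 2.0)
print(same_degrees, same_conv_output, g1_is_bipartite, g2_is_bipartite)
```

The sketch compares sorted outputs because the two graphs are only expected to agree up to a relabeling of nodes.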

H.2 EXISTING POSITIONAL ENCODINGS

Here, we show that our SignNets and BasisNets universally approximate various types of existing graph positional encodings. The key is to show that these positional encodings are related to spectral graph convolution matrices and the diagonals of these matrices, and to show that our networks can approximate these matrices and diagonals.

Proposition 5. If the eigenvalues take values in a compact set, SignNets and BasisNets universally approximate the diagonal of any spectral graph convolution matrix $f(V, \Lambda) = \mathrm{diag}\big( \sum_{i=1}^n h(\lambda_i) v_i v_i^\top \big)$. BasisNets can additionally universally approximate any spectral graph convolution matrix $f(V, \Lambda) = \sum_{i=1}^n h(\lambda_i) v_i v_i^\top$.

Proof. Note that the $v_i$ come from a compact set as they are of unit norm. The $\lambda_i$ are from a compact set by assumption; this assumption holds for the normalized Laplacian, as $\lambda_i \in [0, 2]$. Also, as diag is linear, the spectral graph convolution diagonal can be written $\sum_{i=1}^n h(\lambda_i) \mathrm{diag}(v_i v_i^\top)$. Let $\epsilon > 0$. For SignNet, let $\rho = \sum_{i=1}^n$, which can be exactly expressed as it is a permutation equivariant linear operation from vectors to vectors. Then $\phi(v_i, \lambda_i)$ can approximate the function $h(\lambda_i) \mathrm{diag}(v_i v_i^\top)$ to arbitrary precision, as it is a permutation equivariant function from vectors to vectors (Segol & Lipman, 2019). Thus, letting $\phi$ approximate the function to $\epsilon/n$ accuracy, SignNet can approximate $f$ to $\epsilon$ accuracy.

Let $l$ be the number of eigenspaces $V_1, \ldots, V_l$, so $f(V, \Lambda) = \sum_{i=1}^l h(\mu_i) V_i V_i^\top$. For BasisNet, we need only show that it can approximate each term of the spectral graph convolution matrix to $\epsilon/l$ accuracy, as a 2-IGN can exactly express the diag function in each $\phi_{d_i}$, since it is a linear permutation equivariant function from matrices to vectors. A 2-IGN can universally approximate the function $f_1(\mu_i, V_i V_i^\top) = (h(\mu_i), V_i V_i^\top)$, as it can express any elementwise MLP.
Also, a 2-IGN can universally approximate the scalar-matrix multiplication $f_2(h(\mu_i), V_i V_i^\top) = h(\mu_i) V_i V_i^\top$ by another elementwise MLP. Since $h(\mu_i) V_i V_i^\top = f_2 \circ f_1(\mu_i, V_i V_i^\top)$, Lemma 6 shows that a single 2-IGN can approximate this composition to $\epsilon/l$ accuracy, so we are done.

Proposition 4. SignNet and BasisNet can approximate node positional encodings based on heat kernels (Feldman et al., 2022) and random walks (Dwivedi et al., 2022). BasisNet can approximate diffusion and p-step random walk relative positional encodings (Mialon et al., 2021), and generalized PageRank and landing probability distance encodings (Li et al., 2020).

Proof. We will show that we can apply Proposition 5 above, by showing that all of these positional encodings are spectral graph convolutions. The heat kernel embeddings are of the form $\mathrm{diag}\big( \sum_{i=1}^n \exp(-t \lambda_i) v_i v_i^\top \big)$ for some choices of the parameter $t$, so they can be approximated by SignNets or BasisNets. Also, the diffusion kernel (Mialon et al., 2021) is just the matrix of this heat kernel, and the p-step random walk kernel is $\sum_{i=1}^n (1 - \gamma \lambda_i)^p v_i v_i^\top$ for some parameter $\gamma$, so BasisNets can universally approximate both of these.

For the other positional encodings, we let $v_i$ be the eigenvectors of the random walk Laplacian $I - D^{-1} A$ instead of the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$. The eigenvalues of these two Laplacians are the same, and if $\tilde{v}_i$ is an eigenvector of the normalized Laplacian then $D^{-1/2} \tilde{v}_i$ is an eigenvector of the random walk Laplacian with the same eigenvalue (Von Luxburg, 2007). Then with $v_i$ as the eigenvectors of the random walk Laplacian, the random walk positional encodings (RWPE) in Dwivedi et al. (2022) take the form $\mathrm{diag}\big( (D^{-1} A)^k \big) = \mathrm{diag}\big( \sum_{i=1}^n (1 - \lambda_i)^k v_i v_i^\top \big)$, for any choices of integer $k$. The distance encodings proposed in Li et al. (2020) take the form $f_3(A D^{-1}, (A D^{-1})^2, (A D^{-1})^3, \ldots)$, for some function $f_3$.
We restrict to continuous $f_3$ here; shortest path distances can be obtained by a discontinuous $f_3$ that we discuss below. Their generalized PageRank based distance encodings can be obtained by

$$\sum_{i=1}^n \Big( \sum_{k \geq 1} \gamma_k (1 - \lambda_i)^k \Big) v_i v_i^\top \qquad (47)$$

for some $\gamma_k \in \mathbb{R}$, so this is a spectral graph convolution. They also define so-called landing probability based positional encodings, which take the form $\sum_{i=1}^n (1 - \lambda_i)^k v_i v_i^\top$, for some choices of integer $k$. Thus, BasisNets can approximate these distance encoding matrices.

Another powerful class of positional encodings is based on shortest path distances between nodes in the graph (Ying et al., 2021; Li et al., 2020). Shortest path distances can be expressed in a form similar to the spectral graph convolution, but require a highly discontinuous function. If we define $f_3(x_1, \ldots, x_n) = \min_{i : x_i \neq 0} i$ to be the lowest index such that $x_i$ is nonzero, then we can write the shortest path distance matrix as $f_3(D^{-1} A, (D^{-1} A)^2, \ldots, (D^{-1} A)^n)$, where $f_3$ is applied elementwise to return an $n \times n$ matrix. As $(D^{-1} A)^k = \sum_{i=1}^n (1 - \lambda_i)^k v_i v_i^\top$, BasisNets can learn the inside arguments, but cannot learn the discontinuous function $f_3$.
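The positional encodings discussed in this section can be evaluated directly from an eigendecomposition. The sketch below is illustrative only and rests on a few assumptions: it uses the eigenvectors of the symmetric normalized Laplacian, for which the diagonal identity holds because $D^{-1} A$ and $D^{-1/2} A D^{-1/2}$ differ by a diagonal similarity transform; the helper `spectral_diag`, the toy graph, and the values of `k` and `t` are hypothetical choices.

```python
import numpy as np

# Toy graph (assumption): a triangle 0-1-2 with the path 0-3-4 attached.
n = 5
A = np.zeros((n, n))
for i, j in [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)

# Symmetric normalized Laplacian and its orthonormal eigenvectors.
L_sym = np.eye(n) - A / np.sqrt(np.outer(deg, deg))
lam, V = np.linalg.eigh(L_sym)

def spectral_diag(h):
    """diag(sum_i h(lam_i) v_i v_i^T), without forming the n x n matrix."""
    return (V ** 2) @ h(lam)

k, t = 3, 0.5
P = A / deg[:, None]  # random walk matrix D^{-1} A

# Random walk positional encoding: direct matrix power vs. spectral form.
rwpe_direct = np.diag(np.linalg.matrix_power(P, k))
rwpe_spectral = spectral_diag(lambda x: (1.0 - x) ** k)

# Heat kernel node encoding for diffusion time t.
heat_diag = spectral_diag(lambda x: np.exp(-t * x))

# Shortest path distances via the discontinuous f_3: the first power m
# at which entry (i, j) of (D^{-1} A)^m becomes nonzero.
powers = np.stack([np.linalg.matrix_power(P, m) for m in range(1, n + 1)])
spd = 1 + np.argmax(powers > 0, axis=0)
print(np.allclose(rwpe_direct, rwpe_spectral), spd[1, 4], spd[0, 3])
```

On the diagonal, the first-nonzero rule returns the length of the shortest closed walk rather than 0, so only the off-diagonal entries of `spd` should be read as distances.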

H.3 SPECTRAL INVARIANTS

Here, we consider the graph angles $\alpha_{ij} = \| V_i V_i^\top e_j \|_2$, for $i = 1, \ldots, l$ where $l$ is the number of eigenspaces, and $j = 1, \ldots, n$. It is clear that graph angles are permutation equivariant and basis invariant. These graph angles have been extensively studied, so we cite a number of interesting properties of them. That graph angles determine the number of length 3, 4 and 5 cycles, the connectivity of a graph, and the number of length k closed walks is all shown in Chapter 4 of Cvetković et al. (1997). Other properties may be of use for graph representation learning as well. For instance, the eigenvalues of node-deleted subgraphs of a graph G are determined by the eigenvalues and graph angles of G; this may be useful in extending recent graph neural networks that are motivated by node deletion and the reconstruction conjecture (Cotta et al., 2021; Bevilacqua et al., 2022; Papp et al., 2021; Tahmasebi et al., 2020).

Now, we prove that BasisNet can universally approximate the graph angles. The graph properties we consider in the theorem are all integer valued (e.g. the number of cycles of length 3 in a graph is an integer). Thus, any two graphs that differ in these properties will differ by at least 1, so as long as we have approximation to $\epsilon < 1/2$, we can distinguish any two graphs that differ in these properties. Recall the statement of Theorem 3.

Theorem 3. BasisNet can universally approximate the graph angles $\alpha_{ij}$. The eigenvalues and graph angles (and thus BasisNets) can determine the number of length 3, 4, and 5 cycles, whether a graph is connected, and the number of length k closed walks from any vertex to itself.

Proof. Note that the graph angles satisfy $\alpha_{ij}^2 = \| V_i V_i^\top e_j \|_2^2 = e_j^\top V_i V_i^\top V_i V_i^\top e_j = e_j^\top V_i V_i^\top e_j$, where $V_i$ is a basis for the $i$th adjacency matrix eigenspace, and $e_j^\top V_i V_i^\top e_j$ is the $(j, j)$-entry of $V_i V_i^\top$.
These graph angles are just the elementwise square roots of the diagonals of the matrices $V_i V_i^\top$. As $f_1(V_i V_i^\top) = \mathrm{diag}(V_i V_i^\top)$ is a permutation equivariant linear function from matrices to vectors, a 2-IGN on $V_i V_i^\top$ can exactly compute this with 0 error. Then a 2-IGN can learn an elementwise MLP to approximate the elementwise square root $f_2(\mathrm{diag}(V_i V_i^\top)) = \sqrt{\mathrm{diag}(V_i V_i^\top)}$ to arbitrary precision. Finally, there may be remaining operations $f_3$ that are permutation invariant or permutation equivariant from vectors to vectors; for instance, the $\alpha_{ij}$ are typically gathered into a matrix of size $l \times n$ where the columns are lexicographically sorted ($l$ is the number of eigenspaces) (Cvetković et al., 1997), or we may have a permutation invariant readout to compute a subgraph count. A

The data we use are all freely available online. The datasets we use are ZINC (Irwin et al., 2012), Alchemy (Chen et al., 2019a), the synthetic counting substructures dataset (Chen et al., 2020), the multi-task graph property regression synthetic dataset (Corso et al., 2020) (MIT License), the images dataset used by Balcilar et al. (2020) (GNU General Public License v3.0), the cat mesh from free3d.com/3d-model/cat-v1--522281.html (Personal Use License), and the human mesh from turbosquid.com/3d-models/water-park-slides-3d-max/1093267 (TurboSquid 3D Model License). If no license is listed, this means that we cannot find a license for the dataset. As they appear to be freely available with permissive licenses or no licenses, we do not ask for permission from the creators or hosts of the data. We do not believe that any of this data contains offensive content or personally identifiable information. The 50 images used in the spectral graph convolution experiments are mostly images of objects, with a few low resolution images of humans that do not appear to have offensive content.
The only other human-related data appears to be the human mesh, which appears to be from a 3D scan of a human.

K.2 GRAPH REGRESSION DETAILS

ZINC. In Section 4.1 we study the effectiveness of SignNet for learning positional encodings to boost the expressive power, and thereby generalization, on the graph regression problem ZINC. In all cases we take our $\phi$ encoder to be an 8 layer GIN with ReLU activation. The input eigenvector $v_i \in \mathbb{R}^n$, where $n$ is the number of nodes in the graph, is treated as a single scalar feature for each node. In the case of using a fixed number of eigenvectors $k$, the aggregator $\rho$ is taken to be an 8 layer MLP with batch normalization and ReLU activation. The aggregator $\rho$ is applied separately to the concatenation of the $k$ different embeddings for each node in a graph, resulting in one single embedding per node. This embedding is concatenated to the node features for that node, and the result is passed as input to the base (predictor) model. We also consider using all available eigenvectors in each graph instead of a fixed number $k$. Since the total number of eigenvectors is a variable quantity, equal to the number of nodes in the underlying graph, an MLP cannot be used for $\rho$. To handle the variable sized input in this case, we take $\rho$ to be an MLP preceded by a sum over the $\phi$ outputs. In other words, the SignNet is of the form $\mathrm{MLP}\big( \sum_{i=1}^k \phi(v_i) + \phi(-v_i) \big)$ in this case.

As well as testing SignNet, we also checked whether simple transformations that resolve the sign ambiguity of the Laplacian eigenvectors $p = (v_1, \ldots, v_k)$ could serve as effective positional encodings. We considered three options. First is to randomly flip the sign of each $\pm v_i$ during training. This is a common heuristic used in prior work on Laplacian positional encoding (Kreuzer et al., 2021; Dwivedi et al., 2020). Second, take the element-wise absolute value $|v_i|$. This is a non-injective map, creating sign invariance at the cost of destroying positional information.
Third is a different canonicalization that avoids stochasticity and use of absolute values by selecting the sign of each $v_i$ so that the majority of entries are non-negative, with ties broken by comparing the $\ell_1$-norm of positive and negative parts. When the tie-break also fails, the sign is chosen randomly. Results for the GatedGCN base model on ZINC in Table 1 show that all three of these approaches are significantly poorer positional encodings compared to SignNet.

Our training pipeline largely follows that of Dwivedi et al. (2022), and we use the GatedGCN and PNA base models from the accompanying implementation (see https://github.com/vijaydwivedi75/gnn-lspe). The Sparse Transformer base model architecture we use, which like GAT computes attention only across neighbouring nodes, is introduced by Kreuzer et al. (2021). Finally, the GINE implementation is based on the PyTorch Geometric implementation (Fey & Lenssen, 2019). For the state-of-the-art comparison, all baseline results are from their respective papers, except for GIN, which we run. We used edge features for all models except the Sparse Transformer. For the Sparse Transformer, we found our method of using edge features to somewhat increase training instability, so standard deviation was higher, though mean test MAE was mostly similar to the runs without edge features.

ZINC-full. We also run our method on the full ZINC dataset, termed ZINC-full. The result we report for SignNet is a larger version of the GatedGCN base model with a SignNet that takes in all eigenvectors. This model has 994,113 parameters in total. All baseline results are from their respective papers, except for GIN, which is from Bodnar et al. (2021).

We closely follow the experimental setting of Koestler et al. (2022) for the texture reconstruction experiments. In this work, we use the cotangent Laplacian (Rustamov et al., 2007) of a triangle mesh with the lowest 1023 eigenvectors besides the trivial eigenvector of eigenvalue 0.
We implemented SignNet in the authors' original code, which was privately shared with us. Both ρ and ϕ are taken to be MLPs. Hyperparameter settings and number of parameters are given in Table 10 . We chose hyperparameters so that the total number of parameters in the SignNet model was no larger than that of the original model.
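The three sign-handling baselines described for the ZINC experiments (random sign flipping, element-wise absolute value, and majority-sign canonicalization) can be sketched as follows. This is a hedged NumPy illustration rather than the experimental code: the function names are hypothetical, and the direction of the $\ell_1$ tie-break (keeping the sign whose positive part has larger $\ell_1$ mass) is an assumption, since the text does not specify it.

```python
import numpy as np

def random_sign_flip(V, rng):
    """Baseline 1: flip the sign of each eigenvector (column) at random."""
    signs = rng.choice([-1.0, 1.0], size=V.shape[1])
    return V * signs

def abs_value(V):
    """Baseline 2: element-wise absolute value (sign invariant, not injective)."""
    return np.abs(V)

def majority_sign(V, rng):
    """Baseline 3: pick each column's sign so most entries are non-negative."""
    out = V.copy()
    for i in range(V.shape[1]):
        v = V[:, i]
        n_pos, n_neg = np.sum(v > 0), np.sum(v < 0)
        if n_pos != n_neg:
            s = 1.0 if n_pos > n_neg else -1.0
        else:
            # Tie-break on l1 mass of positive vs. negative parts (assumed direction).
            l1_pos, l1_neg = v[v > 0].sum(), -v[v < 0].sum()
            if l1_pos != l1_neg:
                s = 1.0 if l1_pos > l1_neg else -1.0
            else:
                s = rng.choice([-1.0, 1.0])  # both criteria tie: random sign
        out[:, i] = s * v
    return out

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))             # stand-in for 3 eigenvectors on 5 nodes
flipped = V * np.array([1.0, -1.0, -1.0])

canon_a = majority_sign(V, rng)
canon_b = majority_sign(flipped, rng)
print(np.allclose(canon_a, canon_b), np.allclose(abs_value(V), abs_value(flipped)))
```

With an odd number of nonzero entries per column, as in this example, the majority rule never ties, so the canonical form is deterministic and agrees for `V` and `flipped`.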



A function class F model distinguishes graphs G1, G2 if there is an f ∈ F model such that f (G1) ̸ = f (G2).



Figure 1: Symmetries of eigenvectors of a symmetric matrix with permutation invariances (e.g. a graph Laplacian). A neural network applied to the eigenvector matrix (middle) should be invariant or equivariant to permutation of the rows (left product with a permutation matrix $P$) and invariant to the choice of eigenvectors in each eigenbasis (right product with a block diagonal orthogonal matrix $\mathrm{Diag}(Q_1, Q_2, Q_3)$).
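The two symmetries in this caption can be verified numerically on a small graph. The sketch below makes several assumptions for illustration: the graph is the 6-cycle (whose combinatorial Laplacian has repeated eigenvalues), eigenvalues are grouped by rounding, and the helper `spectral_filter` and the filter $h(\lambda) = e^{-\lambda}$ are hypothetical choices.

```python
import numpy as np

# 6-cycle: its Laplacian has eigenvalues 0, 1, 1, 3, 3, 4.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)

rng = np.random.default_rng(0)

# Group eigenvector columns into eigenspaces by (rounded) eigenvalue.
groups = {}
for idx, val in enumerate(np.round(lam, 8)):
    groups.setdefault(val, []).append(idx)

# Right-multiply each eigenspace basis V_j by a random orthogonal Q_j.
V_new = V.copy()
for cols in groups.values():
    Q, _ = np.linalg.qr(rng.normal(size=(len(cols), len(cols))))
    V_new[:, cols] = V[:, cols] @ Q

def spectral_filter(V, lam, h):
    """sum_i h(lam_i) v_i v_i^T, a basis invariant function of (V, lam)."""
    return (V * h(lam)) @ V.T

h = lambda x: np.exp(-x)
still_eigenvectors = np.allclose(L @ V_new, V_new * lam)
still_orthonormal = np.allclose(V_new.T @ V_new, np.eye(n))
filter_unchanged = np.allclose(spectral_filter(V, lam, h),
                               spectral_filter(V_new, lam, h))

# Row permutation: P V are eigenvectors of the relabeled Laplacian P L P^T.
P = np.eye(n)[rng.permutation(n)]
perm_equivariant = np.allclose((P @ L @ P.T) @ (P @ V), (P @ V) * lam)
print(still_eigenvectors, still_orthonormal, filter_unchanged, perm_equivariant)
```

For one-dimensional eigenspaces the random orthogonal factor reduces to a sign, so the same check also covers sign flips.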

Figure 3: Counting substructures and regressing graph properties (lower is better). With Laplacian PEs, SignNet improves performance, while sign flip data augmentation (LapPE) is less consistent. Mean and standard deviations are reported on 3 runs. All runs use the same 4-layer GIN base model.

Figure 4: Cotangent Laplacian eigenvectors of the cat model and first principal component of ϕ(v) + ϕ(-v) from our trained SignNet. Our SignNet encodes bilateral symmetry, which is useful for reconstruction of the bilaterally symmetric texture.

Figure 5: PyTorch-like pseudo-code for using SignNet with a GNN prediction model, where ϕ = GIN and ρ = MLP as in the ZINC molecular graph regression experiments. Reshaping eigenvectors from n × k to n × k × 1 allows ϕ to process each eigenvector (and its negation) independently in PyTorch-like deep learning libraries.
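The reshaping trick mentioned in this caption can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the pseudo-code of Figure 5: the encoder `phi` is a toy stand-in for the GIN (one neighbor-aggregation step between two shared linear maps), `rho` is a single linear layer standing in for the MLP, and all weights are random.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, hidden, out_dim = 6, 3, 8, 4

# Toy graph (assumption): a 6-cycle, with its first k Laplacian eigenvectors.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
_, V_full = np.linalg.eigh(L)
V = V_full[:, :k]                       # shape (n, k)

# Random weights shared across eigenvectors (hypothetical stand-ins).
W1 = rng.normal(size=(1, hidden))
b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(hidden, hidden))
W_rho = rng.normal(size=(k * hidden, out_dim))

def phi(x):
    """x has shape (n, k, 1); every eigenvector is processed independently."""
    h = np.tanh(x @ W1 + b1)                    # (n, k, hidden)
    h = np.einsum("nm,mkh->nkh", A, h)          # aggregate over graph neighbors
    return np.tanh(h @ W2)                      # (n, k, hidden)

def rho(z):
    """Concatenate the k embeddings of each node and map to out_dim."""
    return z.reshape(n, -1) @ W_rho             # (n, out_dim)

def signnet(V):
    x = V[:, :, None]                           # reshape n x k -> n x k x 1
    return rho(phi(x) + phi(-x))                # sign invariant by construction

signs = np.array([1.0, -1.0, -1.0])
pe = signnet(V)
pe_flipped = signnet(V * signs)
print(pe.shape, np.allclose(pe, pe_flipped))
```

Because `phi` acts on the trailing feature axis and never mixes the `k` axis, flipping one eigenvector only swaps the two summands for that eigenvector, which leaves the output unchanged.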

Figure 6: (Left) Cotangent Laplacian eigenvectors of the cat model. (Right) First principal component of ϕ(v) + ϕ(-v) from our trained SignNet.

Figure 7: First three principal components of the full SignNet output on the cat model.

Figure 8: All normalized Laplacian eigenvectors of the fluorescein graph. The first principal components of SignNet's learned positional encodings do not exactly match any eigenvectors.

Figure 9: Normalized Laplacian eigenvectors and learned positional encodings for the graph of fluorescein. (Top row) From left to right: smallest and second smallest nontrivial eigenvectors, then second largest and largest eigenvectors. (Bottom row) From left to right: first four principal components of the output ρ([ϕ(v i ) + ϕ(-v i )] i=1,...,n ) of SignNet.

Figure 10: Commutative diagram for our proof of Theorem 4. Black arrows denote functions from topological constructions, and red dashed lines denote functions that we parameterize by neural networks (ϕ = ϕ 1 × . . . × ϕ k and ρ).

Results on the ZINC dataset with a 500k parameter budget. All models use edge features besides the Sparse Transformer. Numbers are the mean and standard deviation over 4 runs, each with different seeds.

Comparison with SOTA methods on graph-level regression tasks. Numbers are test MAE, so lower is better. Best models within a standard deviation are bolded.

Test results for texture reconstruction experiment on cat and human models, following the experimental setting of (Koestler et al., 2022). We use 1023 eigenvectors of the cotangent Laplacian.

29% increase in time, for a reduction of test MAE by over 50%. Also, eigenvector computation time is negligible, as we need only precompute and save the eigenvectors once, and it only takes 15 seconds to do this for the 12,000 graphs of ZINC.

Comparison with SOTA. In Table 2, we compare SignNet with other domain-agnostic state-of-the-art methods on graph-level molecular regression tasks on ZINC (10,000 training graphs), ZINC-full

Jinwoo Kim, Saeyoon Oh, and Seunghoon Hong. Transformers generalize deepsets and can be extended to graphs & hypergraphs. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, 2021.
Jinwoo Kim, Tien Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, and Seunghoon Hong. Pure transformers are powerful graph learners. arXiv preprint arXiv:2207.02505, 2022.
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Int. Conference on Learning Representations (ICLR), volume 5, 2017.
Lukas Koestler, Daniel Grittner, Michael Moeller, Daniel Cremers, and Zorah Lähner. Intrinsic neural fields: Learning functions on manifolds. arXiv preprint arXiv:2203.07967, 2022.
Hanspeter Kraft and Claudio Procesi. Classical invariant theory, a primer. Lecture Notes, 1996.

Properties of our architectures: Unconstrained-SignNet, SignNet, Unconstrained-BasisNet, and Expressive-BasisNet. The properties are: permutation equivariance, universality (for the proper class of continuous invariant functions), and computational tractability.

Eigenspace statistics for datasets of multiple graphs. From left to right, the columns are: dataset name, number of graphs, range of number of nodes per graph, largest multiplicity, and percent of graphs with an eigenspace of dimension > 1.

Eigenspace statistics for single graphs. From left to right, the columns are: dataset name, number of nodes, distinct eigenvalues (i.e. distinct eigenspaces), number of unique multiplicities, largest multiplicity, and percent of eigenvectors belonging to an eigenspace of dimension > 1.

Further, Li et al. (2020) study the theoretical and practical benefits of incorporating distance features into graph neural networks. Dwivedi et al. (2022) propose a method to inject learnable positional encodings into each layer of a graph neural network, and use a simple random walk based node positional encoding. You et al. (2021) propose a node positional encoding $\mathrm{diag}(A^k)$, which captures the number of closed walks from a node to itself. Dwivedi et al. (2020) propose to use Laplacian eigenvectors as positional encodings in graph neural networks, with sign ambiguities alleviated by sign flipping data augmentation. Srinivasan & Ribeiro (2019) theoretically analyze node positional embeddings and structural representations in graphs, and show that most-expressive structural representations contain the information of any node positional embedding.

are able to leverage the canonical order in sequences, there is no such useful canonical order for nodes in a graph, due in part to permutation symmetries. Thus, different permutation equivariant positional

ACKNOWLEDGMENTS

We thank anonymous reviewers of this work, especially those of the Topology, Algebra, and Geometry Workshop at ICML 2022, for providing useful feedback and suggestions. We thank Leonardo Cotta for a discussion about automorphism symmetries in real-world and random graphs (Appendix C.3 and C.4). We thank Truong Son Hy for sending us some useful PyTorch codes for invariant graph networks. Stefanie Jegelka and Suvrit Sra acknowledge support from NSF CCF-2112665 (TILOS AI Research Institute) and NSF BIGDATA IIS-1741341. Stefanie Jegelka also acknowledges support from NSF Award 2134108 and NSF Convergence Accelerator Track D 2040636 and NSF C-ACCEL D636 -CRIPT Phase 2. Suvrit Sra acknowledges support from NSF CAREER grant (IIS-1846088). Joshua Robinson is partially supported by a Two Sigma fellowship. Derek Lim is supported by an NSF Graduate Fellowship.

annex

Published as a conference paper at ICLR 2023 (Lemma 5) guarantees such an embedding. Also, if the base space X i is a Euclidean space and G i is a finite or compact matrix Lie group, then a map built from G-invariant polynomials gives such an embedding (González & de Salas (2003) Lemma 11.13) .Figure 10 provides a commutative diagram representing the constructions in our proof.Theorem 4 (Decomposition Theorem). Let X 1 , . . . , X k be topological spaces, and let G i be a topological group acting continuously on X i for each i. Assume that there is a topological embedding ψ i : X i /G i → R ai of each quotient space into a Euclidean space R ai for some dimension a i . Then, for any continuous function f : X = X 1 × . . . × X k → R dout that is invariant to the action of G = G 1 × . . . × G k , there exists continuous functions ϕ i : X i → R ai and a continuous function ρ : Z ⊆ R a → R dout , where a = i a i such that f (v 1 , . . . , v k ) = ρ(ϕ 1 (v 1 ), . . . , ϕ k (v k )).(23)Furthermore: (1) each ϕ i can be taken to be invariant to G i , (2) the domain Z is compact if each X i is compact, (3) if X i = X j and G i = G j , then ϕ i can be taken to be equal to ϕ j .Proof. Let π i : X i → X i /G i denote the quotient map for X i /G i . Since each G i acts continuously, Lemma 3 gives that the quotient of the product space is the product of the quotient spaces, i.e. thatand the corresponding quotient map π : X /G is given byBy passing to the quotient (Lemma 1), there exists a continuous f : X /G → R dout on the quotient space such that f = f • π. By Lemma 4, each X i /G i is compact if X i is compact. Defining the image Z i = ψ i (X i /G i ) ⊆ R ai , we thus know that Z i is compact if X i is compact.Moreover, as ψ i is a topological embedding, it has a continuous inverse ψ -1 i on its image Z i . Further, we have a topological embedding ψ : X /G → Z = Z 1 × . . . × Z k given by ψ = ψ 1 × . . . × ψ k , with continuous inverseSo we defineThus, f = ρ • ϕ = ρ • (ϕ 1 × . 
. . × ϕ k ), so equation ( 9) holds. Moreover, the ρ and ϕ i are continuous, as they are compositions of continuous functions. Furthermore, (1) holds as eachTo show the last statement (3), note simply that if X i = X j and G i = G j , then the quotient maps are equal, i.e. π i = π j . Moreover, we can choose the embeddings to be equal, so say ψ i = ψ j . Then,, so we are done.

G.2 UNIVERSALITY OF SIGNNET AND BASISNET

Here, we prove Corollary 1 on the universal representation and approximation capabilities of our Unconstrained-SignNets, Unconstrained-BasisNets, and Expressive-BasisNets. We proceed in several steps, first proving universal representation of continuous functions when we do not require permutation equivariance, then proving universal approximation when we do require permutation equivariance.

G.2.3 BASIS INVARIANT UNIVERSAL REPRESENTATION

Recall that St(d, n) is the Stiefel manifold of d-tuples of vectors (v 1 , . . . , v d ) where v i ∈ R n and v 1 , . . . , v d are orthonormal. This is where our inputs lie, as our eigenvectors are unit norm and orthogonal. We will also make use of the Grassmannian Gr(d, n), which consists of all d-dimensional subspaces in R n . This is because the Grassmannian is the quotient space for the group action we want, Then there exist continuous ρ : Rwhere the ϕ i are O(d i ) invariant functions, and we can takeProof., it can be seen that G i acts continuously on X i . Also, we have that the quotient spaceThus, the Whitney embedding theorem (Lemma 5) gives a topological embeddingHence, we may apply Theorem 4 to obtain continuousand continuous ρ : R

G.2.4 BASIS INVARIANT AND PERMUTATION EQUIVARIANT UNIVERSAL APPROXIMATION

With the restriction that $f(V_1, \ldots, V_l) : \mathbb{R}^{n \times \sum_i d_i} \to \mathbb{R}^n$ be permutation equivariant and basis invariant, we need to use the impractically expensive Expressive-BasisNet to approximate $f$. Universality of permutation invariant or equivariant functions from matrices to scalars or matrices to vectors is difficult to achieve in a computationally tractable manner (Maron et al., 2019; Keriven & Peyré, 2019; Maehara & NT, 2019). One intuitive reason to expect this is that universally approximating such functions allows solution of the graph isomorphism problem (Chen et al., 2019b), which is a computationally difficult problem. While we have exact representation of basis invariant functions by continuous $\rho$ and $\phi_i$ when there is no permutation equivariance constraint, we can only achieve approximation up to an arbitrary $\epsilon > 0$ when we require permutation equivariance.

Corollary 5 (Universal Approximation for Expressive-BasisNets). and permutation equivariant. Then $f$ can be $\epsilon$-approximated by an Expressive-BasisNet.

Proof. By invariance, Corollary 4 of the decomposition theorem shows that $f$ can be written as for some continuous $O(d_i)$ invariant $\varphi_{d_i}$ and continuous $\rho$. By the first fundamental theorem of $O(d)$ (Lemma 2), each $\varphi_{d_i}$ can be written as which is compact as it is the image of the compact space Then note that $h$ is continuous and permutation equivariant from matrices to vectors, so it can be $\epsilon$-approximated by an invariant graph network (Keriven & Peyré, 2019), call it IGN. If we define (this identity operation is linear and permutation equivariant, so it can be exactly expressed by an IGN), then we have $\epsilon$-approximation of $f$ by

G.3 PROOF OF UNIVERSAL APPROXIMATION FOR GENERAL DECOMPOSITIONS

Theorem 5. Consider the same setup as Theorem 4, where $X_i$ are also compact.
Let Φ i be a family of G i -invariant functions that universally approximate G i -invariant continuous functions X i → R ai , and let R be a set of continuous function that universally approximate continuous functions Z ⊆ R a → R dout for every compact Z, where a = i a i . Then for any ε > 0 and any G-invariant continuous function f :Now fix an ε > 0. For any ρ ∈ R and any ϕ i ∈ Φ i (i = 1, . . . k) we may bound the difference from f as follows (suppressing the v i 's for brevity),Since each ϕ ′ i is continuous and defined on a compact set X i we know that imϕ ′ i is compact, and so the product K is also compact. Since K ′ is compact, it is contained in a closed ball B(r) of radius r > 0 centered at the origin. Let K be the closed ball B(r + 1) of radius r + 1 centered at the origin, so K contains K ′ and a ball of radius 1 around each point of K ′ . We may extend ρ ′ continuously to K as needed, so assume ρ ′ : K → R dout . By universality of R we may pick a particular ρ :Keeping this choice of ρ, it remains only to bound II. As ρ is continuous on a compact domain, it is in fact uniformly continuous. Thus, we can choose a δ ′ > 0 such that if ∥y -z∥ 2 ≤ δ ′ , then ∥ρ(y) -ρ(z)∥ ∞ < ϵ/2, and then we define δ = min(δ ′ , 1).) is well-defined, and we havedue to our choice of δ, which completes the proof.

H BASIS INVARIANCE FOR GRAPH REPRESENTATION LEARNING H.1 SPECTRAL GRAPH CONVOLUTION

In this section, we consider spectral graph convolutions, which for node featuresfor some parameters θ i . We can optionally take θ i = h(λ i ) for some continuous function h : R → R of the eigenvalues. This form captures most popular spectral graph convolutions in the literature (Bruna et al., 2014; Hamilton, 2020; Bronstein et al., 2017) ; often, such convolutions are parameterized by taking h to be some analytic function such as a simple affine function (Kipf & Welling, 2017), a linear combination in a polynomial basis (Defferrard et al., 2016; Chien et al., 2021) , or a parameterization of rational functions (Levie et al., 2018; Bianchi et al., 2021) .First, it is well known and easy to see that spectral graph convolutions are permutation equivariant, as for a permutation matrix P we haveAlso, it is easy to see that they are sign invariant, asHowever, if the θ i do not depend on the eigenvalues, then the spectral graph convolution is not necessarily basis invariant. For instance, if v 1 and v 2 are in the same eigenspace, and we change basis by permuting v ′ 1 = v 2 and v ′ 2 = v 1 , then if θ 1 ̸ = θ 2 the spectral graph convolution will generally change as well.On the other hand, if θ i = h(λ i ) for some function h : R → R, then the spectral graph convolution is basis invariant. This is because if v i and v j belong to the same eigenspace, then λ i = λ j so h(λ i ) = h(λ j ). Thus, if v i1 , . . . , v i d are eigenvectors of the same eigenspace with eigenvalue λ, we have thatis the orthogonal projector onto the eigenspace (Trefethen & Bau III, 1997) . A change of basis does not change this orthogonal projector, so such spectral graph convolutions are basis invariant.Another way to see this basis invariance is with a simple computation. Let V 1 , . . . , V l be the eigenspaces of dimension d 1 , . . . , d l , where V i ∈ R n×di . Let the corresponding eigenvalues be µ 1 , . . . , µ l . 
Then for any orthogonal matrices $Q_j \in O(d_j)$, we have

$$\sum_{j=1}^{l} h(\mu_j) (V_j Q_j)(V_j Q_j)^\top X = \sum_{j=1}^{l} h(\mu_j) V_j Q_j Q_j^\top V_j^\top X = \sum_{j=1}^{l} h(\mu_j) V_j V_j^\top X,$$

so the spectral graph convolution is invariant to substituting $V_j Q_j$ for $V_j$. Now, we give the proof that shows SignNet and BasisNet can universally approximate spectral graph convolutions.

Theorem 2 (Learning Spectral Graph Convolutions). Suppose the node features $X \in \mathbb{R}^{n \times d_{\mathrm{feat}}}$ take values in compact sets. Then SignNet can universally approximate any spectral graph convolution, and both BasisNet and Expressive-BasisNet can universally approximate any parametric spectral graph convolution.

Proof. Note that eigenvectors and eigenvalues of normalized Laplacian matrices take values in compact sets, since the eigenvalues are in $[0, 2]$ and we take eigenvectors to have unit norm. Thus, the whole domain of the spectral graph convolution is compact.

i X to within $\epsilon/n$ error, which DeepSets can do since this is a continuous permutation equivariant function from vectors to vectors

DeepSets can approximate $f_3$ without any higher order tensors besides vectors (Zaheer et al., 2017; Segol & Lipman, 2019).

As 2-IGNs can approximate each $f_i$ individually, a single 2-IGN can approximate $f_3 \circ f_2 \circ f_1$ by Lemma 6. Also, since the graph properties considered in the theorem are integer-valued, BasisNet can distinguish any two graphs that differ in one of these properties.

To see that message passing graph neural networks (MPNNs) cannot determine these quantities, we use the fact that MPNNs cannot distinguish between two graphs that have the same number of nodes and where each node (in both graphs) has the same degree. For $k \geq 3$, let $C_k$ denote the cycle graph of size $k$, and $C_k + C_k$ denote the graph that is the union of two disjoint cycle graphs of size $k$. MPNNs cannot distinguish between $C_{2k}$ and $C_k + C_k$ for $k \geq 3$, because they have the same number of nodes, and each node has degree 2. Thus, MPNNs cannot tell whether a graph is connected, as
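The invariance properties of spectral graph convolutions discussed in Section H.1 can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the cycle graph $C_6$, the filter `h`, the rotation angle, and the helper name `spectral_conv` are all illustrative assumptions. It suggests that sign invariance and permutation equivariance hold for arbitrary $\theta_i$, while basis invariance holds when $\theta_i = h(\lambda_i)$ and generally fails otherwise.

```python
import numpy as np

def spectral_conv(V, theta, X):
    """Return sum_i theta_i * v_i v_i^T X, where v_i is the i-th column of V."""
    return (V * theta) @ (V.T @ X)

rng = np.random.default_rng(0)

# Normalized Laplacian of the cycle graph C_6 (an assumed toy graph, chosen
# because its spectrum has repeated eigenvalues).
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
deg = A.sum(axis=1)
L = np.eye(n) - A / np.sqrt(np.outer(deg, deg))
lam, V = np.linalg.eigh(L)
X = rng.normal(size=(n, 3))

theta_h = np.exp(-lam)                 # theta_i = h(lambda_i), hypothetical h
theta_free = np.arange(1.0, n + 1)     # theta_i unrelated to the eigenvalues

# Sign flips: (-v)(-v)^T = v v^T, so any theta works.
signs = np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0])
sign_ok = np.allclose(spectral_conv(V * signs, theta_free, X),
                      spectral_conv(V, theta_free, X))

# Permutation equivariance: f(PV, PX) = P f(V, X).
P = np.eye(n)[rng.permutation(n)]
perm_ok = np.allclose(spectral_conv(P @ V, theta_free, P @ X),
                      P @ spectral_conv(V, theta_free, X))

# Change of basis inside the two-dimensional eigenspace of lam[1].
idx = np.flatnonzero(np.isclose(lam, lam[1]))
c, s = np.cos(0.7), np.sin(0.7)
Q = np.array([[c, -s], [s, c]])
V_rot = V.copy()
V_rot[:, idx] = V[:, idx] @ Q

basis_ok_h = np.allclose(spectral_conv(V_rot, theta_h, X),
                         spectral_conv(V, theta_h, X))
basis_ok_free = np.allclose(spectral_conv(V_rot, theta_free, X),
                            spectral_conv(V, theta_free, X))
print(sign_ok, perm_ok, basis_ok_h, basis_ok_free)
```

The last flag is the one expected to differ from the others: with unrelated coefficients, rotating the basis of a shared eigenspace changes the output.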

I USEFUL LEMMAS

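In this section, we collect useful lemmas for our proofs. These lemmas generally only require basic tools to prove. Our first lemma is a crucial property of quotient spaces.

Lemma 1 (Passing to the quotient). Let $\mathcal{X}$ and $\mathcal{Y}$ be topological spaces, and let $\mathcal{X}/G$ be a quotient space, with corresponding quotient map $\pi$. Then for every continuous $G$-invariant function $f : \mathcal{X} \to \mathcal{Y}$, there is a unique continuous $\tilde{f} : \mathcal{X}/G \to \mathcal{Y}$ such that $f = \tilde{f} \circ \pi$.

Proof. For $z \in \mathcal{X}/G$, choose an $x_z \in \mathcal{X}$ with $\pi(x_z) = z$, and define $\tilde{f}(z) = f(x_z)$. This is well-defined, since if $\pi(x_z) = \pi(x)$ for any other $x \in \mathcal{X}$, then $g x_z = x$ for some $g \in G$, so

$$f(x) = f(g x_z) = f(x_z),$$

where the second equality uses the $G$-invariance of $f$. Note that $\tilde{f}$ is continuous by the universal property of quotient spaces. Also, $\tilde{f}$ is the unique function such that $f = \tilde{f} \circ \pi$; if there were another function $h : \mathcal{X}/G \to \mathcal{Y}$ with $f = h \circ \pi$, then $h(\pi(x)) = f(x) = \tilde{f}(\pi(x))$ for every $x \in \mathcal{X}$, and surjectivity of $\pi$ would give $h = \tilde{f}$.

Next, we give the First Fundamental Theorem of $O(d)$, a classical result that has been recently used for machine learning by Villar et al. (2021). This result shows that an orthogonally invariant $f(V)$ can be expressed as a function $h(VV^\top)$. We give a proof that if $f$ is continuous, then $h$ is also continuous.

Lemma 2 (First Fundamental Theorem of $O(d)$). A continuous function $f : \mathbb{R}^{n \times d} \to \mathbb{R}^{d_{\mathrm{out}}}$ is orthogonally invariant, i.e., $f(VQ) = f(V)$ for all $Q \in O(d)$, if and only if $f(V) = h(VV^\top)$ for some continuous $h$.

Proof. If $f(V) = h(VV^\top)$, then $f(VQ) = h(VQQ^\top V^\top) = h(VV^\top) = f(V)$, so $f$ is orthogonally invariant.

For the other direction, invariant theory shows that the $O(d)$ invariant polynomials are generated by the inner products $v_i^\top v_j$, where $v_i \in \mathbb{R}^d$ are the rows of $V$ (Kraft & Procesi, 1996). Let $p : \mathbb{R}^{n \times d} \to \mathbb{R}^{n \times n}$ be the map $p(V) = VV^\top$. Then González & de Salas (2003) Lemma 11.13 shows that the quotient space $\mathbb{R}^{n \times d}/O(d)$ is homeomorphic to a closed subset $p(\mathbb{R}^{n \times d}) = Z \subseteq \mathbb{R}^{n \times n}$. Let $\tilde{p}$ refer to this homeomorphism, and note that $\tilde{p} \circ \pi = p$ by passing to the quotient (Lemma 1). Then any continuous $O(d)$ invariant $f$ passes to a unique continuous $\tilde{f} : \mathbb{R}^{n \times d}/O(d) \to \mathbb{R}^{d_{\mathrm{out}}}$ (Lemma 1), so $f = \tilde{f} \circ \pi$ where $\pi$ is the quotient map. Define $h : Z \to \mathbb{R}^{d_{\mathrm{out}}}$ by $h = \tilde{f} \circ \tilde{p}^{-1}$, and note that $h$ is a composition of continuous functions and hence continuous. Finally, we have that

$$h(VV^\top) = \tilde{f} \circ \tilde{p}^{-1}(p(V)) = \tilde{f} \circ \tilde{p}^{-1} \circ \tilde{p} \circ \pi(V) = \tilde{f} \circ \pi(V) = f(V).$$

The next lemma allows us to decompose a quotient of a product space into a product of smaller quotient spaces.

Lemma 3. Let $\mathcal{X}_1, \ldots, \mathcal{X}_k$ be topological spaces and $G_1, \ldots, G_k$ be topological groups such that each $G_i$ acts continuously on $\mathcal{X}_i$. Denote the quotient maps by $\pi_i : \mathcal{X}_i \to \mathcal{X}_i/G_i$. Then the quotient of the product is the product of the quotients, i.e.

$$(\mathcal{X}_1 \times \cdots \times \mathcal{X}_k)/(G_1 \times \cdots \times G_k) \cong (\mathcal{X}_1/G_1) \times \cdots \times (\mathcal{X}_k/G_k),$$

and the corresponding quotient map is $\pi_1 \times \cdots \times \pi_k$, given by $(x_1, \ldots, x_k) \mapsto (\pi_1(x_1), \ldots, \pi_k(x_k))$.

Proof. Denoting by $q$ the quotient map for the space $(\mathcal{X}_1 \times \cdots \times \mathcal{X}_k)/(G_1 \times \cdots \times G_k)$, it is easily seen that $q(x_1, \ldots, x_k) = q(y_1, \ldots, y_k)$ if and only if $\pi_1 \times \cdots \times \pi_k(x_1, \ldots, x_k) = \pi_1 \times \cdots \times \pi_k(y_1, \ldots, y_k)$, since either of these is true if and only if there exist $g_i \in G_i$ such that $x_i = g_i y_i$ for each $i$. Thus, we have an isomorphism of these quotient spaces.

The following lemma shows that quotients of compact spaces are also compact, which is useful for universal approximation on quotient spaces.

Lemma 4 (Compactness of quotients of compact spaces). Let $\mathcal{X}$ be a compact space. Then the quotient space $\mathcal{X}/G$ is compact.

Proof. Denoting the quotient map by $\pi : \mathcal{X} \to \mathcal{X}/G$ and letting $\{U_\alpha\}_\alpha$ be an open cover of $\mathcal{X}/G$, we have that $\{\pi^{-1}(U_\alpha)\}_\alpha$ is an open cover of $\mathcal{X}$. By compactness of $\mathcal{X}$, we can choose a finite subcover $\{\pi^{-1}(U_{\alpha_i})\}_{i=1,\ldots,n}$. Then $\{\pi(\pi^{-1}(U_{\alpha_i}))\}_{i=1,\ldots,n} = \{U_{\alpha_i}\}_{i=1,\ldots,n}$ by surjectivity, and $\{U_{\alpha_i}\}_{i=1,\ldots,n}$ is thus a finite open cover of $\mathcal{X}/G$.

The Whitney embedding theorem gives a nice condition that we apply to show that the quotient spaces $\mathcal{X}/G$ that we deal with embed into Euclidean space. It says that when $\mathcal{X}/G$ is a smooth manifold, then it can be embedded into a Euclidean space of double the dimension of the manifold. The proof is outside the scope of this paper.

Lemma 5 (Whitney Embedding Theorem (Whitney, 1944)). Every smooth manifold $M$ of dimension $n > 0$ can be smoothly embedded in $\mathbb{R}^{2n}$.

Finally, we give a lemma that helps prove universal approximation results. It says that if functions $f$ that we want to approximate can be written as compositions $f = f_L \circ \cdots \circ f_1$, then it suffices to universally approximate each $f_i$ and compose the results to universally approximate $f$.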
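This is especially useful for proving universality of neural networks, as we may use some layers to approximate each $f_i$, then compose these layers to approximate the target function $f$.

Lemma 6 (Layer-wise universality implies universality). Let $Z \subseteq \mathbb{R}^{d_0}$ be a compact domain, let $\mathcal{F}_1, \ldots, \mathcal{F}_L$ be families of continuous functions where $\mathcal{F}_i$ consists of functions from $\mathbb{R}^{d_{i-1}}$ to $\mathbb{R}^{d_i}$, and let $\mathcal{F}$ be the family of compositions $\{f_L \circ \cdots \circ f_1 : f_i \in \mathcal{F}_i\}$ on $Z$. For each $i$, let $\Phi_i$ be a family of continuous functions that universally approximates $\mathcal{F}_i$. Then the family of compositions $\Phi = \{\phi_L \circ \cdots \circ \phi_1 : \phi_i \in \Phi_i\}$ universally approximates $\mathcal{F}$.

Proof. Let $f = f_L \circ \cdots \circ f_1 \in \mathcal{F}$. Let $\tilde{Z}_1 = Z$, and then for $i \geq 2$ let $\tilde{Z}_i = f_{i-1}(\tilde{Z}_{i-1})$. Then each $\tilde{Z}_i$ is compact by continuity of the $f_i$. For $1 \leq i < L$, let $Z_i = \tilde{Z}_i$, and for $i = L$ let $Z_L$ be a compact set containing $\tilde{Z}_L$ such that every ball of radius one centered at a point in $\tilde{Z}_L$ is still contained in $Z_L$.

Let $\epsilon > 0$. We will show that there is a $\phi \in \Phi$ such that $\|f - \phi\|_\infty < \epsilon$ by induction on $L$. This holds trivially for $L = 1$, as then $\Phi = \Phi_1$. Now, let $L \geq 2$, and suppose it holds for $L - 1$. By universality of $\Phi_L$, we can choose a $\phi_L : Z_L \to \mathbb{R}^{d_L}$ in $\Phi_L$ such that $\|\phi_L - f_L\|_\infty < \epsilon/2$. As $\phi_L$ is continuous on a compact domain, it is also uniformly continuous, so we can choose a $\tilde{\delta} > 0$ such that $\|y - z\|_2 < \tilde{\delta}$ implies $\|\phi_L(y) - \phi_L(z)\|_2 < \epsilon/2$. Let $\delta = \min(\tilde{\delta}, 1)$. By induction, we can choose $\phi_{L-1} \circ \cdots \circ \phi_1$ with $\phi_i \in \Phi_i$ such that

$$\|f_{L-1} \circ \cdots \circ f_1 - \phi_{L-1} \circ \cdots \circ \phi_1\|_\infty < \delta.$$

Define $\phi = \phi_L \circ \cdots \circ \phi_1$. By the triangle inequality,

$$\|f - \phi\|_\infty \leq \|f - \phi_L \circ f_{L-1} \circ \cdots \circ f_1\|_\infty + \|\phi_L \circ f_{L-1} \circ \cdots \circ f_1 - \phi\|_\infty < \frac{\epsilon}{2} + \|\phi_L \circ f_{L-1} \circ \cdots \circ f_1 - \phi\|_\infty.$$

To bound this other term, let $x \in Z$, and for $y = f_{L-1} \circ \cdots \circ f_1(x)$ and $z = \phi_{L-1} \circ \cdots \circ \phi_1(x)$, we know that $\|y - z\|_2 < \delta$, so $\|\phi_L(y) - \phi_L(z)\|_2 < \epsilon/2$ by uniform continuity. As this holds for all $x$, we have $\|\phi - \phi_L \circ f_{L-1} \circ \cdots \circ f_1\|_\infty \leq \epsilon/2$, so $\|f - \phi\|_\infty < \epsilon$.

All graph regression models in Table 1 use edge features for learning and inference. To show that SignNet is also useful when no edge features are available, we ran ZINC experiments without edge features as well. The results are displayed in Table 7. In this setting, SignNet still significantly improves the performance over message passing networks without positional encodings, and over Laplacian positional encodings with sign flipping data augmentation.

In Table 8, we compare our model against methods that have domain-specific information about molecules built into them: HIMP (Fey et al., 2020) and CIN (Bodnar et al., 2021).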
We see that SignNet is better than HIMP and CIN-small on these tasks, and is within a standard deviation of CIN. The SignNet models are the same as the ones reported in Table 2. Once again, we emphasize that SignNet is domain-agnostic.

J.3 LEARNING SPECTRAL GRAPH CONVOLUTIONS

To numerically test the ability of our basis invariant networks to learn spectral graph convolutions, we follow the experimental setups of Balcilar et al. (2020); He et al. (2021). We take the dataset of 50 images in He et al. (2021) (originally from the Image Processing Toolbox of MATLAB), and resize them from 100×100 to 32×32. Then we apply the same spectral graph convolutions to them as in He et al. (2021), and train neural networks to learn these as regression targets. As in prior work, we report the sum of squared errors on the training set to measure expressivity.

We compare against message passing GNNs (Kipf & Welling, 2017; Veličković et al., 2018) and spectral GNNs (Chien et al., 2021; Bianchi et al., 2021; Defferrard et al., 2016; He et al., 2021). Also, we consider standard Transformers with only node features, with eigenvectors and sign flip augmentation, and with absolute values of eigenvectors. These models are all approximately sign invariant (they either use eigenvectors in a sign invariant way or do not use eigenvectors). We use DeepSets (Zaheer et al., 2017) in SignNet and 2-IGN (Maron et al., 2018) in BasisNet for $\phi$, use DeepSets for $\rho$ in both cases, and then feed the features into another DeepSets or a standard Transformer (Vaswani et al., 2017) to make the final predictions. That is, we are only given graph information through the eigenvectors and eigenvalues, and we do not use message passing.

2021), we only train and evaluate on nodes that are not connected to the boundary of the grid (that is, we only evaluate on the 28 × 28 middle section). For all experiments we limit each model to 50,000 parameters. We use the Adam (Kingma & Ba, 2014) optimizer for all experiments. For each of the GNN baselines (GCN, GAT, GPR-GNN, ARMA, ChebNet, BernNet), we select the best performing out of 4 hyperparameter settings: either 2 or 4 convolution layers, and a hidden dimension of size 32 or $D$, where $D$ is just large enough to stay within 50,000 parameters (for instance, $D = 128$ for GCN, GPR-GNN, and BernNet).

We use DeepSets or standard Transformers as our prediction network. This takes in the output of SignNet or BasisNet and concatenates it with the node features, then outputs a scalar prediction for each node. We use a 3 layer output network for DeepSets SignNet, and 2 layer output networks for all other configurations. All networks use ReLU activations.

For SignNet, we use DeepSets for both $\phi$ and $\rho$. Our $\phi$ takes in eigenvectors only, then our $\rho$ takes the outputs of $\phi$ and the eigenvalues.
We use three layers for $\phi$ and $\rho$.

For BasisNet, we use the same DeepSets for $\rho$ as in SignNet, and 2-IGNs for the $\phi_{d_i}$. There are three distinct multiplicities for the grid graph (1, 2, and 32), so we only need 3 separate IGNs. Each IGN consists of an $\mathbb{R}^{n^2 \times 1} \to \mathbb{R}^{n \times d'}$ layer and two $\mathbb{R}^{n \times d''} \to \mathbb{R}^{n \times d'''}$ layers, where $d'$, $d''$, and $d'''$ are hidden dimensions. There are no matrix to matrix operations used, as the memory requirements are intensive for these ≥ 1000 node graphs. The $\phi_{d_i}$ only take in $V_i V_i^\top$ from the eigenspaces, and the $\rho$ takes the output of the $\phi_{d_i}$ as well as the eigenvalues.
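The data flow described above (a $\phi$ that sees only eigenvectors, a $\rho$ that sees the outputs of $\phi$ together with the eigenvalues, and projectors $V_i V_i^\top$ as BasisNet inputs) can be illustrated with a small NumPy sketch. This is not the authors' implementation: a pointwise MLP stands in for DeepSets, the symmetrization $\phi(v) + \phi(-v)$ is the SignNet construction, and all helper names, widths, and the random symmetric matrix standing in for a Laplacian are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    """Random weights for a two-layer ReLU MLP (illustrative stand-in only)."""
    return (rng.normal(size=(d_in, d_hidden)), rng.normal(size=d_hidden),
            rng.normal(size=(d_hidden, d_out)), rng.normal(size=d_out))

def mlp(params, x):
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def signnet(V, lam, phi, rho):
    """phi sees eigenvectors only; rho sees phi's outputs plus the eigenvalues."""
    n, k = V.shape
    E = V[..., None]                                  # (n, k, 1) entries
    Z = mlp(phi, E) + mlp(phi, -E)                    # sign-invariant, (n, k, d)
    lam_feat = np.broadcast_to(lam[None, :, None], (n, k, 1))
    Z = np.concatenate([Z, lam_feat], axis=-1)        # append eigenvalues
    return mlp(rho, Z.reshape(n, -1))                 # (n, d_out) node features

n, k, d = 7, 4, 5
S = rng.normal(size=(n, n))
lam, V = np.linalg.eigh(S + S.T)    # random symmetric matrix as a toy operator
lam, V = lam[:k], V[:, :k]
phi = init_mlp(1, 16, d)
rho = init_mlp(k * (d + 1), 16, 3)

out = signnet(V, lam, phi, rho)
signs = np.array([-1.0, 1.0, -1.0, 1.0])
out_flipped = signnet(V * signs, lam, phi, rho)

# BasisNet-style input: the projector V_i V_i^T is unchanged by V_i -> V_i Q.
Vi = V[:, :2]
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))          # random orthogonal matrix
proj = Vi @ Vi.T
proj_rot = (Vi @ Q) @ (Vi @ Q).T
print(np.allclose(out, out_flipped), np.allclose(proj, proj_rot))
```

Feeding the eigenvalues to $\rho$ rather than $\phi$ mirrors the configuration described for this experiment; other placements are possible.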

K.4 SUBSTRUCTURES AND GRAPH PROPERTIES REGRESSION DETAILS

We use the random graph dataset from Chen et al. (2020) for counting substructures and the synthetic dataset from Corso et al. (2020) for regressing graph properties. For fair comparison we fix the base model as a 4-layer GIN model with hidden size 128. We choose $\phi$ as a 4-layer GIN (independently applied to every eigenvector) and $\rho$ as a 1-layer Transformer (independently applied to every node). Combined with proper batching and masking, we have a SignNet that takes Laplacian eigenvectors $V \in \mathbb{R}^{n \times n}$ and outputs fixed-size sign-invariant node encodings $f(V, \Lambda, X) \in \mathbb{R}^{n \times d}$, where $n$ varies between graphs but $d$ is fixed. We use this SignNet in our experiments and compare with other methods of handling PEs.
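One way to picture the batching and masking mentioned above is the following NumPy sketch. It is an illustration under stated assumptions rather than the authors' pipeline: a pointwise MLP replaces the GIN used for $\phi$, a masked sum over eigenvectors replaces the Transformer used for $\rho$, and the names `phi`, `encode`, and `random_eigvecs` are hypothetical. The point is only that padded eigenvector slots must be masked out so that graphs of different sizes yield the same fixed-width encodings as they would unbatched.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2 = rng.normal(size=(8, 4))

def phi(x):
    """Pointwise ReLU MLP on eigenvector entries (stand-in for a GIN)."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2

def encode(V, eig_mask):
    """Sign-invariant node encodings of fixed width 4.

    V: (..., n, k) eigenvectors as columns; eig_mask: (..., k) with 1 for real
    eigenvectors and 0 for padding. A masked sum replaces the Transformer rho.
    """
    E = V[..., None]
    Z = phi(E) + phi(-E)                                   # (..., n, k, 4)
    return (Z * eig_mask[..., None, :, None]).sum(axis=-2)  # (..., n, 4)

def random_eigvecs(n):
    """Eigenvectors of a random symmetric matrix (toy stand-in for a Laplacian)."""
    S = rng.normal(size=(n, n))
    return np.linalg.eigh(S + S.T)[1]

sizes = [4, 6]
graphs = [random_eigvecs(n) for n in sizes]

# Pad every graph to n_max nodes and n_max eigenvectors, recording a mask.
n_max = max(sizes)
V_pad = np.zeros((len(sizes), n_max, n_max))
mask = np.zeros((len(sizes), n_max))
for g, (n, V) in enumerate(zip(sizes, graphs)):
    V_pad[g, :n, :n] = V
    mask[g, :n] = 1.0

batched = encode(V_pad, mask)                              # (2, n_max, 4)
single = [encode(V, np.ones(n)) for n, V in zip(sizes, graphs)]
```

Because `phi` has a bias term, padded zeros do not map to zero, which is why the mask is needed; rows belonging to padded nodes are simply discarded downstream.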

