SPECFORMER: SPECTRAL GRAPH NEURAL NETWORKS MEET TRANSFORMERS

Abstract

Spectral graph neural networks (GNNs) learn graph representations via spectral-domain graph convolutions. However, most existing spectral graph filters are scalar-to-scalar functions, i.e., they map a single eigenvalue to a single filtered value, thus ignoring the global pattern of the spectrum. Furthermore, these filters are often constructed from fixed-order polynomials, which limits their expressiveness and flexibility. To tackle these issues, we introduce Specformer, which effectively encodes the set of all eigenvalues and performs self-attention in the spectral domain, leading to a learnable set-to-set spectral filter. We also design a decoder with learnable bases to enable non-local graph convolution. Importantly, Specformer is permutation equivariant. By stacking multiple Specformer layers, one can build a powerful spectral GNN. On synthetic datasets, we show that Specformer recovers ground-truth spectral filters more accurately than other spectral GNNs. Extensive experiments on both node-level and graph-level tasks on real-world graph datasets show that Specformer outperforms state-of-the-art GNNs and learns meaningful spectrum patterns. Code and data are available at https://github.com/bdy9527/Specformer.

1. INTRODUCTION

Graph neural networks (GNNs), first proposed by Scarselli et al. (2008), have become increasingly popular in the field of machine learning due to their empirical successes. Depending on how the graph signals (or features) are leveraged, GNNs can be roughly categorized into two classes, namely spatial GNNs and spectral GNNs. Spatial GNNs often adopt a message passing framework (Gilmer et al., 2017; Battaglia et al., 2018), which learns useful graph representations by propagating local information on graphs. Spectral GNNs (Bruna et al., 2013; Defferrard et al., 2016) instead perform graph convolutions via spectral filters (i.e., filters applied to the spectrum of the graph Laplacian), which can learn to capture non-local dependencies in graph signals. Although spatial GNNs have achieved impressive performance in many domains, spectral GNNs remain somewhat under-explored.

There are a few reasons why spectral GNNs have not been able to catch up. First, most existing spectral filters are essentially scalar-to-scalar functions. In particular, they take a single eigenvalue as input and apply the same filter to all eigenvalues. This filtering mechanism ignores the rich information embedded in the spectrum, i.e., the set of eigenvalues. For example, we know from spectral graph theory that the algebraic multiplicity of the eigenvalue 0 equals the number of connected components in the graph. Such information cannot be captured by scalar-to-scalar filters. Second, spectral filters are often approximated via fixed-order (or truncated) orthonormal bases, e.g., Chebyshev polynomials (Defferrard et al., 2016; He et al., 2022) and graph wavelets (Hammond et al., 2011; Xu et al., 2019), in order to avoid the costly spectral decomposition of the graph Laplacian. Although orthonormality is a nice property, this truncated approximation is less expressive and may severely limit graph representation learning.
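The connected-components fact mentioned above is easy to verify numerically. The following sketch (the 6-node graph of two disjoint triangles is a made-up toy example) counts the near-zero eigenvalues of the combinatorial Laplacian:

```python
import numpy as np

# Toy graph: two disconnected triangles, i.e., 2 connected components.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian L = D - A

# The algebraic multiplicity of eigenvalue 0 equals the number of
# connected components of the graph.
eigvals = np.linalg.eigvalsh(L)
num_components = int(np.sum(np.isclose(eigvals, 0.0)))
print(num_components)  # → 2
```

A filter that sees eigenvalues one at a time cannot count such multiplicities; it needs access to the whole set.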
Therefore, in order to improve spectral GNNs, it is natural to ask: how can we build expressive spectral filters that effectively leverage the spectrum of the graph Laplacian? To answer this question, we first note that the eigenvalues of the graph Laplacian represent frequencies, i.e., the total variation of the corresponding eigenvectors. The magnitudes of the frequencies thus convey rich information. Moreover, the relative difference between two eigenvalues also reflects important frequency information, e.g., the spectral gap. To capture both the magnitudes of and relative differences between frequencies, we propose a Transformer (Vaswani et al., 2017b) based set-to-set spectral filter, termed Specformer. Our Specformer first encodes the range of eigenvalues via positional embeddings and then exploits the self-attention mechanism to learn relative information from the set of eigenvalues. Relying on the learned representations of eigenvalues, we also design a decoder with a bank of learnable bases. Finally, by combining these bases, Specformer constructs a permutation-equivariant and non-local graph convolution. In summary, our contributions are as follows:

• We propose a novel Transformer-based set-to-set spectral filter along with learnable bases, called Specformer, which effectively captures both the magnitudes and relative differences of all eigenvalues of the graph Laplacian.
• We show that Specformer is permutation equivariant and can perform non-local graph convolutions, which is non-trivial to achieve in many spatial GNNs.
• Experiments on synthetic datasets show that Specformer recovers given spectral filters better than other spectral GNNs.
• Extensive experiments on various node-level and graph-level benchmarks demonstrate that Specformer outperforms state-of-the-art GNNs and learns meaningful spectrum patterns.
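As a rough illustration of the encoder idea, not the paper's exact architecture, eigenvalues can be lifted to vectors by a sinusoidal encoding and then mixed by self-attention over the set, so each filtered value can depend on the whole spectrum. The dimension `d` and temperature `tau` below are hypothetical choices, and the attention is a single untrained head:

```python
import numpy as np

def eig_encoding(eigvals, d=16, tau=100.0):
    # Sinusoidal positional encoding of each eigenvalue (hypothetical
    # hyperparameters): one row of dimension d per eigenvalue.
    k = np.arange(d // 2)
    freqs = eigvals[:, None] / (tau ** (2 * k / d))[None, :]
    return np.concatenate([np.sin(freqs), np.cos(freqs)], axis=-1)

def self_attention(X):
    # One scaled dot-product self-attention step over the set of
    # eigenvalue embeddings (set-to-set: output row i attends to all rows).
    scores = X @ X.T / np.sqrt(X.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

eigvals = np.array([0.0, 0.0, 0.5, 1.3, 2.0])  # toy spectrum
Z = self_attention(eig_encoding(eigvals))
print(Z.shape)  # → (5, 16)
```

Because attention is applied over the set, a repeated eigenvalue (such as the two zeros above) influences the representation of every other eigenvalue, which a scalar-to-scalar filter cannot do.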

2. RELATED WORK

Existing GNNs can be roughly divided into two categories: spatial and spectral GNNs.

Spatial GNNs. Spatial GNNs such as GAT (Velickovic et al., 2018) and MPNN (Gilmer et al., 2017) leverage message passing to aggregate local information from neighborhoods. By stacking multiple layers, spatial GNNs can in principle learn long-range dependencies, but they suffer from over-smoothing (Oono & Suzuki, 2020) and over-squashing (Topping et al., 2022). Therefore, how to balance local and global information is an important research topic for spatial GNNs. We refer readers to (Wu et al., 2021; Zhou et al., 2020; Liao, 2021) for a more detailed discussion of spatial GNNs.

Spectral GNNs. Spectral GNNs (Ortega et al., 2018; Dong et al., 2020; Wu et al., 2019; Zhu et al., 2021; Bo et al., 2021; Chang et al., 2021; Yang et al., 2022) leverage the spectrum of the graph Laplacian to perform convolutions in the spectral domain. A popular subclass approximates arbitrary filters with different kinds of orthogonal polynomials, including Monomial (Chien et al., 2021), Chebyshev (Defferrard et al., 2016; Kipf & Welling, 2017; He et al., 2022), Bernstein (He et al., 2021), and Jacobi (Wang & Zhang, 2022) bases. Because a polynomial of the symmetric Laplacian can be applied directly in the spatial domain, these methods avoid explicit spectral decomposition and guarantee localization. However, all such polynomial filters are scalar-to-scalar functions with pre-defined bases, which limits their expressiveness. Another subclass requires either full or partial spectral decomposition, such as SpectralCNN (Estrach et al., 2014) and LanczosNet (Liao et al., 2019). These methods parameterize the spectral filters with neural networks and are thus more expressive than truncated polynomials. However, such spectral filters are still limited, as they do not capture the dependencies among multiple eigenvalues.

Graph Transformer. Transformers and GNNs are closely related, since the attention weights of a Transformer can be seen as the weighted adjacency matrix of a fully connected graph. Graph Transformers (Dwivedi & Bresson, 2020), which combine the two, have gained popularity recently. Graphormer (Ying et al., 2022), SAN (Kreuzer et al., 2021), and GPS (Rampásek et al., 2022) design powerful positional and structural embeddings to further improve their expressive power. Graph Transformers still belong to spatial GNNs, although their costly self-attention is non-local. The limitations of spatial attention compared to spectral attention are discussed in (Bastos et al., 2022).
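For concreteness, the truncated polynomial filters discussed above can be sketched as a generic K-th order Chebyshev filter computed via the three-term recurrence; the coefficients `theta` are placeholders rather than learned parameters, and the path graph is a made-up toy example:

```python
import numpy as np

def chebyshev_filter(L, x, theta):
    # Apply g(L) x ≈ sum_k theta_k T_k(L_tilde) x without eigendecomposition.
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])  # rescale spectrum into [-1, 1]
    Tx_prev, Tx = x, L_tilde @ x                      # T_0(L~) x and T_1(L~) x
    out = theta[0] * Tx_prev + theta[1] * Tx
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        Tx_prev, Tx = Tx, 2.0 * L_tilde @ Tx - Tx_prev
        out = out + theta[k] * Tx
    return out

# Toy example: path graph on 4 nodes with a high-frequency signal.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, -1.0, 1.0, -1.0])
y = chebyshev_filter(L, x, [0.5, 0.3, 0.2])
```

Note that every coefficient theta_k scales the same scalar function T_k uniformly across all eigenvalues, which is exactly the scalar-to-scalar limitation described above: the filter response at one eigenvalue cannot depend on the rest of the spectrum.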

