A SPECTRAL PERSPECTIVE ON DEEP SUPERVISED COMMUNITY DETECTION

Anonymous

Abstract

In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the numerical performance of a GCN: we empirically show that most of the information needed for node classification is contained in the low-frequency domain, and thus, contrary to Euclidean grids (e.g., images), high frequencies are less crucial to community detection. In particular, it is possible to obtain state-of-the-art accuracies with simple classifiers that rely only on a few low frequencies: this is surprising because, contrary to GCNs, no cascade of filtering along the graph structure is involved, and it indicates that the spectral components that matter for the supervised community detection task lie essentially in the low-frequency domain.

1. INTRODUCTION

Graph Convolutional Networks (GCNs) are the state of the art in community detection (Kipf & Welling, 2016). They correspond to Graph Neural Networks (GNNs) that propagate graph features through a cascade of linear operators and non-linearities, while exploiting the graph structure through a linear smoothing operator. However, the principles that allow GCNs to obtain good performance remain unclear. It is suggested in Li et al. (2018) that GCNs tend to over-smooth their representations, averaging over too many neighboring nodes and diluting classification information. The smoothing is generally interpreted as a low-pass filtering through the graph Laplacian, and finding a way to exploit the high frequencies of the graph Laplacian is an active research question (Oono & Suzuki, 2019). In contrast, our work suggests that, in the setting of community detection, graph Laplacian high frequencies actually have a minor impact on the classification performance of a standard GCN, as opposed to standard Convolutional Neural Networks for vision, which are built on image processing considerations. Graph Signal Processing (GSP) is a popular field whose objective is to manipulate the spectrum of signals whose topology is given by a graph. Typically, this graph has a non-Euclidean structure, yet many central theoretical results (Hammond et al., 2011) are based on an analogy with Euclidean, regular grids. For instance, a spectral component or frequency has to be understood as an eigenvalue of the Laplacian, which thus suffers from intrinsic issues such as isotropy (Oyallon, 2020). The principles of GSP are very appealing because they allow the dense literature of harmonic analysis to be applied to graphs. This literature is thus at the core of many intuitions and drives many key ingredients of GCN design, evoking standard tools of signal processing: convolutions, shift invariance, wavelets, Fourier transforms (Bronstein et al., 2017), etc.
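To fix ideas, the linear smoothing operator of a standard GCN layer can be sketched in a few lines. This is a minimal, dense NumPy illustration of the propagation rule of Kipf & Welling (2016); the function name `gcn_layer` is ours, and practical implementations use sparse matrices.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: H = ReLU(S X W), where
    S = D^{-1/2} (A + I) D^{-1/2} is the renormalized adjacency,
    i.e., the linear smoothing operator of Kipf & Welling (2016)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # degrees (no isolated nodes assumed)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt     # low-pass smoothing operator
    return np.maximum(S @ X @ W, 0.0)       # linear map + ReLU non-linearity
```

Cascading such layers repeatedly applies the same low-pass operator S, which is the source of the over-smoothing behavior discussed above.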
Here, we observe several limits of this analogy in the context of community detection: for instance, discarding high frequencies has a minor impact on a GCN's behavior, because the spectrum of the graphs in the standard datasets is essentially located in the low-frequency domain. This type of idea is, for instance, at the core of spectral clustering algorithms. Spectral clustering is a rather different point of view from deep supervised GCNs, as it studies node labeling in unsupervised contexts: it generally relies on generative models based on the graph spectrum. The main principle is to consider the eigenvectors corresponding to the smallest nonzero eigenvalues, referred to as Fiedler vectors (Doshi & Eun, 2020): those directions allow clusters to be defined, depending on the sign of a feature. Several theoretical guarantees can be obtained in the context of Stochastic Block Model approximation (Rohe et al., 2011). Our paper establishes a clear link with this approach: we show that the informative graph features are located in a low-frequency band of the graph Laplacian and do not need extra graph processing tools to be efficiently used in a deep supervised classifier. This paper shows, via various ablation experiments, that experiments on standard community detection datasets like Cora, Citeseer, and Pubmed can be conducted using only a few frequencies of their respective graph spectra without any significant performance drop. Our other contributions are as follows: (a) We show that most of the information exploited by a GCN for a community detection task can actually be isolated in the very first eigenvectors of a Laplacian. (b) We numerically show that the high-frequency eigenvalues are less informative for the supervised community detection task and that a trained GCN is more stable with respect to them.
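The Fiedler-vector principle mentioned above can be sketched as follows: a minimal NumPy illustration (not the experimental pipeline of this paper) that bipartitions a graph by the sign of the eigenvector associated with the second-smallest Laplacian eigenvalue.

```python
import numpy as np

def fiedler_partition(A):
    """Split a graph into two clusters using the sign of the Fiedler
    vector, i.e., the eigenvector of the second-smallest eigenvalue
    of the combinatorial Laplacian L = D - A."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                    # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvalue
    return fiedler >= 0                   # cluster = sign of the feature
```

On a graph made of two dense communities linked by a few edges, the sign pattern of the Fiedler vector recovers the two communities, which is exactly the low-frequency information our experiments isolate.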
(c) We observe that a simple MLP fed with handcrafted features successfully deals with transductive datasets like Cora, Citeseer, or Pubmed: to our knowledge, these are the first competitive results obtained with an MLP on those datasets. The paper is organized as follows: first, we discuss the related work in Sec. 2. We introduce our notations as well as our working hypotheses in Sec. 3. Then, we study low-rank approximations of the graph Laplacian in Sec. 4.1. Finally, the end of Sec. 4 proposes several experiments to study the impact of high frequencies on GCNs. A basic code is provided in the supplementary materials, and our code will be released in an online public repository at the time of publication.
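Contribution (c) can be illustrated with one possible featurization: represent each node by its coordinates on the first k eigenvectors of the normalized Laplacian, and feed the resulting matrix to a standard MLP. This is a hypothetical sketch assuming a dense adjacency matrix, not the exact pipeline of our experiments.

```python
import numpy as np

def low_frequency_features(A, k):
    """Handcrafted node features: coordinates of each node on the k
    eigenvectors of the normalized Laplacian with smallest eigenvalues.
    Returns a (num_nodes, k) matrix usable as MLP input."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    _, eigvecs = np.linalg.eigh(L)        # ascending eigenvalues
    return eigvecs[:, :k]                 # keep the low-frequency band
```

No cascade of filtering along the graph structure is involved: the graph enters only through this fixed spectral embedding, computed once before training.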

2. RELATED WORK

GCNs and Spectral GCNs Introduced in Kipf & Welling (2016), GCNs make it possible to handle large graph structures in semi-supervised classification contexts. This type of model works at the node level, meaning that it uses the adjacency matrix locally. This approach has inspired a wide range of models, such as linear GCNs (Wu et al., 2019), Graph Attention Networks (Veličković et al., 2017), GraphSAGE (Hamilton et al., 2017), etc. In general, this line of work does not consider the graph Laplacian directly. Another line of work corresponds to spectral methods, which employ filters designed from the spectrum of a graph Laplacian (Bruna et al., 2013). In general, those works make use of polynomials in the Laplacian (Defferrard et al., 2016), which are very similar to an anisotropic diffusion (Klicpera et al., 2019). All these references share the idea of manipulating bandpass filters that discriminate between different ranges of frequencies.

Over-smoothing in GCNs In the context of GCNs, Li et al. (2018) is one of the first papers to notice that cascading low-pass filters can lead to a substantial information loss. The results of our work indicate that the important spectral components for detecting communities already lie in the low-frequency domain and that this is not due to an architectural bias. Zhao & Akoglu (2019) and Yang et al. (2020) propose to introduce regularizations which address this loss of information. Cai & Wang (2020) and Oono & Suzuki (2019) study the spectrum of a graph Laplacian under various transforms, yet they consider the spectrum globally and in asymptotic settings, with a deep cascade of layers. Huang et al. (2020) and Rong et al. (2019b) introduce data augmentations aimed at alleviating over-smoothing in deep networks: we study GCNs without this ad-hoc procedure.
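The polynomial-in-the-Laplacian filters discussed above can be sketched as follows. This is a minimal dense sketch in the plain monomial basis; ChebNet (Defferrard et al., 2016) instead uses the Chebyshev recursion T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x) on a rescaled Laplacian for numerical stability, and the function name `polynomial_filter` is ours.

```python
import numpy as np

def polynomial_filter(L, X, coeffs):
    """Apply a spectral filter p(L) X, where p is the polynomial with
    the given coefficients (constant term first). Acting on eigenvalue
    lambda of L, the filter response is p(lambda), so the coefficients
    shape which frequency bands of the graph signal X are kept."""
    out = np.zeros_like(X)
    P = np.eye(L.shape[0])        # current power of L, starting at L^0
    for c in coeffs:
        out = out + c * (P @ X)
        P = P @ L
    return out
```

For instance, coeffs = [1.0, -0.5] implements p(L) = I - 0.5 L, a simple low-pass filter that attenuates high frequencies.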

Spectral clustering and low rank approximation

As the literature on spectral clustering is vast, we mainly focus on the subset that connects directly with GCNs. Mehta et al. (2019) proposes to learn an unsupervised auto-encoder in the framework of a Stochastic Block Model. Oono & Suzuki (2019) introduces the Erdős-Rényi model in the GCN analysis, but only in an asymptotic setting. Loukas & Vandergheynst (2018) studies how the graph topology is preserved under graph coarsening, which could be a potential direction for future work.

Node embedding An MLP approach can be understood as an embedding at the node level. For instance, Aubry et al. (2011) applies a spectral embedding combined with a diffusion process for shape analysis, which allows point-wise comparisons. We should also mention Deutsch & Soatto (2020), which uses a node embedding based on the spectrum of a substantially modified graph Laplacian, obtained from a measure of node centrality.

Graph Scattering Networks (GSN) This class of models explicitly employs band-pass filters based on the spectrum of a graph Laplacian, and it is thus necessary to review it. Gao et al. (2019); Gama et al.

