A SPECTRAL PERSPECTIVE ON DEEP SUPERVISED COMMUNITY DETECTION

Anonymous

Abstract

In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the numerical performance of a GCN: we empirically show that most of the information needed and used for node classification is contained in the low-frequency domain, and thus, contrary to Euclidean graphs (e.g., images), high frequencies are less crucial to community detection. In particular, it is possible to obtain accuracies at a state-of-the-art level with simple classifiers that rely only on a few low frequencies. This is surprising because, contrary to GCNs, no cascade of filtering along the graph structure is involved, and it indicates that the important spectral components for the supervised community detection task are essentially in the low-frequency domain.

1. INTRODUCTION

Graph Convolutional Networks (GCNs) are the state of the art in community detection (Kipf & Welling, 2016). They are Graph Neural Networks (GNNs) that propagate graph features through a cascade of linear operators and non-linearities, while exploiting the graph structure through a linear smoothing operator. However, the principles that allow GCNs to obtain good performance remain unclear. Li et al. (2018) suggest that GCNs are prone to over-smoothing their representations, meaning that they average too much over neighboring nodes and dilute classification information. The smoothing is generally interpreted as a low-pass filtering through the graph Laplacian, and finding a way to exploit the high frequencies of the graph Laplacian is an active research question (Oono & Suzuki, 2019). In contrast, our work suggests that, in the setting of community detection, the high frequencies of the graph Laplacian have only a minor impact on the classification performance of a standard GCN, as opposed to standard Convolutional Neural Networks for vision, which are designed from image processing considerations.

Graph Signal Processing (GSP) is a popular field whose objective is to manipulate the spectrum of signals whose topology is given by a graph. Typically, this graph has a non-Euclidean structure; however, many central theoretical results (Hammond et al., 2011) are based on an analogy with Euclidean, regular grids. For instance, a spectral component or frequency has to be understood as an eigenvalue of the Laplacian, and this notion thus suffers from intrinsic issues such as isotropy (Oyallon, 2020). The principles of GSP are very appealing because they make it possible to apply the rich literature of harmonic analysis to graphs. Thus, this literature is at the core of many intuitions and drives many key ingredients of GCN design, which evoke standard tools of signal processing: convolutions, shift invariance, wavelets, Fourier analysis (Bronstein et al., 2017), etc.
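To make the notion of graph frequency concrete, the following is a minimal sketch of low-pass filtering a node signal by projecting it onto the eigenvectors of the graph Laplacian with the smallest eigenvalues. It is an illustration only and not the experimental setup of this paper: the choice of the symmetric normalized Laplacian, the toy graph made of two cliques joined by one edge, and the helper names `normalized_laplacian`, `low_pass` and `two_cliques` are all assumptions introduced here.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}.

    Assumes an undirected graph without isolated nodes (all degrees > 0).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    scaled = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.eye(adj.shape[0]) - scaled


def low_pass(adj, signal, k):
    """Keep only the k lowest graph frequencies of a node signal."""
    lap = normalized_laplacian(adj)
    # eigh returns eigenvalues in ascending order, so the first k
    # columns of eigvecs span the k lowest frequencies.
    _, eigvecs = np.linalg.eigh(lap)
    u_low = eigvecs[:, :k]
    # Orthogonal projection onto the low-frequency subspace.
    return u_low @ (u_low.T @ signal)


def two_cliques(n):
    """Toy graph (hypothetical): two n-cliques joined by a single edge."""
    adj = np.zeros((2 * n, 2 * n))
    adj[:n, :n] = 1.0
    adj[n:, n:] = 1.0
    np.fill_diagonal(adj, 0.0)
    adj[n - 1, n] = adj[n, n - 1] = 1.0
    return adj


adj = two_cliques(5)
labels = np.repeat([1.0, -1.0], 5)          # one community per clique
rng = np.random.default_rng(0)
noisy = labels + 0.3 * rng.standard_normal(labels.shape[0])
smooth = low_pass(adj, noisy, k=2)          # keep the two lowest frequencies
```

On this toy graph the community indicator is carried almost entirely by the two lowest frequencies, so the projection preserves it while discarding most of the added noise. Real datasets need not behave this way, which is the question the ablations in this paper address.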
Here, we observe several limits of this analogy in the context of community detection: for instance, we observe that discarding high frequencies has a minor impact on the behavior of a GCN, because the spectrum of the graphs in the datasets used is essentially located in the low-frequency domain. This type of idea is, for instance, central to spectral clustering algorithms. Spectral clustering takes a rather different point of view from deep supervised GCNs: it studies node labeling in unsupervised contexts and generally relies on generative models based on the graph spectrum. The main principle is to consider the eigenvectors corresponding to the smallest nonzero eigenvalues, referred to as Fiedler vectors (Doshi & Eun, 2020): those directions make it possible to define clusters according to the sign of a feature. Several theoretical guarantees can be obtained in the context of Stochastic Block Model approximation (Rohe et al., 2011). Our paper proposes

