SOGCN: SECOND-ORDER GRAPH CONVOLUTIONAL NETWORKS

Abstract

We introduce the second-order graph convolution (SoGC), a maximally localized kernel that can express a polynomial spectral filter with arbitrary coefficients. We contrast SoGC with vanilla GCN, first-order (one-hop) aggregation, and higher-order (multi-hop) aggregation by analyzing graph convolutional layers via a generalized filter space. We argue that SoGC is a simple design capable of forming the basic building block of graph convolution, playing the same role as 3 × 3 kernels in CNNs. We build purely topological Second-Order Graph Convolutional Networks (SoGCN) and demonstrate that SoGCN consistently achieves state-of-the-art performance on the latest benchmarks. Moreover, we introduce the Gated Recurrent Unit (GRU) to spectral GCNs; this exploratory attempt further improves our experimental results.

1. INTRODUCTION

Deep localized convolutional filters have achieved great success in the field of deep learning. In image recognition, the effectiveness of 3 × 3 kernels as the basic building block in Convolutional Neural Networks (CNNs) has been shown both experimentally and theoretically (Zhou, 2020). We are thus inspired to search for the maximally localized Graph Convolution (GC) kernel with full expressive power for Graph Convolutional Networks (GCNs). Most existing GCN methods use localized GCs based on a one-hop aggregation scheme as the basic building block. Extensive works have shown performance limitations of such designs due to over-smoothing (Li et al., 2018; Oono & Suzuki, 2019; Cai & Wang, 2020). In vanilla GCNs (Kipf & Welling, 2017), the root cause of this deficiency is the lumping of each graph node's self-connection with its pairwise neighboring connections. Recent works of Xu et al. (2019); Dehmamy et al. (2019); Ming Chen et al. (2020) disentangle the effect of self-connection by adding an identity mapping (the so-called first-order GC). However, the lack of expressive power in filter representation remains (Abu-El-Haija et al., 2019). The work of Ming Chen et al. (2020) conjectured that the ability to express a polynomial filter with arbitrary coefficients is essential for preventing over-smoothing. A longer propagation distance in the graph helps GCNs retain their expressive power, as pointed out by Liao et al. (2019); Luan et al. (2019); Abu-El-Haija et al. (2019). The minimum propagation distance needed to construct the basic building block of GCNs remains an open question. We show that the minimum propagation distance is two: a two-hop graph kernel with a second-order polynomial in the adjacency matrix is sufficient. We call our graph kernel Second-Order GC (SoGC). We introduce a Layer Spanning Space (LSS) framework to quantify the expressive power of multilayer GCs for modeling a polynomial filter with arbitrary coefficients.
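To make the two-hop kernel concrete, the following is a minimal numpy sketch of a single-channel SoGC, f(x) = θ₀x + θ₁Âx + θ₂Â²x, applied to a small path graph. The graph, the coefficient values, and the `normalized_adjacency` helper are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * np.outer(d_inv_sqrt, d_inv_sqrt)

def sogc(A_hat, x, theta):
    """Second-order GC: theta[0]*x + theta[1]*A_hat@x + theta[2]*A_hat^2@x."""
    Ax = A_hat @ x
    return theta[0] * x + theta[1] * Ax + theta[2] * (A_hat @ Ax)

# Path graph 0 - 1 - 2 - 3. A unit impulse at node 0 propagates at most
# two hops, so node 3 receives nothing from a single SoGC layer.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = normalized_adjacency(A)
x = np.array([1.0, 0.0, 0.0, 0.0])
y = sogc(A_hat, x, theta=[0.5, 1.0, -0.25])
```

The example illustrates the "maximally localized" claim: one SoGC layer touches exactly the zero-, one-, and two-hop neighborhoods and nothing beyond.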
By relating low-pass filtering on the graph spectrum (Hoang & Maehara, 2019) to over-smoothing, one can see that the lack of filter representation power (Ming Chen et al., 2020) can lead to the performance limitations of GCNs. Using the LSS framework, we show that SoGCs can approximate any linear GCN in channel-wise filtering. Furthermore, higher-order GCs do not contribute more expressiveness, while vanilla GCs or first-order GCs cannot represent all polynomial filters in general. In this sense, SoGC is the maximally localized graph kernel with full representation power. To the best of our knowledge, this work is the first study that identifies the importance of the two-hop neighborhood for GCNs' ability to express a polynomial filter with arbitrary coefficients. Our model is a special but non-trivial case of Defferrard et al. (2016). Kipf & Welling (2017) conducted an ablation study with GC kernels of different orders but missed the effectiveness of second-order relationships. The work of Abu-El-Haija et al. (2019) discussed multi-hop graph kernels; however, they did not identify the critical importance of the second-order form. In contrast, we clarify the prominence of SoGCs in theory and experiments. Our research on graph convolution using purely topological relationships is orthogonal to methods that use geometric relations (Monti et al., 2017; Fey et al., 2018; Pei et al., 2020), expressive edge features (Li et al., 2016; Gilmer et al., 2017; Corso et al., 2020), or hyper-edges (Morris et al., 2019; Maron et al., 2018; 2019). It is also independent of graph sampling procedures (Rong et al., 2019; Hamilton et al., 2017; Li et al., 2019).

2. PRELIMINARIES

We begin by reformulating spectral GCNs and introducing our notation. We are interested in a finite graph set $\mathcal{G} = \{G_1, \dots, G_{|\mathcal{G}|}\}$. Assume each graph $G \in \mathcal{G}$ is simple and undirected, associated with a finite vertex set $\mathcal{V}(G)$, an edge set $\mathcal{E}(G) = \{(u, v) : u \leftrightarrow v\}$, and a symmetric normalized adjacency matrix $A(G)$ (Chung & Graham, 1997; Shi & Malik, 2000). Without loss of generality and for simplicity, $|\mathcal{V}(G)| = N$ for every $G \in \mathcal{G}$. A single-channel feature $x \in \mathbb{R}^N$ supported on a graph $G \in \mathcal{G}$ is a vectorization of a function $\mathcal{V}(G) \to \mathbb{R}$. Graph Convolutions (GCs) are known as Linear Shift-Invariant (LSI) operators with respect to adjacency matrices (Sandryhaila & Moura, 2013). By this definition, GCs can extract features regardless of where local structures fall. Given a parameter space $\Omega \subseteq \mathbb{R}$, we write a single-channel GC (Sandryhaila & Moura, 2013; Defferrard et al., 2016) as a mapping $f_\theta : \mathcal{G} \times \mathbb{R}^N \to \mathbb{R}^N$ such that¹

$$f_\theta(G, x) = \sum_{k=0}^{K} \theta_k A(G)^k x, \qquad (1)$$

where $\theta = [\theta_0 \cdots \theta_K]^T \in \Omega^{K+1}$ parameterizes the GC. $K$ reflects the localization of $f_\theta$: a linear combination of features aggregated by $A(G)^k$ for $k \le K$. Moreover, we reformulate two popular models, vanilla GC (Figure 1a) and first-order GC (Figure 1b), as

$$f_0(G, x) = \theta \, (A(G) + I)\, x, \qquad f_1(G, x) = (\theta_1 A(G) + \theta_0 I)\, x.$$

General spectral GCNs stack $L$ layers of GCs (Equation 1) with nonlinear activations. Let $f^{(l)}$ be GC layers with parameters $\theta^{(l)} \in \Omega^{K+1}$, $l \in [L]$; the single-channel GCN can be written as

$$F(G, x) = g \circ f^{(L)} \circ \sigma \circ f^{(L-1)} \circ \cdots \circ \sigma \circ f^{(1)}(G, x),$$
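Equation 1 can be evaluated without forming the matrix powers $A(G)^k$ explicitly, via a Horner-style recursion. The sketch below (function names are ours) also checks that vanilla GC and first-order GC are the special cases $\theta = [\theta, \theta]$ and $\theta = [\theta_0, \theta_1]$ of the general polynomial form; the random symmetric matrix merely stands in for a normalized adjacency.

```python
import numpy as np

def poly_gc(A_hat, x, theta):
    """f_theta(G, x) = sum_{k=0}^{K} theta[k] * A_hat^k @ x, via Horner's rule."""
    y = theta[-1] * x
    for t in reversed(theta[:-1]):
        y = A_hat @ y + t * x
    return y

# Random symmetric stand-in for a normalized adjacency, plus a feature vector.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A_hat = (M + M.T) / 2
x = rng.normal(size=5)

# Vanilla GC:      f_0(x) = theta * (A + I) x        <=>  theta = [t, t]
t = 0.7
vanilla = t * (A_hat + np.eye(5)) @ x
# First-order GC:  f_1(x) = (theta1*A + theta0*I) x  <=>  theta = [theta0, theta1]
first_order = (0.3 * A_hat + 0.9 * np.eye(5)) @ x
```

The Horner evaluation uses only $K$ matrix–vector products, which is the standard way such polynomial filters are applied in practice.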



To validate our theory, we build Second-Order Graph Convolutional Networks (SoGCN) using SoGC kernels. Using only simple graph topological features, our SoGCN consistently achieves state-of-the-art performance on the latest benchmarks.

¹We can replace the Laplacian matrix $L$ in Defferrard et al. (2016) with the normalized adjacency matrix $A$ since $L = I - A$.
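The footnote's substitution works because $L = I - A$ implies any polynomial in the normalized Laplacian is a polynomial of the same degree in the normalized adjacency. A quick numerical check on a small graph of our own choosing (degree-2 case):

```python
import numpy as np

# Triangle graph {0, 1, 2} with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = d_inv_sqrt @ A @ d_inv_sqrt      # symmetric normalized adjacency
L = np.eye(4) - A_hat                    # symmetric normalized Laplacian

# A degree-2 filter in L expands to a degree-2 filter in A_hat:
# L^2 = (I - A_hat)^2 = I - 2*A_hat + A_hat^2.
lhs = L @ L
rhs = np.eye(4) - 2 * A_hat + A_hat @ A_hat
```

The same expansion applies term by term to any ChebNet-style polynomial in $L$, which is why the two parameterizations span the same filter space.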



Figure 1: Vertex-domain interpretations of vanilla GC, first-order GC, and SoGC. Denote I as the zero-hop aggregator, A the one-hop aggregator, and A² the two-hop aggregator. Nodes within the same colored ring share the same weights. (a) In vanilla GC, I and A share the same weights. (b) First-order GC disentangles I from A. (c) SoGC additionally introduces new weights for A².
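The vertex-domain reading of the aggregators in Figure 1 can be checked directly: the off-diagonal sparsity pattern of A² marks exactly the node pairs joined by a walk of length two, while its diagonal counts back-and-forth walks, i.e. node degrees. A small illustrative check on an unnormalized path graph:

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3 (unnormalized adjacency for readability).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# (A @ A)[u, v] counts the walks of length two from u to v.
A2 = A @ A
```

So the second-hop aggregator in Figure 1c mixes in precisely the two-hop ring that neither I nor A can reach.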

