GENERALIZING GRAPH CONVOLUTIONAL NETWORKS VIA HEAT KERNEL

Abstract

Graph convolutional networks (GCNs) have emerged as a powerful framework for mining and learning with graphs. A recent study shows that GCNs can be simplified into a linear model by removing the nonlinearities and weight matrices between all consecutive layers, resulting in the simple graph convolution (SGC) model. In this paper, we aim to understand GCNs and generalize SGC as a linear model via the heat kernel (HKGCN), which acts as a low-pass filter on graphs and enables the aggregation of information from extremely large receptive fields. We theoretically show that HKGCN is in nature a continuous propagation model and that GCNs without nonlinearities (i.e., SGC) are discrete versions of it. Its low-pass filtering and continuity properties facilitate the fast and smooth convergence of feature propagation. Experiments on million-scale networks show that the linear HKGCN model not only achieves consistently better results than SGC but can also match or even beat advanced GCN models, while maintaining SGC's superiority in efficiency.

1. INTRODUCTION

Graph neural networks (GNNs) have emerged as a powerful framework for modeling structured and relational data (Gori et al., 2005; Scarselli et al., 2008; Gilmer et al., 2017; Kipf & Welling, 2017). A wide range of graph mining tasks and applications have benefited from their recent emergence, such as node classification (Kipf & Welling, 2017; Veličković et al., 2018), link inference (Zhang & Chen, 2018; Ying et al., 2018), and graph classification (Xu et al., 2019b). The core procedure of GNNs is the (discrete) feature propagation operation, which propagates information between nodes layer by layer based on rules derived from the graph structure. Take the graph convolutional network (GCN) (Kipf & Welling, 2017) for example: its propagation is performed through the normalized Laplacian of the input graph. Such a procedure usually involves 1) a non-linear feature transformation, commonly implemented with an activation function such as ReLU, and 2) discrete propagation layer by layer. Over the course of its development, various efforts have been devoted to advancing this propagation-based architecture, such as incorporating self-attention in GAT (Veličković et al., 2018), mixing high-order neighborhoods in MixHop (Abu-El-Haija et al., 2019), and leveraging graphical models in GMNN (Qu et al., 2019). Recently, Wu et al. (2019) observed that the non-linear part of GCNs' feature propagation actually brings excess complexity and redundant operations. To that end, they simplify GCNs into a linear model, SGC, by removing all non-linearities between consecutive GCN layers. Surprisingly, SGC offers performance comparable or even superior to advanced GCN models, based on which they argue that the repeated graph propagation, rather than the non-linear feature transformation, may contribute the most to the expressive power of GCNs.
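The SGC simplification described above can be sketched in a few lines of NumPy. The sketch below (the function name `sgc_features` and the dense-matrix formulation are illustrative choices, not the authors' code) applies the symmetrically normalized adjacency with self-loops K times to the raw features, with no nonlinearity in between; the result would then be fed to a single linear classifier.

```python
import numpy as np

def sgc_features(A, X, K):
    """Minimal sketch of SGC's linear propagation (Wu et al., 2019):
    repeatedly apply S = D^{-1/2} (A + I) D^{-1/2} to the features,
    with all intermediate nonlinearities removed."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                    # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^{-1/2}
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalized adjacency
    for _ in range(K):                       # K discrete propagation steps
        X = S @ X
    return X                                 # input to a linear classifier
```

Because every step is linear, S^K X can be precomputed once, which is the source of SGC's efficiency advantage over multi-layer GCNs.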
Though it produces interesting results, SGC still inherits the discrete nature of GCNs' propagation, which can lead to strong oscillations during the procedure. Take, for example, a simple graph of two nodes v1 and v2 with one-dimensional input features x1 = 1 and x2 = 2 and one weighted edge between them; the feature updates of x1 and x2 during GCN propagation are shown in Figure 1(a), from which we can clearly observe that x1 and x2 oscillate step by step. This indicates that although features from multiple hops away appear to be taken into account during GCN propagation, the model is still far from learning meaningful patterns from them.
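The oscillation on the two-node graph can be reproduced numerically. The snippet below is an illustrative reconstruction, assuming propagation with the symmetrically normalized adjacency without self-loops (the exact setup of Figure 1(a) is not fully specified in the text); under that assumption the two features simply swap at every step.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # two nodes v1, v2 with one edge between them
X = np.array([1.0, 2.0])     # input features x1 = 1, x2 = 2

d = A.sum(axis=1)
S = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)  # D^{-1/2} A D^{-1/2}

for step in range(4):        # discrete propagation steps
    X = S @ X                # features oscillate: [2, 1], [1, 2], [2, 1], ...
```

With normalization the edge weight cancels out, so the features never settle: each step exchanges x1 and x2, matching the step-by-step oscillation described above.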

