POLYNOMIAL GRAPH CONVOLUTIONAL NETWORKS

Abstract

Graph Convolutional Neural Networks (GCNs) exploit convolution operators, based on some neighborhood aggregation scheme, to compute representations of graphs. The most common convolution operators only exploit local topological information. To consider wider topological receptive fields, the mainstream approach is to non-linearly stack multiple Graph Convolutional (GC) layers. In this way, however, interactions among GC parameters at different levels bias the flow of topological information. In this paper, we propose a different strategy: a single graph convolution layer that independently exploits neighbouring nodes at different topological distances, generating decoupled representations for each of them. These representations are then processed by subsequent readout layers. We implement this strategy by introducing the Polynomial Graph Convolution (PGC) layer, which we prove to be more expressive than the most common convolution operators and their linear stacking. Our contribution is not limited to the definition of a convolution operator with a larger receptive field: we prove both theoretically and experimentally that the common way multiple non-linear graph convolutions are stacked limits the expressiveness of the network. Specifically, we show that a Graph Neural Network architecture with a single PGC layer achieves state-of-the-art performance on many commonly adopted graph classification benchmarks.

1. INTRODUCTION

In the last few years, the definition of machine learning methods, particularly neural networks, for graph-structured input has been gaining increasing attention in the literature (Defferrard et al., 2016; Errica et al., 2020). In particular, Graph Convolutional Networks (GCNs), based on the definition of a convolution operator in the graph domain, are relatively fast to compute and have shown good predictive performance. Graph Convolutions (GCs) are generally based on a neighborhood aggregation scheme (Gilmer et al., 2017) that considers, for each node, only its direct neighbors. By stacking multiple GC layers, the size of the receptive field of deeper filters increases (resembling standard convolutional networks). However, stacking too many GC layers may be detrimental to the network's ability to represent meaningful topological information (Li et al., 2018) because of excessive Laplacian smoothing. Moreover, in this way interactions among GC parameters at different layers bias the flow of topological information. For these reasons, several convolution operators have been defined in the literature, differing from one another in the considered aggregation scheme. We argue that the performance of GC networks could benefit from larger receptive fields, but since with existing GC architectures this effect can only be obtained by stacking more GC layers, the increased difficulty of training and the limited expressiveness caused by stacking many local layers end up hurting their predictive capabilities. Consequently, the performance of existing GCNs is strongly dependent on the specific architecture. In summary, existing graph neural networks are limited by (i) the necessity to select an appropriate convolution operator, and (ii) the loss of expressiveness caused by large receptive fields being attainable only by stacking many local layers. In this paper, we tackle both issues with a different strategy.
We propose the Polynomial Graph Convolution (PGC) layer, which independently considers neighbouring nodes at different topological distances (i.e., arbitrarily large receptive fields). The PGC layer addresses the first issue by being able to represent many existing convolutions from the literature, while being more expressive than most of them. As for the second issue, a PGC layer, by directly considering larger receptive fields, can represent a richer set of functions than the linear stacking of two or more graph convolution layers, i.e., it is more expressive. Moreover, the linear design of PGC makes it possible to consider large receptive fields without incurring the typical issues related to training deep networks. We developed the Polynomial Graph Convolutional Network (PGCN), an architecture that exploits the PGC layer to perform graph classification tasks. We empirically evaluate the proposed PGCN on eight commonly adopted graph classification benchmarks, comparing it to several state-of-the-art GCNs and consistently achieving higher or comparable predictive performance. Differently from other works in the literature, the contribution of this paper is to show that the common approach of stacking multiple GC layers may not optimally exploit topological information, because it strongly couples the depth of the network with the size of the topological receptive fields. In our proposal, the depth of the PGCN is decoupled from the receptive field size, allowing deep GNNs to be built while avoiding the oversmoothing problem.

2. NOTATION

We use italic letters to refer to variables, bold lowercase letters to refer to vectors, and bold uppercase letters to refer to matrices. The elements of a matrix $A$ are referred to as $a_{ij}$ (and similarly for vectors). We use uppercase letters to refer to sets or tuples. Let $G = (V, E, X)$ be a graph, where $V = \{v_0, \ldots, v_{n-1}\}$ denotes the set of vertices (or nodes) of the graph, $E \subseteq V \times V$ is the set of edges, and $X \in \mathbb{R}^{n \times s}$ is a multivariate signal on the graph nodes, with the $i$-th row representing the attributes of $v_i$. We define $A \in \mathbb{R}^{n \times n}$ as the adjacency matrix of the graph, with elements $a_{ij} = 1 \iff (v_i, v_j) \in E$. With $N(v)$ we denote the set of nodes adjacent to node $v$. Let also $D \in \mathbb{R}^{n \times n}$ be the diagonal degree matrix, with $d_{ii} = \sum_j a_{ij}$, and $L$ the normalized graph Laplacian, defined as $L = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $I$ is the identity matrix. With $\mathrm{GConv}_{\theta}(x_v, G)$ we denote a graph convolution with parameter set $\theta$. A GCN with $k$ levels of convolutions is denoted as $\mathrm{GConv}_{\theta_k}(\ldots \mathrm{GConv}_{\theta_1}(x_v, G) \ldots, G)$. For a discussion of the most common GCNs, we refer to Appendix A. We indicate with $\bar{X}$ the input representation fed to a layer, where $\bar{X} = X$ if we are considering the first layer of the graph convolutional network, and $\bar{X} = H^{(i-1)}$ if considering the $i$-th graph convolution layer.
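The notation above can be made concrete with a small NumPy sketch (illustrative only, not from the paper's code), building the degree matrix and the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ for a toy 3-node path graph:

```python
import numpy as np

# Adjacency matrix of a 3-node path graph: v0 - v1 - v2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

# Diagonal degree matrix: d_ii = sum_j a_ij
D = np.diag(A.sum(axis=1))

# Normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
```

Note that `L` is symmetric with unit diagonal, and its off-diagonal entries are the degree-normalized edge weights with a negative sign (here $-1/\sqrt{2}$ for each edge).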

3. POLYNOMIAL GRAPH CONVOLUTION (PGC)

In this section, we introduce the Polynomial Graph Convolution (PGC), which simultaneously and directly considers all topological receptive fields up to $k$ hops, just like the ones obtained by a stack of $k$ graph convolutional layers. PGC, however, does not incur the typical limitation related to the complex interaction among the parameters of stacked GC layers. In fact, we show that PGC is more expressive than the most common convolution operators. Moreover, we prove that a single PGC convolution of order $k$ is capable of implementing $k$ linearly stacked layers of convolutions proposed in the literature, while also providing additional functions that cannot be realized by the stack. Thus, the PGC layer extracts topological information from the input graph, effectively decoupling the depth of the network from the size of the receptive field. Its combination with deep MLPs makes it possible to obtain deep graph neural networks that overcome the common oversmoothing problem of current architectures. The basic idea underpinning the definition of PGC is to consider the case in which the graph convolution can be expressed as a polynomial of the powers of a transformation $T$ of the adjacency matrix. This definition is very general, and thus it incorporates many existing graph convolutions as special cases. Given a graph $G = (V, E, X)$ with adjacency matrix $A$, the Polynomial Graph Convolution (PGC) layer of degree $k$, transformation $T$ of $A$, and size $m$ is defined as $\mathrm{PGConv}_{k,T,m}(X, A) = R_{k,T} W$, where $T(A) \in \mathbb{R}^{n \times n}$, $R_{k,T} = [X, T(A)X, T(A)^2 X, \ldots, T(A)^k X] \in \mathbb{R}^{n \times s(k+1)}$, and $W \in \mathbb{R}^{s(k+1) \times m}$ is a learnable weight matrix. For the sake of presentation, we will consider $W$ as composed of blocks $W = [W_0, \ldots, W_k]$, with $W_j \in \mathbb{R}^{s \times m}$. In the following, we show that PGC is very expressive, being able to implement commonly used convolutions as special cases.
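A minimal NumPy sketch of the PGC forward pass may help fix ideas. The function name `pgc_forward` and the random weights are illustrative assumptions, not the paper's implementation; `T` defaults to the identity transformation of the adjacency matrix:

```python
import numpy as np

def pgc_forward(X, A, W, k, T=lambda A: A):
    """PGConv_{k,T,m}(X, A) = R_{k,T} W, with
    R_{k,T} = [X, T(A)X, T(A)^2 X, ..., T(A)^k X]."""
    TA = T(A)
    blocks, cur = [X], X
    for _ in range(k):
        cur = TA @ cur                  # T(A)^j X, computed iteratively
        blocks.append(cur)
    R = np.concatenate(blocks, axis=1)  # shape (n, s*(k+1))
    return R @ W                        # shape (n, m)

# Toy usage: n=4 nodes, s=2 input features, degree k=2, output size m=3.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((4, 2))
W = rng.standard_normal((2 * 3, 3))     # s*(k+1) x m = 6 x 3
H = pgc_forward(X, A, W, k=2)           # H has shape (4, 3)
```

Note how the block structure of $W$ shows up directly: choosing $W_0 = I$ and $W_1 = \cdots = W_k = 0$ makes the layer return $X$ unchanged, since each $W_j$ weighs the $j$-hop block $T(A)^j X$ independently.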

