POLYNOMIAL GRAPH CONVOLUTIONAL NETWORKS

Abstract

Graph Convolutional Neural Networks (GCNs) exploit convolution operators, based on some neighborhood aggregation scheme, to compute representations of graphs. The most common convolution operators exploit only local topological information. To obtain wider topological receptive fields, the mainstream approach is to non-linearly stack multiple Graph Convolutional (GC) layers. In this way, however, interactions among GC parameters at different levels bias the flow of topological information. In this paper, we propose a different strategy: a single graph convolution layer that independently exploits neighbouring nodes at different topological distances, generating decoupled representations for each of them. These representations are then processed by subsequent readout layers. We implement this strategy by introducing the Polynomial Graph Convolution (PGC) layer, which we prove to be more expressive than the most common convolution operators and their linear stacking. Our contribution is not limited to the definition of a convolution operator with a larger receptive field: we prove, both theoretically and experimentally, that the common way of stacking multiple non-linear graph convolutions limits the expressiveness of the network. Specifically, we show that a Graph Neural Network architecture with a single PGC layer achieves state-of-the-art performance on many commonly adopted graph classification benchmarks.

1. INTRODUCTION

In the last few years, the definition of machine learning methods, particularly neural networks, for graph-structured input has been gaining increasing attention in the literature (Defferrard et al., 2016; Errica et al., 2020). In particular, Graph Convolutional Networks (GCNs), based on the definition of a convolution operator in the graph domain, are relatively fast to compute and have shown good predictive performance. Graph Convolutions (GC) are generally based on a neighborhood aggregation scheme (Gilmer et al., 2017) that considers, for each node, only its direct neighbors. By stacking multiple GC layers, the size of the receptive field of deeper filters increases (resembling standard convolutional networks). However, stacking too many GC layers may be detrimental to the network's ability to represent meaningful topological information (Li et al., 2018) due to excessive Laplacian smoothing. Moreover, interactions among GC parameters at different layers bias the flow of topological information. For these reasons, several convolution operators have been defined in the literature, differing from one another in the aggregation scheme they adopt. We argue that the performance of GC networks could benefit from larger receptive fields; however, since with existing GC architectures this effect can only be obtained by stacking more GC layers, the increased difficulty in training and the limited expressiveness of stacked local layers end up hurting their predictive capabilities. Consequently, the performance of existing GCNs depends strongly on the specific architecture. In summary, existing graph neural networks are limited by (i) the need to select an appropriate convolution operator, and (ii) the loss of expressiveness caused by large receptive fields being achievable only through the stacking of many local layers. In this paper, we tackle both issues by following a different strategy.
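The local aggregation scheme and the receptive-field growth described above can be sketched as follows. This is a minimal illustration, not a specific operator from the literature: the row-normalization of the self-looped adjacency matrix and the ReLU non-linearity are illustrative assumptions, and `gc_layer` / `stacked_gcn` are hypothetical names.

```python
import numpy as np

def gc_layer(A_hat, X, W):
    """One local graph convolution: aggregate the features of each node's
    direct neighbours via A_hat (assumed to be the adjacency matrix with
    self-loops, row-normalized), then apply a learned linear map and a
    non-linearity."""
    return np.maximum(A_hat @ X @ W, 0.0)  # ReLU

def stacked_gcn(A_hat, X, weights):
    """Stacking k GC layers widens the receptive field to k hops, but the
    weight matrices of different layers interact only through the
    intermediate non-linearities, which constrains the flow of
    topological information."""
    H = X
    for W in weights:
        H = gc_layer(A_hat, H, W)
    return H
```

Note that after stacking two such layers, a node's representation depends only on nodes within two hops: information from more distant nodes cannot reach it, regardless of the learned weights.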
We propose the Polynomial Graph Convolution (PGC) layer, which independently considers neighbouring nodes at different topological distances (i.e. arbitrarily large receptive fields). The PGC layer addresses the first issue since it is able to represent many existing convolutions in the literature, and is more expressive than most of them. As for the second issue, a PGC layer, by directly considering larger receptive fields, can represent a richer set of functions than the linear stacking of two or more Graph Convolution layers, i.e. it is more expressive. Moreover, the linear PGC design allows
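A minimal sketch of the idea behind the PGC layer is given below. This is our reading of the description above, not the paper's exact formulation: we assume each power A^k of the adjacency matrix (reaching neighbours at distance k) is transformed by its own independent weight matrix, and that the resulting per-distance representations are concatenated so they stay decoupled until the readout layers. The function name `pgc_layer` and the concatenation choice are assumptions.

```python
import numpy as np

def pgc_layer(A, X, weights):
    """Sketch of a polynomial graph convolution: for each power k,
    A^k @ X aggregates features of nodes reachable in k hops, and each
    power gets its own weight matrix W_k. The per-distance
    representations are concatenated rather than mixed, so each
    topological distance contributes an independent block of features."""
    reps = []
    Ak = np.eye(A.shape[0])       # A^0 = I: the node's own features
    for Wk in weights:            # one independent weight matrix per power k
        reps.append(Ak @ X @ Wk)  # decoupled representation at distance k
        Ak = Ak @ A               # move to the next power of A
    return np.concatenate(reps, axis=1)
```

Because the layer is linear in each block, the contribution of neighbours at distance k is not entangled with the parameters governing other distances, unlike in a non-linear stack of local convolutions.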

