ANALYSING THE UPDATE STEP IN GRAPH NEURAL NETWORKS VIA SPARSIFICATION

Abstract

In recent years, Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, have celebrated much success in the analysis of graph-structured data. In MPNNs the computations are split into three steps: Aggregation, Update and Readout. In this paper we propose a series of models that successively sparsify the linear transform in the Update step. Specifically, we propose the ExpanderGNN model, with a tuneable sparsification rate, and the Activation-Only GNN, which has no linear transform in the Update step. In agreement with a growing trend in the relevant literature, the sparsification paradigm is changed by initialising sparse neural network architectures rather than expensively sparsifying already trained architectures. These novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution (SGC). The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several downstream graph prediction tasks, while containing exponentially fewer trainable parameters. In experiments with matching parameter numbers, our benchmark models outperform state-of-the-art GNN models. These observations lead us to conclude that in practice the Update step often makes no positive contribution to model performance.

1. INTRODUCTION

Recent years have witnessed the blossoming of Graph Neural Networks (GNNs). They have become the standard tool for analysing and learning from graph-structured data (Wu et al., 2020) and have demonstrated convincing performance in various application areas, including chemistry (Duvenaud et al., 2015), social networks (Monti et al., 2019), natural language processing (Yao et al., 2019) and neuroscience (Griffa et al., 2017). Among the various GNN models, Message-Passing Neural Networks (MPNNs, Gilmer et al. (2017)) and their variants are considered the dominant class. In MPNNs, the learning procedure can be separated into three major steps: Aggregation, Update and Readout, where Aggregation and Update are repeated iteratively so that each node's representation is updated recursively based on the transformed information aggregated over its neighbourhood. With each iteration, the receptive field of the hidden representation grows by one step on the graph structure, such that at the k-th iteration the hidden state of node i is composed of information from its k-hop neighbourhood. There is thus a division of labour between the Aggregation and the Update step: the Aggregation utilises the local graph structure, while the Update step is applied to each node representation individually, independent of the local graph structure. From this, a natural question arises: what is the impact of the graph-agnostic Update step on the performance of GNNs? Wu et al. (2019) first challenged the role of the Update step by proposing the Simple Graph Convolution (SGC), in which the non-linearities in the Update steps are removed and the consecutive linear transforms are collapsed into a single transform. Their experiments demonstrated, surprisingly, that in some instances the Update step of the Graph Convolutional Network (GCN, Kipf & Welling (2017)) can be left out completely without the model's accuracy decreasing.
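The SGC simplification described above can be illustrated with a small numerical sketch (not the authors' code; the toy graph, features and weight matrices below are arbitrary assumptions for illustration). It contrasts a two-layer GCN-style forward pass with the collapsed SGC form S^2 X (W1 W2), and checks that the two coincide once the non-linearity is removed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 4 nodes, symmetric adjacency (hypothetical example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                            # add self-loops
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
S = d_inv_sqrt @ A_hat @ d_inv_sqrt              # normalised propagation matrix

X = rng.normal(size=(4, 3))                      # node features
W1 = rng.normal(size=(3, 3))                     # Update-step weights, layer 1
W2 = rng.normal(size=(3, 3))                     # Update-step weights, layer 2

relu = lambda Z: np.maximum(Z, 0.0)

# Two-layer GCN: each layer aggregates (S @ H) then updates (H @ W, ReLU).
H_gcn = relu(S @ relu(S @ X @ W1) @ W2)

# SGC: drop the non-linearities, so the two layers collapse into a single
# linear transform applied after two propagation steps: S^2 X (W1 W2).
H_sgc = S @ S @ X @ (W1 @ W2)

# Without ReLU, the layer-by-layer computation equals the collapsed form.
H_linear = S @ (S @ X @ W1) @ W2
assert np.allclose(H_linear, H_sgc)
```

The check rests only on the associativity of matrix multiplication: once the non-linear Update is gone, all the per-layer weight matrices merge, which is exactly why SGC has a single trainable transform.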

