MLPINIT: EMBARRASSINGLY SIMPLE GNN TRAINING ACCELERATION WITH MLP INITIALIZATION

Abstract

Training graph neural networks (GNNs) on large graphs is complex and extremely time-consuming. This is attributed to overheads caused by sparse matrix multiplication, which are sidestepped when training multi-layer perceptrons (MLPs) with only node features. MLPs, by ignoring graph context, are simple and fast for graph data; however, they usually sacrifice prediction accuracy, limiting their applications on graphs. We observe that for most message-passing-based GNNs, we can trivially derive an analog MLP (we call this a PeerMLP) with an equivalent weight space, by setting the trainable parameters to the same shapes, which makes us curious: how do GNNs perform when initialized with the weights of a fully trained PeerMLP? Surprisingly, we find that GNNs initialized with such weights significantly outperform their PeerMLPs, motivating us to use PeerMLP training as a precursor initialization step for GNN training. To this end, we propose an embarrassingly simple, yet hugely effective, initialization method for GNN training acceleration, called MLPInit. Our extensive experiments on multiple large-scale graph datasets with diverse GNN architectures validate that MLPInit can accelerate GNN training (up to 33× speedup on OGB-products) and often improve prediction performance (e.g., up to 7.97% improvement for GraphSAGE across 7 datasets on node classification, and up to 17.81% improvement across 4 datasets on link prediction in terms of Hits@10).

1. INTRODUCTION

Graph Neural Networks (GNNs) (Zhang et al., 2018; Zhou et al., 2020; Wu et al., 2020) have attracted considerable attention from both academic and industrial researchers and have shown promising results on various practical tasks, e.g., recommendation (Fan et al., 2019; Sankar et al., 2021; Ying et al., 2018; Tang et al., 2022), knowledge graph analysis (Arora, 2020; Park et al., 2019; Wang et al., 2021), forecasting (Tang et al., 2020; Zhao et al., 2021; Jiang & Luo, 2022) and chemistry analysis (Li et al., 2018b; You et al., 2018; De Cao & Kipf, 2018; Liu et al., 2022). However, training GNNs on large-scale graphs is extremely time-consuming and costly in practice, which has spurred considerable work dedicated to scaling up GNN training, even necessitating new massive graph learning libraries (Zhang et al., 2020; Ferludin et al., 2022) for large-scale graphs. Recently, several approaches for more efficient GNN training have been proposed, including novel architecture designs (Wu et al., 2019; You et al., 2020d; Li et al., 2021), data reuse and partitioning paradigms (Wan et al., 2022; Fey et al., 2021; Yu et al., 2022) and graph sparsification (Cai et al., 2020; Jin et al., 2021b). However, these methods often sacrifice prediction accuracy and increase modeling complexity, while sometimes requiring significant additional engineering effort. MLPs have been used to accelerate GNNs (Zhang et al., 2021b; Frasca et al., 2020; Hu et al., 2021) by decoupling GNN training into node feature learning and graph structure learning. Our work also leverages MLPs but adopts a distinct perspective. Notably, we observe that the weight spaces of MLPs and GNNs can be identical, which enables us to transfer weights between MLP and GNN models. Given that MLPs train faster than GNNs, this observation inspired us to raise the question:
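The weight-space equivalence underlying this question can be illustrated in a few lines of NumPy. The sketch below is a hypothetical toy example, not the paper's implementation: it contrasts a GCN-style layer, relu(A X W), with its PeerMLP counterpart, relu(X W), to show that the trainable weight W has the same shape in both models and can therefore be transferred between them.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's code): a GCN-style layer
# computes relu(A @ X @ W), while its PeerMLP layer computes relu(X @ W).
# The trainable weight W has the same shape in both, so weights obtained by
# (fast) MLP training can be copied directly into the GNN as initialization.

rng = np.random.default_rng(0)
n_nodes, in_dim, hid_dim = 5, 8, 4

X = rng.normal(size=(n_nodes, in_dim))           # node features
adj = (rng.random((n_nodes, n_nodes)) > 0.5)     # a random graph, for illustration
A = adj + np.eye(n_nodes)                        # add self-loops
A = A / A.sum(axis=1, keepdims=True)             # row-normalize the adjacency

W = rng.normal(size=(in_dim, hid_dim))           # the shared trainable weight

relu = lambda z: np.maximum(z, 0.0)

mlp_out = relu(X @ W)        # PeerMLP forward pass: ignores graph structure
gnn_out = relu(A @ X @ W)    # GNN forward pass: aggregates neighbors first

# Identical weight spaces: outputs have the same shape, and a trained
# PeerMLP weight is a valid initialization for the GNN's parameters.
assert mlp_out.shape == gnn_out.shape == (n_nodes, hid_dim)
```

The two models differ only in the fixed aggregation matrix A, not in their trainable parameters, which is what makes copying a fully trained PeerMLP's weights into the GNN well-defined.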

Availability

The code is available at https://github.com/snapresearch/MLPInit

