MLPINIT: EMBARRASSINGLY SIMPLE GNN TRAINING ACCELERATION WITH MLP INITIALIZATION

Abstract

Training graph neural networks (GNNs) on large graphs is complex and extremely time-consuming. This is attributed to overheads caused by sparse matrix multiplication, which are sidestepped when training multi-layer perceptrons (MLPs) on only node features. MLPs, which ignore graph context, are simple and faster to train on graph data; however, they usually sacrifice prediction accuracy, limiting their applications on graph data. We observe that for most message-passing-based GNNs, we can trivially derive an analog MLP (we call this a PeerMLP) with an equivalent weight space by setting trainable parameters with the same shapes, making us curious: how do GNNs perform when using weights from a fully trained PeerMLP? Surprisingly, we find that GNNs initialized with such weights significantly outperform their PeerMLPs, motivating us to use PeerMLP training as a precursor, initialization step to GNN training. To this end, we propose an embarrassingly simple, yet hugely effective initialization method for GNN training acceleration, called MLPInit. Our extensive experiments on multiple large-scale graph datasets with diverse GNN architectures validate that MLPInit can accelerate the training of GNNs (up to 33× speedup on OGB-products) and often improve prediction performance (e.g., up to 7.97% improvement for GraphSAGE across 7 datasets for node classification, and up to 17.81% improvement across 4 datasets for link prediction on the Hits@10 metric).

1. INTRODUCTION

Graph Neural Networks (GNNs) (Zhang et al., 2018; Zhou et al., 2020; Wu et al., 2020) have attracted considerable attention from both academic and industrial researchers and have shown promising results on various practical tasks, e.g., recommendation (Fan et al., 2019; Sankar et al., 2021; Ying et al., 2018; Tang et al., 2022), knowledge graph analysis (Arora, 2020; Park et al., 2019; Wang et al., 2021), forecasting (Tang et al., 2020; Zhao et al., 2021; Jiang & Luo, 2022), and chemistry analysis (Li et al., 2018b; You et al., 2018; De Cao & Kipf, 2018; Liu et al., 2022). However, training GNNs on large-scale graphs is extremely time-consuming and costly in practice, spurring considerable work dedicated to scaling up GNN training and even necessitating new massive graph learning libraries (Zhang et al., 2020; Ferludin et al., 2022) for large-scale graphs. Recently, several approaches for more efficient GNN training have been proposed, including novel architecture designs (Wu et al., 2019; You et al., 2020d; Li et al., 2021), data reuse and partitioning paradigms (Wan et al., 2022; Fey et al., 2021; Yu et al., 2022), and graph sparsification (Cai et al., 2020; Jin et al., 2021b). However, these methods often sacrifice prediction accuracy and increase modeling complexity, while sometimes meriting significant additional engineering effort.

MLPs have been used to accelerate GNNs (Zhang et al., 2021b; Frasca et al., 2020; Hu et al., 2021) by decoupling GNNs into node feature learning and graph structure learning. Our work also leverages MLPs but adopts a distinct perspective. Notably, we observe that the weight spaces of MLPs and GNNs can be identical, which enables us to transfer weights between MLP and GNN models. Given that MLPs train faster than GNNs, this observation inspired us to ask: Can we train GNNs more efficiently by leveraging the weights of converged MLPs?

To answer this question, we first pioneer a thorough investigation to reveal the relationship between MLPs and GNNs in terms of trainable weight space. For ease of presentation, we define the PeerMLP of a GNN¹ so that a GNN and its PeerMLP share the same weights². Interestingly, we find that GNNs can be optimized by training the weights of their PeerMLPs. Based on this observation, we adopt the weights of a converged PeerMLP as the weights of the corresponding GNN and find that this GNN performs even better than the converged PeerMLP on node classification tasks (results in Table 2). Motivated by this, we propose an embarrassingly simple, yet remarkably effective method to accelerate GNN training by initializing a GNN with the weights of its converged PeerMLP. Specifically, to train a target GNN, we first train its PeerMLP and then initialize the GNN with the optimal weights of the converged PeerMLP. Figure 1 compares the training speed of GNNs with random initialization and with MLPInit, where Speedup denotes the reduction in training time achieved by MLPInit relative to a randomly initialized GNN reaching the same test performance. These results show that MLPInit is able to accelerate the training of GNNs significantly: for example, we speed up the training of GraphSAGE, GraphSAINT, ClusterGCN, and GCN by 2.48×, 3.94×, 2.06×, and 1.91× on the OGB-arXiv dataset, indicating the superiority of our method for GNN training acceleration. Moreover, we speed up GraphSAGE training by more than 14× on OGB-products.
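To make the PeerMLP idea concrete, the following is a minimal, self-contained sketch in pure PyTorch, assuming a two-layer GCN whose propagation multiplies by a normalized adjacency matrix; the module and function names here are ours for illustration, not those of the released implementation. Dropping the adjacency from the forward pass yields the PeerMLP, which by construction shares the GCN's weight space.

```python
# Minimal sketch of MLPInit in pure PyTorch (illustrative; names are ours,
# not the released implementation). A GCN layer computes A_hat @ X @ W, so
# skipping A_hat yields an MLP (the PeerMLP) with an identical weight space.
import torch
import torch.nn as nn

class GCN(nn.Module):
    def __init__(self, d_in, d_hid, d_out):
        super().__init__()
        self.lin1 = nn.Linear(d_in, d_hid)
        self.lin2 = nn.Linear(d_hid, d_out)

    def forward(self, x, adj=None):
        # With adj=None this module acts as the PeerMLP; with a normalized
        # adjacency matrix it is a two-layer GCN. Same parameters either way.
        h = self.lin1(x)
        if adj is not None:
            h = adj @ h                    # neighborhood aggregation
        h = self.lin2(torch.relu(h))
        if adj is not None:
            h = adj @ h
        return h

def train_with_mlpinit(x, adj, y, train_mask, d_hid=64, mlp_epochs=100):
    """Phase 1: train the cheap PeerMLP; Phase 2: its converged weights
    initialize the GNN, which is then trained as usual."""
    model = GCN(x.size(1), d_hid, int(y.max()) + 1)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(mlp_epochs):           # Phase 1: no sparse matmul needed
        opt.zero_grad()
        out = model(x, adj=None)          # PeerMLP forward pass
        loss_fn(out[train_mask], y[train_mask]).backward()
        opt.step()
    # Phase 2: continue training with the graph, e.g., model(x, adj),
    # starting from the MLP-initialized weights.
    return model
```

The single-module design is one way to make the shared weight space explicit: the PeerMLP training phase touches only node features, which is where the speedup comes from, since it avoids sparse matrix multiplication and neighbor sampling entirely.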
We highlight our contributions as follows:

• We pioneer a thorough investigation to reveal the relationship between MLPs and GNNs in terms of the trainable weight space through the following observations: (i) GNNs and MLPs have the same weight space. (ii) GNNs can be optimized by training the weights of their PeerMLPs. (iii) A GNN with weights from its converged PeerMLP surprisingly performs better than the converged PeerMLP itself on node classification tasks.

• Based on the above observations, we propose an embarrassingly simple yet surprisingly effective initialization method to accelerate GNN training. Our method, called MLPInit, initializes the weights of a GNN with the weights of its converged PeerMLP. After initialization, we observe that GNN training takes fewer than half as many epochs to converge as with random initialization, while often improving model performance³ (e.g., 7.97% improvement for node classification on GraphSAGE and 17.81% improvement for link prediction on Hits@10). Thus, MLPInit is able to accelerate the training of GNNs since training MLPs is cheaper and faster than training GNNs.

• Comprehensive experimental results on multiple large-scale graphs with diverse GNNs validate that MLPInit is able to accelerate the training of GNNs (up to 33× speedup on OGB-products).

• MLPInit is extremely easy to implement (see the sketch after this list) and has virtually negligible computational overhead compared to conventional GNN training schemes. In addition, it is orthogonal to other GNN acceleration methods, such as weight quantization and graph coarsening, further increasing headroom for GNN training acceleration in practice.
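As a concrete illustration of that last point, the weight transfer itself reduces to a few lines when the GNN and its PeerMLP are separate modules, since their parameters are identically shaped by construction. A hedged sketch (the helper name `init_gnn_from_peermlp` is ours, not from the released code):

```python
import torch

@torch.no_grad()
def init_gnn_from_peermlp(gnn, peer_mlp):
    """Copy a converged PeerMLP's weights into the GNN (MLPInit).

    Assumes corresponding parameters are registered in the same order
    with identical shapes, which holds by construction for a PeerMLP.
    """
    for p_gnn, p_mlp in zip(gnn.parameters(), peer_mlp.parameters()):
        assert p_gnn.shape == p_mlp.shape, "weight spaces must match"
        p_gnn.copy_(p_mlp)
```

When the parameter names also match, `gnn.load_state_dict(peer_mlp.state_dict())` achieves the same in one call.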



¹ The formal definition of PeerMLP is in Section 3.
² By sharing the same weights, we mean that the trainable weights of a GNN and its PeerMLP are the same in terms of size, dimension, and values.
³ By performance, we refer to the model prediction quality metric of the downstream task on the corresponding test data throughout the discussion.



Figure 1: Training speed comparison of GNNs with random initialization and with MLPInit. The two plot markers indicate, respectively, the best performance that GNNs with random initialization can achieve and the comparable performance reached by GNNs with MLPInit. Speedup indicates the training time reduced by our proposed MLPInit compared to random initialization. This experimental result shows that MLPInit is able to accelerate the training of GNNs significantly.


Availability

The code is available at https://github.com/snapresearch/MLPInit.

