Graph Neural Network Acceleration via Matrix Dimension Reduction

Abstract

Graph Neural Networks (GNNs) have become the de facto method for machine learning on graph data (e.g., social networks, protein structures, code ASTs), but they require significant time and resources to train. One alternative method is the Graph Neural Tangent Kernel (GNTK), a kernel method that corresponds to infinitely wide multi-layer GNNs. GNTK's parameters can be solved directly in a single step, avoiding time-consuming gradient descent. Today, GNTK is the state-of-the-art method for achieving high training speed without compromising accuracy. Unfortunately, solving for the kernel and searching for parameters can still take hours to days on real-world graphs. The current computation of GNTK has running time O(N^4), where N is the number of nodes in the graph. This prevents GNTK from scaling to datasets that contain large graphs. Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solving. This allows us to reduce the dominant computation bottleneck term from O(N^4) to O(N^3). (2) We apply sketching to further reduce the bottleneck term to o(N^ω), where ω ≈ 2.373 is the current matrix multiplication exponent. Experimentally, we demonstrate that our approaches speed up kernel learning by up to 19× on real-world benchmark datasets.
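To give intuition for technique (2), the following is a minimal sketching example, not the paper's construction: a generic Gaussian sketch compresses the feature dimension before forming a Gram (kernel) matrix, so the product costs O(n^2 k) instead of O(n^2 d) for sketch size k ≪ d. All variable names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 2000, 400  # examples, feature dimension, sketch size (k << d)

# Stand-in feature matrix; in GNTK the features would come from the graphs.
X = rng.standard_normal((n, d))

G_exact = X @ X.T                      # exact kernel matrix: O(n^2 d)

# Gaussian sketch S with E[S^T S] = I, so X S^T approximately preserves
# inner products between the rows of X.
S = rng.standard_normal((k, d)) / np.sqrt(k)
Xs = X @ S.T                           # sketched features: n x k
G_approx = Xs @ Xs.T                   # approximate kernel: O(n^2 k)

rel_err = np.linalg.norm(G_approx - G_exact) / np.linalg.norm(G_exact)
print(f"relative Frobenius error: {rel_err:.3f}")
```

The approximation error shrinks as k grows; the paper's sketches are chosen so the generalization error of the learned kernel predictor is preserved.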



Recently, a new direction for fast GNN training is to use the Graph Neural Tangent Kernel (GNTK). Solving for the kernel and searching for the parameters in GNTK is equivalent to using gradient descent to train an infinitely wide multi-layer GNN. GNTK is significantly faster than iterative gradient descent optimization because solving the parameters in GNTK is just a single-step kernel learning process. In addition, GNTK allows GNN training to scale with GNN model sizes because the training time grows only linearly with the complexity of GNN models. However, GNTK training can still take hours to days on typical GNN datasets today. Our key observation is that, during the process of solving parameters in GNTK, most of the training time and resources are spent on multiplications of large matrices. Let N be the maximum number of nodes in the graphs; these matrices can then have sizes as large as N^2 × N^2! This means a single matrix multiplication takes at least N^4 time, which prevents GNTK from scaling to larger graphs. Thus, in order to speed up GNTK training, we need to reduce matrix dimensions.
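The paper does not spell out the decoupling method at this point, but the general principle can be sketched with the standard Kronecker vec-trick: when an N^2 × N^2 matrix factors as a Kronecker product A ⊗ B of two N × N matrices, multiplying it against a vector never requires materializing the large matrix. The setup below is our own illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
X = rng.standard_normal((N, N))  # vec(X) is the N^2-dimensional input vector

# Naive: materialize the N^2 x N^2 Kronecker product -> O(N^4) memory,
# and the matrix-vector product alone touches all N^4 entries.
y_naive = np.kron(A, B) @ X.ravel()

# Decoupled: (A kron B) vec(X) = vec(A X B^T) under NumPy's row-major
# flattening -> two N x N matrix multiplies, O(N^3) time, O(N^2) memory.
y_fast = (A @ X @ B.T).ravel()

print(np.allclose(y_naive, y_fast))
```

The two results agree to machine precision, while the decoupled path drops the cost of this bottleneck from O(N^4) to O(N^3), mirroring the reduction claimed for technique (1).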



Graph Neural Networks (GNNs) have quickly become the de facto method for machine learning on graph data. GNNs have delivered ground-breaking results in many important areas of AI, including social networking Yang et al. (2020a), bio-informatics Zitnik & Leskovec (2017); Yue et al. (2020), recommendation systems Ying et al. (2018), and autonomous driving Weng et al. (2020); Yang et al. (2020b). However, efficient GNN training has become a major challenge with the relentless increase in the complexity of GNN models and dataset sizes, both in terms of the number of graphs in a dataset and the sizes of the graphs.

