GCINT: DYNAMIC QUANTIZATION ALGORITHM FOR TRAINING GRAPH CONVOLUTION NEURAL NETWORKS USING ONLY INTEGERS

Abstract

Quantization can reduce storage costs while lowering a model's computational complexity, yet there has been little research on quantized networks in the GNN field. We identify four primary reasons why existing quantization approaches cannot be applied broadly to GNNs: (1) quantization differences between data sources; (2) quantization differences between data streams; (3) quantization differences between concentrations; (4) the limitations of quantization-aware training (QAT). Based on this analysis, we propose GCINT, an efficient quantization framework for GNN training. The entire forward pass, backward pass, optimizer, and loss function are computed with integer data. We achieve a training speedup of nearly 10× on the INT8 Tensor Cores of an RTX 2080 Ti relative to FP32 on its CUDA Cores. Our quantization is independent of the dataset and of the weight distribution; in more than 2,000 randomized trials on 8 popular GNN benchmark datasets, accuracy was within 1% of FP32 in every case.

1. INTRODUCTION

There is an abundance of graph-structured data in the natural and social sciences. In fields such as social networks (Fan et al., 2019), recommender systems (Wu et al., 2020), traffic networks (Jiang & Luo, 2022), molecular prediction (Mansimov et al., 2019), and drug discovery (Zhang et al., 2022), Graph Neural Networks (GNNs), the representative deep learning models for learning, inference, and generalization on graph data, have produced superior results. As graph learning applications multiply and graph data expands, training GNNs becomes inefficient due to two significant obstacles: (1) Storage costs. Since training requires recording the outputs of several layers during forward propagation for use in the backward pass, extremely large-scale graph data is frequently held in distributed, CPU-centric memory storage and trained on distributed GPU clusters with a mini-batch technique. Common acceleration devices such as GPUs and FPGAs, with their limited on-chip storage and bandwidth, can no longer meet the demands of training large GNNs and depend too heavily on sampling techniques to train with a limited batch size per session (Yang, 2019). (2) Computational costs. Training a single epoch on the Reddit dataset typically requires tens of TFLOPS, even for KB-sized GNN models.

Quantization (Yang et al., 2019) can lower storage costs while decreasing a model's computational complexity (Nagel et al., 2021). Although quantization is widely used in CNNs, research on quantized networks for GNNs is scarce. We believe the following factors primarily restrict the applicability of quantization approaches to GNNs: (1) Quantization differences between data sources. During CNN training, UINT8 RGB images are normalized and fed to the network. In contrast, the node features consumed by GNN models are frequently not the result of normalization, and their distribution shifts as the graph changes and as embedding methods are applied. The information in an image can be represented in UINT8, whereas the embedding vectors of graph nodes are typically in FP32 format and carry significantly more information than UINT8 can. A GNN quantization scheme must therefore represent this large amount of information in the dataset with a limited number of bits. (2) Quantization differences between data streams. The computation mapped to the GPU in each CNN layer is typically General Matrix Multiplication (GEMM), whereas the activation distribution in each GNN layer is strongly tied to the graph topology.
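To make the dynamic-quantization setting concrete, the following is a minimal sketch of symmetric per-tensor dynamic INT8 quantization, the general family of techniques this paper builds on. The scale is recomputed from each tensor's current maximum magnitude, so no calibration against a fixed data distribution is required. The function names `quantize_int8` and `dequantize` are our own illustrative choices, not the paper's API, and this sketch does not reproduce GCINT itself.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor dynamic quantization to INT8.

    The scale is derived from the tensor's current max magnitude,
    so the mapping adapts as the activation distribution shifts.
    """
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        # All-zero tensor: any scale works; use 1.0 by convention.
        return np.zeros_like(x, dtype=np.int8), 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to an FP32 approximation of the input."""
    return q.astype(np.float32) * scale

# Example: a node-feature matrix with an arbitrary FP32 distribution.
np.random.seed(0)
x = (np.random.randn(4, 8) * 3.0).astype(np.float32)
q, s = quantize_int8(x)
# Round-to-nearest bounds the reconstruction error by half a step.
err = float(np.max(np.abs(dequantize(q, s) - x)))
assert err <= s / 2 + 1e-6
```

Because the scale follows the tensor rather than a pre-calibrated range, this kind of scheme is insensitive to the dataset and weight distribution, which is the property the abstract emphasizes for GCINT.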

