UNIFYING DATA-MODEL SPARSITY FOR CLASS-IMBALANCED GRAPH REPRESENTATION LEARNING

Abstract

To relieve the heavy computational cost of deep learning, models with more compact architectures have been proposed to deliver comparable performance. However, it is not only cumbersome model architectures but also the sheer volume of training data that adds to the expensive computational burden. This problem is particularly accentuated in graph learning: on one hand, Graph Neural Networks (GNNs) trained on non-Euclidean graph data often incur relatively high time costs due to the irregular density of graphs; on the other hand, the class imbalance naturally present in graph data cannot be alleviated by data volume alone, hindering GNNs' ability to generalize. To tackle these issues, (i) theoretically, we introduce a hypothesis on the extent to which a subset of the training data can approximate the learning effectiveness of the full dataset, with a guarantee governed by the distance between the gradients computed on the subset and on the full set; (ii) empirically, we discover that during the learning process of a GNN, some samples in the training dataset are informative in the gradients they provide for model parameter updates, and that this informative subset evolves as training proceeds. We refer to this observation as dynamic data sparsity. We also notice that a pruned sparse contrastive GNN model sometimes "forgets" the information provided by the informative subset, reflected in large loss magnitudes on those samples. Motivated by these findings, we develop a unified data-model dynamic sparsity framework named Graph Decantation (GraphDec) to address the above challenges. The key idea of GraphDec is to identify the informative subset dynamically during training by adopting sparse graph contrastive learning. Extensive experiments on multiple benchmark datasets demonstrate that GraphDec outperforms state-of-the-art baselines on class-imbalanced graph and node classification tasks, in terms of both classification accuracy and data usage efficiency.
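
As an illustrative formalization of the guarantee in (i) (a sketch only; the subset budget $k$, per-sample losses $\ell_i$, and uniform weighting are illustrative notation rather than our exact formulation), selecting a subset $\mathcal{S} \subseteq \mathcal{D}$ can be cast as controlling the gradient approximation error

$$\min_{\mathcal{S} \subseteq \mathcal{D},\; |\mathcal{S}| \le k}\; \Big\| \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \nabla_{\theta}\, \ell_i(\theta) \;-\; \frac{1}{|\mathcal{D}|} \sum_{j \in \mathcal{D}} \nabla_{\theta}\, \ell_j(\theta) \Big\|,$$

so that parameter updates computed on $\mathcal{S}$ stay close to those computed on the full set.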

1. INTRODUCTION

Graph representation learning (GRL) (Kipf & Welling, 2017) has shown remarkable power in dealing with non-Euclidean structured data (e.g., social networks, biochemical molecules, knowledge graphs). Graph neural networks (GNNs) (Kipf & Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018), the current state of the art in GRL, have become essential in various graph mining applications. However, in many real-world scenarios, training on graph data encounters two difficulties: class imbalance (Park et al., 2022) and massive data usage (Thakoor et al., 2021; Hu et al., 2020). First, class imbalance naturally exists in datasets from diverse practical domains, such as bioinformatics and social networks. GNNs are sensitive to this property and can be biased toward the dominant classes. This bias may mislead a GNN's learning process, resulting in underfitting of samples that are truly important to downstream tasks and, ultimately, poor test performance. Second, massive data usage requires GNNs to perform message passing over high-degree nodes, incurring heavy computational burdens. Much of this computation is redundant, since not all neighbors are informative for learning task-relevant embeddings. Unlike regular data such as images or text, the irregular connectivity of graph data induces random memory access, which further slows data readout.
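
To make this cost argument concrete, consider the following minimal sketch of one round of neighborhood aggregation (plain NumPy; the function name and the mean-aggregation choice are illustrative, not our actual pipeline). Each edge is processed exactly once, so the cost grows with the number of edges and hence with node degree, and the feature gather follows the irregular edge list, i.e., a random memory-access pattern:

import numpy as np

def mean_aggregate(x, edge_index):
    # One message-passing round: each node averages its in-neighbors' features.
    # x          : (num_nodes, dim) node feature matrix
    # edge_index : (2, num_edges) array of (src, dst) pairs
    # Cost is O(num_edges * dim): every edge is touched once, and the gather
    # x[src] follows the irregular edge list (random memory access).
    src, dst = edge_index
    out = np.zeros_like(x)
    np.add.at(out, dst, x[src])                   # scatter-add messages to destinations
    deg = np.bincount(dst, minlength=x.shape[0])  # in-degree per node
    return out / np.maximum(deg, 1)[:, None]      # mean over received messages

# Toy usage: node 0 has the highest in-degree, so it receives (and costs) the most.
x = np.random.randn(4, 8)
edge_index = np.array([[1, 2, 3, 0, 0],
                       [0, 0, 0, 1, 2]])
h = mean_aggregate(x, edge_index)

In sparse real-world graphs it is the edge list, not the node count, that dominates this cost, which is why pruning uninformative samples and neighbors directly reduces training time.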

