SEARCHING LOTTERY TICKETS IN GRAPH NEURAL NETWORKS: A DUAL PERSPECTIVE

Abstract

Graph Neural Networks (GNNs) have shown great promise in various graph learning tasks. However, the computational overhead of fitting GNNs to large-scale graphs grows rapidly, preventing GNNs from scaling up to real-world applications. To tackle this issue, the Graph Lottery Ticket (GLT) hypothesis articulates that there always exists a sparse subnetwork/subgraph with admirable performance inside a GNN with random initialization. Such a pair of a core subgraph and a sparse subnetwork (called a graph lottery ticket) can be uncovered by iteratively applying a novel sparsification method. While GLT provides new insights into GNN compression, it requires a full pretraining process to obtain graph lottery tickets, which is neither universal nor friendly to real-world applications. Moreover, the graph sparsification in GLT relies on sampling techniques, which may result in massive information loss and aggregation failure. In this paper, we explore the search for graph lottery tickets from a complementary perspective: transforming a random ticket into a graph lottery ticket, which allows us to more comprehensively explore the relationships between the original network/graph and their sparse counterparts. Compared to GLT, our proposal achieves a triple win: graph lottery tickets with high sparsity, admirable performance, and good explainability. More importantly, we rigorously prove that our model can eliminate noise and maintain reliable information in substructures using the graph information bottleneck theory. Extensive experimental results on various graph-related tasks validate the effectiveness of our framework.



Introduction

Graph Neural Networks (GNNs) Kipf & Welling (2016); Hamilton et al. (2017) have recently emerged as the dominant model for a diversity of graph learning tasks, such as node classification Velickovic et al. (2017), link prediction Zhang & Chen (2019), and graph classification Ying et al. (2018). The success of GNNs mainly derives from a recursive neighborhood aggregation scheme, i.e., message passing, in which each node updates its features by aggregating and transforming the features of its neighbors. However, GNNs suffer notoriously high computational overheads when scaling up to large graphs or graphs with dense connections, since conducting message passing over large or dense graphs proves costly for both training and inference Xu et al. (2018); You et al. (2020).

To alleviate such inefficiency, existing approaches mostly fall into two research lines: that is, they either simplify the graph structure or compress the GNN model. Within the first class, many studies Chen et al. (2018); Eden et al. (2018); Calandriello et al. (2018) have investigated the use of sampling to reduce the computational footprint of GNNs. These sampling-based strategies are usually integrated with a mini-batch training schedule for local feature aggregation and updating. Another representative line is graph sparsification techniques Voudigari et al. (2016); Zheng et al. (2020); Li et al. (2020b), which improve the training or inference efficiency of GNNs by learning to remove redundant edges from input graphs. In contrast to simplifying the graph structure, there are far fewer prior studies on pruning or compressing GNNs Tailor et al. (2020), as GNNs are generally less parameterized than DNNs in other fields, e.g., computer vision Wen et al. (2016); He et al. (2017).
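The message-passing scheme described above can be sketched in a few lines. This is a minimal illustrative sketch with mean aggregation and a ReLU nonlinearity on a dense adjacency matrix; the function name, the specific update rule, and the toy weights are our assumptions, not the formulation of any particular GNN in this paper.

```python
import numpy as np

def message_passing_layer(A, X, W):
    """One illustrative message-passing layer: each node aggregates
    (averages) its neighbors' features, then applies a shared linear
    transform W followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # degrees (incl. self-loop)
    H = (A_hat / deg) @ X                   # mean aggregation over neighbors
    return np.maximum(H @ W, 0.0)           # transform + nonlinearity

# toy 3-node path graph 0-1-2 with one-hot features and all-ones weights
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
W = np.ones((3, 2))
print(message_passing_layer(A, X, W).shape)  # (3, 2)
```

Stacking k such layers lets each node's representation depend on its k-hop neighborhood, which is exactly why the cost grows quickly on large or dense graphs.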
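To make the sparsification idea concrete, the sketch below keeps each edge independently with a fixed probability. This is a deliberately naive uniform-sampling baseline of our own devising; the learned sparsifiers cited above instead score edges and remove the least informative ones.

```python
import random

def sparsify_edges(edges, keep_prob, seed=0):
    """Return a sparsified edge list by uniform random edge sampling
    (each edge survives independently with probability keep_prob)."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < keep_prob]

# dense toy graph: all pairs among 100 nodes (4950 edges)
edges = [(u, v) for u in range(100) for v in range(u + 1, 100)]
sparse = sparsify_edges(edges, keep_prob=0.1)
print(len(edges), len(sparse))  # roughly 10% of edges survive
```

Uniform sampling shrinks the message-passing cost proportionally, but it may drop edges that carry critical information, which is precisely the information-loss risk the abstract attributes to sampling-based sparsification.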

