RETHINKING GRAPH LOTTERY TICKETS: GRAPH SPARSITY MATTERS

Abstract

The Lottery Ticket Hypothesis (LTH) claims the existence of a winning ticket (i.e., a properly pruned sub-network together with the original weight initialization) that can achieve performance competitive with the original dense network. A recent work, called UGS, extended LTH to prune graph neural networks (GNNs) for effectively accelerating GNN inference. UGS simultaneously prunes the graph adjacency matrix and the model weights using the same masking mechanism, but since the roles of the graph adjacency matrix and the weight matrices are very different, we find that their sparsifications lead to different performance characteristics. Specifically, we find that the performance of a sparsified GNN degrades significantly once the graph sparsity exceeds a certain level. We therefore propose two techniques to improve GNN performance under high graph sparsity. First, UGS prunes the adjacency matrix with a loss formulation that does not properly involve all elements of the adjacency matrix; in contrast, we add a new auxiliary loss head that better guides the edge pruning by involving the entire adjacency matrix. Second, by regarding unfavorable graph sparsification as adversarial data perturbation, we formulate the pruning process as a min-max optimization problem, which makes the lottery tickets robust when the graph sparsity is high. We further investigate the question: Can the "retrainable" winning ticket of a GNN also be effective for graph transfer learning? We call this the transferable graph lottery ticket (GLT) hypothesis. Extensive experiments demonstrate the superiority of our proposed sparsification method over UGS and empirically verify our transferable GLT hypothesis.

1. INTRODUCTION

Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Hamilton et al., 2017) have demonstrated state-of-the-art performance on various graph-based learning tasks. However, large graph size and over-parameterized network layers limit the scalability of GNNs, causing high training cost, slow inference speed, and large memory consumption. Recently, the Lottery Ticket Hypothesis (LTH) (Frankle & Carbin, 2019) claimed that there exist properly pruned sub-networks, together with the original weight initialization, that can be retrained to achieve performance comparable to the original large deep neural networks. LTH has recently been extended to GNNs by Chen et al. (2021b), who propose a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights to accelerate GNN inference on large graphs. Specifically, two differentiable masks $m_g$ and $m_\theta$ are applied to the adjacency matrix $A$ and the model weights $\Theta$, respectively, by element-wise product during end-to-end training. After training, the lowest-magnitude elements of $m_g$ and $m_\theta$ are set to zero according to pre-defined ratios $p_g$ and $p_\theta$, which eliminates low-scored edges and weights, respectively. The weight parameters are then rewound to their original initialization, and this pruning process is repeated until pre-defined sparsity levels are reached, i.e., graph sparsity $1 - \frac{\|m_g\|_0}{\|A\|_0} \ge s_g$ and weight sparsity $1 - \frac{\|m_\theta\|_0}{\|\Theta\|_0} \ge s_\theta$, where $\|\cdot\|_0$ is the $L_0$ norm counting the number of non-zero elements. Intuitively, UGS simply extends the basic parameter-masking algorithm of Frankle & Carbin (2019) for identifying winning tickets to also mask and remove graph edges. However, our empirical study
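The iterative magnitude-based pruning step described above can be sketched as follows. This is a minimal NumPy illustration, not the UGS implementation: function names, the tie-breaking behavior, and the pruning ratios are assumptions for exposition.

```python
import numpy as np

def prune_mask(mask, ratio):
    """Zero out the lowest-magnitude `ratio` fraction of the surviving
    (non-zero) entries of a differentiable mask, as in magnitude pruning.
    Applied to m_g with ratio p_g, or to m_theta with ratio p_theta."""
    surviving = np.abs(mask[mask != 0])
    k = int(ratio * surviving.size)        # number of entries to prune
    if k == 0:
        return mask
    thresh = np.sort(surviving)[k - 1]     # k-th smallest surviving magnitude
    # Keep only entries strictly above the threshold (ties are pruned too).
    return np.where(np.abs(mask) > thresh, mask, 0.0)

def sparsity(mask):
    # 1 - ||m||_0 / (total entries): the fraction of entries pruned so far.
    return 1.0 - np.count_nonzero(mask) / mask.size
```

Repeating `prune_mask` (with weight rewinding between rounds) drives `sparsity(m_g)` toward the target $s_g$ and `sparsity(m_theta)` toward $s_\theta$; e.g., pruning 20% of survivors per round reaches roughly 49% sparsity after three rounds.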

