LINGUINE: LEARNING TO PRUNE ON SUBGRAPH CONVOLUTION NETWORKS

Abstract

Graph Convolutional Networks (GCNs) have become one of the most successful methods for graph representation learning. Training and evaluating GCNs on large graphs is challenging, since full-batch GCNs incur high memory and computation overhead. In recent years, the research community has developed stochastic sampling methods to handle large graphs when it is impractical to fit the whole graph into a single batch. Model performance depends largely on the quality and size of the subgraphs used in batch training. Existing sampling approaches mostly focus on approximating the full-graph structure and pay less attention to redundancy and randomness in the sampled subgraphs. To address these issues and explore a better mechanism for producing high-quality subgraphs to train GCNs, we propose the Linguine framework, in which a meta-model learns to prune the subgraph intelligently. To obtain the meta-model efficiently, we design a joint training scheme built on the idea of hardness-based learning. Empirical results show that our method improves the accuracy of the current state of the art and reduces the error incurred by redundancies in the subgraph structure. We also explore the reasoning behind smart pruning through visualization.

1. INTRODUCTION

Graph representation learning has attracted much attention from the research community in recent years, with new work emerging every year. Graph Convolutional Networks (GCNs) were proposed as the extension of Convolutional Neural Networks (CNNs) (LeCun et al., 1995) to geometric data. The first spectral-based GCN was built on spectral graph theory (Bruna et al., 2013) and was extended by many subsequent works (Henaff et al., 2015; Defferrard et al., 2016). In recent years, the spatial-based counterpart (Kipf & Welling, 2016a) has gained more attention and has facilitated many machine learning tasks (Wu et al., 2020; Cai et al., 2018), including semi-supervised node classification (Hamilton et al., 2017b), link prediction (Kipf & Welling, 2016b; Berg et al., 2017), and knowledge graphs (Schlichtkrull et al., 2018). In this work, we focus primarily on large-scale spatial-based GCNs (Hamilton et al., 2017a; Chen et al., 2018b; Gao et al., 2018; Huang et al., 2018; Zeng et al., 2019; Zou et al., 2019; Chiang et al., 2019), where a given node aggregates hidden states from its neighbors in the previous layer, followed by a non-linear activation, to obtain a topological representation. However, as graphs grow larger, GNN models face challenges imposed by limited physical memory and exponentially growing computation overhead. Recent work adopts sampling methods to handle the large volume of data and facilitate batch training. Most of these methods fall into three types: layer-wise sampling (Hamilton et al., 2017a; Gao et al., 2018; Huang et al., 2018; Zou et al., 2019), node-wise sampling (Chen et al., 2018b), and subgraph sampling (Chiang et al., 2019; Zeng et al., 2019). In layer-wise sampling, samples are drawn from the neighbors of a given node at each layer. The number of sampled nodes grows exponentially as the GCN gets deeper, resulting in 'neighbor explosion'.
In node-wise sampling, the nodes in each layer are sampled independently to form the structure of the GCN, which avoids 'neighbor explosion'. However, the resulting GCN structure is unstable, leading to inferior convergence. In subgraph sampling, the GCN is trained on a subgraph sampled from the original graph, and messages are passed only within the subgraph during training. This approach resolves the neighbor-explosion problem and can be applied to training deep GCNs. However, the subgraph's structure and connectivity have a great

