LOCAL CLUSTERING GRAPH NEURAL NETWORKS

Abstract

Graph Neural Networks (GNNs), which benefit various real-world problems and applications, have emerged as a powerful technique for learning graph representations. The depth of a GNN model, denoted by K, restricts the receptive field of a node to its K-hop neighbors and plays a subtle role in the performance of GNNs. Recent works demonstrate how different choices of K produce a trade-off between increasing representation capacity and avoiding over-smoothing. We establish a theoretical connection between GNNs and local clustering, showing that short random walks in GNNs have a high probability of being stuck in a local cluster. Based on this theoretical analysis, we propose Local Clustering Graph Neural Networks (LCGNN), a GNN learning paradigm that utilizes local clustering to efficiently search for small but compact subgraphs for GNN training and inference. Compared to full-batch GNNs, sampling-based GNNs, and graph partition-based GNNs, LCGNN performs comparably or even better, achieving state-of-the-art results on four Open Graph Benchmark (OGB) datasets. The locality of LCGNN allows it to scale to graphs with 100M nodes and 1B edges on a single GPU.

1. INTRODUCTION

The recent emergence of Graph Neural Networks (GNNs), exemplified by models such as ChebyNet (Defferrard et al., 2016), GCN (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2018), and GIN (Xu et al., 2019), has drastically reshaped the landscape of graph learning research. These methods generalize traditional deep learning algorithms to graph-structured data by combining graph propagation with neural networks. Despite their conceptual simplicity, GNNs have established new state-of-the-art results on various graph learning tasks, such as node classification, link prediction, and graph classification (Hu et al., 2020; Dwivedi et al., 2020), and have served as key contributors to many real-world applications, such as recommendation systems (Ying et al., 2018), smart transportation (Luo et al., 2020), visual question answering (Teney et al., 2017), and molecular de-novo design (You et al., 2018).

With the growth of real-world social and information networks (Leskovec et al., 2005), there is an urgent need to scale GNNs to massive graphs. For example, the recommendation systems at Alibaba (Zhu et al., 2019) and Pinterest (Ying et al., 2018) require training and inferring GNNs on graphs with billions of edges. Building such large-scale GNNs, however, is a notoriously expensive process. For instance, the GNN models at Pinterest are trained on a 500GB machine with 16 Tesla K80 GPUs and served on a Hadoop cluster with 378 d2.8xlarge Amazon AWS machines. Although one may assume that model parameters are the main contributors to the huge resource consumption of GNNs, previous work (Ma et al., 2019) suggests that the main bottleneck actually comes from the entanglement between graph propagation and neural networks, which leads to a large and irregular computation graph for GNNs. This problem is further exacerbated by the small-world phenomenon (Watts & Strogatz, 1998): even a relatively small number of graph propagation steps can involve full-graph computation. For example, in the Facebook college graph of Johns Hopkins (Traud et al., 2012), the 2-hop neighbors of node 1, as shown in Fig. 1a, cover 74.5% of the whole graph.

A common strategy to reduce the overhead of GNNs is to make the graph smaller, but this may bring side effects. For instance, graph sampling techniques, such as neighborhood sampling in GraphSAGE (Hamilton et al., 2017), may lead to a high variance issue (Chen et al., 2018a). Alternatively, graph partition techniques, such as METIS (Karypis & Kumar, 1998), adopted by Cluster-GCN (Chiang et al., 2019) and AliGraph (Zhu et al., 2019), essentially involve extra full-graph computation.
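To make the small-world effect concrete, the sketch below (our illustration, not code from the paper) measures the fraction of a graph reachable within K hops of a node via breadth-first search; on small-world graphs this fraction grows rapidly with K, as in the Johns Hopkins example above. The adjacency-dictionary format and toy graph are assumptions for illustration only.

```python
# Minimal sketch: fraction of nodes covered by the K-hop receptive field of a node.
# This is an illustrative helper, not part of the LCGNN method itself.
from collections import deque

def k_hop_coverage(adj, source, k):
    """Return the fraction of nodes within k hops of `source`.

    adj: dict mapping node -> iterable of neighbor nodes (undirected graph).
    """
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return len(visited) / len(adj)

# Example usage on a toy 5-cycle: every node is within 2 hops of node 0.
toy_adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(k_hop_coverage(toy_adj, source=0, k=2))  # -> 1.0
```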

