(LA)YER-NEIGH(BOR) SAMPLING: DEFUSING NEIGHBORHOOD EXPLOSION IN GNNS

Abstract

Graph Neural Networks (GNNs) have recently received significant attention; however, training them at large scale remains a challenge. Minibatch training coupled with sampling is used to alleviate this challenge. Even so, existing approaches either suffer from the neighborhood explosion phenomenon or have poor performance. To deal with these issues, we propose a new sampling algorithm called LAyer-neighBOR sampling (LABOR). It is designed to be a direct replacement for Neighbor Sampling with the same fanout hyperparameter while sampling up to 7× fewer vertices, without sacrificing quality. By design, from the point of view of any single vertex, the variance of its estimator matches that of Neighbor Sampling. Moreover, under the same vertex sampling budget constraints, LABOR converges faster than existing layer sampling approaches and can use up to a 112× larger batch size compared to Neighbor Sampling.

1. INTRODUCTION

Graph Neural Networks (GNNs) (Hamilton et al., 2017; Kipf & Welling, 2017) have become the de facto models for representation learning on graph-structured data, and have started being deployed in production systems (Ying et al., 2018; Niu et al., 2020). These models iteratively update node embeddings by passing messages along the direction of the edges in the given graph, with nonlinearities between different layers. With l layers, the computed node embeddings contain information from the l-hop neighborhood of the seed vertex. In the production setting, GNN models need to be trained on billion-scale graphs (Ching et al., 2015; Ying et al., 2018), and training them takes hours to days even on distributed systems (Zheng et al., 2022a;b). As with Deep Neural Networks (DNNs) in general, it is more efficient to use minibatch training (Bertsekas, 1994) on GNNs, though it is trickier in this case. Node embeddings in GNNs depend recursively on their neighbors' embeddings, so with l layers this dependency spans the l-hop neighborhood of the node. Real-world graphs usually have a very small diameter, and if l is large, the l-hop neighborhood may well span the entire graph, a problem known as the Neighborhood Explosion Phenomenon (NEP) (Zeng et al., 2020). To address this, researchers proposed sampling a subgraph of the l-hop neighborhood of the nodes in the batch. There are three main approaches: node-based, layer-based, and subgraph-based methods. Node-based sampling methods (Hamilton et al., 2017; Chen et al., 2018a; Liu et al., 2020; Zhang et al., 2021) sample independently and recursively for each node. It was noticed that node-based methods sample subgraphs that are too shallow, i.e., with a low ratio of edges to nodes.
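The recursive node-wise scheme and the resulting explosion can be illustrated with a minimal sketch. The graph representation (an adjacency dict) and the toy random graph below are hypothetical, and `neighbor_sample` is an illustrative simplification of GraphSAGE-style Neighbor Sampling (Hamilton et al., 2017), not the paper's algorithm:

```python
import random

def neighbor_sample(adj, seeds, fanouts, rng=None):
    """Recursive node-wise neighbor sampling, GraphSAGE style (sketch).

    adj:     dict mapping each vertex to a list of its (in-)neighbors
    seeds:   seed vertices of the minibatch (the output layer)
    fanouts: per-layer fanout k, one entry per GNN layer
    Returns one vertex set per layer, seeds first.
    """
    rng = rng or random.Random(0)
    layers = [set(seeds)]
    for k in fanouts:
        frontier = set()
        for v in layers[-1]:
            nbrs = adj.get(v, [])
            # Each vertex independently samples up to k of its neighbors.
            frontier.update(nbrs if len(nbrs) <= k else rng.sample(nbrs, k))
        # Keep the previous layer's vertices: their embeddings are needed too.
        layers.append(frontier | layers[-1])
    return layers

# Toy random graph: 10,000 vertices, average degree 20 (hypothetical data).
g = random.Random(0)
n = 10_000
adj = {v: [g.randrange(n) for _ in range(20)] for v in range(n)}

# With fanout 10 and 3 layers, the sampled set grows roughly like 10^l
# per layer until it saturates at the graph size -- the neighborhood
# explosion phenomenon described above.
sizes = [len(s) for s in neighbor_sample(adj, seeds=[0], fanouts=[10, 10, 10])]
```

Since every sampled vertex recursively samples its own neighbors, the per-layer vertex counts grow geometrically in the fanout until they approach the full graph.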
Thus, layer-based sampling methods were proposed (Chen et al., 2018b; Zou et al., 2019; Huang et al., 2018; Dong et al., 2021), where the sampling for a whole layer is done collectively. Subgraph sampling methods (Chiang et al., 2019; Zeng et al., 2020; Hu et al., 2020b; Zeng et al., 2021), on the other hand, do not use the recursive layer-by-layer sampling scheme of node- and layer-based methods and instead tend to use the same subgraph for all of the layers. Some of these sampling methods take the magnitudes of embeddings into account (Liu et al., 2020; Zhang et al., 2021; Huang et al., 2018), while others, such as Chen et al. (2018a) and Cong et al. (2021), cache historical embeddings to reduce the variance of the computed approximate embeddings. There are also methods that sample from a vertex cache filled with popular vertices (Dong et al., 2021). Most of these approaches are orthogonal to one another and can be incorporated into other sampling algorithms.
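The key contrast with node-wise sampling is that a layer-based method draws a fixed budget of vertices for the entire layer at once, so the sample size stays bounded no matter how many vertices the previous layer contains. Below is a minimal sketch in the spirit of FastGCN-style methods (Chen et al., 2018b); the adjacency-dict graph and the degree-proportional weighting are simplified assumptions, not any specific published estimator:

```python
import random

def layer_sample(adj, seeds, layer_sizes, rng=None):
    """Collective layer-wise sampling (sketch, FastGCN spirit).

    adj:         dict mapping each vertex to a list of its (in-)neighbors
    seeds:       seed vertices of the minibatch
    layer_sizes: per-layer vertex budget (replaces per-vertex fanout)
    Returns one vertex list per layer, seeds first.
    """
    rng = rng or random.Random(0)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    layers = [list(seeds)]
    for budget in layer_sizes:
        # Candidate pool: union of the current layer's neighborhoods.
        pool = sorted({u for v in layers[-1] for u in adj.get(v, [])})
        weights = [degree.get(u, 1) for u in pool]
        k = min(budget, len(pool))
        # Degree-proportional sampling without replacement (simplified):
        # the whole layer shares one sample of at most `budget` vertices.
        chosen = set()
        while len(chosen) < k:
            chosen.add(rng.choices(pool, weights=weights)[0])
        layers.append(sorted(chosen))
    return layers

# Same toy random graph as before (hypothetical data).
g = random.Random(0)
n = 10_000
adj = {v: [g.randrange(n) for _ in range(20)] for v in range(n)}

# Each layer contains at most 50 vertices, regardless of depth.
layers = layer_sample(adj, seeds=[0], layer_sizes=[50, 50])
```

Because the budget is per layer rather than per vertex, total work grows linearly with depth instead of geometrically, which is what makes these methods immune to the NEP, at the cost of possibly sampling vertices that are relevant to only a few members of the layer above.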

