SUBSTRUCTURED GRAPH CONVOLUTION FOR NON-OVERLAPPING GRAPH DECOMPOSITION

Anonymous

Abstract

Graph convolutional networks have been widely used to solve graph problems such as node classification, link prediction, and recommender systems. It is well known that large graphs require large amounts of memory and time to train graph convolutional networks. To deal with large graphs, many methods have been proposed, such as graph sampling and graph decomposition. In particular, graph decomposition has the advantage of parallel computation, but information is lost at the interfaces between subgraphs. In this paper, we propose a novel substructured graph convolution that restores the interface information lost by graph decomposition. Numerical results indicate that the proposed method is robust with respect to the number of subgraphs compared to other methods.

1. INTRODUCTION

Graph convolutional networks (GCNs) (Kipf & Welling, 2017) are widely used in node classification (Xiao et al., 2022), link prediction (Zhang & Chen, 2018), and recommender systems (Wu et al., 2022). For a given graph, a GCN constructs a renormalized graph Laplacian from the graph's adjacency matrix and uses it for layer propagation. Therefore, as the dimension of the adjacency matrix increases, more memory and time are required to train the network. There are two main lines of research addressing this memory problem. The first is graph sampling methods (Hamilton et al., 2017; Chen et al., 2018; Ye et al., 2019; Zeng et al., 2020). These methods create a subgraph at every iteration using an appropriate sampling algorithm such as DeepWalk (Perozzi et al., 2014), and the network is trained on this subgraph. GraphSAGE (Hamilton et al., 2017) used the edge information corresponding to a fixed-size neighborhood of uniformly sampled nodes. FastGCN (Chen et al., 2018) proposed importance sampling and showed faster training than GraphSAGE. VR-GCN (Ye et al., 2019) used a variance reduction technique to reduce the number of sampled nodes. GraphSAINT (Zeng et al., 2020) improved performance by sampling subgraphs instead of nodes or edges. Because graph sampling methods use subgraphs to reduce memory usage, choosing the number of samples is important: the more samples, the higher the expected performance, but also the slower the training and the more memory consumed. On the other hand, another approach is to decompose the graph (Chiang et al., 2019). The biggest advantage of decomposition methods is that, unlike sampling methods, the decomposition can be performed in advance, before network training. Much research has been done on how to decompose a graph (Karypis & Kumar, 1998; Avery, 2011; Gonzalez et al., 2012).
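The renormalized propagation rule of Kipf & Welling (2017) mentioned above can be sketched as follows. This is a minimal dense NumPy illustration of the standard renormalization trick, not the implementation used in this paper; the function names are our own.

```python
import numpy as np

def renormalized_adjacency(A):
    """Renormalization trick: A_hat = D~^{-1/2} (A + I) D~^{-1/2},
    where D~ is the degree matrix of A + I (self-loops added)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(A_hat, H, W):
    """One GCN propagation step: H' = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)
```

Since A_hat has the same dimension as the adjacency matrix, its memory footprint grows quadratically with the number of nodes in this dense form, which is the scaling problem motivating sampling and decomposition methods.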
Among them, METIS (Karypis & Kumar, 1998), which can quickly decompose a graph using a multi-level structure, is widely used. From a linear-algebra viewpoint, METIS derives a block diagonal matrix by performing a non-overlapping decomposition of the adjacency matrix of a given graph. ClusterGCN (Chiang et al., 2019) trains the network with a mini-batch gradient descent algorithm by performing block sampling on the block diagonal matrix generated by METIS. That is, it trains the network by alternating over block submatrices through random sampling. Alternatively, the network can be trained at once with the gradient descent algorithm by computing each block of the block diagonal matrix in parallel. The big difference from the alternating method is that no inner iteration is required, because the network is trained on all subgraphs at once and the results are then merged. However, non-overlapping decomposition drops the blocks in the off-diagonal part and does not supplement the information lost at the interfaces.
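The effect of a non-overlapping decomposition on the adjacency matrix can be sketched as follows. The partition here is supplied by hand for illustration (in practice a partitioner such as METIS would compute it), and the function name is hypothetical; the point is that every edge crossing a part boundary, i.e. every off-diagonal block entry, is discarded.

```python
import numpy as np

def block_diagonal_adjacency(A, parts):
    """Keep only the diagonal blocks of A under a non-overlapping
    partition: entry (i, j) survives only if nodes i and j lie in
    the same part. `parts[i]` is the part index of node i."""
    parts = np.asarray(parts)
    mask = parts[:, None] == parts[None, :]  # True within the same part
    return A * mask
```

On a 4-node ring partitioned into {0, 1} and {2, 3}, the two interface edges (1, 2) and (3, 0) are zeroed out; this dropped information is exactly what the method proposed in this paper aims to restore.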

