SUBSTRUCTURED GRAPH CONVOLUTION FOR NON-OVERLAPPING GRAPH DECOMPOSITION

Anonymous

Abstract

Graph convolutional networks are widely used to solve graph problems such as node classification, link prediction, and recommender systems. It is well known that large graphs require large amounts of memory and time to train graph convolutional networks. To handle large graphs, many approaches have been proposed, such as graph sampling and graph decomposition. In particular, graph decomposition has the advantage of allowing parallel computation, but information is lost at the interface between subgraphs. In this paper, we propose a novel substructured graph convolution that reinforces the interface information lost by graph decomposition. Numerical results indicate that the proposed method is robust with respect to the number of subgraphs compared to other methods.

1. INTRODUCTION

Graph convolutional networks (GCNs) (Kipf & Welling, 2017) are widely used in node classification (Xiao et al., 2022), link prediction (Zhang & Chen, 2018), and recommender systems (Wu et al., 2022). For a given graph, a GCN constructs a renormalized graph Laplacian from the graph's adjacency matrix and uses it for layer propagation. Therefore, as the dimension of the adjacency matrix grows, more memory and time are required to train the network. There are two main lines of research addressing this memory problem. The first is graph sampling methods (Hamilton et al., 2017; Chen et al., 2018; Ye et al., 2019; Zeng et al., 2020). These methods create a subgraph at every iteration using an appropriate sampling algorithm such as DeepWalk (Perozzi et al., 2014), and the network is trained on this subgraph. GraphSAGE (Hamilton et al., 2017) used the edge information corresponding to a fixed-size neighborhood of uniformly sampled nodes. FastGCN (Chen et al., 2018) proposed importance sampling and showed faster training than GraphSAGE. VR-GCN (Ye et al., 2019) used a variance reduction technique to reduce the number of sampled nodes. GraphSAINT (Zeng et al., 2020) improved performance by sampling subgraphs instead of nodes or edges. Because graph sampling methods rely on subgraphs to reduce memory usage, choosing the number of samples is important: the more samples, the higher the expected performance, but the slower the training and the more memory consumed. On the other hand, there is another approach that decomposes the graph (Chiang et al., 2019). The biggest advantage of decomposition methods is that, unlike sampling methods, the decomposition can be performed once in advance, before network training. Much research has been done on how to decompose a graph (Karypis & Kumar, 1998; Avery, 2011; Gonzalez et al., 2012).
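The layer propagation rule mentioned above uses the renormalization trick of Kipf & Welling (2017): self-loops are added to the adjacency matrix and the result is symmetrically normalized by the degrees. A minimal NumPy sketch (dense matrices are used only for clarity; real implementations use sparse operations):

```python
import numpy as np

def renormalized_adjacency(A):
    """Compute D~^{-1/2} (A + I) D~^{-1/2}, the renormalized adjacency
    used for GCN layer propagation (Kipf & Welling, 2017)."""
    A_tilde = A + np.eye(A.shape[0])   # add self-loops
    d = A_tilde.sum(axis=1)            # degrees of A + I
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # symmetric normalization: scale rows and columns by d^{-1/2}
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# a 3-node path graph: 0 - 1 - 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_hat = renormalized_adjacency(A)
```

Since `A_hat` is an n-by-n dense-or-sparse operator applied at every layer, its size is exactly what makes large graphs expensive to train on.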
Among them, METIS (Karypis & Kumar, 1998), which can quickly decompose a graph using a multi-level structure, is widely used. From a linear algebra viewpoint, METIS derives a block diagonal matrix by performing a non-overlapping decomposition of the adjacency matrix of a given graph. ClusterGCN (Chiang et al., 2019) trains the network with a mini-batch gradient descent algorithm by performing block sampling on the block diagonal matrix generated by METIS; that is, it trains the network by alternating among block submatrices through random sampling. Alternatively, the network can be trained at once with the gradient descent algorithm by computing the block diagonal matrix for each block in parallel. A key difference from the alternating method is that no inner iteration is required, because the network is trained using all subgraphs at once and the results are then merged. However, non-overlapping decomposition drops the off-diagonal blocks and does not compensate for the lost information. Therefore, as the number of blocks increases, the amount of information lost increases, which degrades training of the network. In the field of numerical analysis, substructuring methods (Bramble et al., 1986; Farhat & Roux, 1991) additionally use information on the interface part of a domain that has undergone non-overlapping decomposition. Assuming that the interface part is sparse when an appropriate non-overlapping decomposition is performed, the added computation and communication costs are very small. Therefore, although the interface part requires sequential computation, it does not become a bottleneck in the overall parallel structure. Motivated by the substructuring method, we modify the graph convolution based on the block diagonal adjacency matrix generated by non-overlapping decomposition: a substructure using the interface adjacency matrix is added to the graph convolution.
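The split into a kept block-diagonal part and a dropped interface part can be made concrete. The sketch below is illustrative only: it assumes the partition (here two contiguous blocks) is already given, whereas in practice a partitioner such as METIS would produce it; `split_block_diag` is a hypothetical helper name:

```python
import numpy as np

def split_block_diag(A, parts):
    """Split A into its block-diagonal part (kept by non-overlapping
    decomposition) and the interface part (the dropped off-diagonal blocks)."""
    block = np.zeros_like(A)
    for idx in parts:
        # keep only edges whose both endpoints lie in the same part
        block[np.ix_(idx, idx)] = A[np.ix_(idx, idx)]
    interface = A - block   # everything that crosses between parts
    return block, interface

# a 4-node cycle 0-1-2-3-0, split into the two blocks {0,1} and {2,3}
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
block, interface = split_block_diag(A, [[0, 1], [2, 3]])
```

The interface matrix holds exactly the cut edges (here 1-2 and 3-0); when the partition is good, this matrix is sparse, which is the premise behind the low extra cost of substructuring.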
We call a graph convolution with this added substructure a substructured graph convolution. A simple linear algebra calculation shows that the sum of the outputs of the aggregation using the block diagonal adjacency matrix and the interface adjacency matrix differs from the output of the aggregation using the original adjacency matrix. Therefore, to compensate for this difference, a weighted sum is performed with coefficients computed by an attention module, following the attention mechanisms that have shown good performance in natural language processing (Vaswani et al., 2017) and image classification (Hu et al., 2018). The numerical results confirm that the proposed graph convolution adequately complements the interface part. The rest of this paper is organized as follows. In Section 2, we introduce an abstract non-overlapping graph decomposition framework and two methods for training a given network with decomposed graphs. We present the substructured graph convolution in Section 3. Improved node classification accuracy or F1-score of the proposed graph convolution applied to GCN, GCNII, GAT, and SGC on various datasets is presented in Section 4. We conclude the paper with remarks in Section 5.

2. NON-OVERLAPPING GRAPH DECOMPOSITION

In this section, we briefly introduce an algebraic framework for non-overlapping graph decomposition. We then describe two methods for training graph convolutional networks using the decomposed graphs.

2.1. ALGEBRAIC FRAMEWORK

Let A ∈ R^{n×n} be the adjacency matrix of a given graph consisting of n nodes. Without loss of generality, let the graph be uniformly decomposed so that each subgraph has n/N nodes for a positive integer N. Let R_k : R^n → R^{n/N} be the restriction operator onto the k-th subgraph. We construct a non-overlapping decomposition of the given adjacency matrix A under this node decomposition setting. A subgraph adjacency matrix A_k ∈ R^{(n/N)×(n/N)} is defined by

    A_k = R_k A R_k^T,    k = 1, ..., N.    (2.1)

The non-overlapping decomposition Ā of A with the subgraph adjacency matrices (2.1) is given by

    Ā = Σ_{k=1}^{N} R_k^T A_k R_k.    (2.2)

Then Ā becomes the adjacency matrix of the graph consisting of the subgraphs having A_1, ..., A_N as adjacency matrices. We define this graph as a non-overlapping graph decomposition of the given graph. For the block matrix representation

    A = [A_{ij}]_{1≤i,j≤N} = ⎡ A_11  A_12  ...  A_1N ⎤
                             ⎢ A_21  A_22  ...  A_2N ⎥
                             ⎢  ...   ...  ...   ... ⎥
                             ⎣ A_N1  A_N2  ...  A_NN ⎦
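The operators in (2.1)-(2.2) can be realized concretely as 0/1 selection matrices. A minimal NumPy sketch, assuming a contiguous uniform partition with n divisible by N (in practice the partition would come from METIS); note the code indexes subgraphs from 0 rather than 1, and dense matrices are used only for clarity:

```python
import numpy as np

def restriction(k, n, N):
    """0/1 restriction operator R_k selecting the k-th contiguous
    block of n/N nodes (k = 0, ..., N-1, zero-based)."""
    m = n // N
    R = np.zeros((m, n))
    R[np.arange(m), k * m + np.arange(m)] = 1.0
    return R

def nonoverlap_decomposition(A, N):
    """Assemble the non-overlapping decomposition of eq. (2.2),
    i.e. the block-diagonal part of A."""
    n = A.shape[0]
    A_bar = np.zeros_like(A)
    for k in range(N):
        R = restriction(k, n, N)
        A_k = R @ A @ R.T          # subgraph adjacency matrix, eq. (2.1)
        A_bar += R.T @ A_k @ R     # sum of R_k^T A_k R_k, eq. (2.2)
    return A_bar

# a 4-node cycle 0-1-2-3-0 decomposed into N = 2 subgraphs
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_bar = nonoverlap_decomposition(A, 2)
```

The result keeps only the diagonal blocks A_kk of the block representation above; the off-diagonal blocks, which encode the interface between subgraphs, are exactly what is dropped.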