ITERATED GRAPH NEURAL NETWORK SYSTEM

Abstract

We present the Iterated Graph Neural Network System (IGNNS), a new framework of Graph Neural Networks (GNNs) that handles undirected and directed graphs in a unified way. The core component of IGNNS is the Iterated Function System (IFS), an important object of study in fractal geometry. The key idea of IGNNS is to use a pair of affine transformations to characterize the process of message passing between graph nodes and to assign them an adjoint probability vector, forming an IFS layer with probabilities. After embedding into the latent space, the node features are fed to the IFS layer for iteration, yielding high-level representations of the graph nodes. We also analyze the geometric properties of IGNNS from the perspective of dynamical systems. We prove that if the IFS induced by IGNNS is contractive, then the fractal representation of the graph nodes converges to the attractor of the IFS in Hausdorff distance, and the ergodic representation converges to a constant matrix in Frobenius norm. We carry out a series of semi-supervised node classification experiments on citation network datasets such as CiteSeer, Cora and PubMed. The experimental results show that our method clearly outperforms related methods.

1. INTRODUCTION

GNNs (Scarselli et al., 2009) have proven effective in processing graph-structured data and have been widely used in natural language processing, computer vision, data mining, social networks and biochemistry. In recent years, a variety of GNN architectures have been developed, such as GCN (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2018), DGI (Veličković et al., 2019), GIN (Xu et al., 2019), GCNII (Ming Chen et al., 2020) and GEN (Li et al., 2020). These architectures share a common feature: the representation of each node is updated using messages from its neighbors, without distinguishing the direction (or angle) of message passing between two nodes. Recent studies have shown that considering directed message passing between nodes can improve the performance of GNNs and has led to success in related fields. For example, DimeNet (Klicpera et al., 2020) considers the spatial direction from one atom to another and can learn both molecular properties and atomic forces. R-GCN (Schlichtkrull et al., 2018) and Bi-GCN (Marcheggiani & Titov, 2017; Fu et al., 2019) are models for directed graphs applied in natural language processing. We note that the direction-based models above do not consider the bidirectional mixed passing of messages. In real life, however, message passing is interactive across different directions. For example, node A obtains a message from node B; after processing the message, node A not only passes it on to the next node C, but also feeds it back to node B. Suppose there are only two directions for message passing, forward and backward, represented by 0 and 1, respectively. The symbol space of the first-generation message passing paths is {0, 1} = {0, 1}^1, and that of the second generation is {00, 01, 10, 11} = {0, 1}^2. In general, the symbol space of the n-th generation message passing paths is {0, 1}^n, and its size is 2^n.
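The growth of the symbol space can be checked directly. The following sketch (not part of the paper) enumerates the n-th generation paths {0, 1}^n and contrasts them with the two same-direction paths {0^n, 1^n} that a direction-repeating model sees:

```python
from itertools import product

# The n-th generation message-passing paths form {0, 1}^n, of size 2**n,
# whereas a model that only repeats one direction observes just {0^n, 1^n}.
n = 3
full_space = ["".join(p) for p in product("01", repeat=n)]
same_direction = ["0" * n, "1" * n]

print(len(full_space), full_space)   # 8 paths: 000, 001, ..., 111
print(len(same_direction))           # 2 paths only
```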
This means that the scope of message passing spreads exponentially with base 2. In Bi-GCN (similar to Bi-LSTM) and R-GCN architectures, however, the symbol space is {{0}^n, {1}^n}, of size 2, which means that a great deal of information is lost in the process of message passing (see Appendix A). How can the above message passing patterns be characterized? We use two mappings to represent the message passing process in the two directions; the interactive passing of messages across directions is then equivalent to the composition of the corresponding mappings. Moreover, message passing not only occurs in the same direction but also occurs interactively across different directions, which is more in line with the actual situation. For example, in layer 1, node 2 passes the processed message f_1(m_2) to node 1; then, in layer 2, node 1 processes the received message f_1(m_2) and returns the processed message f_0(f_1(m_2)) to node 2. Message passing is often random, so we endow the two mappings with an adjoint probability vector to reflect this randomness. Because the symbol space of the iterative paths of an Iterated Function System (IFS) with two mappings is also {0, 1}^n, and each mapping is selected with a certain probability, the iterative process of an IFS mirrors the message passing process. In other words, the above message passing pattern can be described perfectly by an IFS with probabilities. We therefore naturally present the Iterated Graph Neural Network System (IGNNS), whose core layer is constructed from an IFS. Figure 1 illustrates the differences in message passing patterns among GCN, Bi-GCN and IGNNS. At the same time, we regard an undirected graph as a directed graph with equal probability of bidirectional message passing (see Figure 1(a)), so the IGNNS architecture can handle directed and undirected graphs in a unified way.
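The random-iteration view of an IFS with probabilities can be illustrated with a minimal sketch (this is not the paper's implementation; the maps f0, f1 and the probability p0 below are hypothetical one-dimensional contractions chosen only for demonstration). Each iteration picks one of the two maps at random, so the realized sequence of choices is exactly a path in {0, 1}^n:

```python
import random

def f0(x):
    return 0.5 * x          # "forward" map (hypothetical contraction, c = 0.5)

def f1(x):
    return 0.5 * x + 0.5    # "backward" map (hypothetical contraction, c = 0.5)

def iterate_ifs(x, n, p0=0.5, seed=0):
    """Random iteration: at each step select f0 with probability p0, else f1."""
    rng = random.Random(seed)
    path = []
    for _ in range(n):
        if rng.random() < p0:
            x, sym = f0(x), "0"
        else:
            x, sym = f1(x), "1"
        path.append(sym)
    return x, "".join(path)

x, path = iterate_ifs(0.3, 8)
print(path)   # one realized iteration path, an element of {0,1}^8
```

Starting from any point in [0, 1], the iterates stay in [0, 1], the attractor of this particular pair of maps; which path is realized depends on the random draws, with 2^n possible paths after n steps.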

2. PRELIMINARIES

A graph G = (V, E) is defined by its node set V = {v_1, v_2, ..., v_N} and edge set E = {(v_i, v_j) | v_i, v_j ∈ V}. Let A ∈ R^{N×N} denote the adjacency matrix of G, providing relational information between nodes. A[i, j] denotes the (i, j)-th element of A, A[i, :] the i-th row, and A[:, j] the j-th column. In this paper, we assume that all nodes of G are self-adjacent, that is, A[i, i] = 1 for i = 1, 2, ..., N. Let D = diag(d_1, d_2, ..., d_N) be the degree matrix of A, where d_i = Σ_{j=1}^N A[i, j].

Neighborhood Normalization. There are two ways to normalize A. One approach is the mean-pooling normalization employed by Hamilton et al. (2017) and Veličković et al. (2019) for inductive learning: A_mp = D^{-1} A. Another is the symmetric normalization employed by Kipf & Welling (2017): A_sym = D^{-1/2} A D^{-1/2}.

Iterated Function System. A mapping f : R^N → R^N is said to be contractive on R^N if there exists a constant 0 < c < 1 such that ‖f(x_1) − f(x_2)‖_2 < c ‖x_1 − x_2‖_2 for all x_1, x_2 ∈ R^N. An iterated function system (Hutchinson, 1981) is defined by IFS = {R^N; f_1, f_2, ..., f_m; p}, where each f_i : R^N → R^N is a contractive mapping and p = (p_1, p_2, ..., p_m) is an adjoint probability vector, meaning that f_i is selected with probability p_i at each iteration. Hutchinson (1981) showed that there exists a unique nonempty compact set F such that F = ∪_{i=1}^m f_i(F).
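The two normalizations can be sketched in a few lines of numpy (the 3-node adjacency matrix below is a toy example, not one of the paper's datasets; self-loops A[i, i] = 1 are included as assumed above):

```python
import numpy as np

# Toy 3-node graph with self-loops on the diagonal.
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
d = A.sum(axis=1)                     # degrees d_i = sum_j A[i, j]
D_inv = np.diag(1.0 / d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

A_mp = D_inv @ A                      # mean-pooling: D^{-1} A
A_sym = D_inv_sqrt @ A @ D_inv_sqrt   # symmetric: D^{-1/2} A D^{-1/2}

print(A_mp.sum(axis=1))               # each row of A_mp sums to 1
```

Note the difference in the two results: every row of A_mp sums to 1 (a row-stochastic matrix), while A_sym is symmetric whenever A is.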



Figure 1: Message passing patterns. Here H denotes the representations of all the nodes. (a) An undirected graph is transformed into a directed graph in a natural way. (b) Regardless of direction, each node simply gathers information from its neighbors. (c) Messages are passed in the same direction (forward or backward), and two hidden representations are obtained independently. (d) Message passing not only occurs in the same direction, but also occurs interactively in different directions, which is more in line with the actual situation. For example, in layer 1, node 2 passes the processed message f_1(m_2) to node 1; then, in layer 2, node 1 processes the received message f_1(m_2) and returns the processed message f_0(f_1(m_2)) to node 2.

