GRAPH DEFORMER NETWORK

Abstract

Convolution learning on graphs has drawn increasing attention recently due to its potential applications to large amounts of irregular data. Most graph convolution methods leverage plain summation/average aggregation to avoid discrepant responses from isomorphic graphs. However, such extreme collapsing results in structural loss and signal entanglement of nodes, which further degrade the learning ability. In this paper, we propose a simple yet effective graph deformer network (GDN) to fulfill anisotropic convolution filtering on graphs, analogous to the standard convolution operation on images. Local neighborhood subgraphs (acting like receptive fields) with different structures are deformed into a unified virtual space, coordinated by several anchor nodes. In space deformation, we transfer components of the nodes therein onto affinitive anchors by learning their correlations, and build a pseudo multi-granularity plane calibrated with the anchors. Anisotropic convolution kernels can then be performed over the anchor-coordinated space to well encode local variations of receptive fields. By parameterizing anchors and stacking coarsening layers, we build a graph deformer network in an end-to-end fashion. Theoretical analysis indicates its connection to previous work and shows a promising property for isomorphism testing. Extensive experiments on widely used datasets validate the effectiveness of the proposed GDN on node and graph classification.

1. INTRODUCTION

A graph is a flexible and universal data structure consisting of a set of nodes and edges, where a node can represent any kind of object and an edge indicates some relationship between a pair of nodes. Research on graphs is not only important in theory, but also beneficial to a wide range of applications. Recently, advanced by the powerful representation capability of convolutional neural networks (CNNs) on grid-shaped data, the study of convolution on graphs has drawn increasing attention in the fields of artificial intelligence and data mining. So far, many graph convolution methods (Wu et al., 2017; Atwood & Towsley, 2016; Hamilton et al., 2017; Velickovic et al., 2017) have been proposed, raising a promising direction. The main challenge is the irregularity and complexity of graph topology, which causes difficulty in constructing convolutional kernels. Most existing works take the plain summation or average aggregation scheme, and share one kernel across all nodes, as shown in Fig. 1(a). However, they have two nonignorable weaknesses: i) losing the structure information of nodes in the local neighborhood, and ii) causing signal entanglement of nodes due to collapsing onto one central node. An accompanying problem is that the discriminative ability of node representations is impaired, and non-isomorphic graphs/subgraphs may further produce the same responses. In contrast, for the standard convolutional kernel used on images, it is important to encode the variations of local receptive fields. For example, a 3 × 3 kernel can well encode the local variations of 3 × 3 patches. An important reason is that the kernel is anisotropic with respect to spatial positions: each pixel position is assigned a different mapping. However, due to the irregularity of graphs, defining and operating such an anisotropic kernel on graphs is intractable. To deal with this problem, Niepert et al.
(Niepert et al., 2016) attempted to sort and prune neighboring nodes, and then run different kernels on the ranked, size-fixed node sequences. However, this deterministic method is sensitive to node ranking and prone to being affected by graph noise. Furthermore, some graph convolution methods (Velickovic et al., 2017; Wang et al., 2019) introduce an attention mechanism to learn the importance of nodes. Such methods emphasize mining significant structures/features rather than designing anisotropic convolution kernels, so in essence they cannot well represent local variations of structures. In this work, we propose a novel yet effective graph deformer network (GDN) to implement anisotropic convolutional filtering on graphs, as shown in Fig. 1(b), exactly behaving like the standard convolution on images. Inspired by image-based convolution, we deform local neighborhoods of different sizes into a virtual coordinate space, implicitly spanned by several anchor nodes, where each space granularity corresponds to one anchor node. To perform the space transformation, we define correlations between neighbors and anchor nodes, and project neighboring nodes into the regular anchor space. Thereby, irregular neighborhoods are deformed into the anchor-coordinated space. Then, image-like anisotropic convolution kernels can be imposed on the anchor-coordinated plane, and local variations of neighborhoods can be perceived effectively. Due to the importance of anchors, we also deform anchor nodes with adaptive parameters to match the feature space of nodes.
As the anisotropic convolution kernels are endowed with a fine-grained encoding ability, our method can better perceive subtle variations of local neighborhood regions and reduce signal confusion. We also show its connection to previous work, and theoretically analyze its stronger expressive power and its favorable property with respect to the isomorphism test. Extensive experiments on graph/node classification further demonstrate the effectiveness of the proposed GDN.

2. OUR APPROACH

In this section, we elaborate on the proposed graph deformer method. Below we first give an abstract formulation and then elaborate on the details. Denote by G = (V, E) an undirected graph, where V represents the set of nodes with |V| = n and E is the set of edges with |E| = e. According to the link relations in E, the corresponding adjacency matrix is defined as A ∈ R^{n×n}, and X ∈ R^{n×d} is the feature matrix. For convenience, we use X_{i•} or x_i to denote the feature of the i-th node. Besides, for a node v_i, the first-order neighborhood consists of the nodes directly connected to v_i, denoted as N^1_{v_i} = {v_j | (v_j, v_i) ∈ E}. Accordingly, we define the s-order neighborhood N^s_{v_i} as the set of s-hop reachable nodes.
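As a concrete illustration of the notation above, the s-order neighborhood N^s_{v_i} can be computed from the adjacency matrix by breadth-first expansion. The following is a minimal NumPy sketch (not the authors' code; the function name is our own):

```python
import numpy as np

def s_order_neighborhood(A, i, s):
    """Nodes reachable from node i within s hops (excluding i itself).

    A: (n, n) binary adjacency matrix of an undirected graph.
    Uses repeated frontier expansion, i.e. breadth-first search.
    """
    n = A.shape[0]
    reached = np.zeros(n, dtype=bool)
    frontier = np.zeros(n, dtype=bool)
    frontier[i] = True
    for _ in range(s):
        # expand one hop: any node adjacent to the current frontier,
        # not counted before and not the reference node itself
        frontier = (A[frontier].sum(axis=0) > 0) & ~reached
        frontier[i] = False
        reached |= frontier
    return np.flatnonzero(reached)
```

For s = 1 this recovers N^1_{v_i} exactly; larger s enlarges the receptive field as used in Sec. 2.3.3.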

2.1. A BASIC FORMULATION

Given a reference node v_r in graph G, we need to learn its representation based on the node itself as well as its contextual neighborhood N_{v_r}. However, the irregularity causes difficulty in designing an anisotropic spatial convolution. To address this problem, we introduce anchor nodes to deform the neighborhood: all neighboring nodes are calibrated into a pseudo space spanned by anchors. We denote the set of anchor nodes by V̂ = {v̂_0, v̂_1, ..., v̂_{m−1}}. The convolution on N_{v_r} is formulated as

    x_r = (G * f)(v_r) = C(F^{(r)}, K),    (1)
    F^{(r)}_i = Σ_{v_t ∈ N_{v_r}} D_{v_t → v̂_i}(x̂_i, x_t, Θ),    (2)

where

• F^{(r)} ∈ R^{m×d} is the deformed multi-granularity feature of the neighborhood of node v_r; each granularity F^{(r)}_i corresponds to an anchor node v̂_i;
• C and K are the anisotropic convolution operation on the anchor space and the convolution kernel, respectively; G * f represents the filter f acting on graph G.

The relationships among anchor nodes can be built with metrics such as the cosine distance, and the anchor nodes may be formatted as a pseudo 2-D grid, just like a patch in images. Please see the details in Section 2.3.2.

2.2. ANCHOR GENERATION

Anchor nodes are crucial to the graph convolution process, because neighborhood regions are uniformly calibrated with them. Rigid anchors cannot adapt to the variations of the feature space during convolution learning, so we choose to optimize the anchor nodes as one part of the entire network learning. In the beginning, we cluster nodes randomly sampled from the graph to obtain initial anchors. When enough anchors cover the space of neighborhood nodes, the anchors are endowed with a strong expressive ability to encode neighborhoods, like a code dictionary. Formally, we use K-means clustering to generate the initial anchors,

    V̂ ← Clustering({(v_i, x_i) | v_i ∈ V_sampling}),    (3)

where V_sampling is the set of nodes randomly sampled from the graph, and V̂ = {(v̂_k, x̂_k)}_{k=0}^{m−1} is the initial anchor set generated by clustering, in which v̂_k denotes the k-th anchor node, x̂_k its feature vector, and m the number of anchor nodes. Note that, given the anchor nodes, the response of our method is invariant to any permutation of the nodes of a graph, during both the training and testing stages. The clustering algorithm might affect the final anchors due to the random sampling for initialization, but this does not affect the permutation invariance, just as random initialization of the network parameters does not. A larger m could increase the expressive capacity of the anchors, but causes more redundancy and a larger computational cost. Due to the sparsity of graphs, in practice, several anchors are sufficient to encode each neighborhood region. To better collaborate with node/feature variations during graph convolution learning, we transform the initial anchors into a proper space by parameterizing them:

    a_k = ReLU(W_A x̂_k + b_A),  k = 0, 1, ..., m − 1,    (4)

where W_A and b_A are learnable parameters, and ReLU is the classic activation function. Other flexible multi-layer networks may also be selected to learn deformable anchors.
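The anchor-generation step above can be sketched as follows: a plain Lloyd's k-means over randomly sampled node features for initialization, then the learnable transform a_k = ReLU(W_A x_k + b_A). This is a minimal NumPy illustration, not the authors' implementation; the function names and the sampling/iteration parameters are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_anchors(X, m, n_samples=100, iters=20):
    """Initial anchors via k-means on randomly sampled node features.

    X: (n, d) node feature matrix; m: number of anchors.
    A bare-bones Lloyd's algorithm; the paper only specifies
    "K-means on sampled nodes", so details here are assumptions.
    """
    idx = rng.choice(X.shape[0], size=min(n_samples, X.shape[0]), replace=False)
    S = X[idx]
    centers = S[rng.choice(S.shape[0], size=m, replace=False)]
    for _ in range(iters):
        # assign each sampled node to its nearest center
        d2 = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for k in range(m):
            pts = S[assign == k]
            if len(pts):
                centers[k] = pts.mean(0)
    return centers

def deform_anchors(anchors, W_A, b_A):
    """Learnable anchor transform a_k = ReLU(W_A x_k + b_A)."""
    return np.maximum(anchors @ W_A.T + b_A, 0.0)
```

In the full model, W_A and b_A would be optimized jointly with the rest of the network; here they are plain arrays for illustration.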

2.3.1. SPACE TRANSFORMATION

Now we define the deformer function D in Eqn. (2), which transforms neighborhood nodes to the anchor space. For each node v_j ∈ N_{v_r}, we derive the anchor-related (query) feature and the value feature vectors as

    q_j = ReLU(W_Q x_j + b_Q),  j = 0, 1, ..., n_r − 1,    (5)
    u_j = ReLU(W_U x_j + b_U),  j = 0, 1, ..., n_r − 1,    (6)

where W_Q, W_U are learnable weight matrices and b_Q, b_U are biases. The query feature q_j indicates how to transform v_j to the anchor space by interacting with the anchors, and the value vector u_j is the component transferable to the anchor space. For the neighborhood N_{v_r}, the correlation to the anchors defines a set of weights α = {α_{j,k}}, j = 0, ..., n_r − 1, k = 0, ..., m − 1, which measures the scores of all nodes within the neighborhood projected onto the directions of the anchor nodes. Formally,

    α_{j,k} = exp(⟨q_j, a_k⟩) / Σ_{k′} exp(⟨q_j, a_{k′}⟩),  k = 0, 1, ..., m − 1,    (7)

where ⟨·, ·⟩ denotes the inner product, and the normalization is done by the softmax function. α_{j,k} may be viewed as the attention score of node v_j w.r.t. anchor v̂_k. After obtaining the attention scores, the irregular neighborhood can be transformed into the anchor-coordinated space,

    ũ_k = Σ_j α_{j,k} u_j,  j = 0, 1, ..., n_r − 1.    (8)

The deformed components are accumulated on each anchor and form the final deformed features. Thus, any neighborhood, whatever its size, can be deformed into the virtual normalized space coordinated by the anchors. In experiments, for simplicity, the query feature and value feature share the same parameters in Eqns. (5) and (6).
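The space transformation of Sec. 2.3.1 amounts to a softmax attention of neighbors over anchors followed by a weighted accumulation per anchor. A minimal NumPy sketch (function names and shapes are our assumptions, not the authors' code):

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def deform_neighborhood(X_nbr, anchors, W_Q, b_Q, W_U, b_U):
    """Project an irregular neighborhood onto the m anchors.

    X_nbr: (n_r, d) features of the neighbors of a reference node.
    anchors: (m, d_a) deformed anchor features a_k.
    Returns (m, d_u): one accumulated component per anchor,
    u~_k = sum_j alpha_{j,k} u_j with alpha = softmax_k(<q_j, a_k>).
    """
    Q = np.maximum(X_nbr @ W_Q.T + b_Q, 0.0)   # query features q_j
    U = np.maximum(X_nbr @ W_U.T + b_U, 0.0)   # value features u_j
    alpha = softmax(Q @ anchors.T, axis=1)     # (n_r, m) attention scores
    return alpha.T @ U                         # (m, d_u) deformed features
```

Because the result is a sum over neighbors, it is invariant to any permutation of the neighborhood, matching the property noted in Sec. 2.2, while handling neighborhoods of arbitrary size.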

2.3.2. ANISOTROPIC CONVOLUTION IN THE ANCHOR SPACE

Afterward, the s-hop neighborhood of node v_r is deformed into the size-fixed anchor space, i.e., N^s_{v_r} → {ũ_0, ũ_1, ..., ũ_{m−1}}. The anisotropic graph convolution can then be implemented by imposing a different mapping on each anchor:

    x^{(s)}_r = ReLU(Σ_i K_i ũ_i + b),  i = 0, 1, ..., m − 1,    (9)

where x^{(s)}_r ∈ R^{d′}, the matrix K_i is a d′ × d weight parameter imposed on the features w.r.t. the i-th anchor, and b is the bias vector. In the convolution process, different filter weights are imposed on the features of different anchor nodes, which is an anisotropic filtering operation. For an intuitive description, assume the simplest case of a 2-D space, which of course can be extended to higher dimensions: we project all neighborhood nodes onto the anchor nodes, and then employ a different filter on each anchor node in the 2-D plane, which resembles the standard convolution; hence we call it anisotropic convolution. In contrast to traditional aggregation methods, the deformer convolution has two advantages: i) it well preserves structure information and reduces signal entanglement; ii) it transforms different-sized neighborhoods into the size-fixed anchor space, enabling anisotropic convolution just like the standard convolution on images.
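The anisotropic filtering can be sketched as a distinct weight matrix per anchor position, in contrast to the single shared weight of plain aggregation. A hedged NumPy illustration (not the authors' code; the function name is ours):

```python
import numpy as np

def anisotropic_conv(U_tilde, K, b):
    """x_r = ReLU(sum_i K_i u~_i + b): one weight matrix per anchor.

    U_tilde: (m, d) deformed features, one row per anchor.
    K: (m, d_out, d) stack of per-anchor kernels K_i -- the anisotropy:
    each anchor position gets its own mapping, like the positions
    of a 3x3 image kernel.
    b: (d_out,) bias vector.
    """
    out = b.copy()
    for i in range(K.shape[0]):
        out = out + K[i] @ U_tilde[i]   # anchor i gets its own filter K_i
    return np.maximum(out, 0.0)
```

The loop is equivalent to a single `np.einsum('iod,id->o', K, U_tilde)`; setting all K_i equal would collapse this back to an isotropic (shared-kernel) filter.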

2.3.3. MULTI-SCALE EXTENSION

Intuitively, the first-order neighborhood is necessary for node aggregation, because two nodes linked by an edge tend to be similar. However, real-world graphs are often sparse, and many nodes are similar to each other without being linked by direct edges. The first-order neighborhood alone is not sufficient for extracting useful features and preserving the structural information, so it is natural to incorporate higher-order proximity to capture more information. Generally, second-order information is sufficient, as in most works (Tang et al., 2015; Wang et al., 2016). Higher-order information can also be considered, but the computational complexity increases accordingly; this is a trade-off between expressive ability and computational complexity. In this paper, we consider both first-order and second-order neighborhoods. Specifically, we deform both neighborhoods into the feature space represented by the anchor nodes, and convolve over them respectively. Then the learned neighborhood representations and the original node feature are concatenated as the final filtering response,

    x_r ← [x_r ; x^{(1)}_r ; x^{(2)}_r],

where x^{(s)}_r denotes the convolution response on the s-order neighborhood N^s_{v_r} of node v_r. Further, we can stack multiple layers to extract more robust features on larger receptive fields.
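The multi-scale concatenation can be sketched as follows, with the per-scale convolution abstracted into a callable that stands in for the deformation plus anisotropic filtering of Secs. 2.3.1 and 2.3.2 (names are our own):

```python
import numpy as np

def multi_scale_response(x_r, conv, neighborhoods):
    """x_r <- [x_r ; x_r^(1) ; x_r^(2) ; ...]: concatenate the node's
    own feature with the convolution response on each neighborhood scale.

    conv: callable mapping a (n_s, d) neighbor-feature block to a
    response vector (stand-in for deform + anisotropic filtering).
    neighborhoods: list of (n_s, d) arrays, one per scale s = 1, 2, ...
    """
    parts = [x_r] + [conv(X_s) for X_s in neighborhoods]
    return np.concatenate(parts)
```

With two scales this realizes x_r ← [x_r; x^{(1)}_r; x^{(2)}_r]; the output dimension grows with the number of scales, which is the trade-off mentioned above.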

2.4. COARSENING

Graph coarsening can not only reduce the computational cost but also enlarge the receptive field to learn abstract features, like image pooling. Below we briefly introduce the operation used here.

Node Classification. We do not need to remove nodes, and thus name the operation pooling. The pooling is node-wise diffusion on a local region, and is performed over multi-scale neighborhoods. The pooling over S scales of neighborhoods w.r.t. the reference node v_r is

    P(G(v_r)) = P({x_j | v_j ∈ N^s_{v_r}, s = 1, ..., S}),

where the pooling P is usually defined as "max" or "mean". In practice, their performance differs little in graph convolution, so we choose the mean operation in our experiments.

Graph Classification. We employ the graph cut method used in (Jiang et al., 2019) to partition an entire graph into several subgraphs. During graph coarsening, a binary cluster matrix Z ∈ R^{n×c} is obtained, where only one element in each row is non-zero, i.e., Z_ic = 1 when vertex v_i falls into cluster c. Then the adjacency matrix and feature matrix of the input graph are transformed as

    A ← ZᵀAZ,  X ← Z ⊗ X,

where ⊗ represents a max operation. The output can be used as the input of the next convolutional layer, so graph convolution and coarsening can be alternately stacked into a deep network.
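For graph classification, the coarsening step A ← ZᵀAZ, X ← Z ⊗ X can be sketched as below; this is a minimal NumPy illustration assuming every cluster is non-empty, with the ⊗ max operation taken element-wise over the member nodes of each cluster:

```python
import numpy as np

def coarsen(A, X, Z):
    """Graph coarsening with a binary cluster-assignment matrix Z (n, c).

    A <- Z^T A Z aggregates edge weights between clusters;
    X <- per-cluster element-wise max over member node features
    (the "max" realization of the (x) operation in Sec. 2.4).
    """
    A_c = Z.T @ A @ Z
    c = Z.shape[1]
    X_c = np.stack([X[Z[:, k].astype(bool)].max(axis=0) for k in range(c)])
    return A_c, X_c
```

The coarsened pair (A_c, X_c) feeds the next convolutional layer, so convolution and coarsening can be stacked alternately.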

2.5. LOSS FUNCTION

For node and graph classification, the final output after forwarding through several network layers is denoted as Y. We use the cross-entropy loss on the training set D_tr,

    L = −(1/|D_tr|) Σ_{v_i ∈ D_tr} Σ_j 1(y_ij = 1) ln Y_ij,

where y denotes the ground-truth (one-hot) labels and 1(·) is the indicator function. In the scenario of semi-supervised node classification, a main limitation is that only a small portion of nodes is annotated as the training set. Our aim is to use the labeled nodes as well as the graph structure to train a model with good generalization ability. A straightforward way is to add a regularization term to avoid overfitting. To this end, we employ a global consistency constraint through positive pointwise mutual information, as used in (Zhuang & Ma, 2018), to regularize the loss function.
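The supervised part of the loss reduces to standard cross-entropy over the labeled nodes; the PPMI-based global consistency regularizer of (Zhuang & Ma, 2018) is omitted here. A minimal sketch (ours, not the authors' code):

```python
import numpy as np

def cross_entropy_loss(Y_hat, y_true):
    """L = -(1/|D_tr|) sum_i ln Y_hat[i, y_i]: cross-entropy over
    labeled training nodes, with labels given as class indices
    (equivalent to the one-hot indicator form in the text).

    Y_hat: (n_tr, C) predicted class probabilities.
    y_true: (n_tr,) integer ground-truth labels.
    """
    n = Y_hat.shape[0]
    # small epsilon guards against log(0)
    return -np.log(Y_hat[np.arange(n), y_true] + 1e-12).mean()
```

In a semi-supervised setting a regularization term would be added to this quantity before backpropagation.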

2.6. COMPUTATIONAL COMPLEXITY

For the computational complexity, we analyze the main module of graph convolution. In one convolution layer, GCN (Kipf & Welling, 2016) costs about O(edS + ndd′S), where n, e are the numbers of nodes and edges, S is the scale of the neighborhood, and d, d′ are the dimensions of the input/hidden features. For our GDN model, the computational complexity mainly comes from two parts, "Space Transformation" and "Convolution in Anchor Space", which cost about O(emd²S) and O(nmdd′S), respectively, where m is the number of anchor nodes. Thus, the total computational complexity is O(emd²S + nmdd′S), which is linearly proportional to that of GCN with the factor m when d and d′ are of the same order.
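The comparison can be made concrete with a small helper that evaluates the two asymptotic expressions (constants are dropped; `flops_estimate` and the argument names are our own, not from the paper):

```python
def flops_estimate(n, e, m, d, d_out, S):
    """Order-of-magnitude cost of one GCN layer vs. one GDN layer
    (Sec. 2.6); these are proportionality counts, not exact FLOPs.

    gcn: O(e*d*S + n*d*d_out*S)
    gdn: O(e*m*d^2*S + n*m*d*d_out*S)
    """
    gcn = e * d * S + n * d * d_out * S
    gdn = e * m * d**2 * S + n * m * d * d_out * S
    return gcn, gdn
```

With d_out = d, the GDN count is exactly m times O(ed²S + nd²S), reflecting the stated linear dependence on the number of anchors.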

3. CONNECTION TO RELATED WORK

In contrast to previous methods (Kipf & Welling, 2016; Xu et al., 2019; Zhuang & Ma, 2018), etc., our way of aggregation is clearly different. GCNs usually bypass the irregularity of graphs by utilizing a weighted-sum aggregation over the neighborhoods, which is an isotropic filter. Our GDN instead first transforms the local neighborhoods into a regular anchor space and then performs anisotropic filtering on that space. In contrast to the Transformer (multi-head attention mechanism) (Vaswani et al., 2017) and its generalization to graphs, the graph attention network (Velickovic et al., 2017), we note the following differences. In the Transformer, every center node acts as an anchor, and the attention coefficient is computed between each central node and its neighbor nodes. In our GDN, several anchor nodes are initially generated by K-means clustering from the global node set, implicitly representing several different directions, like the cells (upper left, upper right, etc.) of a k × k patch in images, and the attention coefficient is computed between local neighbors and the anchor nodes. Because the anchor nodes are generated from the global graph, all local neighborhoods are projected into a common anchor space, and common properties across all local neighborhoods can be captured by imposing an anisotropic filter on the anchors. In contrast, though the multi-head mechanism is used, the Transformer still aggregates locally. Another weakness of the Transformer is that it is computationally intensive, especially for large graphs. Our proposed GDN also differs from other peer works. P-GNN (You et al., 2019) selects several anchor nodes as a position characterization, concatenates the feature of a node with the features of anchor nodes, and then employs mean aggregation on different anchor sets followed by a fully-connected transform, which is an isotropic filter.
LGCN (Gao et al., 2018) performs feature selection by sorting the top k-largest values on each feature dimension, and produces k new nodes (of fixed size) for the next filtering process. CapsGNN (Xinyi & Chen, 2018) presents a capsule graph network to learn graph capsules of different aspects from node capsules by utilizing an attention module and a routing mechanism. In contrast, our proposed method transforms local neighborhoods into an anchor space spanned by several anchor nodes, and the filtering is operated in that anchor space. Transforming irregular structures into a regular space has also been studied in the 3D domain. PointCNN (Li et al., 2018b) learns an X-transformation from the input points and then applies a convolution operator on the X-transformed features. KPConv (Thomas et al., 2019) takes radius neighborhoods as input and processes them with weights spatially located by a small set of kernel points. In essence, PointCNN (Li et al., 2018b) leverages the self-attention mechanism to produce fixed-size outputs for different-sized local neighbor regions, while KPConv (Thomas et al., 2019) projects neighboring 3D points onto 3D kernel points, a projection operated only in 3D point space. In contrast, our proposed method focuses on the more general graph domain. More related work can be found in the Appendix.

4. THEORETICAL ANALYSIS

Here, we present a theoretical analysis of the expressive power of several aggregation methods, including the mean, the sum, and the proposed graph deformer operation. Inspired by (Xu et al., 2019), we evaluate them by verifying whether graphs are isomorphic, and give the following propositions.

Proposition 1. There exists a set of network parameters such that the graph deformer process can distinguish two non-isomorphic graphs G_1 and G_2 that cannot be distinguished by mean/sum aggregation.

Proposition 2. The proposed anisotropic graph deformer convolution can be as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test.

The proofs of the above two propositions can be found in the Appendix. We conclude that the expressive power of the proposed graph deformer network is provably stronger than mean/sum aggregation, and that it can realize an injective mapping as powerful as the WL graph isomorphism test.

[Fragment of Table 1 (node classification accuracy, Cora / Citeseer / Pubmed):
DGCN (Zhuang & Ma, 2018): 83.5 / 72.6 / 80.0
Spatial:
MoNet (Monti et al., 2017): 81.7 / – / 78.8
GAT (Velickovic et al., 2017): 83.0 / 72.5 / 79.0
JK-Net (Xu et al., 2018): 79.71 ± 0.62 / 69.03 ± 0.55 / 78.17 ± 0.27
GIN (Xu et al., 2019): 79 (row truncated)]

5. EXPERIMENTS

In this section, we carry out extensive experiments to assess the proposed GDN model on both node and graph classification tasks. For node classification, three citation graphs are used: Cora, Citeseer, and Pubmed. For graph classification, we adopt seven datasets: MUTAG, PTC, NCI1, PROTEINS, ENZYMES, IMDB-BINARY, and IMDB-MULTI. The details of these datasets and the experimental setups can be found in the Appendix.

5.1. COMPARISON WITH STATE-OF-THE-ARTS

Node classification. We compare the performance of GDN against several baselines: ChebyNet (Defferrard et al., 2016), GCN (Kipf & Welling, 2016), MoNet (Monti et al., 2017), GAT (Velickovic et al., 2017), DGCN (Zhuang & Ma, 2018), JK-Net (Xu et al., 2018) and GIN (Xu et al., 2019). The accuracies are reported in Table 1, which clearly indicates that our GDN obtains a remarkable improvement. DeepWalk and Planetoid aim to generate effective node embeddings, and our proposed GDN significantly outperforms both. GCN is a first-order approximation of ChebyNet and achieves relatively higher results; it can also be regarded as a special case of MoNet, and their classification accuracies are similar. Compared to GCN, our GDN achieves a relatively large gain, which we attribute to the graph deformer convolution. We further compare GDN to GAT, still achieving superior performance on all three datasets. Though GDN utilizes the same global consistency constraint as DGCN, it still shows a marked improvement over DGCN. Compared to the recent methods JK-Net and GIN, GDN also outperforms them by a large margin. These results demonstrate that the proposed GDN performs well on various graph datasets by building the graph deformer process, where structure variations can be well captured and fine-grained node features can be extracted to enhance the discriminability between nodes.

Graph classification.

Table 2 shows the results on graph classification. Overall, except for PTC, our GDN approach achieves state-of-the-art performance on all datasets and obtains remarkable improvements. Among the graph kernel-based methods (WL (Shervashidze et al., 2011), GK (Shervashidze et al., 2009) and DGK (Yanardag & Vishwanathan, 2015)), the WL kernel obtains better results on most datasets than GK and DGK. In contrast to WL, the proposed GDN improves by a large margin of 5.9% on NCI1, 14.35% on ENZYMES, 6.44% on IMDB-BINARY, etc. GDN also obviously outperforms the feature-based methods (FB (Barnett et al., 2016), DyF (Gomez et al., 2017)), as well as SAEN (Orsini et al., 2017). The recent GNN-based works (PSCN (Niepert et al., 2016), NgramCNN (Luo et al., 2017), IGN (Maron et al., 2018), PPGN (Maron et al., 2019), GNTK (Du et al., 2019), CapsGNN (Xinyi & Chen, 2018), GIC (Jiang et al., 2019), GIN (Xu et al., 2019)) are superior to traditional machine learning methods. Compared to GIC, the GDN model still achieves superior performance, by about 3 percentage points on average, although a relatively lower result is obtained on the PTC dataset; this may be attributed to differences in the dataset or less appropriate model parameter settings. In summary, the remarkable gains indicate that the proposed GDN is effective for graph classification.

5.2. ABLATION STUDY

The scale s of the neighborhood region. The influence of neighborhood scales is reported in Table 3. GDN-N(0) denotes that only the feature of the node itself is used, GDN-N(0,1) includes the features of the node itself and its first-order neighborhood, and so on. Due to the lack of structural information, the performance of GDN-N(0) is obviously lower. As more information is considered, the accuracy of GDN-N(0,1,2) is generally superior to GDN-N(0,1). This validates the importance of local neighborhood information, which is also a crucial property of traditional CNNs.

The number m of anchor nodes. We select m in the range [1, 25] to observe the changes in performance. As shown in Fig. 2, when m = 1 the accuracies are significantly lower, because only one anchor node is used, which is similar to sum aggregation. When m = 2, the performance is also relatively low, meaning 2 anchor nodes are insufficient to capture the variations. The performance then becomes relatively stable as m increases. The reason should be that real-world graph data is rather sparse, e.g., on average about 2 edges per node in the Cora and Citeseer datasets, so a few anchors matching the neighborhood size are sufficient to represent the variations without information loss.

Graph convolution and graph coarsening. We further explore the effectiveness of graph convolution by removing the pooling layer from the GDN model, denoted "GDN w/o P". Similarly, "GDN w/o C" means that graph convolution is removed. Table 4 shows the performance on the citation datasets. Compared to GDN, both "GDN w/o P" and "GDN w/o C" obtain lower performance, and "GDN w/o P" is better than "GDN w/o C", while they are comparable on the Pubmed dataset. This indicates that the deformer convolution indeed improves the discriminability of nodes. Note that "GDN w/o C" is actually similar to plain aggregation with the average operation.

Attention scores α.
We visualize the correlation scores α between neighborhood nodes and anchors. We select some nodes from the Cora and Citeseer datasets as center nodes and compute the scores of their first-order neighbors w.r.t. the anchor nodes. As shown in Fig. 3, for node v_20 in the Cora dataset, the attention score of neighbor "A" on anchor node a_4 is the largest, while the other four neighbors are closer to anchor node a_1. For node v_153 in Citeseer, neighbors "A" and "C" are more inclined to anchor node a_6, neighbor "D" prefers a_12, and neighbor "B" is similar in most directions. These neighbors place different emphases on the anchor nodes, so different proportions of their features are assigned to the directions of these anchors, and the anisotropic convolution can extract a more fine-grained representation, which is superior to sum/mean aggregation.

6. CONCLUSION

In this work, analogous to the standard convolution on images, we proposed a novel yet effective graph deformer network (GDN) to fulfill anisotropic convolution filtering on graphs. We transform local neighborhoods with different structures into a unified virtual space, coordinated by several anchor nodes. Anisotropic convolution kernels can thus be performed over the anchor-coordinated space to well encode subtle variations of local neighborhoods. Further, we build a graph deformer network in an end-to-end learning fashion by stacking the deformable convolutional layers as well as the coarsening layers. Our proposed GDN achieves significantly better performance on both node and graph classification. In the future, we will extend the graph deformer method to more real-world applications, such as link prediction and heterogeneous graph analysis.



Figure 1: An illustration of our convolution vs. the previous one. The red node is a reference node. (a) In traditional graph convolution, the convolution kernel is shared by all nodes due to the plain aggregation over all nodes in the neighborhood. (b) In our method, the irregular neighborhood is deformed into a unified anchor space with a pseudo-grid shape, and then the anisotropic convolution kernel is used to encode the space variations of the deformable features.

Figure 2: Comparison on the number m of anchor nodes.

Figure 3: Visualization of attention scores in the first-order neighborhoods of nodes v_20 and v_30 in the Cora dataset, and v_54 and v_153 in the Citeseer dataset. (a) Node v_20 has 5 neighbors while (b) node v_30 has 6 neighbors. (c) Node v_54 has 3 neighbors while (d) node v_153 has 4 neighbors.

Table 1: Comparison with state-of-the-art methods on node classification. The number in parentheses (*) denotes the number of convolutional layers in the network.

Table 2: Comparison with state-of-the-art methods on graph classification.

Table 3: Comparison on the scales of neighborhood regions.

Table 4: The verification of the convolutional layer C and pooling layer P in our GDN method.

