RICCI-GNN: DEFENDING AGAINST STRUCTURAL AT-TACKS THROUGH A GEOMETRIC APPROACH

Abstract

Graph neural networks (GNNs) rely heavily on the underlying graph topology and thus can be vulnerable to malicious attacks that perturb graph structures. We propose a novel GNN defense algorithm against such attacks. In particular, we use a robust representation of the input graph based on the theory of graph Ricci flow, which captures the intrinsic geometry of graphs and is robust to structural perturbation. We propose an algorithm to train GNNs using graphs re-sampled from this geometric representation. We show that this method substantially improves robustness against various adversarial structural attacks, achieving state-of-the-art performance on both synthetic and real-world datasets.

1. INTRODUCTION

Recent years have witnessed the success of graph neural networks (GNNs) on many graph applications, including graph classification (Xu et al., 2019b), node classification (Kipf & Welling, 2016; Veličković et al., 2018), graph generation (You et al., 2018) and recommendation (Ying et al., 2018). As GNNs have shown great potential, their vulnerability to adversarial attacks (Szegedy et al., 2014; Goodfellow et al., 2015) becomes a serious concern that hinders their deployment in real-life critical applications. For example, a GNN algorithm for fraud detection in financial transaction graphs (Wang et al., 2019a) needs to be robust against attacks aiming to disguise fraudulent transactions as normal ones. In health informatics, prediction of polypharmacy side effects (Zitnik et al., 2018) must be robust against attacks that intend to endanger certain patients. In a recommendation system, the developers need to consider potential attacks from spammers who may create fake followers to increase the influence scope of fake news (Zhou & Zafarani, 2018). One way to attack a GNN model is to modify the graph topology by inserting or deleting edges (Jin et al., 2020a). A small perturbation of the network topology can significantly impair the graph neural network's performance (Dai et al., 2018; Zügner & Günnemann, 2019b). For example, Meta-Attack (Zügner & Günnemann, 2019a) can increase the misclassification rate of GCN on a political blog dataset by over 18% with only 5% perturbed edges. This is not surprising, as graph topology is essential for GNNs, both as the backbone of a GNN architecture and as a source of important structural features. In particular, the local neighborhood of each node is commonly used to define receptive fields for the convolution operator.
The statistics of the local neighborhood, e.g., node degrees, are important structural information used as additional node features (Veličković et al., 2018) to re-calibrate the convolutional operation (Kipf & Welling, 2016). In this paper, we focus on defending against global poisoning adversarial attacks, which corrupt the graph topology in the training phase. Some existing approaches assume the given graph is correct and leverage known robust training techniques, e.g., enforcing priors on the latent representation of data (Zhu et al., 2019). These solutions can still be limited by the corrupted graph, considering how critical the underlying graph is for a GNN model. Other methods assume prior knowledge of the graph topology and perform graph restructuring, e.g., via low-rank filtering (Entezari et al., 2020) or graph sparsification (Wu et al., 2019), hoping to remove abnormal edges introduced by the attack. These strong priors, although proven useful, also limit the generality of the method.

1.1. A GEOMETRIC VIEW OF GRAPHS

We take a novel direction to find a robust representation of the graph topology through a geometric lens. We view a discrete graph in a continuous framework, in which nodes lie in an underlying metric space and the connectivity of two nodes has a stochastic nature, depending on the features of the two nodes, their respective neighborhoods and the entire node distribution. The input graph G is replaced by an ensemble of graphs, considered as (randomized) discrete realizations of the same underlying metric space from which G is drawn. To do so, we recover the metric distance between two nodes in the underlying space through the Ricci flow metric on the input graph G. Note that we are not trying to explicitly find an embedding, which would involve choices (e.g., Euclidean vs. non-Euclidean, dimensionality) that introduce extra and unnecessary distortion. Instead, we represent the underlying metric space via pairwise geodesic distances between nodes. Our geometric approach is inspired by Riemannian geometry in the continuous setting (Hamilton, 1982; Perelman, 2002). On a Riemannian manifold, one can define Ricci curvature to measure the amount of 'bending' or 'curving' at each point. With Ricci curvature, one can define a diffusion process by changing the Riemannian metric (stretching or shrinking locally) such that curvature becomes uniform everywhere. This uniformization process is called Ricci flow. This theory can be extended to the graph setting (Ollivier, 2009). Generally speaking, edges that are locally well connected have positive curvature, while edges that are locally sparsely connected have negative curvature. In Ricci flow, edges of negative curvature are stretched (with increased edge weight) and edges of positive curvature are condensed (with decreased edge weight). These new edge weights that uniformize the Ricci curvature of the graph are called the Ricci flow metric. See Figure 1 for an illustration.
Graph Ricci curvature and Ricci flow can be used to identify critical edges in a graph (Ni et al., 2015; Sandhu et al., 2015) and to identify community structures (Ni et al., 2019; Sia et al., 2019). We also note that graph Ricci curvature has been used in GNNs for the node classification task (Ye et al., 2020), but not for defending GNNs against structural attacks. Robustness against topological perturbation. The Ricci flow metric has been shown to be robust to random deletion and addition of edges (Ni et al., 2018). This is due to the fact that Ricci flow is a global process that tries to uncover the underlying metric space supported by the graph topology, and thus embraces redundancy. Compared to other graph metrics, such as the hop count metric and the metric obtained by spectral embedding, the Ricci flow metric provides a better trade-off between robustness and representation power, as shown in Figure 3. When two edges are deleted, the Ricci flow metric is rarely affected (Figure 3(a)), similar to the hop count metric (Figure 3(c)), while the distance metric by spectral embedding is substantially more sensitive (Figure 3(b)). We note that the hop count metric is also robust to edge deletions due to the small-world phenomenon and multiple shortest paths in the graph; however, the hop count metric takes only integer values and generally lacks the descriptive power to provide desirable resolution and differentiation. To train a GNN using the Ricci flow metric, we generate an ensemble of sample graphs G_1, G_2, ..., and use a new sample in each network layer of the GNN in every training epoch (Figure 2). The trained model is therefore forced to focus on the underlying metric information represented by the graph (which is much more robust) and not on the particular input graph topology (which could be corrupted). Our method is agnostic to both models and attacks, and thus can be applied to different GNNs and different structural attacks.
We show on both synthetic and real-world datasets that the proposed algorithm effectively defends against various structural attacks, with improved performance compared to other defense schemes. We summarize our contributions as follows.
• We are the first to take a geometric view of the GNN defense problem. We propose to train GNNs with the Ricci flow representation of a graph instead of its attacked topology.
• We design a new algorithm to sample graphs based on the Ricci flow representation for training GNNs. This effectively alleviates the impact of structural attacks by adversaries.
• We demonstrate the efficacy of our method on various synthetic and real-world datasets, against state-of-the-art graph topology poisoning methods.

1.2. RELATED WORK

The vulnerability of deep neural network models to adversarial attacks is well known, and graph neural networks are not an exception (Dai et al., 2018; Zügner et al., 2018; Zügner & Günnemann, 2019a). Here we briefly review methods for attacking and defending GNNs. Adversarial attacks on graphs. There are two categories of attacks: evasion attacks and poisoning attacks. Evasion attacks generate fake samples for the trained model at test time, while poisoning attacks directly modify the training data. Dai et al. (2018) propose a reinforcement-learning based attack that perturbs the graph structure. Robustness of GNNs. To defend against these graph attacks, Miller et al. (2019) seek to increase model robustness by decoupling structure from attributes in the classifier and re-selecting the training data, but their method exhibits a trade-off between robustness and performance, i.e., the performance drops on clean data. Wang et al. (2019b) propose graph-encoder refining and adversarial contrastive learning. They investigate the vulnerabilities in every aggregation layer and the perceptron layer of a GNN encoder, and apply dual-stage aggregation and a bottleneck perceptron to address those vulnerabilities. They mainly focus on targeted node attacks (e.g., Nettack) instead of global topology attacks. RGCN (Zhu et al., 2019) treats node features as a Gaussian distribution and encodes the hidden representation of nodes by mean and variance matrices. It applies self-attention on the variance matrix to aggregate messages from neighboring nodes. However, this method only focuses on defense against random noise on node features. GCN-Jaccard (Wu et al., 2019) pre-processes the network by eliminating edges that connect nodes with sufficiently small Jaccard similarity of features. GCN-SVD (Entezari et al., 2020) proposes to vaccinate GCN with a low-rank approximation of the perturbed graph. Most of these existing methods provide insight into robustness from the perspective of optimization or matrix rank.
DropEdge (Rong et al., 2019) randomly removes a certain number of edges from the input graph at each training epoch. It is designed to resolve the over-fitting and over-smoothing issues of deep GCNs, but it can also be used to improve robustness. Pro-GNN (Jin et al., 2020b) jointly learns a structural graph and a robust graph neural network model from the perturbed graph, guided by the graph properties of sparsity, low rank and feature smoothness. In this paper, we approach graph robustness from a geometric view and provide an efficient sampling-based model.

2. GEOMETRIC RE-SAMPLING FOR ROBUST GNNS

Our method uses a robust geometric representation of the input graph to train a GNN by randomly sampling new graphs based on the Ricci flow representation. Figure 2 shows the general framework. Our method is agnostic to both the GNN model and the attack strategy. We start with a brief review of graph neural networks and poisoning attacks on graphs.

2.1. BACKGROUND: GNNS AND POISONING ATTACKS

We focus on the semi-supervised node classification task. Consider a graph G = (V, E) with node features H = (h_1, h_2, ..., h_N), h_i ∈ R^D, where N is the number of nodes and D is the feature dimension of each node. Only a subset of the nodes V_l ⊆ V is labeled, and our main task is to predict the labels of the remaining nodes V_u ⊆ V given the node features H, the edges E and the labels of V_l. A GNN essentially learns a low-dimensional representation of the nodes given the node features and graph structure. There have been various types of GNNs (Bruna et al., 2014; Gori et al., 2005), usually classified into two categories: spectral and spatial. Spectral graph neural networks extend CNNs to graphs by defining convolution filters in the spectral domain (Bruna et al., 2014; Defferrard et al., 2016; Kipf & Welling, 2016). They utilize the concept of the graph Fourier transform and define spectral filters on the eigenvalues of the graph Laplacian matrix. Spatial graph neural networks, on the other hand, define graph filters in the spatial domain. They iteratively update node representations by aggregating information from neighbors (Gilmer et al., 2017). Both the Laplacian matrix and the neighborhood relationship essentially encode the graph structure, so both types of GNNs rely heavily on the graph topology and are sensitive to structural perturbations. Take the well-known graph convolutional network (GCN) as an example (Kipf & Welling, 2016). GCN consists of multiple layers. Layer t updates the node representation from H_{t-1} to H_t: H_t = σ(Â H_{t-1} W_t), where H_0 = H, W_t is the parameter of layer t, and Â is a normalized version of the adjacency matrix A: Â = D̃^{-1/2} Ã D̃^{-1/2} with Ã = A + I and D̃ the degree matrix of Ã. As introduced in Section 1.2, a GNN is vulnerable to adversarial attacks.
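The GCN layer update above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in the paper; we assume a ReLU nonlinearity for σ:

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, as in Kipf & Welling (2016)."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops
    d = A_tilde.sum(axis=1)               # degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(A_hat, H, W):
    """One GCN layer: H_t = sigma(A_hat H_{t-1} W_t), with ReLU as sigma."""
    return np.maximum(0.0, A_hat @ H @ W)
```

For a d-regular graph, Â reduces to (A + I)/(d + 1), so each row of Â sums to 1 and the layer averages over the closed neighborhood before the linear map.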
Usually the attackers introduce unnoticeable perturbations by imposing restrictions that ensure the attack preserves the graph structure and node features. A non-targeted structural poisoning attack on graph G can be formulated as the following bilevel optimization problem:

argmin_{G' ∈ Φ(G)} L_attack(f_{θ*}(G'))   s.t.   θ* = argmin_θ L_train(f_θ(G')),   (2.1)

where f_θ is the GNN function for node embedding with parameters θ, and Φ(G) is the constraint set for the perturbed graph. As indicated in (Zügner et al., 2018; Zügner & Günnemann, 2019a), by treating the graph structure matrix A as a parameter (or hyper-parameter) and solving this optimization problem, the attackers can significantly decrease classification performance.

2.2. GEOMETRIC RESTRUCTURING OF GRAPHS

With the same insight as the unsupervised manifold hypothesis (Cayton, 2005; Narayanan & Mitter, 2010; Rifai et al., 2011) (real data in high-dimensional spaces concentrate near low-dimensional manifolds), we view a graph as a discretization of an underlying manifold. Any manifold can be described by a collection of (local) charts, where each chart is homeomorphic to an open set in a Euclidean space and overlapping charts are compatible (the transition from one chart to another is differentiable) (Lee & Lee, 2009). This allows one to define the geodesic distance between two points on the manifold. By using graph Ricci flow, we can recover this latent metric space, which is intrinsic to the input graph and robust to topological perturbations. Specifically, the discrete Ollivier Ricci curvature (Ollivier, 2009) κ_xy of the edge (x, y) involves the ratio between the Wasserstein distance (the optimal transport distance) W(m_x, m_y) and the geodesic distance d(x, y):

κ_xy = 1 − W(m_x, m_y) / d(x, y),   (2.2)

where m_x, m_y are two distributions defined on the neighborhoods of x and y, respectively. The details are in Appendix A.1. The computation of the Ricci flow metric involves multiple iterations until the edge weights do not change much (Ni et al., 2015; 2019; 2018). In each iteration, we calculate the Ricci curvature of each edge and adjust the current edge weight by a value proportional to the edge curvature. In the t-th iteration, all the new edge weights w^(t+1) are calculated as

w^(t+1)(x, y) = d^(t)(x, y) − κ^(t)_xy · d^(t)(x, y).   (2.3)

We re-normalize all the edge weights at the end of each iteration to keep the total edge weight unchanged. To speed up the computation, we use the Sinkhorn distance (Cuturi, 2013) (on a sampled neighborhood) as an approximation of the optimal transport distance.
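The curvature computation in Eq. (2.2), with the Sinkhorn approximation just mentioned, can be sketched as follows. This is an illustrative stdlib-only sketch, not the paper's implementation: it assumes an unweighted graph with hop-count distances, a lazy neighborhood measure with γ = 0.5, and an arbitrary regularization strength and iteration count:

```python
import math
from collections import deque

def bfs_distances(adj, src):
    """Hop-count distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def neighbor_measure(adj, x, gamma=0.5):
    """Lazy measure m_x: mass gamma on x itself, the rest uniform over neighbors."""
    m = {x: gamma}
    for v in adj[x]:
        m[v] = (1.0 - gamma) / len(adj[x])
    return m

def sinkhorn_cost(mu, nu, cost, eps=0.5, iters=5000):
    """Approximate W(mu, nu): transport cost of the entropy-regularized plan."""
    xs, ys = list(mu), list(nu)
    K = [[math.exp(-cost[x][y] / eps) for y in ys] for x in xs]
    u, v = [1.0] * len(xs), [1.0] * len(ys)
    for _ in range(iters):   # alternating Sinkhorn-Knopp scaling
        u = [mu[x] / sum(K[i][j] * v[j] for j in range(len(ys)))
             for i, x in enumerate(xs)]
        v = [nu[y] / sum(K[i][j] * u[i] for i in range(len(xs)))
             for j, y in enumerate(ys)]
    return sum(u[i] * K[i][j] * v[j] * cost[x][y]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def ollivier_curvature(adj, x, y, gamma=0.5):
    """kappa_xy = 1 - W(m_x, m_y) / d(x, y), Eq. (2.2), for edge (x, y)."""
    mx, my = neighbor_measure(adj, x, gamma), neighbor_measure(adj, y, gamma)
    cost = {u: bfs_distances(adj, u) for u in mx}
    return 1.0 - sinkhorn_cost(mx, my, cost) / bfs_distances(adj, x)[y]
```

On the complete graph K4 every edge is positively curved, while the bridge joining two triangles is negatively curved, matching the intuition in Figure 1.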
In practice, it takes less than 1 second to compute Ricci curvature for the Cora and Citeseer datasets (of 4732 edges), and 6.9 seconds for Polblogs (with 16714 edges) on a 36-core machine. The detailed formulas for computing Ricci curvature and Ricci flow can be found in Appendix A.2. Robustness of the Ricci flow metric. One way to visualize this is to consider embedding the graph on a manifold with uniform curvature. The edge weight on (u, v) describes the proximity of u and v on the manifold; it would require many changes to the graph connectivity to create significant changes in this metric space and the underlying manifold. The benefit of the new metric, according to the curvature definition, is the following property, which implies robustness to connectivity perturbation: the optimal transport distance from a distribution on the neighbors of u to a distribution on the neighbors of v is (1 − κ)d(u, v), which is just d(u, v) when the curvature κ converges to zero. In other words, there are paths connecting neighbors of u and neighbors of v that bypass the edge (u, v) and have similar length (in an average sense). This suggests that the removal of edge (u, v) causes only small changes to the lengths of (shortest) paths in the graph. This is desirable, as opposed to a graph metric where the removal of an edge incurs substantial changes to the distances of certain pairs of nodes (often in the neighborhood); such edges are especially susceptible to adversarial attacks. Depending on the graph topology, there are cases where the curvature upon convergence is not zero (for example, when the graph is sparse and tree-like, the curvature converges to a negative value). Similarly, the disturbance to the distances between nodes upon the removal of a single edge is disseminated in a global manner, so all pairwise distances suffer similarly small damage from the adversarial attack.
Graph re-sampling in GNNs. Instead of using the possibly poisoned input graph as training data, we re-sample a family of graphs from the Ricci flow metric and use this ensemble of graphs as the training data. The edges of a graph are sampled by imposing a Gaussian filter on each node using the Ricci flow metric distance. Using a Gaussian kernel to convert distances between data points into a similarity measure is common in settings that take a manifold viewpoint on input data. The all-pairs Ricci flow distance S between any two nodes is calculated as the geodesic distance based on the weights assigned to the edges. To keep the graph sparse, we only sample edges between pairs that are within k hops of each other in G (we take k = 2 in the experiments). Two nodes are connected by an edge with probability

P(S) = (1 / (σ√(2π))) exp(−(1/2)(S / (βσ))²).

In each epoch, we sample a graph G_i for each layer i of the graph convolution and apply a classical GCN to learn the weight parameters W for prediction. The pseudo-code for one GNN epoch is shown in Algorithm 1. Intuition. Running Ricci flow gives us a chance to recover edges that should exist but have not formed yet (or were removed by the attacker). Recall that a new graph is sampled at each layer in the graph convolution pipeline. If an edge added by the attack is not aligned with the main network structure (and the underlying metric space), it is unlikely to get consistent support in the re-sampling phase across multiple layers. If an edge from the attack is actually well aligned with the underlying metric space, it does little damage to performance. This observation is visualized in Figure 4. To show the benefit of the Ricci flow metric beyond the ensemble approach, we also compare with re-sampling using other graph metrics, such as spectral embedding and the hop count metric; see the experiment section.
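The re-sampling step can be sketched as below. We assume the Gaussian kernel P(S) = (1/(σ√2π)) exp(−(S/(βσ))²/2) as reconstructed from the description, clamped to at most 1 so it is a valid probability; the clamping and the function names are our illustrative choices:

```python
import math
import random

def edge_probability(S, sigma=0.4, beta=2.0):
    """Gaussian-kernel probability for an edge between nodes at Ricci flow
    distance S. Clamping to [0, 1] is our assumption, not from the paper."""
    p = (1.0 / (sigma * math.sqrt(2 * math.pi))) \
        * math.exp(-0.5 * (S / (beta * sigma)) ** 2)
    return min(1.0, p)

def sample_graph(candidate_pairs, dist, sigma=0.4, beta=2.0, rng=None):
    """Sample one graph: keep each candidate pair (those within k hops in the
    original graph) independently with probability edge_probability(dist)."""
    rng = rng or random.Random()
    return [(u, v) for (u, v) in candidate_pairs
            if rng.random() < edge_probability(dist[(u, v)], sigma, beta)]
```

Closer pairs in the Ricci flow metric are kept with higher probability, so a fresh sample per layer concentrates on the underlying metric structure rather than any single (possibly attacked) topology.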

3. EXPERIMENTS

Attack and defense baselines. We test our method against two poisoning attacks: RAND (randomly adding fake edges into the graph, provided by the DeepRobust (Li et al., 2020) library) and META (the meta-learning attack (Zügner & Günnemann, 2019a)), which treats the graph structure as a hyper-parameter and uses meta-gradients to solve the bilevel optimization problem in Eq. 2.1. We run the meta-learning attack with self-training (using predicted labels on unlabeled nodes) and exact meta-gradients, named Meta-Self, since it achieves state-of-the-art performance on most datasets. For defense methods, we compare the performance of graph attention networks (GAT) (Veličković et al., 2018), RGCN (Zhu et al., 2019), GCN-Jaccard (Wu et al., 2019), GCN-SVD (Entezari et al., 2020) and Pro-GNN (Jin et al., 2020b). Detailed descriptions of these baseline methods can be found in the supplementary materials. In all experiments, for each graph we run the training and inference tasks 20 times and take the average accuracy. For each training procedure, we run 100 epochs and use the best model based on validation performance. Results on synthetic datasets. We evaluate our method on synthetic graphs generated from the Stochastic Block Model (SBM) (Holland et al., 1983). We create 24 random graphs, each with 1000 nodes equally partitioned into five communities. Within each community, two nodes are connected with intra-class probability p ∈ {0.07, 0.09, 0.11, 0.13}. Nodes from different classes connect with a lower inter-class edge probability q ∈ {0.025, 0.03, 0.035, 0.04, 0.045, 0.05}. Communities 4 and 5 only have edges to community 1. For each generated graph, we randomly select 100 nodes as the training set and the remaining 900 as the testing set for attack and defense. We assign each node a node-ID feature using one-hot encoding.
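SBM graphs of this kind can be generated with a short, self-contained routine. This is a simplified sketch with uniform intra/inter probabilities on small blocks, not the exact 1000-node, 5-community construction used above:

```python
import random

def sbm_graph(sizes, p_intra, q_inter, seed=0):
    """Generate a Stochastic Block Model graph: nodes in the same block
    connect with probability p_intra, across blocks with q_inter."""
    rng = random.Random(seed)
    labels = [c for c, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):      # each unordered pair once, no self-loops
            p = p_intra if labels[u] == labels[v] else q_inter
            if rng.random() < p:
                edges.append((u, v))
    return edges, labels
```

Empirical intra- and inter-community edge densities concentrate around p_intra and q_inter, which is what makes the community structure recoverable by the Ricci flow metric.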
We use a 2-layer GCN (Kipf & Welling, 2016) as the default graph neural network and the meta-learning attack (Zügner & Günnemann, 2019a) as the attacking method. 5% of the total edges are perturbed under the node-degree distribution constraint as in (Zügner et al., 2018). For attacking, we follow the setting in (Zügner & Günnemann, 2019a). For Ricci-GNN, we also use 2 layers, and the hyper-parameters are chosen from σ ∈ {0.2, 0.4, 0.6, 0.8, 1.0} and β ∈ {1, 2, 3, 4, 5}. The classification accuracy on the original graph, on the attacked graph, and with our Ricci-GNN method is 84%, 82%, and 87%, respectively. Our proposed method successfully negates the impact of the attack and even improves classification accuracy. This is due to the power of the Ricci flow metric in recovering the underlying community structure, and the improved robustness and diversity from our re-sampling method. Results on real-world datasets. We evaluate our method on three real-world graph datasets, Cora, Citeseer, and Polblogs, which were often used in prior work (Zügner & Günnemann, 2019a). Cora and Citeseer (Sen et al., 2008) are citation networks, where each node represents a document and each edge a citation relationship. In Polblogs (Adamic & Glance, 2005), nodes are political blogs from the 2004 US presidential election and edges represent citations between blogs. Note that Polblogs does not have node features, so we create an N-dimensional one-hot feature for each node. We use the same training setup for the clean graph, the poisoned graph and our method: L2 regularization with λ = 0.0005, Glorot initialization, and training by minimizing the cross-entropy loss using the Adam optimizer with learning rate r = 0.005. For the random attack, we randomly add 5% extra edges. The results are shown in Table 1. Our method is on par with other methods on Cora and achieves state-of-the-art results on the Citeseer and Polblogs datasets.
We also run the experiment with increasing perturbation ratios on the Polblogs dataset. As shown in Figure 5(b), by restructuring and sampling the graph based on the Ricci flow metric, our method can negate most of the effect of the added random noise even when the noise ratio is large. Table 2 shows the classification accuracy (higher is better) of different defense schemes after being attacked by the meta-learning attack with different perturbation ratios. The first row of the table for each dataset shows the accuracy of applying the defense method on the original clean graph. The results show that directly using our method has no negative effect on GNN models. Especially for the Citeseer and Polblogs datasets, our method achieves state-of-the-art accuracy. When the perturbation increases (the attacker is more powerful), the accuracy gap between our method and the others widens, clearly demonstrating the advantage of our method. Benefit of the Ricci flow metric. To show the importance of the Ricci flow metric, we also run the entire graph restructuring algorithm with the hop count metric and the spectral embedding metric against meta-learning attacks. See Table 3. In nearly all cases, using the Ricci flow metric shows a clear improvement in performance. On Cora, spectral embedding is the worst, as the graph is relatively sparse and spectral embedding is least stable. On Polblogs, the hop count distance is the worst, as the graph is very dense and its diameter is small (only 4). This shows that the Ricci flow metric is important for the probabilistic sampling framework to achieve its full defense potential.

4. CONCLUSION

We propose a novel approach to improve the robustness of GNNs against attacks on graph topology. Curvature and flow information can effectively capture the intrinsic geometry of the graph, which is robust to structural perturbation. Our algorithm restructures and re-samples graphs using this underlying geometry, which helps train a robust graph neural network. Our method achieves superior performance on both synthetic and real-world benchmarks under various attacks.

A APPENDIX

In this supplemental material, we provide technical details on Ricci curvature and Ricci flow, as well as additional details on the experiments.

A TECHNICAL DETAILS OF RICCI CURVATURE AND FLOW

In this section, we establish the precise mathematical formulation of Ricci curvature and Ricci flow in the discrete setting and describe their computation on undirected graphs. We first define Ricci curvature, which is computed for each edge. Next, we explain how Ricci flow re-weights the edges iteratively so that the Ricci curvatures of all edges are smoothed. The final weighted graph induces the Ricci flow metric; the geodesic distance between any two nodes in this weighted graph is their distance in the Ricci flow metric space.

A.1 DISCRETE RICCI CURVATURE

In Ollivier's definition (Ollivier, 2009) of Ricci curvature for discrete spaces, one aims to measure the curvature κ_xy between nodes x and y. By comparing the Wasserstein distance (also called earth mover distance) between the neighborhoods of x and y, we can determine the deviation of the edge (x, y) from being flat. For an undirected, edge-weighted graph G = (V, E), the neighborhood of a node x is the collection of immediately adjacent nodes (one-hop neighbors) N(x) = {x_i : (x, x_i) ∈ E}, associated with a probability measure m_x(x_i) which sums to 1. Similarly, we have a probability measure m_y on the neighborhood of y. The Wasserstein distance between the two probability measures, W(m_x, m_y), is the minimum total weighted cost to move m_x to m_y using the optimal transportation plan M:

min_M Σ_{i,j} d(x_i, y_j) M(x_i, y_j)
s.t. Σ_j M(x_i, y_j) = m_x(x_i) ∀i,  Σ_i M(x_i, y_j) = m_y(y_j) ∀j,   (A.1)

where M(x_i, y_j) is the quantity of probability mass transferred from x_i to y_j along the shortest path with graph geodesic distance d(x_i, y_j). The Ricci curvature κ_xy of the edge (x, y) then takes the ratio between this Wasserstein distance and the geodesic distance:

κ_xy = 1 − W(m_x, m_y) / d(x, y).   (A.2)

A negative curvature means the probability mass of the neighborhood m_x is transported to m_y mostly through the edge (x, y). This usually happens when (x, y) is a bridge joining two communities (red edges in Figure 1(a)). Meanwhile, an edge within a community tends to have overlapping neighborhoods of x and y, resulting in positive curvature (blue edges in Figure 1(b)). Notice that the curvature value depends on the edge weights; in the Ricci flow section below, we illustrate how the curvature changes when edge weights change. To define the probability measure of each neighborhood, we adopt a weight-aware definition from Ni et al. (2019) that discounts neighbors that are further away. With portion γ ∈ [0, 1] and discount factor p ≥ 0,

m^{γ,p}_x(x_i) = γ if x_i = x;  ((1 − γ)/C) · exp(−d(x, x_i)^p) if x_i ∈ N(x);  0 otherwise,

where C = Σ_{x_i ∈ N(x)} exp(−d(x, x_i)^p) is the normalizing constant that assigns total probability 1 to the neighborhood. When p = 0 (weight-unaware), this definition reduces to the uniform distribution. We take γ = 0.5 and p = 2 following heuristics in the literature.
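A small stdlib sketch of this measure follows; the function name and the distance-dictionary input format are our illustrative choices:

```python
import math

def measure_gamma_p(dist_to, x, neighbors, gamma=0.5, p=2.0):
    """Weight-aware measure m^{gamma,p}_x: mass gamma at x itself, and the
    remaining 1 - gamma spread over neighbors, discounted by exp(-d(x, x_i)^p).
    `dist_to[xi]` holds the edge-weight distance d(x, x_i)."""
    C = sum(math.exp(-dist_to[xi] ** p) for xi in neighbors)  # normalizer
    m = {x: gamma}
    for xi in neighbors:
        m[xi] = (1.0 - gamma) / C * math.exp(-dist_to[xi] ** p)
    return m
```

By construction the masses sum to 1, nearer neighbors receive more mass when p > 0, and p = 0 recovers the uniform distribution over neighbors.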

A.2 RICCI FLOW

Recall that curvature describes the degree to which a surface is curved. Ricci flow is an iterative process that restores flatness everywhere, so that the Ricci curvature κ_xy is constant over all edges. Starting with the original input graph, the flow iteratively updates the edge weights. In the t-th iteration, all the new edge weights w^(t+1) are calculated as

w^(t+1)(x, y) = d^(t)(x, y) − κ^(t)_xy · d^(t)(x, y).   (A.3)

Each update moves the edge weight in the opposite direction of the curvature. Geometrically, negatively curved edges acting as bridges are extended, while positively curved edges within a community are shortened. We re-normalize the edge weights after each iteration. When the process converges, the final set of edge weights induces the Ricci flow metric on the graph. Please see Figure 1(b) for an illustration. To speed up the computation of the Wasserstein distance, we use an approximation called the Sinkhorn distance (Cuturi, 2013), which smooths the optimal transportation cost with a regularization term and can then be computed by the Sinkhorn-Knopp matrix scaling algorithm. Table 5 shows the performance of the two additional baselines in defending against the meta-learning attack under different perturbation rates. It is worth mentioning that CurvGN (Ye et al., 2020) is not designed for improving the robustness of GNNs; thus CurvGN performs relatively worse than DropEdge (Rong et al., 2019). We also include the performance of our Ricci-GNN to show that our method performs better than both.

C DETAILED DESCRIPTION OF BASELINE METHODS

GAT. Graph attention network (Veličković et al., 2018) uses node features to learn a self-attention.


The attention is used to re-weight each message passed to a node; neighboring nodes with more important features receive higher weights. Since the attention is learned solely from node features, GAT is inherently robust to graph structure perturbation. RGCN. RGCN (Zhu et al., 2019) models the hidden representations of nodes as Gaussian distributions to counter adversarial attacks. It also uses the attention mechanism from GAT to penalize nodes with high variance. GCN-Jaccard. GCN-Jaccard (Wu et al., 2019) inherits the idea of feature importance from GAT. It keeps messages judged important by the Jaccard similarity of features and deletes edges considered irrelevant. GCN-SVD. GCN-SVD (Entezari et al., 2020) observes that most adversarial attacks affect the high-rank spectrum of the graph, and thus uses a low-rank approximation of the graph to defend against adversarial attacks. Note that it was originally designed to defend against Nettack (Zügner et al., 2018); however, it can also be used against the meta-learning attack (Zügner & Günnemann, 2019a) and random attacks. Pro-GNN. Pro-GNN (Jin et al., 2020b) jointly learns a structural graph and a robust graph neural network model from the perturbed graph, guided by the graph properties of sparsity, low rank and feature smoothness. Topological attack (MinMax). The topological attack provides two attacking methods: a) attacking a pre-defined GNN (PGD) and b) attacking a re-trainable GNN (MinMax). We choose MinMax because most recent work focuses on improving the robustness of re-trained GNNs. The topological attack formulates the attack as a loss optimization problem and solves it with the MinMax method: the attacker seeks to minimize the per-node attack loss, while the GNN defends by re-training W so that attacking the GNN becomes more difficult.



Figure 1: An illustrative example of Ricci curvature and Ricci flow on graphs. 1(a): The bridge edges (red) between communities have negative curvature, while the edges inside communities (blue) have positive curvature. 1(b): The same graph after Ricci flow, in which edge lengths are proportional to their weights (the Ricci flow metric). Nodes within one community move closer together, whereas the two communities move further apart.
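The sign pattern in Figure 1(a) can be checked numerically on a toy graph of two triangles joined by a bridge. The sketch below assumes Ollivier-Ricci curvature with uniform probability measures on the neighbors (α = 0) and computes the Wasserstein distance W1 exactly with a small linear program; the curvature variant and neighbor weighting used in the paper's experiments may differ.

```python
from collections import deque
from scipy.optimize import linprog

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b); adj[b].add(a)

def hop_dist(src):
    """BFS hop distances from src to every node."""
    d, q = {src: 0}, deque([src])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in d:
                d[y] = d[x] + 1
                q.append(y)
    return d

def w1(mu_support, nu_support):
    """Exact W1 between uniform distributions on two node sets."""
    m, n = len(mu_support), len(nu_support)
    dists = {s: hop_dist(s) for s in mu_support}
    c = [dists[s][t] for s in mu_support for t in nu_support]
    A_eq, b_eq = [], []
    for i in range(m):  # row marginals: each source ships mass 1/m
        A_eq.append([1.0 if k // n == i else 0.0 for k in range(m * n)])
        b_eq.append(1.0 / m)
    for j in range(n):  # column marginals: each target receives 1/n
        A_eq.append([1.0 if k % n == j else 0.0 for k in range(m * n)])
        b_eq.append(1.0 / n)
    return linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs").fun

def ricci(x, y):
    """Ollivier-Ricci curvature of edge (x, y); d(x, y) = 1 on an edge."""
    return 1.0 - w1(sorted(adj[x]), sorted(adj[y]))
```

On this graph the intra-triangle edge (0, 1) has curvature 1/2, while the bridge edge (2, 3) has curvature −2/3, matching the red/blue contrast of Figure 1(a).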

Figure 2: An overview of our Ricci-GNN. We first compute the Ricci flow metric from the input (attacked) graph and re-sample edges using a Gaussian filter on each node. A newly sampled graph is used in each training epoch of a standard GCN.

Figure 3: Changes in the distance between all nodes and a fixed root in the karate club graph for (a) the Ricci flow metric, (b) spectral embedding, and (c) hop count, when two edges (shown in red) are removed from the network. Vertices are colored proportionally to the magnitude of the variation in distance.
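The Ricci flow metric compared in Figure 3(a) is produced by the iterative update w^{(t+1)} = d^{(t)} − κ^{(t)} d^{(t)} with re-normalization, as described above. A schematic sketch follows, with two stated simplifications: d(x, y) is taken to be the edge weight itself rather than the weighted shortest-path length, and the curvature computation is a plug-in callable (the real one is Ollivier-Ricci curvature).

```python
def ricci_flow(weights, curvature, n_iters=20):
    """Schematic Ricci flow: w <- w - kappa * w on each edge,
    followed by re-normalization so the total weight is preserved.

    weights   : dict mapping an edge to its current weight
    curvature : callable returning a dict of per-edge curvatures
                under the current weights (plug-in for illustration)
    """
    w = dict(weights)
    for _ in range(n_iters):
        kappa = curvature(w)
        new = {e: w[e] * (1.0 - kappa[e]) for e in w}
        scale = len(new) / sum(new.values())  # re-normalize each iteration
        w = {e: wt * scale for e, wt in new.items()}
    return w
```

With a negatively curved bridge edge and a positively curved intra-community edge, the bridge weight grows while the intra-community weight shrinks, which is the stretching/shrinking behavior shown in Figure 1(b).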

Algorithm: One GNN epoch based on Ricci flow graph re-structuring.
Input: adjacency matrix A, σ, β.
Pre-computed: Ricci flow metric F from A; all-pairs Ricci flow distance matrix S of all nodes via F; edge probability matrix P.
1: for t = 1 to T do
2:   sample A_R from P
3:   A_R ← A_R ∨ A, then train on A_R and update the weight parameters W_t
4: end for

Specifically, we first calculate the edge weights F by running Ricci flow on the attacked graph G.
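The re-sampling step of the loop above can be sketched in numpy. The Gaussian edge-probability form P[i, j] ∝ exp(−S[i, j]²/(2σ²)) below is an assumption for illustration; the paper's exact Gaussian filter per node may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_graph(A, S, sigma):
    """One re-sampling step of the graph re-structuring loop.

    A     : (n, n) 0/1 adjacency matrix of the (attacked) input graph
    S     : (n, n) all-pairs Ricci flow distance matrix (pre-computed)
    sigma : bandwidth of the Gaussian filter (illustrative form)
    """
    # Node pairs close under the Ricci flow metric get high edge probability.
    P = np.exp(-(S ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    U = np.triu(rng.random(A.shape), 1)
    U = U + U.T                      # symmetric coin flips, zero diagonal
    A_R = (U < P) & (U > 0)          # sample edges from P
    return (A_R | (A > 0)).astype(int)  # A_R = A_R OR A keeps original edges
```

In each training epoch a fresh A_R is drawn and the GCN weights are updated on it, so no single attacked edge is seen consistently across epochs.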

Figure 4: Defense against the meta-learning attack using Ricci-GNN on a Stochastic Block Model (SBM) graph with 5 communities. 4(a): The clean SBM graph. 4(b): The adjacency matrix of the attacked graph; edges of the clean graph are shown in blue and edges added by the meta-learning attack in red. The attack edges appear disproportionately to the original edge density: sparser blocks in the clean graph receive more attack edges, and nearly all attack edges run between different communities. 4(c): The heat map of the probability of an edge being connected in our re-structuring method. The probabilities of edges in the original community structure are higher than those of the attack edges. 4(d): The common edges of two randomly re-sampled graphs; the influence of the meta-learning attack edges is essentially eliminated.

Figure 5: Accuracy plot of different defense schemes with increasing perturbation rate under different attacks on Polblogs.

Figure 6: Defense accuracy heat maps for synthetic data of 24 SBM graphs constructed from different {p, q}. From left to right: attacked graph, clean graph, and the defense result of our Ricci-GNN. For each heat map, the x-axis is the intra-community probability p and the y-axis is the inter-community probability q.

Classification accuracy for various defense schemes after random attack of 5% extra edges

Classification accuracy for various defense schemes after meta-learning attack

Classification accuracy for defense using the Ricci flow metric (Ricci) vs. the hop count metric (HC) and the spectral embedding metric after the meta-learning attack. Note that the Citeseer dataset is by itself fairly robust to attacks (see Table 2).

Daniel Zügner and Stephan Günnemann. Certifiable robustness and robust training for graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 246-256, 2019b.

Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2847-2856, 2018.

B.1 DATASETS DESCRIPTION

The table below provides details of the datasets.

Statistics of Real-World graph datasets.

Classification accuracy for various defense schemes after meta-learning attack.

B.3.2 DEFENSE RESULT ON TOPOLOGICAL ATTACK-MINMAX

Table 6 shows extra experimental results on defending against the topological attack-MinMax (Xu et al., 2019a) under two different perturbation rates: 5% and 25%. Since the DropEdge code has a programming error on Citeseer under the MinMax attack, we cannot report its result. From the table, we can see that our method outperforms all other baselines, which confirms that our method improves the robustness of GNNs against different attack methods.

Classification accuracy for various defense schemes after topological attack-MinMax.

