REVISITING GRAPH ADVERSARIAL ATTACK AND DEFENSE FROM A DATA DISTRIBUTION PERSPECTIVE

Abstract

Recent studies have shown that structural perturbations are highly effective at degrading the accuracy of Graph Neural Networks (GNNs) in the semi-supervised node classification (SSNC) task. However, the reasons for the destructive nature of gradient-based methods have not been explored in depth. In this work, we discover an interesting phenomenon: the adversarial edges are not uniformly distributed on the graph, and a majority of perturbations are generated around the training nodes in poisoning attacks. Combined with this phenomenon, we provide an explanation for the effectiveness of gradient-based attack methods from a data distribution perspective and revisit both the poisoning attack and the evasion attack in SSNC. From this new perspective, we empirically and theoretically discuss some other attack tendencies. Based on the analysis, we provide nine practical tips on both attack and defense and leverage them to improve existing attack and defense methods. Moreover, we design a fast attack method and a self-training defense method, which outperform the state-of-the-art methods and can effectively scale to large graphs like ogbn-arxiv. We validate our claims through extensive experiments on four benchmark datasets.

* Corresponding author: Xiang Ao.

1. INTRODUCTION

Graph Neural Networks (GNNs) have been widely explored in recent years for numerous graph-based tasks Li et al. (2015); Kipf & Welling (2017); Hamilton et al. (2017); Liu et al. (2021b), primarily focused on the semi-supervised node classification (SSNC) task Xu et al. (2019b); Veličković et al. (2017); Huang et al. (2022); Liu et al. (2022). The evidence that GNNs are vulnerable to adversarial structure perturbations is convincing Dai et al. (2018); Zügner et al. (2018); Zügner & Günnemann (2019); Wu et al. (2019); Geisler et al. (2021); Zhu et al. (2022b): attackers can substantially degrade classification accuracy by unnoticeably modifying the graph structure. Most attack methods are gradient-based Chen et al. (2020); Wu et al. (2019); Zügner & Günnemann (2019); Xu et al. (2019a); Geisler et al. (2021), treating the adjacency matrix as a parameter and modifying it via the gradient of the attack loss. However, we still lack a general framework to explain their effectiveness. We posit that the destructive power of gradient-based methods stems from their ability to effectively increase the distribution shift between training nodes and testing nodes. To illustrate in more detail, we start with an interesting phenomenon: the malicious modifications generated by gradient-based methods are not uniformly distributed on the graph. As shown in Fig. 1, most modifications are around the training nodes (ordered at the top of the adjacency matrix), while the largest part of the graph, Test-Test, is hardly affected. Specifically, we apply two representative attack methods, MetaAttack Zügner & Günnemann (2019) and PGD Xu et al. (2019a). The data split follows 10%/10%/80% (train/validation/test). Furthermore, we find that only MetaAttack can adaptively adjust the attack tendency (attack training nodes or testing nodes) according to the size of the training set, and such adaptivity makes MetaAttack outperform other methods regardless of the data split.
It inspires us to study the effectiveness of attack methods from another perspective, Distribution Shift, which likewise considers the differences between the training set and the testing set. This raises the following challenge: How to formulate the distribution shift in the graph adversarial attack scenario? To answer this question, we first clarify the differences between attacks in mainstream domains, e.g., image classification, and in SSNC: (1) SSNC is a transductive task in which attackers have access to both training nodes and testing nodes; (2) nodes in graphs have both features and structural information, while other data types may only contain features, e.g., images contain only pixels and text only words. Taking these two differences into account, we provide a formalization of the distribution shift in graph adversarial attack and theoretically prove that perturbations around training nodes enlarge the distribution shift in an effective way. We also explore factors that influence the location of adversarial edges, such as the surrogate loss and the way the gradient is obtained. Using this formulation of the distribution shift, some unexplained phenomena in previous works become clear. For example, why do gradient-based attack methods significantly outperform the heuristic homophily-based method, DICE? Why are most modifications insertions rather than deletions? We analyze these questions both theoretically and empirically. Building on this analysis, several practical tips are proposed to improve and instruct attack and defense on graphs. To validate these tips, we conduct extensive corresponding experiments. Additionally, we design a fast and straightforward heuristic attack method underpinned by increasing the distribution shift; it achieves comparable performance to gradient-based methods and can effectively scale to large graphs like ogbn-arxiv. We also provide a self-training-based method to improve the robustness of GNNs.1

1 Our focus is to revisit both the attack and defense sides from a new view; these two algorithms are natural byproducts of this work, so we put them in the appendix.
The codes are available at https://github.com/likuanppd/STRG. Our main contributions are summarized below:
• We find an interesting phenomenon that perturbations are unevenly distributed on the graph, which inspires us to revisit graph adversarial attack from a data distribution perspective and to define the distribution shift in the graph attack scenario.
• We explore some previously unexplained phenomena and provide relevant theoretical proofs from the view of data distribution. We argue that the effectiveness of graph attacks essentially comes from increasing the distribution shift, which is the fundamental nature of the adversarial attack.
• We provide some practical tips to instruct both attack and defense on graphs. We conduct extensive experiments to support our claims and verify the validity of these tips.

2. RELATED WORK

GCN-SVD Entezari et al. (2020) discovers that attacks exhibit a specific behavior in the spectrum of the graph: only high-rank (low-valued) singular components of the graph are affected. Likewise, Chang et al. (2021) and Chang et al. (2022) also study the robustness of GNNs from the spectral perspective. Chang et al. (2021) indicates that not all low-frequency filters in GCNs are robust to adversarial attacks and proposes GCN-LFR, which enhances the robustness of various kinds of GCN-based models through a general robust co-training paradigm. Meanwhile, many efforts have been devoted to revealing the various properties of gradient-based attacks. Zügner & Günnemann (2019) demonstrates that the perturbations tend to increase the heterophily of the graph. Based on this, Zhang & Zitnik (2020) and Wu et al. (2019) propose GNNGuard and Jaccard, respectively, to prune the edges that link two dissimilar nodes. Geisler et al. (2020) and Chen et al. (2021) study the robustness of GNNs from the breakdown-point perspective and propose more robust aggregation approaches. Xu et al. (2022) introduce a mutual information-based measure to quantify the robustness of graph encoders in the representation space.
Zhan & Pei (2022) is another work that finds the uneven distribution of perturbations, mainly focusing on the attacking side and proposing a black-box attack method. In contrast, we study the mechanism behind this phenomenon, including the reasons for its emergence and its impact on the effectiveness of attack methods. We formulate the distribution shift in graph adversarial attack and leverage it to analyze other tendencies of gradient-based attack methods, which can provide some theoretical guidance and help us understand the robustness and vulnerability of GNNs. Meanwhile, several tips are proposed, covering all aspects of the structure attack, and most of them are not mentioned in Zhan & Pei (2022).

3. PRELIMINARIES

Notations. Let G = {V, E} denote an undirected, unweighted graph with N nodes, where V and E (without self-loops) are the sets of nodes and edges, respectively. The topology of the graph can also be represented as a symmetric adjacency matrix A ∈ {0, 1} N ×N , in which A ij = 1 denotes that node v i connects node v j , otherwise A ij = 0. The original features of all nodes can be summarized as a matrix X ∈ R N ×d . The first-order neighborhood of node v i is denoted as N i , including node v i itself. Moreover, the labels of all nodes are denoted as y. Each node is associated with a label y i ∈ C, where C = {c 1 , c 2 , ..., c K }. We use f θ (A, X) to denote a GNN, and θ refers to the parameters.

SSNC.

In this paper, we study the robustness of GNNs on the semi-supervised node classification (SSNC) task, which can be formulated as follows: given a graph G, the node features X, and a subset of node labels y L ⊂ y, the goal is to learn a function V → C which maps the nodes to the label set, so that we can predict the labels of unlabeled nodes. The ground-truth labels of unlabeled nodes are denoted as y U , and the corresponding node set is V U .

Graph Adversarial Attacks

In this paper, we explore the robustness of GNNs on SSNC under gray-box non-targeted attacks on the graph structure. Under this setting, the attacker possesses the same data information as the defender, but the defense model and its trained weights are unknown. Adversarial attacks can be divided into two categories, namely the poisoning (training-time) attack and the evasion (testing-time) attack Zügner et al. (2018). The attacker aims to find an optimal perturbed graph $\hat{G}$ that degrades the overall performance of the classifier as much as possible, which can be formulated as Zügner & Günnemann (2019); Geisler et al. (2021):

$$\mathop{\mathrm{argmin}}_{\hat{A} \in \Phi(A)} \; \mathcal{L}_{atk}\big(f_{\theta^*}(\hat{A}, X), y\big),$$

where $\hat{A}$ is the adjacency matrix of the perturbed graph $\hat{G}$, and $\Phi(A)$ is the set of adjacency matrices that satisfy the unnoticeability constraint $\frac{\|\hat{A}-A\|_0}{\|A\|_0} \le \Delta$, in which $\Delta$ is the maximum perturbation rate. $\mathcal{L}_{atk}$ is often $-\mathcal{L}(f_{\theta^*}(\hat{A}, X)_U, \hat{y}_U)$ or $-\mathcal{L}(f_{\theta^*}(\hat{A}, X)_L, y_L)$, where $\hat{y}_U$ denotes the pseudo-labels of unlabeled nodes predicted by the surrogate classifier. For notational simplicity, we call them $\mathcal{L}_{self}$ and $\mathcal{L}_{train}$, respectively. Here $\theta^*$ refers to the parameters of the surrogate GNN, which differ between the evasion and the poisoning attack: $\theta^*$ is fixed and trained on the clean graph in the evasion attack, but it can be repeatedly retrained as the graph is gradually contaminated in the poisoning attack. The key to gradient-based attack methods is to treat the adjacency matrix as a parameter and modify the graph structure via the gradient of the attack loss $\nabla_A \mathcal{L}_{atk}$.
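To make the objective concrete, the following is a minimal sketch (not the paper's implementation) of a greedy structure attack on a toy one-hop mean-aggregation model: it scores every candidate symmetric edge flip by how much it increases a surrogate classification loss, an exhaustive stand-in for following the gradient ∇_A L_atk, and keeps the best flips within the budget. The toy loss and the 4-node example in the test are illustrative assumptions.

```python
import itertools

def aggregate(A, X):
    # one-hop mean aggregation including the node itself (self-loop)
    n = len(A)
    return [(X[i] + sum(X[j] for j in range(n) if A[i][j])) / (1 + sum(A[i]))
            for i in range(n)]

def surrogate_loss(A, X, y):
    # toy classification loss: larger when the aggregated scalar feature
    # disagrees with the label y_i in {-1, +1}
    return -sum(yi * hi for hi, yi in zip(aggregate(A, X), y))

def greedy_structure_attack(A, X, y, budget):
    """Flip the `budget` symmetric entries of A that most increase the loss."""
    A = [row[:] for row in A]
    for _ in range(budget):
        best_flip, best_loss = None, surrogate_loss(A, X, y)
        for i, j in itertools.combinations(range(len(A)), 2):
            A[i][j] = A[j][i] = 1 - A[i][j]      # tentatively flip (i, j)
            loss = surrogate_loss(A, X, y)
            A[i][j] = A[j][i] = 1 - A[i][j]      # undo the flip
            if loss > best_loss:
                best_flip, best_loss = (i, j), loss
        if best_flip is None:                    # no flip increases the loss
            break
        i, j = best_flip
        A[i][j] = A[j][i] = 1 - A[i][j]
    return A
```

On a small homophilous example, the attack prefers inserting edges between nodes of opposite classes, since deleting a homophilous edge leaves the mean-aggregated feature of the toy model unchanged or nearly so.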

4.1. THE LOCATION OF ADVERSARIAL EDGES

To better understand which part of the graph is perturbed, we propose an intention score (IS) to quantify how the adversarial edges are distributed on the graph. Since GNN models are permutation invariant Zou et al. (2021), we can adjust the node index and then split the adjacency matrix into the following form:

$$A = \begin{pmatrix} A_1 & A_2 \\ A_2^{\top} & A_3 \end{pmatrix}, \quad A_1 \in \mathbb{R}^{N_1 \times N_1}, \; A_2 \in \mathbb{R}^{N_1 \times N_2}, \; A_3 \in \mathbb{R}^{N_2 \times N_2},$$

where $A_1$ is the matrix of edges connecting two training nodes (Train-Train), $A_2$ represents edges between a training node and a testing node (Train-Test), and $A_3$ contains edges between two testing nodes (Test-Test). $N_1$ and $N_2$ are the numbers of nodes in the training set and the testing set, respectively. IS describes in which part of the adjacency matrix ($A_1$, $A_2$, or $A_3$) the attack algorithm prefers to generate adversarial edges. It is formulated as:

$$\pi_i = \frac{|\tilde{E}_i|}{|\tilde{E}|}, \quad \lambda_i = \frac{|E_i|}{|E|}, \quad IS_i = \frac{\pi_i}{\lambda_i}, \quad i = 1, 2, 3,$$

where $E$ is the original edge set, $\tilde{E}$ denotes the edges inserted or deleted by the attack model, and $i$ indexes the part of the adjacency matrix. IS thus measures the density of perturbations located in each part of the adjacency matrix (see Table 1, Table 4, and Fig. 5). The destructive power of these gradient-based methods may stem from the fact that they effectively increase the distribution shift. The first thing we need to do is to formulate the distribution shift in graph adversarial attack. Unlike attacks in other domains, such as image classification, formulating the distribution shift in SSNC should take structural information into account. We assume all node features are sampled from $p(x|y)$ and define a community feature to take structural information into account: $\tilde{x}_i = \frac{1}{|N_i|} \sum_{j \in N_i} x_j$, and we assume the community feature of an arbitrary node $i$ follows a distribution $\tilde{x}_i \sim p(\tilde{x}|y)$. The first-order neighborhood plays the most important role in structural information, so we do not consider higher-order neighborhoods here.
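The intention score can be computed directly from the clean and perturbed edge sets; a minimal sketch (the helper names are ours):

```python
def edge_part(edge, train_nodes):
    # 1 = Train-Train, 2 = Train-Test, 3 = Test-Test
    ends_in_train = sum(v in train_nodes for v in edge)
    return {2: 1, 1: 2, 0: 3}[ends_in_train]

def intention_scores(clean_edges, perturbed_edges, train_nodes):
    """IS_i = pi_i / lambda_i for the three parts of the adjacency matrix.

    Edges are frozensets {u, v}; the perturbation set is the symmetric
    difference of the two edge sets (insertions plus deletions).
    """
    perturbations = clean_edges ^ perturbed_edges
    scores = {}
    for part in (1, 2, 3):
        pi = sum(edge_part(e, train_nodes) == part
                 for e in perturbations) / len(perturbations)
        lam = sum(edge_part(e, train_nodes) == part
                  for e in clean_edges) / len(clean_edges)
        scores[part] = pi / lam if lam else float("inf")
    return scores
```

For instance, with training nodes {0, 1} and a single adversarial edge in the Train-Test part, IS_2 exceeds 1 while IS_1 and IS_3 are 0, mirroring the uneven densities reported in Table 1.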
All the community features and corresponding labels can be viewed as samples from the joint distribution p( x, y) on the clean graph. In graph adversarial attack, there are three key distributions: p train ( x, y), p test ( x, y), and the classifier p θ (y| x). On clean graphs, we assume that p train ( x, y) and p test ( x, y) are the same as the true distribution p( x, y). On perturbed graphs, we view attacking the structure as perturbing the corresponding distribution. Concretely, we treat the structural changes to the training and testing nodes as if these nodes were sampled from biased p train and p test . In image classification, attackers degrade the performance of the classifier by perturbing p train or p test . When implementing a poisoning attack, p train is perturbed, and the classifier fits a biased distribution, so it fails to precisely predict the unbiased images sampled from p test . In an evasion attack, the attackers have no access to the training set, so they perturb the testing images, which can be viewed as predicting on a biased distribution p test with an unbiased classifier. SSNC, however, is semi-supervised: attackers can modify the entire graph structure, including both training and testing nodes. The difference between SSNC and image classification is shown in Fig. 2. Unlike inductive learning under the fully supervised scenario, p train and p test can be perturbed simultaneously. Awareness of this difference is critical. For example, in the image poisoning attack, attackers can only perturb the training data to bias the model; in the graph poisoning attack, by contrast, attackers can also perturb the testing data to make an unbiased model test on biased data. After perturbing, p train and p test may differ from the true distribution p( x, y), and this discrepancy results in a distribution shift.
By a simple factorization, we can write $p(\tilde{x}, y) = p(y)\,p(\tilde{x}|y)$. Labels are not flipped in the setting of structure attack, so we assume $p(y)$ is shared across all distributions. The distribution shift arises when $p_{train}(\tilde{x}|y)$ and $p_{test}(\tilde{x}|y)$ differ due to the structural perturbations, so we define the distribution shift in graph adversarial attack as:

$$\frac{1}{|C|} \sum_{c_i \in C} D_{KL}\big(p_{train}(\tilde{x}\,|\,y = c_i),\; p_{test}(\tilde{x}\,|\,y = c_i)\big).$$

Essentially, the attack increases the distribution shift, but whether to perturb $p_{train}(\tilde{x}|y)$ or $p_{test}(\tilde{x}|y)$ differs between the poisoning attack and the evasion attack. For the evasion attack, perturbing $p_{train}$ is nearly invalid: the classifier is already trained and can be viewed as an unbiased model, so it is unwise to waste the limited modifications on the training nodes. For the poisoning attack, the implications of attacking the training and testing sets are different from a distribution perspective. Attacking the training set perturbs $p_{train}(\tilde{x}|y)$ in such a way that the classifier $p_\theta(y|\tilde{x})$ will fit a biased distribution. On the other hand, perturbing the testing set in the poisoning attack is similar to the evasion case: the model is well trained on clean data and tested on a biased distribution. Our empirical results demonstrate that gradient-based methods tend to perturb the local structure of training nodes if the size of the training set is small. We speculate that this is because, in SSNC, the smaller the training set, the more effective it is to attack the structure of the training nodes (Theorem 4.1; we give the proof in Appendix A.3). Assumption 4.1. We consider a graph G, where each node i has feature $x_i \in \mathbb{R}^d$ and label $y_i \in \{0, 1\}$. We assume that (1) G is k-regular; (2) the feature of an arbitrary node i is sampled from the normal distribution $\mathcal{N}(\mu_{y_i}, \Lambda)$ associated with its label, independently of the other nodes, where $\Lambda$ is a diagonal matrix shared by classes 0 and 1; (3) the graph is homophilous.
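Under the Gaussian assumption, this shift can be estimated from samples by fitting a per-class normal distribution to the community features of the training and testing nodes and averaging the class-wise KL terms. A hedged 1-D sketch (the helper names are our own, not the paper's):

```python
import math
import statistics

def kl_normal(m0, v0, m1, v1):
    # KL( N(m0, v0) || N(m1, v1) ) for 1-D Gaussians, natural logarithm
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1)

def distribution_shift(train_feats, test_feats):
    """Mean class-wise KL divergence between community-feature distributions.

    train_feats / test_feats: dict mapping class label -> list of 1-D
    community features of the nodes in that class.
    """
    total = 0.0
    for c in train_feats:
        m0, v0 = statistics.fmean(train_feats[c]), statistics.pvariance(train_feats[c])
        m1, v1 = statistics.fmean(test_feats[c]), statistics.pvariance(test_feats[c])
        total += kl_normal(m0, v0, m1, v1)
    return total / len(train_feats)
```

With identical train and test samples the shift is zero; shifting one class's test features away from its training features makes it strictly positive.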
The homophily ratio is h, which means each node connects to kh nodes with the same label, and 0.5 ≤ h ≤ 1. $\mathcal{L}_{self} = \mathcal{L}(f_{\theta^*}(\hat{A}, X)_U, \hat{y}_U)$ and $\mathcal{L}_{train} = \mathcal{L}(f_{\theta^*}(\hat{A}, X)_L, y_L)$ are two widely used losses. We find that they lead to different attack tendencies: attack methods with $\mathcal{L}_{train}$ and $\mathcal{L}_{self}$ focus on the training and the testing nodes, respectively. In addition, MetaAttack is a special case due to its ability to adaptively adjust the location of adversarial edges. We find that such adaptability comes from its way of calculating the gradient, i.e., the meta gradient. For more details, see A.4.

5. EXPLANATION OF PHENOMENA IN GRAPH ATTACKS

With the advantage of the new perspective and the formulation of distribution shift, we can now introduce some other phenomena in graph attacks and explain them.

5.1. INSERTION VS. DELETION

We provide the proof in Appendix A.5. Since adversarial attacks aim to perform unnoticeable perturbations, the perturbation rate ∆ is often very small, and in homophilous graphs h is often much larger than 0.5. Therefore, the assumptions $\frac{\Delta k}{t(k+1)} \le 1 - \ln 2$ and $(2h-1)t > \Delta$ are generally satisfied. To ensure this tendency is not caused by the sparsity of the graph, we build a dense synthetic graph and attack it with MetaAttack (Table 10 in Appendix A.5). The results support the same conclusion: attack methods tend to insert rather than delete edges.
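The claim can be sanity-checked numerically with the 1-D closed forms from Appendices A.3 and A.5: under parameters satisfying both assumptions, the KL divergence induced by uniform insertions exceeds that induced by deletions. The parameter values below are arbitrary choices that satisfy the assumptions, not values from the paper's experiments.

```python
import math

def kl_normal(m0, v0, m1, v1):
    # KL( N(m0, v0) || N(m1, v1) ), natural log, 1-D
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1)

def shift_after_attack(k, h, t, delta, mode, mu0=0.0, mu1=1.0, lam=1.0):
    """KL( p_train(x~|y=0) || p_test(x~|y=0) ) after uniform edge changes."""
    change = delta * k / t                      # edges changed per training node
    m_test = ((1 + h * k) * mu0 + (k - h * k) * mu1) / (k + 1)
    if mode == "insert":                        # heterophilous neighbours added
        denom = k + 1 + change
        m_train = ((1 + h * k) * mu0 + (k - h * k + change) * mu1) / denom
    else:                                       # homophilous neighbours removed
        denom = k + 1 - change
        m_train = ((1 + h * k - change) * mu0 + (k - h * k) * mu1) / denom
    return kl_normal(m_train, lam / denom, m_test, lam / (k + 1))

# k-regular graph, homophily h, training fraction t, budget Delta; the
# assumptions Delta*k/(t*(k+1)) <= 1 - ln 2 and (2h-1)*t > Delta both hold here
k, h, t, delta = 4, 0.9, 0.1, 0.02
assert shift_after_attack(k, h, t, delta, "insert") > shift_after_attack(k, h, t, delta, "delete")
```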

5.2. GRADIENT-BASED METHODS OUTPERFORM HOMOPHILY-BASED METHODS

Under the homophily assumption McPherson et al. (2001), i.e., connected nodes are more likely to have similar features and labels, one mainstream opinion is that attack methods tend to increase the heterophily of the graph Zügner & Günnemann (2019); Wu et al. (2019); Zhu et al. (2021a). The homophily assumption ignores the location of the perturbations, so it cannot account for the superiority of the gradient-based approaches over DICE Waniek et al. (2018), a heuristic method that directly increases the heterophily of the graph by randomly connecting nodes from different classes and disconnecting nodes from the same class. However, this is not surprising from a distribution perspective. On the one hand, DICE randomly attacks the entire graph, resulting in synchronous perturbation of p train and p test , while the gradient-based methods concentrate on attacking the training set. According to Theorem 4.1, attacking the smaller part is more effective at increasing the distribution shift, so DICE performs worse. On the other hand, DICE perturbs both training and testing nodes using the same rule, wherein heterophily is increased. Consequently, p train and p test may be biased in the same direction, reducing the distribution shift rather than increasing it. We speculate that attack methods might be more destructive if they mainly perturb only one of p train and p test . To support this, we turn the Train-Test perturbations generated by MetaAttack into directed edges. Fig. 3 shows the results and demonstrates two critical points. First, Directed→Train outperforms the vanilla MetaAttack, indicating that simultaneously perturbing p train and p test is worse than primarily perturbing p train , presumably because p train and p test are biased in the same direction. Second, Directed→Test is much weaker than Directed→Train. That is to say, what makes MetaAttack so destructive is the perturbations to the smaller part, the training nodes.
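For reference, DICE ("disconnect internally, connect externally") can be sketched in a few lines; the uniform randomness over the whole graph is exactly why p train and p test drift together. The sketch below is our own (edges are frozensets; the split of the budget between insertions and deletions is a coin flip):

```python
import random

def dice_attack(edges, labels, n_perturbations, seed=0):
    """DICE: randomly delete same-class edges or insert different-class edges."""
    rng = random.Random(seed)
    edges = set(edges)                       # each edge is frozenset({u, v})
    nodes = list(labels)
    for _ in range(n_perturbations):
        same = [e for e in edges if len({labels[v] for v in e}) == 1]
        if same and rng.random() < 0.5:
            edges.remove(rng.choice(same))   # disconnect internally
        else:
            while True:                      # connect externally
                u, w = rng.sample(nodes, 2)
                e = frozenset((u, w))
                if labels[u] != labels[w] and e not in edges:
                    edges.add(e)
                    break
    return edges
```

Every deletion removes an intra-class edge and every insertion adds an inter-class edge, so each perturbation increases heterophily, yet the perturbations land uniformly over training and testing nodes alike.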

5.3. HIGH-DEGREE NODES

High-degree nodes are special. First, attack algorithms often avoid modifying the local structure around high-degree nodes Zügner & Günnemann (2019). The more neighbors a node has, the more stable its community feature is, so attacking it changes the distribution less. Additionally, high-degree nodes are naturally high-density data (see A.6). High-density data possess two important properties: (1) they are easier to classify correctly; (2) they are reliable neighbors to other nodes, because they are located in high-density areas of the true distribution and are rarely attacked. We can leverage them to improve the robustness of GNNs (Table 11), for example, by trusting them more (for more details, see A.6).

6. PRACTICAL TIPS

With all the observation and analysis, we put forward several concrete tips to improve attack and defense methods.

6.1. POISONING ATTACK

Tip 1: It is better to focus on attacking the smaller part. For the poisoning attack, the performance of GNNs can be degraded by perturbing p train or p test . According to Theorem 4.1, the smaller the data set, the more the distribution is changed by injecting a fixed number of perturbations. MetaAttack is a good example of this tip, in which the perturbations are generated according to the size of the training set. We improve DICE and random perturbations (Random) along these lines and compare them to the vanilla versions in Fig. 4. For Cora, we focus the attack on the training nodes. As the public split on ogbn-arxiv is approximately 54%/17%/29% (train/val/test), we conduct the modifications around the testing nodes. We significantly improve the performance of both DICE and Random, and the improvement is more remarkable on ogbn-arxiv. We conclude that such a strategy applies well to large-scale graphs.

Tip 2: The meta gradient is powerful. According to the comparison between MetaAttack and other variants, we conclude that the meta gradient is a convenient tool for the poisoning attack. Regardless of how the data set is split, the meta gradient helps the model adaptively adjust the distribution of perturbations to enlarge the distribution shift effectively.

Tip 3: A directed attack can outperform an undirected attack. The results in Fig. 3 indicate that an adversarial edge affects the aggregation of both end nodes, but only one of them may contribute to the accuracy decrease. It also suggests that perturbing both p train and p test may not be a good idea, as they might be skewed in the same direction, failing to increase the distribution shift and, consequently, failing the attack.
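Tip 1 amounts to a one-line change to heuristics like DICE or Random: sample perturbations only among edges incident to the smaller partition instead of over the whole graph. A hedged sketch of such a "focused" random attack (the function and its parameters are our own illustration, not the paper's implementation):

```python
import random

def focused_random_attack(labels, target_nodes, all_nodes, edges,
                          n_perturbations, seed=0):
    """Insert random inter-class edges, each incident to the targeted
    (smaller) partition, so only its community-feature distribution drifts."""
    rng = random.Random(seed)
    edges = set(edges)                       # each edge is frozenset({u, v})
    target, pool = list(target_nodes), list(all_nodes)
    for _ in range(n_perturbations):
        while True:
            u = rng.choice(target)           # one endpoint in the small side
            w = rng.choice(pool)
            e = frozenset((u, w))
            if u != w and labels[u] != labels[w] and e not in edges:
                edges.add(e)
                break
    return edges
```

Every adversarial edge then touches the targeted partition, concentrating the whole budget on one of p train and p test instead of spreading it over both.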

6.1.1. A HEURISTIC ATTACK ALGORITHM

Considering all the properties of the gradient-based attacks from the data distribution perspective, we propose a simple heuristic poisoning attack algorithm that achieves comparable performance and scales to large graphs. We provide the overall training algorithm, performance, and runtime comparisons in A.7. Its core idea is to perturb whichever of p train and p test is more easily perturbed, so as to enlarge the distribution shift.

6.2. EVASION ATTACK

Tip 4: According to Table 5 and Table 7, L self should be eschewed when meta-gradients are not used.

Tip 5: The evasion attack is unlikely to cause higher performance degradation than the poisoning attack. Essentially, evasion attacks have a smaller effective range than poisoning attacks, and attackers can perform a poisoning attack in an evasion form. As we mentioned before, only modifications around the testing nodes can degrade the performance in the evasion setting, so one can make exactly the same perturbations to the test nodes when carrying out the poisoning attack. In this way, the model is trained on nearly clean data and tested on contaminated data. In short, the poisoning attack can at least completely replicate the evasion attack.

Tip 7: The high-degree nodes can help a lot. If the true distribution is like a normal distribution, with the highest probability density located around the mean, mean aggregation reduces the variance of the center nodes. That is to say, high-degree nodes are high-density data and trustworthy.

6.3. DEFENSE

We provide an example of leveraging high-degree nodes to enhance the robustness of GNNs in Table 11.

Tip 8: We can improve the robustness by decreasing the distribution shift. Once we know that the effectiveness of the attack algorithm stems from increasing the distribution shift, we can enhance the robustness by directly eliminating the inconsistency between the training set and the testing set. Based on this, we design a robust GNN, STRG, in Appendix A.8, which outperforms the SOTA methods. STRG leverages the local structures of testing nodes and pseudo-labels to train a GCN; in this case, p train and p test can be regarded as almost the same.
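The idea behind STRG, shrinking the gap between p train and p test by pulling confident test predictions into training, can be illustrated with a tiny prototype (nearest-class-mean) classifier. This is a hedged sketch of one self-training round on 1-D features, not the STRG implementation:

```python
import statistics

def class_means(feats, labels):
    by_class = {}
    for i, c in labels.items():
        by_class.setdefault(c, []).append(feats[i])
    return {c: statistics.fmean(v) for c, v in by_class.items()}

def predict(feats, means, idx):
    # nearest class mean; the margin between the two closest means
    # serves as a crude confidence score
    out = {}
    for i in idx:
        d = sorted((abs(feats[i] - m), c) for c, m in means.items())
        out[i] = (d[0][1], d[1][0] - d[0][0])   # (label, confidence margin)
    return out

def self_train(feats, train_labels, test_idx, threshold=1.0):
    """One self-training round: add confident test pseudo-labels, refit."""
    means = class_means(feats, train_labels)
    preds = predict(feats, means, test_idx)
    enlarged = dict(train_labels)
    enlarged.update({i: c for i, (c, conf) in preds.items() if conf >= threshold})
    means = class_means(feats, enlarged)        # refit on the enlarged set
    return {i: predict(feats, means, [i])[i][0] for i in test_idx}
```

After the confident test nodes join the fit, the class prototypes are estimated from both sides of the split, so the effective training distribution tracks the testing one.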

6.4. FOR DATASETS

Tip 9: The data split is a non-negligible part of graph adversarial attack. It is hard to maliciously manipulate the predictions made by GNNs without knowledge of the data split. Meanwhile, the data split significantly affects the evaluation of the effectiveness of attack and defense methods. We provide the details in A.9.

7. CONCLUSION

To better understand attacks on graphs, we revisit graph adversarial attack from a data distribution perspective and formulate the distribution shift in SSNC. Based on this, we argue that the tendencies of gradient-based methods and their destructive power essentially come from increasing the distribution shift. We put forward several practical tips underpinned by our findings and demonstrate some of their uses. Additionally, we give some open research ideas and hope they can spur further research in this area.

8. REPRODUCIBILITY STATEMENT

The two proposed algorithms are not the focus of this work; the key to their success lies in the thinking behind them rather than in technical novelty. Both are simple yet effective. All the details are given in the pseudo-code in Algorithm 1 and Algorithm 2, and the code is provided at https://github.com/likuanppd/STRG. For the other baselines, we provide the implementation details in Appendix A.1.

Zhu et al. (2021a) shows that MetaAttack fails in the evasion attack with a 10%/10%/80% data split, which is consistent with our discovery. When the training set is small, its adaptivity makes MetaAttack focus on modifying the local structure around the training set. In the setting of evasion attack, the model is already trained on unbiased data, so testing on unchanged data does not cause worse performance. For example, when the training size is 10%, λ 3 ≈ 0.9 × 0.9 = 0.81 and IS 3 = 0.007, meaning that Test-Test is the largest part of the graph, yet most of the local structure around the testing nodes is not maliciously modified.

9. ETHIC STATEMENT

A.3 PROOF OF THEOREM 4.1

Under the uniform insertions, each training node in class 0 gains $N_t = \frac{\Delta k}{t}$ neighbours with a different label. Thus, the community features of the nodes in class 0 can be viewed as sampled from a biased distribution $p_{train}(\tilde{x}|y=0)$:

$$\mathcal{N}\left(\frac{(1+hk)\mu_0 + \left(k-hk+\frac{\Delta k}{t}\right)\mu_1}{k+1+\frac{\Delta k}{t}},\; \frac{\Lambda}{k+1+\frac{\Delta k}{t}}\right).$$

The structures of the testing nodes are not modified, so $p_{test}(\tilde{x}|y=0)$ is:

$$\mathcal{N}\left(\frac{(1+hk)\mu_0 + (k-hk)\mu_1}{k+1},\; \frac{\Lambda}{k+1}\right).$$

Let $\delta_0$ and $\delta_1$ denote the means of $p_{train}(\tilde{x}|y=0)$ and $p_{test}(\tilde{x}|y=0)$, respectively. According to Eq. (6) and the KL-divergence formula for two normal distributions, we have

$$D_{KL}\big(p_{train}(\tilde{x}|y=0),\, p_{test}(\tilde{x}|y=0)\big) = \frac{1}{2}\left[\log\frac{\left(k+1+\frac{\Delta k}{t}\right)^d}{(k+1)^d} + \frac{d(k+1)}{k+1+\frac{\Delta k}{t}} + (k+1)(\delta_0-\delta_1)^{\top}\Lambda^{-1}(\delta_0-\delta_1) - d\right]. \quad (9)$$

Let $S = \frac{k+1+\frac{\Delta k}{t}}{k+1}$; then the first two terms of Eq. (9) can be rewritten as

$$\log S^d + \frac{d}{S}, \quad S > 1. \quad (10)$$

Taking the derivative of Eq. (10) w.r.t. $S$ gives $\frac{d}{S}\left(\frac{1}{\ln 2} - \frac{1}{S}\right)$, which is positive for $S > 1$, so Eq. (10) is monotonically increasing w.r.t. $S$ and hence monotonically decreasing w.r.t. the size of the training set $t$. For the third term of Eq. (9), we first calculate $\delta_0 - \delta_1$. For notational simplicity, let $P = (1+hk)\mu_0 + (k-hk)\mu_1$. Then

$$\delta_0 - \delta_1 = \frac{P + \frac{\Delta k}{t}\mu_1}{k+1+\frac{\Delta k}{t}} - \frac{P}{k+1} = \frac{\Delta k\,\big((k+1)\mu_1 - P\big)}{(tk+t+\Delta k)(k+1)}.$$

Let $v = \frac{\Delta k((k+1)\mu_1 - P)}{(tk+t+\Delta k)(k+1)}$. Clearly $v^{\top}v$ is monotonically decreasing w.r.t. $t$, and so is $v^{\top}\Lambda^{-1}v$, because $\Lambda^{-1}$ only introduces constants into each element of $v$. Therefore, Eq. (9) becomes larger as $t$ gets smaller. The same holds for the nodes in class 1. We conclude that the smaller the $t$, the larger the distribution shift.
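A quick numerical check of the monotonicity claim: evaluating Eq. (9) in one dimension for shrinking training fractions t shows the KL divergence growing. The parameter values are arbitrary illustrations, not values from the paper.

```python
import math

def kl_eq9(k, h, t, delta, mu0=0.0, mu1=1.0, lam=1.0):
    # 1-D instance of Eq. (9): KL( p_train(x~|y=0) || p_test(x~|y=0) )
    change = delta * k / t                 # Delta*k/t heterophilous insertions
    denom = k + 1 + change
    d0 = ((1 + h * k) * mu0 + (k - h * k + change) * mu1) / denom
    d1 = ((1 + h * k) * mu0 + (k - h * k) * mu1) / (k + 1)
    v0, v1 = lam / denom, lam / (k + 1)    # variances of the two Gaussians
    return 0.5 * (math.log(v1 / v0) + (v0 + (d0 - d1) ** 2) / v1 - 1)

# halving t repeatedly strictly increases the shift, as Theorem 4.1 states
shifts = [kl_eq9(k=4, h=0.9, t=t, delta=0.02) for t in (0.4, 0.2, 0.1, 0.05)]
assert all(a < b for a, b in zip(shifts, shifts[1:]))
```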

A.4 IMPACT OF SURROGATE LOSS AND GRADIENT COMPUTATION

The Impact of L train vs. L self . Table 5 shows the attack tendencies and performance of PGD with L train or L self . We only provide the results on Cora, but similar results are observed on other commonly used datasets for SSNC. L train makes the attack algorithm tend to attack the training nodes, while L self makes it attack the testing nodes. L train focuses on the loss of the training nodes, which is mainly related to the local structure of these nodes. Therefore, when computing the gradient ∇ A L atk , only the part of the adjacency matrix associated with the training nodes will be modified, i.e., Train-Train and Train-Test. PGD train works in the poisoning attack but fails in the evasion attack. This result is as expected, because p train is well perturbed while p test remains nearly unchanged: there are some adversarial edges in Train-Test, but the largest part of the graph, Test-Test, is barely changed. L self utilizes the pseudo labels of the unlabeled nodes to compute the loss, so the majority of modifications are around the unlabeled nodes. Although PGD self seems to work in the evasion attack, the degradation in accuracy is not as pronounced as that of PGD train in the poisoning attack. We suggest two main reasons: (1) the testing set is much larger than the training set, so the same number of perturbations brings a smaller change to p test than to p train ; (2) we use the Cross-Entropy loss in these experiments, and Geisler et al. (2021) shows that CE loss makes algorithms primarily attack nodes that are already misclassified. Such attacks are valid in the poisoning attack and can further bias the model p θ (y| x), but in the evasion attack, attacking misclassified nodes does not bring any drop in accuracy. Another example that uses L self is PR-BCD Geisler et al. (2021), a novel evasion attack. We present the location statistics of perturbations on Citeseer attacked by it in Table 6.
We can see that all the perturbations are generated around the testing nodes, leading to an effective attack.

Gradient Computation. MetaAttack is a special case due to its ability to adjust the distribution of adversarial edges. Even today, many practitioners still consider it the SOTA poisoning attack on small-scale graphs. According to Theorem 4.1, this adaptivity makes MetaAttack incline to attack the more easily perturbed of the distributions p train and p test . In other words, it identifies structural modifications that significantly increase the distribution shift. We find that this adaptability comes from its way of calculating the gradient. Let us review the meta gradient expressed in Zügner & Günnemann (2019):

$$\nabla_A \mathcal{L}_{atk}(f_{\theta_T}(A, X)) = \nabla_f \mathcal{L}_{atk}(f_{\theta_T}(A, X)) \cdot \big[\nabla_A f_{\theta_T}(A, X) + \nabla_{\theta_T} f_{\theta_T}(A, X) \cdot \nabla_A \theta_T\big],$$
$$\nabla_A \theta_{t+1} = \nabla_A \theta_t - \alpha \nabla_A \nabla_{\theta_t} \mathcal{L}_{train}(f_{\theta_t}(A, X)),$$

where T is the number of training steps to obtain the optimal parameters via vanilla gradient descent with learning rate α. In other gradient-based methods Xu et al. (2019a); Wu et al. (2019), the parameters θ are fixed and detached when calculating the gradient ∇ A L atk . In MetaAttack, however, θ is iteratively retrained as the graph is gradually contaminated, and the derivatives w.r.t. the adjacency matrix are taken into account. The effectiveness and adaptive capability of MetaAttack stem from this gradient calculation method; MetaAttack performs similarly to PGD if the parameters are fixed or detached from the gradient computation. For CW and MCE, we can also divide them into L self and L train according to which nodes are used to calculate the loss; for instance, MCE restricts the cross-entropy to correctly classified nodes, $\mathcal{L}_{MCE} = \frac{1}{|V^+|}\sum_{i \in V^+} -\log p^{(i)}_{c^*}$, where $c^*$ is the ground truth label and $V^+$ indicates correctly classified nodes. In Table 8, we present the IS for Meta self with CW loss and MCE loss. Meta self with CW and MCE can also adjust the distribution of the perturbations according to the size of the training set, which is consistent with CE loss.
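The recursion above can be reproduced end-to-end on a toy scalar problem: unroll gradient descent on L train, carry ∇ a θ t through every step, and chain it into L atk. The quadratic losses below are illustrative stand-ins for the GNN objectives, with the scalar a playing the role of the adjacency entry.

```python
def unrolled_training(a, steps=30, lr=0.05):
    """Gradient descent on L_train(a, theta) = (a*theta - 1)^2, tracking
    d(theta_t)/da through the unrolled updates (the meta-gradient recursion)."""
    theta, dtheta_da = 0.0, 0.0
    for _ in range(steps):
        grad = 2 * a * (a * theta - 1)                    # dL_train/dtheta
        dgrad_da = 4 * a * theta - 2 + 2 * a * a * dtheta_da
        theta -= lr * grad                                # theta_{t+1}
        dtheta_da -= lr * dgrad_da                        # nabla_a theta_{t+1}
    return theta, dtheta_da

def meta_gradient(a):
    # L_atk(theta) = theta^2  =>  dL_atk/da = 2 * theta_T * d(theta_T)/da
    theta, dtheta_da = unrolled_training(a)
    return 2 * theta * dtheta_da

# finite-difference check: the unrolled derivative matches the true
# sensitivity of the full train-then-attack pipeline
eps, a = 1e-5, 1.3
numeric = (unrolled_training(a + eps)[0] ** 2
           - unrolled_training(a - eps)[0] ** 2) / (2 * eps)
assert abs(meta_gradient(a) - numeric) < 1e-6
```

Dropping the `dtheta_da` bookkeeping (i.e., detaching θ, as PGD does) would zero out exactly the term through which the attack "sees" how retraining reacts to structural changes.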
Pseudo-labels in L_self. With all the results, we conclude that attacks with pseudo labels behave similarly to those with ground-truth labels (see Table 9; Meta_true means Meta_self uses the ground-truth labels instead of pseudo labels to generate perturbations). We conjecture this is because, in homophilous graphs, the pseudo labels are generally accurate. Thus, attacks with L_self focus on modifying the local structure of the testing nodes to push the predictions away from the ground-truth labels. In many cases, the testing set is relatively large, so it is hard to increase the distribution shift by modifying the testing structure according to Theorem 4.1. This is why methods like PGD_self fail.

A.5 INSERTION VS. DELETION

Proof of Theorem 5.1

Proof. The KL-divergence between p_train(x|y=0) and p_test(x|y=0) under insertion is shown in Eq. (9), and we denote it as D_KL-INS. Similarly, we have D_KL-DEL as follows:

D_KL-DEL(p_train(x|y=0), p_test(x|y=0)) = 1/2 [ log( (k+1-Δk_t)^d / (k+1)^d ) + d(k+1)/(k+1-Δk_t) + (k+1) (δ'_0 - δ_1)^T Λ^{-1} (δ'_0 - δ_1) - d ],

where δ'_0 = [ (1+hk-Δk_t)μ_0 + (k-hk)μ_1 ] / (k+1-Δk_t).

Now we compare D_KL-INS and D_KL-DEL. We already know that g(S) = log S^d + d/S is monotonically increasing w.r.t. S if S ≥ ln 2, and (k+1-Δk_t)/(k+1) ≥ ln 2 due to the assumption Δk_t/(k+1) ≤ 1 - ln 2. Therefore,

log( (k+1+Δk_t)^d / (k+1)^d ) + d(k+1)/(k+1+Δk_t) > log( (k+1-Δk_t)^d / (k+1)^d ) + d(k+1)/(k+1-Δk_t).   (16)

Then, if v^T v > (δ'_0 - δ_1)^T (δ'_0 - δ_1), we can conclude that D_KL-INS > D_KL-DEL. Let u = δ'_0 - δ_1; with P = (1+hk)μ_0 + (k-hk)μ_1, we have

u = Δk ( P - (k+1)μ_0 ) / ( (tk+1-Δk)(k+1) ).   (17)

This is equivalent to comparing |v| and |u|:

|v| = Δk ( (k+1)μ_1 - P ) / ( (tk+1+Δk)(k+1) )
    = Δk (hk+1)(μ_1 - μ_0) / ( (tk+1+Δk)(k+1) )
    = Δk (hk+1)(μ_1 - μ_0)(tk+1-Δk) / ( (tk+1+Δk)(tk+1-Δk)(k+1) ),

|u| = Δk (k-hk)(μ_1 - μ_0)(tk+1+Δk) / ( (tk+1+Δk)(tk+1-Δk)(k+1) ).

Neglecting the common terms of |v| and |u|, we only need to compare the two terms |(1+hk)(tk+1-Δk)| and |(k-hk)(tk+1+Δk)|. Both of them are positive because 1 > h > 0.5 and (2h-1)t > Δ. Subtracting the latter from the former gives

1 + (2h + t - Δ - 1)k + (2th - t - Δ)k².   (21)

As h > 0.5 and (2h-1)t > Δ due to the assumption, Eq. (21) is positive. To sum up, D_KL-INS > D_KL-DEL. This conclusion also holds for class 1. We can conclude that the distribution shift caused by insertion is larger.
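The two facts the proof relies on can be checked numerically for hypothetical parameter values (our own sanity check, using base-2 log for the KL terms as in the derivation): (1) g(S) = d·log₂S + d/S is increasing for S ≥ ln 2, and (2) the polynomial in Eq. (21) is positive whenever h > 0.5 and (2h−1)t > Δ.

```python
import math

d = 10

def g(S):
    # the function appearing on both sides of Eq. (16)
    return d * math.log2(S) + d / S

# (1) monotonicity on a grid of S >= ln 2
Ss = [math.log(2) + 0.01 * i for i in range(1, 200)]
assert all(g(b) > g(a) for a, b in zip(Ss, Ss[1:]))

def poly(h, t, D, k):
    # Eq. (21): the difference of the two compared terms
    return 1 + (2 * h + t - D - 1) * k + (2 * t * h - t - D) * k ** 2

# (2) positivity under the assumptions h > 0.5 and (2h - 1) t > D
for h in (0.6, 0.75, 0.9):
    for t in (0.2, 0.5, 1.0):
        D = 0.9 * (2 * h - 1) * t          # satisfies (2h - 1) t > D
        for k in (1, 5, 20, 100):
            assert poly(h, t, D, k) > 0
```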

Synthetic Graph

We build a graph containing two types of nodes. The features of class 0 ∈ R^10 are sampled from N(0, Λ_0.2), and the features of class 1 are sampled from N(1, Λ_0.2), where Λ_0.2 is a diagonal matrix whose elements are all 0.2. There are 150 nodes in class 0 and 150 in class 1, and all nodes of the same class are connected to each other. Thus, this graph is dense, and the possibilities to insert and delete are balanced. We conduct MetaAttack on this graph, and the results are listed in Table 10. We find that MetaAttack still tends to insert but not delete edges.

Suppose the node features follow a distribution like the normal distribution, in which the probability density is higher around the mean. In that case, the aggregation can move the high-degree nodes toward the high-density region and reduce the variance. High-density data can be easily classified and is insensitive to noise Zhu et al. (2022a). According to Li et al. (2022), we can trust such nodes more and assign them higher weights during aggregation:

h^t_i = ReLU( Σ_{j∈N_i} (d_i d_j)^{0.5} / Z · h^{t-1}_j W^t_θ ),

where d is the degree and Z is a normalization coefficient. This only modifies the aggregation weights in the GCN and can be merged into any robust GNN built on a vanilla GCN. Table 11 demonstrates that assigning high-density data higher weights can indeed improve the robustness of GNNs. This trick only slightly leverages the properties of high-degree nodes; one might apply a more sophisticated method, which we leave for future work.
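The degree-weighted aggregation above can be sketched in a few lines of numpy (a minimal sketch; the row-wise choice of the normalizer Z and all names are our assumptions):

```python
import numpy as np

def degree_weighted_layer(A, H, W):
    """One layer: h_i = ReLU( sum_j (d_i d_j)^0.5 / Z * h_j W )."""
    deg = A.sum(axis=1)                       # node degrees
    S = np.sqrt(np.outer(deg, deg)) * A       # (d_i d_j)^0.5 on existing edges
    Z = S.sum(axis=1, keepdims=True) + 1e-12  # row normalizer (one choice of Z)
    return np.maximum(0.0, (S / Z) @ H @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.random((4, 5))                        # input node features
W = rng.random((5, 2))                        # layer weights
out = degree_weighted_layer(A, H, W)
assert out.shape == (4, 2) and (out >= 0).all()
```

Note how, compared with the vanilla GCN weight (d_i d_j)^{-0.5}, the exponent is flipped so that high-degree neighbors contribute more.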

A.7 HEURISTIC ATTACK

The algorithm of our heuristic attack is shown in Algorithm 1. We first construct the candidate attacking set C that contains the nodes whose degrees are lower than the average degree. Then we divide C into C_train and C_test. We generate cross-label edges on the graph according to the data split. Specifically, we compute λ_1, λ_2, and λ_3 using Eq. (3). Here we suppose the training set is much smaller than the testing set, i.e., λ_1 < λ_3. The total perturbation is N_ptb = Δ|E|. We inject N_ptb·λ_1/(λ_1+λ_2) cross-label edges into the Train-Train area and N_ptb·λ_2/(λ_1+λ_2) into the Train-Test area. For the nodes outside the training set, we use the pseudo labels predicted by a two-layer vanilla GCN. In Fig. 6, we compare the proposed algorithm with other attack methods on Citeseer and ogbn-arxiv. On Citeseer, our method achieves performance comparable to MetaAttack, and even better under low perturbation rates. The average runtime over 10 runs is shown in Table 12. Gradient-based methods need to optimize all possible entries in the dense adjacency matrix A, which incurs expensive computation and quadratic space complexity. Our method and DICE are both rule-based, much faster, and space-saving.

To summarize, the clean information includes the labels of the training nodes and the local structure of the testing nodes (according to Fig. 5). Thus, a straightforward way to enhance robustness is self-training Li et al. (2018): we can assign pseudo-labels to the testing nodes and then train the GNN via the pseudo-labels and the clean local structure. Here we provide a very simple implementation. We use an MLP instead of a GNN to acquire the pseudo-labels because the local structure of the training nodes is contaminated. Specifically, we first train an MLP with the given labels, then select the predictions with the highest confidence for each class by comparing the softmax scores and add them to a new label set V_psu. We finally train a new GCN by computing the cross-entropy loss on V_psu.
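Returning to the heuristic attack, its injection step can be sketched as follows (a coarse sketch of Algorithm 1; the λ values from Eq. (3) are treated as given inputs, and all helper names are ours):

```python
import random

def heuristic_attack(edges, labels, train, n_ptb, lam1, lam2, seed=0):
    """Inject n_ptb cross-label edges among low-degree nodes, split between
    the Train-Train and Train-Test areas by the ratio lam1 : lam2."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    avg = sum(deg.values()) / len(deg)
    cand = [u for u in deg if deg[u] < avg]   # below-average-degree candidates
    c_tr = [u for u in cand if u in train]
    c_te = [u for u in cand if u not in train]
    n_tt = round(n_ptb * lam1 / (lam1 + lam2))  # Train-Train share of the budget
    added = []

    def inject(pa, pb, n):
        tries = 0
        while n > 0 and tries < 10000:
            u, v = rng.choice(pa), rng.choice(pb)
            tries += 1
            if u != v and labels[u] != labels[v] and frozenset((u, v)) not in edge_set:
                edge_set.add(frozenset((u, v)))
                added.append((u, v))
                n -= 1

    inject(c_tr, c_tr, n_tt)                  # cross-label edges, Train-Train
    inject(c_tr, c_te, n_ptb - n_tt)          # cross-label edges, Train-Test
    return added

edges = [(0, 2), (1, 3), (4, 6), (5, 7), (0, 4), (1, 5)]
labels = {i: i % 2 for i in range(8)}         # pseudo labels for test nodes
train = {0, 1, 2, 3}
added = heuristic_attack(edges, labels, train, n_ptb=3, lam1=1.0, lam2=2.0)
assert len(added) == 3 and all(labels[u] != labels[v] for u, v in added)
```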
This Self-Training Robust GCN (STRG) is described in Algorithm 2. From the data distribution perspective, it effectively reduces the distribution shift by training with p_test. Although the pseudo-labels introduce new problems, such as label noise, this can be viewed as a trade-off between performance and robustness. We compare STRG with some baselines and SOTA robust GNNs on two datasets under MetaAttack in Table 13. We set m = 80 on both datasets.
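The pseudo-label selection step of STRG can be sketched as follows (a minimal sketch of the step in Algorithm 2; training the MLP and the downstream GCN are omitted, and the variable names are our choices):

```python
import numpy as np

def select_pseudo_labels(probs, m):
    """probs: (n, C) softmax scores; keep the m most confident predictions
    per class and return {node: pseudo_label} as the new label set V_psu."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    v_psu = {}
    for c in range(probs.shape[1]):
        idx = np.where(preds == c)[0]
        top = idx[np.argsort(-conf[idx])][:m]   # m most confident in class c
        for i in top:
            v_psu[int(i)] = c
    return v_psu

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55], [0.7, 0.3]])
v_psu = select_pseudo_labels(probs, m=1)
assert v_psu == {0: 0, 2: 1}      # the single most confident node per class
```

A GCN would then be trained with cross-entropy on V_psu instead of the (possibly contaminated) training labels.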



Figure 1: Left: The adjacency matrix of Cora attacked by MetaAttack, in which the blue dots are original edges and the red ones are adversarial edges. The green dotted line is the boundary between training nodes and testing nodes. Right: The location statistics of adversarial edges on the Cora dataset under different perturbation rates. Train-Train means the perturbed edge links two nodes from the training set; Train-Test and Test-Test follow the same rule.

Figure 3: The performance of GCN attacked by MetaAttack and two variants. The Directed→Train variant prevents testing nodes from aggregating information from training nodes through the adversarial edges in Train-Test, while Directed→Test does the opposite. The data split follows 10%/10%/80% (train/val/test).

Figure 4: Attacking GCN on Cora, Citeseer, and ogbn-arxiv, where DICE_imp and Ran_imp denote variants of DICE and Random that attack the smaller part of the graph.

EVASION ATTACK

Tip 4: Do not waste the limited bullets on the training nodes. The classifier is already trained in evasion attack; in other words, p_θ(y|x) is fixed, and the only effect of attacking the training set, biasing the model, is no longer attainable. Thus, attackers should avoid modifying the structures of training nodes, especially the edges in the Train-Train area. According to Table

Figure 5: The adjacency matrices of Cora and Citeseer attacked by MetaAttack and PGD with different training sizes.


Figure 6: The adversarial accuracy of GCN attacked by different methods on Citeseer and ogbn-arxiv. PGD and MetaAttack face the OOM problem on ogbn-arxiv.



The IS on the Cora dataset under a 10% perturbation rate.

Tip 5: To insert but not delete.

Tip 6: Defend the vulnerable part of the graph. To defend against graph adversarial attacks, many studies have been proposed around the central concept of Graph Structure Learning (GSL) Li et al. (2022); Wu et al. (2019); Zhang & Zitnik (2020); Jin et al. (2020); Zhu et al. (2021b), which aims to optimize the perturbed structure. An attacker can achieve good results by attacking distributions that can be easily perturbed; likewise, the defender can concentrate on optimizing the corresponding structures. Jaccard Wu et al. (2019) and GNNGuard Zhang & Zitnik (2020) are two representative GSL methods that refine the graph structure via the similarity of node features. In Table 2, we present the experimental results for vanilla Jaccard and GNNGuard, and the results when they only refine the structure of the training set. The improvement implies that the modifications made by vanilla Jaccard and GNNGuard on the testing structure are of little help.

Adversarial accuracy (%) on Cora and Citeseer attacked by MetaAttack (stronger is bold). The asterisk indicates that the GSL methods only optimize the local structure of the training nodes.
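A minimal sketch of applying this tip to Jaccard-based refinement (our own illustration, assuming binary features and an illustrative threshold): only edges incident to training nodes are examined, while Test-Test edges are left untouched.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two binary feature vectors."""
    inter = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return inter / union if union > 0 else 0.0

def prune_train_edges(edges, X, train, thr=0.05):
    kept = []
    for u, v in edges:
        if (u in train or v in train) and jaccard(X[u], X[v]) < thr:
            continue                  # drop suspicious train-incident edge
        kept.append((u, v))           # Test-Test edges always survive
    return kept

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
edges = [(0, 1), (0, 2), (2, 3)]
train = {0, 1}
kept = prune_train_edges(edges, X, train, thr=0.05)
assert (0, 2) not in kept and (2, 3) in kept and (0, 1) in kept
```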

The robustness of GNNs has become an emerging research problem, especially in security-critical domains, e.g., credit scoring or fraud detection. For instance, in graph fraud transaction detection, fraudsters can conceal themselves by deliberately dealing with common users, which may generate adversarial edges. In this work, we provide a new perspective on studying the robustness of graph models. A better understanding of attack methods and of the structural vulnerability of GNNs can help us improve the security level in these domains. Meanwhile, although we offer examples of how to leverage the tips, most of them can be further explored and used in more sophisticated ways. In conclusion, we believe this work will not pose a security risk and can positively affect research in this area.

Dataset statistics. PGD Xu et al. (2019a), DICE Waniek et al. (2018), Jaccard Wu et al. (2019), SimpGCN Jin et al. (2021), and ProGNN Jin et al. (2020). We perform FGSM according to Dai et al. (2018). STABLE Li et al. (2022), GNNGuard, and Elastic Liu et al. (2021a) are implemented with the code provided by the authors. All hyper-parameters are tuned based on the loss and accuracy of the validation set. For Jaccard, the Jaccard similarity threshold is tuned from {0.01, 0.02, 0.03, 0.04, 0.05}. For GNNGuard, ProGNN, SimpGCN, and Elastic, we use the default hyper-parameter settings in the authors' implementations.

A.2 THE DISTRIBUTION OF ADVERSARIAL EDGES

Table 4 shows the distribution of adversarial edges on the graph, attacked by MetaAttack, PGD, and FGSM Dai et al. (2018). Only MetaAttack can adjust the distribution of the perturbations according to the size of the training set. Specifically, it consistently generates perturbations around the smaller of the training set and the testing set. PGD and FGSM always focus on attacking training nodes, resulting in poor performance when the training set becomes large. Fig. 5 visualizes the adjacency matrices of Cora and Citeseer.
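As a small utility (hypothetical names, not the paper's code), the location statistics used in these tables can be computed directly from the adversarial edges and the set of training nodes:

```python
def edge_location_stats(adv_edges, train):
    """Count adversarial edges in the Train-Train, Train-Test and
    Test-Test areas of the adjacency matrix."""
    stats = {"Train-Train": 0, "Train-Test": 0, "Test-Test": 0}
    for u, v in adv_edges:
        k = (u in train) + (v in train)   # how many endpoints are training nodes
        stats[("Test-Test", "Train-Test", "Train-Train")[k]] += 1
    return stats

adv = [(0, 1), (0, 5), (4, 5), (6, 7)]
train = {0, 1, 2}
assert edge_location_stats(adv, train) == {
    "Train-Train": 1, "Train-Test": 1, "Test-Test": 2}
```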
Compared with the results in Table 4 and Table 1, we can see that IS reflects the density of perturbations in each area of the adjacency matrix. MetaAttack adjusts this density according to the size of the training set: it focuses on attacking the smaller of the training set and the testing set, and the perturbations are nearly uniformly distributed when the training size is 0.5. On the contrary, PGD and FGSM consistently attack the training nodes regardless of the data split. Consequently, MetaAttack can easily fool GCN no matter how large the training set is, while PGD fails when the training size increases.

The IS of MetaAttack, PGD, and FGSM under 10% perturbation rate. The Acc indicates the adversarial accuracy, and for attack methods a lower value is better. Clean is the accuracy of the vanilla GCN on clean graphs. We highlight the strongest attack in bold.

The location statistics of adversarial edges on Cora under 10% perturbation. Num represents the number of adversarial edges in the corresponding area. Evasion and Poison are the adversarial accuracy of GCN Kipf & Welling (2017) under evasion and poisoning attack. We highlight the stronger attack in bold. (Columns: Attack, IS_1/Num, IS_2/Num, IS_3/Num, Clean, Evasion, Poison.)

The location statistics of adversarial edges on Citeseer attacked by PR-BCD.

Table 7 shows the performance of Meta_self, Meta_train, Detach_self, Detach_train, Fix_self, and Fix_train. Detach and Fix are two variants of MetaAttack: the former detaches the parameters from the meta-gradient calculation, and the latter fixes the parameters after training on the clean graph. Meta_self is the only one that always attacks the more easily perturbed distribution of p_train and p_test. Other attack methods with L_self only attack the testing nodes, but Meta_self attacks the training nodes when the size of the testing set is small. This is because the meta-gradient takes the training process into account by Eq. 3, which is associated with the training nodes; thus, the local structure of the training nodes is involved in the meta-gradient computation. Detach and Fix perform like PGD in that their tendency is closely related to the surrogate loss, i.e., L_self or L_train. As expected, Meta_self outperforms all the variants regardless of the data split. It is worth noting that variants with L_train usually perform better when the training set is small, while variants with L_self are more destructive when the testing set is small.

Different Surrogate Loss. To ensure this tendency is not caused by a specific type of loss, we explore the effect of two other losses in graph attack: the Carlini-Wagner loss Xu et al. (2019a),

CW = min( max_{c≠c_*} z_c - z_{c_*}, 0 ),

and the Masked Cross-Entropy loss Geisler et al. (2021),

MCE = 1/|V_+| Σ_{i∈V_+} -log( p^{(i)}_{c_*} ).
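These two surrogate losses can be written in a few lines of numpy (a minimal sketch with names of our choosing: CW on logits z, MCE on softmax scores p restricted to the correctly classified nodes V_+):

```python
import numpy as np

def cw_loss(z, y):
    """Per-node CW loss: min(max_{c != c*} z_c - z_{c*}, 0)."""
    n = len(y)
    z_true = z[np.arange(n), y]
    z_masked = z.copy()
    z_masked[np.arange(n), y] = -np.inf       # exclude the true class
    best_other = z_masked.max(axis=1)
    return np.minimum(best_other - z_true, 0.0)

def mce_loss(p, y):
    """MCE: mean -log p_{c*} over the correctly classified nodes V_+."""
    correct = p.argmax(axis=1) == y
    if not correct.any():
        return 0.0
    return -np.log(p[correct, y[correct]]).mean()

z = np.array([[2.0, 0.5], [0.2, 1.0]])
y = np.array([0, 0])                          # node 1 is misclassified
assert np.allclose(cw_loss(z, y), [-1.5, 0.0])
p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
assert mce_loss(p, y) > 0                     # computed only on node 0
```

The masking in MCE is what keeps the gradient away from already-misclassified nodes, which is why it behaves better than plain CE in evasion attack.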

The location statistics of perturbations and the performance of GCN under poisoning attack on Cora under 10% perturbation. IS_3-0.8 is IS_3 when the size of the training set is 0.8 and reveals the adaptivity of different variants. We highlight the strongest attack method in bold. (Columns: Attack, IS_1/Num, IS_2/Num, IS_3/Num, Poison, IS_3-0.8, Poison-0.8.)

The IS of Meta self with two different surrogate losses, CW, and MCE under 10% perturbation rate. The Acc indicates the adversarial accuracy, and for attack methods a lower value is better. Clean is the accuracy of the vanilla GCN on clean graphs.

Adversarial accuracy (%) on Cora and Citeseer attacked by Meta_self and Meta_true. (Columns: Ptb rate, Meta_self, Meta_true.)

Share (in %) of edge deletions and insertions by MetaAttack on the synthetic graph.

Average time cost (s) of different attack methods on Citeseer and ogbn-arxiv over 10 runs.

80% is the most commonly used data split in graph adversarial attack, under which effective poisoning attack methods like MetaAttack and PGD tend to attack the training nodes, so the local structure of the testing nodes is nearly clean (according to Fig. 5).

GNNGuard Zhang & Zitnik (2020) and ProGNN Jin et al. (2020) are two robust GNNs with structure learning. SimpGCN Jin et al. (2021) utilizes a kNN graph to keep nodes with similar features close in the representation space and a self-learning regularization to keep nodes with dissimilar features remote. Elastic Liu et al. (2021a) introduces the ℓ1-norm to the graph signal estimator and proposes elastic message passing, derived from a one-step optimization of this estimator; the local smoothness adaptivity makes Elastic GNNs robust to structural attacks. STABLE Li et al. (2022) optimizes the graph structure using unsupervised representations learned by contrastive learning. The data split is 10%/10%/80%, and we set the perturbation rate from 0% to 20%. The implementation of these methods follows Appendix A.1, and the hyper-parameter t in STRG is 80.


10 ACKNOWLEDGEMENT

This research work is supported by National Key R&D Plan No. 2022YFC3303302 and the National Natural Science Foundation of China under Grant No. 61976204. Xiang Ao is also supported by the Project of Youth Innovation Promotion Association CAS and Beijing Nova Program Z201100006820062.


Our method runs faster than DICE on ogbn-arxiv because it randomly samples nodes from the candidate set, which is much smaller than the entire node set. Our heuristic attack method is coarse-grained and straightforward, and we do not aim to propose a SOTA model. We try to clarify that, by following Tips 1 to 5 and inheriting the tendencies that enlarge the distribution shift, performance close to that of gradient-based methods can be achieved efficiently. More than that, it can be scaled to large graphs.

We can observe that STRG outperforms other methods under different perturbation rates. In particular, STRG achieves almost complete robustness on Citeseer: the performance shows no degradation as the perturbation rate rises. Additionally, the MLP is much faster than graph models, so this self-training strategy can be easily scaled to large graphs. Here we use GCN as the downstream classifier, but in fact any GNN can be merged with it. We design this method under a hypothesis, i.e., that adversarial edges are mostly located around training nodes, but attackers could instead perturb p_test to make an unbiased model predict biased data. However, there is also a trade-off between performance and unnoticeability for attackers: if the attacker spreads the attack over the whole graph instead of focusing on the training set, the performance will drop considerably (we elaborate on this in Appendix A.9). In a nutshell, STRG can successfully defend against effective attack methods, and when an attacker tries to bypass the defense strategy of STRG, the attack fails.

A.9 DISCUSSION ON THE DATA SPLIT

Data split is necessary when implementing PGD and MetaAttack, so we conduct the attack with a random split that is inconsistent with the split used when testing the classifier. PGD_without and Meta_without are not as effective as the vanilla models, and their performance is even close to the random attack.
It is hard to effectively manipulate the prediction without information about the data split. Meanwhile, attackers and defenders can easily enhance their models by attacking or defending the more vulnerable part of the graph. It is essential to realize that the leakage of the data split can pose a severe security risk. Moreover, the public split, i.e., 20 nodes per class as the training set, makes the training set significantly small so that p_train is easy to perturb. We list the performance of poisoning attack methods under different data splits in Table 14. The attack algorithms easily work on the public split. In addition, such a small training set might make the attack noticeable: as shown in Fig. 8, under a small perturbation rate (e.g., 5%), if attackers only modify the local structure around the training nodes, the attack is easily detected.

