NEAR-BLACK-BOX ADVERSARIAL ATTACKS ON GRAPH NEURAL NETWORKS AS AN INFLUENCE MAXIMIZATION PROBLEM

Abstract

Graph neural networks (GNNs) have attracted increasing interest. With broad deployments of GNNs in real-world applications, there is an urgent need to understand the robustness of GNNs under adversarial attacks, especially in realistic setups. In this work, we study the problem of attacking GNNs in a restricted near-black-box setup, by perturbing the features of a small set of nodes, with no access to model parameters or model predictions. Our formal analysis draws a connection between this type of attack and an influence maximization problem on the graph. This connection not only enhances our understanding of the problem of adversarial attacks on GNNs, but also allows us to propose a group of effective near-black-box attack strategies. Our experiments verify that the proposed strategies significantly degrade the performance of three popular GNN models and outperform baseline adversarial attack strategies.

1. INTRODUCTION

There has been a surge of research interest recently in graph neural networks (GNNs) (Wu et al., 2020), a family of deep learning models on graphs, as they have achieved superior performance on various tasks such as traffic forecasting (Yu et al., 2017), social network analysis (Li et al., 2017), and recommender systems (Ying et al., 2018; Fan et al., 2019). Given the successful applications of GNNs in online Web services, there are increasing concerns regarding the robustness of GNNs under adversarial attacks, especially in realistic scenarios. In addition, research on adversarial attacks on GNNs in turn helps us better understand the intrinsic properties of existing GNN models. Indeed, there has been a line of research investigating various adversarial attack scenarios for GNNs (Zügner et al., 2018; Zügner & Günnemann, 2019; Dai et al., 2018; Bojchevski & Günnemann, 2018; Ma et al., 2020), and GNNs have, unfortunately, been shown to be vulnerable in many of these scenarios. In particular, Ma et al. (2020) examine an extremely restricted near-black-box attack scenario where the attacker has access to neither model parameters nor model predictions, yet they demonstrate that a greedy adversarial attack strategy can significantly degrade GNN performance due to the natural inductive biases of GNNs binding to the graph structure. This scenario is motivated by real-world GNN applications on social networks, where attackers are only able to manipulate a limited number of user accounts and have no access to the GNN model parameters or predictions for the majority of users. In this work, we study adversarial attacks on GNNs under the aforementioned near-black-box scenario. Specifically, an attack in this scenario is decomposed into two steps: 1) select a small set of nodes to be perturbed; 2) alter the node features according to domain knowledge up to a per-node budget. As in Ma et al. (2020), the focus of this study lies on the node selection step.
The existing attack strategies, although empirically effective, are largely based on heuristics (Ma et al., 2020). We instead formulate the adversarial attack as an optimization problem maximizing the mis-classification rate over the selected set of nodes, and we carry out a formal analysis of this optimization problem. The proposed optimization problem is combinatorial and seems hard to solve in its original form. In addition, the mis-classification rate objective involves model parameters, which are unknown in the near-black-box setup. We mitigate these difficulties by rewriting the problem and connecting it with influence maximization on a special linear threshold model related to the original graph structure. Inspired by this connection, we show that, under certain distributional assumptions about the GNN, the expected mis-classification rate is submodular with respect to the selected set of nodes to perturb. The expected mis-classification rate is independent of the model parameters and, thanks to its submodularity, can be efficiently optimized by a greedy algorithm. Therefore, by specifying concrete distributions, we are able to derive a group of near-black-box attack strategies maximizing the expected mis-classification rate. The connection with influence maximization also provides nice interpretations of the problem of adversarial attacks on GNNs. To empirically verify the effectiveness of the theory, we implement two near-black-box adversarial attack strategies and test them on three popular GNN models, Graph Convolutional Network (GCN) (Kipf & Welling, 2016), Graph Attention Network (GAT) (Veličković et al., 2018), and Jumping Knowledge Network (JKNet) (Xu et al., 2018), with common benchmark datasets. Both attack strategies significantly outperform baseline attack strategies in terms of decreasing model accuracy.

Finally, we summarize the contributions of our study as follows.
1. We formulate the problem of adversarial attacks on GNNs as an optimization problem maximizing the mis-classification rate.
2. We draw a novel connection between the problem of adversarial attacks on GNNs and influence maximization based on a linear threshold model. This connection helps us develop effective and efficient near-black-box adversarial attack strategies and provides interpretations of the adversarial attack problem.
3. We implement two variants of the proposed near-black-box attack strategies and empirically demonstrate their effectiveness.

2. RELATED WORK

There has been increasing research interest in adversarial attacks on GNNs recently. Detailed expositions of the existing literature are available in a couple of survey papers (Jin et al., 2020; Sun et al., 2018). Given the heterogeneous nature of diverse graph-structured data, there are numerous adversarial attack setups for GNN models. Following the taxonomy provided by Jin et al. (2020), adversarial attack setups can be categorized based on (but not limited to) the machine learning task, the goal of the attack, the phase of the attack, the form of the attack, and the model knowledge that the attacker has access to. First, there are two common types of tasks, node-level classification (Zügner et al., 2018; Dai et al., 2018; Wu et al., 2019; Entezari et al., 2020) and graph-level classification (Tang et al., 2020; Dai et al., 2018). The goal of the attack can be changing the predictions of a small and specific set of nodes (targeted attack) (Zügner et al., 2018; Dai et al., 2018) or degrading the overall GNN performance (untargeted attack) (Zügner & Günnemann, 2019; Sun et al., 2019). The attack can happen during the model training phase (poisoning attack) (Zügner & Günnemann, 2019; Sun et al., 2019) or after training completes (evasion attack) (Dai et al., 2018; Chang et al., 2020). The form of the attack can be perturbing the node features (Zügner et al., 2018; Ma et al., 2020) or altering the graph topology (Dai et al., 2018; Sun et al., 2019). Finally, depending on the knowledge (e.g., model parameters, model predictions, features, and labels) the attacker has access to, attacks can be roughly categorized into white-box attacks (Xu et al., 2019), grey-box attacks (Zügner et al., 2018; Sun et al., 2019), black-box attacks (Dai et al., 2018; Chang et al., 2020), or near-black-box attacks (Ma et al., 2020). It is worth noting, however, that the borders of these categories are blurry in the literature.
The setup of interest in this paper can be categorized as node-level, untargeted, evasion, near-black-box attacks by perturbing the node features. While each setup configuration may find suitable application scenarios, we believe that near-black-box setups are particularly important as they are associated with many realistic scenarios. Among the existing studies on node-level black-box attacks, most (Bojchevski & Günnemann, 2018; Chang et al., 2020; Dai et al., 2018) still allow access to model predictions or some internal representations such as node embeddings. In this paper, we follow the strictest near-black-box setup (Ma et al., 2020) to our knowledge, which prohibits any probing of the model. Compared to Ma et al. (2020), we develop attack strategies by directly analyzing the problem of maximizing the mis-classification rate, rather than relying on heuristics. We remark that there are also plenty of existing works investigating adversarial attacks on non-GNN models (Wang & Gong, 2019; Zhang et al., 2019), which we consider less relevant to this work, and we refer the readers to the survey papers (Jin et al., 2020; Sun et al., 2018) for more details.

3. PRELIMINARIES

3.1. NOTATIONS

We start by introducing notations that will be used throughout this paper. Suppose we have an attributed graph G = (V, E, X, y), where V = {1, 2, ..., N} is the set of N nodes, E ⊆ V × V is the set of edges, X ∈ R^{N×D} is the node feature matrix with D-dimensional features, and y ∈ {1, 2, ..., K}^N is the node label vector with K classes. Let N_i = {j ∈ V | (i, j) ∈ E} ∪ {i} be the set of neighbors of node i, including itself. We denote the random walk transition matrix on the graph as M ∈ R^{N×N}: for any 1 ≤ i, j ≤ N, M_ij = 1/|N_i| if (i, j) ∈ E or i = j, and M_ij = 0 otherwise. To ease the notation, for any matrix A ∈ R^{D1×D2} in this paper, we use A_j ∈ R^{D2} to denote the transpose of the j-th row of the matrix.

We consider a GNN model f : R^{N×D} → R^{N×K} that maps the node feature matrix X to the output logits of all nodes (denoted as H ≜ f(X) ∈ R^{N×K}). We assume the GNN f has L layers, with the l-th layer (0 < l < L) at node i taking the form

H^(l)_i = ReLU( Σ_{j∈N_i} α_ij W^(l) H^(l-1)_j ),

where W^(l) is the learnable weight matrix, ReLU(•) is the element-wise ReLU activation function, and different GNNs have different normalization terms α_ij. We also define H^(0) = X and H = H^(L), with the last layer H^(L)_i = Σ_{j∈N_i} α_ij W^(L) H^(L-1)_j (no activation). Later in Section 4, we carry out our analysis on a GCN model with α_ij = 1/|N_i| (Hamilton et al., 2017).
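As a concrete reference for this notation, the short sketch below (our own illustration, not from the paper) constructs the random walk transition matrix M from a binary adjacency matrix, with N_i including the self-loop:

```python
import numpy as np

def transition_matrix(adj):
    """Random walk transition matrix with self-loops:
    M[i, j] = 1/|N_i| if (i, j) ∈ E or i == j, else 0."""
    a = adj.astype(float).copy()
    np.fill_diagonal(a, 1.0)              # i ∈ N_i (self-loop)
    deg = a.sum(axis=1, keepdims=True)    # |N_i|, including the node itself
    return a / deg

# Toy graph: a 3-node path 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
M = transition_matrix(adj)   # each row is a probability distribution
```

Each row of M sums to one, e.g. node 1 spreads mass 1/3 to itself and to each of its two neighbors.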

3.2. THE NEAR-BLACK-BOX ADVERSARIAL ATTACK SETUP

Next we briefly introduce the near-black-box adversarial attack setup proposed by Ma et al. (2020). The goal of the attack is to perturb the node features of a few carefully selected nodes such that the model performance is maximally degraded. The attack is decomposed into two steps. In the first step, the attacker selects a set of nodes S ⊆ V to be perturbed, under two constraints, |S| ≤ r and |N_i| ≤ m, ∀i ∈ S, for some 0 < r ≪ N and 0 < m ≪ max_i |N_i|. These two constraints prevent the attacker from manipulating many nodes or very important nodes as measured by node degree, which makes the setup more realistic. In the second step, the attacker is allowed to add a small constant perturbation ε ∈ R^D to each node in S, i.e., the perturbed feature becomes X̃_i ≜ X_i + ε for i ∈ S. The perturbation vector ε is constructed based on domain knowledge about the task but without access to the GNN model. For example, if the GNN model facilitates a recommender system for social media, an attacker may hack a handful of carefully selected users and manipulate their demographic features, posts, or browsing trajectories to get more users exposed to certain political content the attacker desires. In practice, the perturbation vector can be tailored for different nodes given personalized knowledge about each node. But following Ma et al. (2020), we consider the worst case where no personalization is available.
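The two constraints and the feature perturbation step are simple to state in code; below is a minimal sketch of our own (the array shapes are made up for illustration) under this setup:

```python
import numpy as np

def is_valid_selection(S, degrees, r, m):
    """First attack step, constraint check: S is valid iff |S| <= r
    and every selected node has degree |N_i| <= m."""
    return len(S) <= r and all(degrees[i] <= m for i in S)

def perturb_features(X, S, eps):
    """Second attack step: add the same perturbation ε to every
    selected node, i.e. X̃_i = X_i + ε for i ∈ S."""
    X_pert = X.copy()
    X_pert[list(S)] += eps
    return X_pert

X = np.zeros((4, 2))                 # 4 nodes with 2-dimensional features
degrees = [2, 4, 1, 1]               # |N_i|, including the node itself
S = [2, 3]
eps = np.array([0.5, -0.5])
X_new = perturb_features(X, S, eps)  # only rows 2 and 3 change
```

Node 1 is excluded from any valid selection here whenever m < 4, reflecting the "no very important nodes" constraint.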

3.3. INFLUENCE MAXIMIZATION ON A LINEAR THRESHOLD MODEL

Given an information/influence diffusion model on a social network, influence maximization is the problem of finding a small seed set of users who spread the maximum amount of influence over the network. In a linear threshold model (Kempe et al., 2003), the influence among nodes is characterized by a weighted directed adjacency matrix I ∈ R^{N×N}, where I_ij ≥ 0 for each (i, j) ∈ E and I_ij = 0 for each (i, j) ∉ E. Given a seed set of nodes activated at the initial state, the influence passes through the graph to activate other nodes. There is a threshold vector η ∈ R^N associated with the nodes, indicating the amount of influence each node must receive from its active neighbors before it becomes activated. In particular, when the influence propagation reaches a stationary point, a node i outside the seed set will be activated if and only if

Σ_{j∈N_i, j activated} I_ij ≥ η_i.   (1)

Figure 1: An illustrative example of the linear threshold model on the derived directed bipartite graph. To simplify the visualization, the GNN is assumed to have 1 layer, so each target node in the derived directed bipartite graph has incoming links from its zero-th order (itself) and first-order neighbors in the original graph. For a GNN with k layers, the derived directed bipartite graph has links from all its l-th order neighbors in the original graph, for any 0 ≤ l ≤ k. Each target node i has its own threshold θ_i to be influenced (mis-classified). The edge weight depends on the random walk transition probability from the seed node to the target node.
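A small simulation may make the activation rule of Eq. (1) concrete. The sketch below (ours, with a made-up weight matrix) iterates the linear threshold dynamics to a stationary point, using the convention that I[i, j] is the influence node i receives from its neighbor j:

```python
import numpy as np

def lt_activate(I, eta, seeds):
    """Linear threshold dynamics of Eq. (1): node i outside the seed set
    activates once the total weight from its already-active neighbors,
    sum over active j of I[i, j], reaches the threshold eta[i]."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for i in range(len(eta)):
            if i in active:
                continue
            if sum(I[i, j] for j in active) >= eta[i]:
                active.add(i)
                changed = True
    return active

# Path 0 - 1 - 2 with symmetric influence 0.6 and thresholds 0.5.
I = np.array([[0.0, 0.6, 0.0],
              [0.6, 0.0, 0.6],
              [0.0, 0.6, 0.0]])
eta = np.array([0.5, 0.5, 0.5])
cascade = lt_activate(I, eta, seeds={0})  # 0 activates 1, which activates 2
```

Raising node 1's threshold above 0.6 blocks the cascade at the seed, illustrating how node-specific thresholds control the spread.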

4. ANALYSIS OF THE ADVERSARIAL ATTACK PROBLEM

In this section, we investigate how to develop adversarial attack strategies under the near-black-box setup stated in Section 3.2 in a principled way.

4.1. NODE SELECTION FOR MIS-CLASSIFICATION RATE MAXIMIZATION

Suppose an attacker wants to attack a well-trained L-layer GCN model f. Following the two-step attack procedure, the attacker first selects a valid node set S ∈ C_{r,m} ≜ {T ⊆ V | |T| ≤ r, and |N_i| ≤ m, ∀i ∈ T} for some given constraints r and m. Then the constant perturbation ε is added to the feature of each node in S, which leads to a perturbed feature matrix X(S, ε). Since our primary interest is the design of the node selection step, we omit ε and write the perturbed feature matrix as X(S) for simplicity. We denote the output logits of the model after perturbation as H(S) = f(X(S)). Clearly, H(∅) equals the matrix of output logits without attack. In an untargeted attack, the attacker wants the model to make as many mistakes as possible, which is best measured by the mis-classification rate. Therefore we formulate the node selection step as an optimization problem maximizing the mis-classification rate over S, with the two constraints quantified by r and m:

max_{S∈C_{r,m}} Σ_{j=1}^{N} 1[ max_{k=1,...,K} H_jk(S) ≠ H_{j y_j}(S) ],   (2)

where 1[•] is the indicator function. We drop the normalizing constant 1/N of the mis-classification rate, which does not change the maximizer. At first glance, the optimization problem (2) is a combinatorial optimization problem with a complicated objective function involving neural networks. In the following section, we demonstrate that, under a simplifying assumption, it can be connected to the influence maximization problem.
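To make the combinatorial nature of problem (2) concrete, here is a brute-force sketch of our own; `predict` is a hypothetical oracle mapping a selected set S to the perturbed logits H(S), which the near-black-box attacker does not actually have:

```python
import numpy as np
from itertools import combinations

def misclassified(logits, y):
    """Unnormalized objective of problem (2): number of nodes whose
    argmax prediction differs from the true label."""
    return int((np.argmax(logits, axis=1) != np.asarray(y)).sum())

def brute_force_attack(predict, degrees, y, r, m):
    """Exhaustive search over C_{r,m} for the set S maximizing the
    mis-classification count; exponential in r, toy graphs only."""
    eligible = [i for i, d in enumerate(degrees) if d <= m]
    best_S, best_val = (), misclassified(predict(()), y)
    for size in range(1, r + 1):
        for S in combinations(eligible, size):
            val = misclassified(predict(S), y)
            if val > best_val:
                best_S, best_val = S, val
    return best_S, best_val

# Toy oracle: perturbing node 0 flips node 1's prediction.
def predict(S):
    logits = np.array([[1.0, 0.0], [1.0, 0.0]])
    if 0 in S:
        logits[1] = [0.0, 1.0]
    return logits

best_S, best_val = brute_force_attack(predict, degrees=[1, 1], y=[0, 0], r=1, m=2)
```

The exponential search space over C_{r,m} is exactly what the influence maximization view in the next section lets us avoid.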

4.2. CONNECTION TO THE INFLUENCE MAXIMIZATION ON LINEAR THRESHOLD MODEL

We first introduce a simplifying assumption on ReLU that has been widely used to ease the analysis of neural networks (Choromanska et al., 2015; Kawaguchi, 2016), including GCNs (Xu et al., 2018).

Assumption 1 (Xu et al. (2018)). All the ReLU activations activate independently with the same probability, which implies that all paths in the computation graph of the GCN model are independently activated with the same probability of success ρ.

Under Assumption 1, we can define H̄(S) ≜ E_path[H(S)] for any S ⊆ V, where E_path[H(S)] denotes the expectation of H(S) over the random activations of the ReLU functions in the model. Then we can rewrite problem (2) in a form similar to the influence maximization objective on a linear threshold model. The influence weight matrix is defined by the L-step random walk transition matrix B ≜ M^L, and the threshold for each node is related to the original output logits H̄(∅), the perturbation vector ε, and the product of the GCN weights W ≜ ρ · Π_{l=L}^{1} W^(l) ∈ R^{K×D}. Formally, we have the following result (proved in Appendix A.1).

Proposition 1. Under Assumption 1, problem (2), with H replaced by H̄, is equivalent to

max_{S∈C_{r,m}} Σ_{j=1}^{N} 1[ Σ_{i∈S} B_ji > θ_j ],   (3)

where, for k̄_j = argmax_{k=1,...,K} H̄_jk(S),

θ_j ≜ ( H̄_{j y_j}(∅) − H̄_{j k̄_j}(∅) ) / ( (W_{k̄_j} − W_{y_j})^T ε ).   (4)

In particular, if k̄_j = y_j, we define θ_j = ∞.

Interpretations of the new objective (3). The new optimization objective (3) has nice interpretations. The L-step random walk transition matrix measures the pairwise influence from input nodes to target nodes in the GCN model, and Σ_{i∈S} B_ji can be viewed as the influence of the nodes in S on a target node j. In each θ_j, the numerator H̄_{j y_j}(∅) − H̄_{j k̄_j}(∅) can be viewed as the logit margin between the correct class and the strongest wrong class, which measures the robustness of the prediction on node j. The denominator (W_{k̄_j} − W_{y_j})^T ε measures how effective the perturbation is. In combination, θ_j measures how difficult it is to mis-classify node j with perturbation ε. This new objective nicely separates the influence between nodes from the node-specific robustness.
Note that the form of each term inside the summation in Eq. (3), 1[ Σ_{i∈S} B_ji > θ_j ], is very similar to that of Eq. (1). In fact, the objective (3) can be viewed as the influence maximization objective on a directed bipartite graph derived from the original graph, as shown in Figure 1. The derived bipartite graph has N nodes on both sides (call them the seed candidate side S and the target node side T), and there are edges pointing from side S to side T but not the other way. The edge weight from node i on side S to node j on side T (1 ≤ i, j ≤ N) is defined as B_ji. Then it is easy to see that problem (3) is equivalent to the influence maximization problem on this bipartite graph with the node-specific thresholds θ_j, j = 1, ..., N.

Two difficulties in solving problem (3). While we now have a better interpretation of the original mis-classification rate maximization problem in terms of influence maximization, we still face two major difficulties before we can develop an algorithm to solve the problem. The first difficulty is that we do not know the value of θ in a near-black-box attack setup, as it involves the model parameters. The second difficulty is that, even if θ is given, influence maximization on the seemingly simple bipartite graph is still NP-hard, as we show in Lemma 1.

Lemma 1. The influence maximization problem on a directed bipartite graph with a linear threshold model is NP-hard.

4.3. ASSUMPTIONS ON THE THRESHOLDS

In this section, we mitigate the two aforementioned difficulties by making distributional assumptions on the thresholds θ. It is well known that if the threshold θ_j of each node j is drawn uniformly at random from the interval [0, 1], the expected objective of a general linear threshold model is submodular, which leads to an efficient greedy algorithm that solves the expected influence maximization problem with a performance guarantee (Kempe et al., 2003). In light of this fact regarding the general linear threshold model, we show (in Proposition 2) that a mild assumption on the distribution of θ guarantees the expectation of the objective (3) to be submodular, thanks to the simple bipartite structure.

Proposition 2. Suppose the individual thresholds are random variables drawn from some distributions, and the marginal cumulative distribution function of the threshold θ_j for node j is F_j, j = 1, ..., N. If F_1, ..., F_N are individually concave on the domain [0, +∞), then the expectation of the objective (3),

h(S) ≜ E_{θ_1,...,θ_N}[ Σ_{j=1}^{N} 1[ Σ_{i∈S} B_ji > θ_j ] ],   (5)

is submodular.

Note that here we do not need the thresholds θ to be independent of each other; we only require the marginal probability density function of each θ_j to be non-increasing on the positive region. Proposition 2 partially addresses the second difficulty. While we still do not have a solution to the original problem (3), we now know that for a wide range of distributions of θ, the expected mis-classification rate is submodular and can be maximized approximately and efficiently through a greedy algorithm. For the first difficulty, we propose to explicitly specify a simple distribution for θ and optimize the expected mis-classification rate h(S), which no longer involves any model parameters and gives us a near-black-box attack strategy.
While this seems to radically deviate from the original optimization objective (3), we empirically show in Section 5 that a crude characterization of the distribution of θ suffices to obtain effective attack strategies.

Concrete near-black-box attack strategies. Below we derive two concrete near-black-box attack strategies by specifying the distribution of θ to be a uniform distribution and a normal distribution, respectively.

Corollary 1. If a, b > 0 and θ_j ~ i.i.d. Uniform(−b, a), then

h(S) = (1/(a + b)) Σ_{j=1}^{N} [ min( Σ_{i∈S} B_ji, a ) + b ],   (6)

and h(S) is submodular.

Corollary 2. If σ > 0 and θ_j ~ i.i.d. N(0, σ²), then

h(S) = (1/2) Σ_{j=1}^{N} [ 1 + erf( Σ_{i∈S} B_ji / (σ√2) ) ],   (7)

where erf(•) is the Gauss error function, and h(S) is submodular.

Corollaries 1 and 2 follow directly from Proposition 2 given the cumulative distribution functions of the uniform distribution and the normal distribution, together with the fact that these CDFs are concave on the positive region. In particular, Eq. (6) belongs to a well-known submodular function family named the saturated coverage functions (Lin & Bilmes, 2011; Iyer & Bilmes, 2015). Under the assumptions of Corollary 1, the adversarial attack problem reduces to the classic influence maximization problem under the linear threshold model where the thresholds follow uniform distributions. We name the attack strategies obtained by greedily maximizing the objectives (6) and (7) InfMax-Unif and InfMax-Norm, respectively. Specifically, each strategy iteratively selects nodes into the set to be perturbed, up to a given size. At each iteration, the node that, combined with the existing set, maximizes Eq. (6) or Eq. (7) is selected.
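The greedy selection loop shared by both strategies can be sketched as follows (our own minimal implementation; B[j, i] holds the influence of seed candidate i on target j, and the constant factors 1/(a+b) and 1/2 are dropped since they do not affect which node is selected):

```python
import numpy as np
from math import erf, sqrt

def h_unif(col, a):
    """Eq. (6) up to the additive b and the 1/(a+b) factor."""
    return np.minimum(col, a).sum()

def h_norm(col, sigma):
    """Eq. (7) up to its additive and multiplicative constants."""
    return sum(erf(c / (sigma * sqrt(2))) for c in col)

def greedy_select(B, r, eligible, objective):
    """Greedily grow S: at each step add the eligible node whose
    inclusion maximizes the submodular objective h(S)."""
    S, col = [], np.zeros(B.shape[0])   # col[j] = sum_{i in S} B[j, i]
    candidates = set(eligible)
    for _ in range(min(r, len(candidates))):
        best = max(candidates, key=lambda i: objective(col + B[:, i]))
        S.append(best)
        col += B[:, best]
        candidates.discard(best)
    return S

# Toy example: 3 target nodes, 2 eligible seed candidates.
B = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.8, 0.0]])
S_unif = greedy_select(B, r=1, eligible=[0, 1], objective=lambda c: h_unif(c, a=10.0))
S_norm = greedy_select(B, r=1, eligible=[0, 1], objective=lambda c: h_norm(c, sigma=1.0))
```

Since the objectives are submodular, this greedy loop inherits the classic approximation guarantee for monotone submodular maximization under a cardinality constraint.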

4.4. DISCUSSIONS ON THE APPROXIMATIONS

From problem (3) to our final attack strategies, we have made two major approximations to address the two difficulties raised at the end of Section 4.2. The first approximation is that we go from the original optimization problem to its expected version. Note that θ depends on both the model parameters and the data, to which we do not have full access. The first approximation treats them as random and takes the expectation over θ, which integrates out the randomness in the data and the model training process. The resulting expected objective function h(S) is submodular under the conditions of Proposition 2. A natural question regarding this approximation is how well the mis-classification rate (3) concentrates around its expectation (5). If the θ_j are independent, the indicator variables in (5) are also independent, and it is easy to show through Hoeffding's inequality that the mis-classification rate is well concentrated for a large graph size N. However, the independence assumption is unrealistic in the case of GNNs, as the predictions of adjacent nodes should be correlated. Further note that θ can be written in terms of linear combinations of node features. With extra assumptions on the node features and the graph structure, one may be able to carry out a finer analysis of the covariance of θ, and thus of how well the mis-classification rate concentrates. We leave this analysis for future work. The second approximation is that we further specify simple distributions for θ, which quite likely deviate substantially from the real distribution. On one hand, our superior empirical results in Section 5 suggest that these simple strategies are practical enough for some applications. On the other hand, this leaves room for further improvement in real-world scenarios where we have more knowledge regarding the distribution of θ.
For example, if an attacker has a very limited number of API calls to access the model predictions, these calls are probably not enough to train a reinforcement-learning-based attack strategy, but they can be effectively used to better estimate the distribution of θ.

5. EXPERIMENTS

In this section, we first empirically evaluate the performance of the proposed attack strategies, InfMax-Unif and InfMax-Norm, against several baseline attack strategies. We also visualize the distributions of θ to gain a better understanding of the approximations we made.

5.1. ATTACK STRATEGIES FOR COMPARISON

Implementation of InfMax-Unif and InfMax-Norm. For the proposed InfMax-Unif and InfMax-Norm, a few hyper-parameters need to be specified. Recall that B = M^L; the first hyper-parameter for both methods is L. We set L = 4 following RWCS and GC-RWCS. We note that, for the attack strategies to be effective in practice, the hyper-parameter L does not have to equal the number of layers of the GNN being attacked, as we will show in the experiments. For InfMax-Unif, there are two additional distribution hyper-parameters, a and b. However, b does not influence the selection of nodes, so we only need to specify a. For InfMax-Norm, we need to specify the distribution parameter σ. We fix a = 0.01 and σ = 0.01 across all the experiment setups. Theoretically, the optimal choice of a or σ should depend on the perturbation vector ε as well as the dataset. However, we find the proposed InfMax-Unif and InfMax-Norm strategies fairly robust to the choice of a or σ (see the sensitivity analysis in Appendix A.3).

Baseline strategies. We compare with five baseline strategies: Degree, Betweenness, PageRank, Random Walk Column Sum (RWCS), and Greedily-Corrected RWCS (GC-RWCS). The first three strategies, as suggested by their names, correspond to three node centrality scores. These strategies select the nodes with the highest centrality scores subject to the constraint C_{r,m}. RWCS and GC-RWCS are two near-black-box attack strategies proposed by Ma et al. (2020). RWCS is derived by maximizing the cross-entropy classification loss with certain approximations. In practice, RWCS has a simple form: it selects the nodes with the highest importance scores, defined as I(i) = Σ_{j=1}^{N} [M^L]_ji (recall that M is the random walk transition matrix). We set the hyper-parameter L = 4 following Ma et al. (2020). GC-RWCS further applies a few heuristics on top of RWCS to achieve a better mis-classification rate.
Specifically, it dynamically updates the RWCS importance score based on a heuristic, and it removes a local neighborhood of each selected node after selecting it. In the experiments, we set the hyper-parameters of GC-RWCS as L = 4, l = 30, and k = 1, as suggested in the original paper. Interestingly, RWCS can be viewed as a special case of InfMax-Unif if we set a = ∞ (or large enough). And GC-RWCS, without the neighborhood-removal step, can also be viewed as a modified version of InfMax-Unif.
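For reference, the RWCS score and its degree-constrained selection admit a compact sketch (our own; the degree filter enforces the C_{r,m} constraint):

```python
import numpy as np

def rwcs_select(M, degrees, r, m, L=4):
    """RWCS: rank nodes by I(i) = sum_j [M^L]_{ji}, the i-th column sum of
    the L-step random walk transition matrix, and pick the top r among
    nodes with degree at most m."""
    scores = np.linalg.matrix_power(M, L).sum(axis=0)
    eligible = [i for i, d in enumerate(degrees) if d <= m]
    return sorted(eligible, key=lambda i: -scores[i])[:r]

# Toy 3-node path with self-loops (rows are random walk distributions).
M = np.array([[1/2, 1/2, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/2, 1/2]])
top = rwcs_select(M, degrees=[2, 3, 2], r=1, m=3, L=2)  # the central node
```

With a = ∞, the InfMax-Unif objective reduces to a sum of column sums, so greedily maximizing it selects exactly these top-score nodes, matching the special-case relation noted above.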

5.2. THE ATTACK EXPERIMENT

Experiment setup. We follow exactly the same experiment setup as Ma et al. (2020), except that we further include the GAT model (Veličković et al., 2018). We therefore only briefly introduce the setup here due to the page limit, and refer to Appendix A.2 and Ma et al. (2020) for more details. We test the attack strategies on 3 popular GNN models, (2-layer) GCN (Kipf & Welling, 2016), (2-layer) GAT (Veličković et al., 2018), and (7-layer) JK-Net (Xu et al., 2018), on 3 public benchmark datasets, Cora, Citeseer, and Pubmed (Sen et al., 2008). We apply the attack strategies following the two-step procedure stated in Section 3.2. For the node selection step, we limit the number of nodes to be attacked, r, to 1% of the graph size for each dataset. We test two setups of the node degree threshold, m, setting it equal to the lowest degree of the top 10% and top 30% of nodes, respectively. For the feature perturbation step, we construct the constant perturbation vector ε in the same way as Ma et al. (2020).

Experiment results. We provide the attack experiment results in Table 1. We show the model accuracy after applying each attack strategy for each dataset and model combination; the lower the better. We also include the model accuracy without attack (None) and with an attack under random node selection (Random) for reference. As can be seen in Table 1, the proposed attack strategies achieve better attack performance than all baselines on all but one of the 18 setups, and most of the differences are statistically significant. We highlight that, compared to the strongest baseline, GC-RWCS, our methods have fewer hyper-parameters and a better interpretation. In addition, the neighbor-removal heuristic also contributes to the performance of GC-RWCS, while our methods outperform GC-RWCS without such additional heuristics.

5.3. VISUALIZING THE DISTRIBUTIONS OF θ

We also empirically investigate the distributions of θ to see how likely their PDFs are to be non-increasing on the positive domain. In particular, given the parameters of a well-trained GCN, we can approximately calculate θ with Eq. (4). We train a GCN on Cora and obtain one set of θ. We repeat this process with 1000 independent model initializations and obtain 1000 sets of θ. We can then visualize a histogram of the 1000 values of θ_j for each node j. In Figure 2, we show the histograms of 3 randomly selected nodes; histograms of more randomly selected nodes are shown in Appendix A.4. As can be seen from the histograms, in most cases the empirical probability density decreases for θ_j > 0, which is the assumption required for the expected mis-classification rate to be submodular in Proposition 2.
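For completeness, here is one crude way to compute such approximate θ values from a trained model, using the clean runner-up class as a stand-in for k̄_j (our own simplification; the paper's k̄_j depends on the perturbed logits, which requires choosing S):

```python
import numpy as np

def approx_thresholds(H0, W, eps, y):
    """Approximate θ_j of Eq. (4) from the clean logits H0 (N×K), the
    scaled weight product W (K×D), perturbation eps (D,), and labels y.
    θ_j = ∞ when the perturbation direction cannot close the margin."""
    N = H0.shape[0]
    theta = np.full(N, np.inf)
    for j in range(N):
        runners = H0[j].copy()
        runners[y[j]] = -np.inf
        k = int(runners.argmax())          # clean runner-up, proxy for k̄_j
        denom = (W[k] - W[y[j]]) @ eps     # (W_k - W_{y_j})^T ε
        if denom > 0:
            theta[j] = (H0[j, y[j]] - H0[j, k]) / denom
    return theta

# Tiny example: one node, two classes, margin 1, unit perturbation gain.
H0 = np.array([[2.0, 1.0]])
W = np.array([[0.0, 0.0], [1.0, 0.0]])
theta = approx_thresholds(H0, W, eps=np.array([1.0, 0.0]), y=[0])
```

Repeating this over many independently trained models gives per-node samples of θ_j, from which histograms like those in Figure 2 can be drawn.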

6. CONCLUSION

We present a formal analysis of near-black-box attacks on graph neural networks, formulated as the problem of mis-classification rate maximization. By establishing a novel connection between this optimization problem and an influence maximization problem on a linear threshold model, we develop a group of efficient and effective near-black-box attack strategies with nice interpretations. Extensive empirical results demonstrate the effectiveness of the proposed strategies, which outperform state-of-the-art attack strategies on multiple types of GNNs. In future work, we plan to explore how to perturb the graph structure under this near-black-box setup, as well as how to perturb node features under extra constraints (e.g., binary or non-negative features).

A. APPENDIX

A.1. PROOFS

We first give a more precise, restated version (Assumption 2) of Assumption 1, and introduce Lemma 2 about GCNs, which is proved by Xu et al. (2018).

Assumption 2 (Xu et al. (2018), restated). Recall that a ReLU function can be written as ReLU(x) = x • 1[x > 0]. Suppose there are R ReLU functions in the GCN model, indexed by i = 1, 2, ..., R. This assumption assumes that the i-th ReLU function, for i = 1, 2, ..., R, is replaced by the function ReLU_i(x) = x • z_i, where z_1, z_2, ..., z_R ~ i.i.d. Bernoulli(γ). This implies that all paths in the computation graph of an L-layer GCN model are independently activated with the same probability ρ = γ^L.

Lemma 2 (Xu et al. (2018)). Given an L-layer GCN, under Assumption 1, for any nodes i, j ∈ V,

E_path[ ∂H_j / ∂X_i ] = ρ [M^L]_ji • Π_{l=L}^{1} W^(l),

where M ∈ R^{N×N} is the random walk transition matrix, i.e., for any 1 ≤ i, j ≤ N, M_ij = 1/|N_i| if (i, j) ∈ E or i = j, and M_ij = 0 otherwise.

Proof for Proposition 1.

Proof. Recall that H̄(S) = E_path[H(S)] = E_path[f(X(S))]. We first show that H̄(S) is a linear function of X(S), for which it suffices to show that, for any i ∈ V and 1 ≤ l ≤ L, E_path[H^(l)_i(S)] is a linear function of E_path[H^(l-1)(S)]. When l = L,

E_path[H^(L)_i(S)] = Σ_{j∈N_i} α_ij W^(L) E_path[H^(L-1)_j(S)],

so the statement holds. When 1 ≤ l < L, under Assumption 1, suppose each ReLU activates independently with probability p. Then

E_path[H^(l)_i] = E_path[ ReLU( Σ_{j∈N_i} α_ij W^(l) H^(l-1)_j ) ] = p Σ_{j∈N_i} α_ij W^(l) E_path[H^(l-1)_j(S)],

so the statement also holds. Therefore H̄(S) is a linear function of X(S). In particular, E_path[H] = H̄(∅) is a linear function of X. We know that X_i(S) = X_i + ε for i ∈ S and X_i(S) = X_i for i ∉ S. By Lemma 2, we can rewrite H̄(S) in terms of H̄(∅) and ε: for any j ∈ V,

H̄_j(S) = H̄_j(∅) + Σ_{i∈S} ρ [M^L]_ji • ( Π_{l=L}^{1} W^(l) ) ε.

In Section 4.2, we defined B = M^L and W = ρ Π_{l=L}^{1} W^(l), so

H̄_j(S) = H̄_j(∅) + ( Σ_{i∈S} B_ji ) W ε.   (9)

Now we look at the objective (2). If we replace H(S) with H̄(S) in this objective and plug Eq. (9) into it, then for each j ∈ V, with k̄_j = argmax_{k=1,...,K} H̄_jk(S) and recalling the definition of θ_j in Eq. (4), we have

1[ max_{k∈{1,...,K}} H̄_jk(S) ≠ H̄_{j y_j}(S) ]
= 1[ H̄_{j k̄_j}(S) > H̄_{j y_j}(S) ]
= 1[ H̄_{j k̄_j}(∅) + W^T_{k̄_j} ε • Σ_{i∈S} B_ji > H̄_{j y_j}(∅) + W^T_{y_j} ε • Σ_{i∈S} B_ji ]
= 1[ Σ_{i∈S} B_ji > ( H̄_{j y_j}(∅) − H̄_{j k̄_j}(∅) ) / ( (W_{k̄_j} − W_{y_j})^T ε ) ]
= 1[ Σ_{i∈S} B_ji > θ_j ].

Therefore we obtain the optimization problem (3),

max_{S∈C_{r,m}} Σ_{j=1}^{N} 1[ Σ_{i∈S} B_ji > θ_j ].

Proof for Lemma 1. The proof follows that of Theorem 2.4 in Kempe et al. (2003).

Proof. We prove the claim by reducing the NP-complete Set Cover problem to the influence maximization problem on a directed bipartite graph with a linear threshold model. The Set Cover problem is defined as follows. Suppose we have a ground set U = {u_1, u_2, ..., u_n} and a group of m subsets of U, S_1, S_2, ..., S_m. The goal is to determine whether there exist r (r < n and r < m) of the subsets whose union equals U. For any instance of the Set Cover problem, we construct a bipartite graph whose first side has m nodes (each corresponding to a given subset of U) and whose second side has n nodes (each corresponding to an element of U). There are only links going from the first side to the second side. There is a link with constant influence weight α > 0 from a node on the first side to a node on the second side if and only if the corresponding subset contains that element of U. Finally, the node-specific threshold of each node on the second side is set to α/2. The influence maximization problem asks to select r nodes on the graph to maximize the number of activated nodes. The Set Cover problem is then solved by deciding whether the maximized number of activated nodes on the bipartite graph is at least n + r.

Proof for Proposition 2.

Proof.
We first show that the expected mis-classification rate $h(S)$ can be written in terms of the marginal CDFs of $\theta$:
$$h(S) = \mathbb{E}_{\theta_1,\ldots,\theta_N}\left[\sum_{j=1}^{N} 1\left[\sum_{i \in S} B_{ji} > \theta_j\right]\right] = \sum_{j=1}^{N} \mathbb{E}_{\theta_1,\ldots,\theta_N}\left[1\left[\sum_{i \in S} B_{ji} > \theta_j\right]\right] = \sum_{j=1}^{N} \mathbb{E}_{\theta_j}\left[1\left[\sum_{i \in S} B_{ji} > \theta_j\right]\right] = \sum_{j=1}^{N} P_j\left(\sum_{i \in S} B_{ji} > \theta_j\right) = \sum_{j=1}^{N} F_j\left(\sum_{i \in S} B_{ji}\right),$$
where $P_j$ is the marginal probability of $\theta_j$ and $F_j$ its CDF. Since $B_{ji} \ge 0$, $\sum_{i \in S} B_{ji}$ is a non-decreasing submodular function of $S$ bounded below by $0$. Each CDF $F_j$ is non-decreasing by definition; if it is also concave on the domain $[0, +\infty)$, then $F_j\left(\sum_{i \in S} B_{ji}\right)$ is submodular w.r.t. $S$, and hence $h(S)$ is submodular.
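Since $h(S)$ is submodular and monotone, it can be maximized greedily with the standard $(1 - 1/e)$ approximation guarantee. Below is a minimal NumPy sketch of this greedy selection; the degree constraint $\mathcal{C}_{r,m}$ is omitted for brevity, the function name and toy matrix are illustrative, and taking $F_j$ as the CDF of a Uniform$(0, a)$ distribution is one simple concave choice in the spirit of InfMax-Unif.

```python
import numpy as np

def greedy_infmax(B, theta_cdf, r):
    """Greedily maximize h(S) = sum_j F_j(sum_{i in S} B_ji).

    B         : (N, N) non-negative influence matrix (e.g., B = M^L).
    theta_cdf : vectorized CDF F applied elementwise; it should be
                non-decreasing and concave on [0, inf) so that h is
                submodular and greedy enjoys the (1 - 1/e) guarantee.
    r         : number of nodes to select.
    """
    N = B.shape[0]
    candidates = set(range(N))
    S, col_sum = [], np.zeros(N)
    for _ in range(r):
        base = theta_cdf(col_sum).sum()
        # marginal gain h(S + {i}) - h(S) for each remaining candidate
        gains = {i: theta_cdf(col_sum + B[:, i]).sum() - base
                 for i in candidates}
        best = max(gains, key=gains.get)
        S.append(best)
        col_sum += B[:, best]
        candidates.remove(best)
    return S

# Illustrative uniform CDF F(x) = min(x / a, 1) with a = 0.01.
a = 0.01
unif_cdf = lambda x: np.clip(x / a, 0.0, 1.0)
B = np.array([[0.002, 0.001, 0.000],
              [0.003, 0.000, 0.004],
              [0.000, 0.005, 0.004]])
print(greedy_infmax(B, unif_cdf, r=2))  # -> [2, 1]
```

Note how the second pick is chosen by marginal gain rather than by raw column sum: influence already "used up" on nearly-saturated nodes counts for less.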

A.2 MORE EXPERIMENT DETAILS

Definitions of the node centralities. For each node $i$, the Degree centrality score is defined as $C_D(i) = |\mathcal{N}_i| / N$; the Betweenness centrality score is defined as $C_B(i) = \sum_{j \ne i, k \ne i, j < k} g_{jk}(i) / g_{jk}$, where $g_{jk}$ is the number of shortest paths connecting nodes $j$ and $k$ and $g_{jk}(i)$ is the number of those shortest paths that node $i$ lies on; the PageRank centrality score is defined as the stationary score obtained by iteratively updating $PR(i) = \frac{1-\alpha}{N} + \alpha \sum_{j \in \mathcal{N}_i} \frac{PR(j)}{|\mathcal{N}_j|}$, and we set the hyper-parameter $\alpha = 0.85$.

Detailed description of GC-RWCS. GC-RWCS applies a few heuristics on top of RWCS to achieve a better mis-classification rate. Specifically, it iteratively selects nodes one by one, up to $r$ nodes, based on a dynamic importance score $I_t(i) = \sum_{j=1}^{N} [Q_t]_{ji}$ at the $t$-th iteration, where $Q_t \in \{0,1\}^{N \times N}$ is a binary matrix that is dynamically updated over $t$. At the initial iteration, $Q_1$ is obtained by binarizing $M^L$, assigning 1 to the top $l$ nonzero entries in each row of $M^L$ and 0 to all other entries. For $t > 1$, suppose node $i$ is selected at the $(t-1)$-th iteration; then $Q_t$ is obtained from $Q_{t-1}$ by zeroing out every row whose element in the $i$-th column of $Q_{t-1}$ is 1. GC-RWCS also applies another heuristic: after each iteration, the $k$-hop neighbors of the selected node are removed from the candidate set in subsequent iterations. In the experiments, we set the hyper-parameters of GC-RWCS to $L = 4$, $l = 30$, and $k = 1$, as suggested in the original paper. The iterative selection process in GC-RWCS (without removing the $k$-hop neighbors) gives results equivalent to InfMax-Unif if we replace the matrix $B$ in InfMax-Unif by $Q_1$ and set $a = 1$.

More details of the experiment setup. We randomly split each dataset into 60%, 20%, and 20% as the training, validation, and test sets, and run 40 independent trials for each model and dataset combination. We apply the attack strategies following the two-step procedure stated in Section 3.2.
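A NumPy sketch of the GC-RWCS iterative selection described above (omitting the $k$-hop-removal heuristic); the function names and the toy matrix are illustrative, and `M_L` denotes the precomputed matrix $M^L$:

```python
import numpy as np

def binarize_top_l(M_L, l):
    """Q_1: set the top-l nonzero entries of each row of M^L to 1."""
    Q = np.zeros_like(M_L, dtype=int)
    for j, row in enumerate(M_L):
        nz = np.flatnonzero(row)
        Q[j, nz[np.argsort(row[nz])[-l:]]] = 1
    return Q

def gc_rwcs_select(M_L, r, l):
    """Pick r nodes by the dynamic score I_t(i) = sum_j [Q_t]_{ji}.

    After node i is selected, every row j with [Q_t]_{ji} = 1 is zeroed
    out, discounting candidates whose influence overlaps with the nodes
    already selected.
    """
    Q = binarize_top_l(M_L, l)
    selected = []
    for _ in range(r):
        scores = Q.sum(axis=0)
        scores[selected] = -1              # never reselect a node
        i = int(np.argmax(scores))
        selected.append(i)
        Q[Q[:, i] == 1, :] = 0             # zero rows covered by node i
    return selected

M_L = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.5, 0.5, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5],
                [0.4, 0.0, 0.3, 0.3]])
print(gc_rwcs_select(M_L, r=2, l=2))  # -> [0, 2]
```

In the toy example, node 0 has the highest initial score, and the row-zeroing step then steers the second pick toward node 2, whose influence does not overlap with node 0's.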
For the node selection step, we limit the number of nodes to be attacked, $r$, to 1% of the graph size for each dataset. We test two setups of the node degree threshold $m$, setting it equal to the lowest degree of the top 10% and the top 30% highest-degree nodes, respectively. For the feature perturbation step, we follow Ma et al. (2020) to construct the constant perturbation vector $\epsilon$. Ideally, the perturbation vector should be designed according to domain knowledge about the task in a real-world scenario. For the experiments on benchmark datasets, where the semantic meaning of the features is unknown, we simulate such domain knowledge with extremely limited gradient information: the gradients are used only to select the important features and the sign of the perturbation, not its magnitude. We construct $\epsilon \in \mathbb{R}^D$ as
$$\epsilon_j = \begin{cases} \lambda \cdot \mathrm{sign}\left(\sum_{i=1}^{N} \frac{\partial \mathcal{L}(H, y)}{\partial X_{ij}}\right), & \text{if } j \in \operatorname{arg\,top-}\!J\left(\left[\left|\sum_{i=1}^{N} \frac{\partial \mathcal{L}(H, y)}{\partial X_{il}}\right|\right]_{l=1,2,\ldots,D}\right), \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$
where $\lambda$ is the perturbation strength and is set to 1, and $J$ is set to 2% of the number of features. The same perturbation vector $\epsilon$ is added to all selected nodes in $S$.
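The construction in Eq. (10) can be sketched as follows; `build_perturbation` is an illustrative name, and `agg_grad` stands for the gradient of the loss aggregated over nodes (the coarse stand-in for domain knowledge):

```python
import numpy as np

def build_perturbation(agg_grad, lam=1.0, frac=0.02):
    """Constant perturbation vector following Eq. (10).

    agg_grad : length-D vector, the loss gradient w.r.t. each feature
               summed over nodes.
    Only the top-J = frac * D features by |agg_grad| are perturbed,
    each by lam times the sign of its aggregated gradient; the
    magnitude information of the gradient is deliberately discarded.
    """
    D = agg_grad.shape[0]
    J = max(1, int(frac * D))
    top = np.argsort(np.abs(agg_grad))[-J:]   # indices of top-J features
    eps = np.zeros(D)
    eps[top] = lam * np.sign(agg_grad[top])
    return eps

g = np.array([0.3, -2.0, 0.1, 1.5, -0.05])
# with frac = 0.4 only the two largest-|g| features (1 and 3) are moved
print(build_perturbation(g, lam=1.0, frac=0.4))
```

The same vector is then added to the features of every node in the selected set $S$.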

A.3 ADDITIONAL EXPERIMENTS

Attack performance with varying perturbation strengths. In Figure 3, we show the attack performance of different attack strategies under varying perturbation strengths. We first observe that the proposed attack strategies with fixed hyper-parameters ($a = 0.01$ for InfMax-Unif and $\sigma = 0.01$ for InfMax-Norm) outperform all the baselines in most cases. It is also worth noting that, as suggested by Eq. (4), the distribution of $\theta$ depends on the perturbation $\epsilon$ and hence on $\lambda$. In the approximated uniform and normal distributions for InfMax-Unif and InfMax-Norm respectively, the optimal choices of $a$ and $\sigma$ should therefore depend on $\lambda$. Intuitively, a smaller $\lambda$ gives $\theta$ a larger variance, so the choices of $a$ and $\sigma$ should also be larger. This is indeed suggested by the results in Figure 3. Recall that, in Section 5.1, we discussed that RWCS can be viewed as a special case of InfMax-Unif with $a = \infty$; in Figure 3, we observe that RWCS (equivalent to InfMax-Unif with $a = \infty$) sometimes (e.g., for GCN) outperforms InfMax-Unif with $a = 0.01$ when $\lambda$ is very small. We leave further optimization of the hyper-parameters of the proposed strategies to future work.

Sensitivity analysis of $a$ for InfMax-Unif and $\sigma$ for InfMax-Norm. In Figure 4, we carry out a sensitivity analysis with respect to $a$ for InfMax-Unif and $\sigma$ for InfMax-Norm. In Section 5.2, we fixed $a = 0.01$ and $\sigma = 0.01$ for all experiment settings. Here we vary them from 0.005 to 0.02 and show that the results of the proposed strategies, especially those of InfMax-Norm, stay relatively stable under varying choices of the hyper-parameters.

Targeting the test set. In the experiments in Section 5.2, we use the objectives in Eq. (6) and Eq. (7), which sum over all $N$ nodes of the graph, for an untargeted attack, assuming the attacker does not know the test set to be evaluated on. If the targeted test set is known, we can adapt Eq. (6) and Eq. (7) to sum over the test set only.
In Table 2, we compare the performance of untargeted attacks with that of attacks targeting the test set. When targeting the test set, the proposed strategies improve further over their untargeted versions.

Synthetic data experiments. We further carry out experiments on synthetic datasets to demonstrate that the proposed attack strategies are effective in a pure black-box setting when sufficient domain knowledge regarding the node features is given. Following Ma et al. (2020), we generate the synthetic datasets as follows. First, we generate a Barabási-Albert random graph (Barabási & Albert, 1999) with $N$ nodes and adjacency matrix $A$. Then we generate node features $X \in \mathbb{R}^{N \times D}$ from a multivariate normal distribution with zero mean and covariance $(L_{\mathrm{sym}} + I)^{-1}$ (where $L_{\mathrm{sym}}$ is the symmetric normalized graph Laplacian and $I$ is the identity matrix; this covariance introduces smoothness of the features over the graph (Li et al., 2019)), and take absolute values element-wise. Finally, node labels are generated by $Y = 1[\mathrm{Sigmoid}((A + I)XW) > 0.5]$, where $W \in \mathbb{R}^D$ is a given weight vector. During the attack, we assume that the attacker knows a few ($0.2D$) important features, namely those with the largest corresponding weights in $W$, but has no access to the trained model. In Table 3, we experiment on 5 synthetic graphs generated with different seeds, $N = 3000$, and $D = 10$; the proposed InfMax-Unif and InfMax-Norm outperform the baseline attack strategies.

Constructing $\epsilon$ based on the training partition only. To verify that the coarse gradient information used to construct the perturbation vector $\epsilon$ is not sensitive to the set of nodes it is averaged over, we repeat the experiments in Table 1 with the only difference that, when constructing $\epsilon$ following Eq. (10), we use the average gradients over the training partition only rather than over all nodes. The results, shown in Table 4, are very similar to those in Table 1, supporting our claim.
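For concreteness, the synthetic-data recipe above can be sketched in NumPy as follows; a minimal preferential-attachment loop stands in for a full Barabási-Albert implementation, and the function name and default arguments are illustrative:

```python
import numpy as np

def synthetic_dataset(N, D, m=2, seed=0):
    """Generate one synthetic dataset following the recipe above.

    - Barabasi-Albert-style graph via preferential attachment
      (m edges per newly added node).
    - Features ~ N(0, (L_sym + I)^{-1}) with absolute values taken,
      which makes feature rows of neighboring nodes correlated
      (smooth over the graph).
    - Labels y = 1[sigmoid((A + I) X w) > 0.5] for a fixed weight w.
    """
    rng = np.random.default_rng(seed)
    A = np.zeros((N, N))
    A[:m + 1, :m + 1] = 1 - np.eye(m + 1)        # small seed clique
    for v in range(m + 1, N):
        deg = A.sum(axis=1)[:v]
        # attach to m existing nodes with probability proportional to degree
        targets = rng.choice(v, size=m, replace=False, p=deg / deg.sum())
        A[v, targets] = A[targets, v] = 1
    d = A.sum(axis=1)
    L_sym = np.eye(N) - A / np.sqrt(np.outer(d, d))
    cov = np.linalg.inv(L_sym + np.eye(N))
    cov = (cov + cov.T) / 2                      # symmetrize numerically
    X = np.abs(rng.multivariate_normal(np.zeros(N), cov, size=D).T)
    w = rng.standard_normal(D)
    logits = (A + np.eye(N)) @ X @ w
    y = (1 / (1 + np.exp(-logits)) > 0.5).astype(int)
    return A, X, y
```

An attacker with "domain knowledge" in this setting would be told the indices of the $0.2D$ entries of `w` with the largest weights, and nothing about the trained model.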



We can only do it approximately because we do not know ρ. For the visualization, we just set ρ = 1.



we have the following Proposition 1. Proposition 1. If we replace $H(\cdot)$ by $\tilde{H}(\cdot)$ in problem (2), then we can rewrite the optimization problem as follows,

Figure 2: Each figure shows a histogram of θ j for a fixed node j over 1000 independent trials of GCN on Cora. The 3 nodes are randomly selected from the union of the validation set and test set.

Figure 5: Each figure shows a histogram of θ j for a fixed node j over 1000 independent trials of GCN on Cora. The 15 nodes are randomly selected from the union of the validation set and test set.

Summary of the attack performance in terms of test accuracy (%); the lower the accuracy, the stronger the attack. Bold denotes the best performing strategy in each setup. Underline indicates that our strategy outperforms all the baseline strategies. Asterisk (*) means the difference between our strategy and the best baseline strategy is statistically significant by a pairwise t-test at significance level 0.05. The error bar (±) denotes the standard error of the mean over 40 independent trials. The thresholds correspond to the node degree constraint m.

ACKNOWLEDGEMENT

We would like to thank the anonymous reviewers for their detailed comments and suggestions, which helped significantly improve this paper.

