CORRATTACK: BLACK-BOX ADVERSARIAL ATTACK WITH STRUCTURED SEARCH

Abstract

We present a new method for score-based adversarial attack, in which the attacker queries the loss oracle of the target model. Our method employs a parameterized search space whose structure captures local correlations in the gradient of the loss function. We show that searching over this structured space can be approximated by a time-varying contextual bandits problem, in which the attacker observes features of the associated arms, modifies the input accordingly, and receives an immediate reward equal to the reduction of the loss function. The time-varying contextual bandits problem can then be solved by a Bayesian optimization procedure that exploits the features of the structured action space. Experiments on ImageNet and the Google Cloud Vision API demonstrate that the proposed method achieves state-of-the-art success rates and query efficiency for both undefended and defended models.

1. INTRODUCTION

Although deep learning has many applications, neural networks are known to be vulnerable to adversarial examples: small perturbations of inputs that can fool neural networks into making wrong predictions (Szegedy et al., 2014). Adversarial noise can easily be found when the model is known, a setting referred to as white-box attack (Kurakin et al., 2016). In real-world scenarios, however, the model is often unknown; this setting is referred to as black-box attack. Some methods (Liu et al., 2016; Papernot et al., 2016) use transfer-based attack, which generates adversarial examples on a substitute model and transfers the adversarial noise to the target model. However, transferability is limited, and its effectiveness relies heavily on the similarity between the networks (Huang & Zhang, 2020). If two networks are very different, transfer-based methods will have low success rates. In practice, most computer vision APIs, such as the Google Cloud Vision API, allow users to access the scores or probabilities of the classification results. The attacker may therefore query the black-box model and perform zeroth-order optimization to find an adversarial example without knowledge of the target model. Due to the availability of scores, this scenario is called score-based attack. There is a line of studies on black-box attack that directly estimate the gradient of the underlying model and apply (stochastic) gradient descent to the input image (Ilyas et al., 2018; 2019; Chen et al., 2017; Huang & Zhang, 2020; Tu et al., 2018; Li et al., 2019). In this paper, we take another approach and formulate score-based attack as a time-varying contextual bandits problem. At each state, the attacker may change the adversarial perturbation and receives as reward the reduction of the loss; before making the decision, the attacker observes features of the arms.
By limiting the action space to image blocks, the associated bandits problem exhibits a local correlation structure and a slow varying property suitable for learning. We may therefore use the location and other features of the blocks to estimate the reward for future action selection. Using the above insights, we propose a new method called CorrAttack, which utilizes the local correlation structure and the slow varying property of the underlying bandits problem. CorrAttack uses Bayesian optimization with Gaussian process regression (Rasmussen, 2003) to model the correlation and select optimal actions. A forgetting strategy is added to the algorithm so that the Gaussian process regression can handle the time-varying changes. CorrAttack achieves higher query efficiency and higher success rates than prior methods with a similar action space (Moon et al., 2019). It is worth noting that BayesOpt (Ru et al., 2020) and Bayes-Attack (Shukla et al., 2019) also employ Bayesian optimization for score-based attack. However, their Gaussian process regression directly models the loss as a function of the image, whose dimension can be more than one thousand. Their speed is therefore slow, especially for BayesOpt, which uses a slow additive kernel. CorrAttack, on the other hand, searches over a much more limited action space and models the reward as a function of a low-dimensional feature. The optimization of CorrAttack is thus more efficient, and the method is significantly faster than BayesOpt. We summarize the contributions of this work as follows: 1. We formulate the score-based adversarial attack as a time-varying contextual bandits problem, and show that the reward function has a slow varying property. In our new formulation, the attacker can take advantage of the features to model the reward of the arms with learning techniques.
Compared to the traditional approach, the use of learning in the proposed framework greatly improves the efficiency of the search over optimal actions. 2. We propose a new method, CorrAttack, which uses Bayesian optimization with Gaussian process regression to learn the reward of each action from the features of the arms. 3. Experiments show that CorrAttack achieves state-of-the-art performance on ImageNet and the Google Cloud Vision API for both defended and undefended models.

2. RELATED WORK

There has been a line of work on black-box adversarial attack; here we give a brief review of existing methods. Transfer-Based Attack: Transfer-based attack assumes the transferability of adversarial examples across different neural networks. It starts with a substitute model in the same domain as the target model. Adversarial examples can easily be generated on the white-box substitute model and then transferred to attack the target model (Papernot et al., 2016). The approach, however, depends heavily on the similarity of the networks. If the two networks are distinct, the success rate of the transferred attack decreases rapidly (Huang & Zhang, 2020). Moreover, the data for training a substitute model may not be accessible in practice. Score-based Attack: Many approaches estimate the gradient from the output scores of the target network. However, the high dimensionality of input images makes naive coordinate-wise search infeasible, as it would require millions of queries. ZOO (Chen et al., 2017) is an early gradient-estimation method that estimates the gradient of an image block and performs block-wise gradient descent. NES (Wierstra et al., 2008) and CMA-ES (Hansen, 2016) are two evolution strategies that can perform query-efficient score-based attack (Ilyas et al., 2018; Meunier et al., 2019). Instead of the gradient itself, SignHunter (Al-Dujaili & O'Reilly, 2020a) estimates only the sign of the gradient to reduce the complexity. AutoZOOM (Tu et al., 2018) uses bilinear transformation or an autoencoder to reduce the sampling space and accelerate the optimization. In the same spirit, data priors can be used to improve query efficiency (Ilyas et al., 2019). MetaAttack (Du et al., 2020) takes a meta-learning approach that learns gradient patterns from prior information, which reduces the number of queries needed to attack the target model. Many zeroth-order optimization methods for black-box attacks rely on such gradient estimation.
There are, however, gradient-free black-box attacks. BayesOpt and Bayes-Attack (Ru et al., 2020; Shukla et al., 2019) employ Bayesian optimization to find adversarial examples. They run Gaussian process regression on an embedding and apply bilinear transformation to resize the embedding to the size of the image. Although the bilinear transformation alleviates the high dimensionality of images, the dimension of their embeddings is still in the thousands, which makes Bayesian optimization ineffective and computationally expensive. A different method, PARSI, poses the attack under the ℓ∞ norm as a discrete optimization problem over {-ε, ε}^d (Moon et al., 2019). It uses a Lazy-Greedy algorithm to search over {-ε, ε}^d for an adversarial example. SimBA (Guo et al., 2018) also employs a discrete search space, targeting the ℓ2 norm. Decision-based Attack: Decision-based attack assumes the attacker can only obtain the output label of the model. Boundary Attack and its variants (Brendel et al., 2017; Chen et al., 2020; Li et al., 2020) are designed for this setting. However, the attacker receives much less information than in score-based attack, and many more queries are needed to successfully attack an image.

3. PRELIMINARIES

A Gaussian process (Rasmussen, 2003) is a prior distribution over functions on some bounded set Z, determined by a mean function µ : Z → R and a covariance kernel κ : Z × Z → R. Given n observations D_n = {(z_i, f(z_i))}_{i=1}^n, the prior distribution on f(z_{1:n}) is

f(z_{1:n}) ∼ Normal(µ_0(z_{1:n}), κ_0(z_{1:n}, z_{1:n})),    (1)

where we use compact notation for functions applied to collections of input points: z_{1:n} denotes the sequence z_1, ..., z_n, f(z_{1:n}) = [f(z_1), ..., f(z_n)], µ_0(z_{1:n}) = [µ_0(z_1), ..., µ_0(z_n)], and κ_0(z_{1:n}, z_{1:n}) is the n × n matrix with entries κ_0(z_i, z_j). If we wish to infer the value of f at a new point z, the posterior f(z)|D_n is also a Gaussian process (GP) with mean µ_n and variance σ_n²:

f(z)|D_n ∼ Normal(µ_n(z), σ_n²(z)),
µ_n(z) = µ_0(z) + κ_0(z, z_{1:n}) κ_0(z_{1:n}, z_{1:n})^{-1} (f(z_{1:n}) − µ_0(z_{1:n})),    (2)
σ_n²(z) = κ_0(z, z) − κ_0(z, z_{1:n}) κ_0(z_{1:n}, z_{1:n})^{-1} κ_0(z_{1:n}, z).

As an optimization method for maximizing a function f, Bayesian optimization uses this model to decide where to evaluate the next point. Given observations D_{t-1} = {(z_i, f(z_i))}_{i=1}^{t-1}, we first use the posterior GP to define an acquisition function ϕ_t : Z → R, which models the utility of evaluating f(z) for any z ∈ Z. We then evaluate f(z_t) with

z_t = arg max_{z ∈ Z} ϕ_t(z).    (3)

In this work, we use the expected improvement (EI) acquisition function (Mockus et al., 1978)

ϕ_t(z) = σ_n(z) (γ(z) Φ(γ(z)) + φ(γ(z)))  with  γ(z) = (µ_n(z) − f(z_best)) / σ_n(z),    (4)

which measures the expected improvement over the current best value z_best = arg max_{z_i} f(z_i) according to the posterior GP. Here Φ(·) and φ(·) are the cdf and pdf of the standard normal distribution, respectively.
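As a concrete illustration, the posterior update of eq. (2) and the EI rule of eq. (4) can be sketched in a few lines of numpy. This is a minimal sketch only: the RBF kernel, the zero prior mean, and the function names are illustrative choices, not the paper's implementation.

```python
import math
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential covariance kappa_0(a, b) between two point sets.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(Z, f, Zstar, noise=1e-6):
    # Posterior mean mu_n(z*) and variance sigma_n^2(z*) from eq. (2),
    # with prior mean mu_0 = 0 and unit prior variance kappa_0(z, z) = 1.
    K = rbf_kernel(Z, Z) + noise * np.eye(len(Z))
    Ks = rbf_kernel(Zstar, Z)
    mu = Ks @ np.linalg.solve(K, f)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)  # clamp tiny negative values

def expected_improvement(mu, var, f_best):
    # EI of eq. (4): sigma * (gamma * Phi(gamma) + phi(gamma)).
    sigma = np.sqrt(var)
    gamma = (mu - f_best) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(gamma / math.sqrt(2.0)))
    phi = np.exp(-0.5 * gamma ** 2) / math.sqrt(2.0 * math.pi)
    return sigma * (gamma * Phi + phi)

# Toy usage: fit three observations of sin and score five candidate points.
Z = np.array([[0.0], [0.5], [1.0]])
f = np.sin(Z).ravel()
Zstar = np.linspace(0.0, 1.0, 5)[:, None]
mu, var = gp_posterior(Z, f, Zstar)
ei = expected_improvement(mu, var, f.max())
```

At observed points the posterior mean interpolates the data and EI collapses toward zero, which is what drives the exploration/exploitation trade-off.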

4. SCORE-BASED BLACK-BOX ATTACK

Suppose a classifier F(x) has input x and label y. An un-targeted adversarial example x_adv satisfies

arg max_{j ∈ {1,...,C}} F(x_adv)_j ≠ y  and  ||x_adv − x||_p ≤ ε,

where C is the number of classes. For a targeted attack, the maximum position of F(x_adv) should be the targeted class q: arg max_{j ∈ {1,...,C}} F(x_adv)_j = q. To find x_adv, we may optimize a surrogate loss function ℓ(x, y) (e.g. the hinge loss). In this work, we consider adversarial attack as a time-varying contextual bandits problem. At each time t, we observe a state x_t, which is a modification of the original input x_0. Before taking arm a_t ∈ A ⊂ R^d, we observe the features z of the arms. The arm a_t modifies state x_t to x_{t+1} according to

x_{t+1} = arg min_{s ∈ {x_t + a_t, x_t}} ℓ(Π_{B_p(x,ε)}(s), y),

with reward function r(x_t, a_t) = ℓ(x_t, y) − ℓ(x_{t+1}, y), where the checking step above ensures the reward is never negative. In this framework, we would like to estimate the reward r(x_t, a_t) from the feature z_t by learning, and then pick a_t to maximize the reward. Observe that

r(x_t, a_t) ≈ −∇_x ℓ(x_t, y)^T (x_{t+1} − x_t),    (7)

where the gradient ∇_x ℓ(x_t, y) is unknown. It follows that we may rewrite r(x_t, a_t) as a function r(x_t, a_t) ≈ f(x_t, x_{t+1} − x_t). Since we generally make small steps from one iteration to the next, δ_t(a_t) = x_{t+1} − x_t is small, and we may approximate the reward with a locally fixed gradient: f(x_t, δ_t) = f̃_t(a_t). We may thus treat the learning of the reward as a time-varying contextual bandits problem with reward function f̃_t(a_t) for arm a_t at time t. Since x_{t+1} − x_t is small, this time-varying bandits problem has a slow varying property: the function f̃_t changes slowly from time t to time t + 1. In the proposed framework, our goal is to learn the time-varying bandits reward f̃_t(a_t) from the feature z_t.
We use Gaussian process regression to model the reward function from recent historic data, since the reward function is slow varying; details are given in the subsequent sections. We note that the most general action space contains all a_t ∈ R^d, where d is the number of image pixels. However, it is impossible to explore the arms in such a large space. In this work, we choose a specific class of actions A = {a_i}_{i=1}^n, where each action modifies one of n image blocks of different sizes. This covers the space of adversarial perturbations while maintaining manageable complexity. We also find the location and the PCA of the blocks to be good components of the feature z associated with an arm. Moreover, modifying a block only affects the state locally, so the reward function remains similar after the state changes.
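One step of the bandits interaction described above can be sketched as follows, assuming an ℓ∞ ball for the projection Π; `loss`, `project_linf` and `bandit_step` are hypothetical stand-ins for the loss oracle and the update rule, not the paper's code.

```python
import numpy as np

def project_linf(s, x0, eps):
    # Projection Pi_{B_inf(x0, eps)}: clip the perturbation coordinate-wise.
    return np.clip(s, x0 - eps, x0 + eps)

def bandit_step(loss, x0, xt, at, eps):
    # Try the action; keep it only if it does not increase the loss,
    # and return the reward r(xt, at) = l(xt, y) - l(x_{t+1}, y) >= 0.
    cand = project_linf(xt + at, x0, eps)
    l_t, l_cand = loss(xt), loss(cand)
    x_next = cand if l_cand <= l_t else xt
    return x_next, l_t - min(l_t, l_cand)

# Toy usage with a quadratic stand-in for the loss oracle.
loss = lambda x: float(np.sum(x ** 2))
x0 = np.zeros(4)
x_next, r = bandit_step(loss, x0, np.full(4, 0.04), np.full(4, -0.01), eps=0.05)
```

A loss-increasing action is rejected by the checking step, so the reward is zero rather than negative.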

4.1. STRUCTURED SEARCH WITH GAUSSIAN PROCESS REGRESSION AND BAYESIAN OPTIMIZATION

Given a block size b, we divide the image into blocks E = {e_000, e_001, ..., e_hwc}, where each block is a b × b square of pixels and (h, w, c) = (height/b, width/b, channel). Each block e_ijk is associated with a feature z_e_ijk, such as the location of the block. Suppose we have a time-varying bandits problem with state x_t and unknown reward function f̃_t at time t. By taking the action a_e_ijk, we change the individual block e_ijk of x_t and obtain x_{t+1} with reward f̃_t(a_e_ijk). We consider two ways of acting on block e_ijk: CorrAttack Diff and CorrAttack Flip.

Finite Difference CorrAttack Diff: For action a_e_ijk, the attacker queries ℓ(x_t + ηe_ijk, y) and ℓ(x_t − ηe_ijk, y), and chooses a_e_ijk = arg min_{s ∈ {ηe_ijk, −ηe_ijk}} ℓ(x_t + s, y). The action space is A = {a_e_ijk | e_ijk ∈ E}. In our framework, this bandits problem can also be regarded as learning the conditional gradient over actions. That is, when η is small, we try to choose the action

a_t = arg min_{e_ijk ∈ E} e_ijk^T ∇_x ℓ(x_t, y),

which is the conditional gradient over the set of blocks.

Discrete Approximation CorrAttack Flip: In general, adversarial attack with an ℓ∞ budget can be formulated as constrained optimization with ||x_adv − x||_∞ ≤ ε. However, PARSI (Moon et al., 2019) limits the space to {−ε, +ε}^d, which leads to better performance for black-box attack. The continuous optimization problem then becomes a discrete optimization problem:

minimize ℓ(x_adv, y) subject to ||x_adv − x||_∞ ≤ ε   =⇒   minimize ℓ(x_adv, y) subject to x_adv − x ∈ {ε, −ε}^d.    (10)

Following PARSI, we perform structured search in two stages. When flipping ε to −ε, a_e_ijk changes the block to −ε and A = {−2εe_ijk | e_ijk ∈ E}; when flipping −ε to ε, A = {2εe_ijk | e_ijk ∈ E} instead.
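The CorrAttack Diff action on a single block can be sketched as below; `loss` is a stand-in for the loss oracle and `diff_action` is an illustrative name, not the paper's code.

```python
import numpy as np

def diff_action(loss, xt, block_mask, eta):
    # block_mask is the 0/1 indicator e_ijk of the chosen block.
    # Query both signed perturbations and keep the one with lower loss.
    plus, minus = xt + eta * block_mask, xt - eta * block_mask
    return eta * block_mask if loss(plus) <= loss(minus) else -eta * block_mask

# Toy usage: for a linear loss, the negative direction is always preferred.
loss = lambda x: float(x.sum())
mask = np.array([1.0, 0.0, 0.0, 0.0])
a = diff_action(loss, np.zeros(4), mask, eta=0.03)
```

For small η this is exactly the two-query finite-difference choice described above: the sign of the block's directional derivative decides the action's sign.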
Gaussian Process (GP) Regression: We model the difference function g_t(a_t) = ℓ(Π_{B_p(x,ε)}(x_t + a_t), y) − ℓ(x_t, y) instead of the reward function f̃_t(a_t) ≥ 0, as the difference function can be negative and therefore carries more information about the unfavorable arms in A. We collect historic actions with their features and differences {(z_k, g_k(a_k))}_{k=1}^t and learn the difference function to make choices at a later stage. At each time t, we use Gaussian process regression to model the correlation between the features z_e_ijk and use Bayesian optimization to select the next action. More specifically, as in eq. (2), we let g_t(a_e_ijk) | D_t ∼ Normal(µ_t(z_e_ijk), σ_t²(z_e_ijk)), where D_t = {(z_k, g_k(a_k))}_{k=t−τ}^t contains the differences of the evaluated blocks with their features, and τ is a parameter for forgetting old samples. We then use the EI acquisition function, applied to the minimization of g_t, to pick the next action a_{t+1} in A; as in eq. (4), we let

a_{t+1} = arg max_A σ_t(z_e_ijk) (γ(z_e_ijk) Φ(γ(z_e_ijk)) + φ(γ(z_e_ijk))).

As the difference function g_t varies over time, we apply the two strategies in Algorithm 2 to update the previous samples, making sure the GP regression tracks the current difference function well.

4.2. FEATURES AND SLOW VARYING PROPERTY

Features of Contextual Bandits: We use a four-dimensional vector as the feature z_e_ijk:

z_e_ijk = (i, j, k, pca),    (14)

where (i, j, k) is the location of the block, and pca is the first component of the PCA decomposition of [x_0(e_000), x_0(e_001), ..., x_0(e_hwc)]. Here x_0(e_ijk) denotes the block of the natural image at the given position. The reward function depends on the gradient through Equation (7). It has been shown that the gradient ∇_x ℓ(x, y) has local dependencies (Ilyas et al., 2019): if two coordinates e_ijk and e_lpq are close, then ∇_x ℓ(x, y)_ijk ≈ ∇_x ℓ(x, y)_lpq. We consider the finite difference of the block e_ijk:

∆_t(e_ijk) = ℓ(x_t + ηe_ijk, y) − ℓ(x_t − ηe_ijk, y) ≈ 2η e_ijk^T ∇_x ℓ(x_t, y),

where η is a small step size. When η is small, the reward can be approximated by the average of the gradients over a small region, which also has local dependencies. In fact, the local structure of the reward is preserved even when the block size and η are large. Figure 1 shows one example of the finite difference ∆_t(e_ijk) obtained on the ImageNet dataset with ResNet50: blocks with closer locations are more likely to have similar rewards. We therefore include the location of the block in the feature so that historic data can be used to find the arm with the largest reward. In addition to the location, we may add other features. The block of the image itself forms a strong feature for the regression, but its dimension is too high for GP regression. We therefore use PCA to reduce the dimension and add the first component to the feature vector.

Slow Varying Property: In addition to the local dependencies of the finite difference, the difference is also slow varying if we only change a small region of x_t. Let x_{t+1} = x_t − ηe_{i_t j_t k_t}. Figure 2 shows the difference between ∆_t(e_ijk) and ∆_{t+1}(e_ijk), which is concentrated in a small region near e_{i_t j_t k_t}.
The reward function is based on the finite difference and therefore inherits the slow varying property, which can be explained by the local nature of convolution. When η is small, the change of the finite difference can be approximated with the local Hessian:

∆_{t+1}(e_ijk) − ∆_t(e_ijk) ≈ −2η² e_ijk^T ∇²_x ℓ(x_t, y) e_{i_t j_t k_t}.

This change is much smaller than ∆_t(e_ijk). Today's neural networks are built from stacks of convolutions and non-linear operations. Since these operations are localized in a small region, the Hessian of a neural network is also localized, and the reward function only changes near e_{i_t j_t k_t}.
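A minimal sketch of building the feature vectors z_e_ijk = (i, j, k, pca) of eq. (14), assuming blocks are flattened to rows and the first principal component is computed via SVD; the helper name `block_features` is illustrative.

```python
import numpy as np

def block_features(blocks, positions):
    # blocks: (n, b*b) array of flattened block pixels from the natural image x0.
    # positions: (n, 3) array of (i, j, k) block locations.
    X = blocks - blocks.mean(axis=0)          # center before PCA
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    pca = X @ Vt[0]                           # first principal component score per block
    return np.hstack([positions, pca[:, None]])  # (n, 4) feature matrix

# Toy usage on random "blocks".
rng = np.random.default_rng(0)
Z = block_features(rng.normal(size=(6, 16)), np.arange(18).reshape(6, 3))
```

The 4-dimensional feature keeps GP regression cheap while still encoding both block location and coarse appearance.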

5. EXPERIMENTS

We evaluated the number of queries versus the success rate of CorrAttack on both undefended and defended networks on ImageNet (Russakovsky et al., 2015). Moreover, we attacked the Google Cloud Vision API to show that CorrAttack generalizes to a true black-box model. We used the common hinge loss proposed in the CW attack (Carlini & Wagner, 2017). Details of the Gaussian process regression and the hyperparameters of CorrAttack are given in Appendix B; we note that CorrAttack is not sensitive to the hyperparameters. The hyperparameters of other methods follow those suggested in the original papers.

5.1. UNDEFENDED NETWORK

We randomly select 1000 images from the validation set of ImageNet and only attack correctly classified images. The query efficiency of CorrAttack is tested on VGG16 (Simonyan & Zisserman, 2014), ResNet50 (He et al., 2016) and DenseNet121 (Huang et al., 2017), which are among the most commonly used network architectures. We set ε = 0.05 and the query limit to 10000, except for BayesOpt and Bayes-Attack. For targeted attacks, we randomly choose the target class for each image, and the target classes are kept the same across the evaluation of different algorithms. The results are shown in the tables below. An ablation study on features in Appendix C.5 demonstrates how the features of the contextual bandits affect the performance of the attack; the PCA feature helps improve the attack's efficiency.

5.2. DEFENDED NETWORK

To evaluate the effectiveness of CorrAttack on adversarially defended networks, we tested our method on one of the state-of-the-art robust models on ImageNet (Xie et al., 2018). The weights are downloaded from GitHub. "ResNeXt DenoiseAll" is chosen as the target model as it achieves the best robust performance. We set ε = 0.05 and the maximum number of queries to 10000. As BayesOpt runs very slowly, its attack is performed on only 10 images with a query limit of 1000. The results are shown in Table 3; CorrAttack Flip still outperforms the other methods.

5.3. GOOGLE CLOUD VISION API

We also attacked the Google Cloud Vision API, a real-world black-box model for classification. The goal is to remove the top-1 label from the classification output. We choose 10 images from the ImageNet dataset and set the query limit to 500 due to the high cost of using the API. We compare CorrAttack Flip with NAttack, BayesOpt and PARSI. The results are shown in Table 4. We also show one example of the classification output in Appendix C.9.

6. CONCLUSION AND FUTURE WORK

 7:  A_n = {2εe_ijk ∈ E | e_ijk^T (x_k − x) < 0}
 8:  Run CorrAttack flipping −ε to ε:  x̃_k = CORRATTACK(ℓ(·,·), x_k, y, A_n, c, τ, α)
 9:  A_p = {−2εe_ijk ∈ E | e_ijk^T (x_k − x) > 0}
10:  Run CorrAttack flipping ε to −ε:  x_{k+1} = CORRATTACK(ℓ(·,·), x̃_k, y, A_p, c, τ, α)
11:  if b > 1 then
12:      Split the blocks into finer blocks using Algorithm 3:  E = SPLITBLOCK(E, b)
13:      b ← b/2
14:  end if
15:  until converges
16:  return x_K

B DETAILS OF EXPERIMENT SETTING

We use the hinge loss for all experiments. For un-targeted attacks,

ℓ_untarget(x, y) = max(F(x)_y − max_{j≠y} F(x)_j, −ω),

and for targeted attacks,

ℓ_target(x, y) = max(max_{j≠t} F(x)_j − F(x)_t, −ω).

Here F represents the logits of the network outputs, t is the target class, and ω denotes the margin. The image is projected into the ε-ball, and its values are clipped to the range [0, 1].
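The two hinge losses can be transcribed directly into numpy as below; the margin value used in the demo is an assumed placeholder, not the paper's setting.

```python
import numpy as np

def untargeted_hinge(logits, y, omega=5.0):
    # max(F(x)_y - max_{j != y} F(x)_j, -omega): negative once y loses the argmax.
    other = np.delete(logits, y)
    return max(logits[y] - other.max(), -omega)

def targeted_hinge(logits, t, omega=5.0):
    # max(max_{j != t} F(x)_j - F(x)_t, -omega): negative once t wins the argmax.
    other = np.delete(logits, t)
    return max(other.max() - logits[t], -omega)

# Toy usage: class 0 currently wins with logits (2, 1, 0).
logits = np.array([2.0, 1.0, 0.0])
```

Minimizing either loss drives the relevant margin below zero, at which point the attack has succeeded; the −ω floor stops the optimizer from over-shooting.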

B.1 GAUSSIAN PROCESS REGRESSION AND BAYESIAN OPTIMIZATION

We further provide details on both the computational scaling and the modeling setup of the GP regression. To address computational issues, we use GPyTorch (Gardner et al., 2018) for scalable GP regression. GPyTorch follows (Dong et al., 2017) in solving linear systems with the conjugate gradient (CG) method and approximating the log-determinant via the Lanczos process. Without GPyTorch, running BO with GP regression for more than a few thousand evaluations would be infeasible, as classical approaches to GP regression scale cubically in the number of data points. On the modeling side, the GP is parameterized by a Matérn-5/2 kernel with ARD and a constant mean function for all experiments. The GP hyperparameters are fitted before proposing a new batch by optimizing the log-marginal likelihood. The domain is rescaled to [0, 1]^d and the function values are standardized before fitting the GP regression. We use the following bounds for the hyperparameters: length scale λ_i ∈ [0.005, 2.0], output scale ∈ [0.05, 20.0], noise variance σ² ∈ [0.0005, 0.1].
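For reference, the Matérn-5/2 kernel with ARD can be written out in numpy as below. This is a sketch of the kernel formula only; the actual experiments use GPyTorch's implementation, and the lengthscale/outputscale values here are arbitrary demo choices.

```python
import numpy as np

def matern52_ard(A, B, lengthscales, outputscale=1.0):
    # ARD: each input dimension gets its own lengthscale before the distance.
    d = (A[:, None, :] - B[None, :, :]) / lengthscales
    r = np.sqrt((d ** 2).sum(-1))
    s = np.sqrt(5.0) * r
    # Matern-5/2: sigma_f^2 * (1 + sqrt(5) r + 5 r^2 / 3) * exp(-sqrt(5) r)
    return outputscale * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

# Toy usage on three 4-d feature points (matching z = (i, j, k, pca)).
Zf = np.array([[0.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 0.0, 0.0]])
K = matern52_ard(Zf, Zf, lengthscales=np.array([0.5, 1.0, 1.0, 1.0]), outputscale=2.0)
```

The resulting Gram matrix is symmetric positive semi-definite with the outputscale on its diagonal, which is what the GP fitting in B.1 relies on.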

B.2 HYPERPARAMETERS

For CorrAttack in Algorithm 4 and Algorithm 5, we set the initial block size b to 32 and the step size η for CorrAttack Diff to 0.03. In Algorithm 1, we use an initial sample size m = 0.03n at the starting point for the Gaussian process regression, and the threshold c = 10^-4 to decide when to stop the search at the current block size. In Algorithm 2, the threshold α differs across block sizes: for CorrAttack Flip, α = 1, 1, 2, 2, 3 for block sizes 32, 16, 8, 4, 2, and for CorrAttack Diff, α = 0, 0, 1, 1, 2 for block sizes 32, 16, 8, 4, 2. We set τ = 3m = 0.09n and remove the earliest samples from D once |D| > τ. The Adam optimizer is used to optimize the mean µ and covariance κ of the Gaussian process, with 1 iteration and learning rate 0.1. For PARSI, the block size is set to 32 as in CorrAttack; the other hyperparameters are the same as in the original paper. For Bandits, Bayes-Attack and BayesOpt, the hyperparameters are the same as in the original papers. We optimize the hyperparameters for ZOO and NES. For un-targeted attack with NES, we set the sample size to 50 and the learning rate to 0.1; for targeted attack with NES, the sample size is also 50 and the learning rate is 0.05. The learning rate is decayed by 50% if the loss does not decrease for 20 iterations. For NAttack, we set the hyperparameters the same as NES and add momentum and learning rate decay, which are not mentioned in the original paper. For ZOO, we set the learning rate to 1.0 and the sample size to 50. Other settings follow the original paper.

C ADDITIONAL EXPERIMENTS

C.1 OPTIMALITY OF BAYESIAN OPTIMIZATION

Figure 3 shows the rank of the reward that Bayesian optimization finds within the action set. CorrAttack finds an action with high reward within just a few queries. This shows that the Gaussian process regression can model the correlation structure of the reward function, and that Bayesian optimization can exploit it to optimize the time-varying contextual bandits.

C.2 VARYING THE ADVERSARIAL BUDGET

We test CorrAttack with different adversarial budgets on ImageNet for both un-targeted and targeted attack.

C.5 ABLATION STUDY ON FEATURES

Table 11 shows the success rate and average queries for CorrAttack with different features. We perform an ablation study on the features of the contextual bandits: one variant uses just the location of the block, and the other uses both the location and the PCA feature. PCA helps the learning of the reward, achieving a higher success rate and a lower number of queries; the improvement is most significant for CorrAttack Flip. More useful features may be found in future work.

C.6 COMPARISON BETWEEN CORRATTACK FLIP , BAYESOPT AND BAYES-ATTACK

The main difference between BayesOpt and Bayes-Attack is the type of GP regression (a standard GP for Bayes-Attack, an additive GP for BayesOpt), so we consider these two models as a group when comparing them with CorrAttack.

Difference between CorrAttack, BayesOpt and Bayes-Attack: For ℓ∞ attacks, assuming no hierarchical structure, we have blocks E = {e_000, e_001, ..., e_hwc}, where each block is a b × b square of pixels and (h, w, c) = (height/b, width/b, channel). At each iteration of BayesOpt (Ru et al., 2020) and Bayes-Attack, the change of the overall perturbation is δ_t − δ_{t−1} = {δ_t(e_000) ∪ δ_t(e_001) ∪ ... ∪ δ_t(e_hwc)} − {δ_{t−1}(e_000) ∪ δ_{t−1}(e_001) ∪ ... ∪ δ_{t−1}(e_hwc)}, whereas in CorrAttack, δ_t − δ_{t−1} = δ_t(e_ijk) − δ_{t−1}(e_ijk) for a single block e_ijk. In short, BayesOpt and Bayes-Attack view each block as a dimension and search the overall perturbation directly, while CorrAttack defines a low-dimensional feature space, keeps an overall perturbation, and searches for an action on a single block.

We compare the running time of CorrAttack Flip with BayesOpt and Bayes-Attack on 20 images from ImageNet; Table 12 shows the running time for the un-targeted attack. We use PyTorch to develop these two models. All experiments were conducted on a personal workstation with 28 Intel(R) Xeon(R) Gold 5120 2.20GHz CPUs, an NVIDIA GeForce RTX2080Ti 11GB GPU and 252G memory. BayesOpt models the loss function with a very high-dimensional Gaussian process, and the decomposition of the additive kernel needs to be restarted several times. Even though we tried to optimize the speed of BayesOpt with GPU acceleration, it is still very slow and takes hundreds of times more computational resources than CorrAttack. Bayes-Attack can be regarded as a simpler version of BayesOpt without the additive kernel. We do not evaluate it on the targeted task (where queries exceed 1000), since GP inference time grows quickly as the number of evaluated queries increases, e.g.
For Bayes-Attack, when 150 < query < 200, the time is 1.6s/query; when 800 < query < 1000, it is 10.5s/query. CorrAttack avoids this problem, staying at 0.1s/query even when the query count reaches 10000. Since we forget samples older than t − τ, the number of input samples n stays below τ. The forgetting technique cannot be applied to Bayes-Attack and BayesOpt, since they search the perturbation over all blocks, so every sample needs to be remembered.

C.7 GROWING CURVE OF SUCCESS RATE

The number of average queries is sometimes misleading due to the heavy-tailed distribution of queries. Therefore, in Figure 4, we plot the success rates at different query levels to show the detailed behavior of different attacks. CorrAttack is much more efficient than the other methods at all query levels.



https://github.com/facebookresearch/ImageNet-Adversarial-Training
https://pytorch.org/



Figure 1: Finite difference of the perturbation for three channels on one image from ImageNet with ResNet50. h = w = 28, b = 8 and η = 0.05. Lighter block means larger finite difference.

Figure 2: Difference of the finite difference on each block after changing block e_{15,18,1} of Figure 1, which is the lightest pixel in the picture. Darker blocks imply a smaller difference in finite difference, which is almost zero in the majority of the image except near the changed block.

4.3 HIERARCHICAL BAYESIAN OPTIMIZATION SEARCH

Recent black-box approaches (Chen et al., 2017; Moon et al., 2019) exploit the hierarchical image structure for query efficiency. Following these approaches, we take a hierarchical approach and perform the accelerated local search in Algorithm 1 from a coarse grid (large blocks) to a fine grid (smaller blocks). The algorithm for the hierarchical attack iteratively performs Algorithm 1 at one block size, and then divides the blocks into smaller sizes. At each block size, we build a Gaussian process to model the difference function, and perform structured search with the blocks until max_A ϕ_t(z_e_ijk) < c. When dividing the blocks into smaller sizes, we build a new block set E with actions a_e_ijk and new features z_e_ijk, but keep the final x_t of the last block size as the x_0 of the new block size. Define the stages as S = {0, 1, ..., s} and the initial block size as b; the block at stage s is then b/2^s × b/2^s.
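The block-splitting step used between stages (SPLITBLOCK, Algorithm 3) can be sketched as follows, assuming blocks are tracked by their top-left pixel coordinates; the exact bookkeeping in Algorithm 3 may differ.

```python
def split_block(blocks, b):
    # blocks: list of (i, j, k) top-left coordinates (in pixels) of b x b blocks.
    # Each b x b block becomes four (b/2) x (b/2) blocks on the finer grid.
    fine, half = [], b // 2
    for (i, j, k) in blocks:
        for di in (0, half):
            for dj in (0, half):
                fine.append((i + di, j + dj, k))
    return fine

# Toy usage: one 32 x 32 block splits into four 16 x 16 blocks.
fine = split_block([(0, 0, 0)], 32)
```

Each split quadruples the number of blocks per channel, matching the b/2^s block size at stage s.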

Input: loss function ℓ(·,·), input image x and its label y, block size b, set of blocks E containing all blocks of the image, threshold c, parameters τ and α, adversarial budget ε
 1:  x_0 = x
 2:  for e_ijk ∈ E do
 3:      Randomly draw v from {−ε, ε}
 4:      x_0[e_ijk] = x_0[e_ijk] + v
 5:  end for
 6:  repeat

Figure 3: The rank of the reward that Bayesian optimization finds in the action set for different block sizes. The rank and the query count are normalized by the cardinality of the action set.

Time complexity and running time: The time complexity of fitting the GP regression is O(dn²), where d is the dimension of the input and n is the number of samples. The dimension for CorrAttack (d = 4 for z_e_ijk = (i, j, k, pca)) is much smaller than for BayesOpt and Bayes-Attack (d = 6912 if h = w = 48, c = 3). Moreover, we can convert the continuous search space of BayesOpt and Bayes-Attack from [−ε, ε]^6912 to the discrete search space E = {e_000, e_001, ..., e_hwc}, which has only 6912 elements; the smaller search space saves computation time for the acquisition function.
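The quoted O(dn²) kernel-construction cost makes the gap concrete: with the same number of samples, the d = 4 feature space is cheaper than the d = 6912 pixel space by a fixed factor.

```python
def kernel_cost(d, n):
    # Multiply-adds to fill the n x n ARD kernel matrix: d per matrix entry.
    return d * n * n

# With n = 1000 samples, the pixel-space GP pays 6912 / 4 = 1728x more
# per kernel construction than the 4-d feature-space GP.
ratio = kernel_cost(6912, 1000) / kernel_cost(4, 1000)  # 1728.0
```

This back-of-the-envelope factor ignores the O(n³) solve, which is identical for both and dominated by n, not d.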

Figure 4: Success rate of black-box attack at different query levels for undefended ImageNet models.

Figure 7: Example result of attacking Google Cloud Vision API

The first strategy is to remove old samples from D_t. Even though the bandits problem is slowly varying, the difference function changes significantly after many rounds, so we forget samples older than t − τ. The second strategy is to remove samples near the last selected block e_{i_t j_t k_t} from D_t. As discussed in Section 4.2, the difference function may change significantly in a local region near the last selected block, so previous samples in this region become inaccurate. The resulting algorithm for CorrAttack is shown in Algorithm 1, which mainly follows the standard procedure of Bayesian optimization.
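The two forgetting strategies can be sketched as a single filter over the sample history; `radius` (how "near" the last block is defined) is an assumed parameter not specified in the text.

```python
def forget(history, t, tau, last_pos, radius=1):
    # history: list of (time, (i, j, k), g) observations of the difference function.
    kept = []
    li, lj, lk = last_pos
    for (s, z, g) in history:
        if s < t - tau:
            continue  # strategy 1: sample is too old
        i, j, k = z
        if abs(i - li) <= radius and abs(j - lj) <= radius and k == lk:
            continue  # strategy 2: sample is near the last selected block
        kept.append((s, z, g))
    return kept

# Toy usage: one sample too old, one next to the last block, one kept.
history = [(0, (0, 0, 0), 0.1), (9, (5, 5, 0), -0.2), (9, (9, 9, 1), 0.3)]
kept = forget(history, t=10, tau=5, last_pos=(5, 5, 0))
```

Only the samples surviving this filter are fed to the GP regression, keeping its training set both small and current.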

Success rate and average queries of un-targeted attack on 1000 samples of ImageNet with ε = 0.05. Since BayesOpt and Bayes-Attack need thousands of hours to run on all samples, we test them on only 20 samples, marked with *; their complexity and running time are reported in Appendix C.6.

Success rate and average queries of targeted attack on ImageNet. ε = 0.05 and the query limit is 10000. As BayesOpt and Bayes-Attack run very slowly, we do not include them for the targeted attack.

We compare two versions of CorrAttack, CorrAttack_Diff and CorrAttack_Flip, to ZOO (Chen et al., 2017), NES (Ilyas et al., 2018), NAttack (Li et al., 2019), Bandits (Ilyas et al., 2019), PARSI (Moon et al., 2019), Square Attack (Andriushchenko et al., 2020), SignHunter (Al-Dujaili & O'Reilly, 2020b), BayesOpt (Ru et al., 2020) and Bayes-Attack (Shukla et al., 2019). We only test adversarial attacks under the ℓ∞ norm.

Success rate and average queries of un-targeted attack on defended models. Since BayesOpt and Bayes-Attack take thousands of hours to run, we test them on only 10 samples from ImageNet with ε = 0.05 and a 1000 query limit, marked with *.

Success rate and average queries of un-targeted attack on the Google Cloud Vision API with ε = 0.05. Since BayesOpt and Bayes-Attack would take tens of thousands of hours to attack 1000 images, we compare them with CorrAttack_Flip on only 20 images under un-targeted attack. The query limit is also reduced to 1000, as the running time of BayesOpt and Bayes-Attack grows quickly as more samples are added to the Gaussian process. A time comparison between the three models is shown in Appendix C.6.

We formulate the score-based adversarial attack as a time-varying contextual bandits problem and propose a new method, CorrAttack. By performing structured search over the blocks of the image, the bandit acquires the slowly varying property. CorrAttack takes advantage of the features of the arms and uses Bayesian optimization with Gaussian process regression to learn the reward function. The experiments show that CorrAttack quickly finds actions with large rewards, achieving superior query efficiency and success rates on ImageNet and the Google Cloud Vision API.

We only include basic features for learning the bandits. Other features, such as embeddings from transfer-based attacks (Huang & Zhang, 2020), may be taken into account in future work. While our work focuses on adversarial attack under the ℓ∞ norm, the same contextual bandits formulation could be generalized to other ℓp norms to improve query efficiency. Besides, defense against CorrAttack might be achieved with adversarial training on CorrAttack, but such a defense may not be able to defend against other attacks at the same time.

Input: Loss function ℓ(·, ·), input image x and its label y, initial block size b, set E containing all blocks of the image, threshold c, τ, α, step size η, adversarial budget ε

Table 5 and Table 6 show the success rate and average queries for ε = 0.04, 0.05, 0.06. CorrAttack Flip achieves the best performance among all methods.

Success rate and average queries of un-targeted attack on different ε. Query limit is 10000

Success rate and average queries of targeted attack on different ε. Query limit is 10000

Table 8 shows the ablation study on the strategy for choosing the action x_{t+1} in line 6 of Algorithm 1. The Bayesian optimization procedure helps accelerate the optimization. As the targeted attack is more complicated and requires a larger number of queries, CorrAttack has more of an advantage in this scenario.

C.4 ABLATION STUDY ON HIERARCHICAL ATTACK

We perform un-targeted attack on Resnet50, as shown in Table 10. The hierarchical attack lowers the average number of queries and improves query efficiency. Besides, the hierarchical attack avoids the problem of choosing a block size: as Table 10 shows, the choice of block size is essential to the performance of the non-hierarchical attack.

Ablation study on random choices with success rate and average queries of un-targeted attack on ImageNet. ε = 0.05 and query limit is 10000

Ablation study on random choices with success rate and average queries of targeted attack on ImageNet. ε = 0.05 and query limit is 10000

Ablation study on random choices with success rate and average queries of un-targeted attack on defended model ImageNet. ε = 0.05 and query limit is 10000

Success rate and average queries of un-targeted attack on Resnet50 for Hierarchical Strategy.

and Bayes-Attack (Shukla et al., 2019) all try to search for the adversarial noise on E with perturbation δ ∈ [-ε, ε]^d, where d = h × w × c; the perturbation of block e_{ijk} at time t is δ_t[e_{ijk}]. BayesOpt and Bayes-Attack use a GP regression directly on δ ∈ [-ε, ε]^d (all blocks),

Ablation study on features, with success rate and average queries of targeted attack on ImageNet. ε = 0.05 and the query limit is 10000. We use the feature z_{e_{ijk}} = (i, j, k, pca) for CorrAttack_Diff w/ pca and CorrAttack_Flip w/ pca, and z_{e_{ijk}} = (i, j, k) for CorrAttack_Diff w/o pca and CorrAttack_Flip w/o pca.

Comparison of running time between CorrAttack_Flip and BayesOpt on un-targeted attack. "Per Query" means the average time needed to perform one query to the loss-oracle, and "Per Image" denotes the average time to successfully attack an image. Since BayesOpt needs thousands of hours to run on all samples, we test it on only 20 samples from ImageNet, marked with *.

