SMALL INPUT NOISE IS ENOUGH TO DEFEND AGAINST QUERY-BASED BLACK-BOX ATTACKS

Abstract

While deep neural networks show unprecedented performance in various tasks, their vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are possible even in a black-box setting where an adversary cannot access the target model's internal information. Most black-box attacks are based on queries, each of which obtains the target model's output for an input, and many recent studies focus on reducing the number of required queries. In this paper, we pay attention to an implicit assumption of these attacks: that the target model's output exactly corresponds to the query input. If some randomness is introduced into the model to break this assumption, query-based attacks may have tremendous difficulty in both gradient estimation and local search, which are the core of their attack process. From this motivation, we observe that even a small additive input noise can neutralize most query-based attacks and name this simple yet effective approach Small Noise Defense (SND). We analyze how SND can defend against query-based black-box attacks and demonstrate its effectiveness against eight state-of-the-art attacks on the CIFAR-10 and ImageNet datasets. Despite its strong defense ability, SND almost maintains the original clean accuracy and computational speed. SND is readily applicable to pre-trained models by adding only one line of code at the inference stage, so we hope that it will be used as a baseline defense against query-based black-box attacks in the future.

1. INTRODUCTION

Although deep neural networks perform well in various areas, it is now well known that small, malicious input perturbations can cause them to malfunction (Biggio et al., 2013; Szegedy et al., 2013). This vulnerability to adversarial examples hinders the deployment of AI models, especially in safety-critical areas. In a white-box setting, where the target model's parameters can be accessed, strong adversarial attacks such as Projected Gradient Descent (PGD) (Madry et al., 2018) can generate adversarial examples using this internal information. However, recent studies have shown that adversarial examples can be generated even in a black-box setting where the model's interior is hidden from adversaries. These black-box attacks can be largely divided into transfer-based attacks and query-based attacks. Transfer-based attacks take advantage of transferability: adversarial examples generated from one network can deceive other networks. Papernot et al. (2017) train a substitute model that mimics the behavior of the target model and show that adversarial examples created from it can successfully disturb different models. However, due to differences in training methods and model architectures, the transferability of adversarial examples can be significantly weakened, so transfer-based attacks usually achieve lower success rates (Chen et al., 2017). For this reason, most black-box attacks are based on queries, each of which measures the target model's output for an input. Query-based attacks create adversarial examples through an iterative process based on either local search with repetitive small input modifications or optimization with estimated gradients of an adversary's loss with respect to the input. However, requesting many queries costs considerable time and money. Moreover, many similar query images can appear suspicious to system administrators.
For this reason, researchers have focused on reducing the number of queries required to craft a successful adversarial example. Compared to the increasing number of studies on query-based attacks, the number of defenses against them is still very small (Bhambri et al., 2019). Also, existing defenses developed for white-box attacks may not be effective against query-based black-box attacks. Dong et al. (2020) find that existing defenses such as ensemble adversarial training (Tramèr et al., 2018) do not effectively defend against decision-based attacks. Therefore, it is necessary to develop new defense strategies that respond appropriately to query-based attacks. To defend against query-based black-box attacks, we pay attention to an implicit but important assumption of these attacks: that the target model's output exactly corresponds to the query input. If some randomness is introduced into the model to break this assumption, they can have tremendous difficulty in both gradient estimation and local search, which are the core of their attack process. This intuition is illustrated in Fig. 1a. In this paper, we highlight that simply adding small Gaussian noise to an image is enough to defeat various query-based attacks by breaking the above core assumption while almost maintaining clean accuracy. One may think that additive Gaussian noise cannot defend against most adversarial attacks unless we introduce large randomness. This idea is valid for white-box attacks (Gu & Rigazio, 2014), but our experimental results show that small noise is surprisingly effective against query-based black-box attacks. Our second intuition, regarding the minimization of clean accuracy loss, can be seen in Fig. 1b. Dodge & Karam (2017) find that classification accuracy decreases in proportion to the variance of Gaussian noise, but for a sufficiently small variance, the accuracy drop is negligible.
Considering that the robustness against additive Gaussian noise is positively correlated with the distance to the decision boundary (Gilmer et al., 2019), the above observation implies that clean images have a relatively long distance to the decision boundary. We think an adversarial defense should have the following goals: (1) preventing malfunction of a model against various attacks, (2) minimizing the computational overhead, (3) maintaining the accuracy on clean images, and (4) being easily applicable to existing models. The proposed defense against query-based attacks meets all of the above objectives, and we name this simple yet effective defense Small Noise Defense (SND). Our contributions can be listed as follows:

• We highlight the effectiveness of adding a small additive noise to the input in defending against query-based black-box attacks. The proposed defense, SND, can be readily applied to pre-trained models by adding only one line of code in the Pytorch framework (Paszke et al., 2019) at the inference stage (x = x + sigma * torch.randn_like(x)) and almost maintains the performance of the model.

• We analyze how SND can efficiently interfere with gradient estimation and local search, which are the core of query-based attacks.

• We explain the difficulty of evading SND. We devise an adaptive attack against SND and explain its limitations in terms of query efficiency.

• We show that the proposed method can effectively defend against eight different state-of-the-art query-based black-box attacks on the CIFAR-10 and ImageNet datasets. Specifically, we experimented with four decision-based attacks, three score-based attacks, and a hybrid attack, covering both local search-based and optimization-based methods.
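As a concrete illustration of the first contribution, the one-line defense can be wrapped around any pre-trained classifier at inference time. The sketch below assumes a generic PyTorch model; `snd_predict` is a placeholder name for illustration, not from the paper's released code.

```python
import torch

def snd_predict(model, x, sigma=0.01):
    """Sketch of SND at inference time: add small Gaussian noise to the input.

    The defense itself is the single line below; everything else is the
    ordinary inference path of a pre-trained model.
    """
    x = x + sigma * torch.randn_like(x)  # the one-line defense (SND)
    with torch.no_grad():
        return model(x)

# usage: a toy linear layer stands in for a pre-trained network
model = torch.nn.Linear(3 * 32 * 32, 10)
x = torch.rand(4, 3 * 32 * 32)  # a batch of flattened CIFAR-10-sized inputs
logits = snd_predict(model, x, sigma=0.01)
```

Because the noise is drawn independently per query, two queries with the same input generally return different outputs, which is exactly the property the defense relies on.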

2. BACKGROUND

Adversarial setting. In this paper, we deal with adversarial attacks on the image classification task. Suppose a neural network f(x) classifies an image x among N classes and returns a class-wise probability vector y = [y_1, ..., y_N] for x. For notational convenience, we also denote the probability of the i-th class (i.e., y_i) as f(x)_i and the top-1 class index as h(x) = arg max_{i∈C} y_i, where C = {1, ..., N}. In a black-box threat model, an adversary has a clean image x_0 with class index c_0 and wants to generate an adversarial example x̃ = x_0 + δ that fools a target model f. In the following, we denote the adversarial example at the t-th step of an iterative attack algorithm as x̃_t. The adversary must generate an adversarial example within a perturbation norm budget ε and a query budget Q. If we let q be the number of queries used to craft δ, the adversary's objective can be written as

min_δ ℓ(x_0 + δ), subject to ‖δ‖_p ≤ ε and q ≤ Q, (1)

where ℓ(x̃) = f(x̃)_{c_0} − max_{c≠c_0} f(x̃)_c for untargeted attacks and ℓ(x̃) = max_{c≠ĉ} f(x̃)_c − f(x̃)_ĉ for targeted attacks with target class ĉ. In the following, we briefly introduce the query-based attacks used in this paper.

Bandit optimization with priors (Bandit-TD). Ilyas et al. (2018) observe that the image gradients in successive steps of an iterative attack are strongly correlated, and that the gradients of neighboring pixels are also strongly correlated. Bandit-TD exploits this information as priors for efficient gradient estimation.

Simple Black-box Attack (SimBA & SimBA-DCT). At each iteration, SimBA (Guo et al., 2019a) samples a vector q from a pre-defined set Q, probes the current image x̃_t with x̃_t − q and x̃_t + q, and updates the image in the direction that decreases y_{c_0}. Inspired by the observation that low-frequency components make a major contribution to misclassification (Guo et al., 2018), SimBA-DCT exploits the DCT basis of low-frequency components for query efficiency.

Boundary Attack (BA).
BA (Brendel et al., 2018) updates x̃_t on the decision boundary via random walks so that the perturbation norm gradually decreases while misclassification is maintained.

Sign-OPT. Cheng et al. (2019a) treat a decision-based attack as a continuous optimization problem over the nearest distance to the decision boundary. They use the randomized gradient-free method (Nesterov & Spokoiny, 2017) to estimate the gradient of the distance. Cheng et al. (2019b) propose Sign-OPT, which uses the expectation of the sign of the gradient along random directions to estimate gradients efficiently without exhaustive binary searches.

Hop Skip Jump Attack (HSJA). Chen et al. (2020) improve Boundary Attack with gradient estimation. Each iteration of HSJA finds an image on the boundary via binary search, estimates the gradient, and calculates the step size toward the decision boundary.

GeoDA. Rahmati et al. (2020) propose a geometry-based attack that exploits the geometric prior that the decision boundary of a neural network has a small curvature on average near data samples. By linearizing the decision boundary in the vicinity of samples, it can efficiently estimate the normal vector of the boundary, which reduces the number of queries required for the attack.

2.1. ADVERSARIAL DEFENSES

As Dong et al. (2020) observe that randomization is important for effective defense against query-based attacks, we focus on randomization-based defenses among various defense methods. In what follows, we briefly explain three different randomization-based defenses along with PGD adversarial training.

Random Self-Ensemble (RSE). RSE (Liu et al., 2018) adds Gaussian noise with σ_inner = 0.1 to the input of each convolutional layer, except for the first convolutional layer where σ_init = 0.2 is used. To stabilize the performance, they use an ensemble of multiple predictions for each image.

Parametric Noise Injection (PNI). PNI (He et al., 2019) adds trainable Gaussian noise to the activation or weight of each layer, with learnable noise scale factors optimized through adversarial training.

3. SMALL NOISE DEFENSE (SND)

3.1. OUR APPROACH

To defend against query-based black-box attacks, we add Gaussian noise with a sufficiently small σ to the input as follows:

f_η(x) = f(x + η), where η ∼ N(0, σ²I) and σ ≪ 1. (2)

For an adversary, since the exact value of η is unknown, there can be multiple output values for any x, so f_η is a random process. In what follows, we explain how this transform introduces tremendous difficulty into both the gradient estimation and the local search of query-based black-box attacks.

3.2. DEFENSE AGAINST OPTIMIZATION-BASED ATTACKS

In this subsection, we explain how small Gaussian input noise can disturb the gradient estimation in optimization-based attacks. We first look at defense against score-based attacks and then deal with decision-based attacks. The core of optimization-based attacks is an accurate estimation of ∇ℓ(x), which must be approximated with finite differences because of the black-box setting. For instance, the gradient can be estimated as ĝ by the Random Gradient-Free method (Nesterov & Spokoiny, 2017) as follows:

ĝ = (1/B) Σ_{i=1}^{B} g_i, where g_i = [ℓ(x̃_t + βu_i) − ℓ(x̃_t)] / β · u_i and u_i ∼ N(0, σ²I).

Conceptually, by introducing small Gaussian noise into the input, ĝ can differ greatly from the true gradient ∇ℓ, as shown in Fig. 1. To state this more formally, let us write η as η(x) to make explicit that η depends on both time and x. Suppose f*_{η(x)} is a sample function of the random process f_{η(x)}(x) at some time. Then this function is very noisy in x because of η(x). Let ℓ* be the loss derived from f*_{η(x)}; unless Var[ℓ*(x + u)] is near zero, ℓ* is discontinuous and non-differentiable, and thus ∇ℓ* does not exist. Therefore, gradient estimation using finite differences does not converge to the target gradient ∇ℓ. In decision-based attacks, x̃_t is likely to be in the vicinity of the decision boundary. Therefore, even small noise can move x̃_t across the boundary so that the output changes. The gradient estimated from these frequently erroneous predictions hinders the generation of adversarial examples. We illustrate the above defense mechanism in Appendix A. In addition, the error of the binary search algorithm, which is widely used to calculate the distance to the decision boundary, can be amplified by η. Therefore, algorithms such as HSJA, which assume that x̃ is near the decision boundary, are likely to work incorrectly.
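The effect on finite-difference gradient estimation is easy to observe numerically. The sketch below estimates the gradient of a simple quadratic loss with the Random Gradient-Free method, with and without SND noise on the queried inputs; the quadratic loss and all parameter values are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, B = 100, 50             # input dimension and number of RGF samples (assumptions)
beta, sigma = 0.001, 0.01  # finite-difference step and SND noise level

x = rng.random(d)
loss = lambda v: v @ v     # toy loss; its true gradient at x is 2x
grad_true = 2 * x

def rgf_estimate(defended):
    """RGF gradient estimate; with SND, each query is answered at a noised input."""
    g = np.zeros(d)
    for _ in range(B):
        u = rng.standard_normal(d)
        n1 = sigma * rng.standard_normal(d) if defended else 0.0
        n2 = sigma * rng.standard_normal(d) if defended else 0.0
        g += (loss(x + beta * u + n1) - loss(x + n2)) / beta * u
    return g / B

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
cos_clean = cos(rgf_estimate(defended=False), grad_true)
cos_snd = cos(rgf_estimate(defended=True), grad_true)
# With sigma >> beta, the finite difference is dominated by the noise term,
# so the estimate is far less aligned with the true gradient.
```

The key ratio is σ/β: once the injected noise exceeds the finite-difference step, the per-query signal is buried, which matches the argument above that ∇ℓ* effectively does not exist.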

3.3. DEFENSE AGAINST LOCAL SEARCH-BASED ATTACKS

Almost all local search-based attacks do not take into account the uncertainty of the model output. Suppose an adversary observes that the attack objective loss decreases for x + τ, where τ is a perturbation, and updates x to x + τ. However, since the input actually evaluated by f is x + τ + η, where η ∼ N(0, σ²I), the attack objective loss might actually increase at the originally intended input x + τ. This prediction error makes the attack algorithm get stuck in its iterative process and prevents it from generating adversarial examples.
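A small simulation illustrates this: a SimBA-style step is accepted when the queried loss appears to decrease, but under SND the observed change is dominated by noise, so the accept/reject decision is frequently wrong. The linear loss and all constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, step = 100, 0.01, 0.01   # dimension, SND noise, search step (assumptions)
w = rng.standard_normal(d)         # toy linear loss: L(x) = w @ x
x = rng.random(d)

wrong, trials = 0, 500
for _ in range(trials):
    tau = np.zeros(d)
    tau[rng.integers(d)] = step * rng.choice([-1.0, 1.0])  # one-coordinate probe
    true_decrease = w @ (x + tau) < w @ x
    # the adversary only observes the loss at noised inputs
    obs_now = w @ (x + sigma * rng.standard_normal(d))
    obs_probe = w @ (x + tau + sigma * rng.standard_normal(d))
    observed_decrease = obs_probe < obs_now
    wrong += (true_decrease != observed_decrease)

wrong_rate = wrong / trials
# The true per-step change (about step * |w_j|) is buried under noise of scale
# about sigma * ||w|| * sqrt(2), so the decisions are close to coin flips.
```

With these values, roughly half of the accept/reject decisions are wrong, so the local search makes no consistent progress even though each individual step looks informative to the attacker.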

3.4. DIFFICULTY OF EVADING SMALL NOISE DEFENSE

Although the input of the function, x + η, is a Gaussian random process, f(x + η), the result of the nonlinear function f, is no longer a Gaussian random process. This makes it very difficult for query-based attacks to bypass SND. One can approximate f(x) as E_η[f_η(x)] by taking the expectation over multiple queries, using the fact that E[η] = 0. However, this attempt requires many queries for each iteration and greatly diminishes query efficiency. In addition, even if a large number of queries is used, E_η[f_η(x)] may differ from f(x) because of the nonlinearity of deep neural networks. With a simple example, we explain how the expectation differs from the actual value when Gaussian noise is added to the input of a nonlinear function. Let F(x) = xᵀx, where F : R^d → R, and F_η(x) = F(x + η), where η ∼ N(0, σ²I). Suppose we want to estimate F(0) with F_η(0). Then E[F_η(0)] = E[(0 + η)ᵀ(0 + η)] = E[ηᵀη] = dσ². Therefore, E[F_η(0)] = dσ² ≠ 0 = F(0), and if d is very large (e.g., for an image of size 224×224×3, d = 150,528), the estimation error will be high. Additionally, consider a simple network consisting of an affine layer and a ReLU activation, F(x) = ReLU(Wx + b) with F : R^d → R. Since F(x) = max(0, Wx + b) and E[F_η(x)] = (Wx + b) − (Wx + b)Φ(−(Wx + b)/(‖W‖σ)) + ‖W‖σ φ(−(Wx + b)/(‖W‖σ)), where Φ and φ denote the standard normal CDF and PDF, an estimation error exists (the detailed proof is in Appendix E). From the proof on this simple network, we can expect that the average output may deviate from the actual output even in a deep neural network.
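The dσ² bias in the quadratic example is easy to verify with a minimal Monte Carlo check; the dimension and noise level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 1000, 0.1      # illustrative dimension and noise level
F = lambda x: x @ x       # F(x) = x^T x, so F(0) = 0

# Monte Carlo estimate of E[F_eta(0)] = E[eta^T eta] with eta ~ N(0, sigma^2 I)
samples = np.array([F(sigma * rng.standard_normal(d)) for _ in range(2000)])
estimate = samples.mean()
# The expectation is d * sigma^2 = 10, not F(0) = 0: averaging noisy queries
# does not recover the noiseless output of a nonlinear function.
```

The bias grows linearly in d, which is why the gap is severe at image dimensions such as d = 150,528.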

4. EXPERIMENTS

4.1. EXPERIMENTAL SETTINGS

In this section, we evaluate the defense ability of SND against eight different query-based black-box attacks (BA, Sign-OPT, HSJA, GeoDA, SimBA, SimBA-DCT, Bandit-TD, and Subspace Attack) along with other defense methods (PNI, RSE, R&P, and PGD-AT). We use the CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets for our experiments and, following previous studies (Chen et al., 2020; Guo et al., 2019a; He et al., 2019), we use ResNet-20 for CIFAR-10 and ResNet-50 (He et al., 2016) for ImageNet as target networks. Following Brendel et al. (2018), we randomly sampled 1,000 and 250 correctly classified images from the CIFAR-10 test set and the ImageNet validation set, respectively, for evaluation. We describe the detailed experimental settings in Appendix B. For evaluation metrics, we first define a successfully attacked image as an image for which an attack can find an adversarial image within the perturbation budget ε and query budget Q. With this definition, we use the attack success rate, which is the percentage of successfully attacked images over the total number of evaluated images. Note that since we evaluate defense performance, a lower attack success rate is better. We use ε = 1.0 for the CIFAR-10 dataset and ε = 5.0 for the ImageNet dataset. In addition, we denote the q-th query image as x̃_q. Note that x̃_q and x̃_t can be different.

4.2. EXPERIMENTAL RESULTS

Evaluation of clean accuracy. We first evaluate the clean accuracy of models with defenses on the original test split (10k images) of CIFAR-10 and the validation split (50k images) of ImageNet. As shown in Table 1, SND hardly reduces clean accuracy compared to other methods. The accuracy drop caused by SND is not significant at σ ≤ 0.01 and becomes large at σ = 0.02, which implies that a sufficiently small σ is important for maintaining clean accuracy.

Evaluation on the CIFAR-10 dataset. We performed four different decision-based attacks against models with defenses, and Table 2 shows the evaluated attack success rates. SND shows competitive defense ability despite having more than 5% higher clean accuracy than the other defenses. Moreover, due to their significant performance drop, RSE and PNI cannot be applied to models for large-scale image classification on the ImageNet dataset.

Evaluation on the ImageNet dataset. We performed six different query-based attacks against models with defenses, and Table 3 shows the evaluated attack success rates. When the query budget Q is 20k, the average attack success rate against the baseline is 87.9%, whereas SND with σ = 0.01 significantly reduces it to 10.0%. SND with σ = 0.001 also significantly reduces the average attack success rate, to 29.5%, which is comparable to the second-best method, R&P (22.5%). We also calculate the average ℓ2 norm of adversarial perturbations ‖x_0 − x̃_q‖_2 at the predefined query budget Q to show whether the perturbation norm diverges. If an attack stops in the middle without requesting all Q queries, we use the last query image instead. In decision-based attacks, it can be seen that the randomization-based defenses, SND and R&P, significantly increase the perturbation norm as q increases.
In SimBA and SimBA-DCT, the perturbation norm is minimal under SND and R&P, which implies that the attacks have significant difficulty in finding a perturbation that decreases y_{c_0}.

Empirical evidence for the assumptions of SND. To provide supporting evidence for the assumptions of SND in score-based attacks, we calculate σ_ℓ = √(Var[ℓ(x + η)]), where η ∼ N(0, σ²I). With σ = 0.01 and 1,000 test images of CIFAR-10, we evaluate ℓ(x + η) for 100 iterations for each clean image and average σ_ℓ over all images. In our experiment, σ_ℓ = 0.04, which is small but sufficient to make ℓ(x + η) non-differentiable with respect to x. In decision-based attacks, our assumption in Section 3.2 is that if x is near the decision boundary, sufficiently small noise can easily move the image across the boundary. We evaluate P_mis := P(h(x) ≠ h(x + η)) through experiments, counting the mismatch cases over all queries during the attack process. With σ = 0.001 and 0.01 on the CIFAR-10 test images, the average P_mis over all attacks is 0.22 and 0.25, respectively. In contrast, on clean images, P_mis is 0.002 and 0.021, respectively. Therefore, the results shown in Table 5 support our argument.

Evaluation of an adaptive attack against SND. As described in Section 3.4, we devise an adaptive attack against SND that takes the expectation of predictions over T repeated queries. In this experiment, we perform HSJA against SND with σ = 0.01 on the CIFAR-10 dataset. Since HSJA is a decision-based attack, we regard the most frequently predicted class over T queries as the expected class. We measure the attack success rate and P_mis according to the query budget. For the same number of attack iterations, the adaptive attack clearly shows a higher attack success rate than the baseline (T=1), as shown in Table 4. On the same query budget, however, the adaptive attack shows a lower attack success rate (e.g., 22.7% (T=1) > 18.2% (T=5) at Q=10k and 29.3% (T=5) > 27.3% (T=10) at Q=50k).
Therefore, the expectation-based adaptive attack has limitations due to the restricted query budget. Moreover, even if T increases, P_mis does not decrease, which reinforces our argument in Section 3.4. We also apply the adaptive attack with T=10 to BA, Sign-OPT, and GeoDA for comparison; the results are shown in Table 6.

Varying σ for each inference. So far, we have used a fixed σ for SND. Changing σ for each query may reduce the clean accuracy loss while maintaining the defense ability. From this motivation, we multiply η by a factor k randomly drawn between 0 and 1 in three ways: (1) uniformly at random (the same as a beta distribution with α=β=1); (2) from a beta distribution with α=β=2, whose probability density function (PDF) is ∩-shaped; (3) from a beta distribution with α=β=0.5, whose PDF is ∪-shaped. We calculate the clean accuracy and the average ℓ2 norm of perturbations for each method. Among the three ways, α=β=2 is better than the others, but SND with fixed σ=0.01 is better than SND with variable σ, which implies that large randomness in each query is crucial for a strong defense. Detailed results can be found in Appendix C.

Defense against a hybrid black-box attack. The recently proposed Subspace Attack (Guo et al., 2019b) exploits transferability-based priors, i.e., gradients from local substitute models trained on a small proxy dataset. We use pre-trained ResNet-18 and ResNet-34 (He et al., 2016) as reference models for gradient priors. We perform this attack under the ℓ∞ norm because the authors provide parameter settings only for the ℓ∞ norm. Detailed results are described in Appendix D; they show that SND alone cannot effectively defend against the hybrid attack with gradient priors. However, when SND is combined with PGD-AT, it effectively protects the model and decreases the attack success rate from 100% to 42.4% at Q=20k and σ=0.01. To focus on the defense ability against gradient estimation, we recalculate the attack success rate without initially misclassified images; the newly obtained attack success rate decreases from 100% to 16.4% at Q=20k.
This result implies that SND can be combined with other defenses against transfer-based attacks to achieve strong defense ability against all types of black-box attacks.
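The expectation-based adaptive attack evaluated above can be sketched as a majority vote over T repeated queries; note that every label evaluation now costs T queries from the attacker's budget. The toy linear classifier and the function names below are assumptions for illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 20))   # toy 3-class linear classifier (assumption)

def h_eta(x, sigma=0.01):
    """One decision-based query, answered under SND."""
    return int(np.argmax(W @ (x + sigma * rng.standard_normal(x.shape))))

def expected_class(x, T=10, sigma=0.01):
    """Adaptive attack's label estimate: the most frequent class over T
    repeated queries. Returns the estimate and the query cost (always T)."""
    votes = Counter(h_eta(x, sigma) for _ in range(T))
    return votes.most_common(1)[0][0], T

# a point classified with a large margin, far from the decision boundary
x = W[0] / np.linalg.norm(W[0])
c_clean = int(np.argmax(W @ x))
c_hat, queries_spent = expected_class(x, T=10)
# The vote stabilizes the prediction for this off-boundary point, but each
# evaluation consumes T queries, which is exactly the efficiency penalty
# observed in the experiments.
```

For points near the boundary, which is where decision-based attacks operate, the votes stay split, consistent with P_mis not decreasing as T grows.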

5. RELATED WORK

History-based detection methods against query-based black-box attacks. To the best of our knowledge, studies that mainly target defending against query-based black-box attacks have not yet been published. However, history-based detection techniques for query-based attacks have been proposed recently (Chen et al., 2019; Li et al., 2020). Considering that an adversary requires many queries of similar images to find an adversarial example, these methods store information about past query images to detect the unusual behavior of query-based attacks.

Certified defense with additive Gaussian noise. Li et al. (2019) analyze the connection between the robustness of models against additive Gaussian noise and against adversarial perturbations. They derive certified bounds on the norm-bounded adversarial perturbation and propose a new training strategy to improve certified robustness. Similarly, randomized smoothing (Cohen et al., 2019) creates a smoothed classifier that classifies correctly when Gaussian noise is added to the classifier's input, and Cohen et al. (2019) prove that this smoothed classifier has ℓ2 certified robustness for an input. Both SND and these certified defenses add Gaussian noise to the input. However, the purpose of the added noise in the certified defenses is to induce the classifier to gain certified robustness, whereas SND adds noise to disturb accurate measurement of the output and thereby defend against query-based black-box attacks at inference time. In addition, the certified defenses use a much larger σ (≥ 0.25) than SND (0.01).

6. CONCLUSION

In this paper, we highlight that even a small additive input noise can effectively neutralize query-based black-box attacks and name this approach Small Noise Defense (SND). We demonstrate its effectiveness against eight different query-based attacks on the CIFAR-10 and ImageNet datasets. Our work suggests that query-based black-box attacks should also consider the randomness of the target network. SND is readily applicable to pre-trained models by adding only one line of code, without any structural changes or fine-tuning. Due to its simplicity and effectiveness, we hope that SND will be used as a baseline defense against query-based black-box attacks in the future. Exemplary future work includes: (1) finding a better type of noise: other types of noise, such as salt-and-pepper, or image processing such as random contrast, can be used instead of Gaussian noise as long as they disrupt query-based attacks while maintaining clean accuracy; (2) devising effective black-box attacks against SND that maintain query efficiency: the uncertainty of randomized predictions can be mitigated through expectation, but this requires many queries and hurts query efficiency; (3) rigorous proof of the defense ability of small input noise: we explain how SND can defend against query-based attacks, but one could establish theorems such as a lower bound on the mean squared error of the estimated gradient under small input noise.

PGD-AT: We use the adversarially trained ResNet-50 model for the ℓ2 norm with ε_train = 3, trained with PGD on the ImageNet dataset and provided by the robustness library (Engstrom et al., 2019), for comparison.

RSE:

We train an RSE-based ResNet-20 with σ_init = 0.2 and σ_inner = 0.1. Considering computational efficiency, we use 5 ensemble predictions for each input of RSE.

R&P: R&P applies random resizing and random padding to its input sequentially. It first rescales an input image of size W × H × 3 by a scale factor s sampled from [s_min, s_max], and places it at a random position within an empty image of size s_max·W × s_max·H × 3. Following the authors, we set s_min and s_max to 310/299 and 331/299, respectively.
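For reference, R&P's preprocessing can be sketched as follows. Nearest-neighbor resizing is used here purely to keep the sketch dependency-free; the original defense uses standard image resizing.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_resize_pad(x, s_min=310 / 299, s_max=331 / 299):
    """Sketch of R&P: randomly rescale the image, then place it at a random
    position inside a zero canvas of the maximum rescaled size."""
    h, w, c = x.shape
    s = rng.uniform(s_min, s_max)
    nh, nw = int(round(h * s)), int(round(w * s))
    rows = (np.arange(nh) * h / nh).astype(int)   # nearest-neighbor indices
    cols = (np.arange(nw) * w / nw).astype(int)
    resized = x[rows][:, cols]
    H, W = int(round(h * s_max)), int(round(w * s_max))
    canvas = np.zeros((H, W, c), dtype=x.dtype)
    top = rng.integers(0, H - nh + 1)
    left = rng.integers(0, W - nw + 1)
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

x = rng.random((299, 299, 3))
out = random_resize_pad(x)   # a 299x299x3 input always yields a 331x331x3 output
```

Both the scale factor and the padding offset are resampled per query, which is the source of this defense's randomness.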



1 https://github.com/cmhcbb/attackbox
2 https://github.com/Jianbo-Lab/HSJA
3 https://github.com/thisisalirah/GeoDA
4 https://github.com/cg563/simple-blackbox-attack
5 https://github.com/MadryLab/blackbox-bandits
6 https://github.com/ZiangYan/subspace-attack.pytorch



Figure 1: Illustrations of our intuitions. (a) Small noise can effectively disturb the gradient estimation of query-based attacks that use finite differences. (b) Compared to large noise, small noise hardly affects predictions on clean images.

We train a ResNet-20 model on the CIFAR-10 dataset for 200 epochs and use it for our experiments. For the ImageNet dataset, we use the pre-trained ResNet-50 model provided by the Pytorch library. PNI: We use the pre-trained ResNet-20 model with PNI-W (channel-wise), trained on the CIFAR-10 dataset and provided by the authors.

He et al. (2019) propose a method to increase the robustness of neural networks by adding trainable Gaussian noise to activation or weight of each layer. They introduce learnable scale factors of noise and allow them to be learned with adversarial training.

Comparison of clean accuracy. For randomization-based methods, we denote the mean and standard deviation of clean accuracy over 5 repeated experiments with different random seeds.

Evaluation of attack success rates (%) on the CIFAR-10 dataset.

Evaluation of attack success rates against defenses on the ImageNet dataset. We denote the average ℓ2 norm of perturbations in parentheses.

Evaluation of SND with different T.

Evaluation of P(h(x) ≠ h(x + η)).

Evaluation of the adaptive attack against SND with T=10.

Experimental results of varying σ with the CIFAR-10 dataset. We evaluate the mean and standard deviation of clean accuracy over 5 repeated experiments on the original test dataset. (Table columns: Clean Acc. (%) and E_q‖x_0 − x̃_q‖_2.)

Evaluation of attack success rates of Subspace Attack against defenses on the ImageNet dataset. We also calculate the attack success rate without initially misclassified images and denote it in parentheses.


Sign-OPT: We port the code 1 provided by the authors into our framework without changing the special parameters of the attack.

HSJA:

We adopt the HSJA implementation provided by the Adversarial Robustness Toolbox (ART) library with default parameters, except for increasing the maximum number of iterations to 64 to follow the authors' code 2.

GeoDA: We port the code 3 provided by the authors into our framework without changing the special parameters of the attack.

SimBA & SimBA-DCT: We port the code 4 provided by the authors into our framework. Following the authors, we use freq_dims=28, order=strided, and stride=7 for SimBA-DCT.

Bandit-TD: We port the code 5 provided by the authors into our framework with default parameters except for batch_size=1 and epsilon=4.9.

Subspace Attack: We port the code 6 provided by the authors into our framework with the original settings for the ℓ∞ norm untargeted attack on ImageNet. We use pre-trained ResNet-18 and ResNet-34 models trained on the imagenetv2-val dataset, provided by the authors, as reference models.

ANNEX

E[F_η(x)] can be obtained by the law of total expectation. Using the truncated normal distribution, we recall the following fact: for Y ∼ N(μ, s²), E[max(0, Y)] = μΦ(μ/s) + sφ(μ/s). (7)

