ENHANCING THE TRANSFERABILITY OF ADVERSARIAL EXAMPLES VIA A FEW QUERIES AND FUZZY DOMAIN ELIMINATING

Abstract

Due to the vulnerability of deep neural networks, black-box attacks have drawn great attention from the community. Although transferable priors have decreased the number of queries needed by black-box query attacks in recent efforts, the average number of queries is still larger than 100, which is easily affected by query-number limit policies. In this work, we propose a novel query prior-based method to enhance the attack transferability of the family of fast gradient sign methods using only a few queries. Specifically, for the untargeted attack, we find that successfully attacked adversarial examples tend to be classified by the victim model as the wrong categories with higher probability. Therefore, the weighted augmented cross-entropy loss is proposed to reduce the gradient angle between the surrogate model and the victim model, enhancing the transferability of the adversarial examples. In addition, a fuzzy domain eliminating technique is proposed to prevent the generated adversarial examples from getting stuck in local optima. Specifically, we define the fuzzy domain of an input example $x$ in the $\epsilon$-ball of $x$. Then, temperature scaling and fuzzy scaling are utilized to eliminate the fuzzy domain, enhancing the transferability of the generated adversarial examples. Theoretical analysis and extensive experiments demonstrate that our method significantly improves the transferability of gradient-based adversarial attacks on CIFAR10/100 and ImageNet and outperforms black-box query attacks with the same small number of queries.

1. INTRODUCTION

Deep Neural Networks (DNNs) have penetrated many aspects of life, e.g., autonomous cars, face recognition and malware detection. However, imperceptible perturbations can fool a DNN into making a wrong decision, which is dangerous in security-critical fields and can cause significant economic losses. To evaluate and increase the robustness of DNNs, advanced adversarial attack methods need to be researched. In recent years, white-box attacks have achieved great success and black-box attacks have made great progress. However, because of weak transferability (at low attack strength) and the large number of queries required, black-box attacks can still be further improved. Recently, a number of transferable prior-based black-box query attacks have been proposed to reduce the number of queries. For example, Cheng et al. (2019) proposed a prior-guided random gradient-free (P-RGF) method, which takes advantage of a transfer-based prior and query information simultaneously. Yang et al. (2020) also proposed a simple baseline approach (SimBA++), which combines transferability-based and query-based black-box attacks and utilizes the query feedback to update the surrogate model in a novel learning scheme. However, the average query number of most query attacks is larger than 100 in evaluations on ImageNet. In this scenario, the performance of these query attacks may be significantly affected when a query-number limit policy is applied in the DNN application. Besides, many black-box transfer attacks have been proposed to enhance the transferability of adversarial examples, e.g., the fast gradient sign method (FGSM) (Goodfellow et al., 2015), iterative FGSM (I-FGSM) (Kurakin et al., 2017), momentum I-FGSM (MI-FGSM) (Dong et al., 2018), diverse input I-FGSM (DI-FGSM) (Xie et al., 2019), scale-invariant Nesterov I-FGSM (SI-NI-FGSM) (Lin et al., 2020) and variance-tuning MI-FGSM (VMI-FGSM) (Wang & He, 2021). Zhang et al. (2022a) also proposed the relative cross-entropy (RCE) loss to enhance the transferability by maximizing the logit's rank distance from the ground-truth class. However, these transfer attacks achieve weak transferability of adversarial examples under the constraint of low attack strength. Therefore, to solve the above problems, we make the following contributions:

• First, we propose the query prior-based attacks to enhance the transferability of adversarial examples with few queries under the constraint of low attack strength. Specifically, we find that: (i) the better the transferability of a transfer black-box attack, the smaller the gradient angle between the surrogate model and the victim model; (ii) successfully attacked adversarial examples tend to be classified by the victim model as the wrong categories with higher probability. Based on these findings, the weighted augmented cross-entropy (WACE) loss is proposed to decrease the gradient angle between the surrogate model and the victim model for enhancing the transferability of adversarial examples, which is proved in Appendices A.4 and A.5. The proposed query prior-based method enhances the transferability of the family of FGSMs by integrating the WACE loss with a few queries (this contribution is described in detail in Appendix C).

• Second, when the query prior is not available, the fuzzy domain eliminating technique is used to enhance the transferability of adversarial examples. Specifically, we explore the effectiveness of temperature scaling in eliminating the fuzzy domain and propose fuzzy scaling to eliminate the fuzzy domain. By combining temperature scaling and fuzzy scaling, the fuzzy domain eliminating based cross-entropy (FECE) loss is proposed to enhance the transferability of the generated adversarial examples. In addition, the weighted augmented fuzzy domain eliminating based cross-entropy (WFCE) loss, which combines the WACE and FECE losses, can further enhance the transferability of adversarial examples.

• Third, theoretical analysis and extensive experiments demonstrate that: (i) on the premise of allowing queries, the WACE loss outperforms the cross-entropy (CE) and RCE losses; (ii) temperature scaling and fuzzy scaling can effectively eliminate a part of the fuzzy domain; (iii) under the constraint of low attack strength, the query prior-based method and the fuzzy domain eliminating technique significantly improve the attack transferability of the family of fast gradient sign methods on CIFAR10/100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015).

2. PRELIMINARIES

The family of FGSMs and the RCE loss are briefly introduced; they are helpful for understanding our methods in Section 3 and serve as the baselines in Section 4.

2.1. FAMILY OF FAST GRADIENT SIGN METHODS

The methods mentioned in this section are black-box transfer attacks with the objective of enhancing the transferability of adversarial examples.

Fast gradient sign method (FGSM) (Goodfellow et al., 2015) is the first transfer attack, which generates the adversarial example $x^{adv}$ by maximizing the loss function $L(x^{adv}, y_o; \theta)$ with a one-step update:
$$x^{adv} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x L(x, y_o; \theta)\big)$$
where $\epsilon$ is the attack strength, $y_o$ is the ground truth, $\theta$ denotes the model parameters, $\mathrm{sign}(\cdot)$ is the sign function and $\nabla_x L(x, y_o; \theta)$ is the gradient of the loss function w.r.t. $x$.

Iterative FGSM (I-FGSM) (Kurakin et al., 2017) is the iterative version of FGSM, applying FGSM with a small step size $\alpha$:
$$x^{adv}_0 = x, \quad x^{adv}_{t+1} = \mathrm{Clip}_x\big(x^{adv}_t + \alpha \cdot \mathrm{sign}\big(\nabla_x L(x^{adv}_t, y_o; \theta)\big)\big)$$
where the $\mathrm{Clip}_x(\cdot)$ function restricts the generated adversarial examples to the $\epsilon$-ball of $x$.

Momentum I-FGSM (MI-FGSM) (Dong et al., 2018) integrates momentum into I-FGSM to escape from poor local maxima and enhance the transferability of adversarial examples:
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x L(x^{adv}_t, y_o; \theta)}{\|\nabla_x L(x^{adv}_t, y_o; \theta)\|_1}, \quad x^{adv}_{t+1} = \mathrm{Clip}_x\big(x^{adv}_t + \alpha \cdot \mathrm{sign}(g_{t+1})\big)$$
where $g_t$ is the accumulated gradient at iteration $t$ and $\mu$ is its decay factor.

Diverse inputs I-FGSM (DI-FGSM) (Xie et al., 2019) applies random transformations $Tr(\cdot)$ to the input images at each iteration with probability $p$, instead of only using the original images to generate adversarial examples.

Scale-invariant Nesterov I-FGSM (SI-NI-FGSM) (Lin et al., 2020) integrates Nesterov Accelerated Gradient (NAG) into I-FGSM to leverage the looking-ahead property of NAG, i.e., it substitutes $x^{adv}_t$ in the I-FGSM update with $x^{adv}_t + \alpha \cdot \mu \cdot g_t$, building a robust adversarial attack. Due to the scale-invariant property of DNNs, a scale-invariant attack method is also proposed to optimize the adversarial perturbations over scaled copies of the input images.
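The one-step and iterative updates above can be sketched in code. The following is a minimal illustration, not the paper's implementation: a toy quadratic loss with an analytic gradient stands in for a real network, and all names and hyperparameter values are assumptions.

```python
import math

# Hypothetical 2x2 weight matrix standing in for a real network.
W = [[2.0, -1.0], [0.5, 1.5]]

def loss(x):
    # Toy differentiable "model" loss: half the squared norm of a linear map.
    z = [sum(W[i][j] * x[j] for j in range(2)) for i in range(2)]
    return 0.5 * (z[0] ** 2 + z[1] ** 2)

def grad(x):
    # Analytic gradient of the toy loss: dL/dx = W^T (W x).
    z = [sum(W[i][j] * x[j] for j in range(2)) for i in range(2)]
    return [sum(W[i][j] * z[i] for i in range(2)) for j in range(2)]

def sign(v):
    return [0.0 if c == 0 else math.copysign(1.0, c) for c in v]

def clip_ball(x_adv, x, eps):
    # Project back into the L_inf eps-ball around the clean input x.
    return [max(x[j] - eps, min(x[j] + eps, x_adv[j])) for j in range(len(x))]

def i_fgsm(x, eps=0.1, T=10, mu=0.0):
    # mu = 0 gives I-FGSM; mu > 0 adds the MI-FGSM momentum accumulation.
    alpha = eps / T
    x_adv, g = list(x), [0.0] * len(x)
    for _ in range(T):
        gr = grad(x_adv)
        l1 = sum(abs(c) for c in gr) or 1.0  # L1-normalize before accumulating
        g = [mu * gi + ci / l1 for gi, ci in zip(g, gr)]
        step = [alpha * si for si in sign(g)]
        x_adv = clip_ball([xi + si for xi, si in zip(x_adv, step)], x, eps)
    return x_adv
```

For example, `i_fgsm([1.0, -0.5], mu=1.0)` returns a point inside the $\epsilon$-ball with a strictly larger toy loss than the starting point.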
Variance tuning MI-FGSM (VMI-FGSM) (Wang & He, 2021) further considers the gradient variance to stabilize the update direction and escape from poor local maxima, instead of directly using the current gradient for the momentum accumulation:
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x L(x^{adv}_t, y_o; \theta) + v_t}{\|\nabla_x L(x^{adv}_t, y_o; \theta) + v_t\|_1},$$
$$v_{t+1} = \frac{1}{N}\sum_{i=1}^{N} \nabla_x L(x^{adv}_{ti}, y_o; \theta) - \nabla_x L(x^{adv}_t, y_o; \theta), \quad x^{adv}_{t+1} = \mathrm{Clip}_x\big(x^{adv}_t + \alpha \cdot \mathrm{sign}(g_{t+1})\big)$$
where $v_{t+1}$ is the gradient variance at the $t$-th iteration, $x^{adv}_{ti} = x^{adv}_t + r_i$ with $r_i \sim U[-(\beta \cdot \epsilon)^d, (\beta \cdot \epsilon)^d]$, $U[a^d, b^d]$ stands for the uniform distribution in $d$ dimensions, and $\beta$ is a hyperparameter.
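The variance-tuning update can be sketched in the same spirit. This is an illustrative toy, not the authors' code: the quadratic loss, its analytic gradient, and the default values of `N` and `beta` below are assumptions standing in for a real network and the paper's hyperparameters.

```python
import math
import random

def loss(x):
    # Toy quadratic loss standing in for a network's loss surface.
    return 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)

def grad(x):
    # Analytic gradient of the toy loss.
    return [x[0], 4.0 * x[1]]

def vmi_fgsm(x, eps=0.1, T=10, mu=1.0, N=20, beta=1.5, seed=0):
    rng = random.Random(seed)
    d, alpha = len(x), eps / T
    x_adv, g, v = list(x), [0.0] * len(x), [0.0] * len(x)
    for _ in range(T):
        cur = grad(x_adv)
        tuned = [c + vi for c, vi in zip(cur, v)]          # gradient + variance term
        l1 = sum(abs(c) for c in tuned) or 1.0
        g = [mu * gi + ti / l1 for gi, ti in zip(g, tuned)]  # momentum accumulation
        # v_{t+1}: average gradient over N uniform neighbors minus current gradient.
        acc = [0.0] * d
        for _ in range(N):
            nb = [xi + rng.uniform(-beta * eps, beta * eps) for xi in x_adv]
            acc = [a + c for a, c in zip(acc, grad(nb))]
        v = [a / N - c for a, c in zip(acc, cur)]
        step = [alpha * math.copysign(1.0, gi) if gi else 0.0 for gi in g]
        x_adv = [max(x[j] - eps, min(x[j] + eps, x_adv[j] + step[j]))
                 for j in range(d)]
    return x_adv
```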

2.2. RELATIVE CROSS-ENTROPY (RCE) LOSS

To escape from poor local maxima, the RCE loss (Zhang et al., 2022a) is a normalized CE loss that guides the logits to be updated in the direction of implicitly maximizing the rank distance from the ground-truth class:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}, \quad L_{CE}(x, y_o; \theta) = -\log \mathrm{Softmax}(z_o),$$
$$L_{RCE}(x, y_o; \theta) = L_{CE}(x, y_o; \theta) - \frac{1}{C}\sum_{c=1}^{C} L_{CE}(x, y_c; \theta)$$
where $z_o$ is the logit of the ground-truth label $y_o$, $C$ is the number of categories, and $y_c$ is the category with index $c$. Note that Proposition 7 explains the effectiveness of the RCE loss from the perspective of our fuzzy domain eliminating in the targeted transfer attacks.
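These definitions translate directly into code. The following is a minimal sketch using plain Python lists (a real implementation would operate on batched tensors):

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def ce_loss(z, c):
    # Cross-entropy of logits z w.r.t. class index c.
    return -math.log(softmax(z)[c])

def rce_loss(z, o):
    # RCE: CE of the ground truth minus the mean CE over all C classes.
    # Since L_CE(z, c) = logsumexp(z) - z_c, this reduces to mean(z) - z_o.
    C = len(z)
    return ce_loss(z, o) - sum(ce_loss(z, c) for c in range(C)) / C
```

One consequence visible here: for uniform logits the RCE loss is exactly zero, and in general it depends only on how far the ground-truth logit sits above the mean logit.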

3. METHODOLOGY

In this section, the motivation is introduced first. Then, the weighted augmented cross-entropy (WACE) loss is proposed with a corresponding theoretical analysis. By combining the WACE loss with a few queries, our query prior-based method is presented. The fuzzy domain eliminating based cross-entropy (FECE) loss is then proposed with its theoretical analysis. Finally, by combining the advantages of the WACE and FECE losses, the WFCE loss is proposed.

3.1. MOTIVATION

First, though the transferable prior-based black-box query attacks (Cheng et al., 2019; Yang et al., 2020) significantly reduce the query number, the average number of queries is still larger than 100. The performance of these query attacks may be greatly affected by the query-number limit policy of DNN applications. On the contrary, we can use the results of a few queries as priors to enhance the transferability of black-box transfer attacks. Specifically, we find the preference of the attacked victim model (i.e., Proposition 2). Then a novel black-box transfer attack is designed to achieve higher transferability through the combination of this preference and the results of a few queries. Note that the detailed motivation is described in Appendix D.

Second, a common phenomenon occurs in the black-box transfer attacks: with the increase of the attack strength, the attack success rate (ASR) of the white-box attacks quickly converges to 100%, but the ASR of the black-box transfer attacks approaches 100% only slowly. This phenomenon shows that there is a fuzzy domain between the surrogate model and the victim model for the black-box attacks. The fuzzy domain is a locally optimal region where the generated adversarial examples make the surrogate model wrong but leave the victim model correct. Therefore, eliminating the fuzzy domain can enhance the transferability of the black-box transfer attacks. In this paper, temperature scaling and fuzzy scaling are used to eliminate the fuzzy domain.

Algorithm 1 Query prior-based VMI-FGSM (QVMI-FGSM)
Input: The surrogate model $f$ with parameters $\theta_f$; the victim model $h$ with parameters $\theta_h$; the WACE loss $L_{WACE}$; an example $x$ with ground-truth label $y_o$; the magnitude of perturbation $\epsilon$; the number of iterations $T$ and decay factor $\mu$; the factor $\beta$ for the upper bound of the neighborhood and the number of examples $N$ for variance tuning; the maximum number of queries $Q$ and the number of top-$n$ wrong categories $n$.
Output: An adversarial example $x^{adv}$.
$\alpha = \epsilon/T$; $g_0 = 0$; $v_0 = 0$; $x^{adv}_0 = x$
for $t = 0 \to T-1$ do
  Query the logit output of the victim model:
  $$Z_h = \begin{cases} h(x^{adv}_t), & \text{if } Q \ge T \\ h(x^{adv}_t), & \text{if } Q < T \wedge t \in \{\lceil T/Q \rceil \cdot i \mid i = 0, 1, \cdots, Q-1\} \\ Z_h, & \text{if } Q < T \wedge t \notin \{\lceil T/Q \rceil \cdot i \mid i = 0, 1, \cdots, Q-1\} \end{cases}$$
  Calculate the gradient $\hat{g}_{t+1} = \nabla_x L_{WACE}(x^{adv}_t, y_o; \theta_f, Z_h, n)$
  Update $g_{t+1}$ by variance tuning-based momentum: $g_{t+1} = \mu \cdot g_t + \frac{\hat{g}_{t+1} + v_t}{\|\hat{g}_{t+1} + v_t\|_1}$
  Update $v_{t+1}$ by sampling $N$ examples in the neighborhood of $x$: $v_{t+1} = \frac{1}{N}\sum_{i=1}^{N} \nabla_x L_{WACE}(x^{adv}_{ti}, y_o; \theta_f, Z_h, n) - \nabla_x L_{WACE}(x^{adv}_t, y_o; \theta_f, Z_h, n)$
  Update $x^{adv}_{t+1}$ by applying the sign of the gradient: $x^{adv}_{t+1} = \mathrm{Clip}_x\big(x^{adv}_t + \alpha \cdot \mathrm{sign}(g_{t+1})\big)$
end for
return $x^{adv} = x^{adv}_T$

3.2. WEIGHTED AUGMENTED CROSS-ENTROPY LOSS

In this section, we first introduce the characteristics and preference of the victim model, then propose the WACE loss based on this preference and give the theoretical analysis. For the iterative gradient-based attacks, let $f$ and $h$ denote the surrogate model and the victim model, and let $\theta_f$ and $\theta_h$ denote their parameters, respectively. In the following, Definitions 1 and 2 define the gradient angle between $f$ and $h$, the top-$n$ wrong categories, and the top-$n$ wrong categories attack success rate (ASR), which are used in the introduction and in the proofs of Propositions 1 and 2 to analyze the preference of the victim model. Propositions 1 and 2 are proved in Appendices A.1 and A.2.

Definition 1 (Gradient angle between the surrogate model and the victim model) For the $t$-th iteration adversarial example $x^{adv}_t$ of the surrogate model $f$, the angle between $\nabla_x L(x^{adv}_t, y_o; \theta_f)$ and $\nabla_x L(x^{adv}_t, y_o; \theta_h)$ is the gradient angle between $f$ and $h$ at iteration $t$.

Proposition 1 When the step size $\alpha$ is small, the better the transferability of the transfer black-box attack, the smaller the gradient angle between the surrogate model and the victim model.

Definition 2 (Top-$n$ wrong categories and top-$n$ wrong categories attack success rate (ASR)) For the example $(x, y_o)$, if the output of the victim model $h$ is $h(x)$, the top-$n$ wrong categories are the $n$ categories with the largest values in $h(x)$ except the ground truth $y_o$, denoted as $\{y_{\tau_i} \mid i \le n\}$. The top-$n$ wrong categories ASR denotes the rate at which the adversarial example $x^{adv}$ is classified as one of the top-$n$ wrong categories.

Proposition 2 When the victim model $h$ is attacked by white-box gradient-based attacks, the successfully attacked adversarial examples tend to be classified as the wrong categories with higher probability (i.e., the top-$n$ wrong categories $\{y_{\tau_i} \mid i \le n\}$). Meanwhile, the higher the probability of a wrong category, the more likely the adversarial example is to be classified as that category.

Therefore, according to Propositions 1 and 2 (detailed in Appendix A.3), for the untargeted attack, the weighted augmented CE (WACE) loss is proposed to enhance the transferability of the adversarial examples. Besides maximizing the loss function $L_{CE}(x^{adv}, y_o; \theta_f)$, the WACE loss also minimizes the loss functions $L_{CE}(x^{adv}, y_{\tau_i}; \theta_f)$ where $y_{\tau_i}$ belongs to the top-$n$ wrong categories $\{y_{\tau_i} \mid i \le n\}$:
$$L_{WACE}(x, y_o; \theta_f, Z_h, n) = L_{CE}(x, y_o; \theta_f) - \frac{1}{n}\sum_{i=1}^{n} w_i \cdot L_{CE}(x, y_{\tau_i}; \theta_f), \quad w_i = \frac{e^{z_{h,\tau_i}}}{\sum_{j=1}^{n} e^{z_{h,\tau_j}}}$$
where $Z_h = h(x) = [z_{h,1}, z_{h,2}, \cdots, z_{h,C}]$ is the queried logit output of the victim model $h$ on $x$ and $n$ is the number of top-$n$ wrong categories. Note that $\sum_{i=1}^{n} w_i = 1$, and the higher the logit value of a wrong category, the larger its weight $w_i$. According to Proposition 1, Theorem 1 verifies that the transferability of the transfer black-box attack based on the WACE loss is better than that based on the RCE and CE losses; Theorem 1 is proved in Appendices A.4 and A.5. Propositions 1 and 2 and Theorem 1 constitute the theoretical analysis of the WACE loss, explaining the high transferability of WACE loss-based attacks.
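A minimal sketch of the WACE loss follows, assuming plain-Python logit lists (`z_f` from the surrogate, `z_h` from a victim query); the function and variable names are illustrative, not the paper's code.

```python
import math

def _softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def ce_loss(z, c):
    # Cross-entropy of logits z w.r.t. class index c.
    return -math.log(_softmax(z)[c])

def wace_loss(z_f, z_h, o, n):
    """WACE loss from surrogate logits z_f and queried victim logits z_h.

    o: ground-truth class index; n: number of top-n wrong categories.
    """
    # Top-n wrong categories: largest victim logits, excluding the ground truth.
    wrong = sorted((c for c in range(len(z_h)) if c != o),
                   key=lambda c: z_h[c], reverse=True)[:n]
    # Weights w_i: softmax over the victim logits of those categories (sum to 1).
    m = max(z_h[c] for c in wrong)
    e = [math.exp(z_h[c] - m) for c in wrong]
    w = [ei / sum(e) for ei in e]
    # Maximize CE on y_o while minimizing weighted CE on the top-n wrong classes.
    return ce_loss(z_f, o) - sum(wi * ce_loss(z_f, c)
                                 for wi, c in zip(w, wrong)) / n
```

With $n = 1$ the loss reduces to $L_{CE}(x, y_o) - L_{CE}(x, y_{\tau_1})$, i.e., $z_{f,\tau_1} - z_{f,o}$, which makes the pull toward the victim's most-likely wrong class explicit.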

3.3. QUERY PRIOR-BASED ATTACKS

The family of fast gradient sign methods in Section 2.1 uses the CE loss. On the premise of allowing a few queries, however, the CE loss is replaced by the WACE loss in the family of fast gradient sign methods. Accordingly, VMI-FGSM (Wang & He, 2021) is transformed into query prior-based VMI-FGSM, namely QVMI-FGSM, which is described in detail in Algorithm 1. Specifically, two changes are made compared with the VMI-FGSM algorithm. First, the CE loss is replaced by our WACE loss. Second, according to Eq. 9, if the maximum number of queries $Q$ is greater than or equal to the number of attack iterations $T$, QVMI-FGSM queries the logit output of the victim model at each iteration; otherwise, QVMI-FGSM starts from iteration 0 and performs equidistant queries with $\lceil T/Q \rceil$ as the interval. Similarly, FGSM (Goodfellow et al., 2015), I-FGSM (Kurakin et al., 2017), MI-FGSM (Dong et al., 2018), DI-FGSM (Xie et al., 2019) and SI-NI-FGSM (Lin et al., 2020) are transformed into Q-FGSM, QI-FGSM, QMI-FGSM, QDI-FGSM and QSI-NI-FGSM by combining the query priors.
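The equidistant query schedule can be sketched as follows, reading the query interval as $\lceil T/Q \rceil$ (an assumption where the text leaves the rounding implicit); between queried iterations the last logits $Z_h$ are reused.

```python
import math

def query_iterations(T, Q):
    """Iterations at which the query prior-based attack queries the victim.

    If Q >= T, query every iteration; otherwise query at the equidistant
    iterations {ceil(T/Q) * i | i = 0, ..., Q-1} that fall inside [0, T),
    reusing the most recent logits in between.
    """
    if Q >= T:
        return list(range(T))
    step = math.ceil(T / Q)
    return [step * i for i in range(Q) if step * i < T]
```

For example, with $T = 10$ and $Q = 3$ the victim is queried at iterations 0, 4 and 8, so the query budget is never exceeded.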

3.4. FUZZY DOMAIN ELIMINATING BASED CROSS-ENTROPY LOSS

In this section, we first define the fuzzy domain in the untargeted and targeted attacks, respectively. Then, temperature scaling and fuzzy scaling are introduced. Finally, the FECE loss is proposed based on these two scaling techniques, together with its theoretical analysis. In Definitions 3 and 4, $p$ is a probability threshold used to identify whether the adversarial example $x$ is locally optimal; $\hat{c}$ and $\tau$ are the wrong category with the highest probability and the target category, respectively; and $p_{\hat{c}}$ and $p_{\tau}$ are their corresponding probabilities in the probability vector of the adversarial example $x$ predicted by the surrogate model $f$. The ground truth of $x$ is $y_o$.

Definition 3 (The fuzzy domain in the untargeted attacks) In the spherical neighborhood $B(x, \epsilon)$ with the input $x$ as the center and $\epsilon$ as the radius, the subdomain containing the local optimal region of the surrogate model $f$ is $A_{f,-}(p) = \{x' \mid x' \in B(x, \epsilon) \wedge p_{\hat{c}} < p\}$. On the contrary, the subdomain without the local optimal region is $A_{f,+}(p) = B(x, \epsilon) - A_{f,-}(p) = \{x' \mid x' \in B(x, \epsilon) \wedge p_{\hat{c}} \ge p\}$. For the victim model $h$, in the domain $B(x, \epsilon)$, the subdomain with correct classification is $B_{h,+} = \{x' \mid x' \in B(x, \epsilon) \wedge \arg\max h(x') = y_o\}$, and the subdomain with wrong classification is $B_{h,-} = B(x, \epsilon) - B_{h,+} = \{x' \mid x' \in B(x, \epsilon) \wedge \arg\max h(x') \ne y_o\}$. Therefore, in the domain $B(x, \epsilon)$, the fuzzy domain in the untargeted attacks (i.e., $M^{NT}_{(x,f,h)}$, where $NT$ represents the nontargeted attacks) is the region that makes the surrogate model fall into the local optimum while the victim model still classifies correctly: $M^{NT}_{(x,f,h)} = A_{f,-}(p) \cap B_{h,+}$.

Definition 4 (The fuzzy domain in the targeted attacks) In the spherical neighborhood $B(x, \epsilon)$ with the input $x$ as the center and $\epsilon$ as the radius, the subdomain containing the local optimal region of the surrogate model $f$ is $A_{f,-}(p) = \{x' \mid x' \in B(x, \epsilon) \wedge p_{\tau} < p\}$. On the contrary, the subdomain without the local optimal region is $A_{f,+}(p) = B(x, \epsilon) - A_{f,-}(p) = \{x' \mid x' \in B(x, \epsilon) \wedge p_{\tau} \ge p\}$. For the victim model $h$, in the domain $B(x, \epsilon)$, the subdomain classified as the target category $\tau$ is $B_{h,+} = \{x' \mid x' \in B(x, \epsilon) \wedge \arg\max h(x') = y_{\tau}\}$, and the subdomain classified as other categories is $B_{h,-} = B(x, \epsilon) - B_{h,+} = \{x' \mid x' \in B(x, \epsilon) \wedge \arg\max h(x') \ne y_{\tau}\}$. Therefore, in the domain $B(x, \epsilon)$, the fuzzy domain in the targeted attacks (i.e., $M^{Ta}_{(x,f,h)}$, where $Ta$ represents the targeted attacks) is the region that makes the surrogate model fall into the local optimum while being classified as other categories by the victim model: $M^{Ta}_{(x,f,h)} = A_{f,-}(p) \cap B_{h,-}$.

Recent works (Dong et al., 2018; Xie et al., 2019; Lin et al., 2020; Wang & He, 2021) try to prevent the generated adversarial examples from getting stuck in local optima and thereby improve transferability. Therefore, the local optimal region in $A_{f,-}$ is closely related to $B_{h,+}$ in the untargeted attacks and $B_{h,-}$ in the targeted attacks.

Assumption 1 Because the local optimal region in $A_{f,-}$ is closely related to $B_{h,+}$ in the untargeted attacks and $B_{h,-}$ in the targeted attacks, eliminating the domain $A_{f,-}$ achieves the task of eliminating the fuzzy domain $M^{NT}_{(x,f,h)}$ or $M^{Ta}_{(x,f,h)}$.

Based on Assumption 1, temperature scaling and fuzzy scaling are used to eliminate $A_{f,-}$. Temperature scaling was first proposed by Hinton et al. (2015) for knowledge distillation. Fuzzy scaling applies a penalty parameter $K$ to the logit of the correct category in the untargeted attacks ($K > 1$) or to the logit of the target category in the targeted attacks ($0 < K < 1$).
By combining temperature scaling and fuzzy scaling, the fuzzy domain eliminating based cross-entropy (FECE) loss is proposed for the untargeted attacks:
$$FESoftmax(z_i; \mathcal{T}, K) = \begin{cases} \dfrac{e^{K \cdot z_o / \mathcal{T}}}{e^{K \cdot z_o / \mathcal{T}} + \sum_{c=1 \wedge c \ne o}^{C} e^{z_c / \mathcal{T}}}, & i = o \\[2ex] \dfrac{e^{z_i / \mathcal{T}}}{e^{K \cdot z_o / \mathcal{T}} + \sum_{c=1 \wedge c \ne o}^{C} e^{z_c / \mathcal{T}}}, & i \ne o \end{cases} \tag{14}$$
$$L_{FECE}(x, y_o; \theta, \mathcal{T}, K) = -\log FESoftmax(z_o; \mathcal{T}, K) \tag{15}$$
where $\mathcal{T}$ is the temperature parameter in the temperature scaling ($\mathcal{T} > 1$) and FESoftmax is a fuzzy domain eliminating based softmax. For the targeted attacks, the ground-truth category $y_o$ is replaced by $y_{\tau}$ in Equations 14 and 15. Based on Assumption 1, Propositions 3 and 4 prove that temperature scaling ($\mathcal{T} > 1$ and $K = 1$) can eliminate a part of the fuzzy domain in the untargeted and targeted attacks, respectively. Propositions 5 and 6 prove that fuzzy scaling can eliminate a part of the fuzzy domain in the untargeted attacks ($\mathcal{T} = 1$ and $K > 1$) and targeted attacks ($\mathcal{T} = 1$ and $0 < K < 1$), respectively. Note that Propositions 3, 4, 5 and 6 are proved in Appendix A.6.

Proposition 3 In the untargeted attacks, when $p > 0.5$, temperature scaling ($\mathcal{T} > 1$ and $K = 1$) can eliminate a part of the fuzzy domain $M^{NT}_{(x,f,h)}$.

Proposition 4 In the targeted attacks, when $p > 0.5$, temperature scaling ($\mathcal{T} > 1$ and $K = 1$) can eliminate a part of the fuzzy domain $M^{Ta}_{(x,f,h)}$.

Proposition 5 In the untargeted attacks, fuzzy scaling ($\mathcal{T} = 1$ and $K > 1$) can eliminate a part of the fuzzy domain $M^{NT}_{(x,f,h)}$.

Proposition 6 In the targeted attacks, fuzzy scaling ($\mathcal{T} = 1$ and $0 < K < 1$) can eliminate a part of the fuzzy domain $M^{Ta}_{(x,f,h)}$.
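Equations 14 and 15 can be sketched directly; here `temp` stands for the temperature $\mathcal{T}$ and the function names are illustrative, not the paper's code.

```python
import math

def fe_softmax(z, o, temp=1.0, K=1.0):
    """Fuzzy domain eliminating based softmax (cf. Eq. 14).

    Temperature temp divides every logit; the penalty K multiplies only the
    logit of category o (ground truth in untargeted attacks, target otherwise).
    """
    scaled = [(K * zi if i == o else zi) / temp for i, zi in enumerate(z)]
    m = max(scaled)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scaled]
    s = sum(e)
    return [ei / s for ei in e]

def fece_loss(z, o, temp=1.0, K=1.0):
    # Cf. Eq. 15: negative log of the FESoftmax probability of category o.
    return -math.log(fe_softmax(z, o, temp, K)[o])
```

With `temp = 1` and `K = 1` this reduces to the ordinary softmax cross-entropy, which gives a quick sanity check on the implementation.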

3.5. WEIGHTED AUGMENTED FUZZY DOMAIN ELIMINATING BASED CROSS-ENTROPY LOSS

To combine the advantages of the WACE and FECE losses, the weighted augmented fuzzy domain eliminating based cross-entropy (WFCE) loss is proposed:
$$L_{WFCE}(x, y_o; \theta_f, Z_h, n, \mathcal{T}, K) = L_{FECE}(x, y_o; \theta_f, \mathcal{T}, K) - \frac{1}{n}\sum_{i=1}^{n} w_i \cdot L_{FECE}(x, y_{\tau_i}; \theta_f, \mathcal{T}, K)$$
where the weights $w_i$ and the top-$n$ wrong categories $y_{\tau_i}$ are defined as in the WACE loss.
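A sketch of the WFCE loss follows. Two reading assumptions are made explicit here: the weighted term is taken over the top-$n$ wrong categories $y_{\tau_i}$ with WACE-style weights, and (mirroring Eq. 14) the penalty $K$ follows the category each FECE term is evaluated on.

```python
import math

def fe_softmax(z, c, temp, K):
    # FESoftmax: temperature on all logits, penalty K on the logit of class c.
    scaled = [(K * zi if i == c else zi) / temp for i, zi in enumerate(z)]
    m = max(scaled)
    e = [math.exp(s - m) for s in scaled]
    return [ei / sum(e) for ei in e]

def fece_loss(z, c, temp, K):
    return -math.log(fe_softmax(z, c, temp, K)[c])

def wfce_loss(z_f, z_h, o, n, temp=1.5, K=1.2):
    """WFCE: FECE on the ground truth minus weighted FECE on the top-n
    wrong categories, weighted by the queried victim logits as in WACE."""
    wrong = sorted((c for c in range(len(z_h)) if c != o),
                   key=lambda c: z_h[c], reverse=True)[:n]
    m = max(z_h[c] for c in wrong)
    e = [math.exp(z_h[c] - m) for c in wrong]
    w = [ei / sum(e) for ei in e]
    return fece_loss(z_f, o, temp, K) - sum(
        wi * fece_loss(z_f, c, temp, K) for wi, c in zip(w, wrong)) / n
```

Setting `temp = 1` and `K = 1` recovers the WACE loss, so the two components can be toggled independently.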

4. EXPERIMENTS

To validate the effectiveness of the proposed query prior-based attacks and the fuzzy domain eliminating technique, we conduct extensive experiments on CIFAR10/100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015).

Attacking a naturally trained model. As shown in Tables 1 and 2, in comparison with different loss functions (the CE and RCE losses), our FECE loss significantly enhances the transferability of the gradient iterative-based attacks when attacking the naturally trained models with attack strength $\epsilon = 8/255$ on different datasets (CIFAR10/100). Table 3 shows that our FECE loss can also enhance the transferability of the latest gradient iterative-based attacks.

Attacking an adversarially trained model. As shown in

4.4. ABLATION STUDY ON THE UNTARGETED ATTACKS

Different numbers of the top-n wrong categories n and the query number Q. Figures 5 and 6 respectively evaluate the effect of different $n$ and $Q$ on the attack success rates of five naturally trained victim models and two adversarially trained victim models when these victim models are attacked by QI-FGSM ($\epsilon = 8/255$) with VGG16 as the surrogate model for CIFAR10/100 and ImageNet. As shown in Figure 5, once $n$ exceeds a certain threshold, the attack success rate no longer improves. As shown in Figure 6, the more queries, the higher the attack success rate.

Different sizes of the penalty parameter K and the temperature $\mathcal{T}$. Figures 9 and 10 respectively evaluate the effect of different $K$ and $\mathcal{T}$ on the attack success rates transferring from ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet. As shown in Figure 9, with the increase of $K$, the attack success rates of the gradient iterative-based attacks increase significantly on CIFAR10, except for SI-NI-FGSM. Complementarily, as shown in Figure 10, with the increase of $\mathcal{T}$, the attack success rate of SI-NI-FGSM increases on CIFAR10, the attack success rates of all gradient iterative-based attacks increase significantly on CIFAR100, and the attack success rates of the latest gradient iterative-based attacks (MI-FGSM, SI-NI-FGSM and VMI-FGSM) increase for a reasonable $\mathcal{T}$ on ImageNet. Figure 11 further explores the optimal parameter combinations of $K$ and $\mathcal{T}$ on different datasets, which are summarized in Table 30.

4.5. COMPARISON WITH OR WITHOUT THE FUZZY DOMAIN ELIMINATING TECHNIQUE ON THE TARGETED ATTACKS

As shown in Figure 12, slightly decreasing $K$ from 1 can slightly increase the targeted attack success rates of several gradient iterative-based attacks on CIFAR10/100. As shown in Figure 13, with the increase of $\mathcal{T}$, the targeted attack success rates of almost all the FECE ($K = 1$) based attacks increase and approach those of the RCE-based attacks (Propositions 4 and 7 explain this result).

5. RELATED WORK

5.1. ADVERSARIAL ATTACKS

The feature-based attacks (Wu et al., 2020b; Inkawhich et al., 2019; 2020; Huang et al., 2019; Zhang et al., 2022b; Zhou et al., 2018; Ganeshan et al., 2019; Wang et al., 2021b) enhance the transferability of adversarial examples by destroying the features of the intermediate layers or critical neurons. The gradient generation-based attacks (Li et al., 2020a; Guo et al., 2020; Xie et al., 2019; Gao et al., 2020; Lin et al., 2020; Dong et al., 2018; Han et al., 2022; Wang & He, 2021) enhance the transferability of adversarial examples by changing the way gradients are generated. The data augmentation-based attacks (Wang et al., 2021a; Li et al., 2020b; Zou et al., 2020; Huang et al., 2021) enhance the transferability of adversarial examples by transforming the inputs. The transferable prior-based query attacks (Cheng et al., 2019; Yang et al., 2020; Tashiro et al., 2020) use the prior knowledge of the surrogate model to decrease the query number. Recently, Tashiro et al. (2020) proposed Output Diversified Sampling to maximize diversity in the target model's outputs among the generated samples.

5.2. ADVERSARIAL DEFENSES

Adversarial training (Madry et al., 2018; Zhang et al., 2019; Wong et al., 2020; Pang et al., 2021) is the most effective method to defend against adversarial examples. Recently, Zhang et al. (2019) designed a new defense method to trade off the adversarial robustness against accuracy. Wong et al. (2020) discovered that adversarial training can use a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Pang et al. (2021) investigated the effects of mostly overlooked training tricks and hyperparameters for the adversarially trained models.

6. CONCLUSION

Though transferable priors decrease the query number of black-box query attacks, the average number of queries is still larger than 100, which is easily affected by query-number limit policies. On the contrary, we can utilize the priors from a few queries to enhance the transferability of transfer attacks. In this work, we propose the query prior-based method to enhance the transferability of the family of FGSMs. Specifically, we find that: (i) The better the transferability of the transfer attack, the smaller the gradient angle between the surrogate model and the victim model.

A PROOFS

We provide the proofs in this section. 

A.1 PROOF OF PROPOSITION 1

Proposition 1: When the step size $\alpha$ is small, the better the transferability of the transfer black-box attack, the smaller the gradient angle between the surrogate model and the victim model. Note that we explore the relationship between $\cos\vartheta$ ($\vartheta$ is the gradient angle between the surrogate model and the victim model) and the transferability on the same surrogate-victim model pair using different transfer attack methods, whereas Liu et al. (2017) and Demontis et al. (2019) explore the relationship between $\cos\vartheta$ and the transferability on different surrogate-victim model pairs using the same transfer attack method.

[Figure caption: VGG16 is attacked by I-FGSM for CIFAR10 with attack strength $\epsilon = 8/255$. The smaller $n$, the higher the average top-$n$ wrong categories ASR. Therefore, the higher the probability of a wrong category, the more likely the adversarial example is to be classified as that category.]

Proof (Empirical Proof)

To verify the correctness of Proposition 1, we compare the attack success rates of the family of fast gradient sign methods with their cosine values (i.e., the average cosine values of the gradient angles between the surrogate model and the victim model over all iterations) when the attack strength, number of iterations and step size are $\epsilon, T, \alpha = 8/255, 10, 0.8/255$. If the ordering of the attack success rates matches the ordering of the cosine values, Proposition 1 is correct with high confidence. Empirically, Proposition 1 is verified on different surrogate models and datasets as follows. When VGG16 is the surrogate model and ResNet50 is the victim model for CIFAR10, Table 7 shows that the ordering of the attack success rates is VMI-FGSM (76.60%) > MI-FGSM (70.75%) > SI-NI-FGSM (68.10%) > DI-FGSM (62.80%) > I-FGSM (61.45%) > FGSM (41.90%), and Figure 1-(1) shows that the ordering of the cosine values is also basically VMI-FGSM > MI-FGSM > SI-NI-FGSM > DI-FGSM > I-FGSM > FGSM. When VGG16 is the surrogate model and ResNet50 is the victim model for CIFAR100, Table 8 shows that the ordering of the attack success rates is VMI-FGSM (77.70%) > MI-FGSM (69.30%) > SI-NI-FGSM (64.35%) ≈ FGSM (63.15%) > DI-FGSM (59.30%) > I-FGSM (50.70%), and Figure 1-(2) shows that the ordering of the cosine values is also basically VMI-FGSM > MI-FGSM > SI-NI-FGSM ≈ FGSM > DI-FGSM > I-FGSM. When VGG16 is the surrogate model and ResNet50 is the victim model for ImageNet, Table 9 shows that the ordering of the attack success rates is VMI-FGSM (62.4%) > SI-NI-FGSM (56.6%) > MI-FGSM (46.5%) > DI-FGSM (38.1%) > FGSM (32.8%) > I-FGSM (27.8%), and Figure 1-(3) shows that the ordering of the cosine values is basically VMI-FGSM > FGSM > SI-NI-FGSM > MI-FGSM > DI-FGSM > I-FGSM. When ResNet50 is the surrogate model and VGG16 is the victim model for CIFAR10, Table 1 shows that the ordering of the attack success rates is VMI-FGSM (80.40%) > MI-FGSM (77.25%) > SI-NI-FGSM (73.00%) > DI-FGSM (67.65%) > I-FGSM (59.85%) > FGSM (44.20%), and Figure 1-(4) shows that the ordering of the cosine values is also basically VMI-FGSM > MI-FGSM > SI-NI-FGSM > DI-FGSM > I-FGSM > FGSM. When ResNet50 is the surrogate model and VGG16 is the victim model for CIFAR100, Table 2 shows that the ordering of the attack success rates is VMI-FGSM (84.40%) > MI-FGSM (77.35%) > SI-NI-FGSM (72.90%) > DI-FGSM (68.40%) > FGSM (64.80%) > I-FGSM (61.45%), and Figure 1-(5) shows that the ordering of the cosine values is also basically VMI-FGSM > MI-FGSM > SI-NI-FGSM > DI-FGSM > I-FGSM > FGSM. When ResNet50 is the surrogate model and VGG16 is the victim model for ImageNet, Table 3 shows that the ordering of the attack success rates is VMI-FGSM (69.1%) > SI-NI-FGSM (68.7%) > MI-FGSM (55.4%) > DI-FGSM (52.6%) > FGSM (42.6%) > I-FGSM (32.1%), and Figure 1-(6) shows that the ordering of the cosine values is also basically VMI-FGSM > FGSM > SI-NI-FGSM > MI-FGSM > DI-FGSM > I-FGSM. In conclusion, by discussing the above six cases on CIFAR10/100 and ImageNet, for the family of iterative fast gradient sign methods except FGSM, Proposition 1 is correct with high confidence. Therefore, decreasing the gradient angle (i.e., increasing its cosine value) between the surrogate model and the victim model can enhance the transferability of adversarial examples.

Proof (Theoretical Proof) Assume that the perturbation gradients of the surrogate model $f$ and the victim model $h$ at attack iteration $t$ are $\nabla_x L(x^{adv}_t, y_o; \theta_f)$ and $\nabla_x L(x^{adv}_t, y_o; \theta_h)$, respectively, and that $\vartheta$ is the angle between them.
Then the perturbation gradient of the surrogate model f, i.e., ∇_x L(x_t^adv, y_o; θ_f), is decomposed into a parallel component ∇∥_x L(x_t^adv, y_o; θ_f) and a vertical component ∇⊥_x L(x_t^adv, y_o; θ_f), which satisfy

∇∥_x L(x_t^adv, y_o; θ_f) ∥ ∇_x L(x_t^adv, y_o; θ_h)   (17)

∇⊥_x L(x_t^adv, y_o; θ_f) ⊥ ∇_x L(x_t^adv, y_o; θ_h)   (18)

∇_x L(x_t^adv, y_o; θ_f) = ∇∥_x L(x_t^adv, y_o; θ_f) + ∇⊥_x L(x_t^adv, y_o; θ_f)   (19)

For the victim model h, the variation of its loss function caused by a perturbation δ_x is

Δ_h L(δ_x) = L(x_t^adv + δ_x, y_o; θ_h) − L(x_t^adv, y_o; θ_h)   (20)

where Δ_h L(·) denotes the variation of the loss function of the victim model h. According to Lemma 1, three properties (Equations 21, 22 and 23) hold for the victim model h. For the first property, when moving the same distance α (satisfying Lemma 1) along each direction, the variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the parallel component ∇∥_x L(x_t^adv, y_o; θ_f) is greater than that along the direction of the vertical component ∇⊥_x L(x_t^adv, y_o; θ_f), i.e.,

Δ_h L(α · ∇∥_x L(x_t^adv, y_o; θ_f) / ‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) > Δ_h L(α · ∇⊥_x L(x_t^adv, y_o; θ_f) / ‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2)   (21)

For the second property, the variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the parallel component ∇∥_x L(x_t^adv, y_o; θ_f) is positively correlated with the moving distance α, i.e.,

Δ_h L(α · ∇∥_x L(x_t^adv, y_o; θ_f) / ‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) > Δ_h L((α − Δ_α) · ∇∥_x L(x_t^adv, y_o; θ_f) / ‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2)   (22)

where Δ_α denotes the variation of the moving distance α.
For the third property, the degree of correlation between Δ_h L(α · ∇∥_x L(x_t^adv, y_o; θ_f)/‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) and α is greater than that between Δ_h L(α · ∇⊥_x L(x_t^adv, y_o; θ_f)/‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2) and α, i.e.,

Δ_h L((α + Δ_α) · ∇∥_x L(x_t^adv, y_o; θ_f) / ‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) + Δ_h L((α − Δ_α) · ∇⊥_x L(x_t^adv, y_o; θ_f) / ‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2) > Δ_h L(α · ∇∥_x L(x_t^adv, y_o; θ_f) / ‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) + Δ_h L(α · ∇⊥_x L(x_t^adv, y_o; θ_f) / ‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2)   (23)

where Δ_α denotes the variation of the moving distance α. At gradient attack iteration t+1, assume the generated adversarial example x_{t+1}^adv lies in B(x_t^adv, α), i.e., in the sphere of radius α centered on the adversarial example x_t^adv. Hence, for the surrogate model f, when the adversarial example x_t^adv moves the maximum allowed distance α along the direction of the perturbation gradient ∇_x L(x_t^adv, y_o; θ_f), the moving distance along the direction of the parallel component ∇∥_x L(x_t^adv, y_o; θ_f) is

α∥ = α · cos ϑ   (24)

and the moving distance along the direction of the vertical component ∇⊥_x L(x_t^adv, y_o; θ_f) is

α⊥ = α · sin ϑ   (25)

We then discuss the angle ϑ in three cases: π/4 < ϑ ≤ π/2, 0 ≤ ϑ ≤ π/4 and π/2 < ϑ ≤ π.
First, when π/4 < ϑ ≤ π/2, i.e., 0 ≤ cos ϑ < √2/2 ≈ 0.707, if the angle ϑ is reduced by Δ_ϑ, then

Δ_α∥ = α · [cos(ϑ − Δ_ϑ) − cos ϑ] ≈ α · Δ_ϑ · sin ϑ   (26)

Δ_α⊥ = α · [sin(ϑ − Δ_ϑ) − sin ϑ] ≈ −α · Δ_ϑ · cos ϑ   (27)

⇒ |Δ_α∥| > |Δ_α⊥|   (28)

Writing u∥ = ∇∥_x L(x_t^adv, y_o; θ_f)/‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2 and u⊥ = ∇⊥_x L(x_t^adv, y_o; θ_f)/‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2 for brevity, according to Equations 21, 22, 23 and 28, if the angle ϑ is reduced by Δ_ϑ,

Δ_h L((α∥ + |Δ_α∥|) · u∥) + Δ_h L((α⊥ − |Δ_α⊥|) · u⊥) > Δ_h L((α∥ + |Δ_α⊥|) · u∥) + Δ_h L((α⊥ − |Δ_α⊥|) · u⊥) > Δ_h L(α∥ · u∥) + Δ_h L(α⊥ · u⊥)   (29)

According to Assumption 2, in a small local area (i.e., α small enough to satisfy Lemma 1), the variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the perturbation gradient of the surrogate model f, i.e., Δ_h L(α · ∇_x L(x_t^adv, y_o; θ_f)/‖∇_x L(x_t^adv, y_o; θ_f)‖_2), is positively correlated with the sum of the variations of the loss along the directions of the parallel and vertical components, i.e., Δ_h L(α∥ · u∥) + Δ_h L(α⊥ · u⊥). Therefore, according to Equation 29, when the angle ϑ is reduced by Δ_ϑ, the loss L(x_t^adv, y_o; θ_h) increases. According to the statistical results of Figure 1, the average cosine of the angle ϑ between the perturbation gradients of the surrogate model and the victim model under the latest transfer attacks is far less than 0.707 (e.g., the maximum average cosine value is less than 0.25 on CIFAR10, 0.18 on CIFAR100 and 0.05 on ImageNet).
Therefore, with the latest transfer attack methods as baselines, increasing the cosine of the angle ϑ can effectively improve the transferable attack success rate. Second, when 0 ≤ ϑ ≤ π/4, i.e., 0.707 ≤ cos ϑ ≤ 1, if the angle ϑ is reduced by Δ_ϑ, Equation 29 may not be satisfied; hence increasing the cosine of the angle ϑ may not effectively improve the transferable attack success rate. Third, when π/2 < ϑ ≤ π, if the angle ϑ is reduced by Δ_ϑ, Equation 29 does not hold; hence increasing the cosine of the angle ϑ cannot improve the transferable attack success rate. According to the statistical results of Figure 1, the average cosine of the angle ϑ is greater than 0 for the latest attack methods, so π/2 < ϑ ≤ π usually does not occur. Overall, with the latest transfer attack methods as baselines, increasing the cosine of the angle ϑ can effectively improve the transferable attack success rate. Lemma 1 When α is small enough, the loss L(x_t^adv, y_o; θ_h) increases fastest along the direction of the perturbation gradient ∇_x L(x_t^adv, y_o; θ_h) in the region B(x_t^adv, α) (i.e., in the sphere of radius α centered on the adversarial example x_t^adv). Proof When α is small enough, in the region B(x_t^adv, α), the rate of variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the perturbation gradient ∇_x L(x_t^adv, y_o; θ_h) is almost equal to the Lipschitz constant of the loss function L(x_t^adv, y_o; θ_h) in this region.
Assumption 2 In a small local area (i.e., α small enough to satisfy Lemma 1), the variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the perturbation gradient of the surrogate model f, i.e., Δ_h L(α · ∇_x L(x_t^adv, y_o; θ_f)/‖∇_x L(x_t^adv, y_o; θ_f)‖_2), is positively correlated with the sum of the variations of the loss along the directions of the parallel and vertical components, i.e., Δ_h L(α∥ · ∇∥_x L(x_t^adv, y_o; θ_f)/‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) + Δ_h L(α⊥ · ∇⊥_x L(x_t^adv, y_o; θ_f)/‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2). Lemma 1 and Assumption 2 are used in the theoretical proof of Proposition 1. Note that Assumption 2 is reasonable. When α is small enough, in the region B(x_t^adv, α), the rate of variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the parallel component ∇∥_x L(x_t^adv, y_o; θ_f) (which is parallel to ∇_x L(x_t^adv, y_o; θ_h)) is almost equal to the Lipschitz constant of the loss function in this region, while the rate of variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the vertical component ∇⊥_x L(x_t^adv, y_o; θ_f) (which is perpendicular to ∇_x L(x_t^adv, y_o; θ_h)) is almost 0. Therefore, Δ_h L(α · ∇_x L(x_t^adv, y_o; θ_f)/‖∇_x L(x_t^adv, y_o; θ_f)‖_2) is positively correlated with Δ_h L(α∥ · ∇∥_x L(x_t^adv, y_o; θ_f)/‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) + Δ_h L(α⊥ · ∇⊥_x L(x_t^adv, y_o; θ_f)/‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2). However, as α increases, the Lipschitz constant of the loss function L(x_t^adv, y_o; θ_h) in the region B(x_t^adv, α) also increases and exceeds the rate of variation of the loss along the direction of the parallel component ∇∥_x L(x_t^adv, y_o; θ_f). Meanwhile, in the region B(x_t^adv, α), the rate of variation of the loss L(x_t^adv, y_o; θ_h) along the direction of the vertical component ∇⊥_x L(x_t^adv, y_o; θ_f) becomes uncertain.
Therefore, the positive correlation between Δ_h L(α · ∇_x L(x_t^adv, y_o; θ_f)/‖∇_x L(x_t^adv, y_o; θ_f)‖_2) and Δ_h L(α∥ · ∇∥_x L(x_t^adv, y_o; θ_f)/‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) + Δ_h L(α⊥ · ∇⊥_x L(x_t^adv, y_o; θ_f)/‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2) weakens, which is why the transferable attack success rate of the gradient iterative-based attacks (I-FGSM, MI-FGSM, DI-FGSM, SI-NI-FGSM and VMI-FGSM) is higher than that of FGSM (the attack step size of the gradient iterative-based attacks is smaller than that of FGSM). Corollary 1 When 0 ≤ cos ϑ < 0.707, increasing cos ϑ is a necessary but not sufficient condition for improving the transferability of the generated adversarial examples. To effectively improve the transferability, a small step size α is also needed. Proof (Theoretical Proof) For 0 ≤ cos ϑ < 0.707, if α is small and satisfies Lemma 1, Proposition 1 holds. As α increases, the Lipschitz constant of the loss function L(x_t^adv, y_o; θ_h) in the region B(x_t^adv, α) also increases and exceeds the rate of variation of the loss along the direction of the parallel component ∇∥_x L(x_t^adv, y_o; θ_f). Meanwhile, in the region B(x_t^adv, α), the rate of variation of the loss along the direction of the vertical component ∇⊥_x L(x_t^adv, y_o; θ_f) becomes uncertain. Therefore, the positive correlation between Δ_h L(α · ∇_x L(x_t^adv, y_o; θ_f)/‖∇_x L(x_t^adv, y_o; θ_f)‖_2) and Δ_h L(α∥ · ∇∥_x L(x_t^adv, y_o; θ_f)/‖∇∥_x L(x_t^adv, y_o; θ_f)‖_2) + Δ_h L(α⊥ · ∇⊥_x L(x_t^adv, y_o; θ_f)/‖∇⊥_x L(x_t^adv, y_o; θ_f)‖_2) weakens. When α is large enough, these two quantities are no longer positively correlated; then Equation 29 does not hold, so Proposition 1 fails.
Therefore, a small step size α is a key factor for effectively improving the transferability. Proof (Empirical Proof) As shown in Figure 1-(3) and Figure 1-(6), the cosine value of FGSM on ImageNet is among the largest, yet Tables 9 and 3 show that its attack success rate is among the smallest. Therefore, to effectively improve the transferability, a small step size α is also needed. Note that in Figure 1 the attack strength is the same for all methods, while the step size α of FGSM is greater than that of SI-NI-FGSM, MI-FGSM and DI-FGSM.
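The first-order relations in Equations 26-28 can be checked numerically. This is a toy verification of the trigonometric step, not part of the attack itself; the values of α and Δ_ϑ are arbitrary.

```python
import numpy as np

alpha = 1.0          # moving distance (arbitrary unit)
d_theta = 1e-4       # small reduction of the gradient angle

# For pi/4 < theta <= pi/2, reducing theta by d_theta increases the
# parallel moving distance by more than it decreases the vertical one.
for theta in np.linspace(np.pi / 4 + 0.01, np.pi / 2, 50):
    d_par = alpha * (np.cos(theta - d_theta) - np.cos(theta))   # Eq. 26
    d_ver = alpha * (np.sin(theta - d_theta) - np.sin(theta))   # Eq. 27
    assert abs(d_par) > abs(d_ver)                              # Eq. 28
```

The same loop over 0 ≤ ϑ < π/4 would find angles where the inequality reverses, matching the second case of the proof.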

A.2 PROOF OF PROPOSITION 2

Proposition 2: When the victim model h is attacked by the white-box gradient-based attacks, the successfully attacked adversarial examples prefer to be classified as the wrong categories with higher probability (i.e., the top-n wrong categories {y_τi | i ≤ n}). Moreover, the higher the probability of a wrong category, the more likely the adversarial example is to be classified as that category. Proposition 2 can be verified empirically via two criteria. First, if the top-n wrong-category attack success rate is significantly higher than the average level (i.e., 1/(C−1), where C is the number of categories of the classification task), namely ASR_n ≫ 1/(C−1), the first sentence of Proposition 2 holds with high confidence. Second, when n1 < n2, if the average top-n1 wrong-category ASR is higher than the average top-n2 wrong-category ASR, namely avg-ASR_{n1} > avg-ASR_{n2}, the second sentence of Proposition 2 holds with high confidence. Empirically, Proposition 2 is verified on different surrogate models and datasets as follows.

Proof (Empirical Proof)

When VGG16 for CIFAR10 is attacked by I-FGSM, Figure 2-(1) shows that ASR_{n=5} > ASR_{n=4} > ASR_{n=3} > ASR_{n=2} > ASR_{n=1} ≥ 69% ≫ 1/(10−1) at each iteration. With the increase of the iteration t, ASR_n gradually increases and approaches 100%. Figure 3-(1) shows that the average top-n ASR satisfies avg-ASR_{n=1} > avg-ASR_{n=2} > avg-ASR_{n=3} > avg-ASR_{n=4} > avg-ASR_{n=5} at each iteration. When ResNet50 for CIFAR10 is attacked by I-FGSM, Figure 2-(2) shows that ASR_{n=5} > ASR_{n=4} > ASR_{n=3} > ASR_{n=2} > ASR_{n=1} ≥ 76% ≫ 1/(10−1) at each iteration. With the increase of the iteration t, ASR_n gradually increases and approaches 100%. Figure 3-(4) shows that avg-ASR_{n=1} > avg-ASR_{n=2} > avg-ASR_{n=3} > avg-ASR_{n=4} > avg-ASR_{n=5} at each iteration. When VGG16 for CIFAR100 is attacked by I-FGSM, Figure 2-(3) shows that ASR_{n=50} > ASR_{n=20} > ASR_{n=10} > ASR_{n=5} > ASR_{n=1} ≥ 54% ≫ 1/(100−1) at each iteration. With the increase of the iteration t, ASR_n gradually increases and approaches 100%. Figure 3-(2) shows that avg-ASR_{n=1} > avg-ASR_{n=5} > avg-ASR_{n=10} > avg-ASR_{n=20} > avg-ASR_{n=50} at each iteration. When ResNet50 for CIFAR100 is attacked by I-FGSM, Figure 2-(4) shows that ASR_{n=50} > ASR_{n=20} > ASR_{n=10} > ASR_{n=5} > ASR_{n=1} ≥ 52% ≫ 1/(100−1) at each iteration. With the increase of the iteration t, ASR_n gradually increases and approaches 100%. Figure 3-(5) shows that avg-ASR_{n=1} > avg-ASR_{n=5} > avg-ASR_{n=10} > avg-ASR_{n=20} > avg-ASR_{n=50} at each iteration. When VGG16 for ImageNet is attacked by I-FGSM, Figure 2-(5) shows that ASR_{n=50} > ASR_{n=20} > ASR_{n=10} > ASR_{n=5} > ASR_{n=1} ≥ 33% ≫ 1/(1000−1) at each iteration. With the increase of the iteration t, ASR_n gradually increases and approaches 100%. Figure 3-(3) shows that avg-ASR_{n=1} > avg-ASR_{n=5} > avg-ASR_{n=10} > avg-ASR_{n=20} > avg-ASR_{n=50} at each iteration. When ResNet50 for ImageNet is attacked by I-FGSM, Figure 2-(6) shows that ASR_{n=50} > ASR_{n=20} > ASR_{n=10} > ASR_{n=5} > ASR_{n=1} ≥ 43% ≫ 1/(1000−1) at each iteration. With the increase of the iteration t, ASR_n gradually increases and approaches 100%.
Figure 3-(6) shows that avg-ASR_{n=1} > avg-ASR_{n=5} > avg-ASR_{n=10} > avg-ASR_{n=20} > avg-ASR_{n=50} at each iteration. In conclusion, across the above six cases on CIFAR10/100 and ImageNet, Proposition 2 holds with high confidence. Therefore, after observing the output of the victim model, directly pushing the adversarial example toward the top-n wrong categories removes the gradient perturbation of the other wrong categories. Proof (Theoretical Proof) To explore whether the successful adversarial examples prefer to be classified as the wrong categories with higher probability, we take the derivative of L_CE w.r.t. the input x:

∂L_CE/∂x = ∂L_CE/∂z_o · ∂z_o/∂x + Σ_{i=1, i≠o}^C ∂L_CE/∂z_i · ∂z_i/∂x = −(1/ln 2) · (1 − e^{z_o}/Σ_{i=1}^C e^{z_i}) · ∂z_o/∂x + (1/ln 2) · Σ_{i=1, i≠o}^C (e^{z_i}/Σ_{j=1}^C e^{z_j}) · ∂z_i/∂x   (30)

According to Equation 30, the coefficient of ∂z_o/∂x, i.e., −(1/ln 2) · (1 − e^{z_o}/Σ_{i=1}^C e^{z_i}), is less than 0, and the coefficient of ∂z_i/∂x, i.e., (1/ln 2) · e^{z_i}/Σ_{j=1}^C e^{z_j}, is greater than 0. The greater the logit output z_i of the wrong category y_i, the larger the coefficient of ∂z_i/∂x. Therefore, during gradient ascent on the loss L_CE, the greater the logit output z_i of the wrong category y_i, the faster z_i grows. Since a greater logit z_i of the wrong category y_i means a larger probability p_i, the successful adversarial examples prefer to be classified as the wrong categories with higher probability.
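The coefficients in Equation 30 can be inspected directly: with softmax output p, the derivative of the base-2 cross-entropy w.r.t. the logits is (p_i − 1[i = o])/ln 2, so the ground-truth coefficient is negative and each wrong-category coefficient is positive and grows with its probability. A minimal numeric check (the logits are made up):

```python
import numpy as np

def ce_logit_grad(z, o):
    """Gradient of the base-2 cross-entropy loss w.r.t. the logits z,
    with ground-truth category o: (p - onehot(o)) / ln 2."""
    p = np.exp(z - z.max())
    p /= p.sum()
    grad = p / np.log(2)
    grad[o] -= 1 / np.log(2)      # ground-truth term: -(1 - p_o)/ln 2
    return grad

z = np.array([2.0, 1.0, 0.5, -1.0])   # hypothetical logits, ground truth o = 0
g = ce_logit_grad(z, o=0)
assert g[0] < 0                        # ground-truth coefficient is negative
assert np.all(g[1:] > 0)               # wrong-category coefficients positive
assert np.all(np.diff(g[1:]) < 0)      # larger logit -> larger coefficient
```

Under gradient ascent, the wrong category with the largest logit therefore receives the largest push, which is exactly the preference Proposition 2 describes.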

A.3 THE DETAILED DESIGN PROCESS OF THE WACE LOSS

According to Proposition 1, decreasing the gradient angle between the surrogate model f and the victim model h, i.e., the angle between ∇_x L_CE(x_t^adv, y_o; θ_f) and ∇_x L_CE(x_t^adv, y_o; θ_h), can enhance the transferability of adversarial examples. According to the first part of Proposition 2, because the successfully attacked adversarial examples prefer to be classified as the wrong categories with higher probability, to avoid the gradient perturbation of the other wrong categories, besides maximizing the loss function L_CE(x^adv, y_o; θ_f), we also minimize the distance between the model output and the top-n wrong categories with higher probability, namely maximizing L_CE(x^adv, y_o; θ_f) − Σ_{i=1}^n (1/n) · L_CE(x^adv, y_τi; θ_f), where each category in {y_τi | i ≤ n} is equally important. According to the second part of Proposition 2, the higher the probability of a wrong category, the more likely the adversarial example is to be classified as that category. Therefore, we weight the distance terms of the top-n wrong categories according to the victim model's logits of these categories, namely maximizing L_CE(x^adv, y_o; θ_f) − Σ_{i=1}^n (e^{z_{h,τi}}/Σ_{j=1}^n e^{z_{h,τj}}) · L_CE(x^adv, y_τi; θ_f). Therefore, the WACE loss is:

L_WACE(x, y_o; θ_f, Z_h, n) = L_CE(x, y_o; θ_f) − Σ_{i=1}^n (e^{z_{h,τi}}/Σ_{j=1}^n e^{z_{h,τj}}) · L_CE(x, y_τi; θ_f)   (31)

A.4 PROOF OF THEOREM 1

Proof According to Proposition 2,

∇_x L_CE(x_t^adv, y_o; θ_h) = ∂L_CE/∂x_t^adv = (1/ln 2) · [(1 − e^{z_{h,o}}/Σ_{i=1}^C e^{z_{h,i}}) · (−∂z_{h,o}/∂x_t^adv) + Σ_{i=1, i≠o}^C (e^{z_{h,i}}/Σ_{j=1}^C e^{z_{h,j}}) · ∂z_{h,i}/∂x_t^adv] ≈ (1/ln 2) · [(1 − e^{z_{h,o}}/Σ_{i=1}^C e^{z_{h,i}}) · (−∂z_{h,o}/∂x_t^adv) + Σ_{i=1}^n (e^{z_{h,τi}}/Σ_{j=1}^n e^{z_{h,τj}}) · ∂z_{h,τi}/∂x_t^adv]   (32)

Eq. 6, 7 and 9 in the main paper are the CE loss, the RCE loss and the WACE loss, respectively. The gradient of each loss w.r.t.
x_t^adv on the surrogate model f is as follows, respectively:

∇_x L_CE(x_t^adv, y_o; θ_f) = ∂L_CE/∂x_t^adv = (1/ln 2) · [(1 − e^{z_o}/Σ_{i=1}^C e^{z_i}) · (−∂z_o/∂x_t^adv) + Σ_{i=1, i≠o}^C (e^{z_i}/Σ_{j=1}^C e^{z_j}) · ∂z_i/∂x_t^adv]   (33)

∇_x L_RCE(x_t^adv, y_o; θ_f) = ∂L_RCE/∂x_t^adv = (1/ln 2) · [−∂z_o/∂x_t^adv + Σ_{i=1}^C (1/C) · ∂z_i/∂x_t^adv]   (34)

∇_x L_WACE(x_t^adv, y_o; θ_f) = ∂L_WACE/∂x_t^adv = (1/ln 2) · [−∂z_o/∂x_t^adv + Σ_{i=1}^n (e^{z_{h,τi}}/Σ_{j=1}^n e^{z_{h,τj}}) · ∂z_τi/∂x_t^adv]   (35)

Assume that x_t^adv is a successfully attacked adversarial example whose perturbation gradients of the ground-truth logit w.r.t. x_t^adv are approximately 0. Then Eq. 32 and 33 are transformed as follows, respectively:

∇_x L_CE(x_t^adv, y_o; θ_h) ≈ (1/ln 2) · Σ_{i=1}^n (e^{z_{h,τi}}/Σ_{j=1}^n e^{z_{h,τj}}) · ∂z_{h,τi}/∂x_t^adv   (36)

∇_x L_CE(x_t^adv, y_o; θ_f) ≈ 0   (37)

Therefore, according to Proposition 2, in comparison with the gradients of the CE and RCE losses, Eq. 34, 35, 36 and 37 show that the gradient of the WACE loss removes the gradient of the unrelated wrong categories (i.e., the wrong categories with minimal probability). With the increase of the iteration t, according to Proposition 2, in comparison with the CE loss, Eq. 32, 33 and 35 show that the gradient of the WACE loss removes the gradient of the unrelated wrong categories and enhances the weight (or coefficient) of the gradient of the ground truth. In comparison with the RCE loss, Eq. 32, 34 and 35 show that the gradient of the WACE loss removes the gradient of the unrelated wrong categories. Therefore, Theorem 1 is correct.
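A minimal numpy sketch of the WACE loss above. The victim logits stand in for the few queries to the victim model; the function names and all numbers are illustrative, not the paper's implementation.

```python
import numpy as np

def log_softmax2(z):
    """Base-2 log-softmax."""
    z = z - z.max()
    return (z - np.log(np.exp(z).sum())) / np.log(2)

def wace_loss(surrogate_logits, victim_logits, y_o, n):
    """L_WACE = L_CE(x, y_o) - sum_i w_i * L_CE(x, y_tau_i), where the
    top-n wrong categories tau_i and the weights w_i are taken from the
    victim model's logit output (the query prior)."""
    ls = log_softmax2(surrogate_logits)
    # Top-n wrong categories of the victim model, ranked by logit.
    wrong = [c for c in np.argsort(victim_logits)[::-1] if c != y_o][:n]
    w = np.exp(victim_logits[wrong] - victim_logits[wrong].max())
    w /= w.sum()                       # softmax over the top-n wrong logits
    ce_true = -ls[y_o]                 # L_CE(x, y_o)
    ce_wrong = -(w * ls[wrong]).sum()  # weighted L_CE over top-n wrong classes
    return ce_true - ce_wrong

z_f = np.array([1.0, 2.0, 0.0])    # surrogate logits (made up)
z_h = np.array([0.5, 1.5, -0.5])   # victim logits from a query (made up)
loss = wace_loss(z_f, z_h, y_o=0, n=1)
```

With n = 1 this reduces to L_CE(x, y_o) − L_CE(x, y_τ1), i.e., pushing away from the ground truth and toward the victim's most likely wrong category.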

A.5 THE REDUCTION OF THE GRADIENT ANGLE WITH THE WACE LOSS

To verify that the WACE loss can reduce the gradient angle between the surrogate model and the victim model, we compare the cosine values of the gradient angle for the family of iterative FGSM and for their query prior-based versions. As shown in Figure 4, for CIFAR10/100 and ImageNet, at each iteration the cosine values of the gradient angle between the surrogate model and the victim model for the family of iterative FGSM are smaller than those for their query prior-based versions. Therefore, the WACE loss can reduce the gradient angle between the surrogate model and the victim model.

A.6 PROOF OF PROPOSITIONS 3, 4, 5 AND 6

Proposition 3: In the untargeted attacks, when p > 0.5, the temperature scaling (T > 1 and K = 1) can eliminate a part of the fuzzy domain M^NT_(x,f,h). Proof When T > 1 and K = 1, Eq. 14 is transformed as:

F_ESoftmax(z_i; T, 1) = e^{z_i/T} / Σ_{c=1}^C e^{z_c/T}

Assume that (i) the adversarial example x is generated without the temperature scaling and the logit output of the surrogate model f is Z_f = f(x) = [z_{f,1}, z_{f,2}, ..., z_{f,C}]; (ii) the adversarial example x′ is generated with the temperature scaling and the logit output of the surrogate model f is Z′_f = f(x′) = [z′_{f,1}, z′_{f,2}, ..., z′_{f,C}]; (iii) the same attack eventually makes z′_{f,c} = z_{f,c}/T for all c. If x′ ∈ M^NT_(x,f,h), then

p′_ĉ = e^{z′_{f,ĉ}} / Σ_{c=1}^C e^{z′_{f,c}} = e^{z_{f,ĉ}/T} / Σ_{c=1}^C e^{z_{f,c}/T} < p

where ĉ is the wrong category with the highest probability. To make x′ not belong to M^NT_(x,f,h), according to Assumption 1, x′ must not belong to A_{f,−}(p). Therefore, the probability threshold p should satisfy:

e^{z_{f,ĉ}/T} / Σ_{c=1}^C e^{z_{f,c}/T} < p ≤ e^{z_{f,ĉ}} / Σ_{c=1}^C e^{z_{f,c}} ⇒ 1 / Σ_{c=1}^C e^{(z_{f,c} − z_{f,ĉ})/T} < p ≤ 1 / Σ_{c=1}^C e^{z_{f,c} − z_{f,ĉ}}

When the condition that ∀c ≤ C ∧ c ≠ ĉ, z_{f,ĉ} > z_{f,c} is satisfied, 0.5 < 1/Σ_{c=1}^C e^{(z_{f,c} − z_{f,ĉ})/T} < 1/Σ_{c=1}^C e^{z_{f,c} − z_{f,ĉ}}. Hence, p > 0.5 ⇒ ∀c ≤ C ∧ c ≠ ĉ, z_{f,ĉ} > z_{f,c} ⇒ 0.5 < 1/Σ_{c=1}^C e^{(z_{f,c} − z_{f,ĉ})/T} < p ≤ 1/Σ_{c=1}^C e^{z_{f,c} − z_{f,ĉ}}.
Therefore, the temperature scaling can eliminate the fuzzy domain with 0.5 < 1/Σ_{c=1}^C e^{(z_{f,c} − z_{f,ĉ})/T} < p. Proposition 4: In the targeted attacks, when p > 0.5, the temperature scaling (T > 1 and K = 1) can eliminate a part of the fuzzy domain M^Ta_(x,f,h). Proof The proof of Proposition 4 is the same as that of Proposition 3, with the category y_ĉ replaced by y_τ. Proposition 5: In the untargeted attacks, the fuzzy scaling (T = 1 and K > 1) can eliminate a part of the fuzzy domain M^NT_(x,f,h). Proof When T = 1 and K > 1, Eq. 14 is transformed as:

F_ESoftmax(z_i; 1, K) = e^{K·z_o} / (e^{K·z_o} + Σ_{c=1, c≠o}^C e^{z_c}) if i = o;  e^{z_i} / (e^{K·z_o} + Σ_{c=1, c≠o}^C e^{z_c}) if i ≠ o   (43)

Assume that (i) the adversarial example x is generated without the fuzzy scaling and the logit output of the surrogate model f is Z_f = f(x) = [z_{f,1}, z_{f,2}, ..., z_{f,C}]; (ii) the adversarial example x′ is generated with the fuzzy scaling and the logit output of the surrogate model f is Z′_f = f(x′) = [z′_{f,1}, z′_{f,2}, ..., z′_{f,C}]; (iii) the same attack eventually makes z′_{f,c} = z_{f,c} for all c ≤ C ∧ c ≠ o, and z′_{f,o} = K·z_{f,o}. If x′ ∈ M^NT_(x,f,h), then

p′_ĉ = e^{z′_{f,ĉ}} / Σ_{c=1}^C e^{z′_{f,c}} = e^{z_{f,ĉ}} / (e^{K·z_{f,o}} + Σ_{c=1, c≠o}^C e^{z_{f,c}}) < p

To make x′ not belong to M^NT_(x,f,h), according to Assumption 1, we need x′ not to belong to A_{f,−}(p). Therefore, when z′_{f,o} = K·z_{f,o} > 0,

e^{z_{f,ĉ}} / (e^{K·z_{f,o}} + Σ_{c=1, c≠o}^C e^{z_{f,c}}) < p ≤ e^{z_{f,ĉ}} / Σ_{c=1}^C e^{z_{f,c}}

When z′_{f,o} = K·z_{f,o} ≤ 0, both x and x′ are almost successfully attacked, i.e., x, x′ ∉ M^NT_(x,f,h). Therefore, the fuzzy scaling can eliminate the fuzzy domain with e^{z_{f,ĉ}}/(e^{K·z_{f,o}} + Σ_{c=1, c≠o}^C e^{z_{f,c}}) < p when z_{f,o} > 0. Proposition 6: In the targeted attacks, the fuzzy scaling (T = 1 and 0 < K < 1) can eliminate a part of the fuzzy domain M^Ta_(x,f,h). Proof When T = 1 and 0 < K < 1, Eq.
14 is transformed as:

F_ESoftmax(z_i; 1, K) = e^{K·z_τ} / (e^{K·z_τ} + Σ_{c=1, c≠τ}^C e^{z_c}) if i = τ;  e^{z_i} / (e^{K·z_τ} + Σ_{c=1, c≠τ}^C e^{z_c}) if i ≠ τ   (46)

Assume that (i) the adversarial example x is generated without the fuzzy scaling and the logit output of the surrogate model f is Z_f = f(x) = [z_{f,1}, z_{f,2}, ..., z_{f,C}]; (ii) the adversarial example x′ is generated with the fuzzy scaling and the logit output of the surrogate model f is Z′_f = f(x′) = [z′_{f,1}, z′_{f,2}, ..., z′_{f,C}]; (iii) the same attack eventually makes z′_{f,c} = z_{f,c} for all c ≤ C ∧ c ≠ τ, and z′_{f,τ} = K·z_{f,τ}. If x′ ∈ M^Ta_(x,f,h), then

p′_τ = e^{z′_{f,τ}} / Σ_{c=1}^C e^{z′_{f,c}} = e^{K·z_{f,τ}} / (e^{K·z_{f,τ}} + Σ_{c=1, c≠τ}^C e^{z_{f,c}}) < p

To make x′ not belong to M^Ta_(x,f,h), according to Assumption 1, we need x′ not to belong to A_{f,−}(p). Therefore, when z′_{f,τ} = K·z_{f,τ} > 0, because z_{f,c} − z_{f,τ} < z_{f,c} − K·z_{f,τ},

1 / (1 + Σ_{c=1, c≠τ}^C e^{z_{f,c} − K·z_{f,τ}}) < p ≤ 1 / Σ_{c=1}^C e^{z_{f,c} − z_{f,τ}} ⇒ e^{K·z_{f,τ}} / (e^{K·z_{f,τ}} + Σ_{c=1, c≠τ}^C e^{z_{f,c}}) < p ≤ e^{z_{f,τ}} / Σ_{c=1}^C e^{z_{f,c}}

When z′_{f,τ} = K·z_{f,τ} < 0, both x and x′ almost fail to attack. Therefore, the fuzzy scaling can eliminate the fuzzy domain with e^{K·z_{f,τ}}/(e^{K·z_{f,τ}} + Σ_{c=1, c≠τ}^C e^{z_{f,c}}) < p when z_{f,τ} > 0.
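The extended softmax behind the temperature and fuzzy scaling (Eq. 14 and its special cases above) can be sketched as follows; the function name and the toy logits are illustrative.

```python
import numpy as np

def fe_softmax(z, o, T=1.0, K=1.0):
    """Extended softmax: the ground-truth (or target) logit z_o is scaled
    by K and all logits are divided by the temperature T before normalizing."""
    z = z.astype(float)    # copy, so the caller's logits are untouched
    z[o] *= K
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 3.0, 0.0])   # hypothetical logits: ground truth o=0,
                                # top wrong category c_hat = 1
p1 = fe_softmax(z, o=0)                 # plain softmax (T = 1, K = 1)

# Temperature scaling (T > 1, K = 1) flattens the distribution, pulling
# the top wrong-category probability down (Proposition 3).
p2 = fe_softmax(z, o=0, T=4.0)
assert p2[1] < p1[1]

# Fuzzy scaling (K > 1, T = 1) enlarges the ground-truth probability in
# the untargeted setting (Proposition 5).
p3 = fe_softmax(z, o=0, K=2.0)
assert p3[0] > p1[0]
```

Both effects lower the probability of the dominant wrong category ĉ below the threshold p, which is how the fuzzy domain is shrunk.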

B DETAILED EXPERIMENTAL ANALYSIS

In this section, we first introduce the experimental setup, then we compare our method with competitive baselines under various experimental settings.

B.1 EXPERIMENTAL SETUP

Datasets. Different methods are compared on CIFAR10/100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) . We randomly pick 2,000 clean images from the CIFAR10/100 test dataset and 1,000 clean images from the ILSVRC 2012 validation set (Russakovsky et al., 2015) , where the selected images are correctly classified by both surrogate model and victim model.

Models.

We consider nine naturally trained networks, including VGG16 (V16) (Simonyan & Zisserman, 2015), VGG19 (V19) (Simonyan & Zisserman, 2015), ResNet50 (R50) (He et al., 2016), ResNet152 (R152) (He et al., 2016), ResNext50 (RN50) (Xie et al., 2017), WideResNet-16-4 (WRN-16-4) (Zagoruyko & Komodakis, 2016), Inception-v3 (I-v3) (Szegedy et al., 2016), DenseNet121 (D121) (Huang et al., 2017) and MobileNet-v2 (M-v2) (Sandler et al., 2018), and two adversarially trained models, namely adversarial Inception-v3 (a-I-v3) and adversarial ensemble Inception-Resnet-v2 (ae-IR-v2) (Tramèr et al., 2018). We choose VGG16 and ResNet50 as the source models for CIFAR10/100 and ImageNet. The CIFAR10/100 models are trained from scratch, and the ImageNet models are the pretrained models in (Wightman, 2019; Huang, 2017).

Baselines. Several recently proposed methods for generating transferable adversarial examples are taken as baselines, i.e., FGSM (Goodfellow et al., 2015), I-FGSM (Kurakin et al., 2017), MI-FGSM (Dong et al., 2018), DI-FGSM (Xie et al., 2019), SI-NI-FGSM (Lin et al., 2020) and VMI-FGSM (Wang & He, 2021), which are implemented in a PyTorch repository (Kim, 2020). In addition, the RCE loss (Zhang et al., 2022a), which is integrated into the above transfer attacks in place of the cross-entropy (CE) loss, and two black-box query attacks, i.e., P-RGF (Cheng et al., 2019) and Square (Andriushchenko et al., 2020), are taken as baselines to further validate the effectiveness of our method.

Hyper-parameters. On CIFAR10/100 and ImageNet, we set the maximum perturbation, number of iterations and step size as ϵ, T, α = 8/255, 10, 0.8/255 or 16/255, 10, 1.6/255. We set the decay factor µ = 1.0 for MI-FGSM, SI-NI-FGSM and VMI-FGSM. The transformation probability is set to 0.5 for DI-FGSM. The number of scale copies is 5 for SI-NI-FGSM. For VMI-FGSM, the number of sampled examples in the neighborhood and the upper bound of the neighborhood are 20 and 1.5, respectively.
The number of queries, which is the same as that of our query prior-based attacks, is set to Q = 10 for Square and P-RGF. For the proposed method, we set n = 1 and Q = 10 for CIFAR10, and n = 5 and Q = 10 for CIFAR100 and ImageNet.
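The query prior-based attack evaluated below (each query returns the victim's logits, which define the top-n wrong categories and the WACE weights of Eq. 35) can be sketched schematically. Everything here is a toy under stated assumptions: the "models" are fixed linear maps W_f and W_h rather than deep networks, the function names are ours, and real attacks would additionally clip to the valid pixel range.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def wace_grad(W_f, y_o, victim_logits, n):
    """Gradient of the WACE loss w.r.t. x for a linear surrogate z = W_f @ x:
    per Eq. 35, dL/dz has -1 on the ground truth and softmax weights w_i
    on the victim's top-n wrong categories."""
    wrong = [c for c in np.argsort(victim_logits)[::-1] if c != y_o][:n]
    dz = np.zeros(W_f.shape[0])
    dz[y_o] -= 1.0
    dz[wrong] += softmax(victim_logits[wrong])
    return W_f.T @ dz / np.log(2)

def qi_fgsm(W_f, W_h, x, y_o, eps=8/255, T=10, n=1):
    """Schematic QI-FGSM: query the victim once per iteration for its
    logits, then take a signed gradient step on the WACE loss."""
    alpha = eps / T
    x_adv = x.copy()
    for _ in range(T):
        victim_logits = W_h @ x_adv                 # the 'query'
        g = wace_grad(W_f, y_o, victim_logits, n)
        x_adv = x_adv + alpha * np.sign(g)          # gradient-sign ascent
        x_adv = np.clip(x_adv, x - eps, x + eps)    # stay in the eps-ball
    return x_adv

# Toy run: 3 classes, 8-dimensional 'image'.
W_f, W_h = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))
x = rng.normal(size=8)
x_adv = qi_fgsm(W_f, W_h, x, y_o=0)
assert np.all(np.abs(x_adv - x) <= 8/255 + 1e-12)
```

The structure mirrors I-FGSM; only the loss (WACE instead of CE) and the per-iteration victim query differ.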

B.2.1 ATTACKING A NATURALLY TRAINED MODEL

To validate that the query priors can enhance the transferability of the transfer attacks, we perform six transfer attacks with or without the query priors against six naturally trained models for CIFAR10/100 and ImageNet. As shown in Tables 1, 2, 3, 7, 8 and 9, when the attack strength ϵ = 8/255, the query prior-based attacks with the WACE loss not only significantly improve the transfer attack success rate in the black-box setting but also improve the attack success rate in the white-box setting on different surrogate models and datasets. In comparison with the CE loss, the average increase of the ASR is 2.98 to 4.43% on Q-FGSM and 4.12 to 15.48% on the other five query prior-based attacks.

B.2.2 ATTACKING AN ADVERSARIALLY TRAINED MODEL

In Tables 3 and 9, we perform six transfer attacks with or without the query priors against two adversarially trained models for ImageNet on different surrogate models when the attack strength ϵ = 8/255. The results show that the query prior-based attacks with the WACE loss can enhance the transferability of the gradient iterative-based attacks when attacking the adversarially trained models. In comparison with the CE loss, the increase of the ASR is 0.1 to 0.3% on Q-FGSM and 0.7 to 6.3% on the other five transfer attacks for ImageNet (except for a slight decrease on Q-FGSM with ResNet50 as the surrogate model).

As shown in Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12, whether the attack strength ϵ is 8/255 or 16/255, when the allowed query number is Q = 10, the attack success rate of our QVMI-FGSM is much larger than that of Square and P-RGF when attacking the naturally trained models on different surrogate models and datasets. As shown in Tables 3, 6, 9 and 12, whether the attack strength ϵ is 8/255 or 16/255, when the allowed query number is Q = 10, the attack success rate of our QVMI-FGSM is much larger than that of P-RGF when attacking the adversarially trained models on different surrogate models for ImageNet.
As shown in Table 3, when the attack strength ϵ = 8/255 and the allowed query number Q = 10, the attack success rate of our QVMI-FGSM is larger than that of Square when attacking the adversarially trained models on ImageNet. To highlight the advantages of the query prior-based attacks for attacking the adversarially trained models compared with Square, we set adversarial ensemble Inception-Resnet-v2 as the surrogate model rather than the naturally trained models (i.e., VGG16 or ResNet50) and adversarial Inception-v3 as the victim model, and reduce the query number from 10 to 5 (i.e., Q = 5). As shown in Table 19, when the attack strength is 8/255, comparing the best query prior-based attack with Square, the increase of the ASR is 11.8%. As shown in Table 20, when the attack strength is 16/255, the increase of the ASR is 7.5%. In conclusion, (i) in the comparison with Square, whether the attack strength is low or high, the ASR of the query prior-based attacks is far greater than that of Square when attacking the six naturally trained models. When attacking the two adversarially trained models, at the low attack strength (ϵ = 8/255), some query prior-based attacks are better than Square (Q = 10); at the high attack strength (ϵ = 16/255), Square (Q = 10) is better than the query prior-based attacks and Square (Q = 0) is better than the transfer attacks (i.e., the family of FGSMs). However, when we use the adversarially trained model as the surrogate model and reduce the query number, regardless of whether the attack strength is low or high, the ASR of the query prior-based attacks is greater than that of Square when attacking the other adversarially trained model. (ii) P-RGF is inefficient under a limit of a few queries on the six naturally trained models and two adversarially trained models for CIFAR10/100 and ImageNet; the ASR of the query prior-based attacks is far greater than that of P-RGF.
As shown in Figures 5 and 7, when n is greater than a certain threshold, the attack success rate is no longer improved; e.g., Figure 5 shows that the threshold is approximately 2 for CIFAR10, 10 for CIFAR100 and 5 for ImageNet, and Figure 7 shows that the threshold is approximately 2 for CIFAR10, 5 for CIFAR100 and 10 for ImageNet. Because increasing n increases the computation time of the gradient, a larger n is not always better.

B.5.2 DIFFERENT QUERY NUMBERS

Figures 6 and 8 respectively evaluate the effect of different Q on the attack success rates of five naturally trained victim models and two adversarially trained victim models when these victim models are attacked by QI-FGSM (ϵ = 8/255) with VGG16 and ResNet50 for CIFAR10/100 and ImageNet. On the victim models, the more queries, the greater the attack success rate. As shown in Figure 6, when the query number increases from 1 to 10, the attack success rate increases by approximately 3.5% at most for CIFAR10, 10% at most for CIFAR100 and 5% at most for ImageNet, and the increase mainly occurs in the first five queries. As shown in Figure 8, when the query number increases from 1 to 10, the attack success rate increases by approximately 3% at most for CIFAR10, 8% at most for CIFAR100 and 5% at most for ImageNet, and the increase likewise mainly occurs in the first five queries.

B.5.3 COMPARISON WITH OR WITHOUT THE QUERY PRIORS WHEN Q = 1

When Q = 0, the query prior-based methods reduce to the usual methods, e.g., QI-FGSM → I-FGSM. To further explore the effectiveness of the query prior-based method, we set the query number Q to 1 and select I-FGSM as the baseline.

B.5.4 COMPARISON WITH THE COMBINATION OF VMI-FGSM AND SQUARE

To further verify the effectiveness of the query prior-based attacks, we make a fairer comparison: QVMI-FGSM is compared with the combination of VMI-FGSM and Square. As shown in Tables 27, 28 and 29, when the attack strength ϵ = 8/255 and the allowed query number Q = 10, our QVMI-FGSM outperforms the combination of VMI-FGSM and Square when attacking five naturally trained models on CIFAR10/100 and ImageNet.

B.5.5 DIFFERENT SIZES OF THE TEMPERATURE PARAMETER

Figure 9 evaluates the effect of different K on the attack success rates of ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet. As shown in Figure 9, as K increases, the attack success rates of the gradient iterative-based attacks increase significantly on CIFAR10, except for SI-NI-FGSM. When K reaches 2, the performance of all gradient iterative-based attacks is almost optimal on CIFAR10.

B.5.6 DIFFERENT SIZES OF THE PENALTY PARAMETER

Figure 10 evaluates the effect of different T on the attack success rates of ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet. As shown in Figure 10, as T increases, the attack success rate of SI-NI-FGSM increases on CIFAR10, the attack success rates of all gradient iterative-based attacks increase significantly on CIFAR100, and the attack success rates of the latest gradient iterative-based attacks (MI-FGSM, SI-NI-FGSM and VMI-FGSM) increase for a reasonable T on ImageNet. In addition, Figure 11 further explores the optimal parameter combinations of K and T on different datasets, which are summarized in Table 30.
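As a concrete illustration of the temperature scaling used in the FECE loss, the sketch below shows the K = 1 case of FESoftmax, i.e., a temperature-scaled softmax; the general fuzzy-scaling form with K ≠ 1 is not reproduced here, and the function name `fe_softmax` is ours. As T grows, the output distribution flattens toward the uniform 1/C, which is the mechanism behind the effect of T studied in Figures 10 and 11.

```python
import numpy as np

def fe_softmax(z, T=1.0):
    # temperature-scaled softmax: the K = 1 case of FESoftmax in the paper
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [5.0, 2.0, 1.0, 0.5]
for T in (1, 2, 10, 100):
    # as T grows, the distribution flattens toward the uniform 1/C
    print(T, np.round(fe_softmax(logits, T), 3))
```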

B.6 COMPARISON WITH OR WITHOUT THE FUZZY DOMAIN ELIMINATING TECHNIQUE ON THE TARGETED ATTACKS

As shown in Figure 12, slightly decreasing K from 1 can slightly increase the targeted attack success rates of several gradient iterative-based attacks on CIFAR10/100. As shown in Figure 13, with the increase of T, the targeted attack success rates of almost all the FECE (K = 1) based attacks increase and approach those of the RCE based attacks, which is theoretically analyzed in Propositions 4 and 7.

Proposition 7 With the increase of T, the targeted attack success rates of almost all the FECE (K = 1) based attacks approach those of the RCE based attacks.

Proof In the targeted attacks, the RCE and FECE losses are given by Eq. 49 and Eq. 50, respectively:

$L_{RCE}(x, y_\tau; \theta) = -L_{CE}(x, y_\tau; \theta) + \frac{1}{C}\sum_{c=1}^{C} L_{CE}(x, y_c; \theta)$ (49)

$L_{FECE}(x, y_\tau; \theta, T, 1) = \log_2 FESoftmax(z_\tau; T, 1)$ (50)

where K = 1 in the FECE loss. To show that the targeted attack success rates of almost all the FECE (K = 1) based attacks approach those of the RCE based attacks with the increase of T, we differentiate $L_{RCE}$ and $L_{FECE}$ w.r.t. the input x (Eq. 51 and 52):

$\frac{\partial L_{RCE}}{\partial x} = \frac{\partial L_{RCE}}{\partial z_\tau}\cdot\frac{\partial z_\tau}{\partial x} + \sum_{i=1, i\neq\tau}^{C}\frac{\partial L_{RCE}}{\partial z_i}\cdot\frac{\partial z_i}{\partial x} = \frac{1}{\ln 2}\Big(\frac{\partial z_\tau}{\partial x} - \sum_{i=1}^{C}\frac{1}{C}\cdot\frac{\partial z_i}{\partial x}\Big)$ (51)

$\frac{\partial L_{FECE}}{\partial x} = \frac{\partial L_{FECE}}{\partial z_\tau}\cdot\frac{\partial z_\tau}{\partial x} + \sum_{i=1, i\neq\tau}^{C}\frac{\partial L_{FECE}}{\partial z_i}\cdot\frac{\partial z_i}{\partial x} = \frac{1}{T\ln 2}\Big[\Big(1 - \frac{e^{z_\tau/T}}{\sum_{i=1}^{C} e^{z_i/T}}\Big)\frac{\partial z_\tau}{\partial x} - \sum_{i=1, i\neq\tau}^{C}\frac{e^{z_i/T}}{\sum_{j=1}^{C} e^{z_j/T}}\cdot\frac{\partial z_i}{\partial x}\Big]$ (52)

In Eq. 52, as $T \to +\infty$, $e^{z_i/T}/\sum_{j=1}^{C} e^{z_j/T} \to 1/C$ for every i, so $\frac{\partial L_{FECE}}{\partial x}$ tends to $\frac{1}{T\ln 2}\big(\frac{\partial z_\tau}{\partial x} - \sum_{i=1}^{C}\frac{1}{C}\cdot\frac{\partial z_i}{\partial x}\big)$, which points in the same direction as Eq. 51, i.e., $sign(\frac{\partial L_{FECE}}{\partial x}) = sign(\frac{\partial L_{RCE}}{\partial x})$, where $sign(\cdot)$ is the sign function used by the gradient sign attacks. Therefore, with the increase of T, the targeted attack success rates of almost all the FECE (K = 1) based attacks approach those of the RCE based attacks.

Therefore, according to Proposition 4, the high performance of the RCE loss in the targeted transfer attacks can be explained by the fuzzy domain eliminating technique. By utilizing Propositions 1, 2 and Corollary 1, we design a simple WACE loss function. Theorem 1 and Figure 4 prove that the WACE loss is better than the CE and RCE losses at reducing the gradient angle between the surrogate model and the victim model. Based on the WACE loss, we design the query prior-based attacks, which solve the above two problems and are verified by the extended experiments. Overall, in the fourth scenario (i.e., the allowed number of queries Q ≤ 10), our method has the highest attack success rate compared with the current black-box attacks.
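Proposition 7 can also be checked numerically. The sketch below compares the per-logit gradient coefficients of the two losses from Eq. 51 and 52 (the positive factors 1/ln 2 and 1/(T ln 2) are dropped since they do not affect the gradient direction): the cosine similarity between the FECE (K = 1) and RCE gradient coefficients approaches 1 as T increases. The helper names are ours.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_rce_coeff(C, tau):
    # per-logit coefficients of Eq. 51: delta_{i,tau} - 1/C (1/ln2 factor dropped)
    g = -np.ones(C) / C
    g[tau] += 1.0
    return g

def grad_fece_coeff(z, tau, T):
    # per-logit coefficients of Eq. 52 with K = 1 (positive 1/(T ln2) factor dropped)
    p = softmax(np.asarray(z, dtype=float) / T)
    g = -p.copy()
    g[tau] += 1.0
    return g

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

z, tau = np.array([3.0, -1.0, 0.5, 2.0, -0.5]), 1
for T in (1.0, 10.0, 100.0):
    # the cosine approaches 1 as T grows, matching Proposition 7
    print(T, round(cosine(grad_fece_coeff(z, tau, T), grad_rce_coeff(5, tau)), 4))
```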



(ii) The successfully attacked adversarial examples prefer to be classified as the wrong categories to which the victim model assigns higher probability. Based on the above findings, the weighted augmented cross-entropy (WACE) loss is proposed to decrease the gradient angle between the surrogate model and the victim model for enhancing the transferability of adversarial examples. In addition, because the existence of the fuzzy domain makes it difficult to transfer the adversarial examples generated by the surrogate model to the victim model, the fuzzy domain eliminating technique, which consists of fuzzy scaling and temperature scaling, is proposed to enhance the transferability of the generated adversarial examples. Theoretical analysis and extensive experiments demonstrate the effectiveness of the query prior-based attacks and the fuzzy domain eliminating technique.
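To make the idea concrete, here is a hypothetical NumPy sketch of a WACE-style loss: the cross-entropy on the true label is augmented with wrong-class log-probability terms weighted by the victim's queried probabilities over its top-n wrong categories. This is an assumed form for illustration only, not the paper's exact WACE definition; `wace_loss` and its signature are ours.

```python
import numpy as np

def log_softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def wace_loss(z_surrogate, y_true, victim_probs, top_n=5):
    # hypothetical weighted augmented cross-entropy (assumed form, see lead-in)
    logp = log_softmax(z_surrogate)
    # top-n wrong categories, ranked by the victim's queried probabilities
    wrong = [c for c in np.argsort(victim_probs)[::-1] if c != y_true][:top_n]
    w = victim_probs[wrong] / victim_probs[wrong].sum()   # normalized weights
    # ascending this loss lowers the true-class probability while raising
    # the wrong categories the victim already prefers
    return -logp[y_true] + (w * logp[wrong]).sum()
```

Maximizing such a loss with the sign of its input gradient would give a QI-FGSM-style step that steers the surrogate toward the victim's preferred wrong categories.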

Figure 1: The cosine value of the gradient angle between the surrogate model and the victim model at each iteration when the surrogate model is attacked by different methods for CIFAR10/100 and ImageNet. For example, in subfigure (1), VGG16 as the surrogate model and ResNet50 as the victim model are attacked by different transfer attacks for CIFAR10. Note that the attack strength ε = 8/255.

Figure 3: The average top-n wrong categories attack success rate (ASR) (%) at each iteration t when the model is attacked by I-FGSM (white-box setting) for CIFAR10/100 and ImageNet. For example, in subfigure (1), VGG16 is attacked by I-FGSM for CIFAR10. The smaller the n, the higher the average top-n wrong categories ASR. Therefore, the higher the probability of a wrong category, the more likely the adversarial example is to be classified as that category. Note that the attack strength ε = 8/255.

As shown in Figure 1-(3), when VGG16 is the surrogate model and ResNet50 is the victim model for ImageNet, the sort of the cosine values is basically FGSM > SI-NI-FGSM > MI-FGSM > DI-FGSM > I-FGSM, but the sort of the attack success rates is SI-NI-FGSM (56.6%) > MI-FGSM (46.5%) > DI-FGSM (38.1%) > FGSM (32.8%) > I-FGSM (27.8%). As shown in Figure 1-(6), when ResNet50 is the surrogate model and VGG16 is the victim model for ImageNet, the sort of the cosine values is basically FGSM > SI-NI-FGSM > MI-FGSM > DI-FGSM > I-FGSM, but the sort of the attack success rates is SI-NI-FGSM (68.7%) > MI-FGSM (55.4%) > DI-FGSM (52.6%) > FGSM (42.6%) > I-FGSM (32.1%).
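The cosine values plotted in Figure 1 measure how well the surrogate's input gradient aligns with the victim's. Given the two flattened gradients, the quantity is an ordinary cosine similarity, e.g.:

```python
import numpy as np

def grad_cosine(g_surrogate, g_victim):
    # flatten both input gradients and take the cosine of the angle between them
    a = np.ravel(g_surrogate).astype(float)
    b = np.ravel(g_victim).astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value near 1 means the surrogate's gradient direction is likely to transfer; a value near 0 means the two models disagree on the attack direction.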

is correctly classified by the surrogate model f and the victim model h with almost 100% probability. When the iteration t is equal to 0, $(1 - \frac{e^{z_o}}{\sum_{i=1}^{C} e^{z_i}})$ and $(1 - \frac{e^{z_{h,o}}}{\sum_{i=1}^{C} e^{z_{h,i}}})$ are approximately 0, and

Figure 5: The untargeted attack success rates (%) on the victim models with adversarial examples generated by QI-FGSM (ε = 8/255) for CIFAR10/100 and ImageNet (the surrogate model is VGG16) when varying the number of the top-n wrong categories n.

Figure 6: The untargeted attack success rates (%) on the victim models with adversarial examples generated by QI-FGSM (ε = 8/255) for CIFAR10/100 and ImageNet (the surrogate model is VGG16) when varying the number of queries Q.

Figure 7: The untargeted attack success rates (%) on the victim models with adversarial examples generated by QI-FGSM (ε = 8/255) for CIFAR10/100 and ImageNet (the surrogate model is ResNet50) when varying the number of the top-n wrong categories n.

Figure 8: The untargeted attack success rates (%) on the victim models with adversarial examples generated by QI-FGSM (ε = 8/255) for CIFAR10/100 and ImageNet (the surrogate model is ResNet50) when varying the number of queries Q.

Figures 5 and 7 respectively evaluate the effect of different n on the attack success rates of five naturally trained victim models and two adversarially trained victim models when these victim models are attacked by QI-FGSM (ε = 8/255) with VGG16 and ResNet50 for CIFAR10/100 and ImageNet. As shown in Figures 5 and 7, when n is greater than a certain threshold, the attack success rate is no longer improved, e.g., Figure 5 shows that the threshold is approximately 2 for CIFAR10, 10 for CIFAR100 and 5 for ImageNet, and Figure 7 shows that the threshold is approximately 2 for CIFAR10, 5 for CIFAR100 and 10 for ImageNet. Because increasing n increases the calculation time of the gradient, a larger n is not always better.

In Eq. 52, with the increase of T, when $T \to +\infty$, $1 - \frac{e^{z_\tau/T}}{\sum_{i=1}^{C} e^{z_i/T}} \to 1 - \frac{1}{C}$.

Figure 9: The untargeted attack success rates (%) of ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet when varying the size of K (K > 1) in the FECE loss. Note that T in the FECE loss is 1.

Figure 10: The untargeted attack success rates (%) of ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet when varying the size of T (T > 1) in the FECE loss. Note that K in the FECE loss is 1.

Figure 12: The targeted attack success rates (%) of ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet when varying the size of K (0 < K < 1) in the FECE loss. Note that T in the FECE loss is 1.

Figure 13: The targeted attack success rates (%) of ResNet50 to VGG16 using various transfer attacks for CIFAR10/100 and ImageNet when varying the size of T (T > 1) in the FECE loss. Note that K in the FECE loss is 1.

Theorem 1 The angle between ∇ x L W ACE (x adv



is almost equal to the Lipschitz constant of the loss function $L(x_t^{adv}, y_o; \theta_h)$ in this region. Meanwhile, in the region $B(x_t^{adv}, \alpha)$, the rate of variation of the loss $L(x_t^{adv}, y_o$

Proof To verify the correctness of Proposition 2, we explore the top-n wrong categories attack success rate (ASR) of I-FGSM at each iteration, where the attack strength, number of iterations and step size are ε = 8/255, T = 10 and α = 0.8/255, respectively. Let $ASR_n$ denote the top-n wrong categories ASR, and let $\overline{ASR}_n$ denote the average top-n wrong categories ASR, namely $\overline{ASR}_n = ASR_n / n$.
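For reference, the top-n wrong categories ASR and its average can be computed as sketched below; we assume here that the top-n wrong categories are the n wrong categories ranked highest by the model on the clean inputs, and the helper name is ours.

```python
import numpy as np

def topn_wrong_asr(clean_logits, y_true, y_adv_pred, n):
    hits, total = 0, 0
    for z, y, yp in zip(clean_logits, y_true, y_adv_pred):
        if yp == y:          # attack failed: the example is still classified correctly
            continue
        total += 1
        # top-n wrong categories, ranked here by the clean-input logits
        wrong = [c for c in np.argsort(z)[::-1] if c != y][:n]
        hits += yp in wrong
    asr_n = hits / max(total, 1)
    return asr_n, asr_n / n  # (ASR_n, average top-n ASR = ASR_n / n)
```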

The cosine value of the gradient angle between the surrogate model and the victim model at each iteration t when the surrogate model is attacked by different methods for CIFAR10/100 and ImageNet. For example, in subfigure (1), VGG16 as the surrogate model and ResNet50 as the victim model are attacked by different transfer attacks for CIFAR10. The query prior-based attacks can significantly improve the cosine value of the gradient angle between the surrogate model and the victim model, i.e., decrease the gradient angle between the surrogate model and the victim model. Note that the attack strength ε = 8/255.

The untargeted attack success rates (%) on six naturally trained models for CIFAR10 using various transfer attacks and two query attacks with the attack strength ε = 8/255. The adversarial examples are generated by ResNet50. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models for CIFAR100 using various transfer attacks and two query attacks with the attack strength ε = 8/255. The adversarial examples are generated by ResNet50. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models and two adversarially trained models for ImageNet using various transfer attacks and two query attacks with the attack strength ε = 8/255. The adversarial examples are generated by ResNet50. * denotes the attack success rates under white-box attacks. Avg. means to calculate the average value of the naturally trained models except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models for CIFAR10 using various transfer attacks and two query attacks with the attack strength ε = 16/255. The adversarial examples are generated by ResNet50. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models for CIFAR100 using various transfer attacks and two query attacks with the attack strength ε = 16/255. The adversarial examples are generated by ResNet50. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

Table 30 summarizes the best parameter combinations of K and T for the FECE loss on different transfer attacks and datasets with ResNet50 as the surrogate model.

The untargeted attack success rates (%) on six naturally trained models and two adversarially trained models for ImageNet using various transfer attacks and two query attacks with the attack strength ε = 16/255. The adversarial examples are generated by ResNet50. * denotes the attack success rates under white-box attacks. Avg. means to calculate the average value of the naturally trained models except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models for CIFAR10 using various transfer attacks and two query attacks with the attack strength ε = 8/255. The adversarial examples are generated by VGG16. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models for CIFAR100 using various transfer attacks and two query attacks with the attack strength ε = 8/255. The adversarial examples are generated by VGG16. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models and two adversarially trained models for ImageNet using various transfer attacks and two query attacks with the attack strength ε = 8/255. The adversarial examples are generated by VGG16. * denotes the attack success rates under white-box attacks. Avg. means to calculate the average value of the naturally trained models except *. Note that Q = 1 in Q-FGSM.

6% on the other five query prior-based transfer attacks for ImageNet. In addition, as shown in Tables 4, 5, 6, 10, 11 and 12, when the attack strength ε = 16/255, in comparison with the CE and RCE losses, the query prior-based attacks with the WACE loss can still effectively enhance the transferability of the gradient iterative-based attacks on different surrogate models and datasets.

The untargeted attack success rates (%) on six naturally trained models for CIFAR10 using various transfer attacks and two query attacks with the attack strength ε = 16/255. The adversarial examples are generated by VGG16. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on six naturally trained models for CIFAR100 using various transfer attacks and two query attacks with the attack strength ε = 16/255. The adversarial examples are generated by VGG16. * denotes the attack success rates under white-box attacks. Average means to calculate the average value except *. Note that Q = 1 in Q-FGSM.

In conclusion, through the comparison with or without the query priors, at the low attack strength, i.e., ε = 8/255, the query prior-based attacks can significantly enhance the transferability of adversarial examples to attack the naturally trained models. At the high attack strength, i.e., ε = 16/255, most query prior-based attacks can enhance the transferability of adversarial examples, but the average ASR of QSI-NI-FGSM has a slight decrease on ImageNet with VGG16 as the surrogate model.

The untargeted attack success rates (%) on six naturally trained models and two adversarially trained models for ImageNet using various transfer attacks and two query attacks with the attack strength ε = 16/255. The adversarial examples are generated by VGG16. * denotes the attack success rates under white-box attacks. Avg. means to calculate the average value of the naturally trained models except *. Note that Q = 1 in Q-FGSM.

The untargeted attack success rates (%) on five naturally trained models for CIFAR10 using VMI-FGSM as the baseline with the attack strength ε = 8/255. The adversarial examples are generated by VGG16 and ResNet50, which are the surrogate model and the query model, respectively. Note that ResNet50 is both the query model and the victim model.

6.03% on FGSM and 2.31 to 7.25% on the other five gradient iterative-based attacks for CIFAR10, 1.3 to 10.27% on the five gradient iterative-based attacks for CIFAR100, and 0.4% on FGSM and 2.3 to 3.4% on the latest gradient iterative-based attacks (SI-NI-FGSM and VMI-FGSM) for ImageNet. In comparison with the RCE loss, the average increase of the ASR is 12.69% on FGSM and 0.64 to 11.14% on the other five gradient iterative-based attacks for CIFAR10, 0.21 to 0.6% on several gradient iterative-based attacks (MI-FGSM, DI-FGSM and VMI-FGSM) for CIFAR100 (the average ASR is kept on I-FGSM and SI-NI-FGSM), and 6.9% on FGSM and 1.2 to 3.9% on the latest gradient iterative-based attacks for ImageNet.

In addition, as shown in Tables 4, 5 and 6, when the attack strength ε = 16/255, in comparison with the CE loss, our FECE loss based attacks can still effectively enhance the transferability of the gradient iterative-based attacks on different datasets. In comparison with the RCE loss, our FECE loss based attacks can also still effectively enhance the transferability of the gradient iterative-based attacks on CIFAR10 and ImageNet, and keep the transferability of the gradient iterative-based attacks on CIFAR100. In conclusion, through the comparison with or without the fuzzy domain eliminating technique, at the low attack strength, i.e., ε = 8/255, our FECE loss can effectively enhance the transferability of adversarial examples to attack the naturally trained models on different datasets. At the high attack strength, i.e., ε = 16/255, our FECE loss can effectively enhance the transferability of adversarial examples to attack the naturally trained models on CIFAR10 and ImageNet, and keep the transferability of adversarial examples on CIFAR100.

performs the three latest transfer attacks (MI-FGSM, SI-NI-FGSM and VMI-FGSM) with or without the fuzzy domain eliminating technique to attack two adversarially trained models for ImageNet when the attack strength ε = 8/255. In comparison with the different loss functions (the CE and RCE losses), our FECE loss can enhance the transferability of VMI-FGSM and keep (or slightly decrease) the transferability of the other transfer attacks. In addition, as shown in Table 6, when the attack strength ε = 16/255, in comparison with the CE

The untargeted attack success rates (%) on five naturally trained models for ImageNet using VMI-FGSM as the baseline with the attack strength ε = 8/255. The adversarial examples are generated by ResNet50 and VGG16, which are the surrogate model and the query model, respectively. Note that VGG16 is both the query model and the victim model.

The untargeted attack success rates (%) on adversarial Inception-v3 for ImageNet using various transfer attacks and a query attack with the attack strength ε = 8/255. The adversarial examples are generated by adversarial ensemble Inception-Resnet-v2. Note that Q = 1 in Q-FGSM and Q = 5 in the other attacks.

VMI-FGSM. At the high attack strength, i.e., ε = 16/255, the CE, RCE and FECE losses have their own advantages on different transfer attacks.

B.3.3 COMBINATION OF THE QUERY PRIORS AND FUZZY DOMAIN ELIMINATING TECHNIQUE

As shown in Tables 1, 2, 3, 4, 5 and 6, whether the attack strength ε = 8/255 or 16/255, when attacking the naturally trained models, in comparison with our WACE and FECE losses, our WFCE loss can further improve the transferability of the gradient iterative-based attacks on different datasets. In addition, when attacking the adversarially trained models with the attack strength ε = 16/255, in comparison with our WACE and FECE losses, our WFCE loss can further improve the transferability of the latest VMI-FGSM on ImageNet.

The untargeted attack success rates (%) on adversarial Inception-v3 for ImageNet using various transfer attacks and a query attack with the attack strength ε = 16/255. The adversarial examples are generated by adversarial ensemble Inception-Resnet-v2. Note that Q = 1 in Q-FGSM and Q = 5 in the other attacks.

The untargeted attack success rates (%) on five naturally trained models for CIFAR10 using I-FGSM as the baseline with the attack strength ε = 8/255. VGG16 is the surrogate model and the query number of QI-FGSM is 1, i.e., Q = 1.

The untargeted attack success rates (%) on five naturally trained models for CIFAR10 using I-FGSM as the baseline with the attack strength ε = 8/255. ResNet50 is the surrogate model and the query number of QI-FGSM is 1, i.e., Q = 1.

The untargeted attack success rates (%) on five naturally trained models for CIFAR100 using I-FGSM as the baseline with the attack strength ε = 8/255. VGG16 is the surrogate model and the query number of QI-FGSM is 1, i.e., Q = 1.

As shown in Tables 21, 22, 23, 24, 25 and 26, even if the number of queries Q is 1, the query prior-based method can still significantly improve the transferability of the baseline method on different surrogate models and different datasets with the attack strength ε = 8/255.

B.5.4 FURTHER VERIFY THE EFFECTIVENESS OF THE QUERY PRIOR-BASED ATTACKS

The untargeted attack success rates (%) on five naturally trained models for CIFAR100 using I-FGSM as the baseline with the attack strength ε = 8/255. ResNet50 is the surrogate model and the query number of QI-FGSM is 1, i.e., Q = 1.

The untargeted attack success rates (%) on five naturally trained models for ImageNet using I-FGSM as the baseline with the attack strength ε = 8/255. VGG16 is the surrogate model and the query number of QI-FGSM is 1, i.e., Q = 1.

The optimal parameter of the FECE loss on the combination of different methods and datasets with ResNet50 as the surrogate model.

ETHICS STATEMENT

We do not anticipate any negative ethical implications of the proposed method. The datasets (CIFAR10/100 and ImageNet) used in this paper are publicly available and frequently used in the domain of computer vision. The proposed method is beneficial to the development of AI security.

REPRODUCIBILITY STATEMENT

Appendix B.1 introduces the detailed experimental setup, including datasets, models, baselines and hyper-parameters. Most baselines are implemented in a popular PyTorch repository Kim (2020). Python implementations of this paper and all baselines are available in the supplementary materials.

as the surrogate model). In comparison with the RCE loss, the increase of the ASR is 0.6 to 2.5% on Q-FGSM and 1.6 to 5.2% on the other five transfer attacks for ImageNet. In addition, as shown in Tables 6 and 12, when the attack strength ε = 16/255, in comparison with the CE and RCE losses, the query prior-based attacks with the WACE loss can still effectively enhance the transferability of the gradient iterative-based attacks to attack the adversarially trained models on different surrogate models (except for QSI-NI-FGSM with ResNet50 as the surrogate model to attack adversarial Inception-v3). In conclusion, through the comparison with or without the query priors, at the low attack strength, i.

When VGG16 and ResNet50 are the surrogate model and the query model, respectively, as shown in Tables 13, 15 and 17, the attack success rate of QVMI-FGSM is almost always higher than that of VMI-FGSM (with the CE loss). Specifically, the average increase of the ASR is 5.01% on CIFAR10, 2.12% on CIFAR100 and 1.78% on ImageNet. In addition, the ASR of QVMI-FGSM is higher than that of VMI-FGSM with the RCE loss. When ResNet50 and VGG16 are the surrogate model and the query model, respectively, as shown in Tables 14, 16 and 18, the attack success rate of QVMI-FGSM is also almost always higher than that of VMI-FGSM (with the CE loss). Specifically, the average increase of the ASR is 4.28% on CIFAR10, 0.64% on CIFAR100 and 3.42% on ImageNet. In addition, the ASR of QVMI-FGSM is also almost always higher than that of VMI-FGSM with the RCE loss (except for CIFAR100 with ResNet50 as the surrogate model and VGG16 as the query model). Overall, the adversarial examples generated by QVMI-FGSM not only perform better on the query model but also perform better on the other models.

B.7 LIMITATION

The query prior-based attacks are effective for the untargeted attack. However, because Proposition 2 is more conducive to exploring the untargeted attack than the targeted attack, the proposed query prior-based attacks are designed as untargeted attacks and may not work in the targeted setting. The design of a query prior-based targeted attack remains a problem for future study.

C THE DETAILED CONTRIBUTION INTRODUCTION OF THE QUERY PRIOR-BASED ATTACKS

The detailed contributions of the query prior-based attacks are as follows.

First, we propose Proposition 1 and Corollary 1, which explore the relationship between cos ϑ (ϑ is the gradient angle between the surrogate model and the victim model) and the transferability on the same surrogate-victim model pair using different transfer attack methods. In addition, we propose Proposition 2, which finds the preference property of deep neural networks. The theoretical and empirical proofs of Propositions 1, 2 and Corollary 1 are presented in Appendices A.1 and A.2.

Second, by utilizing Propositions 1, 2 and Corollary 1, we design a simple WACE loss function. Theorem 1 and Figure 4 prove that the WACE loss is better than the CE and RCE losses at reducing the gradient angle between the surrogate model and the victim model. Based on the WACE loss, we design the query prior-based attacks, which solve two problems. First, compared with several of the latest transfer attack methods, the query prior-based attacks significantly improve the transferable attack success rate on the target victim model for CIFAR10/100 and ImageNet, and effectively improve the transferable attack success rate on the other models. Second, compared with two of the latest effective query attack methods, when the number of queries is reduced to 10, the attack success rate of our QVMI-FGSM remains high and is much higher than theirs.

Third, as far as we know, our query prior-based attack method is the first attempt to solve the black-box attack problem that allows only a few queries (i.e., less than or equal to 10).

ATTACKS

To the best of our knowledge, we can divide the current black-box attacks into three scenarios.

The first scenario is the query-free transfer-based attack, i.e., the allowed number of queries Q = 0. The adversarial examples are generated by the surrogate model without any knowledge of any target model. For example, the current transfer-based attacks, i.e., FGSM, I-FGSM, MI-FGSM, DI-FGSM, SI-NI-FGSM and VMI-FGSM, are query-free transfer-based attacks.

The second scenario is the query-based attack without transfer prior, i.e., a sufficient number of queries and no transfer prior. The adversarial examples are generated by gradient estimation or random search. For example, a typical effective algorithm is Square.

The third scenario is the query-based attack with transfer prior, i.e., a sufficient number of queries and a transfer prior. The adversarial examples are generated by the combination of the transfer prior and gradient estimation (or random search), where the transfer prior is used to improve the efficiency of gradient estimation and reduce the number of queries. For example, a typical effective algorithm is P-RGF.

In our paper, we explore a novel scenario, i.e., the fourth scenario: the transfer-based attack with a few queries, i.e., the allowed number of queries Q ≤ 10. The adversarial examples are generated by the surrogate model with a few query outputs of a target victim model (the number of queries Q ≤ 10). The fourth scenario is reasonable, and there is currently no black-box attack algorithm specifically belonging to it. Why is the fourth scenario reasonable? There are two reasons, which are also the problems that exist in the first, second and third scenarios.

First, in the second and third scenarios, although the number of queries in the current query-based attacks is decreasing, they still need hundreds of queries. Even if the number of queries Q ≤ 10, the attack success rates of the query-based attacks with or without transfer prior are significantly reduced, and are far lower than those of the query-free transfer-based attacks in the first scenario, which can be found in our experimental results.

Second, Proposition 1 and Corollary 1 of our paper explore the reason why the attack success rate of the current query-free transfer-based attacks in the first scenario is increasing (i.e., when the step size α is small, the better the transferability of the transfer-based attack, the smaller the gradient angle ϑ between the surrogate model and the victim model). To reduce the angle ϑ for improving the transferability, Proposition 2 of our paper explores the preference of deep neural network classification models after being attacked by a gradient-based attack algorithm (i.e., the successfully attacked adversarial examples prefer to be classified as the wrong categories with higher probability). By utilizing Propositions 1, 2 and Corollary 1, we can design an algorithm that reduces the angle ϑ to enhance the transferability of the generated adversarial examples with a few query outputs of a target victim model.

