ENHANCING THE TRANSFERABILITY OF ADVERSARIAL EXAMPLES VIA A FEW QUERIES AND FUZZY DOMAIN ELIMINATING

Abstract

Due to the vulnerability of deep neural networks, black-box attacks have drawn great attention from the community. Although transferable priors have reduced the number of queries required by black-box query attacks in recent efforts, the average number of queries is still larger than 100, which makes these attacks easily affected by query-number limit policies. In this work, we propose a novel query prior-based method to enhance the attack transferability of the family of fast gradient sign methods using only a few queries. Specifically, for the untargeted attack, we find that successfully attacked adversarial examples tend to be classified into the wrong categories with higher probability by the victim model. Therefore, the weighted augmented cross-entropy loss is proposed to reduce the gradient angle between the surrogate model and the victim model, thereby enhancing the transferability of the adversarial examples. In addition, a fuzzy domain eliminating technique is proposed to prevent the generated adversarial examples from getting stuck in local optima. Specifically, we define the fuzzy domain of the input example x in the ε-ball of x. Then, temperature scaling and fuzzy scaling are utilized to eliminate the fuzzy domain, further enhancing the transferability of the generated adversarial examples. Theoretical analysis and extensive experiments demonstrate that our method significantly improves the transferability of gradient-based adversarial attacks on CIFAR10/100 and ImageNet and outperforms black-box query attacks under the same small query budget.

1. INTRODUCTION

Deep Neural Networks (DNNs) have penetrated many aspects of life, e.g., autonomous cars, face recognition and malware detection. However, imperceptible perturbations can fool a DNN into making wrong decisions, which is dangerous in security-critical settings and can cause significant economic losses. To evaluate and improve the robustness of DNNs, advanced adversarial attack methods need to be researched. In recent years, white-box attacks have achieved great success and black-box attacks have made great progress. However, because of their weak transferability (at low attack strength) and their large number of queries, black-box attacks can still be further improved.

Recently, a number of transferable prior-based black-box query attacks have been proposed to reduce the number of queries. For example, Cheng et al. (2019) proposed a prior-guided random gradient-free (P-RGF) method, which takes advantage of a transfer-based prior and the query information simultaneously. Yang et al. (2020) proposed a simple baseline approach (SimBA++), which combines transferability-based and query-based black-box attacks and utilizes the query feedback to update the surrogate model in a novel learning scheme. However, the average query number of most query attacks is larger than 100 in evaluations on ImageNet. In this scenario, the performance of these query attacks may be significantly affected when a query-number limit policy is applied by the DNN application. To address these problems, we make the following contributions:

• First, we propose query prior-based attacks to enhance the transferability of adversarial examples with few queries under the constraint of low attack strength. Specifically, we find that: (i) the better the transferability of a transfer black-box attack, the smaller the gradient angle between the surrogate model and the victim model;
(ii) successfully attacked adversarial examples tend to be classified into the wrong categories with higher probability by the victim model. Based on these findings, the weighted augmented cross-entropy (WACE) loss is proposed to decrease the gradient angle between the surrogate model and the victim model and thereby enhance the transferability of adversarial examples, which is proved in Appendices A.4 and A.5. The proposed query prior-based method enhances the transferability of the family of FGSMs by integrating the WACE loss with a few queries (this contribution is described in detail in Appendix C).

• Second, when the query prior is not available, the fuzzy domain eliminating technique is used to enhance the transferability of adversarial examples. Specifically, we explore the effectiveness of temperature scaling in eliminating the fuzzy domain and propose fuzzy scaling to eliminate it further. By combining temperature scaling and fuzzy scaling, the fuzzy domain eliminating based cross-entropy (FECE) loss is proposed to enhance the transferability of the generated adversarial examples. In addition, the weighted augmented fuzzy domain eliminating based cross-entropy (WFCE) loss, which combines the WACE and FECE losses, can further enhance the transferability of adversarial examples.

• Third, theoretical analysis and extensive experiments demonstrate that: (i) on the premise of allowing queries, the WACE loss outperforms the cross-entropy (CE) and RCE losses; (ii) temperature scaling and fuzzy scaling can effectively eliminate part of the fuzzy domain; (iii) under the constraint of low attack strength, the query prior-based method and the fuzzy domain eliminating technique significantly improve the attack transferability of the family of fast gradient sign methods on CIFAR10/100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015).
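As background for the temperature-scaling ingredient mentioned in the second contribution, the following is a minimal numpy sketch of standard temperature scaling (softmax over logits divided by a temperature T). The logits and temperature values here are illustrative assumptions; the exact FECE formulation is the paper's own and is not reproduced here.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: softmax(logits / T).
    T > 1 flattens the distribution; T < 1 sharpens it."""
    z = logits / T
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.5])
p_sharp = softmax_with_temperature(logits, T=1.0)
p_flat = softmax_with_temperature(logits, T=4.0)
# A higher temperature lowers the top-class probability, so the
# cross-entropy gradient does not vanish on confidently
# classified inputs.
```

Intuitively, flattening the output distribution keeps a non-negligible gradient signal even for examples the surrogate model classifies with near-certainty.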

2. PRELIMINARIES

The family of FGSMs and the RCE loss are briefly introduced here; they are helpful for understanding our methods in Section 3 and serve as the baselines in Section 4.

2.1. FAMILY OF FAST GRADIENT SIGN METHODS

The methods mentioned in this section are referred to as black-box transfer attacks, whose objective is to enhance the transferability of adversarial examples.

Fast gradient sign method (FGSM) (Goodfellow et al., 2015) is the first transfer attack, which generates the adversarial example x^adv by maximizing the loss function L(x^adv, y_o; θ) with a one-step update:

x^adv = x + ε · sign(∇_x L(x, y_o; θ)),

where ε is the attack strength, y_o is the ground-truth label, θ denotes the model parameters, sign(·) is the sign function and ∇_x L(x, y_o; θ) is the gradient of the loss function w.r.t. x.

Iterative FGSM (I-FGSM) (Kurakin et al., 2017) is the iterative version of FGSM, applying FGSM repeatedly with a small step size α:

x_0^adv = x,   x_{t+1}^adv = Clip_x^ε( x_t^adv + α · sign(∇_x L(x_t^adv, y_o; θ)) ),

where the Clip_x^ε(·) function restricts the generated adversarial examples to be within the ε-ball of x.
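To make the two update rules concrete, here is a minimal numpy sketch of FGSM and I-FGSM against a toy linear softmax classifier. The classifier, its dimensions and the attack parameters are illustrative assumptions; the analytic input gradient W^T(p − onehot(y)) stands in for backpropagation through a real DNN.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_loss_and_grad(x, y, W):
    """Cross-entropy loss of a toy linear classifier W @ x,
    and its gradient w.r.t. the input: W^T (p - onehot(y))."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return -np.log(p[y]), W.T @ (p - onehot)

def fgsm(x, y, W, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x L)."""
    _, g = ce_loss_and_grad(x, y, W)
    return x + eps * np.sign(g)

def i_fgsm(x, y, W, eps, alpha, steps):
    """I-FGSM: repeat small FGSM steps, clipping to the eps-ball of x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = ce_loss_and_grad(x_adv, y, W)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 3 classes, 5 input features
x = rng.normal(size=5)
y = 0
x_adv = i_fgsm(x, y, W, eps=0.1, alpha=0.02, steps=10)
```

The clipping step is what keeps the iterative attack inside the same ε-ball as the one-step attack; without it, the perturbation would grow to α · steps per coordinate.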



Besides FGSM and I-FGSM, many black-box transfer attacks have been proposed to enhance the transferability of adversarial examples, e.g., momentum I-FGSM (MI-FGSM) (Dong et al., 2018), diverse input I-FGSM (DI-FGSM) (Xie et al., 2019), scale-invariant Nesterov I-FGSM (SI-NI-FGSM) (Lin et al., 2020) and variance-tuning MI-FGSM (VMI-FGSM) (Wang & He, 2021). Zhang et al. (2022a) also proposed the relative cross-entropy (RCE) loss to enhance transferability by maximizing the logit's rank distance from the ground-truth class. However, these transfer attacks achieve only weak transferability of adversarial examples under the constraint of low attack strength.
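Among the variants listed above, MI-FGSM's momentum accumulation is the ingredient the later FGSM-family methods build on. A minimal numpy sketch under the same toy linear-classifier assumption as before (the classifier and parameters are illustrative, not the paper's experimental setup):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_grad(x, y, W):
    """Input gradient of cross-entropy for a toy linear classifier."""
    p = softmax(W @ x)
    p[y] -= 1.0          # p - onehot(y)
    return W.T @ p

def mi_fgsm(x, y, W, eps, alpha, steps, mu=1.0):
    """MI-FGSM (Dong et al., 2018): accumulate L1-normalized gradients
    into a momentum term g, then step with sign(g) and clip to the eps-ball."""
    x_adv = x.copy()
    g = np.zeros_like(x)
    for _ in range(steps):
        grad = ce_grad(x_adv, y, W)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
x_adv = mi_fgsm(x, y=0, W=W, eps=0.1, alpha=0.02, steps=10)
```

The momentum term smooths the update direction across iterations, which is what stabilizes the attack and (empirically) improves transfer to unseen victim models.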

