ENHANCING THE TRANSFERABILITY OF ADVERSARIAL EXAMPLES VIA A FEW QUERIES AND FUZZY DOMAIN ELIMINATING

Abstract

Due to the vulnerability of deep neural networks, black-box attacks have drawn great attention from the community. Although transferable priors have reduced the query counts of black-box query attacks in recent efforts, the average number of queries is still larger than 100, which is easily affected by query number limit policies. In this work, we propose a novel query prior-based method to enhance the attack transferability of the family of fast gradient sign methods using only a few queries. Specifically, for the untargeted attack, we find that successfully attacked adversarial examples tend to be classified into wrong categories with higher probability by the victim model. Therefore, a weighted augmented cross-entropy loss is proposed to reduce the gradient angle between the surrogate model and the victim model, enhancing the transferability of the adversarial examples. In addition, a fuzzy domain eliminating technique is proposed to prevent the generated adversarial examples from getting stuck in local optima. Specifically, we define the fuzzy domain of an input example x within the ε-ball of x. Then, temperature scaling and fuzzy scaling are utilized to eliminate the fuzzy domain, enhancing the transferability of the generated adversarial examples. Theoretical analysis and extensive experiments demonstrate that our method significantly improves the transferability of gradient-based adversarial attacks on CIFAR10/100 and ImageNet and outperforms black-box query attacks under the same small query budget.
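As background for the temperature scaling mentioned above, the standard operation divides the logits by a temperature T before the softmax; larger T flattens the output distribution and reduces over-confident predictions. The following is a minimal illustrative sketch of that generic operation, not the authors' fuzzy-domain procedure; the function name is our own.

```python
import numpy as np

def temperature_softmax(logits, T=1.0):
    """Softmax with temperature T; larger T flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

p1 = temperature_softmax([2.0, 1.0, 0.1], T=1.0)
p5 = temperature_softmax([2.0, 1.0, 0.1], T=5.0)
# With T=5 the probability mass is spread more evenly across classes
# than with T=1, while both outputs remain valid distributions.
```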

1. INTRODUCTION

Deep Neural Networks (DNNs) have penetrated many aspects of life, e.g. autonomous cars, face recognition and malware detection. However, imperceptible perturbations can fool a DNN into making wrong decisions, which is dangerous in security-critical settings and can cause significant economic losses. To evaluate and improve the robustness of DNNs, advanced adversarial attack methods need to be studied. In recent years, white-box attacks have achieved great success and black-box attacks have made great progress. However, because of their weak transferability (at low attack strength) and large number of queries, black-box attacks can still be further improved. Recently, a number of transferable prior-based black-box query attacks have been proposed to reduce the number of queries. For example, Cheng et al. (2019) proposed a prior-guided random gradient-free (P-RGF) method, which takes advantage of a transfer-based prior and the query information simultaneously. Yang et al. (2020) proposed a simple baseline approach (SimBA++), which combines transferability-based and query-based black-box attacks, and utilized the query feedback to update the surrogate model in a novel learning scheme. However, the average query number of most query attacks is larger than 100 in evaluations on ImageNet. In this scenario, the performance of these query attacks may be significantly degraded when a query number limit policy is applied in the DNN application.
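To make the idea of a transfer-based prior concrete, a random gradient-free estimate in the spirit of P-RGF biases its finite-difference probe directions toward the surrogate model's gradient. The sketch below is illustrative only and is not the original P-RGF algorithm: the function name, the fixed mixing weight `lam`, and the probe construction are our own assumptions.

```python
import numpy as np

def prior_guided_estimate(loss_fn, x, prior, n_queries=10, sigma=1e-3, lam=0.5):
    """Illustrative RGF-style black-box gradient estimate biased by a prior.

    loss_fn : black-box scalar loss, accessed only through queries
    prior   : surrogate-model gradient (the transferable prior), same shape as x
    lam     : weight balancing the prior direction against random directions
    """
    prior = prior / (np.linalg.norm(prior) + 1e-12)
    base = loss_fn(x)                       # one query for the baseline loss
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_queries):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u) + 1e-12
        # bias the probe direction toward the prior, then renormalize
        d = lam * prior + (1.0 - lam) * u
        d /= np.linalg.norm(d) + 1e-12
        # directional derivative via a forward finite difference (one query)
        grad += (loss_fn(x + sigma * d) - base) / sigma * d
    return grad / n_queries
```

A useful sanity check is a quadratic loss, whose true gradient is known: the estimate should be positively aligned with it.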



Besides, many black-box transfer attacks have been proposed to enhance the transferability of the adversarial examples, e.g. fast gradient sign method (FGSM) (Goodfellow et al., 2015), iterative FGSM (I-FGSM) (Kurakin et al., 2017), momentum I-FGSM (MI-FGSM) (Dong et al., 2018), diverse input I-FGSM (DI-FGSM) (Xie et al., 2019), scale-invariant Nesterov I-FGSM (SI-NI-FGSM)
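The update rules of the gradient-sign attacks named above are standard and can be sketched as follows; this is an illustrative NumPy sketch under common conventions (sign steps, clipping to the ε-ball, L1-normalized momentum in MI-FGSM), not any paper's reference implementation.

```python
import numpy as np

def fgsm(x, grad, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x L)."""
    return x + eps * np.sign(grad)

def i_fgsm_step(x_adv, x, grad, alpha, eps):
    """One I-FGSM step, projecting back into the eps-ball around x."""
    x_adv = x_adv + alpha * np.sign(grad)
    return np.clip(x_adv, x - eps, x + eps)

def mi_fgsm_step(x_adv, x, grad, momentum, alpha, eps, mu=1.0):
    """One MI-FGSM step: accumulate L1-normalized gradients into a momentum."""
    momentum = mu * momentum + grad / (np.abs(grad).sum() + 1e-12)
    x_adv = x_adv + alpha * np.sign(momentum)
    return np.clip(x_adv, x - eps, x + eps), momentum
```

DI-FGSM and SI-NI-FGSM keep these update rules but change how the gradient is computed (random input transformations, and scaled copies with a Nesterov look-ahead, respectively).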

