IMPROVING THE TRANSFERABILITY OF ADVERSARIAL ATTACKS THROUGH EXPERIENCED PRECISE NESTEROV MOMENTUM

Anonymous

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial attacks, which therefore serve as a means of evaluating the robustness of DNNs. However, adversarial attacks typically achieve high white-box attack success rates but transfer poorly, making black-box attacks impracticable in the real world. Momentum-based attacks were proposed to accelerate optimization and thereby improve transferability. Nevertheless, conventional momentum-based attacks accelerate optimization inefficiently during the early iterations because the momentum is initialized to zero, which leads to unsatisfactory transferability. We therefore propose Experienced Momentum (EM), a pre-trained momentum: initializing the momentum to EM accelerates optimization from the very first iterations. Moreover, the pre-update of conventional Nesterov momentum based attacks is rough, prompting us to propose Precise Nesterov momentum (PN), which refines the pre-update by also considering the gradient at the current data point. Finally, we integrate EM with PN as Experienced Precise Nesterov momentum (EPN) to further improve transferability. Extensive experiments against normally trained and defense models demonstrate that EPN improves transferability more effectively than conventional momentum. Specifically, the attack success rates of our EPN-based attacks are on average ∼11.9% and ∼13.1% higher than those of conventional momentum-based attacks against normally trained and defense models, respectively.
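To make the two ideas above concrete, the following is a hypothetical NumPy sketch of a single EPN-style step, not the paper's exact update (which is specified later): the pre-update looks ahead along both the momentum and the current gradient (PN), and the momentum buffer would be initialized from a pre-trained value rather than zero (EM). Here `grad_fn`, the step sizes, and `g_experienced` are illustrative placeholders.

```python
import numpy as np

def epn_step(x_adv, g, grad_fn, alpha, mu,
             l1=lambda v: np.sum(np.abs(v)) + 1e-12):
    """One hypothetical EPN-style step (illustrative sketch only).

    Precise Nesterov pre-update: look ahead along the momentum *and* the
    L1-normalized gradient at the current point, instead of along the
    momentum alone.
    """
    g_cur = grad_fn(x_adv)                                # gradient at current point
    x_nes = x_adv + alpha * (mu * g + g_cur / l1(g_cur))  # refined pre-update
    g_nes = grad_fn(x_nes)                                # gradient at look-ahead point
    g = mu * g + g_nes / l1(g_nes)                        # momentum accumulation
    return x_adv + alpha * np.sign(g), g

# Experienced Momentum: instead of g = 0, the momentum buffer would start
# from a value "pre-trained" beforehand (placeholder array here).
g_experienced = np.zeros(4)
```

Under EM, `g_experienced` replaces the all-zero initialization, so the very first steps already move along an informative direction.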

1. INTRODUCTION

Deep neural networks (DNNs) (Krizhevsky et al., 2012; Szegedy et al., 2015; He et al., 2016; Ioffe & Szegedy, 2015) have been widely applied in computer vision, e.g., autonomous driving (Franchi et al., 2022; Hao et al., 2019; Cococcioni et al., 2018), facial recognition (Chrysos et al., 2020; Ghenescu et al., 2018), and medical image analysis (Akselrod-Ballin et al., 2016; Ding et al., 2017; Liu et al., 2019). However, Szegedy et al. (2013) found that applying certain imperceptible perturbations to images can make DNNs misclassify, and they refer to such perturbed images as adversarial examples (AEs). Adversarial examples pose a serious threat to the security of DNNs and have therefore attracted extensive attention from researchers.

Adversarial attacks can be categorized into white-box attacks and black-box attacks. Typically, iterative gradient-based attacks (Kurakin et al., 2016; Madry et al., 2017) and optimization-based attacks (Carlini & Wagner, 2017) have high white-box but low black-box attack success rates, which makes these attacks impracticable in the real world. Transferability, i.e., the property that adversarial examples crafted on a source model remain effective on other models, makes black-box attacks feasible. Furthermore, iterative gradient-based attacks have the advantages of low computational cost and fast generation, so improving their transferability has become a hotspot in the field of adversarial attacks.

Many methods have been proposed to improve the transferability of iterative gradient-based attacks. These methods can be classified into three branches: improving optimization algorithms, input transformations, and disrupting feature space. For example, MI-FGSM (Dong et al., 2018), NI-FGSM (Lin et al., 2019), and VM(N)I-FGSM (Wang & He, 2021) improve gradient ascent (or
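As context for the momentum-based baselines named above, the core MI-FGSM (Dong et al., 2018) and NI-FGSM (Lin et al., 2019) updates can be sketched in NumPy. This is a minimal sketch: `grad_fn`, `eps`, and the step count are illustrative stand-ins for a real model's backpropagated loss gradient and attack budget.

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=0.06, steps=10, mu=1.0):
    """Sketch of MI-FGSM: momentum accumulation of L1-normalized gradients.

    grad_fn(x) returns the gradient of the classification loss w.r.t. x;
    in a real attack this comes from backprop through the source model.
    """
    alpha = eps / steps                  # per-step size within the L_inf budget
    x_adv = x.copy()
    g = np.zeros_like(x)                 # momentum starts at zero -- the
                                         # inefficiency that EM addresses
    for _ in range(steps):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv

def ni_fgsm(x, grad_fn, eps=0.06, steps=10, mu=1.0):
    """Sketch of NI-FGSM: as MI-FGSM, but the gradient is taken at a
    look-ahead point x + alpha*mu*g (the rough pre-update that Precise
    Nesterov momentum refines)."""
    alpha = eps / steps
    x_adv = x.copy()
    g = np.zeros_like(x)
    for _ in range(steps):
        x_nes = x_adv + alpha * mu * g   # look-ahead along momentum only
        grad = grad_fn(x_nes)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv
```

Both loops keep the perturbation inside the eps-ball via projection; the only difference is where the gradient is evaluated, which is exactly the design axis the paper's PN refines.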

