ILA-DA: IMPROVING TRANSFERABILITY OF INTERMEDIATE LEVEL ATTACK WITH DATA AUGMENTATION

Abstract

Adversarial attacks aim to generate deceptive inputs that fool a machine learning model. In deep learning, an adversarial input created for a specific neural network can also trick other neural networks. This intriguing property is known as the black-box transferability of adversarial examples. To improve black-box transferability, a previously proposed method called Intermediate Level Attack (ILA) fine-tunes an adversarial example by maximizing its perturbation on an intermediate layer of the source model. Meanwhile, it has been shown that simple image transformations can also enhance attack transferability. Based on these two observations, we propose ILA-DA, which employs three novel augmentation techniques to enhance ILA. Specifically, we propose (1) an automated way to apply effective image transformations, (2) an efficient reverse adversarial update technique, and (3) an attack interpolation method to create more transferable adversarial examples. Shown by extensive experiments, ILA-DA outperforms ILA and other state-of-the-art attacks by a large margin. On ImageNet, we attain an average attack success rate of 84.5%, which is 19.5% better than ILA and 4.7% better than the previous state-of-the-art across nine undefended models. For defended models, ILA-DA also leads existing attacks and provides further gains when incorporated into more advanced attack methods. The code is available at https://github.com/argenycw/ILA.
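The core mechanism summarized above, fine-tuning a reference attack by enlarging its perturbation at an intermediate layer of the source model, can be illustrated with a minimal NumPy sketch. A fixed random linear map stands in for the source model's intermediate feature map, and the step size, ball radius, and the name `ila_loss` are our illustrative choices, not values from the paper.

```python
import numpy as np

# Toy stand-in for the source model's intermediate feature map.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
layer = lambda v: W @ v

def ila_loss(x_clean, x_ref_adv, x_cur):
    """Projection of the current mid-layer perturbation onto the
    reference attack's mid-layer perturbation; ILA-style fine-tuning
    ascends this quantity to push the example further along the
    direction the reference attack already found."""
    ref_dir = layer(x_ref_adv) - layer(x_clean)
    cur_dir = layer(x_cur) - layer(x_clean)
    return float(ref_dir @ cur_dir)

x = rng.standard_normal(16)                                        # clean input
ref_adv = x + np.clip(0.1 * rng.standard_normal(16), -0.15, 0.15)  # reference attack output

# One sign-gradient ascent step, kept inside an L-infinity ball of radius 0.2.
# For this linear toy loss, the gradient w.r.t. the input is exact:
grad = W.T @ (layer(ref_adv) - layer(x))
cur_adv = np.clip(ref_adv + 0.05 * np.sign(grad), x - 0.2, x + 0.2)

before = ila_loss(x, ref_adv, ref_adv)
after = ila_loss(x, ref_adv, cur_adv)
assert after > before  # the fine-tuned example has a larger mid-layer projection
```

In a real implementation the linear map would be replaced by a forward pass through the source network up to a chosen layer, with the gradient obtained by backpropagation rather than in closed form.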

1. INTRODUCTION

Recent studies (Szegedy et al., 2013; Goodfellow et al., 2015) showed that deep neural network (DNN) models are vulnerable to adversarial attacks, where perturbations are added to clean data to fool the models into making erroneous classifications. Such adversarial perturbations are usually crafted to be almost imperceptible to humans, yet they cause apparent fluctuations in the model output. The effectiveness of adversarial attacks on deep learning models raises concerns in multiple fields, especially for security-sensitive applications. Besides being effective against the victim model, adversarial attacks are found to be capable of transferring across models (Papernot et al., 2016). One explanation for this phenomenon is the overlapping decision boundaries shared by different models (Liu et al., 2017; Dong et al., 2018). Such behavior not only aggravates concerns about the reliability and robustness of deep learning models, but also enables various black-box attacks that leverage the transferring behavior, such as directly generating attacks from a source (or surrogate) model (Zhou et al., 2018) or using it as a gradient prior to reduce the number of model queries (Guo et al., 2019).

Intermediate Level Attack (ILA) is a method proposed by Huang et al. (2019) that fine-tunes an existing adversarial attack, used as a reference, to raise its attack transferability across different models. Formulated to maximize the discrepancy of the intermediate feature maps represented in the models, ILA achieves remarkable black-box transferability, outperforming various attacks that are directly generated (Zhou et al., 2018; Xie et al., 2019a). On the other hand, many transfer-based attacks empirically show that simple image transformations, including padding (Xie et al., 2019a), image translation (Dong et al., 2019), and scaling (Lin et al., 2020), are effective in strengthening the trans-


