THE ULTIMATE COMBO: BOOSTING ADVERSARIAL EXAMPLE TRANSFERABILITY BY COMPOSING DATA AUGMENTATIONS

Abstract

Transferring adversarial examples (AEs) from surrogate machine-learning (ML) models to evade target models is a common method for evaluating adversarial robustness in black-box settings. Researchers have invested substantial efforts to enhance transferability. Chiefly, attacks leveraging data augmentation have been found to help AEs generalize better from surrogates to targets. Still, prior work has explored a limited set of augmentation techniques and their composition. To fill the gap, we conducted a systematic study of how data augmentation affects transferability. Particularly, we explored ten augmentation techniques of six categories originally proposed to help ML models generalize to unseen benign samples, and assessed how they influence transferability, both when applied individually and when composed. Our extensive experiments with the ImageNet and CIFAR-10 datasets showed that simple color-space augmentations (e.g., color to greyscale) outperform the state of the art when combined with standard augmentations, such as translation and scaling. Additionally, except for two methods that may harm transferability, we found that composing augmentation methods impacts transferability monotonically (i.e., composing more methods yields equal or higher transferability). The best composition we found significantly outperformed the state of the art (e.g., 95.6% vs. 92.0% average transferability on ImageNet from normally trained surrogates to other normally trained models). We provide intuitive, empirically supported explanations for why certain augmentations fail to improve transferability.

1. INTRODUCTION

Adversarial examples (AEs), variants of benign inputs minimally perturbed to induce misclassification at test time, have emerged as a profound challenge to machine learning (ML) (Biggio et al., 2013; Szegedy et al., 2014), calling its use in security- and safety-critical systems into question (e.g., Eykholt et al. (2018)). Many attacks have been proposed to generate AEs in white-box settings, where adversaries are familiar with all the particularities of the attacked model (Papernot et al., 2016). By contrast, black-box attacks enable evaluating the vulnerability of ML in realistic settings, without access to the model (Papernot et al., 2016). Attacks exploiting the transferability property of AEs (Szegedy et al., 2014) have received special attention. Namely, as AEs produced against one model are often misclassified by others, transferability-based attacks produce AEs against surrogate (a.k.a. substitute) white-box models to mislead black-box ones. To measure the risk of AEs in black-box settings accurately, researchers have proposed varied methods to enhance transferability (e.g., Lin et al. (2020); Liu et al. (2017)). Notably, attacks using data augmentation, such as translations (Dong et al., 2019) and scaling of pixel values (Lin et al., 2020), as a means to improve the generalizability of AEs across models have accomplished state-of-the-art transferability rates. Still, previous transferability-based attacks have studied only four augmentation methods (see Section 3.1), out of many proposed in the data-augmentation literature (Shorten & Khoshgoftaar, 2019), primarily for reducing model overfitting. Hence, the extent to which different data-augmentation types boost transferability, either individually or when combined, remains largely unknown. To fill the gap, we conducted a systematic study of how augmentation methods influence transferability.
Specifically, alongside techniques considered in previous work, we studied how ten augmentation techniques pertaining to six categories impact transferability when applied individually or composed (Section 3). Integrating augmentation methods into attacks via a flexible framework we propose (Algorithm 1), we conducted extensive experiments using an ImageNet-compatible dataset, CIFAR-10 (Krizhevsky, 2009), and 16 models, and measured transferability in diverse settings, including with and without defenses (Sections 4 and 5). Our results offer several interesting insights:

• Simple color-space augmentations outperform state-of-the-art transferability-based attacks when composed with standard augmentations (Section 5.1).
• Transferability has a mostly monotonic relationship with data-augmentation techniques. Except for two augmentation methods that may harm transferability, composing additional augmentation methods either improves or preserves transferability (Section 5.2).
• Out of the 2^7 compositions explored, the best composition we found, ULTIMATECOMBO, outperforms state-of-the-art attacks by a large margin (Section 5.3).
• We provide empirical support for conjectures we raise concerning when data-augmentation techniques may be counterproductive to transferability (Section 5.4).

2. BACKGROUND AND RELATED WORK

White-box evasion attacks typically leverage first- or second-order optimizations to generate AEs that models would misclassify. For example, given an input x of class y, model weights θ, and a loss function J, the Fast Gradient Sign Method (FGSM) of Goodfellow et al. (2015) crafts an AE x′ using the loss gradients ∇_x J(x, y, θ):
x′ = x + ε · sign(∇_x J(x, y, θ)),

where sign(·) maps real numbers to -1, 0, or 1, depending on their sign. Following FGSM, researchers proposed numerous advanced attacks. Notably, the iterative FGSM (I-FGSM) of Kurakin et al. (2017b) performs multiple gradient-ascent steps, updating x iteratively to evade models:

x_{t+1} = Proj_x(x_t + α · sign(∇_x J(x_t, y, θ))),

where Proj_x(·) projects the perturbation into the ℓ∞-norm ε-ball centered at x, α is the step size, and x_0 = x. The attacks we study in this work are based on I-FGSM.

In practice, adversaries often lack white-box access to victim models. Hence, researchers studied black-box attacks in which adversaries may only query models. Certain attack types, such as score- and boundary-based attacks, perform multiple queries, often several thousand, to produce AEs (e.g., Brendel et al. (2018); Ilyas et al. (2019)). By contrast, attacks leveraging transferability (e.g., Goodfellow et al. (2015); Szegedy et al. (2014)) avoid querying victim models, and use surrogate white-box models to create AEs that are likely to be misclassified by other, black-box ones.

Attempts to explain the transferability phenomenon attribute it to the gradient norm of the target model (i.e., its susceptibility to attacks), the smoothness of classification boundaries, and, primarily, the alignment of gradient directions between the surrogate and target models (Demontis et al., 2019; Yang et al., 2021). Said differently, for AEs to transfer, the gradient directions of surrogates need to be similar to those of target models (i.e., attain high cosine similarity).

Enhancing transferability is an active research area. Some methods integrate momentum into attacks such as I-FGSM to avoid surrogate-specific optima and saddle points that may hinder transferability (e.g., Dong et al. (2018); Wang & He (2021)). Others employ specialized losses, such as reducing the variance of intermediate activations (Huang et al., 2019) or the mean loss of model ensembles (Liu et al., 2017), to enhance transferability. Lastly, a prominent family of attacks leverages data augmentation to enhance AEs' generalizability between models. For instance, Dong et al. (2019) boosted transferability by integrating random translations into I-FGSM. Evasion attacks incorporating data augmentation attain state-of-the-art transferability rates (Lin et al., 2020; Wang et al.,
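To make the update rules above concrete, the following is a minimal NumPy sketch of I-FGSM against a toy surrogate (binary logistic regression, so the input gradient has a closed form). The model, loss, augmentation transforms, and all parameter values here are illustrative assumptions, not the paper's experimental setup; the optional `augs` hook only gestures at how augmentation-based attacks average gradients over transformed copies of the iterate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y, theta):
    # Cross-entropy J(x, y, theta) for the toy model p(y=1|x) = sigmoid(theta . x).
    p = sigmoid(theta @ x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def loss_grad_x(x, y, theta):
    # Closed-form input gradient for logistic regression: dJ/dx = (p - y) * theta.
    return (sigmoid(theta @ x) - y) * theta

def i_fgsm(x, y, theta, eps=0.1, alpha=0.02, steps=10, augs=None):
    """I-FGSM: gradient-ascent steps on J, projected back into the l_inf
    eps-ball around x. If `augs` (a list of input transforms) is given, the
    gradient is averaged over augmented copies of the current iterate, in the
    spirit of augmentation-based transferability attacks."""
    x_adv = x.copy()
    for _ in range(steps):
        if augs:
            g = np.mean([loss_grad_x(a(x_adv), y, theta) for a in augs], axis=0)
        else:
            g = loss_grad_x(x_adv, y, theta)
        x_adv = x_adv + alpha * np.sign(g)        # x_{t+1} = x_t + alpha * sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # Proj_x into the eps-ball
    return x_adv

rng = np.random.default_rng(0)
theta = rng.normal(size=8)
x = rng.normal(size=8)
y = 1  # true label; the attack pushes the loss for this label upward

x_adv = i_fgsm(x, y, theta)
# Hypothetical scaling augmentations (loosely akin to pixel-value scaling):
scales = [lambda v, s=s: s * v for s in (1.0, 0.9, 0.8)]
x_adv_aug = i_fgsm(x, y, theta, augs=scales)
```

Note the two invariants that the clipping step enforces regardless of the surrogate: the perturbation never exceeds ε in ℓ∞-norm, while each un-projected step increases the surrogate's loss along the gradient-sign direction.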

