THE ULTIMATE COMBO: BOOSTING ADVERSARIAL EXAMPLE TRANSFERABILITY BY COMPOSING DATA AUGMENTATIONS

Abstract

Transferring adversarial examples (AEs) from surrogate machine-learning (ML) models to evade target models is a common method for evaluating adversarial robustness in black-box settings. Researchers have invested substantial effort in enhancing transferability. Chiefly, attacks leveraging data augmentation have been found to help AEs generalize better from surrogates to targets. Still, prior work has explored only a limited set of augmentation techniques and their compositions. To fill this gap, we conducted a systematic study of how data augmentation affects transferability. In particular, we explored ten augmentation techniques from six categories, originally proposed to help ML models generalize to unseen benign samples, and assessed how they influence transferability, both when applied individually and when composed. Our extensive experiments with the ImageNet and CIFAR-10 datasets showed that simple color-space augmentations (e.g., color to greyscale) outperform the state of the art when combined with standard augmentations, such as translation and scaling. Additionally, except for two methods that may harm transferability, we found that composing augmentation methods impacts transferability monotonically (i.e., composing more methods yields equal or higher transferability); the best composition we found significantly outperformed the state of the art (e.g., 95.6% vs. 92.0% average transferability on ImageNet from normally trained surrogates to other normally trained models). We provide intuitive, empirically supported explanations for why certain augmentations fail to improve transferability.

1. INTRODUCTION

Adversarial examples (AEs), variants of benign inputs minimally perturbed to induce misclassification at test time, have emerged as a profound challenge to machine learning (ML) (Biggio et al., 2013; Szegedy et al., 2014), calling its use in security- and safety-critical systems into question (e.g., Eykholt et al. (2018)). Many attacks have been proposed to generate AEs in white-box settings, where adversaries are familiar with all the particularities of the attacked model (Papernot et al., 2016). By contrast, black-box attacks enable evaluating the vulnerability of ML in realistic settings, without access to the model (Papernot et al., 2016). Notably, attacks using data augmentation, such as translations (Dong et al., 2019) and scaling of pixel values (Lin et al., 2020), as a means to improve the generalizability of AEs across models have achieved state-of-the-art transferability rates. Still, previous transferability-based attacks have studied only four augmentation methods (see Section 3.1), out of the many proposed in the data-augmentation literature (Shorten & Khoshgoftaar, 2019), primarily for reducing model overfitting. Hence, the extent to which different types of data augmentation boost transferability, either individually or in combination, remains largely unknown.



Attacks exploiting the transferability property of AEs (Szegedy et al., 2014) have received special attention. Namely, as AEs produced against one model are often misclassified by others, transferability-based attacks produce AEs against surrogate (a.k.a. substitute) white-box models to mislead black-box ones. To measure the risk posed by AEs in black-box settings accurately, researchers have proposed varied methods to enhance transferability (e.g., Lin et al. (2020); Liu et al. (2017)).
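As a rough illustration of the idea underlying augmentation-based transfer attacks, the following minimal sketch averages the input gradient of a toy linear surrogate over randomly scaled copies of the input (the expectation-over-transformations intuition behind scale-invariant attacks) before taking one FGSM-style signed step. All names, weights, and values here are hypothetical choices for illustration, not the paper's actual attack or models:

```python
import random

# Toy linear "surrogate": score(x) = w . x. The attack averages the
# gradient of the score over randomly augmented (scaled) copies of x,
# then takes a single signed step to lower the correct-class score.

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def augmented_gradient(w, x, n_copies=4, rng=None):
    """Average gradient of score(w, s*x) w.r.t. x over random scales s."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    g = [0.0] * len(x)
    for _ in range(n_copies):
        s = rng.choice([1.0, 0.5, 0.25])  # random scaling augmentation
        # For a linear model, d/dx [w . (s*x)] = s*w (analytic gradient).
        g = [gi + s * wi for gi, wi in zip(g, w)]
    return [gi / n_copies for gi in g]

def fgsm_step(x, g, eps=0.1):
    """Move against the correct-class score: x - eps * sign(g)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * sign(gi) for xi, gi in zip(x, g)]

w = [0.5, -1.0, 2.0]   # hypothetical surrogate weights
x = [1.0, 1.0, 1.0]    # benign input
x_adv = fgsm_step(x, augmented_gradient(w, x))
```

In a real attack the surrogate is a deep network, the gradient comes from backpropagation, augmentations (translation, scaling, color changes) are applied to the image before each forward pass, and the step is iterated under an L-infinity budget; averaging over augmented views is what keeps the perturbation from overfitting to the surrogate.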

