BOOSTING ADVERSARIAL TRANSFERABILITY USING DYNAMIC CUES

Abstract

The transferability of adversarial perturbations between image models has been extensively studied. In this setting, an attack is generated on a known surrogate, e.g., an ImageNet-trained model, and transferred to change the decision of an unknown (black-box) model trained on an image dataset. However, attacks generated from image models do not capture the dynamic nature of a moving object or a changing scene, since image models lack temporal cues. This leads to reduced transferability of adversarial attacks from representation-enriched image models such as Supervised Vision Transformers (ViTs), Self-supervised ViTs (e.g., DINO), and Vision-language models (e.g., CLIP) to black-box video models. In this work, we induce dynamic cues within image models without sacrificing their original performance on images. To this end, we optimize temporal prompts through frozen image models to capture motion dynamics. Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics. Specifically, we introduce spatial (image) and temporal (video) cues within the same source model through task-specific prompts. Attacking such prompts maximizes adversarial transferability from image models to both black-box video and black-box image models using attacks designed for image models. For example, an iterative attack launched from the image model DeiT-B with temporal prompts reduces the generalization (top-1 accuracy) of a video model by 35% on Kinetics-400. Our approach also improves adversarial transferability to image models by 9% on ImageNet w.r.t. the current state-of-the-art approach. Our attack results indicate that the attacker does not need specialized architectures, e.g., divided space-time attention, 3D convolutions, or multi-view convolutional networks, for different data modalities. Image models are effective surrogates for optimizing adversarial attacks to fool black-box models in an environment that changes over time. Code is available at https://bit.ly/3Xd9gRQ.
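To make the prompting idea summarized above concrete, the following is a minimal PyTorch sketch: learnable temporal prompts are prepended to the patch tokens of a frozen image ViT, so only the prompts receive gradients. The `TemporalPrompts` module, its token-level `vit` interface, and the per-frame prompt bank are illustrative assumptions rather than the paper's exact implementation, which obtains the prompts through a dedicated learnable transformation.

```python
import torch
import torch.nn as nn

class TemporalPrompts(nn.Module):
    """Illustrative sketch: learnable temporal prompts on a frozen image ViT.

    `vit` is assumed to map a token sequence of shape [batch, tokens, dim]
    to class logits; all of its weights stay frozen, so the prompts are the
    only parameters that are optimized.
    """

    def __init__(self, vit, num_frames, embed_dim, prompts_per_frame=8):
        super().__init__()
        self.vit = vit
        for p in self.vit.parameters():
            p.requires_grad_(False)  # the image model is never updated
        # One prompt bank per frame index: a simple stand-in for the
        # paper's learnable transformation that produces temporal prompts.
        self.prompts = nn.Parameter(
            torch.empty(num_frames, prompts_per_frame, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, frame_tokens):
        # frame_tokens: [B, T, N, D] patch embeddings of T video frames
        B, T, N, D = frame_tokens.shape
        prompts = self.prompts.unsqueeze(0).expand(B, -1, -1, -1)
        tokens = torch.cat([prompts, frame_tokens], dim=2)  # prepend prompts
        logits = self.vit(tokens.flatten(0, 1))             # [B*T, classes]
        return logits.view(B, T, -1).mean(dim=1)            # temporal average
```

Because the prompts are the only trainable parameters, an attack on this combined model can back-propagate through both the spatial tokens and the temporal prompts, exposing motion-sensitive gradients to the attacker while the underlying image model stays intact.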

1. INTRODUCTION

Deep learning models are vulnerable to imperceptible changes to their input images. It has been shown that, for a successful attack, an attacker no longer needs access to the targeted model to compromise its decisions (Naseer et al., 2019; 2020; Nakka & Salzmann, 2021). Adversarial perturbations suitably optimized on a known source model (a surrogate) can fool an unknown target model (Kurakin et al., 2016). These attacks are known as black-box attacks, since the attacker can neither access the deployed model nor compute its adversarial gradient information. Adversarial attacks are continuously evolving, revealing new blind spots of deep neural networks. Adversarial transferability has been extensively studied in the image domain (Akhtar & Mian, 2018; Wang & He, 2021; Naseer et al., 2022b; Malik et al., 2022). Existing works demonstrate how adversarial patterns can generalize to models with different architectures (Zhou et al., 2018) and even different data domains (Naseer et al., 2019). However, adversarial transferability between architecture families designed for different data modalities, e.g., from image models to video models, has not been actively explored. Since adversarial machine learning has gained the most attention in the image domain, it is natural to question whether image models can help transfer adversarial attacks to models designed for other data modalities, such as video.
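As a concrete instance of the transfer setting described above, the sketch below implements iterative FGSM (Kurakin et al., 2016) in PyTorch: the perturbation is optimized using only the surrogate's gradients and then applied, unchanged, to an unseen target. The `surrogate` model and the hyperparameter defaults are placeholders, not values prescribed by this paper.

```python
import torch
import torch.nn.functional as F

def ifgsm_transfer(surrogate, x, y, eps=8 / 255, steps=10):
    """Iterative FGSM on a known surrogate; the resulting perturbation is
    reused as a black-box attack against any unseen target model."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()  # ascend the surrogate loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project into L_inf ball
        x_adv = x_adv.clamp(0, 1)                     # stay a valid image
    return x_adv.detach()

# Transfer step: the same x_adv is fed to an unknown target model; the drop
# in the target's accuracy measures black-box transferability.
```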

