REVISITING ADAPTERS WITH ADVERSARIAL TRAINING

Abstract

While adversarial training is generally used as a defense mechanism, recent works show that it can also act as a regularizer. By co-training a deep network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs. We demonstrate that, contrary to previous findings, it is not necessary to separate batch statistics when co-training on clean and adversarial inputs, and that it is sufficient to use adapters with few domain-specific parameters for each type of input. We establish that using the classification token of a Vision Transformer (VIT) as an adapter is enough to match the classification performance of dual normalization layers, while using significantly fewer additional parameters. First, we improve upon the top-1 accuracy of a non-adversarially trained VIT-B16 model by +1.12% on IMAGENET (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens. These model soups, which we call adversarial model soups, allow us to trade off between clean and robust accuracy without sacrificing efficiency. Finally, we show that we can easily adapt the resulting models in the face of distribution shifts. Our VIT-B16 obtains top-1 accuracies on IMAGENET variants that are on average +4.00% better than those obtained with Masked Autoencoders.

1. INTRODUCTION

Deep networks are inherently susceptible to adversarial perturbations: by adding an imperceptible amount of noise to the input, an attacker can fool a network into making an incorrect prediction with high confidence (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al., 2016b; Szegedy et al., 2014). There has been a lot of work on building defenses against adversarial perturbations (Papernot et al., 2016; Kannan et al., 2018); the most commonly used defense is adversarial training, as proposed by Madry et al. (2018) and its variants (Zhang et al., 2019; Pang et al., 2020; Huang et al., 2020; Rice et al., 2020; Gowal et al., 2020), which use adversarially perturbed images as training data at each training step. Earlier studies (Kurakin et al., 2016a; Xie et al., 2019b) showed that using adversarial samples during training degrades performance on clean images. However, AdvProp (Xie et al., 2019a) challenged this observation by showing that adversarial training can act as a regularizer, and therefore improve nominal accuracy, when dual batch normalization (BatchNorm) layers (Ioffe & Szegedy, 2015) are used to disentangle the clean and adversarial distributions. We draw attention to the broad similarity between the AdvProp approach and the adapters literature (Rebuffi et al., 2017; Houlsby et al., 2019), in which a single backbone network is trained on multiple domains by means of adapters: a few parameters specific to each domain are trained separately while the rest are shared. In light of this comparison, we further develop the line of work introduced by AdvProp and analyze it from an adapter perspective. In particular, we explore various adapters and aim to obtain the best classification performance with minimal additional parameters.
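The co-training objective described above can be sketched as follows. This is a minimal, hedged illustration of adversarial training combined with a clean-image loss, in the spirit of AdvProp: the tiny linear softmax classifier, the PGD hyper-parameters, and all variable names are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(W, x, y):
    # Mean cross-entropy of a linear classifier with logits = x @ W.
    p = softmax(x @ W)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def input_grad(W, x, y):
    # Gradient of the loss w.r.t. the input x, used to craft perturbations.
    p = softmax(x @ W)
    p[np.arange(len(y)), y] -= 1.0
    return (p @ W.T) / len(y)

def pgd_attack(W, x, y, eps=0.1, step=0.05, iters=5):
    # Projected gradient descent within an L-infinity ball of radius eps.
    x_adv = x.copy()
    for _ in range(iters):
        g = input_grad(W, x_adv, y)
        x_adv = np.clip(x_adv + step * np.sign(g), x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.1
x = rng.normal(size=(8, 4))
y = rng.integers(0, 3, size=8)

x_adv = pgd_attack(W, x, y)
# Co-training objective: average of the clean and adversarial losses.
loss = 0.5 * cross_entropy(W, x, y) + 0.5 * cross_entropy(W, x_adv, y)
```

In an actual training loop, `loss` would be differentiated with respect to `W` at every step; the sketch only shows how the adversarial images enter the objective alongside the clean ones.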
Our contributions are as follows:
• We show that, in order to benefit from co-training on clean and adversarial samples, it is not necessary to separate the batch statistics of clean and adversarial images in BatchNorm layers. We demonstrate empirically that domain-specific trainable parameters are enough to achieve similar results.

* Work done during an internship at DeepMind
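The adapter view above, and the adversarial model soups from the abstract, can be sketched in a few lines. This is a hedged toy illustration, assuming a single domain-specific "token" per domain in front of shared weights as a stand-in for a ViT classification token; the dimensions and names are invented for the example, not taken from the paper.

```python
import numpy as np

def forward(x, token, W_shared):
    # Prepend a domain-specific token to the input features, then apply the
    # shared weights. Only `token` differs between the clean and adversarial
    # branches; W_shared is shared across both domains.
    t = np.broadcast_to(token, (x.shape[0], token.shape[0]))
    return np.concatenate([t, x], axis=1) @ W_shared

rng = np.random.default_rng(1)
d_tok, d_in, n_cls = 2, 4, 3
W_shared = rng.normal(size=(d_tok + d_in, n_cls))
token_clean = rng.normal(size=d_tok)  # adapter trained on clean inputs
token_adv = rng.normal(size=d_tok)    # adapter trained on adversarial inputs

# Adversarial model soup: linearly interpolate the two tokens.
alpha = 0.5
token_soup = alpha * token_clean + (1 - alpha) * token_adv

x = rng.normal(size=(5, d_in))
logits = forward(x, token_soup, W_shared)
```

Because the token enters this toy model linearly, souping the tokens is equivalent to interpolating the two branches' logits; sweeping `alpha` then trades off between the clean-trained and adversarially-trained behavior without running two models.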

