REVISITING ADAPTERS WITH ADVERSARIAL TRAINING

Abstract

While adversarial training is generally used as a defense mechanism, recent works show that it can also act as a regularizer. By co-training a deep network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs. We demonstrate that, contrary to previous findings, it is not necessary to separate batch statistics when co-training on clean and adversarial inputs, and that it is sufficient to use adapters with few domain-specific parameters for each type of input. We establish that using the classification token of a Vision Transformer (VIT) as an adapter is enough to match the classification performance of dual normalization layers, while using significantly fewer additional parameters. First, we improve upon the top-1 accuracy of a non-adversarially trained VIT-B16 model by +1.12% on IMAGENET (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens. These model soups, which we call adversarial model soups, allow us to trade off between clean and robust accuracy without sacrificing efficiency. Finally, we show that we can easily adapt the resulting models in the face of distribution shifts. Our VIT-B16 obtains top-1 accuracies on IMAGENET variants that are on average +4.00% better than those obtained with Masked Autoencoders.

1. INTRODUCTION

Deep networks are inherently susceptible to adversarial perturbations. Adversarial perturbations fool deep networks by adding an imperceptible amount of noise that leads to an incorrect prediction with high confidence (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al., 2016b; Szegedy et al., 2014). There has been a lot of work on building defenses against adversarial perturbations (Papernot et al., 2016; Kannan et al., 2018); the most commonly used defense is adversarial training as proposed by Madry et al. (2018) and its variants (Zhang et al., 2019; Pang et al., 2020; Huang et al., 2020; Rice et al., 2020; Gowal et al., 2020), which use adversarially perturbed images at each training step as training data. Earlier studies (Kurakin et al., 2016a; Xie et al., 2019b) showed that using adversarial samples during training leads to performance degradation on clean images. However, AdvProp (Xie et al., 2019a) challenged this observation by showing that adversarial training can act as a regularizer, and therefore improve nominal accuracy, when using dual batch normalization (BatchNorm) layers (Ioffe & Szegedy, 2015) to disentangle the clean and adversarial distributions. We draw attention to the broad similarity between the AdvProp approach and the adapters literature (Rebuffi et al., 2017; Houlsby et al., 2019), where a single backbone network is trained on multiple domains by means of adapters: a few parameters specific to each domain are trained separately while the rest of the parameters are shared. In light of this comparison, we further develop the line of work introduced by AdvProp and analyze it from an adapter perspective. In particular, we explore various adapters and aim to obtain the best classification performance with minimal additional parameters.
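To make the dual-normalization idea concrete, here is a minimal NumPy sketch (our illustration, not the AdvProp implementation): a single normalization layer keeps separate running statistics per input domain while sharing the affine scale and shift.

```python
import numpy as np

class DualBatchNorm:
    """AdvProp-style dual normalization (illustrative sketch): separate
    running statistics for 'clean' and 'adv' inputs, shared affine params."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.stats = {d: {"mean": np.zeros(num_features),
                          "var": np.ones(num_features)}
                      for d in ("clean", "adv")}
        self.gamma = np.ones(num_features)   # shared scale
        self.beta = np.zeros(num_features)   # shared shift
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, domain, training=True):
        s = self.stats[domain]
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Update the running statistics of this domain only.
            s["mean"] = self.momentum * s["mean"] + (1 - self.momentum) * mean
            s["var"] = self.momentum * s["var"] + (1 - self.momentum) * var
        else:
            mean, var = s["mean"], s["var"]
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = DualBatchNorm(4)
rng = np.random.default_rng(0)
y_clean = bn(rng.normal(loc=+5.0, size=(8, 4)), "clean")
y_adv = bn(rng.normal(loc=-5.0, size=(8, 4)), "adv")
```

At inference time, passing `domain="clean"` or `domain="adv"` selects the corresponding statistics, which is how AdvProp keeps the two input distributions disentangled.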
Our contributions are as follows:
• We show that, in order to benefit from co-training on clean and adversarial samples, it is not necessary to separate the batch statistics of clean and adversarial images in BatchNorm layers. We demonstrate empirically that it is enough to use domain-specific trainable parameters to achieve similar results.
• Inspired by the adapters literature, we evaluate various adapters. We show that training separate classification tokens of a VIT for the clean and adversarial domains is enough to match the classification performance of dual normalization layers with 49× fewer domain-specific parameters. This classification token acts as a conditioning token which can modify the behaviour of the network to be either in clean or robust mode (Figure 1).

2018) propose an adapter layer whose weights are generated by a conditioning network. Besides computer vision, adapters are also used in natural language processing for efficient fine-tuning (Houlsby et al., 2019; Pfeiffer et al., 2020; Wang et al., 2020) and multi-task learning (Stickland & Murray, 2019).

Merging multiple models. While ensembles are a popular and successful way to combine multiple independently trained classifiers to improve on individual performance (Ovadia et al., 2019; Gontijo-Lopes et al., 2021), they increase the inference cost as they require a forward pass for each sub-network.
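The classification-token adapter can be sketched in toy form as follows (NumPy; the "backbone" below is a crude stand-in for a transformer, and all sizes and names are ours): only the token is duplicated per domain, while everything else is shared.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy embedding dimension

# Domain-specific parameters: one classification token per domain (2*d values).
cls_tokens = {"clean": rng.normal(size=d), "adv": rng.normal(size=d)}

# Shared "backbone" weights (d*d values each) -- the bulk of the model.
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def forward(patches, mode):
    """Prepend the domain-specific token, run the shared backbone,
    and read the representation off the token position."""
    seq = np.vstack([cls_tokens[mode], patches])
    h = np.tanh(seq @ W1)           # shared token-wise transform
    mixed = h + h.mean(axis=0)      # crude stand-in for attention mixing
    return np.tanh(mixed @ W2)[0]   # representation at the token position

patches = rng.normal(size=(4, d))
feat_clean = forward(patches, "clean")  # "clean mode"
feat_adv = forward(patches, "adv")      # "robust mode"
```

Swapping the token flips the network between modes at no extra inference cost, and the domain-specific parameter count stays at 2·d, negligible next to the shared weights.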



See https://github.com/amazon-research/normalizer-free-robust-training/issues/2.



• Unlike Xie et al. (2019a) and Herrmann et al. (2022), we also aim at preserving the robust performance of the network against adversarial attacks. We show that our conditional token can obtain SOTA nominal accuracy in the clean mode while at the same time achieving competitive ℓ∞-robustness in the robust mode. As a by-product of our study, we show that adversarial training of VIT-B16 on IMAGENET leads to state-of-the-art robustness against ℓ∞-norm bounded perturbations of size 4/255.
• We empirically demonstrate that training with adapters enables model soups (Wortsman et al., 2022). This allows us to introduce adversarial model soups, models that trade off between clean and robust accuracy through linear interpolation of the clean and adversarial adapters. To the best of our knowledge, our work is the first to study adversarial model soups. We also show that adversarial model soups perform better on IMAGENET variants than the state-of-the-art with masked auto-encoding (He et al., 2022).

While the main drawback of adversarial training is the degradation of performance of robust models on clean images (Tsipras et al., 2018), Xie et al. (2019a) showed that adversarial images can be leveraged as a strong regularizer to improve the clean accuracy of classifiers on IMAGENET. In particular, they propose AdvProp, which introduces separate BatchNorm layers specific to clean or adversarial inputs, with the remaining layers being shared. This approach and the role of normalization layers when training with both clean and adversarial points have been further studied by Xie & Yuille (2019) and Walter et al. (2022). Recently, Wang et al. (2022) suggested removing BatchNorm layers from the standard RESNET architecture (He et al., 2016) to retain high clean accuracy with adversarial training, but this negatively affects the robustness against stronger attacks.
Finally, Kireev et al. (2021) and Herrmann et al. (2022) showed that carefully tuning the threat model in adversarial training might improve the performance on clean images and in the presence of distribution shifts, such as common corruptions (Hendrycks & Dietterich, 2018).
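In the adapter setting described above, an adversarial model soup reduces to a linear interpolation of the two classification tokens; a minimal sketch (names and sizes are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
cls_clean = rng.normal(size=d)  # token trained on clean inputs
cls_adv = rng.normal(size=d)    # token trained on adversarial inputs

def soup_token(alpha):
    """Linear interpolation of the domain-specific tokens; all other
    weights are shared, so each alpha yields a single deployable model
    trading off clean accuracy (alpha=0) against robustness (alpha=1)."""
    return (1.0 - alpha) * cls_clean + alpha * cls_adv

token = soup_token(0.5)  # one intermediate point on the trade-off curve
```

Because only the token changes, sweeping alpha explores the clean/robust trade-off without retraining and without any ensemble-style increase in inference cost.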

