BAG OF TRICKS FOR ADVERSARIAL TRAINING

Abstract

Adversarial training (AT) is one of the most effective strategies for promoting model robustness. However, recent benchmarks show that most of the proposed improvements on AT are less effective than simply early stopping the training procedure. This counter-intuitive fact motivates us to investigate the implementation details of tens of AT methods. Surprisingly, we find that the basic settings (e.g., weight decay, training schedule, etc.) used in these methods are highly inconsistent. In this work, we provide comprehensive evaluations on CIFAR-10, focusing on the effects of mostly overlooked training tricks and hyperparameters for adversarially trained models. Our empirical observations suggest that adversarial robustness is much more sensitive to some basic training settings than previously thought. For example, a slightly different value of weight decay can reduce the robust accuracy of a model by more than 7%, which is likely to override the potential improvement induced by the proposed methods. We distill a baseline training setting and re-implement previous defenses with it to achieve new state-of-the-art results. These facts also call for more attention to the overlooked confounders when benchmarking defenses.

1. INTRODUCTION

Adversarial training (AT) has been one of the most effective defense strategies against adversarial attacks (Biggio et al., 2013; Szegedy et al., 2014; Goodfellow et al., 2015). Based on primary AT frameworks like PGD-AT (Madry et al., 2018), many improvements have been proposed from different perspectives and demonstrate promising results (detailed in Sec. 2). However, recent benchmarks (Croce & Hein, 2020b; Chen & Gu, 2020) find that simply early stopping the training procedure of PGD-AT (Rice et al., 2020) can attain the gains from almost all of the previously proposed improvements, including the state-of-the-art TRADES (Zhang et al., 2019b). This fact is somewhat striking, since TRADES also executes early stopping (one epoch after decaying the learning rate) in its code implementation. Besides, the reported robustness of PGD-AT in Rice et al. (2020) is much higher than in Madry et al. (2018), even without early stopping. This paradox motivates us to check the implementation details of these seminal works. We find that TRADES uses weight decay of 2 × 10^-4, Gaussian PGD initialization δ_0 ∼ N(0, αI), and the eval mode of batch normalization (BN) when crafting adversarial examples, while Rice et al. (2020) use weight decay of 5 × 10^-4, uniform PGD initialization δ_0 ∼ U(−ε, ε), and the train mode of BN to generate adversarial examples. In our experiments on CIFAR-10 (e.g., Table 8), these two slightly different settings can lead to a gap in robust accuracy of ∼5%, which is significant according to the reported benchmarks. For a comprehensive study, we further investigate the implementation details of tens of papers on AT methods, some of which are summarized in Table 1. We find that even when using the same model architectures, the basic hyperparameter settings (e.g., weight decay, learning rate schedule, etc.) used in these papers are highly inconsistent and customized, which could affect model performance and may override the gains from the methods themselves. Under this situation, if we directly benchmark these methods using their released code or checkpoints, some actually effective improvements would be underestimated due to improper hyperparameter settings.

Our contributions. We evaluate the effects of a wide range of basic training tricks (e.g., warmup, early stopping, weight decay, batch size, BN mode, etc.) on adversarially trained models. Our empirical results suggest that improper training settings can largely degrade model performance,
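As a minimal illustration (not the authors' or either paper's actual code), the two PGD initialization conventions contrasted above can be sketched in NumPy; `eps` and `alpha` are placeholder hyperparameters, and α is treated as the Gaussian standard deviation here as a simplifying assumption:

```python
import numpy as np

def pgd_init(shape, eps, mode="uniform", alpha=None, rng=None):
    """Draw the initial perturbation delta_0 for a PGD attack.

    Two conventions from the literature:
      - "uniform":  delta_0 ~ U(-eps, eps)   (as in Rice et al., 2020)
      - "gaussian": delta_0 ~ N(0, alpha I)  (as in TRADES; alpha is used
                    as the standard deviation in this sketch)
    Either way, the result is projected back into the l_inf ball of
    radius eps before the PGD iterations begin.
    """
    rng = np.random.default_rng() if rng is None else rng
    if mode == "uniform":
        delta = rng.uniform(-eps, eps, size=shape)
    elif mode == "gaussian":
        delta = rng.normal(0.0, alpha, size=shape)
    else:
        raise ValueError(f"unknown init mode: {mode}")
    # Projection onto the l_inf eps-ball (coordinate-wise clipping).
    return np.clip(delta, -eps, eps)
```

Note that with a small α (TRADES uses a tiny Gaussian scale), the initial point stays close to the clean input, whereas the uniform convention spreads the start over the whole ε-ball; this alone changes the attack used during training.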


Published as a conference paper at ICLR 2021

while this degeneration may be mistakenly ascribed to the methods themselves. We provide a baseline recipe for PGD-AT on CIFAR-10 as an example, and demonstrate the generality of the recipe by training other frameworks like TRADES with it. As seen in Table 16, the retrained TRADES achieves new state-of-the-art performance on the AutoAttack benchmark (Croce & Hein, 2020b). Although our empirical conclusions may not generalize to other datasets or tasks, we reveal that adversarially trained models can be sensitive to certain training settings that are usually neglected in previous work. These results also encourage the community to re-implement previously proposed defenses with fine-tuned training settings to better explore their potential.

Table 1: Hyperparameter settings and tricks used to implement different AT methods on CIFAR-10. We convert the training steps into epochs, and provide code links for reference in Table 11. Compared to the model architectures, the listed settings are easily neglected and rarely unified.
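The BN-mode discrepancy noted in the introduction (eval vs. train statistics when crafting adversarial examples) can be made concrete with a minimal NumPy batch-norm sketch; this is illustrative only, not any framework's implementation:

```python
import numpy as np

class SimpleBatchNorm:
    """Minimal 1-D batch norm illustrating the train/eval distinction.

    In train mode, normalization uses the statistics of the current batch
    (which include the adversarial perturbation being crafted); in eval
    mode, it uses running statistics accumulated from training data.
    The two modes therefore define different functions of the input,
    and hence different attack gradients during adversarial training.
    """

    def __init__(self, dim, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(dim)
        self.running_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Use (and fold in) the current batch statistics.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Use the frozen running statistics.
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)
```

Running the same batch through both modes generally yields different activations, which is why the choice of BN mode during attack generation can act as a hidden confounder between implementations.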

2. RELATED WORK

In this section, we introduce related work on adversarial defenses and recent benchmarks. Adversarial attacks are detailed in Appendix A.1.

2.1. ADVERSARIAL DEFENSES

To alleviate the adversarial vulnerability of deep learning models, many defense strategies have been proposed, but most of them can eventually be evaded by adaptive attacks (Carlini & Wagner, 2017b; Athalye et al., 2018). Other more theoretically grounded routines include training provably robust networks (Dvijotham et al., 2018a;b; Hein & Andriushchenko, 2017; Wong & Kolter, 2018) and obtaining certified models via randomized smoothing (Cohen et al., 2019). While these methods are promising, they currently do not match the state-of-the-art robustness under empirical evaluations. The idea of adversarial training (AT) stems from the seminal work of Goodfellow et al. (2015), while other AT frameworks like PGD-AT (Madry et al., 2018) and TRADES (Zhang et al., 2019b) produced the winning solutions in adversarial competitions (Kurakin et al., 2018; Brendel et al., 2020). Based on these primary AT frameworks, many improvements have been proposed by encoding mechanisms inspired by other domains, including ensemble learning (Tramèr et al., 2018; Pang et al., 2019), metric learning (Mao et al., 2019; Li et al., 2019; Pang et al., 2020c), generative modeling (Jiang et al., 2018; Pang et al., 2018b; Wang & Yu, 2019; Deng et al., 2020), semi-supervised learning (Carmon et al., 2019; Alayrac et al., 2019; Zhai et al., 2019), and self-supervised

