BAG OF TRICKS FOR ADVERSARIAL TRAINING

Abstract

Adversarial training (AT) is one of the most effective strategies for promoting model robustness. However, recent benchmarks show that most of the proposed improvements to AT are less effective than simply early stopping the training procedure. This counter-intuitive fact motivates us to investigate the implementation details of dozens of AT methods. Surprisingly, we find that the basic settings (e.g., weight decay, training schedule, etc.) used in these methods are highly inconsistent. In this work, we provide comprehensive evaluations on CIFAR-10, focusing on the effects of mostly overlooked training tricks and hyperparameters for adversarially trained models. Our empirical observations suggest that adversarial robustness is much more sensitive to some basic training settings than previously thought. For example, a slightly different value of weight decay can reduce the model's robust accuracy by more than 7%, which is likely to override the potential gains from the proposed methods themselves. We distill a baseline training setting and re-implement previous defenses to achieve new state-of-the-art results.¹ These findings also call for more attention to the overlooked confounders when benchmarking defenses.

1. INTRODUCTION

Adversarial training (AT) has been one of the most effective defense strategies against adversarial attacks (Biggio et al., 2013; Szegedy et al., 2014; Goodfellow et al., 2015). Building on primary AT frameworks such as PGD-AT (Madry et al., 2018), many improvements have been proposed from different perspectives and have demonstrated promising results (detailed in Sec. 2). However, recent benchmarks (Croce & Hein, 2020b; Chen & Gu, 2020) find that simply early stopping the training procedure of PGD-AT (Rice et al., 2020) can attain the gains from almost all the previously proposed improvements, including the state-of-the-art TRADES (Zhang et al., 2019b). This fact is somewhat striking, since TRADES also executes early stopping (one epoch after decaying the learning rate) in its code implementation. Besides, the robustness of PGD-AT reported in Rice et al. (2020) is much higher than in Madry et al. (2018), even without early stopping. This paradox motivates us to check the implementation details of these seminal works. We find that TRADES uses a weight decay of 2 × 10⁻⁴, Gaussian PGD initialization δ₀ ∼ N(0, αI), and the eval mode of batch normalization (BN) when crafting adversarial examples, while Rice et al. (2020) use a weight decay of 5 × 10⁻⁴, uniform PGD initialization δ₀ ∼ U(−ε, ε), and the train mode of BN to generate adversarial examples. In our experiments on CIFAR-10 (e.g., Table 8), these two slightly different settings can change the robust accuracy by ∼5%, which is significant according to the reported benchmarks.

To conduct a comprehensive study, we further investigate the implementation details of dozens of papers on AT methods, some of which are summarized in Table 1. We find that even when using the same model architectures, the basic hyperparameter settings (e.g., weight decay, learning rate schedule, etc.)
used in these papers are highly inconsistent and customized, which could affect the model performance and may override the gains from the methods themselves. In this situation, if we directly benchmark these methods using their released code or checkpoints, some actually effective improvements would be underestimated due to improper hyperparameter settings.

Our contributions. We evaluate the effects of a wide range of basic training tricks (e.g., warmup, early stopping, weight decay, batch size, BN mode, etc.) on adversarially trained models. Our empirical results suggest that improper training settings can largely degrade the model performance,
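To make the initialization difference concrete, the two PGD starting points contrasted above can be sketched as follows. This is a minimal, framework-free Python illustration, not code from any of the cited papers: the function names, the coordinate-wise sampling (treating α as a per-coordinate standard deviation for simplicity), and the ℓ∞ projection step are our own illustrative conventions, with α and ε playing the roles of the parameters in δ₀ ∼ N(0, αI) and δ₀ ∼ U(−ε, ε).

```python
import random


def init_delta_gaussian(n, alpha, rng):
    """TRADES-style start: each coordinate of delta_0 drawn from N(0, alpha)."""
    return [rng.gauss(0.0, alpha) for _ in range(n)]


def init_delta_uniform(n, eps, rng):
    """Rice et al. (2020)-style start: each coordinate drawn from U(-eps, eps)."""
    return [rng.uniform(-eps, eps) for _ in range(n)]


def project_linf(delta, eps):
    """Clip the perturbation back into the l_inf ball of radius eps (standard PGD projection)."""
    return [min(max(d, -eps), eps) for d in delta]


rng = random.Random(0)
eps = 8 / 255  # a common l_inf budget on CIFAR-10
n = 3 * 32 * 32  # flattened CIFAR-10 image

# The Gaussian start is unbounded, so it is projected before the first PGD step;
# the uniform start already lies inside the eps-ball by construction.
d_gauss = project_linf(init_delta_gaussian(n, alpha=eps, rng=rng), eps)
d_unif = init_delta_uniform(n, eps=eps, rng=rng)
```

Both schemes yield a valid starting point inside the ε-ball, but they induce different distributions over starting points; whether a projected Gaussian or a uniform start is used is exactly the kind of overlooked setting whose effect this work measures.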

