TOWARDS UNDERSTANDING FAST ADVERSARIAL TRAINING

Abstract

Current neural-network-based classifiers are susceptible to adversarial examples. The most empirically successful defense against such adversarial examples is adversarial training, which incorporates a strong self-attack during training to enhance the model's robustness. This approach, however, is computationally expensive and hence hard to scale up. A recent method, called fast adversarial training, has shown that it is possible to markedly reduce computation time without sacrificing significant performance. This approach uses simple self-attacks, yet it can only run for a limited number of training epochs, resulting in sub-optimal performance. In this paper, we conduct experiments to understand the behavior of fast adversarial training and show that the key to its success is the ability to recover from overfitting to weak attacks. We then extend our findings to improve fast adversarial training, demonstrating robust accuracy superior to that of strong PGD adversarial training, with much-reduced training time.

1. INTRODUCTION

Adversarial examples are carefully crafted versions of the original data that successfully mislead a classifier (Szegedy et al., 2013), while appearing essentially unchanged to most humans. Although deep neural networks have achieved impressive success on a variety of challenging machine learning tasks, the existence of such adversarial examples has hindered the application of deep neural networks and drawn great attention in the deep-learning community. Empirically, the most successful defense thus far is based on Projected Gradient Descent (PGD) adversarial training (Goodfellow et al., 2014; Madry et al., 2017), which augments the data of interest with strong adversarial examples to help improve model robustness. Although effective, this approach is not efficient and may take multiple days to train a moderately large model. On the other hand, one of the early versions of adversarial training, based on the weaker Fast Gradient Sign Method (FGSM) attack, is much more efficient but suffers from "catastrophic overfitting," a phenomenon where the robust accuracy with respect to strong attacks suddenly drops to almost zero during training (Tramèr et al., 2017; Wong et al., 2019), and thus fails to provide robustness against strong attacks. Fast adversarial training (Wong et al., 2019) is a simple modification to FGSM adversarial training that mitigates this issue: by initializing FGSM attacks with large randomized perturbations, it can efficiently obtain models that are robust against strong attacks. Although the modification is simple, the underlying reason for its success remains unclear. Moreover, fast adversarial training is only compatible with a cyclic learning rate schedule (Smith & Topin, 2019) and a limited number of training epochs, resulting in sub-optimal robust accuracy compared to PGD adversarial training (Rice et al., 2020).
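To make the distinction concrete, the randomized-start FGSM step at the heart of fast adversarial training can be sketched as follows. This is a minimal NumPy sketch on a toy linear model with an analytic input gradient; the function names, the toy model, and the particular step size are illustrative assumptions, not taken from the paper (Wong et al. (2019) report using a step size somewhat larger than the perturbation radius).

```python
import numpy as np

def fgsm_random_init(x, grad_fn, eps, alpha, rng):
    """One-step FGSM attack with a large randomized start
    (the attack used by fast adversarial training).

    x       : clean input, shape (d,)
    grad_fn : maps an input to the loss gradient w.r.t. that input
    eps     : L-infinity radius of the allowed perturbation
    alpha   : FGSM step size
    """
    # Initialize the perturbation uniformly in [-eps, eps]^d;
    # vanilla FGSM instead starts from delta = 0.
    delta = rng.uniform(-eps, eps, size=x.shape)
    # Single signed-gradient ascent step on the loss.
    delta = delta + alpha * np.sign(grad_fn(x + delta))
    # Project back onto the L-infinity ball of radius eps.
    delta = np.clip(delta, -eps, eps)
    return x + delta

# Illustrative toy "classifier": loss(x) = -y * (w @ x), whose input
# gradient is the constant -y * w (a stand-in for backpropagation).
w = np.array([1.0, -2.0, 0.5])
y = 1.0
loss = lambda z: -y * (w @ z)
grad_fn = lambda z: -y * w

x = np.array([0.3, 0.1, -0.2])
rng = np.random.default_rng(0)
x_adv = fgsm_random_init(x, grad_fn, eps=0.1, alpha=0.125, rng=rng)
assert np.all(np.abs(x_adv - x) <= 0.1 + 1e-12)  # stays inside the ball
assert loss(x_adv) > loss(x)                     # loss increases
```

The only change relative to vanilla FGSM is the first line of the attack: starting from a uniformly random point in the perturbation ball rather than from zero, at the cost of a single gradient computation either way.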
When fast adversarial training runs for a large number of epochs, it still suffers from catastrophic overfitting, similar to vanilla FGSM adversarial training. It therefore remains an open problem to obtain the effectiveness of PGD adversarial training and the efficiency of FGSM adversarial training simultaneously. In this paper, we conduct experiments to show that the key to the success of fast adversarial training is not avoiding catastrophic overfitting, but being able to recover the robustness of the model when catastrophic overfitting occurs. We then utilize this understanding to propose a simple fix to fast adversarial training that makes it possible to train for a large number of epochs, without sacrificing efficiency, and we demonstrate that this yields improved performance.

Under review as a conference paper at ICLR 2021

We also revisit a previously developed technique, FGSM adversarial training as a warmup (Wang et al., 2019), and combine it with our training strategy to further improve performance with small additional computational overhead. The resulting method outperforms the state-of-the-art approach, PGD adversarial training (Rice et al., 2020), while consuming much less training time. Our contributions are summarized as follows:

• We conduct experiments to explain both the success and the failure of fast adversarial training in various cases.

• We propose an alternative training strategy as a fix to fast adversarial training; it is equally efficient, but allows training for a large number of epochs and hence achieves better performance.

• We propose to utilize the improved fast adversarial training as a warmup for PGD adversarial training, outperforming the state-of-the-art adversarial robustness with reduced computation.

2. BACKGROUND AND RELATED WORK
However, a recent study (Rice et al., 2020) conducted extensive experiments on adversarially trained models and demonstrated that the performance gain from almost all recently proposed algorithmic modifications to PGD adversarial training is no better than a simple piecewise learning rate schedule with early stopping to prevent overfitting. In addition to adversarial training, a great number of adversarial defenses have been proposed, yet most remain vulnerable to stronger attacks (Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Papernot et al., 2016; Kurakin et al., 2016; Carlini & Wagner, 2017; Brendel et al., 2017; Athalye et al., 2018). A major drawback of many defensive models is that they are heuristic and vulnerable to adaptive attacks specifically designed to break them (Carlini et al., 2019; Tramer et al., 2020). To address this concern, many works have focused on providing provable/certified robustness of deep neural networks (Hein & Andriushchenko, 2017; Raghunathan et al., 2018; Kolter & Wong, 2017; Weng et al., 2018; Zhang et al., 2018; Dvijotham et al., 2018; Wong et al., 2018; Wang et al., 2018; Lecuyer et al., 2018; Li et al., 2019; Cohen et al., 2019), yet their certifiable robustness cannot match the empirical robustness obtained by adversarial training.

Among all adversarial defenses that claim empirical adversarial robustness, PGD adversarial training has stood the test of time. The only major caveat of PGD adversarial training is its computational cost, due to the iterative attacks at each training step. Many recent works try to reduce this computational overhead. Shafahi et al. (2019) propose to update adversarial perturbations and model parameters simultaneously; by performing multiple updates on the same batch, it is possible to imitate PGD adversarial training at accelerated training speed. Zhang et al. (2019a) remove redundant calculations during back-propagation when constructing adversarial examples, to reduce computational overhead. Recently, Wong et al. (2019) showed the surprising result that FGSM adversarial training can obtain strongly robust models if a large randomized initialization is used for FGSM attacks. However, they are forced to use a cyclic learning rate schedule (Smith & Topin, 2019) and a small number of training epochs. This issue limits its performance, especially when compared to state-of-the-art PGD adversarial training with early stopping (Rice et al., 2020).

