BAG OF TRICKS FOR FGSM ADVERSARIAL TRAINING

Anonymous authors

Abstract

Adversarial training (AT) with samples generated by the Fast Gradient Sign Method (FGSM), also known as FGSM-AT, is a computationally simple method to train robust networks. However, an unstable training mode called "catastrophic overfitting" has been identified in this procedure (Wong et al., 2020), where the robust accuracy abruptly drops to zero within a single training step. Existing methods use gradient regularizers or random initialization tricks to attenuate this issue, but they either incur high computational cost or lead to lower robust accuracy. In this work, we provide the first study that thoroughly examines a collection of tricks from three perspectives: Data Initialization, Network Structure, and Optimization, to overcome catastrophic overfitting in FGSM-AT. Surprisingly, we find that simple tricks, i.e., a) masking partial pixels (even without randomness), b) setting a large convolution stride and smooth activation functions, or c) regularizing the weights of the first convolutional layer, can effectively tackle the overfitting issue. Extensive results on a range of network architectures validate the effectiveness of each proposed trick, and combinations of tricks are also investigated. For example, trained with PreActResNet-18 on CIFAR-10, our method attains 49.8% accuracy against a PGD-50 attacker and 46.4% accuracy against AutoAttack, demonstrating that pure FGSM-AT is capable of enabling robust learners.

1. INTRODUCTION

Convolutional neural networks (CNNs), though achieving compelling performance on various visual recognition tasks, are vulnerable to adversarial perturbations (Szegedy et al., 2014). To defend against such malicious attacks, adversarial examples can be used as training data to enhance model robustness, a process known as adversarial training (AT). One of the leading approaches to generating adversarial examples is to perturb the input using the sign of the image gradients, namely the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015). Adversarial training with FGSM (FGSM-AT) is computationally efficient, and it lays the foundation for many follow-ups (Kurakin et al., 2017; Tramèr et al., 2018; Madry et al., 2018; Xie et al., 2019; Zhang et al., 2019). Nonetheless, FGSM-AT is rarely used today because of catastrophic overfitting: model robustness collapses after a few training epochs (Wong et al., 2020).

Several methods have been proposed to mitigate catastrophic overfitting and stabilize FGSM-AT. For instance, Wong et al. (2020) add uniform random noise to images before generating adversarial examples, effectively turning the FGSM attacker into a PGD-1 attacker. Andriushchenko & Flammarion (2020) propose GradAlign, which regularizes AT by explicitly maximizing the gradient alignment of the perturbations. While these approaches successfully alleviate catastrophic overfitting, they still have limitations. For example, GradAlign requires an extra forward pass compared to vanilla FGSM-AT, which significantly increases the computational cost; Fast-AT (Wong et al., 2020) yields relatively lower robustness, and may still collapse when used to train larger networks or under larger-perturbation settings. In this paper, we aim to develop more effective and computationally efficient solutions for attenuating catastrophic overfitting.
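For concreteness, the FGSM attack can be sketched as follows. This is a minimal NumPy illustration on a toy linear softmax classifier, where the input gradient is computed analytically; in practice the gradient would come from a network's backward pass, and the clipping range [0, 1] is an assumed pixel range.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_gradient(W, x, y):
    # Cross-entropy loss gradient w.r.t. the input for a linear classifier:
    # dL/dx = W^T (softmax(Wx) - onehot(y)).
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p

def fgsm(W, x, y, eps):
    # FGSM: one step of size eps along the sign of the input gradient,
    # then clip back to the (assumed) valid pixel range [0, 1].
    return np.clip(x + eps * np.sign(input_gradient(W, x, y)), 0.0, 1.0)
```

Since sign(grad) has entries in {-1, 0, +1}, the perturbation is bounded elementwise by eps, i.e., the adversarial example stays inside the L-infinity ball of radius eps around the clean input.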
Specifically, we revisit FGSM-AT and seek to stabilize its training from the following three perspectives:

• Data Initialization. Following the idea of adding random perturbations (Madry et al., 2018; Wong et al., 2020), we propose to randomly mask a subset of the input pixels to stabilize FGSM-AT.
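The masking trick above can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: setting masked pixels to zero and the `mask_ratio` hyperparameter are assumptions for the sketch.

```python
import numpy as np

def mask_pixels(x, mask_ratio, rng):
    # Zero out a random subset of input pixels before running FGSM.
    # Each pixel is dropped independently with probability mask_ratio
    # (the masking value of zero is an assumption for this sketch).
    keep = rng.uniform(size=x.shape) >= mask_ratio
    return x * keep
```

The masked image would then be fed to the FGSM attacker in place of the clean image when generating the training perturbation.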

