BAG OF TRICKS FOR FGSM ADVERSARIAL TRAINING

Anonymous

Abstract

Adversarial training (AT) with samples generated by the Fast Gradient Sign Method (FGSM), also known as FGSM-AT, is a computationally simple method for training robust networks. However, an unstable training mode dubbed "catastrophic overfitting" has been identified in (Wong et al., 2020), where the robust accuracy abruptly drops to zero within a single training step. Existing methods use gradient regularizers or random initialization tricks to attenuate this issue, but they either incur a high computational cost or lead to lower robust accuracy. In this work, we provide the first study that thoroughly examines a collection of tricks, from three perspectives (Data Initialization, Network Structure, and Optimization), to overcome catastrophic overfitting in FGSM-AT. Surprisingly, we find that simple tricks, i.e., a) masking partial pixels (even without randomness), b) setting a large convolution stride and using smooth activation functions, or c) regularizing the weights of the first convolutional layer, can effectively tackle the overfitting issue. Extensive results on a range of network architectures validate the effectiveness of each proposed trick, and combinations of tricks are also investigated. For example, trained with PreActResNet-18 on CIFAR-10, our method attains 49.8% accuracy against the PGD-50 attack and 46.4% accuracy against AutoAttack, demonstrating that pure FGSM-AT is capable of producing robust learners.

1. INTRODUCTION

Convolutional neural networks (CNNs), though achieving compelling performance on various visual recognition tasks, are vulnerable to adversarial perturbations (Szegedy et al., 2014). To defend against such malicious attacks, adversarial examples can be used as training data to enhance model robustness, a process known as adversarial training (AT). One of the leading approaches to generating adversarial examples is to perturb the data using the sign of the image gradients, namely the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015). Adversarial training with FGSM (FGSM-AT) is computationally efficient, and it lays the foundation for many follow-ups (Kurakin et al., 2017; Tramèr et al., 2018; Madry et al., 2018; Xie et al., 2019; Zhang et al., 2019). Nonetheless, FGSM-AT is rarely used today because of catastrophic overfitting: model robustness collapses after a few training epochs (Wong et al., 2020).

Several methods have been proposed to mitigate catastrophic overfitting and stabilize FGSM-AT. For instance, (Wong et al., 2020) add uniformly random noise to images before generating adversarial examples, effectively turning the FGSM attack into a PGD-1 attack. (Andriushchenko & Flammarion, 2020) propose GradAlign, which regularizes AT by explicitly maximizing the gradient alignment of the perturbations. While these approaches successfully alleviate catastrophic overfitting, they still have limitations. For example, GradAlign requires an extra forward pass compared to vanilla FGSM-AT, which significantly increases the computational cost; Fast-AT (Wong et al., 2020) attains relatively lower robustness, and may still collapse when used to train larger networks or under larger perturbations. In this paper, we aim to develop more effective and computationally efficient solutions for attenuating catastrophic overfitting.
Specifically, we revisit FGSM-AT and aim to stabilize its training from the following three perspectives:

• Data Initialization. Following the idea of adding random perturbations (Madry et al., 2018; Wong et al., 2020), we propose to randomly mask a subset of the input pixels to stabilize FGSM-AT, dubbed FGSM-Mask. Surprisingly, further analysis suggests that the randomness of the masking process may not be necessary during training: applying a pre-defined masking pattern to the training set also effectively stabilizes FGSM-AT. The same observation holds for the perturbation-based attack initialization in (Wong et al., 2020), challenging the common belief that randomness is one of the key factors for stabilizing AT.

• Network Structure. We identify two architectural elements that affect FGSM-AT. First, in addition to boosting robustness as shown in (Xie et al., 2020), we find that a smoother activation function makes FGSM-AT more stable. Second, we find that vanilla FGSM-AT can effectively train Vision Transformers (ViTs) (Dosovitskiy et al., 2021) without catastrophic overfitting. We conjecture this phenomenon is related to how CNNs and ViTs extract features: CNNs typically extract features from overlapping image regions (i.e., stride size < kernel size in convolution), while ViTs extract features from non-overlapping image patches (i.e., stride size = kernel size in convolution). By simply increasing the stride of the first convolution layer in a CNN, we validate that the resulting model trains stably with FGSM-AT.

• Optimization. Inspired by GradAlign (Andriushchenko & Flammarion, 2020), we propose ConvNorm, a regularization term that constrains the weights of the first convolution layer to stabilize FGSM-AT. Unlike GradAlign, which introduces a significant amount of extra computation, ConvNorm is nearly as computationally efficient as vanilla FGSM-AT.

Our contributions.
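The FGSM-Mask trick above can be sketched in PyTorch. This is an illustrative assumption, not the paper's exact implementation: the function name `fgsm_with_mask`, the default masking ratio, and the choice to mask only the input used for the gradient computation are all ours.

```python
import torch
import torch.nn.functional as F


def fgsm_with_mask(model, x, y, eps, mask_ratio=0.4, fixed_mask=None):
    """Hypothetical FGSM-Mask sketch: compute the FGSM gradient on a
    partially masked copy of the input, then perturb the original image.

    mask_ratio: fraction of pixels zeroed out before the gradient step.
    fixed_mask: optional pre-defined binary mask; the paper observes that
        a fixed (non-random) pattern also stabilizes training.
    """
    if fixed_mask is None:
        # Draw a fresh random binary mask (1 = keep pixel, 0 = drop it).
        fixed_mask = (torch.rand_like(x) > mask_ratio).float()
    x_masked = (x * fixed_mask).clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_masked), y)
    (grad,) = torch.autograd.grad(loss, x_masked)
    delta = eps * grad.sign()
    # Apply the perturbation to the original (unmasked) image.
    return (x + delta).clamp(0, 1)
```

Passing a fixed binary mask instead of None reproduces the pre-defined, randomness-free variant discussed above.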
In summary, we discover a bag of tricks that effectively alleviates catastrophic overfitting in FGSM-AT from three different perspectives. We extensively validate the effectiveness of our methods with a range of network structures on the popular CIFAR-10/100 datasets, using different perturbation radii. Our results demonstrate that FGSM-AT alone is capable of producing robust learners. We hope this work can encourage future exploration of the potential of FGSM-AT.
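As a rough illustration of the ConvNorm idea, the sketch below adds a squared-Frobenius-norm penalty on the first convolutional layer's weights to the training loss. The exact form of the regularizer is an assumption here and may differ from the paper's; the sketch only shows why the extra cost is negligible (a norm over one weight tensor).

```python
import torch
import torch.nn as nn


def convnorm_penalty(model, lam=1e-3):
    """Hypothetical ConvNorm sketch: penalize the squared Frobenius norm
    of the first convolutional layer's weights. Computing a norm of a
    single weight tensor adds almost no cost per training step."""
    first_conv = next(m for m in model.modules() if isinstance(m, nn.Conv2d))
    return lam * first_conv.weight.norm(p=2) ** 2
```

In training, the term would simply be added to the FGSM-AT loss, e.g. `loss = F.cross_entropy(model(x_adv), y) + convnorm_penalty(model)`.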

2. PRELIMINARIES

Given a neural classifier f with parameters θ, we denote by x and y the input data and the corresponding label drawn from the data distribution D, respectively. δ represents the adversarial perturbation, ϵ is the maximum perturbation size under the l∞-norm constraint, and L is the cross-entropy loss typically used for image classification tasks.

Adversarial Training: (Madry et al., 2018) formulates adversarial training as a min-max optimization problem:

min_θ E_(x,y)∼D [ max_∥δ∥∞≤ϵ L(f_θ(x + δ), y) ].

Among the different attacks for generating adversarial examples, we choose two popular ones to study:

• FGSM: (Goodfellow et al., 2015) first proposes the Fast Gradient Sign Method (FGSM), which generates the perturbation δ in a single step:

δ = ϵ · sign(∇_x L(f_θ(x), y)).

• PGD: (Madry et al., 2018) proposes Projected Gradient Descent (PGD), a strong iterative version of FGSM with a random start:

x^(t+1) = Π_∥δ∥∞≤ϵ ( x^t + α · sign(∇_(x^t) L(f_θ(x^t), y)) ),

where α denotes the step size of each iteration, x^t denotes the adversarial example after t steps, and Π_∥δ∥∞≤ϵ denotes projection onto the ϵ-ball. Compared to FGSM, PGD is a better choice for generating adversarial examples, but it is also much more computationally expensive. In the following sections, we refer to adversarial training with FGSM as FGSM-AT and, correspondingly, with PGD as PGD-AT.
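The two attacks above can be written compactly in PyTorch. This is a generic textbook sketch under the formulas given here (function names and defaults are ours), not code from the paper:

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """Single-step attack: delta = eps * sign(grad_x L(f(x), y))."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()


def pgd(model, x, y, eps, alpha, steps):
    """Iterative attack with a random start, projected onto the eps-ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        xt = (x + delta).clone().requires_grad_(True)
        loss = F.cross_entropy(model(xt), y)
        (grad,) = torch.autograd.grad(loss, xt)
        # Gradient ascent step, then projection onto the l-inf eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
    return (x + delta).detach()
```

The relative cost is visible directly: FGSM needs one forward-backward pass per example, while PGD with `steps` iterations needs `steps` of them.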

