EFFICIENT ROBUST TRAINING VIA BACKWARD SMOOTHING

Abstract

Adversarial training is so far the most effective strategy in defending against adversarial examples. However, it suffers from high computational cost due to the iterative adversarial attacks performed in each training step. Recent studies show that it is possible to achieve Fast Adversarial Training by performing a single-step attack with random initialization. Yet it remains a mystery why random initialization helps. Moreover, such an approach still lags behind state-of-the-art adversarial training algorithms in both stability and model robustness. In this work, we develop a new understanding of Fast Adversarial Training by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem. From this perspective, we show that the smoothing effect of random initialization is insufficient under the adversarial perturbation constraint. We propose a new initialization strategy, backward smoothing, to address this issue; it significantly improves both stability and model robustness over single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves model robustness similar to the original TRADES method while using much less training time (∼3x improvement with the same training schedule).

1. INTRODUCTION

Deep neural networks are well known to be vulnerable to adversarial examples (Szegedy et al., 2013), i.e., a small perturbation of the original input can lead to misclassification or erroneous prediction. Many defense methods have been developed to mitigate the effect of adversarial examples (Guo et al., 2018; Xie et al., 2018; Song et al., 2018; Ma et al., 2018; Samangouei et al., 2018; Dhillon et al., 2018; Madry et al., 2018; Zhang et al., 2019), among which robust training methods, such as adversarial training (Madry et al., 2018) and TRADES (Zhang et al., 2019), are currently the most effective strategies. Specifically, the adversarial training method (Madry et al., 2018) trains a model on adversarial examples by solving a min-max optimization problem:
$$\min_\theta \frac{1}{n}\sum_{i=1}^n \max_{x_i' \in \mathcal{B}_\epsilon(x_i)} L\big(f_\theta(x_i'), y_i\big), \tag{1.1}$$
where $\{(x_i, y_i)\}_{i=1}^n$ is the training dataset, $f_\theta(\cdot)$ denotes the logits output of the neural network, $\mathcal{B}_\epsilon(x_i) := \{x' : \|x' - x_i\|_\infty \le \epsilon\}$ denotes the $\epsilon$-perturbation ball, and $L$ is the cross-entropy loss. On the other hand, instead of directly training on adversarial examples, TRADES (Zhang et al., 2019) further improves model robustness with a trade-off between natural accuracy and robust accuracy, by solving the empirical risk minimization problem with a robust regularization term:
$$\min_\theta \frac{1}{n}\sum_{i=1}^n \Big[ L\big(f_\theta(x_i), y_i\big) + \beta \max_{x_i' \in \mathcal{B}_\epsilon(x_i)} \mathrm{KL}\big(s(f_\theta(x_i)) \,\|\, s(f_\theta(x_i'))\big) \Big], \tag{1.2}$$
where $s(\cdot)$ denotes the softmax function and $\beta > 0$ is a regularization parameter. The goal of this robust regularization term (i.e., the KL divergence term) is to ensure that the outputs are stable within the local neighborhood. Both adversarial training and TRADES achieve good model robustness, as shown on recent model robustness leaderboards¹ (Croce & Hein, 2020b; Chen & Gu, 2020).
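To make the two objectives concrete, the following is a minimal NumPy sketch for a single example and a linear softmax classifier: the inner maximization of Eq. (1.1) is approximated by PGD with sign steps and projection onto the $\ell_\infty$ ball (Madry et al., 2018), and a TRADES-style per-example loss in the spirit of Eq. (1.2) adds a KL term between clean and perturbed outputs. The weight matrix `W` and the hyperparameters `eps`, `alpha`, `steps`, and `beta` are illustrative choices, not values from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def ce_loss_and_grad(W, x, y):
    """Cross-entropy loss of a linear model f(x) = W @ x, with gradient w.r.t. x."""
    p = softmax(W @ x)
    loss = -np.log(p[y])
    grad_x = W.T @ (p - np.eye(len(p))[y])  # d loss / d x
    return loss, grad_x

def pgd_inner_max(W, x, y, eps=0.1, alpha=0.02, steps=10):
    """Approximate max_{x' in B_eps(x)} L(f(x'), y) via projected gradient
    ascent with sign steps, starting from a random point in the ball."""
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random init
    for _ in range(steps):
        _, g = ce_loss_and_grad(W, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)          # ascent step
        x_adv = x + np.clip(x_adv - x, -eps, eps)   # project onto l_inf ball
    return x_adv

def trades_example_loss(W, x, x_adv, y, beta=6.0):
    """Eq. (1.2)-style per-example loss: clean cross-entropy plus
    beta * KL(s(f(x)) || s(f(x_adv)))."""
    p, q = softmax(W @ x), softmax(W @ x_adv)
    return -np.log(p[y]) + beta * np.sum(p * np.log(p / q))
```

Note that in TRADES the inner maximization is over the KL term itself rather than the cross-entropy loss; the PGD routine above is written for the objective in Eq. (1.1).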



¹ https://github.com/fra31/auto-attack and https://github.com/uclaml/RayS.

