ROBUST OVERFITTING MAY BE MITIGATED BY PROPERLY LEARNED SMOOTHENING

Abstract

A recent study (Rice et al., 2020) revealed overfitting to be a dominant phenomenon in adversarially robust training of deep networks, and that appropriate early stopping of adversarial training (AT) could match the performance gains of most recent algorithmic improvements. This intriguing problem of robust overfitting motivates us to seek more remedies. As a pilot study, this paper investigates two empirical means to inject more learned smoothening during AT: one leveraging knowledge distillation and self-training to smooth the logits, the other performing stochastic weight averaging (Izmailov et al., 2018) to smooth the weights. Despite their embarrassing simplicity, the two approaches are surprisingly effective and hassle-free in mitigating robust overfitting. Experiments demonstrate that by plugging them into AT, we can simultaneously boost the standard accuracy by 3.72% ∼ 6.68% and the robust accuracy by 0.22% ∼ 2.03%, across multiple datasets (STL-10, SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet), perturbation types (ℓ∞ and ℓ2), and robustified methods (PGD, TRADES, and FGSM), establishing a new state-of-the-art bar in AT. We present systematic visualizations and analyses to dive into their possible working mechanisms. We also carefully exclude the possibility of gradient masking by evaluating our models' robustness against transfer attacks. Codes are available at https://github.com/VITA-Group/Alleviate



advances upon AT: by just using an earlier checkpoint, the performance of AT can be drastically boosted to match the more recently reported state of the art (Yang et al., 2019b; Zhang et al., 2019b). Even worse, Rice et al. (2020) tested several other implicit and explicit regularization methods, including weight decay, data augmentation, and semi-supervised learning; they reported that none of those alternatives combats robust overfitting (stably) better than simple early stopping. The authors thus advocated using the validation set to select a stopping point, although the manual picking inevitably trades off between the peak of robust test accuracy and that of standard accuracy, which often do not coincide (Chen et al., 2020a). Does there exist a more principled, hands-off, and hassle-free mitigation for robust overfitting, to further unleash the competency of AT? This paper explores two options, drawing on more sophisticated ideas for enhancing the generalization of standard deep models. Both can be viewed as types of learned smoothening, and both are directly plugged into AT:

• Our first approach is to smooth the logits in AT via self-training, using knowledge distillation with the same model pre-trained as a self-teacher. The idea is inspired by two facts: (1) label smoothening (Szegedy et al., 2016) can calibrate the notorious over-confidence of deep networks (Hein et al., 2019), which was found to improve their standard generalization; and (2) label smoothening can be viewed as a special case of knowledge distillation (Yuan et al., 2020), while self-training can produce more semantic-aware and discriminative soft-label "self-teachers" than naive label smoothening (Chen et al., 2020b; Tang et al., 2020).

• Our second approach is to smooth the weights in AT via stochastic weight averaging (SWA) (Izmailov et al., 2018), which enjoys the benefit of ensembling multiple models (Tramèr et al., 2018; Grefenstette et al., 2018) with the convenience of a single model. These properties suggest that applying SWA to AT is natural and promising.

To be clear, neither knowledge distillation/self-training nor SWA was invented by this paper: both have been utilized in standard training to alleviate (standard) overfitting and improve generalization, by fixing over-confidence and by finding flatter solutions, respectively. By introducing and adapting them to AT, we aim to complement the existing study, demonstrating that while the simpler regularizations examined by Rice et al. (2020) were unable to fix robust overfitting, our learned logit/weight smoothening can effectively regularize and mitigate it, without needing early stopping. Experiments demonstrate that by plugging the two techniques into AT, we can simultaneously boost the standard accuracy by 3.72% ∼ 6.68% and the robust accuracy by 0.22% ∼ 2.03%, across multiple datasets (STL-10, SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet), perturbation types (ℓ∞ and ℓ2), and robustified methods (PGD, TRADES, and FGSM), establishing a new state of the art in AT. As the example in Figure 1 shows, our method eliminates the robust overfitting phenomenon in AT, even when training for up to 200 epochs. Our results imply that although robust overfitting is more challenging than standard overfitting, it can still be mitigated with properly chosen, advanced regularizations originally developed for the latter. Overall, our findings join Rice et al. (2020) in re-establishing the competitiveness of the simplest AT baseline.
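To make the first ingredient concrete, the following is a minimal NumPy sketch of logit smoothening via a self-teacher: the one-hot ground truth is blended with the teacher's temperature-softened prediction to form a learned soft label. This is an illustrative sketch, not the paper's implementation; the temperature `T` and interpolation weight `alpha` are hypothetical hyper-parameter names.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def self_distilled_targets(teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend one-hot labels with the self-teacher's softened predictions.

    alpha = 1 recovers plain one-hot training; alpha = 0 trains purely
    against the smoothed teacher distribution, as in standard
    knowledge distillation.
    """
    num_classes = teacher_logits.shape[1]
    one_hot = np.eye(num_classes)[labels]     # integer-array indexing -> one-hot
    soft = softmax(teacher_logits, T=T)
    return alpha * one_hot + (1.0 - alpha) * soft

# Even a confident teacher yields non-degenerate soft targets:
teacher = np.array([[4.0, 1.0, 0.0]])         # logits for one 3-class sample
targets = self_distilled_targets(teacher, labels=np.array([0]))
```

The training loss is then the cross-entropy between the student's (adversarial) logits and these soft targets, instead of the hard labels.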

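The second ingredient, weight smoothening via SWA, keeps a running average of the weights visited along the training trajectory. A hypothetical sketch of the running-mean update (PyTorch ships an equivalent utility in `torch.optim.swa_utils.AveragedModel`, after which batch-norm statistics must be re-estimated with `update_bn`):

```python
import numpy as np

def swa_update(swa_params, current_params, n_averaged):
    """One SWA step: running mean over checkpoints,
    w_swa <- (n * w_swa + w) / (n + 1)."""
    return [(n_averaged * ws + w) / (n_averaged + 1)
            for ws, w in zip(swa_params, current_params)]

# Average three hypothetical checkpoints of a one-parameter "model":
checkpoints = [np.array([1.0]), np.array([2.0]), np.array([6.0])]
swa_list = [checkpoints[0]]
for n, ckpt in enumerate(checkpoints[1:], start=1):
    swa_list = swa_update(swa_list, [ckpt], n)
# swa_list[0] is now the mean of all three checkpoints, i.e. 3.0
```

In the AT setting, the average would be taken over adversarially trained checkpoints, and the averaged model is evaluated in place of the last checkpoint.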
1.1. BACKGROUND WORK

Deep networks are easily fooled by imperceptible adversarial samples. To tackle this vulnerability, numerous defense methods have been proposed (Goodfellow et al., 2015; Kurakin et al., 2016; Madry et al., 2018), yet many of them (Liao et al., 2018; Guo et al., 2018; Xu et al., 2017; Dziugaite et al., 2016; Dhillon et al., 2018; Xie et al., 2018; Jiang et al., 2020) were later found to owe their apparent robustness to training artifacts, such as obfuscated gradients (Athalye et al., 2018) caused by input transformation or randomization. Among them, adversarial training (AT) (Madry et al., 2018) remains one of the most competitive options. Recently, more improved defenses have been reported (Dong et al., 2018; Yang et al., 2019b; Mosbach et al., 2018; Hu et al., 2020; Wang et al., 2020a; Dong et al., 2020; Zhang et al., 2020a;b), with some of them also being variants of AT, e.g., TRADES (Zhang et al., 2019b) and AT with metric-learning regularizers (Mao et al., 2019; Pang et al., 2019; 2020). While overfitting has become less of a practical concern in training deep networks nowadays, it was not noticed nor addressed in the adversarial defense field until lately. An overfitting phenomenon was

