ON INTRIGUING LAYER-WISE PROPERTIES OF ROBUST OVERFITTING IN ADVERSARIAL TRAINING

Abstract

Adversarial training has proven to be one of the most effective methods to defend against adversarial attacks. Nevertheless, robust overfitting is a common obstacle when adversarially training deep networks. It is commonly believed that features learned by different network layers have different properties; however, existing works generally investigate robust overfitting by treating a DNN as a single unit, so the impact of individual layers on robust overfitting remains unclear. In this work, we divide a DNN into a series of layers and investigate the effect of different network layers on robust overfitting. We find that different layers exhibit distinct properties with respect to robust overfitting; in particular, robust overfitting is mostly related to the optimization of the latter parts of the network. Based on this observation, we propose a robust adversarial training (RAT) prototype: in each mini-batch, we optimize the front parts of the network as usual and adopt additional measures to regularize the optimization of the latter parts. Building on the prototype, we design two realizations of RAT, and extensive experiments demonstrate that RAT can eliminate robust overfitting and boost adversarial robustness over standard adversarial training.

1. INTRODUCTION

Deep neural networks (DNNs) have been widely applied in multiple fields, such as computer vision (He et al., 2016) and natural language processing (Devlin et al., 2018). Despite this success, recent studies show that DNNs are vulnerable to adversarial examples: well-constructed perturbations of the input images that are imperceptible to human eyes can cause DNNs to produce completely different predictions (Szegedy et al., 2013). The security concern raised by this weakness has motivated various works on improving the robustness of DNNs against adversarial examples. Among existing defense techniques, Adversarial Training (AT) (Goodfellow et al., 2014; Madry et al., 2017), which optimizes DNNs on adversarially perturbed data instead of natural data, is the most effective approach (Athalye et al., 2018). However, it has been shown that networks trained with AT do not generalize well (Rice et al., 2020): after a certain point in AT, immediately after the first learning rate decay, the robust test accuracy continues to decrease with further training. Typical regularization practices to mitigate overfitting, such as l1 and l2 regularization, weight decay, and data augmentation, are reported to be ineffective compared to simple early stopping (Rice et al., 2020). Many studies have attempted to narrow the robust generalization gap in AT, and most investigate robust overfitting by considering the DNN as a whole. However, DNNs trained on natural images exhibit a common phenomenon: features learned by the first layers appear to be general and widely applicable, while features computed by the last layers depend on the particular dataset and task (Yosinski et al., 2014). This behavior of DNNs sparks a question: Do different layers contribute differently to robust overfitting?
Intuitively, robust overfitting acts as an unexpected optimization state in adversarial training, and its occurrence may be closely related to the entire network. Nevertheless, the specific effect of different network layers on robust overfitting is still unclear. Without a detailed understanding of the layer-wise mechanism of robust overfitting, it is difficult to completely demystify the exact underlying cause of the phenomenon. In this paper, we provide the first layer-wise diagnosis of robust overfitting. Specifically, instead of considering the network as a whole, we treat the network as a composition of layers and systematically investigate the impact of the robust overfitting phenomenon on different layers. To do so, we first fix the parameters of the selected layers, leaving them unoptimized during AT, and then optimize the other layers' parameters as usual. We discover that robust overfitting is always mitigated when the latter layers are left unoptimized, whereas applying the same treatment to other layers has no effect on robust overfitting, suggesting a strong connection between the optimization of the latter layers and the overfitting phenomenon. Based on the observed effect, we propose a robust adversarial training (RAT) prototype to relieve the issue of robust overfitting. Specifically, RAT works on each mini-batch: it optimizes the front layers as usual, and for the latter layers it implements additional measures to regularize their optimization. It is a general adversarial training prototype, where the front and latter network layers can be separated by some simple test experiments, and the additional measures to regularize layer optimization can be versatile. For instance, we design two representative realizations of RAT: RAT_LR and RAT_WP.
They adopt different strategies to hinder the weight update, i.e., enlarging the learning rate and perturbing the weights, respectively. Extensive experiments show that the proposed RAT prototype effectively eliminates robust overfitting. The contributions of this work are summarized as follows:

• We provide the first diagnosis of robust overfitting on different network layers, and find that there is a strong connection between the optimization of the latter layers and the robust overfitting phenomenon.

• Based on the observed properties of robust overfitting, we propose the RAT prototype, which adopts additional measures to regularize the optimization of the latter layers and is tailored to prevent robust overfitting.

• We design two different realizations of RAT, with extensive experiments on a number of standard benchmarks verifying their effectiveness.

2. RELATED WORK

2.1. ADVERSARIAL TRAINING

Since the discovery of adversarial examples, many defensive methods have been proposed to improve the robustness of DNNs against such adversaries, including adversarial training (Madry et al., 2017), defense distillation (Papernot et al., 2016), input denoising (Liao et al., 2018), and gradient regularization (Tramèr et al., 2018). So far, adversarial training (Madry et al., 2017) has proven to be the most effective. Adversarial training comprises two optimization problems: inner maximization and outer minimization. The former constructs adversarial examples by maximizing the loss, and the latter updates the weights by minimizing the loss on the adversarial data:

ℓ_AT(w) = min_w Σ_i max_{d(x_i, x'_i) ≤ ϵ} ℓ(f_w(x'_i), y_i),

where f_w is the DNN classifier with weights w, ℓ(·) is the loss function, d(·,·) specifies the distance between the original input x_i and the adversarial input x'_i, which is usually an l_p-norm ball such as the l_2 or l_∞ ball, and ϵ is the maximum perturbation allowed.
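As a concrete illustration, the inner maximization is typically solved with projected gradient descent (PGD). Below is a minimal numpy sketch for a binary logistic classifier with an analytic input gradient; the function name and toy model are illustrative, not the implementation used in the paper.

```python
import numpy as np

def pgd_inner_max(x, y, w, eps, alpha, steps):
    """l_inf PGD on a binary logistic model p = sigmoid(w @ x), y in {0, 1}.

    Repeatedly ascends the cross-entropy loss w.r.t. the input and
    projects back into the eps-ball around the clean input x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w @ x_adv))
        grad_x = (p - y) * w                       # d loss / d x_adv
        x_adv = x_adv + alpha * np.sign(grad_x)    # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the l_inf ball
    return x_adv
```

The outer minimization then takes an ordinary SGD step on ℓ(f_w(x'_i), y_i) using the perturbed inputs.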

2.2. ROBUST GENERALIZATION

An interesting characteristic of deep neural networks (DNNs) is their ability to generalize well in practice (Belkin et al., 2019). In the standard training setting, the test loss is observed to keep decreasing over long periods of training (Nakkiran et al., 2020), so the common practice is to train DNNs for as long as possible. However, this is no longer the case in adversarial training, which exhibits overfitting behavior the longer training proceeds (Rice et al., 2020). This phenomenon has been referred to as "robust overfitting" and has shown strong resistance to standard regularization techniques such as l_1 and l_2 regularization and data augmentation (Rice et al., 2020). Schmidt et al. (2018) theorize that robust generalization has a large sample complexity, requiring a substantially larger dataset. Many subsequent works have empirically validated this claim, such as AT with semi-supervised learning (Carmon et al., 2019; Zhai et al., 2019), robust local features (Song et al., 2020) and data interpolation (Lee et al., 2020; Chen et al., 2021). Chen et al. (2020) propose to combine smoothing the logits via self-training with smoothing the weights via stochastic weight averaging to mitigate robust overfitting. Wu et al. (2020) emphasize the connection between the weight loss landscape and the robust generalization gap, and suggest injecting adversarial perturbations into both inputs and weights during AT to regularize the flatness of the weight loss landscape. The intriguing property of robust overfitting has motivated a great amount of study, but current works typically approach the phenomenon by considering a DNN as a whole. In contrast, our work treats a DNN as a series of layers and reveals a strong connection between robust overfitting and the optimization of the latter layers, providing a novel perspective for better understanding the phenomenon.

3. INTRIGUING PROPERTIES OF ROBUST OVERFITTING

In this section, we first investigate the layer-wise properties of robust overfitting by fixing model parameters in AT (Section 3.1). Based on our observations, we further propose a robust adversarial training (RAT) prototype to eliminate robust overfitting (Section 3.2). Finally, we design two different realizations for RAT to verify the effectiveness of the proposed method (Section 3.3).

3.1. LAYER-WISE ANALYSIS OF ROBUST OVERFITTING

Current works usually study the robust overfitting phenomenon by considering the network as a single unit. However, features computed by different layers exhibit different properties: first-layer features are general while last-layer features are specific (Yosinski et al., 2014). We hypothesize that different network layers have different effects on robust overfitting. To empirically verify this hypothesis, we deliberately fix the parameters of selected network layers, leaving them unoptimized during AT, and observe the behavior of robust overfitting accordingly. Specifically, we consider the ResNet-18 architecture as a composition of 4 main layers, corresponding to its 4 residual blocks. We then train multiple PreAct ResNet-18 networks on CIFAR-10 for 200 epochs using AT, each time selecting a set of network layers to have their parameters fixed. The robust test performance in Figure 1(a) shows a consistent pattern: robust overfitting is mitigated whenever we fix the parameters of layer 4 during AT, while any setting that does not fix the parameters of layer 4 results in a more severe gap between the best accuracy and the accuracy at the last epoch. For example, for settings such as AT-fix-param-[1,4], AT-fix-param-[2,4] and AT-fix-param-[3,4], robust overfitting is significantly reduced. On the other hand, for settings such as AT-fix-param-[1,2], AT-fix-param-[1,3] and AT-fix-param-[2,3], where we fix the parameters of various sets of layers but allow the optimization of layer 4, robust overfitting still widely exists. Even for the extreme case AT-fix-param-[1,2,3], where we fix the first three layers and only allow the optimization of the last layer 4, the gap between the best accuracy and the last accuracy is still obvious. This clearly indicates that the optimization of the latter layers presents a strong correlation with the robust overfitting phenomenon.
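The AT-fix-param-[...] settings above simply exclude the chosen layers from the weight update. A minimal numpy sketch of such a masked SGD step follows; the layer names and dict-of-arrays representation are illustrative (in a PyTorch implementation one would instead call `requires_grad_(False)` on the frozen parameters).

```python
import numpy as np

def sgd_step_fix_param(params, grads, lr, fixed_layers):
    """One SGD step in which the layers listed in `fixed_layers`
    are left unoptimized, as in the AT-fix-param-[...] experiments."""
    updated = {}
    for name, w in params.items():
        if name in fixed_layers:
            updated[name] = w.copy()          # parameters stay frozen
        else:
            updated[name] = w - lr * grads[name]
    return updated
```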
Note that this relationship can be observed across a variety of datasets, model architectures, and threat models (shown in Appendix A), indicating that it is a general property of adversarial training. In many of these settings, robust overfitting is mitigated at the cost of robust accuracy. For example, in AT-fix-param-[3,4], if we leave both layers 3 and 4 unoptimized, robust overfitting practically disappears, but the peak performance is much worse than standard AT. When carefully examining the training performance of these settings in Figure 1(b), we generally observe that the network's capacity to fit adversarial data remains strong when we fix the parameters of the front layers, but gradually weakens as we fix the latter layers. For instance, AT-fix-param-[1] has the highest robust training accuracy, followed by AT-fix-param-[2], AT-fix-param-[3] and AT-fix-param-[4]; AT-fix-param-[1,2,3] has higher training accuracy than AT-fix-param-[2,3,4]. This suggests that fixing the latter layers' parameters regularizes the network better than fixing the front layers' parameters. In the subsequent sections, we introduce methods that specifically regularize the optimization of the latter layers, so as to mitigate robust overfitting without trade-offs in robustness. We compare the impact on robust overfitting when applying such methods to the front layers vs. the latter layers, further highlighting the importance of the latter layers in relation to robust overfitting.

3.2. A PROTOTYPE OF RAT

As witnessed in Section 3.1, the optimization of the latter layers in AT is highly correlated with the existence of robust overfitting. To address this, we propose to train the network on adversarial data with some restrictions placed on the optimization of the latter layers, dubbed Robust Adversarial Training (RAT). RAT adopts additional measures to regularize the optimization of the latter layers, ensuring that robust overfitting does not occur. The RAT prototype is given in Algorithm 1 and runs as follows. We start with a base adversarial training algorithm A. In Lines 1-3, the inner maximization pass maximizes the loss by creating adversarial examples, and the outer minimization pass then computes the weight gradients by minimizing the loss on the adversarial data. Line 4 initiates a loop through all parts of the weights w from the front layers to the latter layers. Lines 5-9 then manipulate different parts of the weights based on their layer conditions: if a part of the weights belongs to the front layers (C_front), it is kept intact; otherwise, a weight update scheme S is applied to the parts of the weights corresponding to the latter layers (C_latter). The role of S is to apply some regularization to the latter layers' weights. Finally, the optimizer O updates the model f_w in Line 11. Note that RAT is a general prototype where the layer conditions C_front and C_latter and the weight adjustment strategy S can be versatile. For example, based on the observations in Section 3.1, we treat the ResNet architecture as a composition of 4 main layers, corresponding to its 4 residual blocks, where C_front indicates layers 1 & 2 and C_latter indicates layers 3 & 4. S can likewise represent various strategies that serve to regularize the optimization of the latter layers. In the section below, we propose two different strategies S in the implementations of RAT to demonstrate its effectiveness.
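Under the assumptions above (parameters grouped by layer name, plain SGD as the optimizer O), the per-mini-batch logic of the prototype can be sketched as follows; the gradient adjustment S is passed in as a callable, and all names are illustrative.

```python
import numpy as np

def rat_update(params, grads, lr, c_latter, strategy):
    """One RAT mini-batch update (cf. Algorithm 1, Lines 4-11):
    front-layer gradients pass through unchanged, latter-layer
    gradients are first adjusted by the regularization strategy S."""
    adjusted = {}
    for name, g in grads.items():
        if c_latter(name):                 # C_latter holds -> apply S
            adjusted[name] = strategy(g)
        else:                              # C_front holds -> keep intact
            adjusted[name] = g
    # optimizer step O (plain SGD for this sketch)
    return {name: w - lr * adjusted[name] for name, w in params.items()}
```

For instance, with C_latter selecting layers 3 & 4 and S(g) = η·g, this reduces to a RAT_LR-style update; an S built on adversarially perturbed weights yields RAT_WP.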

3.3. TWO REALIZATIONS OF RAT

In this section, we propose two different methods to place restrictions on the optimization of selected parts of the network, and then investigate the robust overfitting behavior when applying each method to the front layers vs. the latter layers. These methods showcase a clear relation between the optimization of the latter layers and the robust generalization gap.

Algorithm 1 RAT prototype (in a mini-batch)
Require: base adversarial training algorithm A, optimizer O, network f_w, model parameters w = {w_1, w_2, ..., w_n}, training data D = {(x_i, y_i)}, mini-batch B, front and latter layer conditions C_front and C_latter for f_w, gradient adjustment strategy S
1: Sample a mini-batch B = {(x_i, y_i)} from D
2: B' = A.inner_maximization(f_w, B)
3: ∇_w ← A.outer_minimization(f_w, ℓ_B')
4: for i = 1, ..., n do
5:   if C_front(w_i) then
6:     ∇_{w_i} ← ∇_{w_i}
7:   else if C_latter(w_i) then
8:     ∇_{w_i} ← S(f_w, B', ∇_{w_i})  # adjust gradient
9:   end if
10: end for
11: O.step(∇_w)

RAT through enlarging the learning rate. In standard AT, the sudden increases in robust test performance appear to be closely related to the drops of the scheduled learning rate decay. We hypothesize that training AT without learning rate decay, though sub-optimal in peak performance, can regularize the learning process of adversarial training. A comparison of the train/test performance between standard AT and AT without learning rate decay (AT-fix-lr-[1,2,3,4]) is shown in Figure 2(b). The training performance of standard AT accelerates quickly right after the first learning rate drop, expanding the generalization gap with further training, whereas for AT without learning rate decay, training performance increases slowly and maintains a stable generalization gap. This suggests that AT optimized without learning rate decay has less capacity to fit adversarial data, and thus provides the regularization needed to relieve robust overfitting.
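The two schedules compared above can be written down directly. The sketch below pairs the piecewise-decay schedule with a constant one; the milestone epochs (100 and 150 out of 200, following the standard schedule of Rice et al. (2020)) are an assumption for illustration.

```python
def piecewise_lr(epoch, base_lr=0.1, milestones=(100, 150)):
    """Standard piecewise decay: divide the learning rate by 10
    at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr /= 10.0
    return lr

def fixed_lr(epoch, base_lr=0.1):
    """No decay: the learning rate stays at its initial value."""
    return base_lr
```

Keeping the decayed schedule for the front layers and the fixed one for the latter layers amplifies the latter layers' update step by the ratio fixed_lr / piecewise_lr, i.e., 10 after the first decay and 100 after the second.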
As our previous analysis suggests that the optimization of the latter layers is more important in mitigating robust overfitting, we propose using a fixed learning rate of 0.1 for optimizing the latter parts of the network while applying the piecewise-decay learning rate to the former parts, so as to close the robust generalization gap. We refer to this approach as a realization of RAT, namely RAT_LR. Compared to standard AT, RAT_LR essentially enlarges the weight update step ∇_{w_i} along the latter parts of the gradients,

∇_{w_i} ← η ∇_{w_i},

where η is the amplification coefficient (η = 10 after the first learning rate decay and η = 100 after the second). To demonstrate the effectiveness of RAT_LR, we train multiple PreAct ResNet-18 networks on CIFAR-10 for 200 epochs using AT, each time selecting a set of network layers to have their learning rate fixed to 0.1 while maintaining the piecewise learning rate schedule for the other layers. The results are shown in Figure 2(a).

RAT through adversarial weight perturbation. We continue to study the impact of different network layers on the robust overfitting phenomenon from the perspective of adversarial weight perturbation (AWP). Wu et al. (2020) propose AWP as a method to explicitly flatten the weight loss landscape by introducing adversarial perturbations into both inputs and weights during AT:

min_w max_{v∈V} Σ_i max_{d(x_i, x'_i) ≤ ϵ} ℓ(f_{w+v}(x'_i), y_i),

where v is the adversarial weight perturbation generated by maximizing the classification loss, v ∝ ∇_w Σ_i ℓ_i. As AWP keeps injecting worst-case perturbations into the weights during training, it can also be viewed as a means to regularize the optimization of AT. In fact, training with AWP exhibits a negative robust generalization gap, where robust training accuracy falls short of robust testing accuracy by a large margin, as shown in Figure 3(c). This indicates that AWP puts significant restrictions on the optimization of AT, introducing large trade-offs in training performance.
As our previous analysis suggests a strong correlation between robust overfitting and the optimization of the latter layers, we argue that AWP's capacity to mitigate robust overfitting is mostly due to the perturbations applied to the latter layers' weights. We therefore propose to apply AWP specifically to the latter half of the network, and refer to this method as RAT_WP. In essence, RAT_WP computes the adversarial weight perturbation v_i only under the layer condition C_latter(w_i), so that only the parts of the weights along the latter half of the network are perturbed:

min_{w=[w_1,...,w_i,...,w_n]} max_{v=[0,...,v_i,...,0]∈V} Σ_i max_{d(x_i, x'_i) ≤ ϵ} ℓ(f_{w+v}(x'_i), y_i), with v_i ∝ ∇_{w_i} Σ_j ℓ_j.

To prove the effectiveness of RAT_WP, we train multiple PreAct ResNet-18 networks on CIFAR-10 for 200 epochs using AT, each time selecting a set of network layers to have their weights locally perturbed. Robust overfitting is eliminated in settings such as AT-awp-[1,3,4] and AT-awp-[2,3,4]. These settings share one key similarity: both layers 3 & 4 have their weights adversarially perturbed during AT. Simply applying AWP to any set of layers that excludes layers 3 & 4 is not sufficient to eliminate robust overfitting; AWP is effective in solving robust overfitting only when applied to both layer 3 and layer 4. Even when AWP is applied to the first 3 of the 4 layers (AT-awp-[1,2,3]), robust overfitting still widely exists. In other words, it is essential for the adversarial weight perturbations to occur at the latter part of the network in order to mitigate robust overfitting. To examine this phenomenon in detail, we compare the training performance of AWP applied to the front layers (represented by AT-awp-[1,2,3]) vs. AWP applied to the latter layers (represented by AT-awp-[4]), shown in Figure 3(b): AWP applied to the front layers has much better training performance than AWP applied to the latter layers.
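A minimal sketch of the layer-restricted perturbation follows: the perturbation for each latter layer points in the weight-loss-ascent direction, rescaled layer-wise (here by γ·‖w_i‖, mirroring the relative scaling used in AWP), while front layers receive a zero perturbation. The names and the exact scaling are illustrative, not the paper's implementation.

```python
import numpy as np

def rat_wp_perturbation(params, weight_grads, gamma, c_latter):
    """Compute the adversarial weight perturbation v for RAT_WP:
    v_i is nonzero only for layers satisfying C_latter, pointing in
    the loss-ascent direction with magnitude gamma * ||w_i||."""
    v = {}
    for name, w in params.items():
        if c_latter(name):
            g = weight_grads[name]
            v[name] = gamma * np.linalg.norm(w) * g / (np.linalg.norm(g) + 1e-12)
        else:
            v[name] = np.zeros_like(w)   # front layers stay unperturbed
    return v
```

Training then evaluates the loss at w + v before taking the update step, so only the latter half of the network is adversarially perturbed.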
Furthermore, AWP applied to front layers reveals a positive robust generalization gap (training accuracy > testing accuracy) shortly after the first drop in learning rate, which continues to widen with further training. Conversely, AWP applied in the latter layers exhibits a negative robust generalization gap throughout most of the training, only converging to 0 after the second drop in learning rate. These differences demonstrate that worst-case perturbations, when injected into the latter layers' weights, have a more powerful impact in regularizing the optimization of AT. Consistent with our previous findings, AWP applied to the latter layers can be considered as an approach to regularize the optimization of AT in those layers, which successfully mitigates robust overfitting. This finding supports our analysis thus far, further demonstrating that regularizing the optimization of the latter layers is key to improving the robust generalization.

4. EXPERIMENT

In this section, we conduct extensive experiments to verify the effectiveness of RAT_LR and RAT_WP. Details of the experimental settings and performance evaluation are introduced below.

Table 1: Test robustness (%) on CIFAR10. We omit the standard deviations of 5 runs as they are very small (< 0.6%).

4.1. EXPERIMENTAL SETUP

We conduct extensive experiments on the two realizations of RAT across three benchmark datasets (CIFAR10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and CIFAR100 (Krizhevsky et al., 2009)) and two threat models (L_∞ and L_2). We use PreAct ResNet-18 (He et al., 2016) and Wide ResNet-34-10, following the same hyperparameter settings for AT as Rice et al. (2020): for the L_∞ threat model, ϵ = 8/255, with step size 1/255 for SVHN and 2/255 for CIFAR-10 and CIFAR-100; for the L_2 threat model, ϵ = 128/255, with step size 15/255 for all datasets. For training, all models are trained under the 10-step PGD (PGD-10) attack for 200 epochs using SGD with momentum 0.9, weight decay 5×10^-4, and a piecewise learning rate schedule with an initial learning rate of 0.1. RAT models are decomposed into a series of 4 main layers, corresponding to the 4 residual blocks of the ResNet architecture. For RAT_LR, the learning rate for layers 3 & 4 is set to a fixed value of 0.1. For RAT_WP, which applies AWP to layers 3 & 4, γ = 1×10^-2. For testing, the robust accuracy is evaluated under two adversarial attacks: 20-step PGD (PGD-20) and AutoAttack (AA) (Croce & Hein, 2020b). AutoAttack is considered the most reliable robustness evaluation to date; it is an ensemble of complementary attacks, consisting of three white-box attacks (APGD-CE (Croce & Hein, 2020b), APGD-DLR (Croce & Hein, 2020b), and FAB (Croce & Hein, 2020a)) and a black-box attack (Square Attack (Andriushchenko et al., 2020)).

Table 3: Test robustness (%) on SVHN. We omit the standard deviations of 5 runs as they are very small (< 0.6%).

CIFAR100 Results. We also show the results on the CIFAR100 dataset in Table 2. We observe similar performance to CIFAR10, where both RAT_LR and RAT_WP significantly reduce the robustness gaps. For robustness improvement, RAT_WP stands out as the leading method. These results further verify the effectiveness of the proposed approach.

SVHN Results.
Finally, we summarize the results on the SVHN dataset in Table 3, where the robustness gaps are also narrowed to a small margin by RAT_WP. SVHN is a special case where the RAT_LR strategy does not mitigate robust overfitting: unlike CIFAR10 and CIFAR100, the learning rate decay in SVHN's training has little connection to the sudden increases in robust test performance or the prevalence of robust overfitting, which renders RAT_LR ineffective. Other than this, the improvement in robust generalization gaps can be witnessed in all cases, demonstrating that the proposed approaches are generic and can be applied widely.
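For reference, the robust-accuracy metric reported in the tables can be sketched for a toy binary logistic classifier; the PGD loop and model below are illustrative stand-ins for the PGD-20 evaluation described in Section 4.1.

```python
import numpy as np

def robust_accuracy(X, y, w, eps, alpha, steps=20):
    """Fraction of inputs still classified correctly after an
    l_inf PGD attack on a binary logistic model p = sigmoid(w @ x)."""
    correct = 0
    for x, yi in zip(X, y):
        x_adv = x.copy()
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-w @ x_adv))
            x_adv = np.clip(x_adv + alpha * np.sign((p - yi) * w),
                            x - eps, x + eps)   # ascend loss, project
        correct += int(float(w @ x_adv > 0) == yi)
    return correct / len(y)
```

Clean accuracy is the same computation with eps = 0; by construction the robust accuracy can only be lower or equal.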

5. CONCLUSION

In this paper, we investigate the effects of different network layers on robust overfitting and identify that robust overfitting is mainly driven by the optimization occurring at the latter layers. Following this, we propose a robust adversarial training (RAT) prototype that specifically hinders the optimization of the latter layers during adversarial training. The approach prevents the latter parts of the network from overfitting, which effectively eliminates robust overfitting of the network as a whole. We further demonstrate two implementations of RAT: one locally uses a fixed learning rate for the latter layers, and the other applies adversarial weight perturbation to the latter layers. Extensive experiments show the effectiveness of both approaches, suggesting that RAT is generic and can be applied across different network architectures, threat models and benchmark datasets to solve robust overfitting.

A. LAYER-WISE PROPERTIES OF ROBUST OVERFITTING

In this section, we provide more empirical experiments to showcase the layer-wise properties of robust overfitting across different datasets, model architectures and threat models. Specifically, we use the two strategies from Section 3.3 to place restrictions on the optimization of different network layers. We always observe that there is no robust overfitting when we regularize the optimization of layers 3 and 4 (the latter layers), while robust overfitting is prevalent in the other settings. This evidence further highlights the strong relation between robust overfitting and the optimization of the latter layers.

A.1 EVIDENCE ACROSS DATASETS

We show that the layer-wise properties of robust overfitting are universal across datasets, using CIFAR-100 and SVHN. We adversarially train PreAct ResNet-18 under the l_∞ threat model on the different datasets with the same settings as Section 3.3. The results are shown in Figures 4 and 5. Note that for SVHN, the regularization strategy utilizing a fixed learning rate (RAT_LR) does not mitigate robust overfitting (Figure 4): unlike CIFAR10 and CIFAR100, SVHN's training overfits well before the first learning rate decay, and the learning rate decay has no relation to the sudden increases in robust test performance or the appearance of robust overfitting. Hence, SVHN is a special case where RAT_LR does not apply. In all other cases, robust overfitting is effectively eliminated by regularizing the optimization of layers 3 and 4.

A.2 EVIDENCE ACROSS THREAT MODELS

We further demonstrate the generality of the layer-wise properties of robust overfitting by conducting experiments under the l_2 threat model across datasets. The settings are the same as in Section 3.3. The results are shown in Figures 6 and 7. Under the l_2 threat model, except for the SVHN dataset, where the regularization strategy utilizing a fixed learning rate (RAT_LR) does not apply, robust overfitting is effectively eliminated by regularizing the optimization of layers 3 and 4.



Figure 1: The robust train/test performance of adversarial training with different sets of network layers fixed. AT-fix-param[1,2] corresponds to fixing the parameters of layers 1 & 2 during AT

Figure 2: The train/test performance of adversarial training using a fixed learning rate for different sets of network layers. AT-fix-lr[1,2] corresponds to using a fixed learning rate for layers 1 & 2 during AT


Figure 3: The train/test performance of adversarial training when applying AWP for different sets of network layers. AT-AWP-[1,2] means only layer 1 & 2 have their weight perturbed using AWP

Figure 4: Robust test performance of adversarial training using a fixed learning rate for different sets of network layers, across datasets (CIFAR-100 and SVHN) under l ∞ threat

Figure 6: Robust test performance of adversarial training using a fixed learning rate for different sets of network layers, across datasets (CIFAR-10, CIFAR-100 and SVHN) under l 2 threat

The results in Figure 2(a) validate our proposition. Robust overfitting is relieved in all settings that target layer 4 (AT-fix-lr-[4], AT-fix-lr-[1,4], AT-fix-lr-[2,4], etc.), while any setting that fixes the learning rate of layers excluding layer 4 does not reduce robust overfitting. Furthermore, all settings that fix the learning rate for both layers 3 & 4, including AT-fix-lr-[3,4], AT-fix-lr-[1,3,4], AT-fix-lr-[2,3,4] and AT-fix-lr-[1,2,3,4], completely eliminate robust overfitting. These observations verify that regularizing the optimization of the latter layers, by optimizing those layers without learning rate decay, can prevent robust overfitting from occurring. An important observation is that RAT_LR (AT-fix-lr-[3,4]) can both overcome robust overfitting and achieve better robust test performance than the network using a fixed learning rate for all layers (AT-fix-lr-[1,2,3,4]). Examining the training performance of these two settings in Figure 2(c), we find that RAT_LR exhibits a rapid rise in both robust and standard training performance immediately after the first learning rate decay, similar to standard AT. The training performance of RAT_LR benefits from the learning rate decay occurring at layers 1 & 2, making a notable improvement over AT-fix-lr-[1,2,3,4]. By training layers 3 & 4 without learning rate decay, we specifically restrict the optimization of only the latter parts of the network, which are heavily responsible for robust overfitting; this relieves robust overfitting without sacrificing too much performance. These experimental results provide another indication that the latter layers have stronger connections to robust overfitting than the front layers do, and that regularizing the optimization of the latter layers from the perspective of the learning rate can effectively solve robust overfitting.

Table 2: Test robustness (%) on CIFAR100. We omit the standard deviations of 5 runs as they are very small (< 0.6%).

In this section, we present the experimental results of RAT_LR and RAT_WP across three benchmark datasets.

CIFAR10 Results. The evaluation results on the CIFAR10 dataset are summarized in Table 1, where "Best" is the highest test robustness achieved during training, "Last" is the test robustness at the last epoch checkpoint, and "Diff" denotes the robust accuracy gap between "Best" and "Last". We observe that RAT_WP generally achieves the best robust performance compared to RAT_LR and standard AT. Regardless, both RAT_LR and RAT_WP tighten the robustness gaps by a significant margin, indicating they can effectively suppress robust overfitting.

