IMPROVING ADVERSARIAL ROBUSTNESS VIA FREQUENCY REGULARIZATION

Abstract

Deep neural networks (DNNs) are incredibly vulnerable to crafted, human-imperceptible adversarial perturbations. While adversarial training (AT) has proven to be an effective defense, the properties by which AT improves robustness remain an open issue. In this paper, we investigate AT from a spectral perspective, providing new insights into the design of effective defenses. Our analyses show that AT induces the deep model to focus more on the low-frequency region, which retains the shape-biased representations, to gain robustness. Further, we find that the spectrum of a white-box attack is primarily distributed in the regions the model focuses on, and that the perturbation attacks the spectral bands where the model is vulnerable. To train a model tolerant to frequency-varying perturbations, we propose a frequency regularization (FR) such that the spectral output inferred from an attacked input stays as close as possible to that of its natural counterpart. Experiments demonstrate that FR and its weight averaging (WA) extension significantly improve robust accuracy by 1.14% ∼ 4.57% relative to AT, across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet) and various attacks (PGD, C&W, and AutoAttack), without any extra data.

1. INTRODUCTION

DNNs have exhibited strong capabilities in various applications such as computer vision He et al. (2016), natural language processing Devlin et al. (2018), recommendation systems Covington et al. (2016), etc. However, research in adversarial learning shows that even well-trained DNNs are highly susceptible to adversarial perturbations Goodfellow et al. (2014); Szegedy et al. (2013). These perturbations are nearly indistinguishable to human eyes but can mislead neural networks into completely erroneous outputs, thus endangering safety-critical applications. Among the various defense methods for improving robustness Das et al. (2018); Mao et al. (2019); Zheng et al. (2020), adversarial training (AT) Madry et al. (2017), which feeds adversarial inputs into a DNN to solve a min-max optimization problem, proves to be an effective means without the obfuscated-gradients problem Athalye et al. (2018). Several recent results inspired by AT further boost robust accuracy: Zhang et al. (2019) identify a trade-off between standard and robust accuracy that serves as a guiding principle for designing defenses. Wu et al. (2020) show that the weight loss landscape is closely related to the robust generalization gap, and propose an effective adversarial weight perturbation method to overcome the robust overfitting problem Rice et al. (2020). Jia et al. (2022) introduce a learnable attack strategy that automatically produces the proper hyperparameters for generating perturbations during training to improve robustness.

On the other hand, frequency analysis provides a new lens on the generalization behavior of DNNs. Wang et al. (2020a) show that convolutional neural networks (CNNs) can capture human-imperceptible high-frequency components of images for prediction. They also find that robust models have smooth convolutional kernels in the first layer, thereby paying more attention to low-frequency information. Yin et al.
(2019) establish a connection between the frequency of common corruptions and model performance, especially for high-frequency corruptions. They view AT as a data augmentation method that biases the model toward low-frequency information, improving robustness to high-frequency corruptions at the cost of reduced robustness to low-frequency ones. Zhang & Zhu (2019) find that AT-CNNs are better at capturing long-range correlations such as shapes, and are less biased toward textures than normally trained CNNs on popular object recognition datasets. Our findings are similar, but we reach them from a spectral perspective. Wang et al. (2020b) state that perturbations mainly target the high-frequency information in natural images, and that low-frequency information is more robust than the high-frequency part; they claim that building a stronger association between low-frequency information and the true labels makes a model robust. However, our study shows that building this connection alone cannot render a model adversarially robust. The closest work to ours is Maiya et al. (2021), which discovers that the adversarial perturbation is data-dependent and analyzes many intriguing properties of AT with frequency constraints. Our research goes one step further to show that the perturbation is also model-dependent, and explains why it behaves differently across datasets and models. Besides, we propose a frequency regularization (FR) to improve robust accuracy. These observations motivate us to zoom in on a deeper analysis of AT from a spectral viewpoint. Specifically, we obtain models with different frequency biases and study the distribution of their corresponding white-box attack perturbations across different datasets. We then propose a simple yet effective FR to improve adversarial robustness and validate it on multiple datasets.
Our main contributions are:

• We find that AT facilitates the model to focus on robust low-frequency information, which contains the shape-biased representation, to improve robustness. In contrast, simply focusing on low-frequency information does not lead to adversarial robustness.

• We reveal for the first time that the white-box attack is primarily distributed in the frequencies that the model focuses on, and can adapt its aggressive frequency distribution to the model's sensitivity to frequency corruptions. This explains why white-box attacks are hard to defend against.

• We propose an FR that enforces alignment of the outputs of natural and adversarial examples in the frequency domain, thus effectively improving adversarial robustness.

2. PRELIMINARIES

Typically, AT updates the model weights to solve the min-max saddle-point optimization problem:

min_θ (1/n) Σ_{i=1}^{n} max_{∥δ∥_p ≤ ϵ} L(f_θ(x_i + δ), y_i),   (1)

where n is the number of training examples, x_i + δ is the adversarial input within the ϵ-ball (bounded by an L_p norm) centered at the natural input x_i, δ is the perturbation, y_i is the true label, f_θ is the DNN with weights θ, and L(·) is the classification loss, e.g., cross-entropy (CE). We refer to the adversarially trained model as the robust model and the naturally trained model as the natural model. The accuracies achieved on natural and adversarial inputs are denoted as standard accuracy and robust accuracy, respectively. We define high-pass filtering (HPF) with bandwidth k as the operation that, after a Fast Fourier Transform (FFT), preserves only the k × k patch in the center of the unshifted spectrum (viz. the high frequencies), zeroes all values outside it, and then applies the inverse FFT. Low-pass filtering (LPF) is defined similarly, except that the low-frequency part is first shifted to the center after the FFT, so that it is preserved by the central k × k patch, as in Yin et al. (2019).
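As a minimal sketch of these filters (our own single-channel NumPy implementation of the definitions above; the function names are ours), LPF/HPF can be written as:

```python
import numpy as np

def low_pass_filter(img, k):
    """Keep only the centered k x k low-frequency patch of the shifted spectrum."""
    f = np.fft.fftshift(np.fft.fft2(img))          # move low frequencies to the center
    mask = np.zeros_like(f)
    h, w = f.shape
    top, left = (h - k) // 2, (w - k) // 2
    mask[top:top + k, left:left + k] = 1
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def high_pass_filter(img, k):
    """Keep only the k x k patch at the center of the *unshifted* spectrum,
    which holds the high frequencies."""
    f = np.fft.fft2(img)                            # no shift: high freqs sit at the center
    mask = np.zeros_like(f)
    h, w = f.shape
    top, left = (h - k) // 2, (w - k) // 2
    mask[top:top + k, left:left + k] = 1
    return np.real(np.fft.ifft2(f * mask))
```

In practice each RGB channel would be filtered independently before feeding the result to the model; with k equal to the image size, both filters reduce to the identity.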

3.1. FREQUENCY ATTENTION & LOW-FREQUENCY INFORMATION

Attention to the Frequency Domain. Since the labels are inherently tied to the low-frequency information Wang et al. (2020a), to maintain high standard accuracy and explore the connection between low-frequency information and adversarial robustness, we train models (denoted as L-models) on natural inputs passed through an LPF with a bandwidth of 16 for most datasets (32 for Tiny ImageNet), cf. Table 1. Then, we feed natural inputs processed by LPFs of different bandwidths into the models to evaluate their accuracy, which reflects how much attention the models pay to low-frequency information. Results are shown in Table 1. For natural models, the standard accuracy gradually increases with the bandwidth, indicating that the models utilize both low- and high-frequency information, consistent with the findings of Wang et al. (2020a). SVHN is an exception, as the information in this dataset is concentrated mainly in the low-frequency region Bernhard et al. (2021); models trained on SVHN therefore all rely primarily on low-frequency information for prediction. For L-models, when the bandwidth increases beyond 16 (32 for Tiny ImageNet), the standard accuracy no longer improves much, in line with the expectation that a model trained on low-frequency information focuses mainly on the low-frequency region. As for robust models, even though a large amount of high-frequency information is removed, there is only a negligible reduction in standard accuracy on SVHN and the CIFAR datasets. For Tiny ImageNet, high-frequency content can further improve the standard accuracy slightly, but the improvement relies mainly on the low-frequency part. Comparing the natural, L-, and robust models across datasets leads to the conclusion that AT enforces the trained model to focus primarily on low-frequency information.
Taking CIFAR-10 as an example, the standard accuracy of the robust model (81.98%) is similar to that of the natural model under an LPF of bandwidth 20 (80.58%). This observation indicates that the robust model's low standard accuracy on the natural dataset is due to the under-utilization of high-frequency components.

Robust Features in the Low-frequency Region. Although both the L-model and the robust model rely on low-frequency information for prediction, the L-model has no resistance to PGD-20 (last column in Table 1), suggesting that learning low-frequency information alone does not contribute to robust accuracy. Besides, when the input retains only limited low-frequency information, the robust model is more accurate than the natural model, and even more accurate than the L-model at particularly small bandwidths (e.g., 4, 8), except on SVHN. This implies that the robust model can extract more useful information from the very low-frequency region. To explore what robust information the model is concerned with, we visualize CIFAR-10 images after an LPF with a small bandwidth of 8, the natural images, and the perturbed images in Fig. 1. Other datasets show similar behavior, as depicted in Appendix A.1. At k = 8, the robust model achieves a much higher standard accuracy (57.82%) than the natural model (17.93%) from very limited low-frequency information. Compared to the natural images, the filtered images retain the outer contours, but the detailed textures are heavily blurred. For the perturbed images, the texture of the foreground and background is disturbed, while the shapes are almost unaffected. Indeed, Geirhos et al. (2018) show that naturally trained CNNs are strongly biased toward recognizing textures rather than shapes, which accounts for the low standard accuracy on the filtered images and for their vulnerability.
The robust model can maintain a certain level of standard and robust accuracy when the texture is heavily smoothed but the shape profile is partially preserved. This indicates that AT enables the model to learn a more shape-biased representation, which is more human-like and consistent with the finding of Zhang & Zhu (2019) that AT-CNNs focus more on shape information. Learning shape-biased features improves robustness, and low-frequency information preserves objects' shapes while blurring textures. Consequently, AT induces models to focus primarily on low-frequency information so as to learn a more shape-biased representation, thereby gaining robustness.

SVHN is a unique dataset in that its information is concentrated primarily in the low-frequency region. Models obtained on it by either training method rely mainly on low-frequency information for prediction. Perturbations likewise rely on low-frequency information to maintain their aggressiveness, while high-frequency perturbations barely degrade robust accuracy. For the natural models on the CIFAR and Tiny ImageNet datasets, whether the perturbations are processed by LPF or HPF, the robust accuracy decreases as the bandwidth increases until it reaches almost zero. This indicates that perturbations maintain their aggressiveness in both the low- and high-frequency parts, which corresponds to the fact that natural models use both low- and high-frequency information for prediction. On CIFAR-10, the perturbation after HPF (green curve) causes more accuracy degradation than after LPF (orange curve) at the same bandwidth, meaning the high-frequency perturbation is more aggressive; the opposite holds on CIFAR-100 and Tiny ImageNet. In particular, on Tiny ImageNet, the aggressiveness of the perturbation is concentrated mainly at low frequencies. For natural models that utilize both low- and high-frequency information, why is the distribution of perturbation aggressiveness so different across datasets?
We propose a sensitivity hypothesis: white-box attacks can detect the spectral bands where the model is sensitive and formulate the attack accordingly. To test this hypothesis, we investigate the sensitivity of the models to frequency corruptions via Fourier heat maps Yin et al. (2019), shown in Fig. 4; the definition is given in Appendix A.2. A high error rate means that the model is vulnerable to attacks at the corresponding frequency. The first row of Fig. 4 shows the Fourier heat maps of the natural models across datasets. The CIFAR-10 and CIFAR-100 models are sensitive to both low- and high-frequency perturbations, which is consistent with the observation that perturbations after either LPF or HPF can significantly degrade their robust accuracy. The Tiny ImageNet model is much more vulnerable to low-frequency perturbations, which corresponds to the fact that perturbation aggressiveness on this dataset is concentrated mainly at low frequencies. These phenomena support our hypothesis: white-box attacks can detect a model-sensitive band of the spectrum and attack it.

Figure 4: Error rate of models on images perturbed with spectral perturbations. We evaluate three models on four datasets; v is the norm of the perturbation. A high error rate represents high sensitivity to the spectral noise.

For the L-models shown in Fig. 3, which extract information from the low-frequency region, the perturbation relies heavily on low-frequency information to ensure the success of the attack. As for the robust models, there is a rough symmetry between the red and orange curves. On the CIFAR datasets, the model relies mainly on low-frequency information for prediction, and the perturbation similarly relies on the low-frequency part to degrade robust accuracy. For Tiny ImageNet, the model can improve standard accuracy by a small margin with the help of high-frequency information.
There is a small decrease in robust accuracy both in the high-bandwidth region for LPF and in the low-bandwidth region for HPF of the perturbation, indicating that the perturbation is somewhat aggressive at high frequencies. The aggressive frequency distribution of the perturbations thus basically corresponds to the model's attention in the frequency domain. The L-models and robust models are sensitive to low-frequency perturbations, as shown in the 2nd and 3rd rows of Fig. 4, which again concurs with our sensitivity hypothesis. Comparing the Fourier heat maps of the natural and robust models on multiple datasets, we conclude that the robust model is less sensitive to spectral perturbations. This suggests that methods reducing the model's sensitivity to frequency corruptions should help improve adversarial robustness. Based on the above experiments, we claim that white-box attacks are primarily distributed in the frequency regions that the model attends to, and can adjust their aggressive frequency distribution according to the model's sensitivity to frequency corruptions. In short, white-box attacks strike the frequency regions where the model's defenses are weak. To our knowledge, this is the first spectral-perspective explanation of why white-box attacks are so hard to defend against.

4. FREQUENCY REGULARIZATION

Although AT improves robustness, there is still a large gap between standard and robust accuracy. Kannan et al. (2018) propose adversarial logit pairing, which forces the logits of paired natural and adversarial examples to be similar. Zhang et al. (2019) utilize a classification-calibrated loss to minimize the difference between the predictions on natural and adversarial inputs. Bernhard et al. (2021) apply LPF and HPF to the inputs, and then minimize the output difference between natural and filtered inputs. Tack et al. (2022) improve robustness by forcing the predictive distributions after attacking two different augmentations of the same input to be similar. For the first time in the literature, we have demonstrated in Section 3.2 that the frequency distribution of a perturbation depends on both the dataset and the model, so it is not a simple low- or high-frequency phenomenon, and that the white-box attack can adapt its aggressive frequency distribution to the target model. Intuitively, a natural idea is to drive the model to limit or tolerate the spectral difference between the outputs for a natural input and its adversarial counterpart, i.e., to achieve similar frequency-domain outputs for both types of input. Through backpropagation, this constraint makes the model extract from adversarial inputs spectral features similar to those of natural inputs; the robust accuracy then gradually approaches the standard accuracy and is thus improved. To achieve this goal, we devise a simple yet effective frequency regularization (FR) that aligns the outputs of natural and adversarial inputs in the frequency domain, as shown in Fig. 5.
The optimization goal of the proposed AT with FR is:

L_AT = L_CE + λ · (1/n) Σ_{i=1}^{n} Dis(F(f_1(x_i)), F(f_2(x_i + δ))),   (2)

where λ (default 0.1) denotes the FR coefficient, f_1 and f_2 are DNNs (the same model for the basic FR) with f_2 used for prediction, Dis denotes the distance function (the L_1 norm is used), L_CE is the cross-entropy loss, and F denotes the FFT. The distance function is applied to the real and imaginary parts of the complex numbers after the FFT separately, and the results are summed. FR consists of two branches, one handling natural inputs and the other adversarial inputs. Because the standard accuracy is higher than the robust accuracy, FR may reduce the standard accuracy while increasing robustness. To control the degradation of standard accuracy while maintaining robustness, we need a proper model to handle the natural inputs. Weight averaging (WA) Izmailov et al. (2018), described in Appendix A.3, which averages the weights over epochs along the training trajectory, is an effective means of improving the generalization of models. In AT, it can be combined with other methods Gowal et al. (2020); Chen et al. (2020) to mitigate the robust overfitting problem Rice et al. (2020); these works use WA to generate the final model for evaluation. During AT, the WA model maintains standard and robust accuracies similar to those of the current training model, and its weights are fixed. If we use the WA model to handle the natural inputs, the FR branch processing them (cf. the WA branch in Fig. 5) does not update its weights, so the standard accuracy is not pushed down toward the robust accuracy. Therefore, instead of using the WA model for the final evaluation, we use it to handle the natural inputs: in simple terms, we replace the model f_1 in Eqn. 2 with the WA model generated during AT.
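A framework-agnostic sketch of the FR term in Eqn. 2 is given below; `out_nat` and `out_adv` stand for f_1(x_i) and f_2(x_i + δ). We assume the FFT is taken along the output (logit) dimension, as the paper does not state the FFT dimensionality explicitly, and the L_1 distance is applied to the real and imaginary parts separately and summed, as described above.

```python
import numpy as np

def fr_term(out_nat, out_adv):
    """FR regularizer: L1 distance between the spectra of the natural and
    adversarial outputs, computed on real and imaginary parts separately,
    summed, then averaged over the batch."""
    f_nat = np.fft.fft(out_nat, axis=-1)
    f_adv = np.fft.fft(out_adv, axis=-1)
    dist = np.abs(f_nat.real - f_adv.real) + np.abs(f_nat.imag - f_adv.imag)
    return dist.sum(axis=-1).mean()

# Full objective (schematically):
#   L_AT = cross_entropy(f2(x + delta), y) + lam * fr_term(f1(x), f2(x + delta))
```

In an actual training loop the same computation would run on the framework's differentiable FFT (e.g. torch.fft.fft) so that gradients flow back through f_2.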

5. EXPERIMENTS

Datasets. Without loss of generality, we select four common image datasets: SVHN Netzer et al. (2011), CIFAR-10, CIFAR-100 Krizhevsky et al. (2009), and Tiny ImageNet. We apply 4-pixel padding with 32 × 32 random crops (not for SVHN and Tiny ImageNet) and random horizontal flips (not for SVHN) for data augmentation. All natural images are normalized to [0, 1]. The resolution of SVHN, CIFAR-10, and CIFAR-100 images is 32 × 32 × 3 (height, width, and channels, respectively); Tiny ImageNet images are 64 × 64 × 3.

Experimental Settings. We take ResNet18 as the default model and adopt an SGD optimizer with a momentum of 0.9 and a global weight decay of 5 × 10⁻⁴. The model is trained with PGD-10 for 100 (30) epochs with a batch size of 128 on one 3090 GPU. The initial learning rate is 0.1 (0.01), which decays by a factor of ten at the 75th (15th) and 90th (25th) epochs, respectively. The robust accuracy against the PGD-20 attack with random start is taken as the main basis for robustness analysis. The attack step size is α = 2/255 and the maximum l∞ norm-bounded perturbation is ϵ = 8/255. FR and WA are applied from the first epoch at which the learning rate drops until the end of training, with a cycle length of 1. We also show that our method is suitable for large-scale models in Appendix A.5.

Evaluated Attacks. The model with the highest robust accuracy against PGD-20 is selected for further evaluation. To avoid a false sense of security caused by obfuscated gradients, we evaluate the robust accuracy against several popular white-box attacks, including PGD Madry et al. (2017), C&W Carlini & Wagner (2017), and AutoAttack Croce & Hein (2020) (denoted as AA; it consists of APGD-CE, APGD-DLR, FAB, and Square). Following the default AT setting, the attack step size is 2/255 and the maximum l∞ norm-bounded perturbation is 8/255.
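For concreteness, the l∞ PGD update used above (random start, α = 2/255, ϵ = 8/255, 20 steps) can be sketched as follows. `grad_fn` abstracts away the model and loss, returning the gradient of the loss with respect to the input; the function name and this interface are ours.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=20, random_start=True):
    """l_inf PGD: iterated sign-gradient ascent, projected onto the eps-ball
    around x and clipped to the valid image range [0, 1]."""
    rng = np.random.default_rng(0)
    x_adv = x + (rng.uniform(-eps, eps, size=x.shape) if random_start else 0.0)
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                 # keep a valid image
    return x_adv
```

In a deep learning framework, `grad_fn` would compute the gradient of the classification loss with respect to the input via backpropagation.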

5.1. ABLATION STUDIES

Distance Function in FR. For the distance function defined in Eqn. 2, there are three frequently used choices: the L 1 norm, the L 2 norm, and cosine similarity. We measure their performance on CIFAR-10. For a fair comparison, we start from the same checkpoint at the 74th epoch and then apply each distance function. As shown in Figs. 6(a) and 6(b), the L 1 norm is the most effective at improving robustness, at the expense of slightly reduced standard accuracy.
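The three candidates can be sketched as below (our own helper; how the paper combines the real and imaginary parts for the L 2 and cosine variants is not stated, so stacking them into one vector is our assumption):

```python
import numpy as np

def fr_dis(f1, f2, kind="l1"):
    """Candidate Dis(., .) functions on complex spectra f1, f2.
    Real and imaginary parts are stacked into a single real vector."""
    a = np.concatenate([f1.real.ravel(), f1.imag.ravel()])
    b = np.concatenate([f2.real.ravel(), f2.imag.ravel()])
    if kind == "l1":
        return np.abs(a - b).sum()
    if kind == "l2":
        return np.sqrt(((a - b) ** 2).sum())
    if kind == "cos":  # dissimilarity: 1 - cosine similarity
        return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    raise ValueError(f"unknown distance: {kind}")
```

All three vanish when the two spectra coincide; they differ in how strongly they penalize per-frequency deviations versus overall direction.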

6. CONCLUSION

This work explores the appealing properties of adversarial perturbations and AT through a spectral lens. We find that AT renders the model more focused on shape-biased representations in the low-frequency region to gain robustness. Through systematic experiments, we show for the first time that the white-box attack can adapt its aggressive frequency distribution to the target model's sensitivity to frequency corruptions, making it hard to defend against. To enhance tolerance to frequency-varying perturbations, we further devise a frequency regularization (FR) that aligns the outputs for natural and adversarial inputs in the spectral domain. Experiments show that FR can substantially improve robust accuracy without extra data. We believe these insights advance our knowledge of the frequency behavior of AT and shed more light on robust network design.

A.2 FOURIER HEAT MAP

The Fourier heat map provides a perturbation analysis method for investigating the sensitivity of models to frequency corruptions. More precisely, let U_{i,j} ∈ R^{d1×d2} be a real-valued matrix such that ∥U_{i,j}∥_2 = 1 and FFT(U_{i,j}) has at most two non-zero elements, located at (i, j) and its symmetric coordinate with respect to the center. These matrices are called 2D Fourier basis matrices. Given a model, we generate the perturbed image X' = X + r · v · U_{i,j} from the natural image X, where r is chosen uniformly at random from {-1, 1} and v is the norm magnitude of the perturbation. For multi-channel images, we perturb every channel independently. We can then compute the error rate of the model under these Fourier basis noises and visualize how it changes as a function of the spectral indices; the resulting visualization is called a Fourier heat map. In this paper, we move the low-frequency region to the center of the map. A high error rate means the model is vulnerable to attacks at the corresponding frequency.
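A sketch of the construction (NumPy; the indices i, j are given in the shifted spectrum with low frequency at the center, and we read ∥·∥_2 as the norm of the flattened matrix — both are our assumptions):

```python
import numpy as np

def fourier_basis(d1, d2, i, j):
    """Real matrix U_{i,j} with unit norm whose FFT is non-zero only at
    (i, j) and its symmetric coordinate with respect to the center."""
    spec = np.zeros((d1, d2), dtype=complex)
    spec[i, j] = 1.0                                   # single spike in the shifted spectrum
    u = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))  # real part pairs the spike with its mirror
    return u / np.linalg.norm(u)                       # enforce unit norm

def fourier_perturb(x, i, j, v, rng):
    """Perturbed image X + r * v * U_{i,j} with random sign r."""
    r = rng.choice([-1.0, 1.0])
    return x + r * v * fourier_basis(x.shape[0], x.shape[1], i, j)
```

Sweeping (i, j) over the spectrum and recording the model's error rate on images perturbed this way yields the heat map.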

A.3 WEIGHT AVERAGING

Following the definition in Izmailov et al. (2018), the WA update is:

W_wa^n = (W_wa^{n-1} × k + W^n) / (k + 1),

where k denotes the number of past checkpoints already averaged, n denotes the epoch index during training, W_wa^n denotes the weights of the WA model at the n-th epoch, and W^n denotes the current model's weights. In this paper, the WA model is viewed as a teacher handling the natural inputs, and the model whose accuracy we evaluate is viewed as a student. We want the teacher (WA) to help the student extract useful information from the perturbed images, and we evaluate robustness on the student model, not the teacher (WA) model.
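The update can be sketched as a running average over checkpoints (a minimal illustration with parameters stored in a dict; the helper name is ours):

```python
def wa_update(w_avg, w_new, k):
    """One WA step: W_wa^n = (W_wa^{n-1} * k + W^n) / (k + 1), applied
    parameter-wise; k is the number of checkpoints already averaged."""
    return {name: (w_avg[name] * k + w_new[name]) / (k + 1) for name in w_avg}
```

After n such steps, the WA weights equal the plain mean of the n checkpoints seen so far, so the teacher changes slowly relative to the student.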

A.4 ADVERSARIAL PERTURBATIONS AND SPECTRAL DISTRIBUTION

Figures 10-13 show the natural images, the adversarial images, and the spectral distribution (low frequency in the center) of the perturbations across the datasets. x denotes the natural images; δ_nm, δ_lm, and δ_rm denote the PGD-20 perturbations generated against the natural, L-, and robust models, respectively. FFT denotes the Fast Fourier Transform. The jet color map is used to highlight the perturbations for clearer visualization. For the natural models, the perturbations are a jumble of noise points within the picture and have large magnitudes in the high-frequency region. The perturbations for the adversarial models are significantly more ordered and concentrated mainly in the low-frequency region. These visualizations prove that the adversarial perturbation is not a simple high-frequency phenomenon and is model- and dataset-dependent.

For reference, ResNet18 and WideResNet-34-10 have parameter counts of 11.174M and 46.160M, respectively (cf. Table 4). For large models, FR and FR/WA can still improve robustness against multiple attacks relative to standard AT, proving that the proposed methods are suitable for large models.

A.6 MORE COMPARISONS WITH FR

Experimental setup: For a fair comparison, all experiments adopt the same data augmentation: 4-pixel padding with 32 × 32 random crops (not for SVHN and Tiny ImageNet) and random horizontal flips (not for SVHN). All natural images are normalized to [0, 1]. The frequency regularization (FR) coefficient is set to 0.1 for SVHN and the CIFAR datasets, and 0.05 for Tiny ImageNet. The training set is randomly split into training and validation sets at a ratio of 9:1. We select the model with the highest robustness against PGD-20 on the validation set for further evaluation against other popular attacks.

Effectiveness across datasets and methods: We provide a thorough performance comparison of AT+FR and other methods on ResNet18 in Tables 5-8. Since FR and FR/WA are plug-and-play blocks, we also apply them to popular techniques (detailed experimental settings are given in Appendix A.7) to prove their effectiveness. Experimental results demonstrate that FR can be plugged into these popular methods to further improve robustness. Besides, FR/WA maintains a standard accuracy similar to AT while improving robust accuracy. This improvement is non-trivial, since several papers have claimed a trade-off between standard and robust accuracy.



The numbers in brackets are the hyperparameters for SVHN.



Figure 1: Visualization of the CIFAR-10 images after LPF with a bandwidth of 8 (top), natural images X (middle), and the perturbed images X+δ (bottom).

Figure 2: Visualization of perturbations in the frequency domain, with low frequency in the center.

Figure 3: Standard accuracy and robust accuracy against PGD-20 attacks for natural, L-, and robust ResNet18 models across different datasets. As the bandwidth increases, more input or perturbation information is retained.

Figure 5: An overview of the standard AT, the proposed FR, and its WA extension. δ denotes the perturbation, F denotes the FFT, dis denotes the distance function.

Figure 6: Ablation of distance functions and coefficient λ on CIFAR-10. SA and RA denote the standard and robust accuracy. (a) and (b) show the SA and RA of different distance functions, AT represents the standard AT without FR. (c) shows the impact of λ on SA and RA.

Figure 10: Visualization of the natural and perturbed images on SVHN.

Figure 11: Visualization of the natural and perturbed images on CIFAR-10.

Figure 12: Visualization of the natural and perturbed images on CIFAR-100.

Figure 13: Visualization of the natural and perturbed images on Tiny ImageNet.

Top-1 accuracy (%) of natural, L-, and robust ResNet18 models. The bandwidth row denotes the LPF bandwidth (k) applied to the inputs. The higher the value, the more information is retained (i.e., 32 or 64 means no filtering). The last column shows the robust accuracy against the PGD-20 attack. Bold numbers indicate the best results.

Top-1 robust accuracy (%) of ResNet18 models against diverse attacks with maximum l∞ norm-bounded perturbation ϵ = 8/255. Bold numbers indicate the best on each dataset. As shown in Table 2, we incorporate FR and FR/WA into AT to improve robustness against various attacks on multiple datasets. In particular, relative to AT, the FR/WA version improves the robust accuracy on average by 3.27% and 1.84% against PGD-20 and AA, respectively, with a much smaller degradation (0.31%) in standard accuracy. These results indicate that our methods are versatile across datasets. Experiments in Appendix A.6 indicate that FR and FR/WA can be plugged into other defenses to further improve robustness.

Benchmark with Other Defenses. Table 3 further compares the impact of FR with well-known defenses (details are reviewed in Appendix A.7) on the CIFAR-10 dataset, using the popular WideResNet-34-10 Zagoruyko & Komodakis (2016) model. The results show that FR and FR/WA substantially improve robust accuracy compared to AT and outperform the other defenses. Besides, FR/WA improves robustness while maintaining standard accuracy.

Top-1 accuracy (%) of various models on CIFAR-10. The #number column indicates the parameter count. Bold numbers indicate the best.

Top-1 accuracy (%) of the ResNet18 model on SVHN. Bold numbers indicate the best.

