RETHINKING UNCERTAINTY IN DEEP LEARNING: WHETHER AND HOW IT IMPROVES ROBUSTNESS

Abstract

Deep neural networks (DNNs) are known to be prone to adversarial attacks, for which many remedies have been proposed. While adversarial training (AT) is regarded as the most robust defense, it suffers from poor performance both on clean examples and under other types of attacks, e.g. attacks with larger perturbations. Meanwhile, regularizers that encourage uncertain outputs, such as entropy maximization (EntM) and label smoothing (LS), can maintain accuracy on clean examples and improve performance under weak attacks, yet their ability to defend against strong attacks remains in doubt. In this paper, we revisit uncertainty promotion regularizers, including EntM and LS, in the field of adversarial learning. We show that EntM and LS alone provide robustness only under small perturbations. In contrast, we show that uncertainty promotion regularizers complement AT in a principled manner, consistently improving performance both on clean examples and under various attacks, especially attacks with large perturbations. We further analyze how uncertainty promotion regularizers enhance the performance of AT from the perspective of the Jacobian matrices ∇_X f(X; θ), and find that EntM effectively shrinks the norm of the Jacobian matrices and hence promotes robustness.

1. INTRODUCTION

Deep neural networks (DNNs) have achieved great success in image recognition (Russakovsky et al., 2015), audio recognition (Graves & Jaitly, 2014), etc. However, as shown by (Szegedy et al., 2013), DNNs are vulnerable to adversarial attacks, where slightly perturbed adversarial examples can easily fool DNN classifiers. The ubiquitous existence of adversarial examples (Kurakin et al., 2016) casts doubt on real-world DNN applications. Therefore, many techniques have been proposed to enhance the robustness of DNNs to adversarial attacks (Papernot et al., 2016; Kannan et al., 2018). Among defenses to adversarial attacks, adversarial training (AT) (Goodfellow et al., 2014; Madry et al., 2017) is commonly recognized as the most effective. However, it exhibits poor performance on clean examples and under attacks stronger than the one it is trained on (Song et al., 2018). Recent works (Rice et al., 2020) attribute such shortcomings to overfitting, but solutions, or even mitigations, remain open problems. Meanwhile, many regularization techniques have been proposed to improve robustness, among which two closely related regularizers, entropy maximization (EntM) and label smoothing (LS), show performance improvements under weak attacks (Pang et al., 2019; Shafahi et al., 2019) without compromising accuracy on clean examples. However, as stated by (Uesato et al., 2018; Athalye et al., 2018), strong and diverse attacks should be used to better approximate the adversarial risk of a defense. Therefore, it remains doubtful how these regularizers perform under stronger attacks by themselves and what their adversarial risk is. One appealing property of both EntM and LS is that they penalize over-confident predictions and promote prediction uncertainty (Pereyra et al., 2017; Szegedy et al., 2016). Therefore, both of them are used as regularizers that mitigate overfitting in multiple applications (Müller et al., 2019).
However, it remains unclear whether EntM and LS can effectively regularize AT, where many other regularizers are ineffective (Rice et al., 2020). Therefore, in this paper, we perform an extensive empirical study on uncertainty promotion regularizers, i.e. entropy maximization and label smoothing, in the domain of adversarial machine learning. We carry out experiments on both regularizers, with and without AT, on multiple datasets and under various adversarial settings. We find that, although neither EntM nor LS is able to provide consistent robustness by itself, both regularizers complement AT and improve its performance consistently. Specifically, we observe not only better accuracy on clean examples, but also better robustness, especially under attacks with larger perturbations (e.g. 9% improvement under perturbation ε = 16/255 for models trained with 8/255 AT; see Tables 1 and 3). In addition, we investigate the underlying mechanisms by which EntM and LS complement AT, and attribute the improvements to the shrunken norm of the Jacobian matrix ∇_X f(X; θ) (∼10 times; see Sect. 6), which indicates better numerical stability and hence better adversarial robustness than AT. To summarize, we make the following contributions:
1. We carry out an extensive empirical study on whether uncertainty promotion regularizers, i.e. entropy maximization (EntM) and label smoothing (LS), provide better robustness, and find that while neither of them provides consistent robustness under strong attacks alone, both techniques serve as effective regularizers that improve the performance of AT consistently on multiple datasets, especially under large perturbations.
2. We provide further analysis of uncertainty promotion regularizers from the perspective of Jacobian matrices, and find that applying such regularizers significantly shrinks the norm of the Jacobian matrix ∇_X f(X; θ), leading to better numerical stability and adversarial robustness.

2. RELATED WORK AND DISCUSSIONS

Adversarial Attacks and Defenses. (Szegedy et al., 2013; Goodfellow et al., 2014) pointed out the vulnerability of DNNs to adversarial examples, and proposed an efficient attack, the FGSM, to generate such examples. Since then, as increasingly strong adversaries (Carlini & Wagner, 2017) have been proposed, AT (Madry et al., 2017) has come to be considered the most effective remaining defense. More recently, Zhang et al. (2019) proposed TRADES, extending AT and showing better robustness than Madry et al. (2017). We refer readers to (Yuan et al., 2019; Biggio & Roli, 2018) for more comprehensive surveys on adversarial attacks and defenses. However, all AT methods exhibit common drawbacks. Specifically, Rice et al. (2020) show that AT suffers from severe overfitting that cannot be easily mitigated, which leads to multiple undesirable consequences. For example, AT performs poorly on clean examples, and Song et al. (2018) show that AT overfits to the adversary it is trained on and performs poorly under other attacks.

Uncertainty Promotion Techniques in Machine Learning. The principle of maximum entropy is widely accepted not only in physics, but also in machine learning, such as reinforcement learning (Ziebart et al., 2008; Haarnoja et al., 2018). In supervised learning, uncertainty promotion techniques, including entropy maximization (Pereyra et al., 2017) and label smoothing (Szegedy et al., 2016), act on model outputs and can improve generalization in many applications, such as image classification and language modeling (Pereyra et al., 2017; Müller et al., 2019; Szegedy et al., 2016). Only a few works study uncertainty promotion techniques in the field of adversarial machine learning. ADP (Pang et al., 2019) claims that diversity promotion in ensembles can promote robustness, for which entropy maximization is used. Essentially, ADP consists of multiple mutually independent models that together mimic the behavior of entropy maximization.
However, ADP only shows improvements under weak attacks, e.g. PGD with 10 iterations. Shafahi et al. (2019) claim that label smoothing by itself can improve robustness beyond AT, yet the claim remains to be investigated under a wider range of adversaries, e.g. adaptive attacks and black-box attacks. As (Uesato et al., 2018; Athalye et al., 2018) argue, one should use strong and diverse attacks to better approximate the adversarial risk; it therefore remains unknown how much robustness EntM or LS alone can provide under such attacks. Moreover, neither of these works studies the relationship between AT and uncertainty promotion, which is an important open problem since many regularizers fail to mitigate the overfitting of AT (Rice et al., 2020). We also discuss knowledge distillation (KD) (Hinton et al., 2015) as another line of work that introduces uncertainty. KD is a procedure that trains a student model to fit the outputs of a teacher network, which are also 'soft' labels and introduce uncertainty. However, KD alone has been shown not to improve robustness (Carlini & Wagner, 2016). Also, while (Goldblum et al., 2020) proposed adversarially robust distillation (ARD), combining AT and KD, ARD primarily aims to build robust models upon robust pre-trained teacher models, whereas we primarily study uncertainty and do not rely on pre-trained teacher models. Therefore we consider ARD orthogonal to our work.

3. PRELIMINARIES

Notations. We denote a dataset D = {(X^(i), y^(i))}_{i=1}^n, where X^(i) ∈ R^d are input data and y^(i) ∈ R^C are one-hot label vectors. We denote a DNN parameterized by θ as f(X; θ): R^d → R^C, which outputs logits on C classes. We denote f_σ(X; θ) = σ ∘ f(X; θ), the network followed by the softmax activation. We denote the cross entropy loss as CE(ŷ, y) = -Σ_{i=1}^C y_i log ŷ_i, the Shannon entropy as H(p) = -Σ_{i=1}^C p_i log p_i, and the Kullback-Leibler divergence as D_KL(p ‖ q) = Σ_{i=1}^C p_i log(p_i / q_i).

Adversarial Attacks and Adversarial Training. Given a data-label pair (X, y) and a neural network f(X; θ), adversarial attacks aim to craft an adversarial example X^(adv), with ‖X^(adv) - X‖ ≤ ε, so as to fool f(X; θ). This goal can be formulated as the problem

    X^(adv) = argmax_{X'} L_atk(f_σ(X'; θ), y),  subject to ‖X' - X‖ ≤ ε,   (1)

where L_atk is some loss function, generally taken as CE. Eqn. 1 can generally be solved via iterative optimization, such as Projected Gradient Descent (PGD) (Madry et al., 2017). Adversarial Training (AT) is generally considered the most effective defense against adversarial attacks, where the training set consists of both clean and adversarial examples, and the following objective is minimized (Goodfellow et al., 2014):

    min_θ E_{(X,y)∼D} [ α CE(f_σ(X; θ), y) + (1 - α) max_{‖X^(adv) - X‖ ≤ ε} CE(f_σ(X^(adv); θ), y) ],   (2)

where the inner maximization is commonly solved via PGD. We specifically denote AT which uses PGD to solve the inner problem as PAT (Madry et al., 2017). TRADES (Zhang et al., 2019) extends PAT by optimizing Eqn. 3 in a similar min-max style, leading to better robustness than PAT:

    min_θ E_{(X,y)∼D} [ CE(f_σ(X; θ), y) + β max_{‖X^(adv) - X‖ ≤ ε} D_KL( f_σ(X^(adv); θ) ‖ f_σ(X; θ) ) ].   (3)
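As a concrete illustration, the inner maximization of Eqn. 2 (and the attack of Eqn. 1) is typically solved with PGD. The following is a minimal NumPy sketch, not the implementation used in the paper; `grad_fn` is a hypothetical callable returning the gradient of the attack loss w.r.t. the input.

```python
import numpy as np

def pgd_linf(grad_fn, x, y, eps, step, n_iter):
    """Projected Gradient Descent under an L-infinity ball (cf. Eqn. 1).

    grad_fn(x, y) returns the gradient of the attack loss w.r.t. x.
    Each iterate is projected back into the eps-ball around the clean input.
    """
    x_adv = x.copy()
    for _ in range(n_iter):
        g = grad_fn(x_adv, y)
        x_adv = x_adv + step * np.sign(g)          # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the L_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay in valid pixel range
    return x_adv
```

With the standard CIFAR-10 setting (ε = 8/255, step 2/255, 7 steps), the returned iterate always lies within the ε-ball of the clean input.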

4. METHODOLOGY

4.1. ENTROPY MAXIMIZATION

Given an example (X, y) and a neural network f(X; θ), entropy maximization (EntM) minimizes the cross entropy loss while maximizing the Shannon entropy of the output probability:

    L_entm(X, y) = CE(f_σ(X; θ), y) - λ H(f_σ(X; θ)),   (4)

where λ is a hyperparameter controlling the strength of entropy maximization. For normal training, we minimize E_{(X,y)∼D}[L_entm(X, y)]; for AT, such as PAT and TRADES, we replace CE with L_entm in Eqn. 2, which for PAT yields Eqn. 5, and similarly for TRADES:

    min_θ E_{(X,y)∼D} [ α L_entm(X, y) + (1 - α) max_{‖X^(adv) - X‖ ≤ ε} L_entm(X^(adv), y) ].   (5)

The entropy maximization term penalizes over-confident outputs and encourages uncertain predictions, in contrast to CE, which encourages one-hot outputs. Such a formulation also appears in (Pereyra et al., 2017; Dubey et al., 2018) in general deep learning and fine-grained classification.
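A minimal NumPy sketch of the EntM objective in Eqn. 4 (a simplified stand-in for the actual training code; the function names are our own):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entm_loss(logits, y_onehot, lam):
    """L_entm = CE(f_sigma(X), y) - lam * H(f_sigma(X))  (Eqn. 4)."""
    p = softmax(logits)
    ce = -(y_onehot * np.log(p)).sum(axis=-1)
    h = -(p * np.log(p)).sum(axis=-1)   # Shannon entropy of the prediction
    return ce - lam * h
```

With λ = 0 this reduces to plain cross entropy; with λ > 0 the entropy bonus lowers the loss of less confident (but still correct) predictions, illustrating how EntM counteracts the one-hot targets of CE.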

4.2. CONNECTION WITH LABEL SMOOTHING

Label smoothing (LS) is also a widely used technique that prevents over-confident predictions. Given a data-label pair (X, y), and denoting the uniform distribution over C classes as u_C, the objective of label smoothing can be formulated as

    y_smooth = y - γ(y - u_C),  L_smooth(X, y) = CE(f_σ(X; θ), y_smooth).   (6)

Label smoothing and entropy maximization are intrinsically similar. Specifically, both penalize over-confident outputs:

    L_smooth(X, y) = (1 - γ) CE(f_σ(X; θ), y) + γ D_KL(u_C ‖ f_σ(X; θ)) + γ log C,   (7)
    L_entm(X, y) = CE(f_σ(X; θ), y) + λ D_KL(f_σ(X; θ) ‖ u_C) - λ log C.   (8)

Therefore EntM and LS are closely connected, differing in the asymmetric divergences D_KL(f_σ(X; θ) ‖ u_C) and D_KL(u_C ‖ f_σ(X; θ)). In our paper, we carry out experiments on both LS and EntM for comparison.
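The two decompositions above can be verified numerically. The following sketch (our own construction, not from the paper) checks Eqns. 7 and 8 on a random softmax output:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 5
z = rng.normal(size=C)
p = np.exp(z) / np.exp(z).sum()     # model output f_sigma(X)
y = np.eye(C)[2]                    # one-hot label
u = np.full(C, 1.0 / C)             # uniform distribution u_C
gamma, lam = 0.3, 2.0

def CE(pred, target):
    """Cross entropy of target distribution under prediction pred."""
    return -(target * np.log(pred)).sum()

def KL(a, b):
    """Kullback-Leibler divergence D_KL(a || b)."""
    return (a * np.log(a / b)).sum()

# Eqns. 6/7: label smoothing decomposes into CE plus a reverse KL to uniform.
y_smooth = (1 - gamma) * y + gamma * u
lhs_ls = CE(p, y_smooth)
rhs_ls = (1 - gamma) * CE(p, y) + gamma * KL(u, p) + gamma * np.log(C)

# Eqns. 4/8: entropy maximization decomposes into CE plus a forward KL to uniform.
H = -(p * np.log(p)).sum()
lhs_entm = CE(p, y) - lam * H
rhs_entm = CE(p, y) + lam * KL(p, u) - lam * np.log(C)
```

Both pairs agree to floating-point precision, confirming that the two regularizers differ only in the direction of the KL term.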

Datasets and Models

We utilize four commonly used datasets for evaluation, CIFAR-10, CIFAR-100, SVHN and MNIST. We train ResNet18, a common selection in adversarial training (Madry et al., 2017) for CIFAR-10, SVHN and CIFAR-100, and a four-layer CNN for MNIST. Details about training settings and architectures can be found in Appendix A.

Comparison Models

We study the robustness of the following models without adversarial training.
• Normal models trained using standard cross entropy, denoted as Normal.
• ADP (Pang et al., 2019), which builds ensembles of models regularized by maximizing their ensemble diversity; entropy maximization is involved in the ensemble diversity term. We use an ensemble of 3 models and take the recommended parameters (2, 0.5).
• Models trained with EntM in Eqn. 4. We set λ = 2 for all datasets, following ADP, which also sets the coefficient of the entropy term to 2.
• Models trained with label smoothing (LS). We choose γ such that the non-maximal predictions match those of EntM on each dataset, i.e. γ = 0.74 for CIFAR-10 and SVHN, and γ = 0.84 for CIFAR-100.

We also study the robustness of the following models with adversarial training. All adversaries are L_∞ adversaries with ε = 8/255, step size 2/255 and 7 steps for CIFAR-10, CIFAR-100 and SVHN, and ε = 48/255, step size 6/255 and 10 steps for MNIST.
• AT (Eqn. 2) with the PGD adversary (PAT). We take α = 0.5 as in (Goodfellow et al., 2014).
• PAT with the EntM objective (Eqn. 5), PAT-EntM. We set λ identical to EntM.
• PAT with label smoothing (Eqn. 6), PAT-LS. We set γ identical to LS.
• TRADES (Zhang et al., 2019), which shows better robustness than PAT. We take β = 3.

Threat models. Unless explicitly mentioned, we focus on untargeted L_∞ adversaries. We consider PGD adversaries with the CE objective (PGD and FGSM), and also CW adversaries (Carlini & Wagner, 2017), i.e. attacks using the CW objective and optimized by PGD. We consider both white-box and black-box settings. In black-box settings, adversarial examples are crafted on a proxy model trained on identical data and then transferred to the target model (Papernot et al., 2017). We show test accuracy under various attacks on CIFAR-10, CIFAR-100, SVHN and MNIST in Tables 1, 2, 3 and 9.

5.2.1. WITHOUT ADVERSARIAL TRAINING

We first analyze the case where no adversarial training is used, namely EntM, ADP and LS, whose robustness under diverse and strong attacks is in question. We make the following findings.
• Under attacks, EntM and LS perform better than ADP on CIFAR-10 and CIFAR-100; on SVHN, LS performs better than ADP, while EntM performs similarly to ADP. Therefore we focus our analysis on EntM and LS instead of ADP.
• EntM and LS show gradient obfuscation under weak attacks. On CIFAR-10 and SVHN, white-box PGD10 attacks succeed less often than black-box ones (see Tables 1 and 3), as black-box adversaries bypass the area where gradient obfuscation happens and find adversarial examples. We provide more evidence of gradient obfuscation in Appendix C.
• EntM and LS improve robustness only under small perturbations. We carry out evaluations on PGD40. Improvements over random guessing can only be seen under PGD40-4 on CIFAR-10 and SVHN (Tables 1 and 3). Therefore, the claims of (Pang et al., 2019; Shafahi et al., 2019) that ADP and LS improve robustness are conditional: EntM, LS and ADP are only robust under small perturbations.

5.2.2. WITH ADVERSARIAL TRAINING

We study AT regularized by uncertainty promotion regularizers and make the following findings.
• PAT-EntM and PAT-LS outperform PAT on clean examples by 1% on CIFAR-10 and 3% on CIFAR-100, and under strong attacks like PGD40 outperform PAT by 2-9% on CIFAR-10 and 3-6% on CIFAR-100. See Clean and PGD40 in Tables 1, 3 and 2.
• PAT-EntM achieves generally similar or better performance than its counterpart PAT-LS, e.g. on CIFAR-10 and CIFAR-100, PAT-EntM achieves 1% better adversarial accuracy, while PAT-LS achieves <1% better accuracy on clean examples (see Tables 1 and 2). Therefore, for the rest of the paper we primarily focus on PAT-EntM.
• Robustness improvements are most evident under large perturbations. For example, accuracy under PGD40-16 improves by 9% on CIFAR-10 and SVHN from PAT to PAT-EntM (see Tables 1 and 3), which is exactly the opposite of the case where AT is not used.
• PAT-EntM is generally more robust than TRADES, by over 3% on CIFAR-10 and CIFAR-100, and is similarly robust to TRADES on SVHN, further showing the effectiveness of uncertainty promotion regularizers.
• Results under CW (Tables 1 and 2) show that PAT-EntM is still more robust than PAT (by 7% on CIFAR-10 and 3% on CIFAR-100) and than TRADES (except CIFAR-10, PGD40-8). The results under a different adversary further verify the robustness improvement.

In addition, we train WideResNet34-10 (Zagoruyko & Komodakis, 2016) following TRADES, and evaluate the robustness of PAT, TRADES and PAT-EntM. We show the results in Appendix D, Table 11. Following (Athalye et al., 2018), we perform the following sanity checks to make sure PAT-EntM does not cause gradient obfuscation, and to better approximate and evaluate its adversarial risk.

Perturbation-Accuracy Curve. We expand perturbations to test whether the robust accuracy monotonically decreases, and whether sufficiently large perturbations lead to 0 accuracy.
We utilize PGD with ε = k/255, step size 2/255 and k iterations, k ∈ [10, 150], on PAT, PAT-EntM and TRADES, on CIFAR-10 and CIFAR-100. We show the results in Fig. 1. On both datasets and under all ε, PAT-EntM outperforms PAT and TRADES. Moreover, under sufficiently large ε, all accuracies reach 0, which is further evidence that PAT-EntM is not obfuscating gradients.

Attacks with more iterations. We perform attacks with fixed ε and more iterations to make sure the attacks converge, and to reliably evaluate the robustness of PAT-EntM. We take ε = 8/255, 16/255, with #steps k ∈ [8, 256] and step size max(1/510, 2ε/k), on CIFAR-10 and CIFAR-100. We show the results in Fig. 2. Even when the attacks converge at 200 steps, PAT-EntM still consistently outperforms TRADES (by over 4% on both CIFAR-10 and CIFAR-100), and remains similarly robust or more robust than TRADES across all numbers of steps.

Adaptive Attacks. We perform adaptive attacks on PAT-EntM. Due to its connection with label smoothing, we leverage CE with smoothed labels in Eqn. 6 as the loss function L_atk in Eqn. 1. We search for the best parameter γ for each attack in [0, 1] at intervals of 0.1. We consider ε = 8/255, 16/255 on CIFAR-10 and CIFAR-100, and plot the corresponding curves also in Fig. 2. With a properly selected smoothing parameter γ, adaptive attacks can succeed more often than non-adaptive ones. However, even under strong adaptive attacks, we still see improvements of PAT-EntM over PAT (7% on CIFAR-10 and 4% on CIFAR-100 at ε = 16/255) and, in most cases, over TRADES, except for CIFAR-10 at ε = 8/255, where the robust accuracies of PAT-EntM and TRADES are 45.05% and 45.21%, respectively.
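The adaptive attack objective described above, i.e. cross entropy against smoothed labels used as L_atk in Eqn. 1, can be sketched as follows (a simplified stand-in with our own function names; in the experiments γ is searched over [0, 1]):

```python
import numpy as np

def smoothed_ce_attack_loss(logits, y_onehot, gamma):
    """Adaptive attack objective: CE against smoothed labels
    y_smooth = (1 - gamma) * y + gamma * u_C (Eqn. 6), maximized by PGD."""
    C = logits.shape[-1]
    p = np.exp(logits - logits.max())   # stable softmax
    p /= p.sum()
    y_smooth = (1 - gamma) * y_onehot + gamma / C
    return -(y_smooth * np.log(p)).sum()
```

With γ = 0 this reduces to the standard CE attack objective; with γ = 1 it becomes the mean negative log-probability over all classes.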

5.3. HYPERPARAMETER ANALYSIS

We vary the hyperparameter λ in Eqns. 4 and 5 to see how the uncertainty level λ influences accuracies on both clean and adversarial examples. We vary λ ∈ [0, 10] on CIFAR-10 and CIFAR-100, and plot the accuracies on clean examples and under PGD40-8 in Fig. 3. The results show that on both datasets, accuracies on both clean and adversarial examples increase simultaneously as λ grows from 0.1 to 5. This is an appealing property, contrary to the belief that accuracy and robustness are at odds with each other (Zhang et al., 2019; Tsipras et al., 2018). We also observe in Fig. 3 that on both datasets, λ = 2 and λ = 5 achieve similar performance (within 0.5%) in terms of both accuracy and robustness, which shows that EntM is not highly sensitive to hyperparameters.

6. HOW UNCERTAINTY PROMOTION WORKS -FURTHER ANALYSIS

In this section we dig deeper into how uncertainty promotion works, which may bring deeper insights towards more robust algorithms. For brevity, we write f(X) for f(X; θ) and omit θ. We define M_{f,X} = f(X)_y - max_{i≠y} f(X)_i as the decision margin of f at point X, and introduce Theorem 1, connecting robustness with M_{f,X} and the Lipschitz constant of f.

Theorem 1 (Tsuzuku et al. (2018)). Denote the global Lipschitz constant of f(X; θ) as l_f. The following holds for any data point X and any perturbation δ:

    M_{f,X} ≥ √2 ‖δ‖_2 l_f ⇒ M_{f,X+δ} ≥ 0.

Theorem 1 shows that, for an arbitrary noise δ, as long as ‖δ‖_2 ≤ M_{f,X} / (√2 l_f), the model f will output correct predictions on X + δ. Therefore, we focus on the metric normalized margin, defined as M_{f,X} / l_f, for analyzing robustness. However, as l_f is hard to compute, we introduce an approximation for the analysis. We focus on the first-order approximation of f(X):

    f(X') ≈ f(X) + ∇_X f(X)^T (X' - X).   (10)

If the approximation holds well locally, we have the following inequalities:

    ‖f(X') - f(X)‖_2 ≈ ‖∇_X f(X)^T (X' - X)‖_2 ≤ ‖∇_X f(X)‖_2 ‖X' - X‖_2,   (11)
    ‖∇_X f(X)‖_2 ≥ ‖f(X) - f(X')‖_2 / ‖X - X'‖_2,   (12)

where ‖∇_X f(X)‖_2 is the spectral norm (also the largest singular value) of the Jacobian matrix. Eqn. 12 shows that ‖∇_X f(X)‖_2 locally upper bounds the local Lipschitz constant of f at point X (the R.H.S. of Eqn. 12). Therefore, we compute M_{f,X} / ‖∇_X f(X)‖_2 via differentiation to analyze robustness. However, since f(X) is non-linear, we define another term Q_f(X', X), describing how much f(X) deviates from the linear approximation, to account for cases where Eqn. 10 approximates poorly:

    Q_f(X', X) = ‖f(X') - f(X) - ∇_X f(X)^T (X' - X)‖_2 / ‖∇_X f(X)^T (X' - X)‖_2,  ‖X' - X‖_2 ≤ ε.   (13)

We sample 2,000 test samples from CIFAR-10, and study the relationship among normalized margins, the non-linearity Q_f(X', X), and robustness. We use L_2 adversaries in correspondence with Theorem 1. We list the related results in Table 4.
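The quantities above can be computed directly. The sketch below (our own construction, using finite differences rather than the autograd of the actual experiments) estimates the Jacobian, its spectral norm, and the normalized-margin proxy M_{f,X} / ‖∇_X f(X)‖_2 for a toy model:

```python
import numpy as np

def jacobian_fd(f, x, h=1e-5):
    """Central finite-difference Jacobian of f: R^d -> R^C at x (shape C x d)."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

def normalized_margin(f, x, y):
    """M_{f,X} / ||grad_X f(X)||_2, the robustness proxy used in Sect. 6."""
    out = f(x)
    margin = out[y] - np.max(np.delete(out, y))                 # decision margin M_{f,X}
    spectral = np.linalg.svd(jacobian_fd(f, x), compute_uv=False)[0]
    return margin / spectral
```

For a linear model f(X) = WX the Jacobian is exactly W, so the proxy coincides with the true normalized margin; for a DNN it is a local estimate whose validity degrades as Q_f grows.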
With EntM, ‖∇_X f(X)‖_2 shrinks by over 10 times regardless of whether AT is used, leading to a roughly 4-fold increase of the normalized margin for both EntM and PAT-EntM. However, while EntM alone achieves a high normalized margin, it also massively compromises local linearity compared to Normal, as shown by the high Q_f(X^(adv), X). Therefore, the area where the linear approximation in Eqns. 10-12 holds is limited, and robustness cannot be guaranteed. By contrast, PAT-EntM worsens local linearity only slightly compared to PAT, and therefore the enlarged normalized margin contributes to better robustness.

Table 4: Analysis of margin M_{f,X}, Jacobian matrix ∇_X f(X; θ) and non-linearity Q_f(X^(adv), X).

    Models                                          Normal   EntM     PAT      PAT-EntM
    M_{f,X}                                         10.55    1.3054   6.12     1.12
    ‖∇_X f(X)‖_2                                    72.26    3.44     10.69    0.84
    M_{f,X} / ‖∇_X f(X)‖_2                          0.1919   2.8132   0.6398   2.9913
    Accuracy (PGD20-0.5)                            0.0029   0.166    0.5909   0.6142
    Q_f(X^(adv), X)                                 4.4      52.51    4.59     5.36
    Corr(M_{f,X}/‖∇_X f(X)‖_2, ‖X^(adv) - X‖_2)     0.8327   0.4794   0.8216   0.8131

We further show that the normalized margin, together with local linearity, explains better robustness. We apply an unbounded L_2 attack to each example X until misclassification. We then compute the distortions ‖X^(adv) - X‖_2, and study the relationship between distortions and normalized margins; a more robust model should require a larger distortion for a successful attack. We plot the normalized margins and distortions in Fig. 4(a), and show their Pearson correlations in Table 4. On Normal, PAT and PAT-EntM, the correlation is much higher than on EntM. Also, Fig. 4(a) shows that although EntM has normalized margins similar to PAT-EntM, the distortions required are smaller on EntM than on PAT-EntM. Since linearity is not only key in the derivation of Eqn. 12 but also a major difference between PAT-EntM and EntM, these results indicate that high non-linearity compromises robustness on EntM. We further separate the points with Q_f(X^(adv), X) > 65 on EntM, which is the 40th percentile, and plot them separately in Fig. 4(b). Samples with higher local non-linearity have normalized margins as high as those of PAT-EntM, but require no larger distortions and hence show no better robustness. All results show that the normalized margin explains robustness only in combination with local linearity.

7. CONCLUSION

In this paper, we revisit uncertainty promotion regularizers in the field of adversarial learning. We find that uncertainty promotion regularizers alone cause gradient obfuscation, and provide only inconsistent robustness against small perturbations. In contrast, our extensive experiments demonstrate that uncertainty promotion regularizers augment AT, improving accuracy on clean examples and enhancing robustness, especially under large perturbations. We further demonstrate that both good local linearity and the shrunken norm of the Jacobian matrices contribute to the better robustness shown by PAT-EntM over PAT. We hope that this paper again underscores the necessity of evaluating under strong attacks, and draws attention to further insights into why uncertainty promotion regularizers work. We consider theoretical investigation of why they work well alongside AT an important direction for future work.

C GRADIENT OBFUSCATION OF LS AND ENTM WITHOUT AT

In this section we present more evidence regarding gradient obfuscation of EntM and LS.

C.1 RANDOM SEARCHING

We present further evidence that EntM, LS and ADP alone may lead to gradient obfuscation. We sample 1,000 images from the test set of CIFAR-10. We carry out 10,000 random search attacks on each image that EntM, LS and ADP succeeded in defending, and list the success rates of these random attacks in the table below. We also perform exactly the same experiments for PAT and PAT-EntM. For samples that PAT and PAT-EntM succeeded in defending, none can be successfully attacked via a 10,000-trial random search. This supports that PAT-EntM does not suffer from gradient obfuscation.
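A gradient-free random search of this kind can be sketched as follows (a simplified stand-in with our own names; points are sampled uniformly in the attack's L_∞ ball):

```python
import numpy as np

def random_search_attack(predict, x, y, eps, n_trials=10_000, seed=0):
    """Gradient-free sanity check: sample uniform points in the L_inf eps-ball
    around x and report whether any of them flips the prediction.

    If a gradient-based attack fails where this succeeds, the defense is
    obfuscating gradients rather than being truly robust.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        delta = rng.uniform(-eps, eps, size=x.shape)
        x_try = np.clip(x + delta, 0.0, 1.0)
        if predict(x_try) != y:
            return True   # found an adversarial example by random chance
    return False
```

On a toy threshold classifier, random search finds an adversarial point whenever the ε-ball crosses the decision boundary, and never does otherwise.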

C.2 LOSS SURFACE VISUALIZATION

We choose data-label pairs (X, y) from the CIFAR-10 test set, and plot the local loss surface at X' = X + ε_1 d_1 + ε_2 d_2 to visually show the properties of EntM and PAT-EntM. Here d_1 = sign(∇_X CE(f_σ(X; θ), y)) is the gradient-sign direction, and d_2 is a random direction with ‖d_2‖_∞ = 1. We choose ε_1, ε_2 ∈ [-0.04, 0.04]. Among the first 8 samples in the CIFAR-10 test set, we select 2 and plot them in Fig. 5. EntM alone is likely (roughly 25%, as a rough estimate) to create a 'hole' in a small neighborhood of X, thereby causing gradient-based iterative attackers to get stuck. This corresponds with the fact that weak gradient-based attackers cannot achieve a high success rate due to gradient obfuscation. It also corresponds to the results in Table 4, namely that EntM alone may compromise local linearity by creating a highly curved surface. We also plot the decision margin M_{f,X'} at X' = X + ε_1 d_1 + ε_2 d_2 in Fig. 6. Although EntM alone creates a small neighborhood where the margin is large, the area of this neighborhood is small, leaving large areas where M_{f,X'} = 0. This shows that EntM alone cannot ensure robustness, especially when perturbations are large, and corresponds with the results in Tables 1, 2 and 3.
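The grid evaluation behind Fig. 5 can be sketched as follows (our own simplified version; `loss_fn` is a hypothetical callable returning the loss at a perturbed input):

```python
import numpy as np

def loss_surface_grid(loss_fn, x, grad, n=21, radius=0.04, seed=0):
    """Evaluate loss_fn on X' = X + e1*d1 + e2*d2, where d1 is the gradient-sign
    direction and d2 is a random sign direction with ||d2||_inf = 1."""
    rng = np.random.default_rng(seed)
    d1 = np.sign(grad)
    d2 = rng.choice([-1.0, 1.0], size=x.shape)
    eps = np.linspace(-radius, radius, n)
    grid = np.empty((n, n))
    for i, e1 in enumerate(eps):
        for j, e2 in enumerate(eps):
            grid[i, j] = loss_fn(x + e1 * d1 + e2 * d2)
    return eps, grid
```

Plotting `grid` as a surface reproduces the kind of visualization in Fig. 5; a sharp 'hole' near the center indicates the locally obfuscated region described above.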

D RESULTS ON WIDERESNET34-10

Following TRADES (Zhang et al., 2019) , we train WideResNet34-10 (Zagoruyko & Komodakis, 2016) with PAT, PAT-EntM and TRADES on CIFAR-10 and CIFAR-100 using the same procedure as Appendix A. We use λ = 1 for EntM on CIFAR-10, and λ = 0.5 on CIFAR-100, which is chosen from [0.5, 1, 2, 5]. We report accuracy and robustness results in Table 11 . 



Footnotes.
1. We also carried out experiments on TRADES-EntM, i.e. TRADES trained with the entropy maximization objective. The results are shown in Tables 12 and 13 in Appendix E.
2. The C&W adversary is implemented and carried out exactly as in (Zhang et al., 2020). Note that since ADP builds ensemble models upon probability outputs, CW attacks operating on logits cannot be carried out against it.
3. Results on MNIST, and results on PGD20, are shown in Appendix B.
4. https://github.com/yaodongyu/TRADES/blob/master/models/resnet.py



Figure 1: Accuracy-ε curve on CIFAR-10 and CIFAR-100. x-axis values denote pixel changes.

Figure 3: Hyperparameter Analysis for λ.

Figure 4: Relationship between normalized margin and distortion for EntM, PAT and PAT-EntM.

Figure 5: Loss surface visualization at X = X + 1 d 1 + 2 d 2 for for two X taken from the first 8 images in CIFAR-10 test set.

We use FGSM-k to denote FGSM with ε = k/255, PGDp-k to denote p-step PGD attacks with ε = k/255 and step size k/2550, and CWp-k analogously to PGDp-k except for the attack objective. We adopt random Gaussian initialization X' = X + N(0, 0.005) with 5 restarts for all attacks. For black-box attacks, adversarial examples are crafted on a Normal model for defenses without AT, and on a PAT model for those with AT. Best accuracies higher than random guessing are bolded.

Performance under various attacks on CIFAR-10 (%). 0 denotes < 0.1%. Italics indicate white-box accuracies higher than corresponding black-box ones.

Performance under various attacks on CIFAR-100 (%).

Performance under various attacks on SVHN (%). Italics indicate white-box accuracies higher than corresponding black-box ones.

Performance under various attacks on CIFAR-10 (%). 0 denotes < 0.1%. Italics indicate white-box accuracies higher than corresponding black-box ones. We show the full experimental results in Tables 6, 7, 8 and 9. Although on MNIST the improvement is less significant (1% or below), and TRADES significantly outperforms both PAT and PAT-EntM, PAT-EntM still achieves improvements over PAT in terms of robustness under various attacks.

Performance under various attacks on CIFAR-100 (%). 0 denotes < 0.1%

Performance under various attacks on SVHN (%). 0 denotes < 0.1%. Italics indicate white-box accuracies higher than corresponding black-box ones.

In many seemingly successful defenses (15% on LS, 6% on ADP and EntM), gradient-based attackers are so obfuscated that they cannot find adversarial examples even though such examples do exist.

Performance under various attacks on MNIST (%).

Attack success rate of random searching attacks on CIFAR-10.

A TRAINING SETTINGS AND ARCHITECTURES

• Max pooling with stride 2.
• Flatten.
• FC layer with input shape 320 and output shape 50. ReLU activation.
• FC layer with input shape 50 and output shape 10.

We list all training settings in Table 5. For each quantitative result, we train 3 models, and each PGD evaluation is repeated 3 times and averaged. For the datasets, we perform standard data augmentation, including random cropping and random horizontal flipping (on CIFAR-10 and CIFAR-100).

