RETHINKING UNCERTAINTY IN DEEP LEARNING: WHETHER AND HOW IT IMPROVES ROBUSTNESS

Abstract

Deep neural networks (DNNs) are known to be prone to adversarial attacks, for which many remedies have been proposed. While adversarial training (AT) is regarded as the most robust defense, it suffers from poor performance both on clean examples and under other types of attacks, e.g. attacks with larger perturbations. Meanwhile, regularizers that encourage uncertain outputs, such as entropy maximization (EntM) and label smoothing (LS), can maintain accuracy on clean examples and improve performance under weak attacks, yet their ability to defend against strong attacks remains in doubt. In this paper, we revisit uncertainty promotion regularizers, including EntM and LS, in the field of adversarial learning. We show that EntM and LS alone provide robustness only under small perturbations. In contrast, we show that uncertainty promotion regularizers complement AT in a principled manner, consistently improving performance both on clean examples and under various attacks, especially attacks with large perturbations. We further analyze how uncertainty promotion regularizers enhance the performance of AT from the perspective of the Jacobian matrices ∇_X f(X; θ), and find that EntM effectively shrinks the norm of the Jacobian matrices and hence promotes robustness.

1. INTRODUCTION

Deep neural networks (DNNs) have achieved great success in image recognition (Russakovsky et al., 2015), audio recognition (Graves & Jaitly, 2014), etc. However, as shown by Szegedy et al. (2013), DNNs are vulnerable to adversarial attacks, where slightly perturbed adversarial examples can easily fool DNN classifiers. The ubiquitous existence of adversarial examples (Kurakin et al., 2016) casts doubt on real-world DNN applications. Therefore, many techniques have been proposed to enhance the robustness of DNNs against adversarial attacks (Papernot et al., 2016; Kannan et al., 2018). Among these defenses, adversarial training (AT) (Goodfellow et al., 2014; Madry et al., 2017) is commonly recognized as the most effective. However, it exhibits poor performance on clean examples and under attacks stronger than those it is trained on (Song et al., 2018). Recent work (Rice et al., 2020) attributes these shortcomings to overfitting, but solutions, or even mitigations, remain open problems. Meanwhile, many regularization techniques have been proposed to improve robustness, among which two closely related regularizers, entropy maximization (EntM) and label smoothing (LS), show performance improvements under weak attacks (Pang et al., 2019; Shafahi et al., 2019) without compromising accuracy on clean examples. However, as stated by Uesato et al. (2018) and Athalye et al. (2018), strong and diverse attacks should be used to better approximate the adversarial risk of a defense. It therefore remains doubtful how these regularizers perform under stronger attacks by themselves and what their adversarial risk is. One appealing property of both EntM and LS is that they penalize over-confident predictions and promote prediction uncertainty (Pereyra et al., 2017; Szegedy et al., 2016). Therefore, both are used as regularizers that mitigate overfitting in multiple applications (Müller et al., 2019).
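To make the two regularizers concrete, the following is a minimal NumPy sketch of label smoothing and an entropy maximization penalty on top of cross-entropy. The function names, the smoothing factor `eps`, and the entropy weight `beta` are illustrative choices, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smooth_labels(y, num_classes, eps=0.1):
    """Label smoothing (LS): replace the one-hot target with
    (1 - eps) * one_hot + eps / K, spreading mass over all classes."""
    one_hot = np.eye(num_classes)[y]
    return (1.0 - eps) * one_hot + eps / num_classes

def ls_loss(logits, y, num_classes, eps=0.1):
    """Cross-entropy against smoothed targets; eps=0 recovers plain CE."""
    p = softmax(logits)
    q = smooth_labels(y, num_classes, eps)
    return -(q * np.log(p + 1e-12)).sum(axis=-1).mean()

def entm_loss(logits, y, beta=0.5):
    """Entropy maximization (EntM): cross-entropy minus beta times the
    entropy of the prediction, so minimizing it rewards uncertain outputs."""
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), y] + 1e-12).mean()
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce - beta * entropy
```

On an over-confident prediction, the smoothed target raises the loss relative to plain cross-entropy, and the entropy bonus lowers the EntM objective: both push the model away from near-one-hot outputs.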
However, it remains unclear whether EntM and LS can effectively regularize AT, where many other regularizers are ineffective (Rice et al., 2020). Therefore, in this paper, we perform an extensive empirical study of uncertainty promotion regularizers, i.e. entropy maximization and label smoothing, in the domain of adversarial machine learning. We carry out experiments on both regularizers, with and without AT, on multiple datasets and under various adversarial settings. We find that, although neither EntM nor LS provides consistent robustness by itself, both regularizers complement AT and improve its performance consistently. Specifically, we observe not only better accuracy on clean examples, but also better robustness, especially under attacks with larger perturbations (e.g. a 9% improvement under perturbation ε = 16/255 for models adversarially trained with ε = 8/255; see Tables 1 and 3). In addition, we investigate the underlying mechanisms by which EntM and LS complement AT, and attribute the improvements to the shrunken norm of the Jacobian matrix ∇_X f(X; θ) (roughly 10 times smaller; see Sect. 6), which indicates better numerical stability and hence better adversarial robustness than AT alone. To summarize, we make the following contributions:
1. We carry out an extensive empirical study of whether uncertainty promotion regularizers, i.e. entropy maximization (EntM) and label smoothing (LS), provide better robustness, and find that while neither provides consistent robustness under strong attacks alone, both serve as effective regularizers that consistently improve the performance of AT on multiple datasets, especially under large perturbations.
2. We provide further analysis of uncertainty promotion regularizers from the perspective of Jacobian matrices, and find that applying such regularizers significantly shrinks the norm of the Jacobian matrix ∇_X f(X; θ), leading to better numerical stability and adversarial robustness.
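As a rough illustration of the quantity analyzed in contribution 2, the Frobenius norm of the input Jacobian ∇_X f(X; θ) can be estimated by central finite differences. The sketch below uses NumPy with a toy linear map standing in for the network; it is illustrative only, and the step size `h` is an arbitrary choice.

```python
import numpy as np

def jacobian_fro_norm(f, x, h=1e-4):
    """Estimate ||∇_x f(x)||_F by central finite differences.
    f maps a flat input vector to an output vector; x is one example."""
    y0 = f(x)
    jac = np.zeros((y0.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        # Central difference along input coordinate i.
        jac[:, i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return np.linalg.norm(jac)  # Frobenius norm by default for 2-D arrays

# Toy differentiable "model": a fixed linear map, whose Jacobian is W itself,
# so the estimate should match ||W||_F = sqrt(1 + 4 + 9 + 16) = sqrt(30).
W = np.array([[1.0, 2.0], [3.0, 4.0]])
f = lambda x: W @ x
x = np.array([0.5, -0.5])
```

A smaller Jacobian norm bounds how much the output can move under a small input perturbation, which is the sense in which a shrunken ∇_X f(X; θ) indicates better numerical stability.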
Prior work (Song et al., 2018; Rice et al., 2020) shows that AT overfits to the adversary it is trained on and performs poorly under other attacks.
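As a minimal sketch of the kind of attack involved, the following implements L∞-bounded PGD on a toy logistic model in NumPy; evaluating it at an ε larger than the one used during training is exactly the setting under which AT degrades. The model, step size, and iteration count are illustrative, not the paper's setup.

```python
import numpy as np

def pgd_linf(x, y, grad_fn, eps, alpha, steps, rng=None):
    """Projected gradient ascent on the loss within an L-infinity ball
    of radius eps around x. grad_fn(x, y) returns the input gradient."""
    if rng is None:
        rng = np.random.default_rng(0)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv, y))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)            # project back
    return x_adv

# Toy model: binary logistic regression with fixed weights w, labels y in {-1, +1}.
w = np.array([2.0, -1.0])

def loss_grad(x, y):
    # Gradient of -log sigmoid(y * w.x) with respect to x.
    s = 1.0 / (1.0 + np.exp(y * (w @ x)))
    return -y * s * w
```

For this toy model the attack provably stays inside the ε-ball and strictly increases the loss, which is all PGD guarantees; against a trained network the same loop is run with backpropagated gradients.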

2. RELATED WORK AND DISCUSSIONS

Uncertainty Promotion Techniques in Machine Learning. The principle of maximum entropy is widely accepted not only in physics but also in machine learning, e.g. reinforcement learning (Ziebart et al., 2008; Haarnoja et al., 2018). In supervised learning, uncertainty promotion techniques, including entropy maximization (Pereyra et al., 2017) and label smoothing (Szegedy et al., 2016), act on model outputs and can improve generalization in many applications, such as image classification and language modeling (Pereyra et al., 2017; Müller et al., 2019; Szegedy et al., 2016). Only a few works study uncertainty promotion techniques in the field of adversarial machine learning. ADP (Pang et al., 2019) claims that diversity promotion in ensembles can promote robustness, for which entropy maximization is used. Essentially, ADP consists of multiple mutually independent models that in total mimic the behavior of entropy maximization. However, ADP only shows improvements under weak attacks, e.g. PGD with 10 iterations. Shafahi et al. (2019) claim that label smoothing by itself can improve robustness beyond AT, yet this claim remains to be investigated under a wider range of adversaries, e.g. adaptive attacks and black-box attacks. As stated by Uesato et al. (2018) and Athalye et al. (2018), one should use strong and diverse attacks to better approximate the adversarial risk (Uesato et al., 2018); it thus remains unknown how much robustness EntM or LS alone can provide under strong and diverse attacks. Moreover, neither of these works studies the relationship between AT and uncertainty promotion, which is an important open problem since many regularizers fail to mitigate the overfitting of AT (Rice et al., 2020). We also discuss knowledge distillation (KD) (Hinton et al., 2015) as another line of work that introduces uncertainty. KD trains a student model to fit the outputs of a teacher network, which are also 'soft' labels and introduce uncertainty. However, KD alone has been shown not to improve robustness (Carlini & Wagner, 2016). Also, while Goldblum et al. (2020) proposed adversarially robust distillation (ARD), combining AT and KD, ARD primarily aims to build robust student models via distillation from a robust teacher.

