A CLOSER LOOK AT DUAL BATCH NORMALIZATION AND TWO-DOMAIN HYPOTHESIS IN ADVERSARIAL TRAINING WITH HYBRID SAMPLES

Anonymous

Abstract

There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). With the assumption that adversarial and clean samples are from two different domains, a common practice in prior works is to adopt dual BN, where BN_adv and BN_clean are used for the adversarial and clean branches, respectively. A popular belief for motivating dual BN is that estimating normalization statistics of this mixture distribution is challenging and thus disentangling it for normalization achieves stronger robustness. In contrast to this belief, we reveal that what makes dual BN effective mainly lies in its two sets of affine parameters. Moreover, we demonstrate that the domain gap between adversarial and clean samples is actually not very large, which is counter-intuitive considering the significant influence of adversarial perturbation on the model. Overall, our work sheds new light on understanding the mechanism of dual BN in Hybrid-AT as well as its underlying two-domain hypothesis. Recommended practices are summarized as takeaway insights for future practitioners.

1. INTRODUCTION

Adversarial training (AT) (Madry et al., 2018), which optimizes the model on adversarial examples, is a time-tested and effective technique for improving robustness against adversarial attacks. Beyond classical AT (also termed Madry-AT) (Madry et al., 2018), a common AT setup is to train the model on both adversarial samples and clean samples (termed Hybrid-AT) (Goodfellow et al., 2015; Kannan et al., 2018; Xie et al., 2020a). Batch normalization (BN) (Ioffe & Szegedy, 2015) has become a de facto standard component in modern deep neural networks (DNNs) (He et al., 2016; Huang et al., 2017; Zhang et al., 2019a; 2021); however, there is a notable concern regarding how to use BN in the Hybrid-AT setup. The concern mainly stems from a two-domain hypothesis: "clean images and adversarial images are drawn from two different domains" (Xie & Yuille, 2020). Guided by this hypothesis, a technique has been proposed to disentangle the mixture distribution of the two domains by applying a separate BN to each domain (Xie & Yuille, 2020). This technique has been adopted in multiple works under different names, such as auxiliary BN (Xie et al., 2020a), mixture BN (Xie & Yuille, 2020), and dual BN (Jiang et al., 2020; Wang et al., 2020; 2021). Despite the different names, they all refer to the same practice of adopting BN_adv and BN_clean for adversarial and clean samples, respectively. To avoid confusion, we stick to the term dual BN for the remainder of this work.

Despite its increasing popularity, the mechanism of how dual BN helps Hybrid-AT remains not fully understood. Towards a better understanding of the underlying mechanism, we first revisit a long-held belief motivated by the two-domain hypothesis (Xie & Yuille, 2020).
Specifically, Xie & Yuille (2020) justify the necessity of dual BN in Hybrid-AT with the following claim (quoted from their abstract): "Estimating normalization statistics of the mixture distribution is challenging" and "disentangling the mixture distribution for normalization, i.e., applying separate BNs to clean and adversarial images for statistics estimation, achieves much stronger robustness." The underlying motivation for this claim is that BN statistics calculated on the clean domain are incompatible with training the model on the adversarial domain, and vice versa; Hybrid-AT with a single BN therefore suffers from this incompatibility because its normalization statistics are calculated on the mixed distribution. Meanwhile, Xie & Yuille (2020) claim that dual BN avoids this incompatibility by training the clean branch with BN_clean and the adversarial branch with BN_adv. As a preliminary investigation of this incompatibility, our work experiments with a new variant of AT with cross-domain BN, namely training the adversarial branch with BN_clean. Interestingly, we find that using BN from the other domain has only limited influence on performance. This observation inspires us to take a closer look at what component of dual BN makes it more effective than single BN in Hybrid-AT. By untwining normalization statistics (NS) and affine parameters (AP) in dual BN to include one effect while excluding the other, we demonstrate that disentangled AP, instead of NS, plays the main role in the merit of dual BN in Hybrid-AT. Moreover, we find that the different APs in dual BN also well explain the performance discrepancy caused by the BN choice (either BN_adv or BN_clean) at test time, which refutes the prior claim that mainly attributes this discrepancy to the role of NS (Xie & Yuille, 2020).
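To make the object of study concrete, the following NumPy sketch illustrates the dual-BN construction discussed above: one pair of running normalization statistics (NS) and one pair of affine parameters (AP) per domain. The class name and interface are our own illustrative choices, not taken from any of the cited works' code.

```python
import numpy as np

class DualBN:
    """Minimal sketch of dual BN: separate normalization statistics (NS)
    and affine parameters (AP) for the clean and adversarial domains.
    Hypothetical interface for illustration only."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        # disentangled AP: one (gamma, beta) pair per domain
        self.gamma = {d: np.ones(num_features) for d in ("clean", "adv")}
        self.beta = {d: np.zeros(num_features) for d in ("clean", "adv")}
        # disentangled NS: one running (mean, var) pair per domain
        self.run_mean = {d: np.zeros(num_features) for d in ("clean", "adv")}
        self.run_var = {d: np.ones(num_features) for d in ("clean", "adv")}

    def __call__(self, x, domain, training=True):
        if training:
            # batch statistics of the current (single-domain) branch
            mu, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.run_mean[domain] = (1 - m) * self.run_mean[domain] + m * mu
            self.run_var[domain] = (1 - m) * self.run_var[domain] + m * var
        else:
            # at test time, the NS of the chosen domain is used
            mu, var = self.run_mean[domain], self.run_var[domain]
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        # the domain-specific AP is what our analysis identifies as the key factor
        return self.gamma[domain] * x_hat + self.beta[domain]
```

The "untwining" experiments described above correspond to sharing one of the two dictionaries (either the NS pair or the AP pair) across both domains while keeping the other separate.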
The motivation for introducing dual BN is inspired by the two-domain hypothesis that "clean images and adversarial images are drawn from two different domains" (Xie & Yuille, 2020). After showing that this motivation does not hold, we further revisit the two-domain hypothesis itself. We reveal that the domain gap between adversarial and clean samples is not as large as claimed in prior work (Xie & Yuille, 2020). We point out a hidden flaw in how prior work visualizes the NS from the two domains to highlight a large adversarial-clean domain gap: their visualization ignores the influence of different AP. After fixing this hidden flaw, we demonstrate that the domain gap is small. Interestingly, under the same perturbation/noise magnitude, we show that there is no significant difference between the adversarial-clean domain gap and its noisy-clean counterpart. Therefore, we propose a two-task hypothesis to replace the two-domain hypothesis in (Xie & Yuille, 2020; Xie et al., 2020a) as the theoretical justification for the necessity of dual BN in Hybrid-AT. We design a dual linear classifier experiment to verify this two-task hypothesis, which also motivates us to apply dual AP to architectures with other normalization modules. Beyond vanilla Hybrid-AT, we further experiment with Trades-AT (another variant of Hybrid-AT) (Zhang et al., 2019b), which by default adopts a single BN. We point out an NS inconsistency issue in its original implementation and demonstrate that fixing it can significantly improve performance. Moreover, we find that the KL regularization loss in Trades-AT can also be introduced to improve vanilla Hybrid-AT in the single BN setting. Model robustness under PGD-10 attack (PGD attack with 10 steps) and AutoAttack (AA) (Croce & Hein, 2020) is evaluated in our analysis as the basic experimental setting, with more details reported in Section A of the appendix and more specific setups discussed in context.
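The KL regularization mentioned above can be sketched as follows. This is a simplified NumPy illustration of a TRADES-style objective (clean cross-entropy plus a scaled KL term between clean and adversarial predictions), not the authors' implementation; the function name and the beta value are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax along the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trades_style_loss(logits_clean, logits_adv, y, beta=6.0):
    """Sketch of a TRADES-style Hybrid-AT objective:
    CE on clean predictions + beta * KL(p_clean || p_adv).
    beta trades off clean accuracy against robustness (illustrative value)."""
    p_clean = softmax(logits_clean)
    p_adv = softmax(logits_adv)
    # cross-entropy on the clean branch
    ce = -np.log(p_clean[np.arange(len(y)), y] + 1e-12).mean()
    # KL divergence pulling adversarial predictions toward clean ones
    kl = (p_clean * (np.log(p_clean + 1e-12)
                     - np.log(p_adv + 1e-12))).sum(axis=-1).mean()
    return ce + beta * kl
```

When the adversarial and clean logits coincide, the KL term vanishes and the loss reduces to the clean cross-entropy, which is why this regularizer can be added to vanilla Hybrid-AT without altering its clean branch.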
Overall, considering the increasing interest in adopting dual BN in Hybrid-AT, our work is timely in taking a closer look at dual BN in Hybrid-AT as well as its underlying hypothesis for theoretical justification. The main findings of our investigation are summarized as follows:
• In contrast to prior work that attributes the merit of dual BN in Hybrid-AT to disentangling NS, we demonstrate that its two sets of AP play the major role.
• The large domain gap claimed in prior work is caused by a hidden flaw of ignoring the impact of AP, which motivates a two-task hypothesis for interpreting the task of Hybrid-AT.
• As takeaways, we recommend NOT disentangling NS in Hybrid-AT, since disentangling NS has little influence on performance with dual AP and actually harms performance in the single AP setting. Moreover, with a careful choice of training details, a single BN might be sufficient for achieving competitive performance.



2. DEVELOPMENT OF ADVERSARIAL TRAINING

Adversarial training. Adversarial training (AT) has been the most powerful defense against adversarial attacks, among which Madry-AT (Madry et al., 2018) is a representative method, detailed as follows. Let D be a data distribution over (x, y) pairs and f(·, θ) a model parameterized by θ; l denotes the cross-entropy loss for classification. Instead of directly feeding clean samples from D to minimize the risk E_(x,y)∼D [l(f(x, θ), y)], Madry et al. (2018) formulate a saddle-point problem: min_θ E_(x,y)∼D [max_(δ∈S) l(f(x + δ, θ), y)], where S is the set of allowed perturbations (e.g., an l∞-ball of radius ε).
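The inner maximization of this saddle-point problem is typically approximated with projected gradient descent (PGD). Below is a self-contained NumPy sketch for a toy linear classifier with logits W x, whose input gradient of the cross-entropy loss has the closed form W^T (softmax(W x) − onehot(y)); the function name and hyperparameter values are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax for a single logit vector
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_attack(x, y, W, eps=0.1, alpha=0.02, steps=10):
    """Sketch of l_inf PGD on a linear classifier (logits = W @ x).
    eps: perturbation budget; alpha: step size; steps: number of iterations."""
    num_classes = W.shape[0]
    onehot = np.eye(num_classes)[y]
    # random start inside the l_inf ball, as in Madry et al. (2018)
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        p = softmax(W @ x_adv)
        # closed-form gradient of cross-entropy w.r.t. the input
        grad = W.T @ (p - onehot)
        x_adv = x_adv + alpha * np.sign(grad)      # gradient-ascent step
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back to the ball
    return x_adv
```

In Madry-AT, the outer minimization then updates θ on these x_adv samples, whereas Hybrid-AT updates θ on both x and x_adv.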

