TOWARDS DEFENDING MULTIPLE ADVERSARIAL PERTURBATIONS VIA GATED BATCH NORMALIZATION

Abstract

There is now extensive evidence demonstrating that deep neural networks are vulnerable to adversarial examples, motivating the development of defenses against adversarial attacks. However, existing adversarial defenses typically improve model robustness against a single, specific perturbation type. Some recent methods improve model robustness against adversarial attacks in multiple ℓp balls, but their performance against each perturbation type is still far from satisfactory. To better understand this phenomenon, we propose the multi-domain hypothesis, stating that different types of adversarial perturbations are drawn from different domains. Guided by the multi-domain hypothesis, we propose Gated Batch Normalization (GBN), a novel building block for deep neural networks that improves robustness against multiple perturbation types. GBN consists of a gated sub-network and a multi-branch batch normalization (BN) layer, where the gated sub-network separates different perturbation types, and each BN branch is in charge of a single perturbation type and learns domain-specific statistics for input transformation. Features from different branches are then aligned as domain-invariant representations for the subsequent layers. We perform extensive evaluations of our approach on MNIST, CIFAR-10, and Tiny-ImageNet, and demonstrate that GBN outperforms previous defense proposals against multiple perturbation types, i.e., ℓ1, ℓ2, and ℓ∞ perturbations, by large margins of 10-20%.

1. INTRODUCTION

Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications (Krizhevsky et al., 2012; Bahdanau et al., 2014; Hinton et al., 2012), but they are susceptible to adversarial examples (Szegedy et al., 2013). These elaborately designed perturbations are imperceptible to humans but can easily lead DNNs to wrong predictions, threatening both digital and physical deep learning applications (Kurakin et al., 2016; Liu et al., 2019a). To improve model robustness against adversarial perturbations, a number of adversarial defense methods have been proposed (Papernot et al., 2015; Engstrom et al., 2018; Goodfellow et al., 2014). Many of these defenses are based on adversarial training (Goodfellow et al., 2014; Madry et al., 2018), which augments training data with adversarial examples. However, most adversarial defenses are designed to counteract a single type of perturbation (e.g., small ℓ∞-noise) (Madry et al., 2018; Kurakin et al., 2017; Dong et al., 2018). These defenses offer no guarantees for other perturbations (e.g., ℓ1, ℓ2), and sometimes even increase model vulnerability to them (Kang et al., 2019; Tramèr & Boneh, 2019). To address this problem, other adversarial training strategies have been proposed with the goal of simultaneously achieving robustness against multiple types of attacks, i.e., ℓ∞, ℓ1, and ℓ2 attacks (Tramèr & Boneh, 2019; Maini et al., 2020). Although these methods improve overall model robustness against adversarial attacks in multiple ℓp balls, the performance for each individual perturbation type is still far from satisfactory. In this work, we propose the multi-domain hypothesis, which states that different types of adversarial perturbations arise in different domains and thus have separable characteristics.
Training on data from multiple domains can be regarded as solving the invariant risk minimization problem (Ahuja et al., 2020), in which an invariant predictor is learned to achieve the minimum risk across different environments. For a deep learning model, instance-related knowledge can be stored in the weight matrix of each layer, whereas domain-related knowledge can be represented by the batch normalization (BN) layer statistics (Li et al., 2017). Inspired by the multi-domain hypothesis, we propose to improve model robustness against multiple perturbation types by separating domain-specific information for different perturbation types, and using BN layer statistics to better align data from the mixture distribution and learn domain-invariant representations for multiple types of adversarial examples. In particular, we propose a novel building block for DNNs, referred to as Gated Batch Normalization (GBN), which consists of a gated sub-network and a multi-branch BN layer. GBN first learns to separate perturbations from different domains on-the-fly and then normalizes them to obtain domain-specific features. Specifically, each BN branch handles a single perturbation type (i.e., domain). Features computed from different branches are then aligned as domain-invariant representations and aggregated as the input to subsequent layers. Extensive experiments on MNIST, CIFAR-10, and Tiny-ImageNet demonstrate that our method outperforms previous defense strategies by large margins, i.e., 10-20%.

2. BACKGROUND AND RELATED WORK

In this section, we provide a brief overview of existing work on adversarial attacks and defenses, as well as batch normalization techniques.

2.1. ADVERSARIAL ATTACKS AND DEFENSES

Adversarial examples are inputs intentionally designed to mislead DNNs (Szegedy et al., 2013; Goodfellow et al., 2014). Given a DNN f_Θ and an input image x ∈ X with ground truth label y ∈ Y, an adversarial example x_adv satisfies f_Θ(x_adv) ≠ y s.t. ‖x − x_adv‖ ≤ ε, where ‖·‖ is a distance metric. Commonly, ‖·‖ is measured by the ℓp-norm (p ∈ {1, 2, ∞}). Various defense approaches have been proposed to improve model robustness against adversarial examples (Papernot et al., 2015; Xie et al., 2018; Madry et al., 2018; Liao et al., 2018; Cisse et al., 2017), among which adversarial training has been widely studied and demonstrated to be the most effective (Goodfellow et al., 2014; Madry et al., 2018). Specifically, adversarial training minimizes the worst-case loss within some perturbation region of the classifier by augmenting the training set {x^(i), y^(i)}_{i=1...n} with adversarial examples. However, these defenses only improve model robustness against one type of perturbation (e.g., ℓ∞) and typically offer no robustness guarantees against other attacks (Kang et al., 2019; Tramèr & Boneh, 2019; Schott et al., 2019). To address this problem, recent works have attempted to improve robustness against several types of perturbation (Schott et al., 2019; Tramèr & Boneh, 2019; Maini et al., 2020). Schott et al. (2019) proposed Analysis by Synthesis (ABS), which uses multiple variational autoencoders to defend against ℓ0, ℓ2, and ℓ∞ adversaries. However, ABS only works on the MNIST dataset. Croce & Hein (2020a) proposed a provable adversarial defense against all ℓp norms for p ≥ 1 using a regularization term. However, it is not applicable in the empirical setting, since it only guarantees robustness for very small perturbations (e.g., ε = 0.1 for ℓ2 and ε = 2/255 for ℓ∞ on CIFAR-10). Tramèr & Boneh (2019) tried to defend against multiple perturbation types (ℓ1, ℓ2, and ℓ∞) by combining different types of adversarial examples for adversarial training.
Specifically, they introduced two training strategies, "MAX" and "AVG": for each input image, the model is trained either on its strongest adversarial example or on all types of perturbations. More recently, Maini et al. (2020) proposed multi steepest descent (MSD) and showed that a simple modification to standard PGD adversarial training improves robustness to ℓ1, ℓ2, and ℓ∞ adversaries. In this work, we follow (Tramèr & Boneh, 2019; Maini et al., 2020) and focus on defending against ℓ1, ℓ2, and ℓ∞ adversarial perturbations, which are the most representative and commonly used. However, we propose a completely different perspective and solution to the problem.

2.2. BATCH NORMALIZATION

BN (Ioffe & Szegedy, 2015) normalizes each neuron of a layer input over a mini-batch of m examples during training:

x̂_j = BN(x_j) = γ_j · (x_j − µ_j) / √(σ²_j + ξ) + β_j,  j = 1, 2, ..., d,  (1)

where µ_j = (1/m) Σ_{i=1}^{m} x_j^(i) and σ²_j = (1/m) Σ_{i=1}^{m} (x_j^(i) − µ_j)² are the mini-batch mean and variance for each neuron, respectively, and ξ is a small number to prevent numerical instability. The learnable parameters γ and β are used to recover the representation capacity. During inference, the population statistics of mean μ̂ and variance σ̂² are used in Eqn. 1; they are usually calculated as running averages over training iterations t with update factor α:

μ̂^t = (1 − α) μ̂^{t−1} + α µ^t,  (σ̂^t)² = (1 − α) (σ̂^{t−1})² + α (σ^t)².  (2)

A number of normalization techniques have been proposed to improve BN for style transfer (Huang & Belongie, 2017) and domain adaptation (Li et al., 2017; Chang et al., 2019; Deecke et al., 2019). Compared to studies for domain adaptation, our GBN does not require knowledge of the input domain during inference. In contrast to Mode Normalization (Deecke et al., 2019), which detects the modes of the data without domain supervision, MBN (Xie & Yuille, 2020) manually selected the BN branch for each input image, which requires prior knowledge of whether the input is an adversarial or clean example (the same as domain adaptation).
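Eqns. 1 and 2 can be sketched in a few lines of numpy. This is a minimal illustration of the training-time normalization and the running-statistics update, not the paper's implementation; function names and the `alpha` default are illustrative.

```python
import numpy as np

def bn_train_step(x, gamma, beta, run_mean, run_var, alpha=0.1, xi=1e-5):
    """One BN training step on a mini-batch x of shape (m, d).

    Normalizes with the mini-batch statistics (Eqn. 1) and updates the
    running (population) statistics used at inference (Eqn. 2).
    """
    mu = x.mean(axis=0)                  # per-neuron mini-batch mean
    var = x.var(axis=0)                  # per-neuron mini-batch variance
    x_hat = gamma * (x - mu) / np.sqrt(var + xi) + beta
    run_mean = (1 - alpha) * run_mean + alpha * mu
    run_var = (1 - alpha) * run_var + alpha * var
    return x_hat, run_mean, run_var

def bn_inference(x, gamma, beta, run_mean, run_var, xi=1e-5):
    """At inference, population statistics replace the batch statistics."""
    return gamma * (x - run_mean) / np.sqrt(run_var + xi) + beta
```

In GBN, each BN branch keeps its own `run_mean`/`run_var` pair, estimated only from examples of its domain.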

3. GATED BATCH NORMALIZATION

In this section, we present our proposed Gated Batch Normalization (GBN) approach for improving model robustness against multiple adversarial perturbations. We first illustrate our multi-domain hypothesis, which states that adversarial examples of different perturbation types are drawn from different domains. Motivated by this hypothesis, we describe the architecture of our GBN block, and then the training and inference procedures.

3.1. MOTIVATION: MULTI-DOMAIN HYPOTHESIS

We assume N adversarial perturbation types, each characterized by a set S_k of perturbations for an input x. Let D_0 denote the set of clean examples, and D_k (k = 1, ..., N) denote the set of adversarial examples generated by the k-th adversarial perturbation type S_k. An adversarial example of the k-th type x_adv^k is generated by pixel-wise addition of the perturbation δ_k, i.e., x_adv^k = x + δ_k. Our multi-domain hypothesis states that the different perturbation types D_k (for all k ≥ 0, including the clean examples) are drawn from different domains, inspired by the following observations: (1) Training on a single perturbation type is insufficient for achieving robustness against other perturbations. Further, training on a mixture of different perturbation types still fails to achieve acceptable performance for each type of perturbation (Tramèr & Boneh, 2019; Maini et al., 2020). (2) Adversarial examples are separable from clean examples (Metzen et al., 2018). Xie & Yuille (2020) also suggested that clean and adversarial examples are drawn from two different domains. We first empirically investigate the hypothesis for deep neural networks. Specifically, we train separate BNs for different input domains D_k (k ≥ 0). As illustrated in Figure 1(b), we construct separate mini-batches to estimate BN statistics for different D_k, i.e., ℓ1, ℓ2, and ℓ∞ adversarial images, as well as clean images. We defer the implementation details to Appendix A. Figures 1(c) and 1(d) show that different D_k induce significantly different running statistics, according to the running means and variances of the different BN branches. We further theoretically demonstrate that for a linear classifier f(x) = sign(w^T x + b), different types of adversarial perturbations belong to different domains, with pair-wise Wasserstein distances greater than 0 (c.f. Appendix B).
These theoretical and empirical results support our multi-domain hypothesis, and motivate our design of GBN, as discussed below. More details of running statistics of different models can be found in Appendix E.10.

[Figure 1: (a) a standard BN layer shared by clean, ℓ1, ℓ2, and ℓ∞ inputs (Conv → BN → ReLU); (b) separate BN branches, one for each of the clean, ℓ1, ℓ2, and ℓ∞ domains.]

3.2. MODEL ARCHITECTURE WITH GATED BATCH NORMALIZATION (GBN)

Training a model that generalizes well on multiple domains can be regarded as solving an invariant risk minimization problem (Ahuja et al., 2020). Define a predictor f: X → Y and the risk achieved by f in domain D_k as R_k(f) = E_{(x,y)∼D_k} ℓ(f(x), y), where ℓ(·) is the cross-entropy loss. We say that a data representation π: X → Z elicits an invariant predictor ω ∘ π across all domains k ∈ {0, 1, ..., N} if there is a classifier ω: Z → Y that achieves the minimum risk for all domains. To obtain a domain-invariant representation π, we use the BN layer statistics to align data from the mixture distribution. Specifically, GBN uses a BN layer with multiple branches Ψ = {BN_k(·), k = 0, ..., N} during training, where each branch BN_k(·) is exclusively in charge of domain D_k. The aligned data are then aggregated as the input to subsequent layers to train a classifier ω that achieves the minimum risk across different domains (i.e., a model robust to different perturbations). One remaining problem is that the model does not know the input domain during inference. To calculate the normalized output in this case, GBN utilizes a gated sub-network Φ_θ(x) to predict the domain of each layer input x; we illustrate how to train Φ_θ(x) in Section 3.3. Given the output of the sub-network g = Φ_θ(x) ∈ R^{N+1}, we calculate the normalized output in a soft-gated way: x̂ = GBN(x) = Σ_{k=0}^{N} g_k x̂_k, where g_k represents the confidence of the k-th domain for x, and x̂_k is the normalized output of BN_k(·), which uses the population statistics of domain D_k to perform standardization. We provide an overview of the inference procedure in Figure 2(b), and the details are presented in Algorithm 1. In Appendix E.6, we discuss an alternative approach that takes the top-1 prediction of g (the hard label) and achieves comparable performance.
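The soft-gated inference rule x̂ = Σ_k g_k x̂_k can be sketched as follows. This is a minimal numpy illustration under assumed shapes (per-domain population statistics as 1-D arrays, shared affine parameters γ, β); the class name and layout are hypothetical, not the authors' code.

```python
import numpy as np

class GBNInference:
    """Sketch of GBN inference (Section 3.2): each of the N+1 BN branches
    normalizes x with its own domain's population statistics, and the
    branch outputs are mixed by the gate's soft domain prediction g.
    """

    def __init__(self, run_means, run_vars, gamma, beta, xi=1e-5):
        self.run_means = run_means    # list of (d,) arrays, one per domain
        self.run_vars = run_vars      # list of (d,) arrays, one per domain
        self.gamma, self.beta, self.xi = gamma, beta, xi

    def __call__(self, x, g):
        # g: (N+1,) output of the gated sub-network Phi_theta(x), summing to 1
        out = np.zeros_like(x)
        for k, (mu, var) in enumerate(zip(self.run_means, self.run_vars)):
            x_k = self.gamma * (x - mu) / np.sqrt(var + self.xi) + self.beta
            out += g[k] * x_k         # x_hat = sum_k g_k * x_hat_k
        return out
```

With a one-hot g, this degenerates to selecting a single BN branch, which corresponds to the hard-label variant discussed in Appendix E.6.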
Our GBN aims to disentangle domain-invariant and domain-specific features: (1) the distributions of different domains are aligned by their normalized outputs (all are standardized distributions), which ensures that the following linear layers learn domain-invariant representations; and (2) the domain-specific features of each domain D_k are captured by the population statistics {μ̂_k, σ̂_k} of its corresponding BN branch BN_k(·). As shown in Figure 1, {μ̂_k, σ̂_k} of BN_k(·) learns the domain-specific representation well. During inference, given mini-batch data, the gated sub-network first predicts the domain of input x, which is then jointly normalized by the multiple BN branches.

3.3. TRAINING

We provide an overview of the training procedure in Figure 2(a), and the details are in Algorithm 2. Specifically, for each mini-batch B_0 consisting of clean samples, we use the PGD attack (Madry et al., 2018) to generate the adversarial mini-batches of each perturbation type. To train the gated sub-network g, we provide supervision of the input domain for each training sample. Specifically, we introduce the domain prediction loss L_DP:

L_DP = Σ_{k=0}^{N} Σ_{(x,k)∈D_k} ℓ(Φ_θ(x), k).

Finally, we optimize the parameters Θ of the entire neural network (e.g., the weight matrices of the convolutional layers, except for the gated sub-network) using the classification loss L_cls:

L_cls = Σ_{k=0}^{N} Σ_{(x,y)∈D_k} ℓ(f_Θ(x; BN_k(·)), y),

where the BN branch BN_k(·) is selected according to the domain of each training sample.
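The two losses above can be sketched numerically. This is a minimal illustration under assumed, hypothetical names (`gbn_losses`, precomputed logits indexed by example): the gate logits are supervised with the domain index k, and the classifier logits (computed with branch BN_k selected) with the class label y.

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    z = logits - logits.max()
    return -z[label] + np.log(np.exp(z).sum())

def gbn_losses(batches, gate_logits, cls_logits):
    """Sketch of the GBN training losses (Section 3.3).

    batches[k] lists (example-index, label) pairs drawn from domain D_k
    (k=0 is clean, k>=1 are adversarial domains). L_DP supervises the
    gate with the domain index k; L_cls supervises the network with y.
    """
    l_dp, l_cls = 0.0, 0.0
    for k, batch in enumerate(batches):
        for i, y in batch:
            l_dp += cross_entropy(gate_logits[i], k)   # domain prediction loss
            l_cls += cross_entropy(cls_logits[i], y)   # classification loss
    return l_dp, l_cls
```

In practice the two losses are minimized jointly by backpropagation; here they are only evaluated to show how the domain labels enter the objective.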

4. EXPERIMENTS

We evaluate the effectiveness of our GBN block in simultaneously achieving robustness against ℓ1, ℓ2, and ℓ∞ perturbations, which are the most representative and commonly used adversarial perturbations. We conduct experiments on the image classification benchmarks MNIST (LeCun, 1998), CIFAR-10 (Krizhevsky & Hinton, 2009), and Tiny-ImageNet (Wu et al., 2017).

4.1. EXPERIMENT SETUP

Architecture and hyperparameters. We use the LeNet architecture (LeCun et al., 1998) for MNIST; ResNet-20 (He et al., 2016), VGG-16 (Simonyan & Zisserman, 2015), and WRN-28-10 (Zagoruyko & Komodakis, 2016) for CIFAR-10; and ResNet-34 (He et al., 2016) for Tiny-ImageNet. For fair comparisons, we keep the architecture and main hyper-parameters the same for GBN and the other baselines. A detailed description of the hyper-parameters can be found in Appendix D. Adversarial attacks. To evaluate model robustness, we follow existing guidelines (Schott et al., 2019; Tramèr & Boneh, 2019; Maini et al., 2020) and incorporate multiple adversarial attacks for the different perturbation types. For MNIST, the perturbation magnitudes for ℓ1, ℓ2, and ℓ∞ are ε = 10, 2, and 0.3, respectively. For CIFAR-10 and Tiny-ImageNet, the perturbation magnitudes for ℓ1, ℓ2, and ℓ∞ are ε = 12, 0.5, and 0.03, respectively. For ℓ1 attacks, we adopt the PGD attack (Madry et al., 2018) and the Brendel & Bethge attack (BBA) (Brendel et al., 2019); for ℓ2 attacks, we use the PGD attack, C&W attack (Carlini & Wagner, 2017), Gaussian noise attack (Rauber et al., 2017), and boundary attack (BA) (Brendel et al., 2018); for ℓ∞ attacks, we use the PGD attack, Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), SPSA (Uesato et al., 2018), Nattack (Li et al., 2019), Momentum Iterative Method (MI-FGSM) (Dong et al., 2018), C&W attack, and the recently proposed AutoAttack (Croce & Hein, 2020b) (a stronger ensemble of parameter-free attacks, including APGD, FAB (Croce & Hein, 2020c), and Square Attack (Andriushchenko et al., 2020)). We adopt FoolBox (Rauber et al., 2017) as the implementation for the attacks. Note that, to show that no obfuscated gradients have been introduced, we adopt both white-box and black-box or gradient-free adversarial attacks. More complete details of all attacks, including hyper-parameters, can be found in Appendix D.1. Adversarial defenses.
We compare with works that achieve robustness against multiple perturbation types. All of them consider the union of ℓ1, ℓ2, and ℓ∞ adversaries. We compare with ABS (Schott et al., 2019), MAX, AVG (Tramèr & Boneh, 2019), and MSD (Maini et al., 2020). For completeness, we also compare with TRADES (Zhang et al., 2019) and PGD adversarial training (Madry et al., 2018) with a single perturbation type (i.e., ℓ1, ℓ2, and ℓ∞, denoted P1, P2, and P∞, respectively). Normalization techniques. We further compare our method to other normalization techniques (i.e., MN (Deecke et al., 2019) and MBN (Xie & Yuille, 2020)). MN extends BN to more than a single mean and variance, detects the modes of the data, and then normalizes samples that share common features. For MBN, since manually selecting the BN branch is infeasible in the adversarial setting, we add a 2-way gated sub-network to each MBN block.

4.2. ADVERSARIAL ROBUSTNESS FOR MULTIPLE PERTURBATIONS

In this section, we evaluate model robustness against multiple perturbation types (i.e., ℓ1, ℓ2, and ℓ∞) in both the white-box and black-box settings. We use the worst-case top-1 classification accuracy, where we report the worst result among the various attacks for each perturbation type (the higher the better). We also report "all attacks", which represents the worst-case performance computed over all attacks for each example (Maini et al., 2020). Specifically, given an image, if any attack generates an adversarial example that leads to a wrong prediction, we mark it as a failure. For all attacks except FGSM and APGD CE (the default setting in AutoAttack), we use 5 random restarts for each of the results. For models using our GBN, we trained each model 3 times and obtained similar results, showing that training with GBN is stable. The results on MNIST using LeNet, CIFAR-10 using ResNet-20, and Tiny-ImageNet using ResNet-34 are shown in Table 1. We present the detailed breakdown results for individual attacks, and more experimental results using VGG-16 and WRN-28-10 on CIFAR-10, in Appendix E.3. We observe that: (1) on multiple perturbation types (i.e., ℓ1, ℓ2, ℓ∞), our GBN consistently outperforms the baselines by large margins of over 10% on all three datasets; (2) due to the trade-off between adversarial robustness and standard accuracy (Tsipras et al., 2019), our clean accuracy is lower than that of the vanilla model; however, GBN maintains a comparatively high clean accuracy compared to other adversarial defenses; and (3) GBN is easy to train on different datasets, while other methods (e.g., MSD) struggle to converge on larger datasets such as Tiny-ImageNet. We also evaluate our model against PGD attacks with different iteration counts (i.e., 50, 100, 200, and 1000) in terms of the ℓ∞ norm. As shown in Appendix E.4, our GBN still outperforms other methods and maintains a high level of robustness. Comparison with other normalization methods.
As shown in Table 1, GBN outperforms MN and MBN by large margins (up to 60%). Here, we provide more comparisons with MN and MBN. Apart from the target task, our GBN differs from them in its general idea: GBN first aligns the distributions of different perturbations by using one BN branch to handle each perturbation type (learning domain-specific features), and the aligned distribution then contributes to the subsequent modules or layers (e.g., convolutional or fully connected layers) for learning domain-invariant representations. GBN ensures that the normalized output for each perturbation type has a Gaussian distribution, and that these Gaussian distributions are well aligned across perturbations (e.g., a clean example and its corresponding adversarial examples are more likely to be close in the space of normalized outputs; see Appendix E.9 for t-SNE (van der Maaten & Hinton, 2008) visualizations of features after the normalization blocks). As a result, the other modules can more easily learn the native representation of the data (domain-invariant representations). However, MN and MBN cannot align the distributions: MN aims to preserve the diversity among different distributions by learning a mixture of Gaussians rather than a single Gaussian, and in the modified MBN, the data from different perturbations are mixed in the same BN. The results also confirm our multi-domain hypothesis. Different variants of white-box attacks. Here, we examine the performance of GBN in more rigorous white-box attack scenarios. We conduct two experiments: (1) attacking the GBN layers, i.e., generating adversarial examples to fool the classification of the gated sub-networks; (2) letting the adversary manually select the BN branch used to generate adversarial examples. Note that these attacks are specially designed for our defense, aiming to provide further analysis and understanding.
As shown in Tables 15 and 17 in Appendix E.8, GBN maintains the same level of robustness against these attacks. We conjecture the reasons are as follows: (1) our GBN does not rely on obfuscated gradients (Athalye et al., 2018) to provide model robustness; (2) different GBN layers use different features to predict the domains, and it is difficult for an adversary to generate perturbations that fool all GBN layers within a model under the magnitude constraint (see Table 16 for more results on the prediction accuracy of the gated sub-networks at different layers). We defer more discussion to Appendix E.8.

4.3. ABLATION STUDY

Attacks outside the perturbation model. To examine whether GBN can generalize to perturbation types that the model was not trained on, we trained a variant of GBN models that includes only 3 BN branches. Specifically, these GBN models are trained on two ℓp perturbation types only, but are evaluated on all three ℓp perturbations, including the held-out perturbation that was not trained on. Unsurprisingly, compared to the full GBN model with 4 BN branches, the robustness on the held-out perturbation type decreases to some degree. However, it is still significantly better than that of the vanilla model, and sometimes even outperforms baseline defenses that are trained on all perturbation types (c.f. Appendix E.7). Adding GBN into different layers. Our GBN block can be inserted at any layer in a deep network. In this part, we conduct single-layer and layer-group studies. We first add GBN to different single layers: as shown in Figure 3(a), the standard performance and adversarial robustness decrease as we add GBN into deeper layers. We then add GBN to layer groups (i.e., the top-m layers): according to the results in Figure 3(b), the adversarial robustness improves (while the clean accuracy remains comparatively stable) as more layers are involved. In summary, adding GBN into shallow layers achieves better performance with respect to both clean and adversarial accuracy. The reasons might be two-fold: (1) as shown in (Liu et al., 2019b), shallow layers are more critical to model robustness than deeper layers; (2) as the layer depth increases, the features from different domains become highly entangled, making it harder for models to separate them (c.f. Appendix E.10). Predictions of the gated sub-network. Moreover, we provide the prediction results of the gated sub-network in GBN, i.e., the classification accuracy for the domains of different input samples.
As shown in Table 11, for MNIST, the gated sub-networks of LeNet at different layers achieve high prediction accuracy for the input domains (i.e., clean, ℓ1, ℓ2, and ℓ∞ adversarial examples). However, on CIFAR-10, the prediction accuracy of the gated sub-network drops as the layer depth increases. We can conclude that shallow layers are more critical to model robustness, as the prediction accuracy at shallow layers is higher than that at deeper layers. The reason might be that shallow layers contain limited information about the image statistics and can be learned from a few samples (Asano et al., 2020), in which case the features are easier to separate. However, the reason for the indistinguishability of features in the first layer remains unexplored, which we leave for future work. To further improve model robustness against multiple perturbations, we suggest improving the prediction accuracy of the gated sub-network.

5. CONCLUSIONS

Most adversarial defenses are tailored to a single perturbation type (e.g., small ℓ∞-noise) and offer no guarantees against other attacks. To better understand this phenomenon, we explored the multi-domain hypothesis, stating that different types of adversarial perturbations are drawn from different domains. Guided by this hypothesis, we proposed a novel building block for DNNs, Gated Batch Normalization (GBN), which consists of a gated sub-network and a multi-branch BN layer. The gated sub-network separates different perturbation types, and each BN branch is in charge of a single perturbation type and learns the domain-specific statistics for input transformation. Features from different branches are then aligned as domain-invariant representations for the subsequent layers. Extensive experiments on MNIST, CIFAR-10, and Tiny-ImageNet demonstrate that our method outperforms previous proposals against multiple perturbation types by large margins of 10-20%.

A EXPERIMENTAL SETTING FOR SEPARATING BNS

We provide the experimental setting for the separate-BN study in Section 3.1. Specifically, we train a DNN on CIFAR-10 with each BN layer composed of 4 branches (i.e., we replace each BN with 4 BN branches). During training, we construct different mini-batches of clean, ℓ1, ℓ2, and ℓ∞ adversarial images to estimate the normalization statistics of each BN branch; during inference, given a mini-batch of data from a specific domain, we manually activate the corresponding BN branch and deactivate the others. We update the running statistics of each BN branch separately but optimize the model parameters using the sum of the four losses.

B PROOF IN SECTION 3.1

Proof. We first calculate the optimal perturbation δ_p in the ℓp ball of radius ε. For simplicity, we assume that the classification loss is given by ℓ(f(x), y) = h(−y f(x)), where h is a non-decreasing function. Note that this is a broad family of losses, including the hinge loss and the logistic loss. The objective of the adversary is to find a perturbation δ_p which maximizes the loss. For the linear classifier, δ_p can be explicitly calculated as follows:

δ_p = arg max_{‖δ‖_p ≤ ε} ℓ(f(x + δ), y) = arg max_{‖δ‖_p ≤ ε} h(−y f(x + δ)) = arg max_{‖δ‖_p ≤ ε} −y w^T δ = ε · arg max_{‖δ‖_p ≤ 1} −y w^T δ = ε ∂‖−yw‖_q,

where 1/p + 1/q = 1 and the last equality uses the property that ∂‖x‖_q = arg max_{‖s‖_p ≤ 1} s^T x. Notice that the optimal perturbation δ_p is independent of the inputs corresponding to a particular label. Therefore, the new distribution D_y^p of the adversarial examples generated from the perturbation type S_p can be written as D_y^0 + ε ∂‖−yw‖_q, where D^0 represents the clean example domain. For p = 1 or p = ∞, it is easy to verify that ∂‖−yw‖_q = (0, ..., sign(−yw_i), ..., 0) with i = arg max_j |w_j|, or ∂‖−yw‖_q = (sign(−yw_1), ..., sign(−yw_d)), respectively. For p = 2, we simply obtain ∂‖−yw‖_2 = (−yw_1/‖w‖_2, ..., −yw_d/‖w‖_2). Thus, the Wasserstein distance between D_y^1 and D_y^∞ is

W_2(D_y^1, D_y^∞) = W_2(D_y^0 + ε ∂‖−yw‖_∞, D_y^0 + ε ∂‖−yw‖_1) = ε √(d − 1),

where the last equality follows from the translation invariance of the Wasserstein distance. Similarly, the Wasserstein distance between D_y^1 and D_y^2 is W_2(D_y^1, D_y^2) = ε √(2 − 2|w_i|/‖w‖_2). Finally, the Wasserstein distance between D_y^2 and D_y^∞ can be calculated as W_2(D_y^2, D_y^∞) = ε √(d + 1 − 2‖w‖_1/‖w‖_2).
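The closed-form perturbations in the proof can be checked numerically. The sketch below evaluates the three maximizers of −y w^T δ for a linear classifier and compares them against random feasible perturbations; the function name is illustrative and this check is not part of the paper.

```python
import numpy as np

def optimal_perturbations(w, y, eps):
    """Closed-form maximizers of -y * w^T delta over the l_p ball of
    radius eps, for p = 1, 2, inf (as derived in the proof above).
    """
    d = len(w)
    # p = 1 (dual q = inf): a single signed coordinate at i = argmax |w_j|
    i = np.argmax(np.abs(w))
    d1 = np.zeros(d)
    d1[i] = eps * np.sign(-y * w[i])
    # p = 2 (dual q = 2): the normalized direction -y w / ||w||_2
    d2 = eps * (-y) * w / np.linalg.norm(w)
    # p = inf (dual q = 1): the coordinate-wise sign vector
    dinf = eps * np.sign(-y * w)
    return d1, d2, dinf
```

Each returned δ_p attains the dual-norm value ε‖w‖_q of the inner maximization, which is what the proof uses to place each perturbed distribution at a fixed offset from D^0.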

C GBN IMPLEMENTATION DETAILS

Network architecture. We first illustrate the architecture of the gated sub-network g in GBN. As shown in Figure 4, we devise two types of gated sub-networks, namely the Conv gate and the FC gate. The notation Conv2d(d, s, k) refers to a convolutional layer with k filters of size d × d convolved with stride s. ReLU(·) denotes the rectified linear unit used as the activation function in the network. FC(m) denotes a fully connected layer with output size m. The Conv gate is a convolutional neural network consisting of two convolutional layers, two ReLU layers, and one fully-connected layer; c denotes the input channel count of this gate. In the default setting, we use stride 1 for the first convolutional layer and 2 for the second, with padding set to 3 and 1, respectively. We set the fully-connected layer output size m = 4 to match the four data distributions used in our experiments. The FC gate is a simple fully-connected neural network that includes two fully-connected layers and one ReLU layer; we use m = 512 and m = 4 for the two fully-connected layers. To train models containing GBN, we add the GBN block to all layers within a model. Specifically, we use the Conv gate for the g in the first GBN block and FC gates for the other GBN blocks, to further capture domain-specific information and improve model robustness. We use the default PyTorch values for ξ and α. Discussion. To empirically validate the above strategy, we conduct additional experiments using the Conv gate or the FC gate for all GBN blocks. In other words, we train a VGG-16 model with all GBN blocks using the Conv gate, denoted "Conv_all", and another model with all GBN blocks using the FC gate, denoted "FC_all". As shown in Table 2, our strategy (denoted "Conv+FC") achieves the greatest robustness.
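The FC gate (FC(512) → ReLU → FC(4) → softmax) can be sketched as a plain forward pass. This is a numpy illustration with hypothetical weight arrays standing in for the trained layers; a smaller hidden size is used in testing for brevity.

```python
import numpy as np

def fc_gate(x, W1, b1, W2, b2):
    """Sketch of the FC gate Phi_theta: two fully-connected layers with a
    ReLU in between, followed by a softmax that yields the per-domain
    confidences g (one entry per BN branch, here N + 1 = 4 in the paper).
    """
    h = np.maximum(x @ W1 + b1, 0.0)       # FC(m=512) + ReLU
    logits = h @ W2 + b2                   # FC(m=4), one logit per domain
    e = np.exp(logits - logits.max())      # stable softmax
    return e / e.sum()                     # g in R^{N+1}, sums to 1
```

The Conv gate follows the same pattern with two convolutions in place of the first FC layer; only the feature extractor differs, the softmax output g is consumed identically by the GBN block.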
Thus, we use the Conv gate for all GBN blocks in the single-layer study, and the Conv gate for the first GBN and FC gates for the other GBN blocks in the layer-group study (Section 4.3). We conjecture that there are two reasons for this: (1) the running statistics of different domains in the first layer are almost indistinguishable (c.f. Appendix E.10); thus, solely using an FC layer (FC gate) fails to extract sufficient features from the first layer to perform correct classification; (2) using conv layers for all GBN blocks may suffer from over-fitting, since adding GBN into the first layer alone already achieves considerable robustness (as shown in Figure 3(a)). We will further address this in future studies. Implementation details. Here, we provide the details of the experimental settings for GBN and the other baselines on the different datasets. All models for each method on MNIST, CIFAR-10, and Tiny-ImageNet are trained for 40, 40, and 20 epochs, respectively. We set the mini-batch size to 64 and use the SGD optimizer, with weight decay 0.0005 for Tiny-ImageNet and no weight decay for MNIST and CIFAR-10. We set the learning rate to 0.1 for MNIST and 0.01 for CIFAR-10 and Tiny-ImageNet. For the baselines, we use the published implementations of ABS, MAX/AVG, MSD, and TRADES.

D.1 ADVERSARIAL ATTACKS

In this paper, we adopt both white-box and black-box adversarial attacks. In the white-box setting, adversaries have complete knowledge of the target model and can fully access it; in the black-box setting, adversaries have limited knowledge of the target classifier (e.g., its architecture) and cannot access the model weights. For ℓ1 attacks, we use the PGD attack (Madry et al., 2018) and the Brendel & Bethge attack (BBA) (Brendel et al., 2019). For ℓ2 attacks, we use the PGD attack (Madry et al., 2018), the C&W attack (Carlini & Wagner, 2017), the Gaussian noise attack (Rauber et al., 2017), and the boundary attack (BA) (Brendel et al., 2018). For ℓ∞ attacks, although the ℓ∞ PGD adversary is already quite effective, for completeness we additionally use the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), SPSA (Uesato et al., 2018), NATTACK (Li et al., 2019), the Momentum Iterative Method (MI-FGSM) (Dong et al., 2018), and AutoAttack (Croce & Hein, 2020b), a stronger ensemble of parameter-free attacks comprising APGD, FAB (Croce & Hein, 2020c), and Square Attack (Andriushchenko et al., 2020). Among these, BA, SPSA, NATTACK, and Square Attack are black-box attacks. We next give the hyper-parameters of each attack.

PGD-ℓ1. On MNIST, we set the perturbation magnitude ε=10, iteration number k=50, and step size α=ε/10. On CIFAR-10 and Tiny-ImageNet, we set ε=12, k=50, and α=0.05.

PGD-ℓ2. On MNIST, we set ε=2.0, k=100, and α=0.1. On CIFAR-10 and Tiny-ImageNet, we set ε=0.5, k=50, and α=ε/10.

PGD-ℓ∞. On MNIST, we set ε=0.3, k=50, and α=0.01. On CIFAR-10 and Tiny-ImageNet, we set ε=0.03, k=40, and α=ε/10.

PGD-1000-ℓ∞.
On CIFAR-10, we set the perturbation magnitude ε=0.03, iteration number k=1000, and step size α=ε/10.

BBA. On MNIST, CIFAR-10, and Tiny-ImageNet, we set the ℓ1 perturbation magnitude ε=10, 12, and 12, respectively. For all datasets, we use 1000 optimization steps, learning rate 0.001, momentum 0.8, and 10 binary search steps.

C&W-ℓ2. On MNIST, CIFAR-10, and Tiny-ImageNet, we set the ℓ2 perturbation magnitude ε=2.0, 0.5, and 0.5, respectively. For all datasets, we use 10000 optimization steps, step size 0.01, and confidence 0 for marking an example as adversarial.

C&W-ℓ∞. On CIFAR-10, we set the ℓ∞ perturbation magnitude ε=0.03, with 10000 optimization steps, step size 0.01, and confidence 0.

Gaussian noise. On MNIST, CIFAR-10, and Tiny-ImageNet, we set the ℓ2 perturbation magnitude ε=2.0, 0.5, and 0.5, respectively.

BA. On MNIST, CIFAR-10, and Tiny-ImageNet, we set the ℓ2 perturbation magnitude ε=2.0, 0.5, and 0.5, respectively. For all datasets, we use at most 25000 steps, with initial step size 0.01 for the orthogonal step and 0.01 for the step towards the target.

FGSM. On MNIST, we set the ℓ∞ perturbation magnitude ε=0.3; on CIFAR-10 and Tiny-ImageNet, ε=0.03.

MI-FGSM. On MNIST, we set the ℓ∞ perturbation magnitude ε=0.3; on CIFAR-10 and Tiny-ImageNet, ε=0.03. In both cases, we use decay factor µ=1, step number k=10, and step size α=ε/k.

SPSA. We set the maximum number of iterations to 100, the batch size to 8192, and the learning rate to 0.01.
The ℓ∞ perturbation magnitude is ε=0.3 on MNIST and ε=0.03 on CIFAR-10 and Tiny-ImageNet.

NATTACK. We set the maximum number of optimization iterations T=600, sample size b=300, isotropic Gaussian variance σ²=0.01, and learning rate 0.008. The ℓ∞ perturbation magnitude is ε=0.3 on MNIST and ε=0.03 on CIFAR-10 and Tiny-ImageNet.

AutoAttack. AutoAttack combines the following attack variants: APGD_CE without random restarts, APGD_DLR, the targeted version of FAB (FAB^T), and Square Attack with one run of 5000 queries. We use 100 iterations for each run of the white-box attacks. For APGD, we set the momentum coefficient α=0.75, ρ=0.75, and initial step size η⁽⁰⁾=2ε, where ε is the ℓ∞ perturbation magnitude. For FAB, we keep the standard hyper-parameters from the AdverTorch implementation. For Square Attack, we set the initial square-size fraction p=0.8.
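As a reference for the PGD conventions used above, the following is a minimal ℓ∞ PGD sketch; the [0, 1] image-range clipping is our assumption about input scaling, and the function name is ours.

```python
import torch

def pgd_linf(model, x, y, eps, k, alpha):
    """Minimal l_inf PGD sketch following the conventions above
    (e.g. eps=0.03, k=40, alpha=eps/10 on CIFAR-10)."""
    loss_fn = torch.nn.CrossEntropyLoss()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(k):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # gradient-sign ascent step
            delta.clamp_(-eps, eps)                   # project into the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()
```

Attacks under other norms follow the same loop with a different gradient step and projection.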

D.2 ADVERSARIAL DEFENSES

ABS. ABS uses multiple variational autoencoders to construct a complex generative architecture to defend against adversarial examples on MNIST. Since the authors released only limited code, we directly use the results reported in Schott et al. (2019). They set the perturbation magnitude ε=12, 1.5, and 0.3 for ℓ0, ℓ2, and ℓ∞ attacks, respectively. ABS used an ℓ0 perturbation model of a higher radius and was evaluated against ℓ0 attacks, so the reported number is a close estimate of the ℓ1 adversarial accuracy.

AVG. For each batch of clean data (size 64), we generate corresponding ℓ1, ℓ2, and ℓ∞ adversarial examples using PGD attacks, and train the model on the combination of clean, ℓ1, ℓ2, and ℓ∞ adversarial examples simultaneously. The hyper-parameters of the PGD adversaries can be found in Appendix D.1 (PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞).
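The AVG strategy can be sketched as a single training step as below; the attack interface (a list of callables mapping (x, y) to x_adv, one per perturbation type) and the function name are our assumptions.

```python
import torch

def avg_training_step(model, opt, x, y, attacks):
    """One AVG step (sketch): augment the clean batch with one adversarial
    batch per perturbation type and train on the concatenation."""
    batches = [x] + [atk(x, y) for atk in attacks]   # clean + l1/l2/linf batches
    xs = torch.cat(batches)
    ys = torch.cat([y] * len(batches))               # labels repeat per batch
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(xs), ys)
    loss.backward()
    opt.step()
    return loss.item()
```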

MAX.

For each batch of clean data (size 64), we generate the strongest adversarial examples (one of the ℓ1, ℓ2, and ℓ∞ attacks) using PGD, and train the model on the combination of clean examples and this strongest attack. The hyper-parameters of the PGD adversaries can be found in Appendix D.1 (PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞).

MSD. MSD creates a single adversarial perturbation by simultaneously maximizing the worst-case loss over all perturbation models at each projected steepest-descent step. For MNIST, we set the perturbation magnitude ε=10, 2.0, and 0.3 for the ℓ1, ℓ2, and ℓ∞ attacks, respectively, with iteration number k=100. For CIFAR-10 and Tiny-ImageNet, we set ε=12, 0.5, and 0.03, respectively, with k=50.

TRADES. TRADES is an adversarial defense that trades adversarial robustness off against accuracy; it won 1st place in the NeurIPS 2018 Adversarial Vision Challenge. We set 1/λ=3.0 and keep the other hyper-parameters at the default values from the original paper.
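The MAX strategy above can be sketched as a training step that generates one candidate batch per perturbation type and keeps only the worst case; as with the AVG sketch, the attack interface and function name are our assumptions.

```python
import torch
import torch.nn.functional as F

def max_training_step(model, opt, x, y, attacks):
    """One MAX step (sketch): among the candidate perturbation types, keep
    only the adversarial batch with the highest loss, then train on the
    clean batch plus that strongest batch."""
    advs = [atk(x, y) for atk in attacks]
    with torch.no_grad():
        losses = torch.stack([F.cross_entropy(model(a), y) for a in advs])
    strongest = advs[int(losses.argmax())]   # worst-case perturbation type
    xs = torch.cat([x, strongest])
    ys = torch.cat([y, y])
    opt.zero_grad()
    loss = F.cross_entropy(model(xs), ys)
    loss.backward()
    opt.step()
    return loss.item()
```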

E MORE EXPERIMENTAL RESULTS

In this section, we provide further experimental results.

E.1 ANALYSIS FROM THE FOURIER PERSPECTIVE

We study the differences between perturbation types from the Fourier perspective (Yin et al., 2019). We evaluate four ResNet-20 models: a vanilla model and three adversarially trained models using only ℓ1, ℓ2, and ℓ∞ adversarial examples, respectively. We visualize the Fourier heatmap following Yin et al. (2019), shifting the low-frequency components to the center of the spectrum. Error rates are averaged over 1000 randomly sampled images from the CIFAR-10 test set; the redder a zone, the higher the model's error rate on perturbations of that frequency. As shown in Figure 5, models trained on different ℓp perturbations exhibit different weaknesses in the Fourier heatmap (i.e., different hot zones). For example, ℓ2-PAT is more susceptible to high-frequency perturbations and less susceptible to some low-frequency perturbations (the smaller blue zone in the center), while ℓ∞-PAT is comparatively more robust to middle-frequency perturbations (light yellow zones). Thus, different perturbation types have different frequency properties, and models trained on different perturbations are sensitive to noise from different frequencies. This observation further supports our multi-domain hypothesis that different perturbation types may arise from different domains.
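The sensitivity maps above are built by perturbing test images along single Fourier basis vectors. A minimal sketch of generating one such basis perturbation follows; the function name and the ℓ2 scaling convention are our assumptions, loosely following Yin et al. (2019).

```python
import numpy as np

def fourier_basis_noise(h, w, i, j, eps):
    """Sketch of a single-Fourier-basis perturbation: put mass on the
    symmetric frequency pair (i, j), inverse-FFT to the image domain,
    and scale the result to l2 norm eps."""
    spectrum = np.zeros((h, w), dtype=complex)
    spectrum[i, j] = 1.0
    spectrum[-i % h, -j % w] = 1.0          # conjugate-symmetric partner
    basis = np.real(np.fft.ifft2(spectrum)) # real-valued spatial pattern
    basis /= np.linalg.norm(basis) + 1e-12  # unit l2 norm
    return eps * basis
```

Sweeping (i, j) over all frequencies and recording the model's error rate on images perturbed by each basis yields one heatmap per model.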

E.2 COMPUTATIONAL COST

Here, we report the time consumption of AVG, MAX, MSD, MN, MBN, and our GBN. All experiments are conducted on an NVIDIA Tesla V100 GPU cluster. We measure the overall training time of ResNet-20 on CIFAR-10 for 40 epochs, and the inference time of ResNet-20 on CIFAR-10 over 10000 images. As shown in Table 3, our method achieves a computational cost comparable to the other baselines, demonstrating its applicability in practice.

E.3 BREAKDOWN OVER INDIVIDUAL ATTACKS

In this part, we provide the breakdown over each individual attack on MNIST, CIFAR-10, and Tiny-ImageNet in Tables 4, 5, and 6, and report the results for the individual attacks of AutoAttack in Table 9. Further, we provide the breakdown for each individual attack on CIFAR-10 using VGG-16 and WideResNet-28-10 in Tables 7 and 8. According to the results, our GBN outperforms the other methods on almost all attacks by large margins. GBN does show slightly weaker or merely comparable performance on some individual attacks relative to defenses trained for that specific perturbation type; for example, P_ℓ1 outperforms GBN on PGD-ℓ1 on CIFAR-10, and TRADES performs better on some ℓ∞ attacks on MNIST. In summary, our proposed GBN trains models that are robust to multiple perturbation types (i.e., ℓ1, ℓ2, ℓ∞) and outperforms other methods by large margins.

E.4 ATTACKS WITH DIFFERENT ITERATION STEPS

Here, we provide results for PGD attacks with different iteration step numbers (i.e., 50, 100, 200, 1000) in the ℓ∞ norm on CIFAR-10. For the PGD attacks, we set the perturbation magnitude ε=0.03, step size α=ε/10, and vary the iteration number k. As shown in Table 10, our GBN outperforms the other methods on PGD attacks across all iteration steps.

On CIFAR-10, however, the prediction accuracy of the gated sub-network drops as the layer depth increases. These results also confirm our findings in Section 4.3: (1) shallow layers are more critical to model robustness; and (2) as the layer depth increases, the features from different domains become highly entangled and mixed, making it harder for models to separate them. To further improve model robustness against multiple perturbations, we suggest improving the prediction accuracy of the gated sub-network; we will address this in future studies.

We empirically observe that training on two perturbation types actually improves overall robustness against all three perturbation adversaries. Unsurprisingly, the robustness of GBN_ℓ1+ℓ∞ against ℓ2 adversarial examples and the robustness of GBN_ℓ2+ℓ∞ against ℓ1 adversarial examples decrease to some degree, but the models still perform better than the vanilla model, and sometimes even outperform the baseline methods. In addition, we evaluate robustness on attacks outside the perturbation model using the different prediction approaches of the gated sub-network (i.e., soft label and hard label). As shown in Table 14, the soft and hard labels achieve similar results.

E.8 DIFFERENT VARIANTS OF WHITE-BOX ATTACKS

Here, we examine the performance of GBN in more rigorous white-box attack scenarios, in which the adversary not only knows every detail of the GBN architecture but can also manually select the BN branch used to generate adversarial attacks. We first generate adversarial attacks by fooling all the GBN layers in the model, generating the attacks with PGD-ℓ1; the results suggest that the robustness of GBN does not rely on obfuscated gradients (Athalye et al., 2018).

E.9 FEATURE VISUALIZATION

Here, we further visualize the features before and after the normalization blocks of each method using t-SNE. Specifically, given 100 clean CIFAR-10 samples from a specific class (e.g., "dog"), we first generate the corresponding ℓ1, ℓ2, and ℓ∞ adversarial examples; we then visualize the features before and after the normalization blocks (i.e., the BN block for AVG and the GBN block for our method). We use the hyper-parameters of the PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞ attacks in Appendix D.1. As can be seen in Figure 6, the features from different domains are aligned into domain-invariant representations after being normalized by GBN: a clean example (red dot) and its corresponding adversarial examples (the corresponding blue, green, and yellow dots) are more likely to be close in the space of the normalized output (Figure 6(b)). In contrast, after normalization by BN (Figure 7), the distance between a clean example and its corresponding adversarial examples remains large, and no clear trend can be discerned between Figure 7(a) and Figure 7(b).

E.10 MORE DETAILS OF THE RUNNING STATISTICS OF DIFFERENT MODELS

We train a VGG-16 and a WideResNet-28-10 model with the multiple-BN-branch structure and inspect the running statistics of each BN branch at different layers, as shown in Figure 8 and Figure 9. Surprisingly, the running statistics of the different domains in the first layer are almost indistinguishable. However, according to our ablation study, adding GBN to the shallow layers yields greater robustness than adding it to other layers. We can understand this from three viewpoints: (1) gated sub-network prediction accuracy: as reported in Table 11, the gated sub-network achieves its highest prediction accuracy in the first layer, even though the running statistics there are nearly indistinguishable; (2) the convolutional layers in the gated sub-network may still be able to distinguish these features (see the discussion in Appendix C); and (3) adding GBN to the first layer may outperform adding GBN only to deeper layers because shallow layers are more critical to model robustness (Liu et al., 2019b). We will study this further in future work.



Footnotes:
1. Our code will be available upon publication.
2. In this work, we consider N=3 adversarial perturbation types: ℓ1, ℓ2, and ℓ∞.
3. Schott et al. (2019) consider ℓ0 perturbations, which are subsumed within the ℓ1 ball of the same radius.
4. https://github.com/bethgelab/AnalysisBySynthesis
5. https://github.com/ftramer/MultiRobustness
6. https://github.com/locuslab/robust_union
7. https://github.com/yaodongyu/TRADES



Figure 1: (a) the standard BN structure; (b) the structure with 4 BN branches, which construct different mini-batches for different D_k to estimate the normalization statistics; (c) and (d): running means and variances of multiple BN branches on 16 randomly sampled channels in the conv2_1 layer of a VGG-16, showing that different perturbation types induce different normalization statistics. More details on the running statistics of different models can be found in Appendix E.10.

Training of Gated Batch Normalization (GBN) for Each Iteration.
Input: Network f with GBN.
Output: Robust model parameters Θ and θ.
1: Given a mini-batch of clean data B_0, generate the corresponding adversarial example batches B_k (k ∈ {1, ..., N}), one per perturbation type, by PGD attacks.
2: for k in the N+1 domains do
3:   Pass B_k through BN_k(·) at each layer to obtain the normalized outputs B̂_k based on Eqn. 1.
4:   Update the population statistics {μ̂_k, σ̂_k} of BN_k(·) at each layer based on Eqn. 2.
5: end for
6: Update the parameters θ of the gated sub-network at each layer based on Eqn. 4.
7: Update the parameters Θ of the whole network based on Eqn. 5.

Figure 3: (a) demonstrates the results of adding GBN to different single layers. (b) shows the results of adding GBN to top-m layers. All the experiments are conducted using a VGG-16 on CIFAR-10.

Figure 4: The architecture of the gated sub-network used in this paper.

Figure 5: Model sensitivity to additive noise aligned with different Fourier basis vectors on CIFAR-10. From left to right: vanilla, ℓ1-trained, ℓ2-trained, and ℓ∞-trained models. The numbers indicate the model error rates.

Figure 6: We use the GBN block at layer conv3_1 of a VGG-16 model. (a) shows the visualization of features before the GBN block; (b) shows the visualization of features after the GBN block.

Figure 7: We use the BN block at layer conv3_1 of a VGG-16 model trained using AVG. (a) shows the visualization of features before the BN block; (b) shows the visualization of features after the BN block.

Figure 8: Running statistics (running mean and running variance) of each BN in the multiple BN branches at different layers of a VGG-16 model on CIFAR-10.

Figure 9: Running statistics (running mean and running variance) of each BN in the multiple BN branches at different layers of a WideResNet-28-10 model on CIFAR-10.

Batch normalization (BN) is typically used to stabilize and accelerate DNN training. Let x ∈ R^d denote the input to a neural network layer. During training, BN normalizes each neuron/channel over the m examples of a mini-batch by x̂_j = (x_j − μ_j)/√(σ_j² + ξ), where μ_j and σ_j² are the mini-batch mean and variance of neuron j, and ξ is a small constant for numerical stability.

to generate batches of adversarial examples B_k (k ∈ {1, ..., N}) for each domain D_k. To capture the domain-specific statistics of different perturbation types, given a mini-batch B_k from domain D_k, we ensure that B_k goes through its corresponding BN branch BN_k(·), and we use Eqn. 1 to compute the normalized output B̂_k. The population statistics {μ̂_k, σ̂_k} of BN_k(·) are updated based on Eqn. 2. In other words, we disentangle the mixture distribution for normalization and apply separate BN branches to different perturbation types for statistics estimation.
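This separate-statistics scheme can be sketched as follows. The class name, the 2-D input layout, and the omission of the per-channel affine parameters are simplifying assumptions made for brevity.

```python
import numpy as np

class MultiBranchBN:
    """Sketch of K BN branches with separate population statistics:
    batch B_k is normalized with, and updates, only branch k."""
    def __init__(self, num_branches, num_features, momentum=0.1, eps=1e-5):
        self.mu = np.zeros((num_branches, num_features))
        self.var = np.ones((num_branches, num_features))
        self.momentum, self.eps = momentum, eps

    def forward(self, x, k):
        # x: (batch, features); k: domain index selecting the BN branch
        batch_mu = x.mean(axis=0)
        batch_var = x.var(axis=0)
        # update only branch k's population statistics (running average)
        self.mu[k] = (1 - self.momentum) * self.mu[k] + self.momentum * batch_mu
        self.var[k] = (1 - self.momentum) * self.var[k] + self.momentum * batch_var
        return (x - batch_mu) / np.sqrt(batch_var + self.eps)
```

At inference time, the stored {mu[k], var[k]} of the branch selected by the gated sub-network would replace the mini-batch statistics.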

Computational cost. With respect to the parameters, we estimate N+1 pairs of {μ̂, σ̂}, compared to one pair in standard BN. With respect to time consumption, the most time-consuming procedure in adversarial training is generating the adversarial examples. Although we introduce more parameters, our time cost is close to that of other adversarial defense baselines (see Appendix E.2 for details).

Model robustness on different datasets (%). We also report the standard deviation over all attacks for each method on each dataset.

Model robustness of VGG-16 on CIFAR-10 (the higher the better).

P_ℓ∞. We adversarially train the model on mini-batches of 50% clean examples and 50% adversarial examples generated by the ℓ∞ PGD attack. For the PGD attack on CIFAR-10, we set the perturbation magnitude ε=0.03, iteration number k=40, and step size α=ε/10.

MN. We set the number of modes in MN to 2, which achieves the best performance according to the original paper (Deecke et al., 2019). During training, we feed the model a mixture of clean, ℓ1, ℓ2, and ℓ∞ adversarial examples using the same setting as AVG.

MBN. The original MBN (Xie et al., 2020; Xie & Yuille, 2020) manually selects the BN branch for clean versus adversarial examples during inference, which is infeasible in the adversarial defense setting (the model is unaware of the type of its inputs). Thus, we add a 2-way gated sub-network to MBN to predict the input domain label, keeping the two subsequent BN branches the same. During training, we route the clean examples through the first BN branch and the adversarial examples (i.e., ℓ1, ℓ2, and ℓ∞) through the second. The adversarial examples are generated via PGD using the same setting as AVG.

Runtime analysis of different methods on CIFAR-10 (the lower the better).

Model robustness of LeNet on MNIST over each individual attack (the higher the better).

Model robustness of ResNet-20 on CIFAR-10 over each individual attack (the higher the better).

Model robustness of ResNet-34 on Tiny-ImageNet over each individual attack (the higher the better).

In this part, we show the prediction results of the gated sub-network in GBN, i.e., the classification accuracy for the domain of different input samples. The results of LeNet on MNIST and ResNet-20 on CIFAR-10 are presented in Table 11. For MNIST, the gated sub-network of LeNet achieves high prediction accuracy for the input domains (i.e., clean, ℓ1, ℓ2, and ℓ∞ adversarial examples) at all layers.

Robustness evaluation by AutoAttack (the higher the better). We report the clean accuracy and the robust accuracy of the individual attacks, as well as the combined AutoAttack result (denoted AA), using LeNet,

Model performance of ResNet-20 on CIFAR-10 on different PGD-k attacks (k denotes the iteration steps).

Gated sub-network prediction accuracy on inputs from different domains on different datasets (the higher the better). We show the prediction accuracy of the gated sub-network at each layer from top to bottom. The adversarial examples are generated by the PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞ attacks in Appendix D.1.

Model robustness of ResNet-20 on CIFAR-10 using different prediction approaches of the gated sub-network (the higher the better). We use the PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞ attacks in Appendix D.1.

Attacks outside the perturbation model of ResNet-20 on CIFAR-10 (the higher the better).

Attacks outside the perturbation model of ResNet-20 on CIFAR-10 using different prediction approaches of the gated sub-network ("h" denotes the hard label and "s" denotes the soft label): GBN_ℓ1+ℓ∞ (s), GBN_ℓ2+ℓ∞ (s), GBN_ℓ1+ℓ∞ (h), GBN_ℓ2+ℓ∞ (h).

We also provide the prediction accuracy of the gated sub-network in GBN at different layers given the adversarial examples above. As shown in Table 16, it is hard to fool all GBN layers within a model: even though some GBN layers are fooled into misclassifying the domains of the samples, other GBN layers still retain comparatively high prediction accuracy. We then generate adversarial examples by manually selecting the BN branches. Specifically, we generate ℓ1, ℓ2, and ℓ∞ adversarial examples through BN_0, BN_1, BN_2, and BN_3 in GBN, respectively, using the hyper-parameters of the PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞ attacks in Appendix D.1. Note that the statistics of these branches are estimated with clean, ℓ1, ℓ2, and ℓ∞ adversarial examples during training, respectively. As presented in Table 17, the model achieves good robustness on the different datasets in this more rigorous white-box attack scenario. In particular, the worst-case accuracies among all branches are largely comparable to the results presented in Table

White-box attacks on CIFAR-10 using ResNet-20 by fooling all the GBN layers.

Gated sub-network prediction accuracy on adversarial examples generated to fool the GBN layers. We show the prediction accuracy of the gated sub-network at each layer from top to bottom.

White-box attacks on MNIST and CIFAR-10 by manually selecting BN branches to generate adversarial examples (BN_0, BN_1, BN_2, and BN_3 denote the BN branches for clean, ℓ1, ℓ2, and ℓ∞ adversarial examples in GBN, respectively).

E.6 DIFFERENT PREDICTION APPROACHES OF THE GATED SUB-NETWORK

In the main body of our paper, we compute the normalized output using the gated sub-network in a soft-gated way (denoted "soft") based on Eqn. 3. In this section, we also try taking the top-1 prediction of g (the hard label) to normalize the output as an alternative approach (denoted "hard"). As shown in Table 12, the two approaches achieve comparable performance in terms of both robustness and standard accuracy.
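The soft/hard distinction can be sketched as follows; stacking the K branch outputs along the first axis and the function name are our assumptions, and Eqn. 3 gives the exact soft-gated form.

```python
import numpy as np

def gated_normalize(branch_outputs, gate_logits, mode="soft"):
    """Sketch: combine K BN-branch outputs with the gate's prediction.
    "soft" weights the branches by the gate's softmax probabilities;
    "hard" keeps only the top-1 branch."""
    z = np.exp(gate_logits - gate_logits.max())  # stable softmax
    probs = z / z.sum()
    if mode == "hard":
        probs = np.eye(len(probs))[probs.argmax()]  # one-hot top-1
    # weighted sum over the K branch outputs
    return np.tensordot(probs, branch_outputs, axes=1)
```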

E.7 ATTACKS OUTSIDE THE PERTURBATION MODEL

In this section, we present additional experiments exploring our model's performance on attacks that lie outside the perturbation model. We conduct two experiments: (1) adversarially train a GBN model on ℓ1 and ℓ∞ perturbations and evaluate it against ℓ2 adversaries (denoted GBN_ℓ1+ℓ∞); and (2) adversarially train a GBN model on ℓ2 and ℓ∞ perturbations and evaluate it against ℓ1 adversaries (denoted GBN_ℓ2+ℓ∞). We use the PGD-ℓ1, PGD-ℓ2, and PGD-ℓ∞ attacks in Appendix D.1. The results are shown in Table 13.

