PREPARE FOR THE WORST: GENERALIZING ACROSS DOMAIN SHIFTS WITH ADVERSARIAL BATCH NORMALIZATION

Abstract

Adversarial training is the industry standard for producing models that are robust to small adversarial perturbations. However, machine learning practitioners need models that are robust to other kinds of changes that occur naturally, such as changes in the style or illumination of input images. Such changes in input distribution have been effectively modeled as shifts in the mean and variance of deep image features. We adapt adversarial training by adversarially perturbing these feature statistics, rather than image pixels, to produce models that are robust to distributional shifts. We also visualize images from adversarially crafted distributions. Our method, Adversarial Batch Normalization (AdvBN), significantly improves the performance of ResNet-50 on ImageNet-C (+8.1%), Stylized-ImageNet (+6.7%), and ImageNet-Instagram (+3.9%) over standard training practices. In addition, we demonstrate that AdvBN can also improve generalization on semantic segmentation.

1. INTRODUCTION

Robust optimization for neural networks has been a major focus of recent research. A mainstream approach to reducing the brittleness of classifiers is adversarial training, which solves a min-max optimization problem in which an adversary perturbs images to degrade network performance, while the network adapts its parameters to resist degradation (Goodfellow et al., 2015; Kurakin et al., 2017; Madry et al., 2018). The result is a hardened network that is no longer brittle to small perturbations of input pixels.

While adversarial training makes networks robust to adversarial perturbations, it does not address other forms of brittleness that plague vision systems. For example, shifts in image style, lighting, color mapping, and domain can still severely degrade the performance of neural networks (Hendrycks & Dietterich, 2019).

We propose adapting adversarial training to make neural networks robust to changes in image style and appearance, rather than to small perturbations at the pixel level. We formulate a min-max game in which an adversary chooses adversarial feature statistics, and network parameters are then updated to resist these changes in feature space, which correspond to appearance differences in input images. This game is played until the network is robust to a variety of changes in image space, including texture, color, and brightness.

The idea of adversarial feature statistics is inspired by the observation that the mean and variance of feature maps encode style information, and thus enable the transfer of style from a source image to a target image through normalization (Huang & Belongie, 2017; Ulyanov et al., 2016). Unlike standard approaches that rely on feature statistics from auxiliary images to define an image style, we use adversarial optimization of feature statistics to prepare classifiers for the worst-case style that they might encounter.
We propose training with Adversarial Batch Normalization (AdvBN) layers. Before each gradient update, the AdvBN layers perform an adversarial feature shift by re-normalizing features with the most damaging mean and variance. By using these layers in a robust optimization framework, we create networks that are resistant to domain shifts caused by changes in feature statistics. An advantage of this method is that it requires no auxiliary data from new domains. We show that robust training with AdvBN layers hardens classifiers against changes in image appearance and style on a range of vision tasks, including Stylized-ImageNet (Geirhos et al., 2019) and ImageNet-Instagram (Wu et al., 2020).
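As a rough illustration of the idea (not the exact procedure of Section 3; the function name and the multiplicative parameterization of the perturbations below are our own), an AdvBN-style layer normalizes features per channel and then re-normalizes them with adversarially perturbed statistics:

```python
import numpy as np

def advbn_shift(f, delta_mu, delta_sigma, eps=1e-5):
    """Sketch of an AdvBN-style feature shift: normalize per channel,
    then re-normalize with perturbed statistics. In training, delta_mu
    and delta_sigma would be chosen adversarially by gradient ascent
    on the classification loss."""
    mu = f.mean(axis=(0, 2, 3), keepdims=True)    # per-channel mean
    sigma = f.std(axis=(0, 2, 3), keepdims=True)  # per-channel std
    f_norm = (f - mu) / (sigma + eps)
    # Multiplicative perturbations of the statistics; zero deltas
    # recover the original features (up to eps).
    return f_norm * (sigma * (1.0 + delta_sigma)) + mu * (1.0 + delta_mu)

features = np.random.randn(4, 8, 5, 5)            # (N, C, H, W)
shifted = advbn_shift(features, delta_mu=0.2, delta_sigma=-0.3)
```

Because the perturbation acts only on per-channel statistics, it models an appearance-level shift of the whole batch rather than a pixel-wise attack.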

2. BACKGROUND

2.1 FEATURE NORMALIZATION

Feature normalization is an important component of modern neural networks that stabilizes training and improves model generalization. Let f ∈ R^{N×C×H×W} denote the feature maps output by a layer, where N is the batch size, C is the number of channels, and H and W are the height and width of the feature maps, respectively. Different normalization methods compute the mean, µ, and standard deviation, σ, over different dimensions of the feature maps. They use the derived feature statistics, often along with learned multiplicative and additive parameters, to produce normalized features f̂:

f̂ = γ · (f − µ(f)) / σ(f) + β,

where γ and β are learnable parameters that re-scale and shift the normalized features. Although feature normalization was originally proposed to accelerate training (Bjorck et al., 2018), previous work (Huang & Belongie, 2017; Li et al., 2017) has shown that feature statistics effectively capture information about the appearance of images. Motivated by this observation, we impose uncertainty on these statistics during training in order to obtain models that are less sensitive to non-semantic characteristics and thus generalize to images with different appearances.
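The normalization formula is straightforward to write down; the sketch below normalizes a feature map per channel in NumPy (scalar γ and β for brevity; real layers learn one per channel and also track running statistics for inference):

```python
import numpy as np

def normalize(f, mu, sigma, gamma, beta, eps=1e-5):
    """Normalize features f with statistics (mu, sigma), then apply
    the learnable re-scaling gamma and shift beta."""
    return gamma * (f - mu) / (sigma + eps) + beta

# Per-channel normalization of an (N, C, H, W) feature map:
f = np.random.randn(8, 16, 4, 4)
mu = f.mean(axis=(0, 2, 3), keepdims=True)
sigma = f.std(axis=(0, 2, 3), keepdims=True)
f_hat = normalize(f, mu, sigma, gamma=1.0, beta=0.0)
```

With γ = 1 and β = 0, the normalized features have approximately zero mean and unit standard deviation in each channel.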

2.2. ADVERSARIAL TRAINING

Untargeted adversarial examples are generated by maximizing the classification loss with respect to the input. One popular method, projected gradient descent (PGD), performs gradient ascent in the signed gradient direction and projects the perturbation in order to enforce an ℓ∞-norm constraint (Madry et al., 2018). Adversarial training aims to solve the saddle-point optimization problem

min_θ E_{(X,y)∼D} [ max_{‖δ‖_p ≤ ε} L(g_θ(X + δ), y) ],

where g_θ is a model with parameter vector θ, (X, y) is a clean input and its corresponding label drawn from the distribution D, and L denotes the cross-entropy loss. Adversarial training solves this problem by iteratively sampling a batch of data, perturbing the batch adversarially, and performing a parameter update on the new adversarial batch (Madry et al., 2018). We harness adversarial training to create models robust to distributional shifts rather than to standard pixel-wise adversarial attacks.
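To make the inner maximization concrete, here is a minimal ℓ∞ PGD sketch on a logistic-regression loss, where the input gradient has a closed form (a real attack would backpropagate through the network; the weights and inputs below are illustrative):

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """l-inf PGD: ascend the signed input gradient of the loss,
    projecting the perturbation back into the eps-ball each step."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = np.dot(w, x + delta) + b
        p = 1.0 / (1.0 + np.exp(-z))       # sigmoid probability
        grad = (p - y) * w                 # dL/dx for cross-entropy loss
        delta += alpha * np.sign(grad)     # signed-gradient ascent step
        delta = np.clip(delta, -eps, eps)  # project onto ||delta||_inf <= eps
    return x + delta

w, b = np.array([1.0, -2.0, 0.5]), 0.1
x, y = np.array([0.3, -0.1, 0.2]), 1.0
x_adv = pgd_attack(x, y, w, b)
```

For the true class y = 1, the attack drives the logit w·x + b down, increasing the cross-entropy loss while the perturbation stays within the ε-ball.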



Figure 1: Images from ImageNet variants along with classification scores by a pre-trained ResNet-50 model. The right-most image is generated by our Adversarial Batch Normalization module. Details on how we generate this image can be found in Section 3.

For example, Batch Normalization (BN) (Ioffe & Szegedy, 2015) estimates feature statistics along the N, H, W dimensions. On the other hand, Instance Normalization (IN) (Ulyanov et al., 2016) computes µ and σ for each individual sample in the batch and only normalizes across the H and W dimensions.

