MODEL-BASED ROBUST DEEP LEARNING: GENERALIZING TO NATURAL, OUT-OF-DISTRIBUTION DATA

Anonymous

Abstract

While deep learning (DL) has resulted in major breakthroughs in many applications, the frameworks commonly used in DL remain fragile to seemingly innocuous changes in the data. In response, adversarial training has emerged as a principled approach for improving the robustness of DL against norm-bounded perturbations. Despite this progress, DL is also known to be fragile to unbounded shifts in the data distribution due to many forms of natural variation, including changes in weather or lighting in images. However, there are remarkably few techniques that can address robustness to natural, out-of-distribution shifts in the data distribution in a general context. To address this gap, we propose a paradigm shift from perturbation-based adversarial robustness to model-based robust deep learning. Critical to our paradigm is obtaining models of natural variation, which vary data over a range of natural conditions. By exploiting these models, we develop three novel model-based robust training algorithms that improve the robustness of DL with respect to natural variation. Our extensive experiments show that across a variety of natural conditions in twelve distinct datasets, classifiers trained with our algorithms significantly outperform classifiers trained via ERM, adversarial training, and domain adaptation techniques. Specifically, when training on ImageNet and testing on various subsets of ImageNet-c, our algorithms improve over baseline methods by up to 30 percentage points in top-1 accuracy. Further, we show that our methods provide robustness (1) against natural, out-of-distribution data, (2) against multiple simultaneous distributional shifts, and (3) to domains entirely unseen during training.

1. INTRODUCTION

The last decade has seen remarkable progress in deep learning (DL), which has prompted widescale integration of DL frameworks into myriad application domains (LeCun et al., 2015). In many of these applications, and in particular in safety-critical domains, it is essential that the DL systems are robust and trustworthy (Dreossi et al., 2019). However, it is now well-known that DL is fragile to seemingly innocuous changes to the input data (Szegedy et al., 2013). Indeed, well-documented examples of fragility to carefully-designed noise can be found in a variety of contexts, including image classification (Madry et al., 2017), clinical trials (Papangelou et al., 2018), and robotics (Melis et al., 2017). Accordingly, a number of adversarial training algorithms (Goodfellow et al., 2014b; Wong & Kolter, 2017) as well as certifiable defenses (Raghunathan et al., 2018; Fazlyab et al., 2019a) have recently been proposed, which have provided a rigorous framework for improving the robustness of DL against norm-bounded perturbations (Fazlyab et al., 2019b; Dobriban et al., 2020). Despite this encouraging progress, very recent papers have unanimously shown that DL is also fragile to unbounded shifts in the data distribution due to a wide range of natural phenomena (Djolonga et al., 2020; Taori et al., 2020; Hendrycks et al., 2020; Hendrycks & Dietterich, 2019). For example, in image classification, such shifts include changes due to lighting, blurring, or weather conditions (Pei et al., 2017; Chernikova et al., 2019). However, there are remarkably few general, principled techniques that have been shown to provide robustness against these forms of out-of-distribution, natural variation (Hendrycks et al., 2019a). Furthermore, as these unseen distributional shifts are arguably more common in safety-critical domains, the task of designing algorithms that generalize to natural, out-of-distribution data is an important and novel challenge for the DL community.
(b) Natural variation. In this paper, we study robustness with respect to natural variation. For example, differences in weather conditions such as snow illustrate one form of natural variation.

In this paper, we propose a paradigm shift from perturbation-based adversarial robustness to model-based robust deep learning. Our goal is to provide principled, general algorithms that can be used to train neural networks to be robust against natural, out-of-distribution shifts in data. Our experiments show that across a variety of challenging, naturally-occurring conditions, such as variation in lighting, haze, rain, and snow, and across various datasets, including SVHN, GTSRB, CURE-TSR, and ImageNet, classifiers trained with our model-based algorithms significantly outperform standard DL baselines, adversarially-trained classifiers, and, when applicable, domain adaptation methods.

Contributions. The contributions of this paper can be summarized as follows:

• Paradigm shift. We propose a paradigm shift from perturbation-based robustness to model-based robust deep learning, where models of natural variation express changes due to natural conditions.
• Optimization-based formulation. We formulate a novel model-based robust training problem by constructing a general robust optimization procedure to search for challenging natural variation.
• Models of natural variation. For many challenging forms of natural variation, we use deep generative models to learn models of natural variation that are consistent with realistic conditions.
• Novel algorithms. We propose a family of novel robust training algorithms that exploit models of natural variation to improve the robustness of DL against worst-case natural variation.
• Out-of-distribution robustness. We show that our algorithms are the first to consistently provide robustness against natural, out-of-distribution shifts that frequently occur in real-world environments, including snow, rain, fog, and brightness on SVHN, GTSRB, CURE-TSR, and ImageNet.
• ImageNet-c robustness. We show that our algorithms can significantly improve the robustness of classifiers trained on ImageNet and tested on ImageNet-c by as much as 30 percentage points.
• Robustness to simultaneous distributional shifts. We show that our methods are composable and can improve robustness to multiple simultaneous sources of natural variation. To evaluate this feature, we curate several new datasets, each of which has two simultaneous distributional shifts.
• Robustness to unseen domains. We show that models of natural variation can be reused on datasets that are entirely unseen during training to improve out-of-distribution generalization.

2. PERTURBATION-BASED ROBUSTNESS: APPROACHES AND LIMITATIONS

In this paper, we consider a standard classification task in which training data $(x, y) \sim \mathcal{D}$ is distributed according to a joint distribution $\mathcal{D}$ over instances $x \in \mathbb{R}^d$ and labels $y \in [k] := \{0, 1, \dots, k\}$. In this setting, given a finite training sample drawn i.i.d. from $\mathcal{D}$, the goal of the learning problem is to obtain a classifier $f_w$ parameterized by weights $w \in \mathbb{R}^p$ such that $f_w$ can correctly predict the labels $y$ corresponding to new instances $x$ drawn i.i.d. from $\mathcal{D}$. In practice, one can learn $f_w$ by approximately solving the non-convex empirical risk-minimization (ERM) problem $\arg\min_w \mathbb{E}[\ell(x, y; w)]$, where $\ell$ is a suitable loss function. However, neural networks trained using ERM are known to be susceptible to adversarial attacks. This means that given a datum $x$ with a corresponding label $y$, one can find another datum $x_{\text{adv}}$ such that $x$ is close to $x_{\text{adv}}$ in a given Euclidean norm and $x_{\text{adv}}$ is predicted by the learned classifier as belonging to a different class $c \neq y$. If such a datum $x_{\text{adv}}$ exists, it is called an adversarial example. This is illustrated in Figure 1a; although these pandas look identical, they were classified differently in (Goodfellow et al., 2014b).

The dominant paradigm toward improving robustness against adversarial examples relies on a robust optimization perspective wherein neural networks are trained to correctly classify worst-case perturbations of data (Madry et al., 2017; Wong & Kolter, 2017). This can be formulated as follows:

$$\arg\min_w \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in \Delta} \ell(x + \delta, y; w) \Big] \tag{1}$$

We can think of (1) as comprising two coupled optimization problems: an inner maximization problem in which we seek a challenging perturbation and an outer minimization problem in which we seek weights that lead to strong classification performance.

Limitations of perturbation-based robustness. Despite remarkable progress toward improving the robustness of DL against norm-bounded perturbations, there are significant limitations to adversarial training. Notably, DL is known to be fragile to many forms of natural variation, which cannot be described by small perturbations $x \mapsto x + \delta$. In image classification, such natural variation includes changes in weather or background color (Eykholt et al., 2018; Hendrycks et al., 2019b; Hosseini & Poovendran, 2018), spatial transformations such as rotation or scaling (Xiao et al., 2018b; Karianakis et al., 2016), and sensor-based attacks (Kurakin et al., 2016). Because such transformations frequently arise in safety-critical domains, it is critically important for the DL community to develop algorithms that are robust against out-of-distribution, natural variation in data. In this paper, we specifically address this challenge by proposing a principled, optimization-based framework which can be used in general settings to provide robustness against arbitrary sources of natural variation.
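To make the inner maximization of the perturbation-based formulation concrete, the following is a minimal sketch rather than the implementation used in any of the cited works. It assumes a linear classifier with logistic loss, labels in {-1, +1}, and a perturbation set given by an infinity-norm ball of radius `eps` searched with signed gradient steps; the function names `logistic_loss`, `loss_grad_x`, and `pgd_attack` are hypothetical.

```python
import math

def logistic_loss(x, y, w):
    """Logistic loss of a linear classifier; y is +1 or -1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # log(1 + exp(-margin)), written to stay stable for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def loss_grad_x(x, y, w):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin >= 0:
        e = math.exp(-margin)
        s = e / (1.0 + e)              # sigmoid(-margin)
    else:
        s = 1.0 / (1.0 + math.exp(margin))
    return [-y * s * wi for wi in w]

def pgd_attack(x, y, w, eps, step, k):
    """Approximate the inner max over ||delta||_inf <= eps (assumed set)."""
    delta = [0.0] * len(x)
    for _ in range(k):
        x_pert = [xi + di for xi, di in zip(x, delta)]
        g = loss_grad_x(x_pert, y, w)
        # signed ascent step, then projection back onto the eps-ball
        delta = [
            min(max(di + step * ((gi > 0) - (gi < 0)), -eps), eps)
            for di, gi in zip(delta, g)
        ]
    return delta

# Usage: perturb one example and compare losses before and after.
x, y, w = [1.0, 0.0], 1, [1.0, -2.0]
delta = pgd_attack(x, y, w, eps=0.1, step=0.05, k=5)
x_adv = [xi + di for xi, di in zip(x, delta)]
print(logistic_loss(x, y, w), logistic_loss(x_adv, y, w))
```

The perturbation here is additive and bounded, which is exactly the restriction that the limitations paragraph above argues is too narrow for natural variation.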

3. A NEW ROBUSTNESS PARADIGM: MODEL-BASED ROBUST DEEP LEARNING

Underlying the task of improving the robustness of neural networks against natural, out-of-distribution data are two fundamental challenges. First, unlike the norm-bounded perturbations typically studied in the adversarial robustness community, data in real-world, safety-critical environments can vary in unknown and highly nonlinear ways. Thus, the first step toward building a robust training procedure must be to design a mechanism that accurately describes how data varies in such environments. Second, assuming a suitable model of natural variation, the challenge is to formulate a training procedure that exploits this model toward improving robustness. In this section, we present novel solutions to each of these challenges.

3.1. MODELS OF NATURAL VARIATION

In order to effectively model sources of natural variation in a domain-agnostic setting, we will abstractly define models of natural variation. Concretely, a model of natural variation $G(x, \delta)$ is a map that describes how an input datum $x$ can be naturally varied by a nuisance parameter $\delta$, resulting in a new datum $x' := G(x, \delta)$. Ideally, for a fixed datum $x$, varying the nuisance parameter $\delta$ should vary the severity of the natural conditions in the generated datum $x'$. An example of such a model is shown in Figure 2, where an image $x$ on the left (in this case, in sunny weather) can be naturally varied by $\delta$ and consequently transformed into the image $x'$ on the right (in snowy weather). In the remainder of this subsection, we consider cases in which (1) a model $G$ is known a priori, and (2) a model $G$ is unknown and therefore must be learned offline from data. In this second case, in which models of natural variation must be learned, we propose a method for obtaining these models.

Known models of natural variation. In many problems, a model $G(x, \delta)$ is known a priori due to intrinsic geometric structure. For example, there is underlying structure that describes how data can be rotated, translated, or scaled; models for rotating an image can be characterized by $G(x, \delta) = R(\delta)x$, where $R(\delta)$ is a rotation matrix and $\delta \in \Delta := [0, 2\pi)$. In prior work, this idea has been used to train classifiers to be robust to rotation and scaling (Engstrom et al., 2017; Kamath et al., 2020).

Learning models of natural variation from data. In many situations, models of natural variation are not known a priori or are too costly to obtain. For example, consider Figure 2, in which a model $G(x, \delta)$ takes an image $x$ of a street sign in sunny weather and maps it to an image $x' := G(x, \delta)$ in snowy weather. Even though there is a relationship between the two images, obtaining a model $G$ relating these two domains is extremely challenging if we resort to geometric structure. For such problems we advocate for learning the model $G$ from data. To do so, we assume that we have access to two unpaired domains $A$ and $B$ that are drawn from a common distribution. Domain $A$ contains the original data, such as the images with sunny weather, and domain $B$ contains naturally transformed data, such as images with snow. Ideally, a model of natural variation should learn to map images from domain $A$ to corresponding images with different levels of natural variation captured by the images of domain $B$. In our experiments section, we rely on the MUNIT framework (Huang et al., 2018), which combines two autoencoders and two generative adversarial networks (Goodfellow et al., 2014a), to learn mappings between domains $A$ and $B$. Furthermore, we note that many choices of unpaired, unconditional image-to-image translation networks satisfy our criteria for $G$, and in future work we plan to investigate the efficacy of these architectures. In Appendix A, we describe parallel experiments that we carried out with two other architectural choices for $G$, and we fully characterize the MUNIT architecture used in our experiments.
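Returning to the known-model case discussed earlier in this subsection, the rotation model can be sketched in a few lines. This is an illustration only: it rotates a single 2-D point rather than an image, and the names `rotate` and `sweep` are hypothetical.

```python
import math

def rotate(x, delta):
    """Known model of natural variation G(x, delta) = R(delta) x for a 2-D point."""
    c, s = math.cos(delta), math.sin(delta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def sweep(x, n):
    """Apply the model at n evenly spaced nuisance values in [0, 2*pi)."""
    return [rotate(x, 2.0 * math.pi * i / n) for i in range(n)]

# Usage: one datum, eight rotated copies.
for point in sweep((1.0, 0.0), 8):
    print(point)
```

Because the nuisance parameter is a single angle, the whole nuisance space can be swept directly. A learned model such as the image-to-image translation network described above plays the same role, but its nuisance space is a latent code rather than a geometric quantity.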

3.2. MODEL-BASED ROBUST TRAINING FORMULATION

The model-based robust training paradigm that we propose retains the basic elements of adversarial training described in Section 2. Our point of departure from the classical adversarial training formulation is in the choice of the so-called adversarial perturbation. In this paper, we assume that data can be transformed according to a model of natural variation $G(x, \delta)$ by choosing different values of $\delta$ from a given nuisance space $\Delta$. The goal of the model-based approach is to train a classifier that achieves high accuracy both on a test set drawn i.i.d. from $\mathcal{D}$ and on more-challenging test data that has been subjected to the source of natural variation that $G$ models. This perspective can be captured by the following robust optimization problem:

$$\min_w \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in \Delta} \ell(G(x, \delta), y; w) \Big]. \tag{2}$$

In the inner maximization problem of this formulation, given an instance-label pair $(x, y)$, we seek a vector $\delta^* \in \Delta$ that produces a corresponding instance $x' := G(x, \delta^*)$ which gives rise to high loss values $\ell(G(x, \delta^*), y; w)$ under the current weight $w$. One can think of this vector $\delta^*$ as characterizing the worst-case nuisance that can be generated by the model $G(x, \delta^*)$ for the original instance $x$. After solving this inner problem, we solve the outer minimization problem by finding weights $w$ that minimize the risk against the challenging instance $G(x, \delta^*)$. By training the network to correctly classify this worst-case data, the goal is to become invariant to the model $G(x, \delta)$ for any $\delta \in \Delta$.
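As a sanity check on how this formulation relates to Section 2, consider the additive model below. The specific choice of $G$ and the radius $\epsilon$ are assumptions made here for illustration; with them, the model-based problem above coincides with the perturbation-based problem (1).

```latex
G(x, \delta) := x + \delta, \qquad
\Delta := \{\delta \in \mathbb{R}^d : \|\delta\| \le \epsilon\}
\quad\Longrightarrow\quad
\min_w \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in \Delta} \ell(G(x, \delta), y; w) \Big]
=
\min_w \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in \Delta} \ell(x + \delta, y; w) \Big].
```

In this reading, perturbation-based adversarial training is the special case in which the model of natural variation is an additive, norm-bounded shift, and the model-based formulation replaces that shift with an arbitrary map $G$.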

4. MODEL-BASED TRAINING ALGORITHMS

We now assume that we have access to a suitable model of natural variation $G(x, \delta)$ and shift our attention toward exploiting $G$ in the development of novel robust training algorithms. In the empirical version of (2), rather than assuming access to the full joint distribution $\mathcal{D}$, we assume that we are given a set of i.i.d. samples $D_n := \{(x_j, y_j)\}_{j=1}^n$ drawn from $\mathcal{D}$. Thus we have:

$$w^\star \in \arg\min_{w \in \mathbb{R}^p} \sum_{j=1}^n \Big[ \max_{\delta \in \Delta} \ell(G(x_j, \delta), y_j; w) \Big]. \tag{3}$$

Note that when $w$ parameterizes a neural network, (3) is a nonconvex-nonconcave min-max problem, which is difficult to solve exactly.

 1: repeat
 2:   for minibatch $B_m := \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\} \subset D_n$ do
 3:     Initialize $\delta := (\delta_1, \delta_2, \dots, \delta_m) \leftarrow (0_q, 0_q, \dots, 0_q)$
 4:     for $k$ steps do
 5:       $g \leftarrow \nabla_\delta \sum_{j=1}^m \ell(G(x_j, \delta_j), y_j; w)$
 6:       $\delta \leftarrow \Pi_\Delta[\delta + \alpha g]$   # $\Pi_\Delta$ denotes projection onto the set $\Delta$
 7:     end for
 8:     $g \leftarrow \nabla_w \sum_{j=1}^m [\ell(G(x_j, \delta_j), y_j; w) + \lambda \cdot \ell(x_j, y_j; w)]$
 9:     $w \leftarrow \text{Update}(g, w)$   # Update can be SGD, Adam, Adadelta, etc.
10:   end for
11: until convergence

Each of the three algorithms described below uses SGD to solve the outer problem; the methods differ in how they seek solutions to the inner problem, and in what follows, we describe each of these procedures in more detail. In each algorithm, given a datum $(x, y)$, the solution $\delta^\star$ to the inner problem is used to create a new datum $(G(x, \delta^\star), y)$ that is added to the training set before solving the outer problem.

Model-based Adversarial Training.

In MAT, we seek an approximate solution to the inner problem by performing $k$ steps of gradient ascent in $\delta$ on the objective $\ell(G(x, \delta), y; w)$. The resulting nuisance parameter $\delta^\star$ is one that causes $\ell(G(x, \delta^\star), y; w)$ to have high loss under the current weight $w$.
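The MAT update can be sketched on a toy problem as follows. This is a hedged illustration, not the authors' implementation: the model `G` is a hypothetical uniform brightness shift, the nuisance space is assumed to be the interval [0, 1], the classifier is linear with logistic loss, and a central finite difference stands in for automatic differentiation through `G`. The weight `lam` on the clean-data loss mirrors the second term in the weight-gradient step of the algorithm listing in Section 4.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(x, y, w):
    """Logistic loss of a linear classifier; y is +1 or -1."""
    m = y * dot(w, x)
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def grad_w(x, y, w):
    """Gradient of the logistic loss with respect to the weights w."""
    m = y * dot(w, x)
    if m >= 0:
        e = math.exp(-m)
        s = e / (1.0 + e)          # sigmoid(-m)
    else:
        s = 1.0 / (1.0 + math.exp(m))
    return [-y * s * xi for xi in x]

def G(x, delta):
    """Hypothetical model of natural variation: uniform brightness shift."""
    return [xi + delta for xi in x]

DELTA_LO, DELTA_HI = 0.0, 1.0      # assumed nuisance space [0, 1]

def inner_ascent(x, y, w, k=10, alpha=0.5, h=1e-4):
    """k steps of projected gradient ascent in delta on loss(G(x, delta), y; w)."""
    delta = 0.0
    for _ in range(k):
        # finite-difference estimate of d loss / d delta (assumption: no autodiff)
        g = (loss(G(x, delta + h), y, w) - loss(G(x, delta - h), y, w)) / (2 * h)
        delta = min(max(delta + alpha * g, DELTA_LO), DELTA_HI)
    return delta

def mat_step(x, y, w, lam=1.0, lr=0.1):
    """One outer SGD step on the model-based loss plus lam times the clean loss."""
    delta = inner_ascent(x, y, w)
    g_var = grad_w(G(x, delta), y, w)
    g_clean = grad_w(x, y, w)
    return [wi - lr * (gv + lam * gc) for wi, gv, gc in zip(w, g_var, g_clean)]

# Usage: train on a single example and report the loss at the found nuisance.
x, y, w = [1.0, -0.5], 1, [0.5, -1.5]
for _ in range(50):
    w = mat_step(x, y, w)
print(w, loss(G(x, inner_ascent(x, y, w)), y, w))
```

With a neural network and a learned generative model, the finite difference would be replaced by backpropagation through both the classifier and `G`, but the structure of the two nested loops stays the same.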

Model-based Robust Training.

In MRT, we first randomly sample $\delta_i \in \Delta$ for $i \in [k]$. We then select the $i^\star \in [k]$ such that $\ell(G(x, \delta_{i^\star}), y; w)$ is maximized. In this way, rather than solving the inner problem by gradient ascent, MRT uses a sampling-based approach to finding challenging data $G(x, \delta_{i^\star})$.

Model-based Data Augmentation.

In MDA, rather than explicitly trying to solve the inner problem, we seek a diversity of naturally-varying data rather than the "worst-case." In this way, MDA samples $\delta_i \in \Delta$ for $i \in [k]$ and then appends $\{(G(x, \delta_i), y)\}_{i=1}^k$ to the training dataset.
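The two sampling-based procedures above differ only in what they keep from the sampled nuisance parameters, which the following sketch makes explicit. It is an illustration under stated assumptions: the text says only that the parameters are sampled randomly, so uniform sampling over an interval is assumed here, `G` is again a hypothetical brightness shift, and the names `mrt_select` and `mda_augment` are not from the paper.

```python
import math
import random

def loss(x, y, w):
    """Logistic loss of a linear classifier; y is +1 or -1."""
    m = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def G(x, delta):
    """Hypothetical model of natural variation: uniform brightness shift."""
    return [xi + delta for xi in x]

def mrt_select(x, y, w, k, rng, lo=0.0, hi=1.0):
    """MRT: sample k nuisance parameters and keep the highest-loss variant."""
    candidates = [rng.uniform(lo, hi) for _ in range(k)]
    worst = max(candidates, key=lambda d: loss(G(x, d), y, w))
    return G(x, worst), y

def mda_augment(x, y, k, rng, lo=0.0, hi=1.0):
    """MDA: sample k nuisance parameters and keep every variant."""
    return [(G(x, rng.uniform(lo, hi)), y) for _ in range(k)]

# Usage: the same seed gives both procedures the same k samples.
x, y, w = [1.0, -0.5], 1, [0.5, -1.5]
x_mrt, _ = mrt_select(x, y, w, k=5, rng=random.Random(0))
augmented = mda_augment(x, y, k=5, rng=random.Random(0))
print(loss(x_mrt, y, w), [loss(xa, ya, w) for xa, ya in augmented])
```

MRT needs a forward pass per sample to rank the candidates, whereas MDA never evaluates the classifier during augmentation, so it is the cheapest of the three procedures per training example.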

5. EXPERIMENTS

We present experiments in five different settings over twelve distinct datasets to demonstrate the broad applicability of model-based robust deep learning (MBRDL). First, in Sections 5.1-5.2, we show that our algorithms are the first to consistently provide out-of-distribution robustness across a range of challenging corruptions, including shifts in brightness, contrast, snow, fog, frost, and haze on CURE-TSR, ImageNet, and ImageNet-c. In Section 5.3, we curate several new datasets containing simultaneous sources of natural variation, and we then show that models of natural variation can be composed to provide robustness against these simultaneous shifts. In Section 5.4, we show that models of natural variation trained on a fixed dataset can be reused to provide robustness on datasets entirely unseen during training. Finally, in Section 5.5, we assume access to unlabeled data corresponding to a fixed domain shift, and we compare our algorithms to suitable baselines, including domain adaptation methods. Throughout these experiments, we use the notation "source (A→B)" to denote a distributional shift from domain A to domain B. For example, "contrast (low→high)" will denote a shift from low-contrast to high-contrast. Images from domains A and B for each of the shifts used in this paper are available in Appendix A. We note that our experiments contain domains with both natural and artificially-generated variation; details concerning how we extracted non-artificial variation can be found in Appendix D. Architecture and hyperparameter details are given in Appendix C.

5.1. OUT-OF-DISTRIBUTION ROBUSTNESS

In many applications, one might have data corresponding to low levels of natural variation, such as a dusting of snow in images of street signs. However, it is often difficult to collect data corresponding to high levels of natural variation, such as images taken during a blizzard. In such cases, we show that our algorithms can be used to provide significant out-of-distribution robustness against data with high levels of natural variation by training on data with relatively low levels of the same source of natural variation. To do so, we use data from the CURE-TSR dataset (Temel et al., 2019), which contains images of street signs divided into subsets according to various sources of natural variation and corresponding severity levels. For example, for images in the "snow" subset, level 0 corresponds to no snow, whereas level 5 corresponds to a full blizzard. Thus, for each row of Table 1, we use unlabeled data from levels 0 and 2 to learn a model of natural variation corresponding to a given source of natural variation in CURE-TSR. We then train classifiers using MDA with labeled level 0 data. We also train classifiers using ERM and PGD using the labeled data from levels 0 and 2. We then test all classifiers on data from levels 3, 4, and 5. Note that while this is an unfair comparison for our methods, given that the model-based algorithms are not given access to labeled level 2 data, our algorithms still outperform the baselines by as much as 20 percentage points on level 5 data.

5.2. MODEL-BASED ROBUSTNESS ON THE SHIFT FROM IMAGENET TO IMAGENET-C

To demonstrate the scalability of our approach, we perform experiments on ImageNet (Deng et al., 2009) and the recently-curated ImageNet-c dataset (Hendrycks & Dietterich, 2019). ImageNet-c contains images from the ImageNet test set that are corrupted according to artificial transformations, such as snow, rain, and fog, and are labeled from 1 to 5 depending on the severity of the corruption. For numerous challenging corruptions, we train models to map from the classes 0-9 of ImageNet to the corresponding classes of ImageNet-c. We then train all networks on classes 10-59 of ImageNet, and test on the corresponding classes for various subsets of ImageNet-c. Note that in this setting, the ImageNet classes used to train the model of natural variation are disjoint from those that are used to train the classifier, so many techniques, including most domain adaptation methods, do not apply; to offer a point of comparison, we include the accuracies of classifiers trained using AugMix, which is a recently proposed method that adds known transformations to the data (Hendrycks et al., 2019a).

5.3. ROBUSTNESS TO SIMULTANEOUS DISTRIBUTIONAL SHIFTS

In practice, it is common to encounter multiple simultaneous distributional shifts. For example, in image classification, there may be shifts in both brightness and contrast; yet while there may be examples corresponding to shifts in either brightness or contrast in the training data, there may not be any examples of both shifts occurring simultaneously. To address this robustness challenge, for each row of Table 3, we learn two models of natural variation $G_1$ and $G_2$ using unlabeled training data corresponding to two separate shifts, which map domains $A_1 \to B_1$ (e.g., low- to high-brightness) and $A_2 \to B_2$ (e.g., low- to high-contrast). We then compose these models to form a new model $G(x, \delta) = G_1(G_2(x, \delta), \delta)$ which can be used to provide robustness against both shifts simultaneously. We then train classifiers on labeled data from $A_1 \cap A_2$ and test on data from $B_1 \cap B_2$. To create the data from $B_1 \cap B_2$ for the ImageNet experiments, we apply pairs of transformations that were originally used to create the ImageNet-c datasets; more details are in Appendix D.
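The composition of two models of natural variation described above can be sketched as a higher-order function. The sketch makes two assumptions that go beyond the text: the two models are hypothetical hand-written brightness and contrast transformations rather than learned networks, and the composed model takes a pair of nuisance values, one per component. Passing the same value twice recovers the form in which both components share one nuisance parameter.

```python
def brightness(x, delta):
    """Hypothetical model G1: add delta to every pixel."""
    return [xi + delta for xi in x]

def contrast(x, delta):
    """Hypothetical model G2: scale deviations from the mean by (1 + delta)."""
    mean = sum(x) / len(x)
    return [mean + (1.0 + delta) * (xi - mean) for xi in x]

def compose(G1, G2):
    """Build G(x, (d1, d2)) = G1(G2(x, d2), d1) from two models."""
    def G(x, delta):
        d1, d2 = delta
        return G1(G2(x, d2), d1)
    return G

# Usage: apply a contrast shift followed by a brightness shift.
G = compose(brightness, contrast)
print(G([0.0, 1.0], (0.5, 1.0)))
```

Because the composed object has the same call signature as a single model, any training procedure that accepts a model of natural variation can use it unchanged, provided the nuisance space is taken to be the product of the two component spaces.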

5.4. TRANSFERABILITY OF MODEL-BASED ROBUSTNESS

Because we learn models of natural variation offline before training a classifier, our paradigm can be applied to domains that are entirely unseen while training the model. In particular, we show that models can be reused on similar yet unseen datasets to provide robustness against a common source of natural variation. For example, one might have access to two domains corresponding to the shift from images of European street signs taken during the day to images taken at night. However, one might wish to provide robustness against the same shift from daytime to nighttime on a new dataset of American street signs without access to any nighttime images in this new dataset. Whereas many techniques, including most domain adaptation methods, do not apply in this scenario, in the MBRDL paradigm, we can simply learn a model corresponding to the changes in lighting for the European street signs and then apply this model to the dataset of the American signs. Table 4 shows several experiments of this kind in which a model $G$ is learned on one dataset $D_1$ and then applied on another dataset $D_2$; we improve robustness on unseen domains by up to 40 percentage points.

5.5. MODEL-BASED ROBUST DEEP LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION

While our approach does not require labeled data from domain B, when such data is available, it is of interest to evaluate how our approach compares to relevant methods such as domain adaptation. In Table 5, for each shift from domain A to B, we assume access to labeled data from domain A and unlabeled data from domain B. In each row, we use unlabeled data from both domains to train a model of natural variation. We then train classifiers using our algorithms, as well as with ERM and PGD, using data from domain A and test on data from the test set for domain B. Furthermore, we compare to ADDA, which is a well-known domain adaptation method (Tzeng et al., 2017). In every scenario, our model-based algorithms significantly outperform the baselines, often by 10-20 percentage points. Note that while this is one of the most commonly studied settings in domain adaptation, it represents only one particular setting to which the MBRDL paradigm can be applied.

6. RELATED WORK

Aside from the algorithms we introduced in Section 4, we are not aware of any other algorithms that can be used to address out-of-distribution robustness across the diverse array of tasks presented in the previous section. However, several lines of research have sought to address this problem in constrained settings or under highly restrictive assumptions. In the domain adaptation literature, various methods have been proposed which rely on the restrictive assumption that unlabeled data corresponding to a fixed distributional shift is available during training (Tzeng et al., 2017; Ajakan et al., 2014; Ganin & Lempitsky, 2015). Unlike these approaches, our solution does not assume access to unlabeled data from a fixed shift and can be applied to datasets that are entirely unseen during training. Furthermore, several works have used generative models to create adversarial perturbations (Xiao et al., 2018a; Lee et al., 2017; Wang & Yu, 2019; Samangouei et al., 2018; Jalal et al., 2017) or perceptually-realistic images subject to relatively simple corruptions in specific application domains (Dunn et al., 2019; Song et al., 2018; Vandenhende et al., 2019; Arruda et al., 2019). On the other hand, our approach is broadly applicable to arbitrary and challenging sources of natural variation. Two concurrent works formulate robust training procedures assuming that data is corrupted according to a fixed generative architecture. Gowal et al. (2020) exploit properties specific to the StyleGAN architecture to formulate a training algorithm that provides robustness against color-based shifts on MNIST and CelebA. In our work, we propose a more general framework and three novel robust training algorithms that can exploit any suitable generative network, and we show improvements on more challenging, naturally-occurring shifts across twelve distinct datasets.
Wong & Kolter (2020) use conditional VAEs to learn perturbation sets corresponding to simple corruptions from pairs of images. In our framework we improve robustness against more challenging, natural shifts by learning from unpaired datasets and we do not rely on class-conditioning to generate realistic images.

7. CONCLUSION

In this paper, we proposed a novel model-based robust training paradigm for deep learning that provides robustness with respect to models of natural variation. Our notion of robustness offers a departure from adversarial training with respect to norm-bounded data perturbations. In our experiments, we show that our paradigm can provide significant out-of-distribution robustness on many challenging distributional shifts. Furthermore, our paradigm can provide robustness against multiple simultaneous distribution shifts and on domains that are entirely unseen while training the model, and shows significant out-of-distribution robustness as datasets become more challenging.



(a) Perturbation-based adversarial example. In a perturbation-based robustness setting, an input datum (left) is perceptually indistinguishable from a corresponding adversarial example (right).

Figure 1: A new notion of robustness. Past work has focused on perturbation-based adversarial examples, such as Figure 1a. In this paper, we focus on robustness with respect to natural variation, shown in Figure 1b, which often does not obey perceptual or norm-bounded constraints.

(a) Models take the form $G(x, \delta)$, where $\delta$ is a nuisance parameter that describes how the output image $x' := G(x, \delta)$ is varied.

(b) Input image $x$ and corresponding generated images for a learned model of natural variation on ImageNet.

Figure 2: In this paper, we introduce models of natural variation to describe natural transformations.

Model-based Adversarial Training (MAT). Input: Data sample $D_n = \{(x_j, y_j)\}$

Table 1: Out-of-distribution robustness. In each experiment, we train a model of natural variation to map from challenge-level 0 to challenge-level 2 data from different subsets of CURE-TSR. We then perform model-based training using challenge-level 0 data and test on challenge-levels 3-5.

Table 2: ImageNet to ImageNet-c robustness. In each experiment, we train a model of natural variation to map from classes 0-9 of ImageNet to the same classes from a subset of ImageNet-c. Next, we use this model to perform model-based training on classes 10-59 of ImageNet, and we test each network on classes 10-59 from the same subset of ImageNet-c on which the model was trained.

Table 3: Composing models of natural variation. We consider shifts in two distinct and simultaneous sources of natural variation. To perform model-based training, we compose two models of natural variation trained separately on each of the two sources of natural variation.

Table 4: Transferability of model-based robustness. In each experiment, we train a model of natural variation on a given training dataset $D_1$. Then, we use this model to perform model-based training on a new dataset $D_2$ entirely unseen during the training of the model.

Table 5: In each experiment, we assume access to unlabeled data from domain B, which we use to train a model of natural variation. We compare to suitable baselines, including domain adaptation.

