MODEL-BASED ROBUST DEEP LEARNING: GENERALIZING TO NATURAL, OUT-OF-DISTRIBUTION DATA
Anonymous

Abstract

While deep learning (DL) has resulted in major breakthroughs in many applications, the frameworks commonly used in DL remain fragile to seemingly innocuous changes in the data. In response, adversarial training has emerged as a principled approach for improving the robustness of DL against norm-bounded perturbations. Despite this progress, DL is also known to be fragile to unbounded shifts in the data distribution due to many forms of natural variation, including changes in weather or lighting in images. However, there are remarkably few techniques that can address robustness to natural, out-of-distribution shifts in the data distribution in a general context. To address this gap, we propose a paradigm shift from perturbation-based adversarial robustness to model-based robust deep learning. Critical to our paradigm is to obtain models of natural variation, which vary data over a range of natural conditions. Then, by exploiting these models, we develop three novel model-based robust training algorithms that improve the robustness of DL with respect to natural variation. Our extensive experiments show that across a variety of natural conditions in twelve distinct datasets, classifiers trained with our algorithms significantly outperform classifiers trained via ERM, adversarial training, and domain adaptation techniques. Specifically, when training on ImageNet and testing on various subsets of ImageNet-c, our algorithms improve over baseline methods by up to 30 percentage points in top-1 accuracy. Further, we show that our methods provide robustness (1) against natural, out-of-distribution data, (2) against multiple simultaneous distributional shifts, and (3) against domains entirely unseen during training.

1. INTRODUCTION

The last decade has seen remarkable progress in deep learning (DL), which has prompted wide-scale integration of DL frameworks into myriad application domains (LeCun et al., 2015). In many of these applications, and in particular in safety-critical domains, it is essential that DL systems are robust and trustworthy (Dreossi et al., 2019). However, it is now well-known that DL is fragile to seemingly innocuous changes to the input data (Szegedy et al., 2013). Indeed, well-documented examples of fragility to carefully-designed noise can be found in a variety of contexts, including image classification (Madry et al., 2017), clinical trials (Papangelou et al., 2018), and robotics (Melis et al., 2017). Accordingly, a number of adversarial training algorithms (Goodfellow et al., 2014b; Wong & Kolter, 2017) as well as certifiable defenses (Raghunathan et al., 2018; Fazlyab et al., 2019a) have recently been proposed, which have provided a rigorous framework for improving the robustness of DL against norm-bounded perturbations (Fazlyab et al., 2019b; Dobriban et al., 2020). Despite this encouraging progress, very recent papers have unanimously shown that DL is also fragile to unbounded shifts in the data distribution due to a wide range of natural phenomena (Djolonga et al., 2020; Taori et al., 2020; Hendrycks et al., 2020; Hendrycks & Dietterich, 2019). For example, in image classification, such shifts include changes due to lighting, blurring, or weather conditions (Pei et al., 2017; Chernikova et al., 2019). However, there are remarkably few general, principled techniques that have been shown to provide robustness against these forms of out-of-distribution, natural variation (Hendrycks et al., 2019a). Furthermore, as these unseen distributional shifts are arguably more common in safety-critical domains, the task of designing algorithms that generalize to natural, out-of-distribution data is an important and novel challenge for the DL community.
In this paper, we study robustness with respect to natural variation; for example, differences in weather conditions such as snow illustrate one form of natural variation. To this end, we propose a paradigm shift from perturbation-based adversarial robustness to model-based robust deep learning. Our goal is to provide principled, general algorithms that can be used to train neural networks to be robust against natural, out-of-distribution shifts in data. Our experiments show that across a variety of challenging, naturally-occurring conditions, such as variation in lighting, haze, rain, and snow, and across various datasets, including SVHN, GTSRB, CURE-TSR, and ImageNet, classifiers trained with our model-based algorithms significantly outperform standard DL baselines, adversarially-trained classifiers, and, when applicable, domain adaptation methods.

Contributions. The contributions of this paper can be summarized as follows:

• Paradigm shift. We propose a paradigm shift from perturbation-based robustness to model-based robust deep learning, where models of natural variation express changes due to natural conditions.
• Optimization-based formulation. We formulate a novel model-based robust training problem by constructing a general robust optimization procedure to search for challenging natural variation.
• Models of natural variation. For many challenging forms of natural variation, we use deep generative models to learn models of natural variation that are consistent with realistic conditions.
• Novel algorithms. We propose a family of novel robust training algorithms that exploit models of natural variation to improve the robustness of DL against worst-case natural variation.
• Out-of-distribution robustness. We show that our algorithms are the first to consistently provide robustness against natural, out-of-distribution shifts that frequently occur in real-world environments, including snow, rain, fog, and brightness on SVHN, GTSRB, CURE-TSR, and ImageNet.
• ImageNet-c robustness. We show that our algorithms can significantly improve the robustness of classifiers trained on ImageNet and tested on ImageNet-c by as much as 30 percentage points.
• Robustness to simultaneous distributional shifts. We show that our methods are composable and can improve robustness to multiple simultaneous sources of natural variation. To evaluate this feature, we curate several new datasets, each of which has two simultaneous distributional shifts.
• Robustness to unseen domains. We show that models of natural variation can be reused on datasets that are entirely unseen during training to improve out-of-distribution generalization.
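The optimization-based formulation above can be sketched as a two-level loop: an inner search over the nuisance parameter δ of a model of natural variation G(x, δ) for the variation that maximizes the classifier's loss, and an outer descent step on that worst case. The sketch below is illustrative rather than the paper's implementation: the linear classifier, the hand-coded brightness model G, and the grid search over δ are all simplifying assumptions (the paper learns G with deep generative models).

```python
import numpy as np

rng = np.random.default_rng(0)

def G(x, delta):
    """Hypothetical stand-in for a learned model of natural variation:
    a uniform brightness shift of magnitude delta, clipped to [0, 1]."""
    return np.clip(x + delta, 0.0, 1.0)

def loss(w, x, label):
    """Logistic loss of a linear classifier on a flattened 'image' x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return -(label * np.log(p + 1e-12) + (1 - label) * np.log(1 - p + 1e-12))

def grad_w(w, x, label):
    """Gradient of the logistic loss with respect to the weights."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - label) * x

def worst_case_delta(w, x, label, deltas=np.linspace(-0.5, 0.5, 11)):
    """Inner maximization: search the nuisance space of G for the shift
    that makes the classifier's loss largest on this example."""
    return max(deltas, key=lambda d: loss(w, G(x, d), label))

# Brightness-invariant labels: the class depends on a contrast feature
# (pixel 0 vs. pixel 1), which a uniform brightness shift leaves unchanged.
X = rng.uniform(0.0, 1.0, (40, 8))
y = (X[:, 0] > X[:, 1]).astype(float)

# Outer minimization: descend on the loss at the worst-case variation of
# each example, i.e. min_w E[ max_delta loss(G(x, delta), y; w) ].
w = np.zeros(8)
for _ in range(100):
    for x, label in zip(X, y):
        d = worst_case_delta(w, x, label)
        w -= 0.1 * grad_w(w, G(x, d), label)

acc = np.mean([(x @ w > 0) == bool(label) for x, label in zip(X, y)])
print(f"clean training accuracy: {acc:.2f}")
```

Replacing the hand-coded G with a conditional generative model, and the grid search with gradient ascent over δ, recovers the general shape of the model-based robust training problem.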

2. PERTURBATION-BASED ROBUSTNESS: APPROACHES AND LIMITATIONS

In this paper, we consider a standard classification task in which training data (x, y) ∼ D are distributed according to a joint distribution D over instances x ∈ ℝ^d and labels y ∈ [k] := {0, 1, . . . , k}. In this setting, given a finite training sample drawn i.i.d. from D, the goal of the learning problem is to obtain a classifier f_w, parameterized by weights w ∈ ℝ^p, such that f_w correctly predicts the labels y corresponding to new instances x drawn i.i.d. from D. In practice, one can learn f_w by approximately solving the non-convex empirical risk minimization (ERM) problem argmin_w 𝔼[ℓ(x, y; w)], where ℓ is a suitable loss function. However, neural networks trained via ERM are known to be susceptible to adversarial attacks: given a datum x with corresponding label y, one can find another datum x_adv such that x_adv is close to x in a given Euclidean norm yet is predicted by the learned classifier as belonging to a different class c ≠ y. If such a datum x_adv exists, it is called an adversarial example. This is illustrated in Figure 1a; although the two pandas look identical, they were classified differently in (Goodfellow et al., 2014b).
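To make this setup concrete, here is a minimal, self-contained sketch of ERM followed by an FGSM-style attack (Goodfellow et al., 2014b) on a toy logistic-regression classifier. It is our illustration, not the paper's code: the two-dimensional dataset, the training schedule, and the attack budget ε = 1.5 are all assumptions chosen so the effect is visible.

```python
import numpy as np

# Tiny separable dataset: class 0 clusters around (-1, -1),
# class 1 is its mirror image around (1, 1).
X0 = np.array([[-1.0, -1.0], [-1.5, -0.5], [-0.5, -1.5], [-1.2, -0.8]])
X = np.vstack([X0, -X0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# ERM: minimize the average logistic loss l(x, y; w) by gradient descent.
w = np.zeros(2)
for _ in range(200):
    p = sigmoid(X @ w)
    w -= 0.5 * (X.T @ (p - y)) / len(y)

def predict(x):
    return int(sigmoid(x @ w) > 0.5)

# FGSM-style adversarial example: a norm-bounded step in the direction
# that increases the loss, x_adv = x + eps * sign(grad_x l(x, y; w)).
x, label = X[0], y[0]                   # correctly classified as class 0
grad_x = (sigmoid(x @ w) - label) * w   # input gradient of the logistic loss
x_adv = x + 1.5 * np.sign(grad_x)       # eps = 1.5 flips this toy example

print(predict(x), predict(x_adv))
```

The key contrast with the rest of the paper is that x_adv lives inside a small norm ball around x, whereas natural variation (snow, haze, lighting) moves data arbitrarily far from x.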



(a) Perturbation-based adversarial example. In a perturbation-based robustness setting, an input datum (left) is perceptually indistinguishable from a corresponding adversarial example (right).

(b) Natural variation. Natural conditions such as snow vary the appearance of an input datum; such changes need not obey perceptual or norm-bounded constraints.

Figure 1: A new notion of robustness. Past work has focused on perturbation-based adversarial examples, such as Figure 1a. In this paper, we focus on robustness with respect to natural variation, shown in Figure 1b, which often does not obey perceptual or norm-bounded constraints.

