MODEL-BASED ROBUST DEEP LEARNING: GENERALIZING TO NATURAL, OUT-OF-DISTRIBUTION DATA
Anonymous

Abstract

While deep learning (DL) has resulted in major breakthroughs in many applications, the frameworks commonly used in DL remain fragile to seemingly innocuous changes in the data. In response, adversarial training has emerged as a principled approach for improving the robustness of DL against norm-bounded perturbations. Despite this progress, DL is also known to be fragile to unbounded shifts in the data distribution due to many forms of natural variation, including changes in weather or lighting in images. However, there are remarkably few techniques that can address robustness to natural, out-of-distribution shifts in the data distribution in a general context. To address this gap, we propose a paradigm shift from perturbation-based adversarial robustness to model-based robust deep learning. Critical to our paradigm is to obtain models of natural variation, which vary data over a range of natural conditions. Then by exploiting these models, we develop three novel model-based robust training algorithms that improve the robustness of DL with respect to natural variation. Our extensive experiments show that across a variety of natural conditions in twelve distinct datasets, classifiers trained with our algorithms significantly outperform classifiers trained via ERM, adversarial training, and domain adaptation techniques. Specifically, when training on ImageNet and testing on various subsets of ImageNet-c, our algorithms improve over baseline methods by up to 30 percentage points in top-1 accuracy. Further, we show that our methods provide robustness (1) against natural, out-of-distribution data, (2) against multiple simultaneous distributional shifts, and (3) to domains entirely unseen during training.

1. INTRODUCTION

The last decade has seen remarkable progress in deep learning (DL), which has prompted wide-scale integration of DL frameworks into myriad application domains (LeCun et al., 2015). In many of these applications, and in particular in safety-critical domains, it is essential that DL systems be robust and trustworthy (Dreossi et al., 2019). However, it is now well-known that DL is fragile to seemingly innocuous changes to the input data (Szegedy et al., 2013). Indeed, well-documented examples of fragility to carefully-designed noise can be found in a variety of contexts, including image classification (Madry et al., 2017), clinical trials (Papangelou et al., 2018), and robotics (Melis et al., 2017). Accordingly, a number of adversarial training algorithms (Goodfellow et al., 2014b; Wong & Kolter, 2017) as well as certifiable defenses (Raghunathan et al., 2018; Fazlyab et al., 2019a) have recently been proposed, providing a rigorous framework for improving the robustness of DL against norm-bounded perturbations (Fazlyab et al., 2019b; Dobriban et al., 2020). Despite this encouraging progress, recent papers have consistently shown that DL is also fragile to unbounded shifts in the data distribution due to a wide range of natural phenomena (Djolonga et al., 2020; Taori et al., 2020; Hendrycks et al., 2020; Hendrycks & Dietterich, 2019). For example, in image classification, such shifts include changes due to lighting, blurring, or weather conditions (Pei et al., 2017; Chernikova et al., 2019). However, remarkably few general, principled techniques have been shown to provide robustness against these forms of out-of-distribution, natural variation (Hendrycks et al., 2019a). Furthermore, as these unseen distributional shifts are arguably more common in safety-critical domains, designing algorithms that generalize to natural, out-of-distribution data is an important and novel challenge for the DL community.
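To make the model-based paradigm concrete, the following is a minimal sketch of the core idea: given a model of natural variation that maps a datum and a nuisance parameter to a naturally varied datum, training descends on the worst case over the nuisance parameter rather than over a norm-bounded perturbation. Everything here is illustrative and assumed, not the paper's implementation: `natural_variation` stands in for a learned model (in the paper this would be learned from data, e.g., as a generative model), the classifier is a toy linear model, and the inner maximization is approximated by a grid search over sampled nuisance values.

```python
import numpy as np

rng = np.random.default_rng(0)

def natural_variation(x, delta):
    # Hypothetical stand-in for a learned model of natural variation
    # G(x, delta); here simply an additive shift of the feature.
    return x + delta

def loss(w, x, y):
    # Logistic loss of a linear classifier on a single example.
    z = float(x @ w)
    return np.log1p(np.exp(-y * z))

def grad(w, x, y):
    # Gradient of the logistic loss with respect to w.
    z = float(x @ w)
    return (-y / (1.0 + np.exp(y * z))) * x

def model_based_robust_step(w, x, y, deltas, lr=0.1):
    # Approximate the inner maximization by searching over sampled
    # nuisance parameters, then take a descent step on the worst case.
    worst = max(deltas, key=lambda d: loss(w, natural_variation(x, d), y))
    return w - lr * grad(w, natural_variation(x, worst), y)

# Toy data: 1D features in [-1, 1], label = sign of the feature.
X = rng.uniform(-1.0, 1.0, size=(64, 1))
Y = np.where(X[:, 0] > 0.0, 1.0, -1.0)

w = np.zeros(1)
deltas = np.linspace(-0.2, 0.2, 5)  # sampled range of natural variation
for _ in range(50):
    for x, y in zip(X, Y):
        w = model_based_robust_step(w, x, y, deltas)
```

In a realistic instantiation, the grid search over `deltas` would be replaced by gradient ascent in the nuisance parameter of the learned model, and the linear classifier by a deep network; the structure of the loop, an inner maximization over natural conditions followed by an outer descent step, is the point of the sketch.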

