ON INTERACTION BETWEEN AUGMENTATIONS AND CORRUPTIONS IN NATURAL CORRUPTION ROBUSTNESS

Abstract

Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision. Recently, several new data augmentations have been proposed that significantly improve performance on ImageNet-C, a benchmark of such corruptions. However, there is still a lack of basic understanding of the relationship between data augmentations and test-time corruptions. To this end, we develop a feature space for image transforms, and then introduce a new measure in this space between augmentations and corruptions, the Minimal Sample Distance, to demonstrate that there is a strong correlation between similarity and performance. We then investigate recent data augmentations and observe a significant degradation in corruption robustness when the test-time corruptions are sampled to be perceptually dissimilar from ImageNet-C in this feature space. Our results suggest that test error can be improved by training on perceptually similar augmentations, and that data augmentations may risk overfitting to the existing benchmark. We hope our results and tools will allow for more robust progress towards improving robustness to image corruptions.

1. INTRODUCTION

Robustness to distribution shift, i.e. when the train and test distributions differ, is an important feature of practical machine learning models. Among the many forms of distribution shift, one particularly relevant category for computer vision is image corruptions. For example, test data may come from sources that differ from the training set in terms of lighting, camera quality, or other features. Postprocessing transforms, such as photo touch-up, image filters, or compression effects, are commonplace in real-world data. Models developed using clean, undistorted inputs typically perform dramatically worse when confronted with these sorts of image corruptions (Hendrycks & Dietterich, 2018; Geirhos et al., 2018). The subject of corruption robustness has a long history in computer vision (Simard et al., 1998; Bruna & Mallat, 2013; Dodge & Karam, 2017) and has recently been studied actively with the release of benchmark datasets such as ImageNet-C (Hendrycks & Dietterich, 2018).

A distinguishing property of image corruptions is that they are low-level distortions. Corruptions are transformations of an image that affect structural information such as colors, textures, or geometry (Ding et al., 2020) and are typically free of high-level semantics. It is therefore natural to expect that data augmentation techniques, which expand the training set with random low-level transformations, can help with learning robust models. Indeed, data augmentation has become a central technique in several recent methods (Hendrycks et al., 2019; Lopes et al., 2019; Rusak et al., 2020) that achieve large improvements on ImageNet-C and related benchmarks.

One caveat for data-augmentation-based approaches is that the test corruptions are expected to be unknown at training time. If the corruptions were known, they could simply be applied to the training set as data augmentations to trivially adapt to the test distribution.
Instead, an ideal robust model needs to be robust to any valid corruption, including ones unseen in any previous benchmark. Of course, in practice the robustness of a model can only be evaluated approximately by measuring its corruption error on a representative corruption benchmark. To avoid trivial adaptation to the benchmark, recent works manually exclude test corruptions from the training augmentations. However, with a toy experiment presented in Figure 1, we argue that this strategy alone might not be enough: augmentations whose outputs are visually similar to the test corruptions can yield significant benchmark improvements even when the exact corruption transformations are excluded.
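To make the idea of measuring similarity between augmentations and corruptions concrete, the following is a minimal sketch of how a Minimal Sample Distance could be computed, assuming that images have already been mapped into a shared feature space for transforms (the embedding arrays and the choice of Euclidean distance here are illustrative assumptions, not the paper's exact definition):

```python
import numpy as np

def minimal_sample_distance(aug_features, corr_features):
    """Illustrative sketch of a Minimal Sample Distance (MSD).

    aug_features:  (n, d) array of feature embeddings of augmented images
    corr_features: (m, d) array of feature embeddings of corrupted images

    Summarizes the corruption by its mean embedding, then returns the
    smallest Euclidean distance from that mean to any single
    augmentation sample embedding. A small MSD indicates the
    augmentation can produce outputs close to the corruption.
    """
    corr_center = corr_features.mean(axis=0)                     # (d,)
    dists = np.linalg.norm(aug_features - corr_center, axis=1)   # (n,)
    return float(dists.min())

# Hypothetical usage with toy 2-D embeddings:
aug = np.array([[0.0, 0.0], [3.0, 4.0]])
corr = np.array([[3.0, 4.0], [3.0, 4.0]])
print(minimal_sample_distance(aug, corr))  # 0.0: one sample matches exactly
```

Taking a minimum over augmentation samples, rather than an average, captures the intuition that a stochastic augmentation only needs to *occasionally* produce outputs near a corruption for training on it to help.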

