DJMIX: UNSUPERVISED TASK-AGNOSTIC AUGMENTATION FOR IMPROVING ROBUSTNESS

Abstract

Convolutional Neural Networks (CNNs) are vulnerable to unseen noise on input images at test time, so improving their robustness is crucial. In this paper, we propose DJMix, a data augmentation method that improves robustness by mixing each training image with a discretized version of itself. Discretization is performed in an unsupervised manner by an autoencoder, and the mixed images are nearly indistinguishable from the original images. As a result, DJMix can easily be adapted to various image recognition tasks. We verify the effectiveness of our method on classification, semantic segmentation, and detection, using both clean and noisy test images.

1. INTRODUCTION

CNNs are the de facto standard components of image recognition systems and achieve excellent performance. However, CNNs are vulnerable to unseen noise on input images. Such harmful noise includes not only adversarially generated perturbations (Szegedy et al., 2014; Goodfellow et al., 2014), but also naturally occurring noise, such as blur caused by defocusing and artifacts introduced by JPEG compression (Vasiljevic et al., 2016; Hendrycks & Dietterich, 2019). Natural noise on input images is inevitable in the real world; therefore, making CNNs robust to natural noise is crucial for practitioners.

A simple approach to this problem is to add noise to training images, but this does not make models generalize to unseen corruptions and perturbations (Vasiljevic et al., 2016; Geirhos et al., 2018; Gu et al., 2019). For example, even if Gaussian noise of a certain variance is added during training, models fail to generalize to Gaussian noise of other variances. Nonetheless, some data augmentation methods are effective for improving robustness. For example, Yin et al. reported that extensive augmentation, such as AutoAugment (Cubuk et al., 2019), improves robustness. Similarly, Hendrycks et al. proposed mixing differently augmented images during training to circumvent the vulnerability. We review previous approaches further in Section 2.

Despite their effectiveness, these data augmentation and mixing methods require handcrafted image transformations, such as rotation and solarizing. In particular, when geometric transformations are used, the mixed images have no trivial targets in non-classification tasks, for instance, semantic segmentation and detection. This lack of applicability to other tasks motivates us to introduce a robustness-oriented data augmentation that requires no such transformations. In this paper, we propose Discretizing and Joint Mixing (DJMix), which mixes original and discretized training images to improve robustness.
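The core operation of DJMix, mixing a training image with its discretized counterpart, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper discretizes with a learned autoencoder, whereas the `discretize` function below is a hypothetical stand-in (uniform pixel quantization) with the same role of a discrete bottleneck, and the fixed mixing ratio `lam` is likewise an assumption.

```python
import numpy as np

def discretize(x, num_levels=8):
    """Stand-in discretizer: uniform quantization of pixel values in [0, 1].
    DJMix uses an autoencoder trained without labels; this quantizer is a
    hypothetical placeholder that plays the same role."""
    x = np.clip(x, 0.0, 1.0)
    return np.round(x * (num_levels - 1)) / (num_levels - 1)

def djmix(x, lam=0.5, num_levels=8):
    """Mix an image with its discretized version.

    x   : float array in [0, 1], e.g. shape (H, W, C)
    lam : mixing ratio; how it is chosen per sample is an assumption here.
    """
    x_hat = discretize(x, num_levels)
    return lam * x + (1.0 - lam) * x_hat
```

Because the mixed image is geometrically aligned with the original, per-pixel targets (segmentation masks, detection boxes) remain valid unchanged, which is what makes the method task-agnostic.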
The difference between the original and discretized images is nearly imperceptible, as shown in Figure 1, which enables the use of DJMix in various image recognition tasks. In Section 3, we introduce DJMix and analyze it empirically and theoretically. We show that DJMix reduces the mutual information between inputs and internal representations, encouraging models to ignore harmful features and improving CNNs' resilience to test-time noise.

To benchmark the robustness of CNNs to unseen noise, Hendrycks & Dietterich (2019) introduced ImageNet-C as a corrupted counterpart of the ImageNet validation set (Russakovsky et al., 2015). CNN models are trained on the original training set without any prior information on the corruptions and are then evaluated on this noisy validation set. Similarly, Geirhos et al. created noisy versions of ImageNet and compared the behavior of humans and CNN models on noisy images. In addition to these datasets designed for classification, we cre-

