INCREASING THE COVERAGE AND BALANCE OF ROBUSTNESS BENCHMARKS BY USING NON-OVERLAPPING CORRUPTIONS

Anonymous

Abstract

Neural networks are sensitive to various corruptions that commonly occur in real-world applications, such as blur, noise, low-lighting conditions, etc. To estimate the robustness of neural networks to these common corruptions, we generally use a group of modeled corruptions gathered into a benchmark. We argue that corruption benchmarks often have poor coverage: being robust to them only implies being robust to a narrow range of corruptions. They are also often unbalanced: they give too much importance to some corruptions compared to others. In this paper, we propose to build corruption benchmarks with only non-overlapping corruptions, to improve their coverage and their balance. Two corruptions overlap when the robustness of neural networks to one is correlated with their robustness to the other. We propose the first metric to measure the overlap between two corruptions. We provide an algorithm that uses this metric to build benchmarks of Non-Overlapping Corruptions. Using this algorithm, we build from ImageNet a new corruption benchmark called ImageNet-NOC. We show that ImageNet-NOC is balanced and covers several kinds of corruptions that are not covered by ImageNet-C.

1. INTRODUCTION

Neural networks perform poorly when they deal with images that are drawn from a different distribution than their training samples. Indeed, neural networks are sensitive to adversarial examples (Szegedy et al., 2014), background changes (Xiao et al., 2020), and common corruptions (Hendrycks & Dietterich, 2019). Common corruptions are perturbations that change the appearance of images without changing their semantic content. For instance, neural networks are sensitive to noise (Koziarski & Cyganek, 2017), blur (Vasiljevic et al., 2016) or lighting condition variations (Temel et al., 2017). Unlike adversarial examples (Szegedy et al., 2014), common corruptions are not artificial perturbations especially crafted to fool neural networks. They naturally appear in industrial applications without any human intervention, and can significantly reduce the performance of neural networks. A neural network is robust to a corruption c when its performance on samples corrupted with c is close to its performance on clean samples.

Some methods have recently been proposed to make neural networks more robust to common corruptions (Geirhos et al., 2019; Hendrycks* et al., 2020; Rusak et al., 2020). To determine whether these approaches are effective, a method is required to measure the robustness of neural networks to common corruptions. The most commonly used method consists of evaluating the performance of neural networks on images distorted by various kinds of common corruptions (Hendrycks & Dietterich, 2019; Karahan et al., 2016; Geirhos et al., 2019; Temel et al., 2017). In this study, we call the group of perturbations used to make the robustness estimation a corruption benchmark. We also use this term to refer to a set of test images that have been corrupted with these various corruptions. We identify two important factors that should be taken into account when building a corruption benchmark: the balance and the coverage.
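The notion above can be made concrete with a small sketch: per-corruption robustness taken as corrupted accuracy relative to clean accuracy, averaged over the benchmark's corruptions. This is an illustrative convention only; the accuracy values and corruption names below are hypothetical, and the paper's own metric (introduced later) differs.

```python
# Hypothetical accuracies for one model; values are illustrative,
# not measurements from the paper.
clean_acc = 0.76
corruption_accs = {
    "gaussian_noise": 0.41,
    "motion_blur": 0.48,
    "low_light": 0.55,
}

def robustness(corrupted_acc: float, clean_acc: float) -> float:
    """Robustness to one corruption: corrupted accuracy relative to
    clean accuracy. A value of 1.0 means no degradation at all."""
    return corrupted_acc / clean_acc

# Per-corruption robustness scores.
scores = {c: robustness(a, clean_acc) for c, a in corruption_accs.items()}

# One simple way to aggregate a benchmark score: the unweighted mean.
# An unbalanced benchmark (many near-duplicate corruptions) would skew
# this average toward the over-represented corruption type.
benchmark_score = sum(scores.values()) / len(scores)
```

The unweighted mean is exactly where balance matters: if several benchmark corruptions overlap, they effectively count multiple times in this average.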
In this paper, we consider that a corruption c is covered by a benchmark when increasing the robustness of a network to all the corruptions of this benchmark also increases the robustness of

