FUNDAMENTAL LIMITS ON THE ROBUSTNESS OF IMAGE CLASSIFIERS

Abstract

We prove that image classifiers are fundamentally sensitive to small perturbations in their inputs. Specifically, we show that given some image space of n-by-n images, all but a tiny fraction of images in any image class induced over that space can be moved outside that class by adding some perturbation whose p-norm is O(n^{1/max(p,1)}), as long as that image class takes up at most half of the image space. We then show that O(n^{1/max(p,1)}) is asymptotically optimal. Finally, we show that an increase in the bit depth of the image space leads to a loss in robustness. We supplement our results with a discussion of their implications for vision systems.

1. INTRODUCTION

Image classification, the task of partitioning images into various classes, is a classical problem in computer science with countless practical applications. Progress on this problem has advanced by leaps and bounds since the advent of deep learning, with modern image classifiers attaining some incredible results (Beyer et al., 2020). However, it has been observed that image classes tend to be brittle: classifiers tend to partition images such that most images lie very close to images of different classes (Szegedy et al., 2013). Although usually studied in computer vision systems, such phenomena also appear to manifest in natural vision systems (Elsayed et al., 2018; Zhou & Firestone, 2019). Given these observations, it is natural to ask the following question: is the brittleness of image classes a result of classifier construction, or does it arise from some fundamental property of image spaces? Previous work demonstrates that classifiers can be made more robust as a function of how they are constructed, and attempts to improve the robustness of existing computer vision systems through such means are an active area of research (Moosavi-Dezfooli et al., 2016; Madry et al., 2017; Ma et al., 2018; Tramer et al., 2020; Machado et al., 2021). However, there is also a fundamental limit to the robustness achievable by any classifier, one that arises as a consequence of the geometry of image spaces. In this work we show that this fundamental limit of achievable robustness is surprisingly low. Roughly speaking, in most cases it suffices to change the contents of only a few columns of pixels in an image to change its class. Even smaller changes suffice when measured under other metrics, such as the Euclidean metric. Because our results are a consequence of the geometry of image spaces, they apply regardless of the architecture of the classifier.
This suggests that there is an inherent brittleness in the semantic content of images, and that robustness as an objective is only desirable with respect to distributions that are concentrated over small subsets of the image space.
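To see why the bound O(n^{1/max(p,1)}) corresponds to "a few columns of pixels," consider the following small numerical sketch (not from the paper; the pixel values and image size are illustrative assumptions). Perturbing a single column of an n-by-n image by a constant amount touches n pixels, so the resulting perturbation has 1-norm proportional to n, 2-norm proportional to sqrt(n), and infinity-norm that is constant, matching the n^{1/p} scaling for p >= 1.

```python
import numpy as np

# Illustrative setup: a 256x256 grayscale image, perturbed in one column.
n = 256
image = np.zeros((n, n))
perturbed = image.copy()
perturbed[:, 0] += 1.0  # shift every pixel in the first column by a constant

delta = (perturbed - image).ravel()

# For p >= 1, the p-norm of this column perturbation scales as n**(1/p):
# l1 = n, l2 = sqrt(n), l_inf = 1.
for p in (1, 2, np.inf):
    print(f"l_{p} norm: {np.linalg.norm(delta, ord=p)}")
```

The infinity-norm of the perturbation stays bounded no matter how large n grows, which is why the per-pixel change needed to leave an image class can remain imperceptibly small even as the total perturbation budget grows with image resolution.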

1.1. OUR CONTRIBUTIONS AND RELATED WORK

The observation that image classes tend to be brittle was popularized by Szegedy et al. (2013), who observed that tiny perturbations suffice to change the image class of many images. This has since opened up a rich field of research on how the brittleness of image classes arises from specific classifier formulations or training distributions (Goodfellow et al., 2014; Gilmer et al., 2018; Tsipras

