FUNDAMENTAL LIMITS ON THE ROBUSTNESS OF IMAGE CLASSIFIERS

Abstract

We prove that image classifiers are fundamentally sensitive to small perturbations in their inputs. Specifically, we show that, given an image space of n-by-n images, all but a tiny fraction of images in any image class induced over that space can be moved outside that class by adding some perturbation whose p-norm is O(n^{1/max(p,1)}), as long as that image class takes up at most half of the image space. We then show that O(n^{1/max(p,1)}) is asymptotically optimal. Finally, we show that an increase in the bit depth of the image space leads to a loss in robustness. We supplement our results with a discussion of their implications for vision systems.

1. INTRODUCTION

Image classification, the task of partitioning images into various classes, is a classical problem in computer science with countless practical applications. Progress on this problem has advanced by leaps and bounds since the advent of deep learning, with modern image classifiers attaining some incredible results (Beyer et al., 2020). However, it has been observed that image classes tend to be brittle: classifiers tend to partition images such that most images lie very close to images of different classes (Szegedy et al., 2013). Although usually studied in computer vision systems, such phenomena also appear to manifest in natural vision systems (Elsayed et al., 2018; Zhou & Firestone, 2019). Given these observations, it is natural to ask the following question: is the brittleness of image classes a result of classifier construction, or does it arise from some fundamental property of image spaces? Previous work demonstrates that classifiers can be made more robust as a function of how they are constructed, and attempts to improve the robustness of existing computer vision systems through such means are an active area of research (Moosavi-Dezfooli et al., 2016; Madry et al., 2017; Ma et al., 2018; Tramer et al., 2020; Machado et al., 2021). However, there is also a fundamental limit to the robustness achievable by any classifier that arises as a consequence of the geometry of image spaces. In this work we show that this fundamental limit is surprisingly low. Roughly speaking, in most cases it suffices to change the contents of only a few columns of pixels in an image to change its class. Even smaller changes suffice when measured using other metrics, such as the Euclidean metric. Our results are a consequence of the geometry of image spaces, and so they apply regardless of the architecture of the classifier.
This suggests that there is an inherent brittleness in the semantic content of images, and that robustness as an objective is only desirable with respect to distributions that are concentrated over small subsets of the image space.

1.1. OUR CONTRIBUTIONS AND RELATED WORK

The observation that image classes tend to be brittle was popularized by Szegedy et al. (2013), who observed that tiny perturbations suffice to change the image class of many images. This has since opened up a rich field of research on how the brittleness of image classes arises from specific classifier formulations or training distributions (Goodfellow et al., 2014; Gilmer et al., 2018; Tsipras et al., 2018). While these advances offer insights into the deficiencies of our current methodologies, their analyses ultimately depend on some aspect of the architecture or training distribution, and so do not rule out the possible existence of ideal classifiers that do not induce brittle image classes. By contrast, our work provides a non-trivial upper bound on the robustness of any image class. Specifically:

• We prove that most images in any image class consisting of at most half the images in an image space of n-by-n images can be moved into a different class by adding a perturbation whose p-norm is O(n^{1/max(p,1)}). This is a vanishingly small quantity in relation to the average distances in the image space, which are O(n^{2/max(p,1)}), and it therefore provides a non-trivial upper bound on the robustness attainable by even an ideal classifier.

• We show that there exist image classes where most images cannot be moved into a different class with any perturbation whose p-norm is o(n^{1/max(p,1)}) (note the small-o notation). The bound we derive is therefore asymptotically optimal in n, so proving stronger robustness bounds will require examining classifier-specific properties.

• We show that discretization through lowering the bit depth of the image space permits the existence of more robust image classes. This lends theoretical backing to the idea of using discretization as a method of defending against adversarial attacks (Panda et al., 2019).

• We demonstrate that brittle features in images can deliver semantic content.
We argue that a deeper understanding of such features can lead to advances in aligning human and computer vision systems. To our knowledge, two previous works investigate upper bounds on robustness that arise from the geometry of image spaces. One is from Fawzi et al. (2018a), which provides an upper bound on the probability that an image drawn from a given distribution is far from images of a different class, supplemented by numerical analyses of their bounds. However, our analysis differs from and improves on theirs in a few key aspects. Firstly, they only analyze the case where distance is measured using the 2-norm, while we provide bounds for p-norms for any p. Secondly, they do not account for the discrete nature of image spaces with finite bit depth, which allows for classifiers that are more robust than their bounds imply. Finally, their bound is parametrized by a modulus of continuity that differs depending on the image distribution, potentially resulting in trivial bounds for certain distributions. Furthermore, this parameter cannot be computed exactly, so in application their bound is inexact. By contrast, we formulate our results independently of specific image distributions. Our bounds can therefore be computed exactly and unconditionally, and we are able to show the asymptotic optimality of our result. The other work is from Diochnos et al. (2018), which investigates partitions of bit vectors. Since bit vectors can be used to encode discrete inputs, their results can be viewed as results about classifiers over discrete inputs. They also view each bit vector as being equally weighted, so their results are not parametrized by data distributions and are unambiguous. They show that given a finite probability of misclassification, an arbitrarily high proportion of vectors can be turned into misclassified vectors through a small number of bit flips proportional to the square root of the vector dimension.
This result has been generalized in follow-up work (Mahloujifar et al., 2019), where it was shown that a small number of modifications proportional to the square root of the data dimension suffices to induce misclassification in the more general setting of Lévy families as well. However, these results still depend on the existence of a finite fraction of misclassified datapoints, and therefore do not preclude the existence of asymptotically infinitesimal image classes that are robust, something which our analysis does preclude. Furthermore, our bounds, due to our focus on image spaces, are much stronger than the ones they derive and are asymptotically optimal. The observation that brittle features can deliver semantic content is an observation on the inadequacy of p-norms for quantifying visual similarity. While such inadequacies have been noted in prior work (Tramèr et al., 2020; Fawzi et al., 2018b), to our knowledge ours is the first to be derived as a consequence of theoretical bounds. The theoretical foundation of our observations allows for quantification and has the potential to imply further consequences.

2.1. PRELIMINARIES

Images consist of pixels on a two-dimensional grid, with each pixel consisting of a set of channels (for example R, G, and B) of varying intensity. We therefore define the image space of h-channel images of height n and aspect ratio q, denoted I_{n,q,h,(∞)}, as the set of all real-valued tensors with shape (qn, n, h) with entries lying in the interval [0, 1]. We require that qn be an integer. The first two dimensions index the x and y coordinates of the pixel, while the third indexes the channel. Only a finite subset of these images can be represented with finite bit strings. Therefore, we use I_{n,q,h,(b)} to denote the subset of I_{n,q,h,(∞)} where each entry of the tensor is one of 2^b equally spaced values between 0 and 1 inclusive (in other words, each entry belongs to [0, 1] ∩ {i/(2^b − 1) | i ∈ Z}). We will refer to b as the bit depth of the image space. Although technically image spaces of equal height, aspect ratio, and number of channels intersect, we will treat them as disjoint (in other words, if x ∈ I_{n,q,h,(b)}, then x ∉ I_{n′,q′,h′,(b′)} if I_{n,q,h,(b)} ≠ I_{n′,q′,h′,(b′)}). Images in the image space I_{n,q,h,(b)} contain n²qh entries. This quantity n²qh appears often in our results and can be thought of as the data dimension (the number of dimensions required to specify the data), which we will often denote by N for simplicity. The data dimension can be set to any integer value (for example by setting n = 1 and q = 1), so any data type that can be represented as a Cartesian product of some fixed number of unit intervals [0, 1] can be viewed as an image as we have defined it. Consequently our theoretical results apply to a more general class of inputs than images, though we will continue to focus on the case of image classification in this work.
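As a concrete illustration of the discrete image space I_{n,q,h,(b)}, the following Python sketch (function and variable names are our own, not from the paper) samples a uniform element whose entries lie on the 2^b-value grid:

```python
import numpy as np

def sample_image(n, q, h, b, rng):
    """Sample a uniform element of the discrete image space I_{n,q,h,(b)}.

    Each entry is one of 2**b equally spaced values in [0, 1],
    i.e. i / (2**b - 1) for i = 0, ..., 2**b - 1.
    """
    qn = q * n
    assert float(qn).is_integer(), "qn must be an integer"
    levels = 2 ** b
    raw = rng.integers(0, levels, size=(int(qn), n, h))
    return raw / (levels - 1)

rng = np.random.default_rng(0)
img = sample_image(n=4, q=2, h=3, b=8, rng=rng)
assert img.shape == (8, 4, 3)
assert img.size == 4 ** 2 * 2 * 3  # data dimension N = n^2 * q * h
assert img.min() >= 0.0 and img.max() <= 1.0
```

Sampling each channel value independently and uniformly from the grid yields the uniform distribution over the image space, which is the sense in which large classes must contain noise-like images.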

2.1.1. CLASSIFIERS AND CLASSES

A classifier C is a function I_{n,q,h,(b)} → Y, where Y is some finite set of labels. For each y ∈ Y, we define the class of y as the preimage of y, denoted C^{−1}(y). We say that such a class is induced by C. If a class takes up a large part of the image space, then it contains many images that look like randomly sampled noise, since sampling channel values from a uniform distribution yields a uniform distribution over the image space. Therefore, many images in these classes tend to be uninteresting, which motivates the following definition:

Definition 1. A set C ⊆ I_{n,q,h,(b)} is an interesting image class if it is not empty, and if it contains no more than half of the total number of images in I_{n,q,h,(b)}.

Note that as long as every class of a classifier is populated, no more than one class can be uninteresting, since image classes are disjoint. Therefore, if a classifier does induce an uninteresting class, we can think of it as the uninteresting class. Intuitively we can think of it as a junk class, since it contains images that look like randomly sampled noise. Most of our results pertain to interesting image classes. This is to eliminate pathological cases such as an image class covering the entire image space.
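Definition 1 can be checked mechanically on a toy space; the sketch below (illustrative names, using the tiny binary space I_{2,1,1,(1)} of four-pixel images) enumerates all 16 images and tests the interesting-class condition:

```python
from itertools import product

# All images in the binary space I_{2,1,1,(1)}: 4 pixels, 16 images.
space = list(product([0, 1], repeat=4))

def is_interesting(cls, space):
    """Definition 1: non-empty and at most half of the image space."""
    return 0 < len(cls) <= len(space) // 2

# Example class: images with at most one lit pixel (5 of 16 images).
sparse = [img for img in space if sum(img) <= 1]
assert is_interesting(sparse, space)
assert not is_interesting(space, space)  # the whole space is too large
assert not is_interesting([], space)     # the empty class is excluded
```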

2.1.2. PERTURBATIONS AND ROBUSTNESS

In order to discuss perturbations, we define addition and subtraction over tensors of the same shape to be element-wise, and we define the p-norm of a tensor A, denoted ∥A∥_p, to be the pth root of the sum of the absolute values of the entries of A raised to the pth power. p is assumed to be a non-negative integer, and for the special case of p = 0 we let ∥A∥_0 be the number of non-zero entries in A. Note that the 0-"norm" is not truly a norm since it does not obey homogeneity. We can then define what it means for an image to be robust to perturbations:

Definition 2. Let C ⊆ I_{n,q,h,(b)} be a class of images. We say an image I ∈ C is robust to L_p-perturbations of size d if for all I′ ∈ I_{n,q,h,(b)}, ∥I − I′∥_p ≤ d implies I′ ∈ C.

We can then define what it means for a class to be robust to perturbations. Note that unless a class occupies the entire image space, it must contain some non-robust images, so the best we can hope for is to attain robustness for a large fraction of the images within a class. This is reflected in the following definition.

Definition 3. Let C ⊆ I_{n,q,h,(b)} be a class of images. Then we say that C is r-robust to L_p-perturbations of size d if it is not empty, and the number of images I ∈ C that are robust to L_p-perturbations of size d is at least r|C|, where |C| is the number of images in C.
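The p-norm convention above, including the p = 0 special case, can be written out directly; a minimal sketch with illustrative names:

```python
import numpy as np

def p_norm(a, p):
    """The p-norm of a tensor as defined in the text.

    For p = 0 we count non-zero entries (not a true norm, since it is
    not homogeneous).
    """
    a = np.asarray(a, dtype=float)
    if p == 0:
        return np.count_nonzero(a)
    return float((np.abs(a) ** p).sum() ** (1.0 / p))

delta = np.array([[0.5, 0.0], [0.0, -0.5]])  # a perturbation tensor
assert p_norm(delta, 0) == 2                 # two entries changed
assert p_norm(delta, 1) == 1.0               # total change 0.5 + 0.5
assert abs(p_norm(delta, 2) - 0.5 * np.sqrt(2)) < 1e-12
```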

2.2. UNIVERSAL UPPER BOUND ON CLASSIFIER ROBUSTNESS

We can now state a universal non-robustness result that applies to all classifiers over discrete image spaces I_{n,q,h,(b)}.

Theorem 1. For all real values c > 0, there exists no interesting class C ⊆ I_{n,q,h,(b)} (the image space of h-channel images of height n and aspect ratio q with bit depth b) that is 2e^{−2c²}-robust to L_p-perturbations of size (2 + c√N)^{1/max(p,1)}, where N = n²qh is the data dimension.

Proof sketch. We can use the images in I_{n,q,h,(b)} to form a graph where images are the vertices, and images are connected if and only if they differ at exactly one channel. In other words, the image tensors must differ at precisely one entry. Figure 1a illustrates the construction of this graph. Note that graph distance between vertices coincides with the Hamming distance between the images represented by the vertices. Such graphs are known as Hamming graphs, and they have a vertex expansion (or isoperimetry) property (Harper, 1999) which implies that for any sufficiently small set, if we add all vertices that are within a graph distance of O(√N) of that set, then the size of that set increases by at least some given factor (see Figure 1b for an example). This expansion property is contingent on the vertex set being sufficiently small, which is why we require the "interesting class" property. We can then show that an interesting class C cannot be too robust in the following way: suppose for contradiction that it is. Then there must be some set C′ ⊆ C that is pretty large and has the property that all vertices within some graph distance of C′ are in C. We can then use the vertex expansion property to show that adding these vertices to C′ gives a set larger than C, which contradicts the assumption that all vertices within some graph distance of C′ are in C. Plugging explicit values into this argument yields the statement of the theorem.
We can then generalize to L_p-perturbations for arbitrary p since each coordinate varies by at most 1 unit. The full proof can be found in Appendix A.1. Intuitively, the above result states that to change the class of most "interesting" images, the number of pixels that need to be changed is roughly the number contained in a few columns of the image.
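The Hamming-graph construction in the proof sketch can be verified concretely on a tiny space: the sketch below (our own illustrative code) builds the graph on the binary space I_{2,1,1,(1)} and confirms via breadth-first search that graph distance coincides with Hamming distance:

```python
from itertools import product
from collections import deque

# Vertices: the binary images of I_{2,1,1,(1)}, encoded as 4-tuples.
vertices = list(product([0, 1], repeat=4))

def neighbors(v):
    # Two binary images are adjacent iff they differ at exactly one entry.
    for i in range(len(v)):
        w = list(v)
        w[i] = 1 - w[i]
        yield tuple(w)

def graph_distance(u, v):
    # Breadth-first search over the Hamming graph.
    seen, queue = {u}, deque([(u, 0)])
    while queue:
        x, d = queue.popleft()
        if x == v:
            return d
        for w in neighbors(x):
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# Graph distance coincides with Hamming distance, as in Figure 1a.
assert all(graph_distance(u, v) == hamming(u, v)
           for u in vertices for v in vertices)
```

For bit depths b > 1 each entry can take any of the 2^b grid values, so vertices have more neighbours, but the identification of graph distance with Hamming distance carries over.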

2.2.1. THE UNIVERSAL NON-ROBUSTNESS RESULTS ARE ASYMPTOTICALLY OPTIMAL UP TO A CONSTANT FACTOR

Up to a constant factor, the bounds in Theorem 1 are the best possible for a universal non-robustness result that applies to arbitrary predictors if we only consider the data dimension N and hold the bit depth b constant. In other words, there exists no bound on robustness that applies universally to all classifiers and grows much more slowly in N than the ones given in Theorem 1. Therefore, if we wish to show that the classes induced by some classifier are not robust to, for instance, L_0-perturbations of size O(log(N)), more specific properties of that classifier would need to be considered. To prove this, consider the classifier defined by Algorithm 1.

Figure 1: Interpreting image spaces as Hamming graphs. a) We show how we construct a Hamming graph using the elements of I_{2,1,1,(1)}, the space of binary images on four pixels. By construction, graph distance coincides exactly with Hamming distance. b) We demonstrate the expansion property of Hamming graphs on a Hamming graph constructed using I_{3,1,1,(1)} as the vertex set. If we pick some initial set of vertices (in black), then the set of vertices that are a graph distance of at most 3 (3 being the image height n in this case) from that initial set (in black and red) is much larger than that initial set. The nature of "much larger" is expanded on in Appendix A.1.

Algorithm 1: Robust Classifier
  Input: An image I ∈ I_{n,q,h,(b)}
  Result: A label belonging to {0, 1}
  S ← 0
  for x ← 1 to qn do
    for y ← 1 to n do
      for a ← 1 to h do
        S ← S + I_{x,y,a}
  if S < n²qh/2 then return 0 else return 1

Theorem 2. The classifier described by Algorithm 1 induces an interesting class C ⊆ I_{n,q,h,(b)} (the image space of h-channel images of height n and aspect ratio q with bit depth b) such that for all c > 0:
1. C is (1 − 4c)-robust to L_p-perturbations of size c√N − 2 for all p ≤ 1.
2. C is (1 − 4c)-robust to L_p-perturbations of size (c√N − 2)^{1/p} / (2^b − 1)^{(p−1)/p} for all p ≥ 2.
Here N = n²qh is the data dimension.

Proof sketch. Given an image I, let S(I) be the sum of all its channel values minus N/2 (where N = n²qh is the data dimension). Then I being robust to L_1-perturbations of size x is approximately equivalent to S(I) ∉ [−x, x]. By the central limit theorem, the fraction of images I such that S(I) ∉ [−c√N, c√N] is some monotonic function of c independent of N if N is sufficiently large, which is our desired result. Appendix A.2 provides a more careful analysis that does not rely on limiting behaviour and extends the result to all p-norms. We remark that the c in Theorem 2 should be set to less than 1/4 in order to yield a non-trivial statement.
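Algorithm 1 and the margin intuition in the proof sketch can be simulated directly; the following sketch (illustrative parameters, not from the paper) estimates the fraction of uniformly sampled images whose sum margin exceeds c√N, which Theorem 2 predicts is at least roughly 1 − 4c:

```python
import numpy as np

def threshold_classifier(img):
    """Algorithm 1: label by whether the summed channel values reach
    half the data dimension N."""
    return int(img.sum() >= img.size / 2)

# An image is (approximately) robust to L1-perturbations of size d
# when its sum margin |S(I)| = |sum(I) - N/2| exceeds d: moving the
# sum across the threshold needs total channel change of at least d.
rng = np.random.default_rng(1)
n, h, b = 16, 1, 8            # q = 1; illustrative sizes
N = n * n * h
levels = 2 ** b
imgs = rng.integers(0, levels, size=(10000, N)) / (levels - 1)
margins = np.abs(imgs.sum(axis=1) - N / 2)

c = 0.1
frac = float((margins > c * np.sqrt(N)).mean())
# Theorem 2 predicts at least roughly (1 - 4c)-robustness, i.e. 0.6
# here; the empirical fraction comes out comfortably above that.
assert frac > 1 - 4 * c
```

The 1 − 4c guarantee is loose relative to the central-limit estimate, consistent with the remark that c must be below 1/4 for the statement to be non-trivial.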

2.2.2. CLASSIFIER ROBUSTNESS TO L_p-PERTURBATIONS DECREASES WITH INCREASING BIT DEPTH FOR p ≥ 2

In this section we investigate the role played by the bit depth b. Theorem 2 has a dependency on b when considering L_p-perturbations for p ≥ 2, so the statement about attainable robustness becomes increasingly vacuous as the bit depth increases. Somewhat surprisingly, this is not an artifact of suboptimal analysis: it is really the case that the fundamental limit of robustness drops as a function of the bit depth of the image space.

Theorem 3. For all real values c > 0 and p ≥ 2, no interesting class C ⊆ I_{n,q,h,(b)} (the image space of h-channel images of height n and aspect ratio q with bit depth b) is 2e^{−c²/2}-robust to L_p-perturbations of size (c + 2√N/2^b)^{2/p}, where N = n²qh is the data dimension.

Proof sketch. We will focus on the 2-norm. Extension to higher p-norms is straightforward and is given as part of the full proof found in Appendix A.3. The main idea of the proof rests on the fact that if we extend the classifier to the continuous image space with something like a nearest-neighbour approach, the measure of the images that are robust to perturbations of a constant size is small (the statement and proof may be found in Appendix A.4). Therefore, if we randomly jump from an image in the discrete image space to an image in the continuous image space, with high probability we will be within a constant distance of an image in a different class. The size of this random jump can be controlled with a factor that shrinks with increasing bit depth. Summing up the budget required for this jump, the perturbation required on the continuous image space, and the jump back to the discrete image space yields the desired bound. We remark that this suggests that the bounds in Theorem 1 pertaining to L_p-perturbations for p ≥ 2 can be improved to reflect their dependency on the bit depth b.
However, whether the component that shrinks with b scales with N^{1/(2p)} rather than N^{1/p} remains an open problem.
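The "jump back to the discrete image space" in the proof sketch has a 2-norm cost of at most √N/(2(2^b − 1)), since each entry moves at most half a grid spacing; a quick numerical sketch (illustrative names) of how this cost shrinks with b:

```python
import numpy as np

def snap_to_grid(img, b):
    """Round each entry to the nearest of the 2**b grid values."""
    levels = 2 ** b - 1
    return np.round(img * levels) / levels

rng = np.random.default_rng(2)
N = 1024
x = rng.random(N)  # a point of the continuous image space, flattened

for b in (2, 4, 8):
    jump = np.linalg.norm(x - snap_to_grid(x, b))
    # Each entry moves at most half a grid spacing, 1/(2*(2**b - 1)),
    # so the 2-norm cost of the jump is at most sqrt(N)/(2*(2**b - 1)).
    assert jump <= np.sqrt(N) / (2 * (2 ** b - 1)) + 1e-9
```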

3.1. SUMMARY OF ROBUSTNESS LIMITS AND THEIR RELATION TO AVERAGE IMAGE DISTANCES

We summarize the bounds derived in the previous section in Table 1, where the bounds are reparametrized in terms of the robustness. An asymptotic bound is also provided with respect to the data dimension N and bit depth b. A plot of the relation between perturbation sizes and robustness can be found in Appendix A.6. With respect to N and b, the bounds derived for the 0-norm and 1-norm are asymptotically optimal, while the bounds for the other p-norms are asymptotically optimal with respect to N. Finding the optimal bound with respect to both b and N for p-norms with p ≥ 2 remains an open problem, although our current analysis suffices to show that b does fundamentally influence how robust an interesting class can be.

Table 1: Fundamental bounds for robustness attainable by any interesting image class in I_{n,q,h,(b)}. N = n²qh is the data dimension. Rather than leaving the robustness and bound parametrized by a separate constant c, the bounds have been reparametrized in terms of the robustness r. Figures plotting the relation between the bounds and r can be found in Appendix A.6. The bounds are also given in big-Θ notation, where r is held constant for simplicity. The upper bound should be understood as "no interesting class is r-robust to perturbations of these sizes" and the lower bound should be interpreted as "there exists an interesting class that is r-robust to perturbations of these sizes".

PERTURBATION     UPPER BOUND                                      LOWER BOUND
L_0, L_1         2 + √((1/2) ln(2/r)) · √N                        ((1 − r)/4)√N − 2
L_p, p ≥ 2       min( (2 + √((1/2) ln(2/r)) · √N)^{1/p},         (((1 − r)/4)√N − 2)^{1/p} / (2^b − 1)^{(p−1)/p}
                      (√(2 ln(2/r)) + √N/2^{b−1})^{2/p} )

PERTURBATION     UPPER BOUND (ASYMPTOTIC)                         LOWER BOUND (ASYMPTOTIC)
L_0, L_1         Θ(√N)                                            Θ(√N)
L_p, p ≥ 2       Θ(min(N^{1/(2p)}, (√N/2^b + 1)^{2/p}))          Θ(N^{1/(2p)} / 2^{b(p−1)/p})

We note that the bounds we derived are vanishingly small compared to typical distances between random elements of the image space. If a pair of images I, I′ ∈ I_{n,q,h,(b)} are sampled independently and uniformly, we have:

  E[∥I − I′∥_p] ≥ k_{b,p} N^{1/max(1,p)}

where k_{b,p} is some constant parametrized by b and p, and N = n²qh is the data dimension. See Appendix A.5 for additional details. While our analysis there does not necessarily hold for images drawn non-uniformly, we demonstrate in Appendix A.5.1 that typical distances on natural distributions tend to be similar in size. When this observation is combined with the bounds in Table 1, we can see that when N is sufficiently large, for 99% (or some arbitrarily high percentage) of images I′′ within an interesting class C:

  min_{X ∈ I_{n,q,h,(b)}, X ∉ C} ∥I′′ − X∥_p / E[∥I − I′∥_p] ≤ c_{b,p} N^{−1/(2 max(p,1))}     (2)

where c_{b,p} is some constant parametrized by b and p. The right-hand side approaches 0 as N grows without bound, so compared to typical distances one finds in an image space, the distance of an image to an image outside of its class is vanishingly small in any p-norm it is measured in.
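The scaling of typical distances can be sanity-checked by Monte Carlo in the continuous setting; the sketch below (our own illustrative code, uniform sampling with b = ∞) recovers the per-coordinate constants E|U − U′| = 1/3 for p = 1 and E[(U − U′)²] = 1/6 for p = 2:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_pairwise_distance(N, p, trials=2000):
    """Monte Carlo estimate of E[||I - I'||_p] for uniform pairs in
    the continuous image space, flattened to length N."""
    a = rng.random((trials, N))
    c = rng.random((trials, N))
    return float(np.mean(np.sum(np.abs(a - c) ** p, axis=1) ** (1 / p)))

N = 1024
d1 = mean_pairwise_distance(N, p=1)
d2 = mean_pairwise_distance(N, p=2)
# Per coordinate, E|U - U'| = 1/3 and E[(U - U')^2] = 1/6 for
# independent uniforms, so d1 ~ N/3 and d2 ~ sqrt(N/6), matching
# the k_{b,p} N^{1/max(1,p)} scaling in the text.
assert abs(d1 / N - 1 / 3) < 0.02
assert abs(d2 / np.sqrt(N) - np.sqrt(1 / 6)) < 0.02
```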

3.2. IMPLICATIONS OF ROBUSTNESS LIMITS FOR COMPUTER VISION SYSTEMS

According to our results, a large fraction of images in any interesting image class can have their class changed by a small perturbation. We note that we only consider image classifiers that partition images into a finite number of discrete classes, so our results do not apply directly to vision models that output class probabilities or some more abstract representation. However, these outputs must ultimately be converted into decisions when the model is deployed, at which point our bounds do apply. Taken at face value, our results appear to pose a barrier to the construction of reliable computer vision systems. For illustration, suppose we implement a system that selects from a finite pool of actions depending on the output of some image classifier. Then for most images, a tiny perturbation can make the given image trigger undesired behaviour, ostensibly making the classifier unreliable. One way of circumventing this barrier is to consider reliability conditioned on a prespecified image distribution. Our bounds do not immediately preclude the existence of small fractions of images within interesting image classes that are robust to large perturbations, and it is possible that those are precisely the images that are commonly encountered in deployment. Therefore, our bounds do not directly prevent the construction of classifiers that are robust with respect to some given image distribution, which is an ongoing field of research (Madry et al., 2017).

Published as a conference paper at ICLR 2023

We remark that our bounds can be converted into bounds that take an image distribution into account through tricks similar to ones used by Fawzi et al. (2018a). We can define a map f : I_{n,q,h,(b)} → I_{n,q,h,(b)} such that f(X) approximates the desired image distribution if X is distributed uniformly over I_{n,q,h,(b)} (we can think of the preimage as a latent space). If we then define a monotonically increasing ω : R → R such that ∥f(X) − f(X′)∥_p ≤ ω(∥X − X′∥_p) for all pairs of images, then all our bounds stating that no interesting class is r-robust to L_p-perturbations of size d can be converted into statements that the probability of encountering an image in an interesting class that is robust to L_p-perturbations of size ω(d) is less than r. Clearly the value of such a bound depends on f and ω, and it is easy to construct examples where these bounds must be vacuous: for example, we could make f map all images of one class to the entirely black image and all images of the other class to the entirely white image. However, this is an unrealistic toy example, and experimental analysis carried out by Fawzi et al. (2018a) suggests that analyzing plausible approximations of f that produce useful distributions can yield informative bounds. We also note that formulating robustness in a distribution-specific way does not address all reliability concerns: for example, an ostensibly benign image modified imperceptibly to trigger dangerous behaviour is a reliability concern regardless of whether said image is drawn from a prespecified distribution. Furthermore, identifying the correct distribution of such images is a non-trivial task, and distribution gaps can lead to significant reductions in performance (Recht et al., 2019). Another way of circumventing the barrier to constructing reliable computer vision systems imposed by our bounds is to note that a perturbation with a small p-norm is not necessarily imperceptible.
In such cases, the classifier ought to adjust its output with respect to such perturbations. Robustness should then be defined with respect to a perception-aligned metric, as opposed to a p-norm. The barrier to constructing reliable computer vision systems then becomes the alignment problem for human and computer vision systems.

3.3. BRITTLE FEATURES CAN IMPART SEMANTICALLY SALIENT INFORMATION

We show in this section that the semantic contents of most images are contained within brittle features that can be erased with a small perturbation. First, we note that our bounds apply universally to any image classifier, so they must apply to an ideal classifier that is able to faithfully mimic human classifications. Although such a classifier has yet to be achieved in the computer vision space, human-based classifiers can be built quite easily: simply place people in front of monitors and ask them to apply labels to images (such constructions have been commercially implemented for purposes like content moderation). Memoization could be applied to prevent the same image from being classified multiple times with conflicting labels. This system, on top of producing human classifications, acts like a classifier which partitions the set of all images into disjoint classes (as described in Section 2.1.1), so our bounds must be satisfied. (It may be the case that a pair of counterfactual trajectories of classifications would yield contradictory labels on certain images. However, since only one such trajectory can occur factually, memoization suffices to make such a classifier observationally indistinguishable from the kind of classifiers discussed in Section 2.1.1.) At this point, we have proven the existence of a classifier that reproduces human classifications. To simplify the discussion, we will consider classifiers with only two classes: given a set of words (for example, words like "truck" or "parachute"), we will consider the class of images that such a label can be applied to and its complement. Specifically, the words should be chosen such that random noise does not receive a label with high probability. Roughly speaking, the idea we attempt to capture here is that the first class is the class of "meaningful" images, and the latter is the class of "meaningless" images. Since images drawn from random noise should not receive a label with high probability, the class of meaningless images is far larger than the class of meaningful images. Therefore, the class of meaningful images is an interesting class, and all our bounds apply. This has the surprising consequence that the semantic content of most images can be "erased" with a small perturbation by turning them into meaningless images.

Put differently, the semantic contents of most meaningful images are contained in brittle features that can be erased with a small perturbation, which is what we set out to show. What do these features look like? A possible example of brittle features that convey semantic information is line drawings (see Figure 2, sourced from (Howard)). Line drawings can be erased with a perturbation of size O(n) (proportional to the height of the image) when measured using the 1-norm or 0-norm, which is congruent with the bounds we derived for those metrics. Line drawings are also semantically salient. However, the size of a line drawing is generally larger than O(1) when measured with a p-norm with p ≥ 2, so the saliency of line drawings does not necessarily account for our bit-depth-dependent bounds. Phenomena like optical illusions and pareidolia may offer some hints as to how saliency can be present in even more brittle features. Some past work highlights other possible ideas, such as the object of interest only taking up a small portion of the picture (Tramèr et al., 2020), or salient patterns being drawn with low opacity (Fawzi et al., 2018b). However, a full understanding remains elusive. Understanding the ways in which brittle features can carry semantic meaning is not merely of academic curiosity. We have shown that the robustness of computer vision systems has fundamental limits.
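The O(n) erasability of line drawings under the 0- and 1-norms can be illustrated directly; the sketch below (a hypothetical one-pixel-wide vertical line on a blank n-by-n canvas) also shows that the same perturbation is only O(√n) in the 2-norm:

```python
import numpy as np

n = 64
canvas = np.zeros((n, n))        # a blank n-by-n single-channel image
drawing = canvas.copy()
drawing[:, n // 2] = 1.0         # a one-pixel-wide vertical line

delta = drawing - canvas         # the perturbation that erases the line
l0 = np.count_nonzero(delta)
l1 = float(np.abs(delta).sum())
l2 = float(np.sqrt((delta ** 2).sum()))

assert l0 == n and l1 == n          # O(n) in the 0- and 1-norms
assert abs(l2 - np.sqrt(n)) < 1e-9  # but only O(sqrt(n)) in the 2-norm
```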
However, a computer vision system that is aligned with the human visual system ought to obey these limits, since a human-based classifier must do so as well. Over the past decade we have learned that standard machine learning methodology does not automatically produce vision systems that are aligned with the human visual system with respect to small perturbations (Szegedy et al., 2013), and methodologies that seek to produce such vision systems still contain misalignments (Tramer et al., 2020). A deeper understanding of how semantic content is conveyed in brittle features may inform the development of future methodologies (for example, we may wish to explicitly train computer vision systems on such features), which is becoming increasingly necessary as computer vision systems are deployed in safety- and security-critical applications, where the trustworthiness of the system is essential (Pereira & Thomas, 2020; Ma et al., 2018).

4. CONCLUSION

We have derived universal non-robustness bounds that apply to any arbitrary image classifier. We have further demonstrated that up to a constant factor, these are the best bounds attainable with respect to the dimensions of the image. These bounds provide fundamental limits to the robustness achievable by computer vision systems, and reveal that most images in any interesting class, even those induced by ideal classifiers, can have their class changed with a perturbation that is asymptotically infinitesimal when compared to the average distance between images. We then discuss the barriers to constructing safe and secure computer vision systems imposed by our results and how these barriers may be circumvented. Finally, we show the abundance of brittle features that convey semantic information, and propose that an improved understanding of these features may yield progress on the problem of aligning human and computer vision systems. We discuss line drawings as an attractive candidate for brittle features that are semantically salient. However, they are not sufficiently brittle to account for all our bounds, so a full understanding of these features remains the subject of future work.

5. REPRODUCIBILITY STATEMENT

Complete proofs for our claims can be found in the appendix.

6. ACKNOWLEDGEMENT

This work was supported in part by Schmidt Futures.

A APPENDIX

A.1 PROOF OF THEOREM 1

A.1.1 PROPERTIES OF BINOMIAL COEFFICIENTS

We will work with binomial coefficients extensively. To simplify some of our statements, we extend the definition of the binomial coefficient to any $n > 0$ and arbitrary integer $k$:

$$\binom{n}{k} = \begin{cases} \dfrac{n!}{k!(n-k)!} & \text{if } 0 \le k \le n \\ 0 & \text{otherwise} \end{cases}$$

Binomial coefficients can be bounded in the following way:

Lemma 1. $\binom{n}{k} < \frac{2^n}{\sqrt{n}}$ when $n \ge 1$.

Proof. We first note that $n!$ is bounded as follows for all $n \ge 1$ (Robbins, 1955):

$$\sqrt{n}\,\frac{n^n}{e^n} < \frac{n!}{\sqrt{2\pi}} < \sqrt{n}\,\frac{n^n}{e^n}\,e^{1/(12n)} \quad (5)$$

Applying the appropriate inequalities to the numerator and denominator yields the following when $n$ is even:

$$\binom{n}{k} \le \binom{n}{n/2} = \frac{n!}{((n/2)!)^2} < \frac{2 \cdot 2^n}{\sqrt{n}} \cdot \frac{e^{1/(12n)}}{\sqrt{2\pi}}$$

When $n$ is odd, we have:

$$\binom{n}{k} \le \binom{n}{\lfloor n/2 \rfloor} \quad (6)$$
$$= \frac{1}{2}\binom{n+1}{(n+1)/2} \quad (7)$$
$$< \frac{2 \cdot 2^n}{\sqrt{n+1}} \cdot \frac{e^{1/(12(n+1))}}{\sqrt{2\pi}} \quad (8)$$
$$< \frac{2 \cdot 2^n}{\sqrt{n}} \cdot \frac{e^{1/(12n)}}{\sqrt{2\pi}} \quad (9)$$

where the third comparison is an application of Equation 5. If $n \ge 1$, we have $\frac{e^{1/(12n)}}{\sqrt{2\pi}} < 0.5$, which proves the claim. □

It will also be useful to define the following cumulative sums (which are the lower tails of binomial distributions):

$$U_{n,p}(k) = \begin{cases} \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i} & \text{if } k \ge 0 \\ 0 & \text{otherwise} \end{cases} \quad (10)$$

We can show that the ratio of these cumulative sums is monotonically increasing:

Lemma 2. Let $p \in (0,1)$. Then $\frac{U_{n,p}(x-k)}{U_{n,p}(x)}$ is monotonically increasing in $x$, where $0 \le x \le n$ and $k$ is any positive integer.

Proof. First, we note that the ratio $\binom{n}{x-k}/\binom{n}{x}$ is monotonically increasing in $x$ when $x \ge 0$. This holds by definition if $x - k < 0$. Otherwise, we have the following:

$$\frac{\binom{n}{x-k}/\binom{n}{x}}{\binom{n}{x-k+1}/\binom{n}{x+1}} = \frac{(n-x)}{(n-x+k)} \cdot \frac{(x-k+1)}{(x+1)} \le 1 \quad (11)$$

We then claim that the following holds for all $x$ where $0 \le x \le n-1$:

$$\frac{U_{n,p}(x-k)}{U_{n,p}(x)} \le \frac{U_{n,p}(x-k+1)}{U_{n,p}(x+1)} \le \frac{\binom{n}{x-k+1}(1-p)^k}{\binom{n}{x+1}p^k} \quad (12)$$

The above holds with equality when $x - k + 1 < 0$. If $x - k + 1 = 0$, the above also holds: the leftmost ratio is 0, and for the other two ratios, if we multiply the numerator and denominator of the rightmost ratio by $(1-p)^{n-k}$, we can see that the numerators are equal while the denominator of the rightmost ratio is a single term of $U_{n,p}(x+1)$ and hence smaller.
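As a quick numerical sanity check (separate from the proof, and assuming nothing beyond the lemma statement), the bound of Lemma 1 can be verified for small $n$ with a short script; it suffices to test the central coefficient, which is the largest:

```python
from math import comb, sqrt

def central_binomial_bound_holds(n: int) -> bool:
    """Check Lemma 1: C(n, k) < 2**n / sqrt(n). It suffices to test
    k = n // 2, where the binomial coefficient is maximal."""
    return comb(n, n // 2) < 2 ** n / sqrt(n)

# The lemma claims the bound for every n >= 1; spot-check a range of sizes.
results = [central_binomial_bound_holds(n) for n in range(1, 300)]
```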
We can prove the remaining cases by induction on $x$:

$$\frac{U_{n,p}(x-k)}{U_{n,p}(x)} \le \frac{\binom{n}{x-k}(1-p)^k}{\binom{n}{x}p^k} \quad (13)$$
$$\le \frac{\binom{n}{x-k+1}(1-p)^k}{\binom{n}{x+1}p^k} \quad (14)$$
$$= \frac{\binom{n}{x-k+1}p^{x-k+1}(1-p)^{n-x+k-1}}{\binom{n}{x+1}p^{x+1}(1-p)^{n-x-1}} \quad (15)$$

where the first inequality follows by induction, and the second inequality follows because $\binom{n}{x-k}/\binom{n}{x}$ is monotonically increasing in $x$. For any non-negative numbers $a, c$ and strictly positive numbers $b, d$ with $\frac{a}{b} \le \frac{c}{d}$, we have $\frac{a}{b} \le \frac{a+c}{b+d} \le \frac{c}{d}$ because:

$$\frac{d}{d\lambda}\,\frac{a+\lambda c}{b+\lambda d} = \frac{bc - ad}{(b+\lambda d)^2} \ge 0 \quad (16)$$

Therefore, we have:

$$\frac{U_{n,p}(x-k)}{U_{n,p}(x)} \le \frac{U_{n,p}(x-k) + \binom{n}{x-k+1}p^{x-k+1}(1-p)^{n-x+k-1}}{U_{n,p}(x) + \binom{n}{x+1}p^{x+1}(1-p)^{n-x-1}} \quad (17)$$
$$= \frac{U_{n,p}(x-k+1)}{U_{n,p}(x+1)} \quad (18)$$
$$\le \frac{\binom{n}{x-k+1}(1-p)^k}{\binom{n}{x+1}p^k} \quad (19)$$

as claimed. Carrying the induction up to $x = n-1$ yields the statement. □
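Lemma 2 is also easy to probe numerically. The sketch below (an illustration, not a proof) computes $U_{n,p}(x)$ directly from its definition and checks that the ratio is non-decreasing in $x$ for a few choices of $n$, $p$, and $k$:

```python
from math import comb

def U(n: int, p: float, k: int) -> float:
    """Binomial lower tail U_{n,p}(k); defined to be 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(min(k, n) + 1))

def ratio_is_monotone(n: int, p: float, k: int) -> bool:
    """Check Lemma 2: x -> U_{n,p}(x - k) / U_{n,p}(x) is non-decreasing."""
    ratios = [U(n, p, x - k) / U(n, p, x) for x in range(n + 1)]
    return all(a <= b + 1e-12 for a, b in zip(ratios, ratios[1:]))

checks = [ratio_is_monotone(n, p, k)
          for n in (5, 20, 50)
          for p in (0.1, 0.5, 0.9)
          for k in (1, 2, 5)]
```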

A.1.2 BOUNDING THE INTERIOR OF A SET OVER A HAMMING GRAPH

We will prove our main results by an application of isoperimetry bounds over a Hamming graph. Let $W$ be a set of $w$ symbols. We define the $n$-dimensional Hamming graph over $w$ letters, denoted $H(n,w)$, as the graph with vertex set $W^n$ and an edge between every pair of vertices that differ at precisely one coordinate. For example, $H(n,2)$ is isomorphic to the Boolean hypercube. We will use $V(H(n,w))$ to denote the vertex set of the Hamming graph.

Let $S \subseteq H(n,w)$. We define the expansion of $S$, denoted $\mathrm{EXP}(S)$, as the set of vertices that are either in $S$ or have a neighbour in $S$. Since $\mathrm{EXP}(\cdot)$ inputs and outputs sets of vertices, we can iterate it; we will use $\mathrm{EXP}^k(\cdot)$ to denote $k$ applications of $\mathrm{EXP}(\cdot)$.

We now adapt a result from (Harper, 1999) (Theorem 3 in that paper). Additional details on how it has been adapted can be found in Appendix B.1.

Lemma 3 (Isoperimetric Theorem on Hamming graphs). Let $S \subsetneq H(n,w)$. Then:

$$\frac{|\mathrm{EXP}^k(S)|}{|V(H(n,w))|} \ge \min\left\{ U_{n,p}(r+k) \;\middle|\; U_{n,p}(r) = \frac{|S|}{|V(H(n,w))|},\ p \in (0,1),\ r \in [0, n-k) \right\}$$

To work with this, we first obtain bounds for the expression on the right-hand side of Lemma 3.

Lemma 4. Let $p$ be any value in $(0,1)$. Let $n > r \ge k$ be such that $U_{n,p}(r) \le \frac{1}{2}$. Then $\frac{U_{n,p}(r-k)}{U_{n,p}(r)} \le 2e^{-2(\max(k-1,0))^2/n}$.

Proof. Let $X$ be a binomially distributed random variable with $n$ trials and probability of success $p$. Let $m$ be the median of $X$. We have $m \le np + 1$ because the median and mean differ by at most 1 (Kaas & Buhrman, 1980). $U_{n,p}(m-k)$ can be interpreted as $\Pr(X \le m-k)$, so we can apply Hoeffding's inequality (Hoeffding, 1994):

$$\Pr(X \le m-k) \le \Pr(X \le np + 1 - k) \quad (21)$$
$$\le e^{-2(\max(k-1,0))^2/n} \quad (22)$$

Since $m$ is the median of $X$, we also have $U_{n,p}(m) \ge \frac{1}{2}$.
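The tail-ratio bound of Lemma 4 can likewise be checked numerically for small parameters. This is only a sanity check of the stated inequality, not part of the argument:

```python
from math import comb, exp

def U(n: int, p: float, k: int) -> float:
    """Binomial lower tail U_{n,p}(k); defined to be 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(min(k, n) + 1))

def lemma4_holds(n: int, p: float, k: int) -> bool:
    """Check U_{n,p}(r-k)/U_{n,p}(r) <= 2*exp(-2*max(k-1,0)**2/n)
    for every r with n > r >= k and U_{n,p}(r) <= 1/2."""
    bound = 2 * exp(-2 * max(k - 1, 0) ** 2 / n)
    return all(U(n, p, r - k) / U(n, p, r) <= bound + 1e-12
               for r in range(k, n) if U(n, p, r) <= 0.5)

checks = [lemma4_holds(n, p, k)
          for n in (10, 40) for p in (0.2, 0.5, 0.8) for k in (1, 3, 6)]
```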
Combining this with the above equation gives:

$$\frac{U_{n,p}(m-k)}{U_{n,p}(m)} \le 2e^{-2(\max(k-1,0))^2/n} \quad (23)$$

Since $\frac{U_{n,p}(x-k)}{U_{n,p}(x)}$ is monotonically increasing in $x$ by Lemma 2, the above relation also holds for all $r \le m$. This completes the proof. □

We can then plug this into Lemma 3 to obtain a non-robustness result on Hamming graphs, which we will then apply to image spaces.

Theorem 4. Let $S \subsetneq V(H(n,w))$ be such that $|S| \le |V(H(n,w))|/2$, and let $c > 0$ be any number. Let $S' \subseteq S$ be the set of vertices for which no path with $c\sqrt{n} + 2$ edges or fewer leads to a vertex not in $S$. Then $\frac{|S'|}{|S|} < 2e^{-2c^2}$.

Proof. Suppose for contradiction that $|S'| \ge 2e^{-2c^2}|S|$. Since for any vertex in $S'$ no path with $c\sqrt{n} + 2$ edges or fewer leads to a vertex outside of $S$, we have $\mathrm{EXP}^{c\sqrt{n}+2}(S') \subseteq S$. Then:

$$|\mathrm{EXP}^{c\sqrt{n}+2}(S')| \ge |V(H(n,w))| \min\left\{ U_{n,p}(r + c\sqrt{n} + 2) \;\middle|\; U_{n,p}(r) = \frac{|S'|}{|V(H(n,w))|},\ p \in (0,1),\ r \in [0, n - c\sqrt{n} - 2) \right\} \quad (24)$$
$$\ge \frac{1}{2} e^{2(\max(c\sqrt{n}+1,\,0))^2/n} |S'| \quad (25)$$
$$> \frac{1}{2} e^{2c^2} |S'| \quad (26)$$

The first relation follows from Lemma 3 and the second follows from Lemma 4. Lemma 4 applies since $\mathrm{EXP}^{c\sqrt{n}+2}(S') \subseteq S$, so $|\mathrm{EXP}^{c\sqrt{n}+2}(S')| \le |S| \le |V(H(n,w))|/2$. But then $|\mathrm{EXP}^{c\sqrt{n}+2}(S')| > \frac{1}{2}e^{2c^2}|S'| \ge |S|$, which implies that $\mathrm{EXP}^{c\sqrt{n}+2}(S') \not\subseteq S$. This is a contradiction, so we obtain the desired statement. □
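To illustrate Theorem 4 concretely, the sketch below brute-forces the "deep interior" of a set in the 10-dimensional Boolean hypercube $H(10, 2)$ and compares its relative size against the $2e^{-2c^2}$ bound. The choice of $S$ (a Hamming ball of radius 4) is ours, made only so that $|S| \le |V|/2$:

```python
from itertools import product
from math import exp, sqrt

n = 10
vertices = list(product((0, 1), repeat=n))      # V(H(n, 2)), the Boolean n-cube
S = [v for v in vertices if sum(v) <= 4]        # a Hamming ball; |S| <= |V|/2
outside = [v for v in vertices if sum(v) > 4]

def dist(u, v):
    # Graph distance on H(n, 2) equals Hamming distance.
    return sum(a != b for a, b in zip(u, v))

def interior_fraction(radius: int) -> float:
    """Fraction of S whose every path of <= radius edges stays inside S."""
    deep = [u for u in S if min(dist(u, v) for v in outside) > radius]
    return len(deep) / len(S)

c = 0.5
frac = interior_fraction(int(c * sqrt(n) + 2))  # paths of <= c*sqrt(n)+2 edges
bound = 2 * exp(-2 * c ** 2)                    # Theorem 4's upper bound
```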

A.1.3 PROVING THEOREM 1

All that remains is to massage Theorem 4 into the form of Theorem 1. Let $C \subseteq I_{n,q,h,(b)}$ be any interesting class.

Lemma 5. $C$ is not $2e^{-2c^2}$-robust to $L_0$-perturbations of size $c\sqrt{qh}\,n + 2$.

Proof. This is a straightforward corollary of Theorem 4, since we can construct a Hamming graph out of $I_{n,q,h,(b)}$ as shown in Figure 1 in the main text. In detail: let $M : V(H(n^2qh, 2^b)) \to I_{n,q,h,(b)}$ be the following bijection. First, let $Q$ be a set of $2^b$ equally spaced values between 0 and 1, where the smallest value is 0 and the largest is 1. The elements of $V(H(n^2qh, 2^b))$ can then be viewed as $Q^{n^2qh}$. We then map elements from $Q^{n^2qh}$ to $I_{n,q,h,(b)}$ such that the inverse operation is a flattening of the image tensor. Note that such a mapping preserves graph distance on $V(H(n^2qh, 2^b))$ as Hamming distance on $I_{n,q,h,(b)}$.

Let $C' \subseteq C$ be the set of images that are robust to $L_0$-perturbations of size $c\sqrt{qh}\,n + 2$. Let $S = M^{-1}(C)$ and $S' = M^{-1}(C')$. $S'$ is then the set of vertices for which no path with $c\sqrt{qh}\,n + 2$ edges or fewer leads to a vertex outside of $S$. $C$ is an interesting class and $M(\cdot)$ preserves cardinality due to being a bijection, so $|S| \le |V(H(n^2qh, 2^b))|/2$. By Theorem 4, we have $|S'|/|S| < 2e^{-2c^2}$. Again, since $M(\cdot)$ preserves cardinality, this implies that $|C'|/|C| < 2e^{-2c^2}$, which means that $C$ is not $2e^{-2c^2}$-robust to $L_0$-perturbations of size $c\sqrt{qh}\,n + 2$. □

We remark that if the domain of $M(\cdot)$ is changed to $H(qn^2, h2^b)$, the above argument also shows that $C$ is not $2e^{-2c^2}$-robust to $c\sqrt{q}\,n + 2$ pixel changes.

It is straightforward to generalize this to $p$-norms with larger $p$.

Lemma 6. $C$ is not $2e^{-2c^2}$-robust to $L_p$-perturbations of size $(c\sqrt{qh}\,n + 2)^{1/p}$.

Proof. Let $S_1$ be the set of images that are $r$-robust to $L_0$-perturbations of size $d$, and let $S_2$ be the set of images that are $r$-robust to $L_p$-perturbations of size $d^{1/p}$. Suppose $I \notin S_1$.
Then there exists some image $I'$ in a different class from $I$ such that $\|I - I'\|_0 \le d$. Therefore, for all $p > 0$, we have:

$$d \ge \|I - I'\|_0 \quad (27)$$
$$= \sum_{x,y,c} \lceil |I_{x,y,c} - I'_{x,y,c}| \rceil \quad (28)$$
$$\ge \sum_{x,y,c} |I_{x,y,c} - I'_{x,y,c}|^p \quad (29)$$
$$= (\|I - I'\|_p)^p \quad (30)$$

where the second and third relations follow from the fact that channel values are contained in $[0,1]$. Therefore, $I \notin S_2$ either, since $\|I - I'\|_p \le d^{1/p}$. Taking the contraposition yields $S_2 \subseteq S_1$. Setting $d = c\sqrt{qh}\,n + 2$ and applying Lemma 5 gives the desired result. □

A.2 PROOF OF THEOREM 2

A.2.1 ANTI-CONCENTRATION INEQUALITIES

We first prove an anti-concentration lemma concerning the binomial distribution.

Lemma 7. Let $X$ be a random variable following the binomial distribution with $n$ trials and a probability of success of $0.5$. Let $Y$ be a discrete random variable independent of $X$ whose distribution is symmetric about the origin. Then for any $t$ where $t < E[X]$ and $t - \lfloor t \rfloor = 1/2$, we have:

$$\Pr(X + Y \le t) \ge \Pr(X < t) \quad (31)$$

Proof. We have the following:

$$\Pr(X + Y \le t) = \Pr(X + Y \le t,\ X < t) + \Pr(X + Y \le t,\ X > t) \quad (32)$$
$$\Pr(X < t) = \Pr(X + Y \le t,\ X < t) + \Pr(X + Y > t,\ X < t) \quad (33)$$

Therefore it suffices to show that $\Pr(X + Y \le t,\ X > t) \ge \Pr(X + Y > t,\ X < t)$. We have for any $r \ge 0$:

$$\Pr(X + Y \le t,\ X = t + r) = \Pr(Y \le -r)\Pr(X = t + r) \quad (34)$$
$$\ge \Pr(Y > r)\Pr(X = t + r) \quad (35)$$
$$\ge \Pr(Y > r)\Pr(X = t - r) \quad (36)$$
$$= \Pr(X + Y > t,\ X = t - r) \quad (37)$$

where Equation 34 follows from the independence of $X$ and $Y$, Equation 35 follows from the symmetry of the distribution of $Y$, and Equation 36 follows from our assumption that $t < E[X]$ and $t - \lfloor t \rfloor = 1/2$. Summing over all positive $r$ for which $\Pr(X = t + r) \ge 0$ yields the desired result. □

Lemma 8. Let $X_1, X_2, \ldots, X_n$ be independently and identically distributed random variables such that each $X_i$ is uniformly distributed on $2k$ evenly spaced real numbers $a = r_1 < r_2 < \cdots < r_{2k} = b$. Then for $t > 0$, we have:

$$\Pr\left(\sum_{i=1}^n X_i \le \left(\sum_{i=1}^n E[X_i]\right) - t + (b-a)\right) > \frac{1}{2} - \frac{2t}{\sqrt{n}\,(b-a)}$$

Proof. Let $Y_1, Y_2, \ldots, Y_n$ be independently and identically distributed Bernoulli random variables with $p = 0.5$. Let $Z_1, Z_2, \ldots, Z_n$ be independently and identically distributed random variables uniformly distributed on the integers between 1 and $k$ inclusive.
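Lemma 8 can be illustrated by simulation. The sketch below (with arbitrarily chosen parameters $n$, $k$, $a$, $b$, $t$ of our own) estimates the left-hand probability by Monte Carlo and compares it against the stated lower bound:

```python
import random
from math import sqrt

random.seed(0)

def lemma8_estimate(n, k, a, b, t, trials=10000):
    """Monte Carlo estimate of Pr(sum X_i <= sum E[X_i] - t + (b - a)),
    where each X_i is uniform on 2k evenly spaced values in [a, b]."""
    vals = [a + (b - a) * i / (2 * k - 1) for i in range(2 * k)]
    mean_sum = n * (a + b) / 2          # sum of E[X_i], by symmetry
    hits = sum(
        sum(random.choice(vals) for _ in range(n)) <= mean_sum - t + (b - a)
        for _ in range(trials))
    return hits / trials

n, k, a, b, t = 100, 4, 0.0, 1.0, 2.0
estimate = lemma8_estimate(n, k, a, b, t)
lower_bound = 0.5 - 2 * t / (sqrt(n) * (b - a))   # Lemma 8's guarantee
```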
If the $Y$s and $Z$s are independent of each other as well, we have:

$$\sum_{i=1}^n (X_i - E[X_i]) = \frac{b-a}{2k-1} \sum_{i=1}^n \left(kY_i + Z_i - E[kY_i + Z_i]\right) \quad (39)$$
$$= k\,\frac{b-a}{2k-1}\left( \sum_{i=1}^n Y_i + \sum_{i=1}^n \frac{Z_i - E[Z_i]}{k} - \sum_{i=1}^n E[Y_i] \right) \quad (40)$$

Let $B = \sum_{i=1}^n Y_i$, $D = \sum_{i=1}^n \frac{Z_i - E[Z_i]}{k}$, and $c = k\frac{b-a}{2k-1}$. Then for any $t > 0$, we have:

$$\Pr\left(\sum_{i=1}^n (X_i - E[X_i]) \le -t\right) = \Pr\left(B + D \le -\frac{t}{c} + E[B]\right) \quad (41)$$
$$\ge \Pr\left(B + D \le -\frac{t}{c} + E[B] - u\right) \quad (42)$$
$$\ge \Pr\left(B < -\frac{t}{c} + E[B] - 1\right) \quad (43)$$
$$\ge \Pr\left(B - E[B] < -\frac{2t}{b-a} - 1\right) \quad (44)$$
$$\ge \frac{1}{2} - \Pr\left(B - E[B] \in \left[-\frac{2t}{b-a} - 1,\ 0\right]\right) \quad (45)$$
$$\ge \frac{1}{2} - \binom{n}{\lfloor n/2 \rfloor} 2^{-n} \left(\frac{2t}{b-a} + 2\right) \quad (46)$$

where $1 \ge u \ge 0$ is chosen such that $-\frac{t}{c} + E[B] - u$ is the average of two adjacent integers. Equation 43 is then an application of Lemma 7, since $B$ is binomially distributed with $p = 0.5$ and $D$ has a distribution that is symmetric about the origin, and Equation 46 follows from the fact that no more than $x + 1$ values are supported on an interval of length $x$, and no supported value has probability greater than $\binom{n}{\lfloor n/2 \rfloor} 2^{-n}$. Observing that $\binom{n}{\lfloor n/2 \rfloor} 2^{-n} < \frac{1}{\sqrt{n}}$ due to Lemma 1 and substituting $t$ with $t - (b-a)$ yields the desired result. □

A.2.2 PROVING THEOREM 2

Let $A : I_{n,q,h,(b)} \to \{0,1\}$ be the classifier described by Algorithm 1. In other words, it is the classifier that takes an image, sums all of its channels, and outputs 0 if the sum is less than $n^2qh/2$ and 1 otherwise. Let $Z$ be the class of images on which $A$ outputs 0. Note that $Z$ is an interesting class since it cannot be larger than its complement, so it suffices to prove that $Z$ is robust.

Lemma 9. $Z$ is $(1-4c)$-robust to $L_1$-perturbations of size $c\sqrt{qh}\,n - 2$.

Proof. Let $Z' \subseteq Z$ be the set of images in $Z$ that are robust to $L_1$-perturbations of size $c\sqrt{qh}\,n - 2$. Let $I$ be a random image sampled uniformly. Then $|Z'| = \Pr(I \in Z')\,2^{n^2qhb}$. We then have the following:

$$\Pr(I \in Z') = \Pr\left(\sum_{x,y,a} I_{x,y,a} + c\sqrt{qh}\,n - 2 < n^2qh/2\right) \quad (47)$$
$$\ge \Pr\left(\sum_{x,y,a} I_{x,y,a} \le n^2qh/2 - c\sqrt{qh}\,n + 1\right) \quad (48)$$
$$> \frac{1}{2} - 2c \quad (49)$$

where the last inequality follows from Lemma 8, since each channel is sampled from a uniform distribution over a set of $2^b$ evenly spaced values between 0 and 1. Noting that $|Z| \le 2^{n^2qhb - 1}$, since $Z$ cannot be larger than its complement, yields $\frac{|Z'|}{|Z|} \ge 1 - 4c$. Therefore, $Z$ is $(1-4c)$-robust to $L_1$-perturbations of size $c\sqrt{qh}\,n - 2$. □

Lemma 10. $Z$ is $(1-4c)$-robust to $L_0$-perturbations of size $c\sqrt{qh}\,n - 2$.

Proof. It suffices to show that an image that is robust to $L_1$-perturbations of size $d$ is also robust to $L_0$-perturbations of size $d$, since the statement then follows directly from Lemma 9. Let $I$ be an image that is not robust to $L_0$-perturbations of size $d$, so there exists some $I'$ in a different class such that $\|I - I'\|_0 \le d$. Then:

$$d \ge \|I - I'\|_0 \quad (50)$$
$$= \sum_{x,y,a} \lceil |I_{x,y,a} - I'_{x,y,a}| \rceil \quad (51)$$
$$\ge \sum_{x,y,a} |I_{x,y,a} - I'_{x,y,a}| \quad (52)$$
$$= \|I - I'\|_1 \quad (53)$$

where the second relation holds since channel values lie in $[0,1]$. This implies that $I$ is not robust to $L_1$-perturbations of size $d$. Therefore any image that is not robust to $L_0$-perturbations of size $d$ is also not robust to $L_1$-perturbations of size $d$; the contraposition yields the desired statement. □

Lemma 11. $Z$ is $(1-4c)$-robust to $L_p$-perturbations of size $\frac{(c\sqrt{qh}\,n - 2)^{1/p}}{(2^b - 1)^{(p-1)/p}}$ for $p \ge 2$.

Proof.
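The mechanism behind Lemma 9 is simple enough to demonstrate empirically: an $L_1$-perturbation of size $d$ can move the channel sum by at most $d$, so any image in $Z$ whose sum sits more than $d$ below the threshold is robust. The sketch below (with a hypothetical small image space; $n$, $q$, $h$, $b$, and $c$ are our own illustrative choices) measures the fraction of such images among uniformly drawn members of $Z$:

```python
import random
from math import sqrt

random.seed(1)

n, q, h, b = 16, 1, 3, 8                 # hypothetical small image space
N = n * n * q * h                        # number of channel values per image
levels = [i / (2 ** b - 1) for i in range(2 ** b)]

c = 0.1
d = c * sqrt(q * h) * n - 2              # L1 budget from Lemma 9

trials, in_class, robust = 5000, 0, 0
for _ in range(trials):
    img_sum = sum(random.choice(levels) for _ in range(N))
    if img_sum < N / 2:                  # Algorithm-1-style classifier outputs 0
        in_class += 1
        if img_sum + d < N / 2:          # sufficient condition for L1-robustness
            robust += 1

robust_fraction = robust / in_class
lower_bound = 1 - 4 * c                  # Lemma 9 guarantees at least this much
```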
It suffices to show that any image that is robust to $L_1$-perturbations of size $d$ is also robust to $L_p$-perturbations of size $d^{1/p}/(2^b-1)^{(p-1)/p}$ for any $p \ge 2$, since the statement then follows directly from Lemma 9. Let $I \in I_{n,q,h,(b)}$ be an image that is robust to $L_1$-perturbations of size $d$. Let $I'$ be any image in a different class, so $\|I - I'\|_1 > d$. Then for any $p \ge 1$:

$$\|I - I'\|_p^p = \sum_{x,y,a} |I_{x,y,a} - I'_{x,y,a}|^p \quad (54)$$
$$= \sum_{x,y,a} \frac{\left((2^b-1)\,|I_{x,y,a} - I'_{x,y,a}|\right)^p}{(2^b-1)^p} \quad (55)$$
$$\ge \sum_{x,y,a} \frac{\left((2^b-1)\,|I_{x,y,a} - I'_{x,y,a}|\right)^1}{(2^b-1)^p} \quad (56)$$
$$= \frac{(2^b-1)\,\|I - I'\|_1}{(2^b-1)^p} \quad (57)$$
$$> \frac{d}{(2^b-1)^{p-1}} \quad (58)$$

where the third relation follows from the fact that if two channel values differ, they must differ by at least $\frac{1}{2^b-1}$, so each term $(2^b-1)|I_{x,y,a} - I'_{x,y,a}|$ is either 0 or at least 1. Therefore, $\|I - I'\|_p > d^{1/p}/(2^b-1)^{(p-1)/p}$ for any $I'$ whose class differs from that of $I$, so $I$ is robust to $L_p$-perturbations of size $d^{1/p}/(2^b-1)^{(p-1)/p}$ for $p \ge 2$. □
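The key step above is the inequality $\|v\|_p^p \ge \|v\|_1/(2^b-1)^{p-1}$ for difference vectors whose entries are multiples of $1/(2^b-1)$. A quick randomized check (bit depth, $p$, and vector length chosen arbitrarily by us):

```python
import random

random.seed(2)

b, p = 4, 3
m = 2 ** b - 1                           # entries are multiples of 1/m
levels = [i / m for i in range(m + 1)]

def inequality_holds(v) -> bool:
    """Check ||v||_p^p >= ||v||_1 / (2**b - 1)**(p - 1) for quantized v."""
    lp_p = sum(abs(x) ** p for x in v)
    l1 = sum(abs(x) for x in v)
    return lp_p >= l1 / m ** (p - 1) - 1e-12

checks = [inequality_holds([random.choice(levels) - random.choice(levels)
                            for _ in range(50)])
          for _ in range(1000)]
```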

A.3 PROOF OF THEOREM 3

We fix arbitrary $n$, $q$, $h$, and $b$. For simplicity, we define $N = n^2qh$ as the data dimension. Let $C \subseteq I_{n,q,h,(b)}$ be any interesting class. Our objective is to show that $C$ is not robust to various perturbations. For the first part of the proof, we construct a grid over the unit hypercube and then map $I_{n,q,h,(b)}$ to cells in this grid while preserving distances up to a constant factor.

• Let $T$ be a set of $2^b$ disjoint intervals of equal length whose union is the interval $[0,1]$ (specifically, $T = \{[x \cdot 2^{-b}, (x+1) \cdot 2^{-b}) \mid x \in \mathbb{Z} \cap [0, 2^b - 2]\} \cup \{[1 - 2^{-b}, 1]\}$).

• Let $T^N$ be the $N$th Cartesian power of $T$. This forms a partition of the unit hypercube $[0,1]^N$, since the elements of $T^N$ are disjoint and their union is precisely the hypercube. Note that each element of the partition has equal measure.

• We can associate each element of $I_{n,q,h,(b)}$ with an element of $T^N$. We first map $I_{n,q,h,(b)}$ to $[0,1]^N$ by flattening the image tensor (which we denote by $\flat(I)$ for an image $I \in I_{n,q,h,(b)}$). We then map that point to the element of $T^N$ it falls within. The overall mapping is bijective, and we will denote it by $F$.

This completes the construction. To recap: each element of $I_{n,q,h,(b)}$ is now associated to a cell in $T^N$ via $F : I_{n,q,h,(b)} \to T^N$, where $T^N$ is a partition of the unit hypercube into cells of equal measure.

In the next part of the proof we define an algorithm that is able to find small perturbations. We then show that this algorithm succeeds with high probability, which proves the statement. Let $A : [0,1]^N \times \mathbb{R} \to [0,1]^N \cup \{\bot\}$ be a partial function that maps a point $p_1$ and a real value $c$ to a point $p_2$ such that the following hold:

1. $\|p_1 - p_2\|_2 \le c$.

2. Let $I_1, I_2 \in I_{n,q,h,(b)}$ be such that $p_1 \in F(I_1)$ and $p_2 \in F(I_2)$. Then we require that $I_1 \in C \implies I_2 \notin C$.

$A(\cdot)$ returns $\bot$ if and only if no such $p_2$ exists.
We can then define a procedure FINDPERTURBATION for finding a perturbation given an image $I$, which is outlined in Algorithm 2.

Algorithm 2: Find Perturbation
Input: An image $I \in C$ and a real value $c$.
Result: An image $I' \in I_{n,q,h,(b)}$ such that $I' \notin C$, or $\bot$.
  Sample $p_1$ from $F(I)$ uniformly at random;
  $p_2 \leftarrow A(p_1, c)$;
  if $p_2 = \bot$ then return $\bot$;
  else find $I_2 \notin C$ such that $p_2 \in F(I_2)$ and return $I_2$;

Our proof strategy is to show that the perturbations found by FINDPERTURBATION are guaranteed to be small, and that the probability of failure is low. This must then imply that most images are not robust.

Lemma 12. If $I' = \text{FINDPERTURBATION}(I, c)$ is not $\bot$, then $\|I - I'\|_2 \le c + \frac{2\sqrt{N}}{2^b}$.

Proof. Each element of $T^N$ has a diameter of $\frac{\sqrt{N}}{2^b}$, so $p_1$ differs from $\flat(I)$ by at most that distance. Similarly, $p_2$ differs from $\flat(I_2) = \flat(I')$ by at most that distance. We must also have $\|p_1 - p_2\|_2 \le c$ since $I' \ne \bot$. Putting it all together with the triangle inequality, we get $\|\flat(I) - \flat(I')\|_2 \le c + \frac{2\sqrt{N}}{2^b}$. Since $\flat(\cdot)$ preserves distances, we get the desired statement. □

If the input $I$ is drawn uniformly from $C$, then $p_1$ is distributed uniformly over the union of the cells in $F(C)$. The procedure fails if and only if $A(p_1, c) = \bot$, which happens if and only if all points within a radius of $c$ from $p_1$ belong to cells of $F(C)$. Let $C'$ denote the set of all such points.
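A toy instantiation of FINDPERTURBATION may make the construction easier to follow. Below, the class, the oracle $A$, and all parameters are hypothetical stand-ins of our own choosing: the "class" consists of the grid cells whose first coordinate index is below $2^{b-1}$, and the oracle simply steps along the first axis:

```python
import random

random.seed(3)

b, N = 4, 8                              # hypothetical bit depth and dimension
m = 2 ** b                               # 2**b intervals per axis

def cell_of(p):
    """Index of the T^N grid cell containing the point p."""
    return tuple(min(int(x * m), m - 1) for x in p)

def in_C(cell) -> bool:
    # Hypothetical interesting class: first coordinate index below 2**(b-1).
    return cell[0] < m // 2

def oracle_A(p1, c):
    """Toy oracle: try one step of L2 length c along the first axis; return
    the new point if its cell leaves C, and None (playing bottom) otherwise."""
    p2 = list(p1)
    p2[0] = min(p1[0] + c, 1.0)
    return p2 if not in_C(cell_of(p2)) else None

def find_perturbation(img_cell, c):
    """Algorithm 2: sample p1 uniformly from the cell F(I), query the oracle,
    and map the answer back to a cell (i.e. an image) outside C."""
    p1 = [(i + random.random()) / m for i in img_cell]
    p2 = oracle_A(p1, c)
    return None if p2 is None else cell_of(p2)

boundary_cell = (m // 2 - 1,) + (0,) * (N - 1)   # adjacent to the class boundary
deep_cell = (0,) * N                             # far inside the class
result_boundary = find_perturbation(boundary_cell, 0.2)
result_deep = find_perturbation(deep_cell, 0.2)
```

Near the boundary the procedure succeeds; deep inside the class this particular one-step oracle returns bottom, mirroring the failure case analysed below.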

A.4 PROOF OF THEOREM 5

Our objective in this section is to complete the proof of Theorem 3 by proving Theorem 5, stated below. We will use $\mu(\cdot)$ to denote the Lebesgue measure throughout this section.

Definition 4. We say a set $S \subseteq [0,1]^n$ is a regular set if there exists some finite set $T$ such that $S = \bigcup_{t \in T} t$ and $T$ consists of elements that are Cartesian products of $n$ intervals, each either open or closed.

By this definition, the sets defined in the proof of Lemma 13 are regular, so the following theorem is applicable.

Theorem 5. Let $S \subseteq [0,1]^n$ be a regular set such that $\mu(S) \le 1/2$. Let $S_r \subseteq S$ contain all the points $x \in S$ such that for all $y \in [0,1]^n$, $\|x - y\|_2 \le r \implies y \in S$. Then $\frac{\mu(S_r)}{\mu(S)} < 2e^{-r^2/2}$.

A.4.1 PROPERTIES OF THE STANDARD NORMAL DISTRIBUTION

First, we define the cumulative distribution function of the standard normal distribution and its derivative:

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt \quad (68)$$
$$\Phi'(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \quad (69)$$

Similarly to the discrete case, the ratio of the cumulative distribution functions is monotonically increasing.

Lemma 16. $\frac{\Phi(x-k)}{\Phi(x)}$ is monotonically increasing in $x$ for all $k \ge 0$.

Proof. Let $f(x) = \dfrac{e^{-x^2/2}}{\int_{-\infty}^{x} e^{-t^2/2}\,dt}$. Then:

$$\frac{d}{dx} f(x) = \frac{-x\,e^{-x^2/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt - e^{-x^2/2}\,e^{-x^2/2}}{\left(\int_{-\infty}^{x} e^{-t^2/2}\,dt\right)^2} \quad (70)$$

When $x \ge 0$, this derivative is negative since both terms in the numerator are negative. If $x < 0$, we have the following:

$$-x \int_{-\infty}^{x} e^{-t^2/2}\,dt < -x \int_{-\infty}^{x} \left(1 + \frac{1}{t^2}\right) e^{-t^2/2}\,dt \quad (71)$$
$$= -x \left[ -\frac{1}{t} e^{-t^2/2} \right]_{-\infty}^{x} \quad (72)$$
$$= e^{-x^2/2} \quad (73)$$

So the numerator is strictly smaller than $(e^{-x^2/2})^2 - (e^{-x^2/2})^2 = 0$. Therefore, the derivative is everywhere negative, so $f(x)$ is strictly decreasing. Therefore, we have the following for any non-negative $k$:

$$\frac{d}{dx} \ln\left(\frac{\Phi(x-k)}{\Phi(x)}\right) = f(x-k) - f(x) \ge 0$$

Since $\ln(\cdot)$ is a monotonically increasing function, $\frac{\Phi(x-k)}{\Phi(x)}$ must also be monotonically increasing. □

Proof. Let $z = \Phi^{-1}(\mu(C))$. Then for any $c \ge 0$,

$$\frac{\mu(C_c)}{\mu(C)} \le \frac{\Phi(z-c)}{\Phi(z)} \le \frac{\Phi(1/2-c)}{\Phi(1/2)} < 2e^{-c^2/2} \quad (83)$$

where the first inequality follows from Lemma 18, the second inequality follows from Lemma 16 and the fact that $\mu(C) \le 1/2$, and the third inequality follows from the Gaussian tail bound $\Phi(x) < e^{-x^2/2}$ for all $x \le 1/2$. □
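Lemma 16 has the same flavour as Lemma 2, and can be sanity-checked numerically with the standard library's error function (an illustration only, with our own choice of grid and offsets):

```python
from math import erf, sqrt

def Phi(x: float) -> float:
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def ratio_monotone(k: float, lo=-5.0, hi=5.0, steps=400) -> bool:
    """Check Lemma 16: x -> Phi(x - k) / Phi(x) is non-decreasing."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ratios = [Phi(x - k) / Phi(x) for x in xs]
    return all(a <= b + 1e-12 for a, b in zip(ratios, ratios[1:]))

checks = [ratio_monotone(k) for k in (0.5, 1.0, 2.0)]
```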

A.5 AVERAGE DISTANCE BETWEEN IMAGES

We wish to show that for a pair of images $I, I' \in I_{n,q,h,(b)}$ sampled independently and uniformly, there exists a $k_{b,p}$ such that:

$$E[\|I - I'\|_p] \ge k_{b,p}\, N^{1/\max(1,p)}$$

If $p = 0$, then setting $k_{b,p} = 1 - 2^{-b}$ makes the relation hold with equality, so we are done. Otherwise, we note that:

$$E[\|I - I'\|_p^{\max(1,p)}] = N \cdot E[|X - Y|^{\max(1,p)}]$$

where $X$ and $Y$ are independent random variables, each drawn uniformly from a set of $2^b$ equally spaced values of which the largest is 1 and the smallest is 0. For simplicity, we denote $E[|X - Y|^{\max(1,p)}]$ by $z$. $\|I - I'\|_p^{\max(1,p)}$ is non-negative and cannot be larger than $N$. Therefore, the probability that $\|I - I'\|_p^{\max(1,p)} \ge Nz/2$ is at least $\frac{z}{2-z}$. Via a monotonicity argument, we can deduce that the probability that $\|I - I'\|_p \ge (z/2)^{1/\max(p,1)} N^{1/\max(p,1)}$ is at least $\frac{z}{2-z}$ as well. We can then apply Markov's inequality to get the following:

$$E[\|I - I'\|_p] \ge \frac{z}{2-z}\,(z/2)^{1/\max(p,1)}\, N^{1/\max(p,1)}$$

By setting $k_{b,p} = \frac{z}{2-z}(z/2)^{1/\max(p,1)}$, we attain our desired result.
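The lower bound above leaves a lot of slack; a Monte Carlo estimate shows how the actual average distance compares with $k_{b,p} N^{1/\max(1,p)}$. The parameters below (a flattened 32x32 RGB space with $b = 8$ and $p = 2$) are our own illustrative choices:

```python
import random

random.seed(4)

b, p, N = 8, 2.0, 3072                   # e.g. a flattened 32x32 RGB image
levels = [i / (2 ** b - 1) for i in range(2 ** b)]

# z = E|X - Y|^p for X, Y independent and uniform on the quantized levels.
z = sum(abs(x - y) ** p for x in levels for y in levels) / len(levels) ** 2
k_bp = z / (2 - z) * (z / 2) ** (1 / p)  # the constant derived above

def avg_lp_distance(trials=300) -> float:
    """Monte Carlo estimate of E||I - I'||_p for uniform image pairs."""
    total = 0.0
    for _ in range(trials):
        s = sum(abs(random.choice(levels) - random.choice(levels)) ** p
                for _ in range(N))
        total += s ** (1 / p)
    return total / trials

estimate = avg_lp_distance()
lower_bound = k_bp * N ** (1 / p)
```

The estimate lands near $\sqrt{Nz} \approx 22.7$, comfortably above the guaranteed lower bound of roughly $1.5$, consistent with the bound being loose but of the right order in $N$.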


The above analysis shows that the average distance between images over the entire image space is large, but it does not preclude the possibility that the average distance between images from a distribution of natural images is small. To investigate this, we computed average distances between images in Imagenette, a subset of ImageNet consisting of 10 classes (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute) (Howard). In detail, we took all the images within the training set of Imagenette with the shorter side resized to 320 pixels and discarded all images that were not in RGB, resulting in 9296 images. Each image had a bit depth of 8. We then cropped the images such that only the top-left 320x320 pixels remained. We then computed the average distance measured in $p$-norms where $p$ ranged from 1 to 5:

$$\frac{\sum_{x \in D} \sum_{y \in D} \|x - y\|_p}{|D|^2}$$

where $D$ is the set of 320x320 images. We also subsampled the images to 160x160, 80x80, 40x40, and 20x20 by taking the top-left pixel of each region as the representative sample (see Figure 3) and computed the average distances between them too. The results are presented in Figure 4, and indicate that, if we believe the Imagenette dataset is sufficiently representative of natural image distributions, the average distances between images drawn from natural distributions are not dramatically lower than those between images drawn uniformly. The average distances between uniformly drawn images were each approximated with 200000 pairs of randomly drawn images, except for the 1-norm, where a closed-form solution exists: for a pair of $n \times n$ images with $h$ channels and bit depth $b$, the average distance between all pairs of images, measured with the 1-norm, is $c_b n^2 h$, where $c_b$ is given by:

$$c_b = \frac{\sum_{i=0}^{2^b-1} \sum_{j=0}^{2^b-1} |i - j|}{(2^b - 1)\, 2^{2b}}$$

Figure 3: Subsampling 320x320 images to smaller sizes.
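The closed form for the 1-norm is cheap to evaluate. The helper below computes $c_b$ directly from its definition; for instance, for $b = 8$ it simplifies to $257/768 \approx 0.335$, so the average $L_1$ distance between uniformly drawn 32x32 RGB images is about $0.335 \cdot 32^2 \cdot 3 \approx 1028$:

```python
def c_b(b: int) -> float:
    """Average per-channel absolute difference at bit depth b:
    c_b = (sum_{i,j} |i - j|) / ((2**b - 1) * 2**(2*b))."""
    m = 2 ** b
    total = sum(abs(i - j) for i in range(m) for j in range(m))
    return total / ((m - 1) * m ** 2)

# Average L1 distance between two uniformly drawn n x n, h-channel images
# is c_b * n * n * h; here n = 32, h = 3.
avg_l1_32_rgb = c_b(8) * 32 * 32 * 3
```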

A.6 PLOTS OF THE RELATION BETWEEN ROBUSTNESS AND PERTURBATION SIZE

We plotted the curves defined by the upper bounds in Table 1 to facilitate interpretation. The curves are plotted in Figure 5. $h$, $b$, and $q$ have been fixed at 3, 8, and 1 respectively, so the bounds apply to square RGB images where each channel has a bit depth of 8. Perturbation sizes are plotted on the y-axis, and the corresponding upper bound on the achievable robustness is plotted on the x-axis on a logarithmic scale. Sizes are measured in 0-norms, 1-norms, and 2-norms, and we plot these curves for n = 32, n = 256, and n = 1024.

B APPENDIX

B.1 ADAPTING LEMMA 3 FROM (HARPER, 1999)

In this section we show how we adapt Theorem 3 from (Harper, 1999) in its exact form into the form given in Lemma 3.

B.1.1 PRELIMINARIES

Some additional terminology is required to parse Theorem 3 from (Harper, 1999) in its original form. We note that the terminology concerns elements of the unit hypercube $[0,1]^d$. $H(x,y)$ is used to denote the Hamming distance between two elements $x, y \in [0,1]^d$, in other words the number of coordinates at which they differ. A lower set is a set $S \subseteq [0,1]^d$ such that if $x \in S$, then any vector $y \in [0,1]^d$ with the property that $y_i \le x_i$ for all $i$ where $1 \le i \le d$ is also in $S$. A weighting $t$ is a vector in $(0,1)^d$ that splits the unit hypercube $[0,1]^d$ into $2^d$ distinct regions. The set of regions is $\times_{i=1}^d \{[0, t_i], (t_i, 1]\}$, and $t$ is said to be constant if $t_i = t_j$ for all $i, j$ where $1 \le i \le d$ and $1 \le j \le d$. Here we use $\times$ to denote the Cartesian product.

$HB(r,d)$ with weighting $t$ is a subset of $[0,1]^d$, referred to as a Hamming ball of radius $r$. A Hamming ball of radius $r$ is usually used to denote a subset of $\{0,1\}^d$ that contains all vectors whose entries sum to at most $r$. This concept can naturally be extended to $[0,1]^d$ with weighting $t$: if we define $F_i$ as a map that sends 0 to $[0, t_i]$ and 1 to $(t_i, 1]$, we have:

$$HB(r,d) = \bigcup_{v \in H_r} \times_{i=1}^{d} F_i(v_i)$$

where $H_r$ is a Hamming ball of radius $r$ in the traditional sense.

B.1.2 ADAPTING THE NOTATION

We are now ready to give Theorem 3 as it is stated in (Harper, 1999).

Theorem. $\forall v$, $0 < v < 1$, $\exists r$ such that $HB(r,d)$ minimizes $|\{y \notin S : \exists x \in S,\ H(x,y) \le h\}|$ over all lower sets $S \subseteq [0,1]^d$ with $|S| = v$, and the optimal weighting for $HB(r,d)$ is constant.

To bring the theorem statement closer to our terminology, we have $\mathrm{EXP}^h(S) = S \cup \{y \notin S : \exists x \in S,\ H(x,y) \le h\}$. Since the volume of $S$ is constant, $HB(r,d)$ also minimizes $|\mathrm{EXP}^h(S)|$ over all lower sets $S$ of a given volume. Note that for our expansion notation to make sense here, we act as though $[0,1]^d$ is an infinite graph in which there exists an edge between any pair of vectors that differ at exactly one index. Since the theorem states that the weighting $t$ should be constant, we have an exact form for $HB(r,d)$. Let $p$ denote the common value of all entries of $t$. We can then restate Theorem 3 from (Harper, 1999) in the following form:

Proof. Let $S \subseteq [w]^n$. Let $i$ be an index such that there exist some $x \in S$ and $y \notin S$ with $y_i < x_i$ and $y_j \le x_j$ for all $j$. If no such $i$ exists, then we are done, since $S$ must be a lower set. Group together all vertices that are equal on all coordinates except $i$; each group contains exactly $w$ elements. For each group $G$, reorganize which elements belong to $S$ by moving them to the bottom, and denote the new set by $S^*$. For example, suppose that $G \cap S = \{(1,2,2), (1,2,3), (1,2,5)\}$ and $i = 3$. Then we shuffle those elements downwards so that $G \cap S^* = \{(1,2,1), (1,2,2), (1,2,3)\}$. Note that this action cannot increase the size of the expansion. Given a group $G$, consider the number of its elements that are not in $\mathrm{EXP}^k(S)$, and denote it by $z$. If $z = 0$, then that number clearly cannot decrease after reorganization. If $z \ne 0$, then no vertex that is within $k$ steps of any element of $G$ is a member of a group $G'$ such that $|G' \cap S| > w - z$. Otherwise, $G$ must have more than $w - z$ elements that can be reached by those elements in $k$ steps, which means $G$ must have fewer than $z$ elements that are not in $\mathrm{EXP}^k(S)$.
But then that means $\mathrm{EXP}^k(S^*)$ cannot reach more than $w - z$ elements of $G$, specifically the ones where the $i$th coordinate is at most $w - z$. Therefore, $G$ has at least $z$ elements that are not in $\mathrm{EXP}^k(S^*)$. We can keep iterating this: with each iteration, the sum of all coordinates of all elements of $S$ strictly decreases, and since this sum is positive, the process will eventually terminate, leaving us with a lower set $S'$ of equal size whose expansion is no larger than that of the original set. □

We can map elements of $H(n,w)$ to subsets of $[0,1]^n$ via the following map:

$$F((x_1, x_2, \ldots, x_n)) = \left[\frac{x_1 - 1}{w}, \frac{x_1}{w}\right] \times \left[\frac{x_2 - 1}{w}, \frac{x_2}{w}\right] \times \cdots \times \left[\frac{x_n - 1}{w}, \frac{x_n}{w}\right]$$

This mapping preserves expansion for any set $S$ in the following sense: the second and third relations are properties of the mapping $F$, the fourth relation holds because we are expanding the set of sets that we consider, and the last relation follows from our derivation in the previous section. This gives the statement of Lemma 3, which we restate here for convenience:

Lemma (Isoperimetric Theorem on Hamming graphs). Let $S \subsetneq H(n,w)$. Then:



Goodfellow et al. (2014) demonstrate how linearity can allow imperceptible changes across a large number of dimensions to accumulate into something significant, Gilmer et al. (2018) derive a relation between classification error rate and the distance to the closest misclassification on a specific dataset of concentric spheres, and Tsipras et al. (2018) derive a fundamental tradeoff between accuracy and robustness.

Consisting of at most half the images in the image space.

$f(n) \in O(g(n)) \iff \limsup_{n\to\infty} \frac{f(n)}{g(n)} < \infty$ and $f(n) \in o(g(n)) \iff \lim_{n\to\infty} \frac{f(n)}{g(n)} = 0$.

Other work, such as that of Diochnos et al. (2018), does analyze discrete input spaces, but does not investigate the relation between discretization and robustness. It is still possible to adapt our result to account for image distribution using their techniques. See our discussion for additional details.

We opt to use $qn$ rather than a separate value for image width to suggest that the height and width of an image should have similar magnitudes. This is because our results depend only on the size of the image tensor rather than its shape.

In Einstein notation, the algorithm returns $I_{x,y,a}\mathbb{1}_{x,y,a} \ge n^2qh/2$ on an image tensor $I$, where $\mathbb{1}$ is shaped like $I$ and has 1 at each entry. We spell out the algorithm in pseudocode for clarity.



Figure 2: Small perturbations can be semantically salient. A small perturbation (a), when applied to a uniformly randomly drawn image (b), can add meaning to it (c). Conversely, the meaning present in (c) can be removed with a small perturbation to obtain (b). Such a perturbation can also add information to a natural image (d, e): although the information present in the natural image largely remains after applying the perturbation, new information was still added by the perturbation.

$= (\|I - I'\|_p)^p$ (30), where the second and third relations follow from the fact that channel values are contained in $[0,1]$. Therefore, $I \notin S_2$ either, since $\|I - I'\|_p \le d^{1/p}$. Taking the contraposition yields $S_2 \subseteq S_1$. Setting $d = c\sqrt{qh}\,n + 2$ and applying Lemma 5 gives the desired result.

and $c = k\frac{b-a}{2k-1}$. Then for any $t > 0$, we have:

$= \|I - I'\|_1$ (53), where the second and third relations hold since channel values lie in $[0,1]$.

Lemma 13. If $I$ is drawn uniformly from $C$, then $\Pr(\text{FINDPERTURBATION}(I, c) = \bot) < 2e^{-c^2/2}$.

Proof. Let $F(C)$ denote the image of $C$ under $F$, and let $\bigcup F(C)$ denote the union of all elements of $F(C)$.

Figure 4: Comparison of average distances between images from natural distributions with average distances between uniformly drawn images on various p-norms

$|\{y \notin S : \exists x \in S,\ H(x,y) \le h\}|$ over all lower sets $S \subseteq [0,1]^d$ with $|S| = v$. The optimal weighting for $HB(r,d)$ is constant.

We then have $|HB(r,d)| = U_{d,p}(r)$ and $|\mathrm{EXP}^h(HB(r,d))| = U_{d,p}(r+h)$. Therefore, for any lower set $S$, it must be the case that:

$$|\mathrm{EXP}^h(S)| \ge \min\left\{ |\mathrm{EXP}^h(HB(r,d))| \;\middle|\; t \in (0,1)^d,\ r \in [0, d-h),\ |HB(r,d)| = |S| \right\} \quad (90)$$
$$= \min\left\{ U_{d,p}(r+h) \;\middle|\; r \in [0, d-h),\ p \in (0,1),\ U_{d,p}(r) = |S| \right\} \quad (91)$$

Let $S \subsetneq [0,1]^n$ be a lower set. Then:

$$|\mathrm{EXP}^k(S)| \ge \min\left\{ U_{n,p}(r+k) \;\middle|\; U_{n,p}(r) = |S|,\ p \in (0,1),\ r \in [0, n-k) \right\}$$

Note that we have replaced $d$ with $n$ and $h$ with $k$ to match our notation. This is now nearly in the form of Lemma 3. All that remains is to show that this inequality implies an analogous one for Hamming graphs.

B.1.3 EMBEDDING A HAMMING GRAPH

Let $[w]$ be the set of integers from 1 to $w$ inclusive and let $H(n,w)$ be the Hamming graph with vertex set $[w]^n$. If $S \subseteq [w]^n$, we call it a lower set if $x \in S$ implies that any $y$ with the property that $y_i \le x_i$ for all $i$ where $1 \le i \le n$ is also in $S$.

Lemma 21. For all $S \subseteq [w]^n$, there exists a lower set $S'$ where $|S'| = |S|$ and $|\mathrm{EXP}^k(S')| \le |\mathrm{EXP}^k(S)|$.

Hence $|\mathrm{EXP}^k(S^*) \cap G| \le |\mathrm{EXP}^k(S) \cap G|$, and since every vertex is covered by exactly one group, we have $|\mathrm{EXP}^k(S^*)| \le |\mathrm{EXP}^k(S)|$.

have for any set $S$:

$$\frac{|S|}{|V(H(n,w))|} = \left|\bigcup_{x \in S} F(x)\right| \quad (94)$$

Finally, this mapping also preserves the property of being a lower set. Therefore, for any set $S \subseteq H(n,w)$, we have:

$$\frac{|\mathrm{EXP}^k(S)|}{|V(H(n,w))|} \ge \min\left\{ \frac{|\mathrm{EXP}^k(S')|}{|V(H(n,w))|} \;\middle|\; S' \subseteq H(n,w) \text{ is a lower set},\ |S'| = |S| \right\} \quad (95)$$
$$= \min\left\{ \left|\bigcup_{x \in \mathrm{EXP}^k(S')} F(x)\right| \;\middle|\; S' \subseteq H(n,w) \text{ is a lower set},\ |S'| = |S| \right\} \quad (96)$$
$$= \min\left\{ \left|\mathrm{EXP}^k\left(\bigcup_{x \in S'} F(x)\right)\right| \;\middle|\; S' \subseteq H(n,w) \text{ is a lower set},\ |S'| = |S| \right\} \quad (97)$$
$$\ge \min\left\{ |\mathrm{EXP}^k(S')| \;\middle|\; S' \subseteq [0,1]^n \text{ is a lower set},\ |S'| = \frac{|S|}{|V(H(n,w))|} \right\} \quad (98)$$
$$\ge \min\left\{ U_{n,p}(r+k) \;\middle|\; U_{n,p}(r) = \frac{|S|}{|V(H(n,w))|},\ p \in (0,1),\ r \in [0, n-k) \right\} \quad (99)$$

$$\frac{|\mathrm{EXP}_k(S)|}{|V(H(n, w))|} \ge \min\Big\{U_{n,p}(r + k) \;\Big|\; U_{n,p}(r) = \frac{|S|}{|V(H(n, w))|},\ p \in (0, 1),\ r \in [0, n - k)\Big\}$$


Published as a conference paper at 2023

where $\mu(\cdot)$ denotes the Lebesgue measure. The last inequality comes from Theorem 5, which is given in the next section. The statement applies to any set $S$ formed from a union of elements of $T_N$ whose measure is no larger than $1/2$. $F(C)$ satisfies these criteria since $C$ is an interesting class and the elements of $T_N$ are disjoint and of equal measure, so we attain the desired statement.

Proof. Let $I$ be drawn uniformly from $C$, and let $C_r$ be the set of images that are robust to $L_2$-perturbations of size $c + \frac{2}{2^b}$. If $\textsc{FindPerturbation}(I, c) \neq \bot$, its output witnesses that $I \notin C_r$; by contraposition, $I \in C_r$ implies that $\textsc{FindPerturbation}(I, c) = \bot$. Therefore, writing $I' = \textsc{FindPerturbation}(I, c)$:
$$\frac{|C_r|}{|C|} \le \Pr(I' = \bot)$$
By Lemma 13, $\Pr(I' = \bot) < 2e^{-c^2/2}$. Thus $\frac{|C_r|}{|C|} < 2e^{-c^2/2}$, which yields the desired statement.

Lemma 15. $C$ is not $2e^{-c^2/2}$-robust to $L_p$-perturbations of size $\left(c + \frac{2}{2^b}\right)^{2/p}$.

Proof. We use an argument identical to that of Lemma 6.

Let $S_1$ be the set of images that are $r$-robust to $L_2$-perturbations of size $d$, and let $S_2$ be the set of images that are $r$-robust to $L_p$-perturbations of size $d^{2/p}$, where $p \ge 2$. Suppose $I \notin S_1$. Then there exists some image $I'$ in a different class from $I$ such that $\|I - I'\|_2 \le d$. Since every entry of $I - I'$ has magnitude at most 1, we have $|I_i - I'_i|^p \le |I_i - I'_i|^2$ for each $i$, and therefore, for all $p \ge 2$:
$$\|I - I'\|_p \le \|I - I'\|_2^{2/p} \le d^{2/p}$$
Hence $I \notin S_2$, so $S_2 \subseteq S_1$.

Similarly to the discrete case, our main result relies on an isoperimetry statement, this time on the unit hypercube (Barthe & Maurey, 2000).

Lemma 17 (Isoperimetric Theorem on the Unit Hypercube). For any $n$, let $A \subseteq [0, 1]^n$ be a Borel set, and let $A_\epsilon = \{x \in [0, 1]^n \mid \exists x' \in A : \|x - x'\| \le \epsilon\}$. Then we have the following:
$$\mu(A_\epsilon) \ge \Phi\big(\Phi^{-1}(\mu(A)) + \epsilon\big)$$

Proof. Let $z = \Phi^{-1}(\mu(C))$ and let $f(x) = \Phi(x + z)$. Let $v(\cdot)$ be a Lebesgue integrable function whose integral defines $V(x)$; such a function exists since $C$ is a regular set. Since $V(x)$ results from integration, it is also a continuous function.

It then suffices to show that $V(x) \le f(x)$ for all $x$, since $V(x)$ corresponds to the left-hand side of the theorem statement and $f(x)$ corresponds to the right-hand side. Suppose this is not the case.
We know that V (x) ≤ f (x) for all x ≥ 0, so if this is violated it must happen when x < 0. Since V ( This gives us the following:Where Z is the set of values where the limit in Equation 78 is not equal to v(t), which by the Lebesgue differentiation theorem is a set of measure 0. Equation 80 is an application of Lemma 17, which is applicable since C -t is a Borel set because C is a regular set. Equation 81 follows from the fact that f (x) ≤ V (x) for all x ∈ [a, b] and the fact that Φ ′ (Φ -1 (.)) is monotonically increasing if the input is no greater than 1/2.We also have V (a) > f (a) and V (b) = f (b), so it must be the case that V (b) -V (a) < f (b) -f (a). This contradicts the above, so it must be the case that V (x) ≤ f (x) for all x.Lemma 19. µ(C c ) < 2e -c 2 /2 µ(C)

