IMPROVED ESTIMATION OF CONCENTRATION UNDER ℓp-NORM DISTANCE METRICS USING HALF SPACES

Abstract

Concentration of measure has been argued to be the fundamental cause of adversarial vulnerability. Mahloujifar et al. (2019b) presented an empirical way to measure the concentration of a data distribution using samples, and employed it to find lower bounds on intrinsic robustness for several benchmark datasets. However, it remains unclear whether these lower bounds are tight enough to provide a useful approximation for the intrinsic robustness of a dataset. To gain a deeper understanding of the concentration of measure phenomenon, we first extend the Gaussian Isoperimetric Inequality to non-spherical Gaussian measures and arbitrary ℓp-norms (p ≥ 2). We leverage these theoretical insights to design a method that uses half spaces to estimate the concentration of any empirical dataset under ℓp-norm distance metrics. Our proposed algorithm is more efficient than Mahloujifar et al. (2019b)'s, and our experiments on synthetic datasets and image benchmarks demonstrate that it is able to find much tighter intrinsic robustness bounds. These tighter estimates provide further evidence that rules out intrinsic dataset concentration as a possible explanation for the adversarial vulnerability of state-of-the-art classifiers.

1. INTRODUCTION

Despite achieving exceptional performance in benign settings, modern machine learning models have been shown to be highly vulnerable to inputs, known as adversarial examples, crafted with targeted but imperceptible perturbations (Szegedy et al., 2014; Goodfellow et al., 2015). This discovery has prompted a wave of research studies proposing defense mechanisms, including heuristic approaches (Papernot et al., 2016; Mądry et al., 2018; Zhang et al., 2019) and certifiable methods (Wong & Kolter, 2018; Gowal et al., 2019; Cohen et al., 2019). Unfortunately, none of these methods can successfully produce adversarially-robust models, even for classification tasks on toy datasets such as CIFAR-10. To explain the prevalence of adversarial examples, a line of theoretical works (Gilmer et al., 2018; Fawzi et al., 2018; Shafahi et al., 2019; Dohmatob, 2019; Bhagoji et al., 2019) has proven upper bounds on the maximum achievable adversarial robustness by imposing different assumptions on the underlying metric probability space. In particular, Mahloujifar et al. (2019a) generalized the previous results, showing that adversarial examples are inevitable as long as the input distributions are concentrated with respect to the perturbation metric. Thus, the question of whether or not natural image distributions are concentrated is highly relevant, since, if they are, it would rule out any possibility of adversarially robust image classifiers. Recently, Mahloujifar et al. (2019b) proposed an empirical method to measure the concentration of an arbitrary distribution using data samples, then employed it to estimate a lower bound on intrinsic robustness (see Definition 2.2 for its formal definition) for several image benchmarks.
By demonstrating the gap between the estimated lower bounds on intrinsic robustness and the robustness achieved by the best current models, they further concluded that concentration of measure is not the sole reason behind the adversarial vulnerability of existing classifiers on benchmark image distributions. However, due to the heuristic nature of their algorithm, it remains unclear whether the estimates it produces can serve as useful approximations of the underlying intrinsic robustness limits, which hinders our understanding of how much of the actual adversarial risk can be explained by the concentration of measure phenomenon. In this work, we address this issue by first characterizing the optimum of the actual concentration problem for general Gaussian spaces, then using our theoretical insights to develop an alternative algorithm for measuring concentration empirically that significantly improves both the accuracy and efficiency of intrinsic robustness estimates. While we do not demonstrate a specific classifier that achieves this robustness upper bound, our results rule out inherent concentration of the image distribution as the reason for our current inability to find adversarially robust models.

Contributions. We generalize the Gaussian Isoperimetric Inequality to non-spherical Gaussian distributions and ℓp-norm distance metrics with p ≥ 2 (including ℓ∞) (Theorem 3.3). Motivated by the optimal concentration results for special Gaussian spaces (Remark 3.4), we develop a sample-based algorithm that uses half spaces to estimate the concentration of measure, which works for arbitrary distributions and any ℓp-norm distance (Section 4). Compared with prior approaches, we empirically demonstrate the significantly increased efficacy of our method under the ℓ∞-norm distance metric (Section 6).
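For reference, the classical result that Theorem 3.3 generalizes is the Gaussian Isoperimetric Inequality for the standard spherical Gaussian measure γₙ under the ℓ2 metric, which can be stated as:

```latex
% Classical Gaussian Isoperimetric Inequality (Borell; Sudakov--Tsirelson):
% for any measurable set A and any \epsilon > 0, writing A_\epsilon for the
% \epsilon-expansion of A under the \ell_2 metric,
\gamma_n(A_\epsilon) \;\ge\; \Phi\!\bigl(\Phi^{-1}(\gamma_n(A)) + \epsilon\bigr),
\qquad
A_\epsilon = \bigl\{\, x \in \mathbb{R}^n : \exists\, x' \in A,\ \|x - x'\|_2 \le \epsilon \,\bigr\},
```

with equality attained when A is a half space. This optimality of half spaces for the spherical Gaussian is what motivates searching over half spaces when estimating concentration empirically.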
Not only does the proposed method converge to its limit with an order of magnitude fewer samples (Section 6.2), it also finds a much tighter lower bound on intrinsic robustness, both for simulated datasets whose underlying concentration function can be derived analytically and for various benchmark image datasets (Section 6.1). In particular, we improve the best current estimated lower bound on intrinsic robustness for CIFAR-10 under ℓ∞-norm bounded perturbations with ε = 8/255 from approximately 82% to above 93%. These tighter concentration estimates produced by our algorithm provide strong evidence that concentration of measure should not be considered the main cause of adversarial vulnerability, at least for the image benchmarks evaluated in our experiments.

Related Work. Several prior works have sought to empirically estimate lower bounds on intrinsic robustness using data samples. The pioneering work of Gilmer et al. (2018) introduced the connection between adversarial examples and the concentration phenomenon for uniform n-spheres, then proposed a simple heuristic to find a half space that expands slowly under Euclidean distance for the MNIST dataset. Our work can be seen as a strict generalization of Gilmer et al. (2018)'s, applying to arbitrary ℓp-norm distance metrics (including ℓ∞). By characterizing the optimal transport cost between conditional distributions, Bhagoji et al. (2019) estimated a lower bound on the best possible adversarial robustness for several image datasets. However, when applied to adversaries beyond ℓ2, such as ℓ∞, the lower bound produced by their method is not informative (that is, it is close to zero). The most relevant previous work is Mahloujifar et al. (2019b), which proposed a general method for measuring concentration using special collections of subsets.
Although the optimal value of the considered empirical concentration problem is proven to converge asymptotically to the actual concentration, there is no guarantee that the proposed search algorithm for solving the empirical problem finds the optimum. Our approach follows the framework introduced by Mahloujifar et al. (2019b), but considers a different collection of subsets for the empirical concentration problem. This not only achieves optimality for theoretical Gaussian distributions, but also significantly improves the estimation performance on typical image benchmarks. Another line of work attempts to estimate upper bounds on intrinsic robustness based on generative assumptions. In order to justify the theoretically-derived impossibility results, Fawzi et al. (2018) estimated the smoothness parameters of state-of-the-art generative models on the CIFAR-10 and SVHN datasets, which yields approximate upper bounds on the adversarial robustness of any classifier. Zhang et al. (2020) generalized their results to non-smooth data manifolds, such as datasets that can be captured by a conditional generative model. However, these methods only work for simulated generative distributions, which may deviate from the actual distributions they are intended to model.

Notation. For any n ∈ Z₊, denote by [n] the set {1, 2, . . . , n}. Lowercase boldface letters denote vectors and uppercase boldface letters denote matrices. For any vector x and p ∈ [1, ∞), let x_j, ‖x‖_p and ‖x‖_∞ be the j-th element, the ℓp-norm and the ℓ∞-norm of x, respectively. For any matrix A, B is said to be a square root of A if A = BB, and the induced matrix p-norm of A is defined as ‖A‖_p = sup_{x ≠ 0} { ‖Ax‖_p / ‖x‖_p }. Denote by N(θ, Σ) the Gaussian distribution with mean θ and covariance matrix Σ. Let γ_n be the probability measure of N(0, I_n), where I_n denotes the identity matrix. Let Φ(·) be the cumulative distribution function of N(0, 1) and Φ⁻¹(·) its inverse.
For any set A, let pow(A) denote the collection of all measurable subsets of A and 1_A(·) the indicator function of A. Let (X, µ, ∆) be a metric probability space, where ∆ : X × X → R≥0 denotes a distance metric on X. Define the empirical measure with respect to a sample set {x_i}_{i∈[m]} as µ_m(A) = (1/m) Σ_{i∈[m]} 1_A(x_i),
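To make the connection between half spaces and concentration concrete, the following is a minimal sketch (not the paper's algorithm, and all names are illustrative) of the empirical measure µ_m applied to a half space and its ε-expansion under the ℓ2 metric, for samples drawn from the spherical Gaussian N(0, I_n). For a unit vector w, the half space H = {x : ⟨w, x⟩ ≤ b} has Gaussian measure Φ(b), and its ε-expansion under ℓ2 is simply {x : ⟨w, x⟩ ≤ b + ε}, so the empirical estimate should approach Φ(Φ⁻¹(γ_n(H)) + ε), the isoperimetric optimum.

```python
# Illustrative sketch: empirical measure of a half space and its
# eps-expansion under the l2 metric, for samples from N(0, I_n).
import numpy as np
from scipy.stats import norm

def empirical_measure(samples, w, b):
    """Empirical measure mu_m of the half space {x : <w, x> <= b}."""
    return np.mean(samples @ w <= b)

rng = np.random.default_rng(0)
n, m, eps = 10, 200_000, 0.5
samples = rng.standard_normal((m, n))   # m draws from N(0, I_n)

w = np.zeros(n); w[0] = 1.0             # any unit vector; axis-aligned for clarity
b = norm.ppf(0.2)                       # half space with Gaussian measure 0.2

mu_H = empirical_measure(samples, w, b)             # close to 0.2
mu_H_eps = empirical_measure(samples, w, b + eps)   # measure of the eps-expansion
print(mu_H, mu_H_eps, norm.cdf(norm.ppf(0.2) + eps))
```

With enough samples, the last two printed values agree closely, illustrating why restricting the empirical search to half spaces recovers the optimal concentration for Gaussian distributions.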

