IMPROVED ESTIMATION OF CONCENTRATION UNDER p-NORM DISTANCE METRICS USING HALF SPACES

Abstract

Concentration of measure has been argued to be the fundamental cause of adversarial vulnerability. Mahloujifar et al. (2019b) presented an empirical method to measure the concentration of a data distribution using samples, and employed it to find lower bounds on intrinsic robustness for several benchmark datasets. However, it remains unclear whether these lower bounds are tight enough to provide a useful approximation for the intrinsic robustness of a dataset. To gain a deeper understanding of the concentration of measure phenomenon, we first extend the Gaussian Isoperimetric Inequality to non-spherical Gaussian measures and arbitrary p-norms (p ≥ 2). We leverage these theoretical insights to design a method that uses half-spaces to estimate the concentration of any empirical dataset under p-norm distance metrics. Our proposed algorithm is more efficient than that of Mahloujifar et al. (2019b), and our experiments on synthetic datasets and image benchmarks demonstrate that it finds much tighter intrinsic robustness bounds. These tighter estimates provide further evidence that rules out intrinsic dataset concentration as a possible explanation for the adversarial vulnerability of state-of-the-art classifiers.

1. INTRODUCTION

Despite achieving exceptional performance in benign settings, modern machine learning models have been shown to be highly vulnerable to inputs, known as adversarial examples, crafted with targeted but imperceptible perturbations (Szegedy et al., 2014; Goodfellow et al., 2015). This discovery has prompted a wave of research proposing defense mechanisms, including heuristic approaches (Papernot et al., 2016; Mądry et al., 2018; Zhang et al., 2019) and certifiable methods (Wong & Kolter, 2018; Gowal et al., 2019; Cohen et al., 2019). Unfortunately, none of these methods has successfully produced adversarially-robust models, even for classification tasks on small benchmark datasets such as CIFAR-10. To explain the prevalence of adversarial examples, a line of theoretical work (Gilmer et al., 2018; Fawzi et al., 2018; Shafahi et al., 2019; Dohmatob, 2019; Bhagoji et al., 2019) has proven upper bounds on the maximum achievable adversarial robustness under different assumptions on the underlying metric probability space. In particular, Mahloujifar et al. (2019a) generalized these results, showing that adversarial examples are inevitable whenever the input distribution is concentrated with respect to the perturbation metric. Thus, the question of whether natural image distributions are concentrated is highly relevant: if they are, it would rule out any possibility of adversarially robust image classifiers. Recently, Mahloujifar et al. (2019b) proposed an empirical method to measure the concentration of an arbitrary distribution using data samples, then employed it to estimate a lower bound on intrinsic robustness (see Definition 2.2 for its formal definition) for several image benchmarks.
By demonstrating the gap between the estimated intrinsic robustness bounds and the robustness achieved by the best current models, they further concluded that concentration of measure is not the sole reason behind the adversarial vulnerability of existing classifiers on benchmark image distributions. However, due to the heuristic nature of their algorithm, it remains unclear whether the estimates it produces serve as useful approximations of the underlying intrinsic robustness limits, which hinders understanding of how much of the actual adversarial risk can be explained by the concentration of measure phenomenon.
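The connection between half-spaces and concentration can be illustrated with a small sketch. By the Gaussian Isoperimetric Inequality, among all sets of a given measure α under a spherical Gaussian, half-spaces have the smallest ε-expansion under the ℓ2 metric, and the expansion of the half-space {x : ⟨w, x⟩/‖w‖ ≤ b} is simply {x : ⟨w, x⟩/‖w‖ ≤ b + ε}. The following Python snippet is a minimal, hypothetical sketch (not the paper's actual algorithm) that estimates this expansion empirically from samples; the function name and parameters are illustrative assumptions:

```python
import math
import random

def halfspace_expansion(samples, w, alpha, eps):
    """Empirically estimate the measure of the eps-expansion (in l2)
    of a half-space of measure ~alpha with normal direction w.

    This is an illustrative sketch, not the algorithm from the paper.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Project every sample onto the unit normal of the half-space.
    proj = sorted(sum(wi * xi for wi, xi in zip(w, x)) / norm
                  for x in samples)
    # Choose the threshold b so that {x : <w,x>/||w|| <= b}
    # contains roughly an alpha-fraction of the samples.
    b = proj[int(alpha * len(proj))]
    # The eps-expansion of that half-space under l2 just shifts
    # the threshold by eps, so count the samples it now contains.
    return sum(p <= b + eps for p in proj) / len(proj)

random.seed(0)
# Samples from a 2-d standard (spherical) Gaussian.
data = [[random.gauss(0, 1) for _ in range(2)] for _ in range(20000)]
exp_measure = halfspace_expansion(data, [1.0, 0.0], alpha=0.01, eps=1.0)
```

For a standard Gaussian this should approximate Φ(Φ⁻¹(α) + ε) ≈ 0.092 for α = 0.01 and ε = 1, matching the isoperimetric prediction; for an empirical image dataset, one would instead search over directions w to find the least-expanding half-space.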