A LAW OF ADVERSARIAL RISK, INTERPOLATION, AND LABEL NOISE

Abstract

In supervised learning, it has been shown that label noise in the data can be interpolated without penalty to test accuracy. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem relating label noise and adversarial risk for any data distribution. Our results are almost tight if we make no assumptions on the inductive bias of the learning algorithm. We then investigate how different components of this problem, including properties of the distribution, affect this result. We also discuss non-uniform label noise distributions, and prove a new theorem showing that uniform label noise induces nearly as large an adversarial risk as the worst poisoning attack with the same noise rate. We then provide theoretical and empirical evidence that uniform label noise is more harmful than typical real-world label noise. Finally, we show how inductive biases amplify the effect of label noise and argue for the need for future work in this direction.

1. INTRODUCTION

Label noise is ubiquitous in data collected from the real world. Such noise can be the result of malicious intent as well as of human error. The well-known work of Zhang et al. (2017) observes that overparameterized neural networks trained with gradient descent can memorize large amounts of label noise without increased test error. Recently, Bartlett et al. (2020) investigated this phenomenon and termed it benign overfitting: perfect interpolation of a noisy training dataset still leads to satisfactory generalization for overparameterized models. A long series of works (Donhauser et al., 2022; Hastie et al., 2022; Muthukumar et al., 2020) has focused on providing generalization guarantees for models that interpolate data under uniform label noise. This suggests that noisy training data does not hurt the test error of overparameterized models.

However, when deploying machine learning systems in the real world, it is not enough to guarantee low test error. Adversarial vulnerability is a practical security threat (Kurakin et al., 2016; Sharif et al., 2016; Eykholt et al., 2018) for machine learning algorithms deployed in critical environments. An adversarially vulnerable classifier that is accurate on the test distribution can be forced to err on carefully perturbed inputs, even when the perturbations are small. This has motivated a large body of work on improving the adversarial robustness of neural networks (Goodfellow et al., 2014; Papernot et al., 2016; Tramèr et al., 2018; Sanyal et al., 2018; Cisse et al., 2017). Despite the empirical advances, theoretical guarantees for robust defenses are still poorly understood.

Consider the setting of uniformly random label noise. Under certain distributional assumptions, Sanyal et al. (2021) claim that with a moderate amount of label noise, classifiers trained to zero training error always incur large adversarial risk, even when their test error is low. Experimentally, this is supported by Zhu et al. (2021), who showed that common methods for reducing adversarial risk, such as adversarial training, do not in fact memorize label noise. However, it is not clear whether their distributional assumptions are realistic, or whether their result is tight. To deploy machine learning models responsibly, it is important to understand the extent to which a phenomenon as common as label noise can negatively impact adversarial robustness.

In this work, we improve upon previous theoretical results, proving a stronger result on how label noise guarantees adversarial risk for large enough sample sizes. On the other hand, existing experimental results (Sanyal et al., 2021) suggest that neural networks suffer from large adversarial risk even in the small-sample regime. Our results show that this phenomenon cannot be explained without further assumptions on the data distribution, the learning algorithm, or the machine learning model. While specific biases of machine learning models and algorithms (referred to as inductive bias) have usually played a "positive" role in the machine learning literature (Vaswani et al., 2017; van Merriënboer et al., 2017; Mingard et al., 2020), we show how some biases can make a model more vulnerable to adversarial attacks under noisy interpolation.

Apart from the data distribution and inductive biases, we also investigate the role of the label noise model. Uniform label noise, also known as random classification noise (Angluin and Laird, 1988), is a natural choice for modeling label noise, but it is neither the most realistic nor the most adversarial noise model. Yet, our results show that when it comes to guaranteeing a lower bound on the adversarial risk of interpolating models, the uniform label noise model is not much weaker than the optimal poisoning adversary. Our experiments indicate that natural label noise (Wei et al., 2022) is not as harmful to adversarial robustness as uniform label noise.
Finally, we also attempt to understand the conditions under which such benign (natural) label noise arises.

Overview First, we introduce the notation necessary to understand the rest of the paper. Then, we prove a theoretical result (Theorem 2) on the adversarial risk caused by label noise, significantly improving upon previous results (Theorem 1, from Sanyal et al. (2021)). In fact, our Theorem 2 gives the first theoretical guarantee that adversarial risk is large for all compactly supported input distributions and all interpolating classifiers in the presence of label noise. Our theorem does not rely on the particular function class or the training method. Then, in Section 3, we show that Theorem 2 is tight without further assumptions, but does not accurately reflect empirical observations on standard datasets. Our hypothesis is that the experimentally observed effect of label noise depends on properties of the distribution and the inductive bias of the function class. In Section 4, we prove (Theorem 5) that uniform label noise is nearly as harmful as worst-case data poisoning, given a slight increase in dataset size and adversarial radius. We also run experiments, shown in Figure 3, demonstrating that mistakes made by human labelers are more benign than the same rate of uniform noise. Finally, in Section 5, we show that the inductive bias of the function class can make the impact of label noise on adversarial vulnerability much stronger, and provide an example in Theorem 7.
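For concreteness, the uniform label noise (random classification noise) model discussed above can be sketched in a few lines of NumPy; the function name `apply_uniform_label_noise` is ours, not from the paper:

```python
import numpy as np

def apply_uniform_label_noise(labels, eta, rng):
    """Random classification noise (Angluin and Laird, 1988):
    each binary label is flipped independently with probability eta."""
    labels = np.asarray(labels)
    flip = rng.random(labels.shape) < eta  # independent coin flips
    return np.where(flip, 1 - labels, labels)

rng = np.random.default_rng(0)
clean = np.zeros(10_000, dtype=int)
noisy = apply_uniform_label_noise(clean, eta=0.2, rng=rng)
print(noisy.mean())  # empirical flip rate, close to eta = 0.2
```

Note that each label is corrupted independently of the input x, which is precisely what distinguishes this model from the "natural" labeler mistakes studied later, where the flip probability typically depends on x.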

2. GUARANTEEING ADVERSARIAL RISK FOR NOISY INTERPOLATORS

Our setting Choose a norm ∥•∥ on R^d, for example ∥•∥_2 or ∥•∥_∞. For x ∈ R^d, let B_r(x) denote the ∥•∥-ball of radius r around x. Let µ be a distribution on R^d with support C, and let f* : C → {0, 1} be a measurable ground-truth classifier. Given an adversary with perturbation budget ρ > 0 under the norm ∥•∥, we can define the adversarial risk of any classifier f with respect to f* and µ as

R_{Adv,ρ}(f, µ) = P_{x∼µ}[∃ z ∈ B_ρ(x) : f(z) ≠ f*(x)].

Next, consider a training set ((z_1, y_1), . . . , (z_m, y_m)) in R^d × {0, 1}, where the z_i are independently sampled from µ, and each y_i equals f*(z_i) with probability 1 − η, where η > 0 is the label noise rate. Let f be any classifier which correctly interpolates the training set, i.e., f(z_i) = y_i for all i. We now state the main theoretical result of Sanyal et al. (2021) so that we can compare our result with it.

Theorem 1 (Sanyal et al. (2021)). Suppose that there exist c_1 ≥ c_2 > 0, ρ > 0, and a finite set ζ ⊂ R^d satisfying µ(⋃_{s∈ζ} B_{ρ/2}(s)) ≥ c_1 and, for all s ∈ ζ, µ(B_{ρ/2}(s)) ≥ c_2/|ζ|. Further, suppose that each of these balls contains points from a single class. Then for every δ > 0, when the number of samples satisfies m ≥ (|ζ|/(η c_2)) log(|ζ|/δ), with probability at least 1 − δ,

R_{Adv,ρ}(f, µ) ≥ c_1.
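The mechanism behind this kind of bound can be illustrated numerically. The sketch below uses hypothetical parameters (not taken from the paper): a clustered two-dimensional distribution in which each ball B_{ρ/2}(s) holds a single class, uniform label noise at rate η, and a 1-nearest-neighbour classifier as the interpolator. For such an interpolator, any mislabelled training point within distance ρ of a test point x is an adversarial example for x, since moving x onto that point changes the prediction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters: 10 well-separated clusters on a line,
# adversarial radius rho, uniform label-noise rate eta.
n_clusters, rho, eta = 10, 1.0, 0.2
centres = np.c_[5.0 * np.arange(n_clusters), np.zeros(n_clusters)]  # spacing 5 >> rho
true_label = np.arange(n_clusters) % 2  # each ball B_{rho/2}(s) holds one class

def sample(m):
    """Draw m points uniformly from the union of the balls B_{rho/2}(centre)."""
    idx = rng.integers(n_clusters, size=m)
    ang = rng.uniform(0.0, 2.0 * np.pi, size=m)
    rad = (rho / 2) * np.sqrt(rng.uniform(size=m))  # uniform on a 2-d disc
    pts = centres[idx] + rad[:, None] * np.c_[np.cos(ang), np.sin(ang)]
    return pts, true_label[idx]

# Noisy training set: each label flipped independently with probability eta.
X_train, y_train = sample(1000)
flip = rng.random(len(y_train)) < eta
y_train = np.where(flip, 1 - y_train, y_train)

# A 1-nearest-neighbour classifier interpolates the noisy training set, so a
# mislabelled training point within distance rho of a test point x yields an
# adversarial example for x.  Count the test points with such a neighbour:
X_test, y_test = sample(2000)
dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
vulnerable = ((dists <= rho) & (y_train[None, :] != y_test[:, None])).any(axis=1)
print(vulnerable.mean())  # lower bound on R_Adv; close to 1 here
```

With these parameters almost every ball receives at least one flipped label, so nearly every test point has an adversarial example within radius ρ, even though the clean test error of the 1-NN classifier would remain small.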

