A LAW OF ADVERSARIAL RISK, INTERPOLATION, AND LABEL NOISE

Abstract

In supervised learning, it has been shown that label noise in the data can be interpolated without penalty to test accuracy. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem relating label noise and adversarial risk for any data distribution. Our results are almost tight if we do not make any assumptions on the inductive bias of the learning algorithm. We then investigate how different components of this problem, including properties of the data distribution, affect this result. We also discuss non-uniform label noise distributions, and prove a new theorem showing that uniform label noise induces nearly as large an adversarial risk as the worst poisoning with the same noise rate. Then, we provide theoretical and empirical evidence that uniform label noise is more harmful than typical real-world label noise. Finally, we show how inductive biases amplify the effect of label noise and argue the need for future work in this direction.

1. INTRODUCTION

Label noise is ubiquitous in data collected from the real world. Such noise can be a result of malicious intent as well as human error. The well-known work of Zhang et al. (2017) observes that overparameterized neural networks trained with gradient descent can memorize large amounts of label noise without increased test error. Recently, Bartlett et al. (2020) investigated this phenomenon and termed it benign overfitting: perfect interpolation of a noisy training dataset still leads to satisfactory generalization for overparameterized models. A long series of works (Donhauser et al., 2022; Hastie et al., 2022; Muthukumar et al., 2020) has focused on providing generalization guarantees for models that interpolate data under uniform label noise. This suggests that noisy training data does not hurt the test error of overparameterized models.

However, when deploying machine learning systems in the real world, it is not enough to guarantee low test error. Adversarial vulnerability is a practical security threat (Kurakin et al., 2016; Sharif et al., 2016; Eykholt et al., 2018) for deploying machine learning algorithms in critical environments. An adversarially vulnerable classifier that is accurate on the test distribution can be forced to err on carefully perturbed inputs, even when the perturbations are small. This has motivated a large body of work on improving the adversarial robustness of neural networks (Goodfellow et al., 2014; Papernot et al., 2016; Tramèr et al., 2018; Sanyal et al., 2018; Cisse et al., 2017). Despite the empirical advances, theoretical guarantees for robust defenses are still poorly understood.

Consider the setting of uniformly random label noise. Under certain distributional assumptions, Sanyal et al. (2021) claim that with a moderate amount of label noise, training classifiers to zero training error always incurs large adversarial risk, even when the test error is low. Experimentally, this is supported by Zhu et al. (2021), who showed that common methods for reducing adversarial risk, such as adversarial training, in fact do not memorize label noise. However, it is not clear whether their distributional assumptions are realistic, or whether their result is tight. To deploy machine learning models responsibly, it is important to understand the extent to which a phenomenon as common as label noise can negatively impact adversarial robustness. In this work, we improve upon previous theoretical results, proving a stronger statement on how label noise induces adversarial risk at large enough sample sizes.
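The mechanism at play can be illustrated with a minimal, hypothetical sketch (the 1-D data, noise rate, and 1-nearest-neighbor classifier below are illustrative choices, not the construction used in our proofs): any classifier that interpolates a flipped label must misclassify a small neighborhood around that training point, so every memorized noisy point yields adversarial examples for nearby clean inputs.

```python
import random

def make_data(n, noise_rate, seed=0):
    # 1-D toy data: the true label is the sign of x; each label is
    # flipped independently with probability noise_rate (uniform noise)
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = []
    for x in xs:
        y = 1 if x > 0 else 0
        ys.append(1 - y if rng.random() < noise_rate else y)
    return xs, ys

def nn_predict(xs, ys, x):
    # 1-nearest-neighbor: an interpolating classifier that memorizes
    # every (possibly flipped) training label
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]

xs, ys = make_data(200, noise_rate=0.2)

# benign overfitting: the interpolator fits the noisy training set exactly
train_err = sum(nn_predict(xs, ys, x) != y for x, y in zip(xs, ys))

# adversarial vulnerability: for each memorized flipped label away from
# the decision boundary, a tiny perturbation (1e-9 here) pushes a clean
# test input onto the noisy training point, where the interpolator errs
adv = 0
for x, y in zip(xs, ys):
    true_label = 1 if x > 0 else 0
    if y != true_label and abs(x) > 0.05:  # flipped, far from the boundary
        if nn_predict(xs, ys, x + 1e-9) != true_label:
            adv += 1

print(train_err, adv)  # train_err is 0; adv counts successful attacks
```

In this toy setting the adversarial risk grows with the noise rate, since each memorized flip creates its own vulnerable neighborhood; this is only an illustration of the mechanism, not the general argument, which holds without assuming a particular classifier or data distribution.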

