ON THE (NON-)ROBUSTNESS OF TWO-LAYER NEURAL NETWORKS IN DIFFERENT LEARNING REGIMES

Abstract

Neural networks are known to be highly sensitive to adversarial examples. These may arise due to different factors, such as random initialization, or spurious correlations in the learning problem. To better understand these factors, we provide a precise study of the adversarial robustness in different scenarios, from initialization to the end of training in different regimes, as well as intermediate scenarios, where initialization still plays a role due to "lazy" training. We consider overparameterized networks in high dimensions with quadratic targets and infinite samples. Our analysis allows us to identify new tradeoffs between approximation (as measured via test error) and robustness, whereby robustness can only get worse when test error improves, and vice versa. We also show how linearized lazy training regimes can worsen robustness, due to improperly scaled random initialization. Our theoretical results are illustrated with numerical experiments.

1. INTRODUCTION

Deep neural networks have enjoyed tremendous practical success in many applications involving high-dimensional data, such as images. Yet, such models are highly sensitive to small perturbations known as adversarial examples (Szegedy et al., 2013), which are often imperceptible to humans. While various strategies such as adversarial training (Madry et al., 2018) can mitigate this vulnerability empirically, the situation remains highly problematic for many safety-critical applications like autonomous vehicles and healthcare, and motivates a better theoretical understanding of the mechanisms that may be causing it.

Various factors are known to contribute to adversarial examples. In linear models, features that are only weakly correlated with the label, possibly in a spurious manner, may improve prediction accuracy but induce large sensitivity to adversarial perturbations (Tsipras et al., 2019; Sanyal et al., 2021). On the other hand, common neural networks may exhibit high sensitivity to adversarial perturbations already at random initialization (Simon-Gabriel et al., 2019; Daniely & Shacham, 2020; Bubeck et al., 2021). While such settings already capture interesting phenomena behind adversarial examples, they are restricted to either trained linear models or nonlinear networks at initialization. Trained, nonlinear networks may thus involve multiple sources of vulnerability arising from initialization, training algorithms, as well as the data distribution. Capturing the interaction between these different components is of crucial importance for a more complete understanding of adversarial robustness. In this paper, we study the interplay between these different factors by analyzing the approximation properties (i.e., how well the model fits the data) and robustness properties (i.e., the sensitivity of the model's outputs w.r.t. perturbations of test inputs) of two-layer neural networks in different learning regimes.
We consider two-layer finite-width networks in high dimensions with infinite training data, in asymptotic regimes inspired by Ghorbani et al. (2019). This allows us to focus on the effects inherent to the data distribution, the inductive bias of the architecture (choice of activation function, number of hidden neurons per input dimension, etc.), and the training algorithms, while side-stepping issues due to finite samples. Following Ghorbani et al. (2019), we focus on nonlinear regression settings with structured quadratic target functions, and consider commonly studied training regimes for two-layer networks, namely: (i) neural networks with quadratic activations trained with stochastic gradient descent (SGD) on the population risk, which finds the global optimum; (ii) random features (RF, Rahimi & Recht, 2008); (iii) neural tangent (NT, Jacot et al., 2018); as well as (iv) "lazy" training (Chizat et al., 2019) regimes for basic RF and NT, where we consider a first-order Taylor expansion of the network around initialization, including the initialization term itself (in contrast to the standard RF and NT regimes, which ignore the offset due to initialization). Note that, though the theoretical setting is inspired by Ghorbani et al. (2019), our work differs from theirs in its focus and scope: we are concerned with robustness and its interplay with approximation in different learning regimes, while they are only concerned with approximation. We also note that the lazy/linearized regimes we study were not considered by Ghorbani et al. (2019), and help us highlight the impact of initialization on robustness. Finally, unlike the other regimes, SGD exhibits a kind of feature learning, whereby the first-layer weights learn specific directions by approximating the teacher's coefficient matrix B.
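The lazy/linearized regime described above can be made concrete with a minimal sketch. The following NumPy code (the dimensions, Gaussian initialization, and scalings are illustrative assumptions, not the paper's exact choices) linearizes a two-layer quadratic-activation network around its initialization, keeping the offset term that distinguishes lazy training from the plain RF and NT regimes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 20, 40                                # illustrative input dimension and width
W0 = rng.normal(size=(m, d)) / np.sqrt(d)    # first-layer weights at initialization
a0 = rng.normal(size=m) / np.sqrt(m)         # second-layer weights at initialization

def f(x, W, a):
    """Two-layer network with quadratic activation sigma(z) = z**2."""
    return a @ (W @ x) ** 2

def f_lazy(x, W, a):
    """First-order Taylor expansion of f around (W0, a0), offset included.

    - RF term: gradient w.r.t. a at init is sigma(W0 x).
    - NT term: gradient w.r.t. row w_j at init is a0_j * sigma'(w0_j . x) * x,
      with sigma'(z) = 2z for the quadratic activation.
    """
    pre = W0 @ x
    offset = a0 @ pre ** 2                    # network output at initialization
    rf = (a - a0) @ pre ** 2                  # random-features (RF) part, linear in a
    nt = a0 @ (2 * pre * ((W - W0) @ x))      # neural-tangent (NT) part, linear in W
    return offset + rf + nt
```

At (W0, a0) the linearization coincides with the network, and for small parameter displacements the gap is second order; dropping the `offset` term recovers the standard RF and NT parameterizations.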
In particular, this involves non-trivial feature selection via non-linear learning, while the other regimes (RF and NT) are linear estimators on top of non-linear but fixed features.

Main contributions. Our work establishes theoretical results which uncover novel tradeoffs between approximation (as measured via test error) and robustness that are inherent to all the regimes considered. These tradeoffs appear to be due either to misalignment between the target function and the input distribution (or weight distribution) for random features (Section 4), or to the inductive bias of fully-trained networks (Section 3 and Appendix E). We also show that improperly scaled random initialization can further degrade robustness in lazy/linearized models (Section 5), since the resulting models may inherit the nonrobustness of random initialization. This raises the question of how small the initialization should be in order to enhance the robustness of the trained model. Our theoretical results are verified with extensive numerical experiments on simulated data.

The setting of our work is regression in a student-teacher setup, where the student is a two-layer feedforward neural network and the teacher is a quadratic form. We assume access to infinite training data; thus, the only complexity parameters are the coefficient matrix of the teacher model, the input dimension d, and the width of the neural network m, both assumed to be large but proportional to one another. Refer to Section 2 for details. In Appendix I, we also show similar but weaker tradeoffs for arbitrary student and teacher models. The infinite-sample setting allows us to focus on the effects inherent to the data distribution, the inductive bias of the architecture (choice of activation function), and the different learning regimes, while side-stepping issues due to finite samples and label noise.
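The student-teacher setup above can be sketched as follows. In this illustrative NumPy snippet (the dimension, the 1/d scaling of the coefficient matrix, the Gaussian input distribution, and the gradient-norm robustness proxy are our own assumptions for illustration, not the paper's exact definitions), the teacher is a quadratic form and a large fresh sample stands in for the infinite-data population:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 30                                   # illustrative input dimension
B = rng.normal(size=(d, d)) / d          # illustrative scaling of the teacher
B = (B + B.T) / 2                        # symmetric coefficient matrix

def teacher(x):
    """Quadratic-form target f*(x) = x^T B x."""
    return x @ B @ x

def input_grad(x):
    """Gradient of the teacher w.r.t. its input: 2 B x (B symmetric)."""
    return 2 * B @ x

# Infinite data means no sampling noise; a large fresh Gaussian sample
# approximates the population distribution.
X = rng.normal(size=(10_000, d))
y = np.einsum('ni,ij,nj->n', X, B, X)    # noiseless labels

# One natural robustness proxy: the average input-gradient norm, i.e., the
# model's sensitivity to worst-case first-order input perturbations.
avg_sensitivity = np.linalg.norm(X @ (2 * B), axis=1).mean()
```

Under this kind of proxy, a model can fit the labels well while having much larger input gradients than the teacher, which is the sort of approximation-robustness gap the paper quantifies.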
Also note that in this infinite-data setting, label noise provably has no influence on the learned model in any of the learning regimes considered. The observation that there is a tradeoff between robustness and approximation even in this infinite-sample setting is one of the surprising findings of our work. It complements related works such as Bubeck et al. (2020b) and Bubeck & Sellke (2021), which show that finite training samples with label noise are a possible source of nonrobustness in neural networks.

Related works. Various works have theoretically studied adversarial examples and robustness in supervised learning, and their relationship to ordinary test error / accuracy. Tsipras et al. (2019) consider a specific data distribution where good test error implies poor robustness. Shafahi et al. (2018); Mahloujifar et al. (2018); Gilmer et al. (2018); Dohmatob (2019) show that for high-dimensional data distributions with a concentration property (e.g., multivariate Gaussians, or distributions satisfying log-Sobolev inequalities), any imperfect classifier admits adversarial examples. On the other hand, Yang et al. (2020) observed empirically that natural images are well-separated, so that locally-Lipschitz classifiers should not suffer any test error vs. robustness tradeoff; however, gradient descent is not likely to find such models. Our work studies regression problems with quadratic targets, and shows that there are indeed tradeoffs between test error and robustness, controlled by the learning algorithm / regime and the model.

Schmidt et al. (2018); Khim & Loh (2018); Yin et al. (2019); Bhattacharjee et al. (2021); Min et al. (2021b;a) study the sample complexity of robust learning. In contrast, our work focuses on the case of infinite data, so that the only complexity parameters are the input dimension d and the network width m. Gao et al. (2019); Bubeck et al. (2020b); Bubeck & Sellke (2021) show that over-parameterization may be necessary for robust interpolation in the presence of noise. In contrast, our paper considers a structured problem with a noiseless signal and infinite training data, where the network width m and the input dimension d tend to infinity proportionately. In this under-complete asymptotic setting, our results show a systematic and precise tradeoff between approximation (test error) and robustness in the different learning regimes. Thus, our work nuances the picture presented by previous works by exhibiting a nontrivial interplay between robustness and test error, which persists even with infinite training data, where the resulting model is unaffected by label noise.

