MALIGN OVERFITTING: INTERPOLATION CAN PROVABLY PRECLUDE INVARIANCE

Abstract

Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e., interpolate) the training data. This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work, we provide a theoretical justification for these observations. We prove that, even in the simplest of settings, any interpolating learning rule (with an arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that, in the same setting, successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.

1. INTRODUCTION

Modern machine learning applications often call for models that are not only accurate, but also robust to distribution shifts or compliant with fairness constraints. For example, we might wish to avoid using hospital-specific traces in X-ray images (DeGrave et al., 2021; Zech et al., 2018), as they rely on spurious correlations that will not generalize to a new hospital, or we might seek "Equal Opportunity" models attaining similar error rates across protected demographic groups, e.g., in the context of loan applications (Byanjankar et al., 2015; Hardt et al., 2016). A developing paradigm for fulfilling such requirements is learning models that satisfy some notion of invariance (Peters et al., 2016; 2017) across environments or sub-populations. For example, in the X-ray case, spurious correlations can be formalized as relationships between a feature and a label which vary across hospitals (Zech et al., 2018). Equal Opportunity (Hardt et al., 2016) can be expressed as a statistical constraint on the outputs of the model, requiring the false negative rate to be invariant to membership in a protected group. Many techniques for learning invariant models have been proposed, including penalties that encourage invariance (Arjovsky et al., 2019; Krueger et al., 2021; Veitch et al., 2021; Wald et al., 2021; Puli et al., 2021; Makar et al., 2022; Rame et al., 2022; Kaur et al., 2022), data re-weighting (Sagawa et al., 2020a; Wang et al., 2021; Idrissi et al., 2022), causal graph analysis (Subbaswamy et al., 2019; 2022), and more (Ahuja et al., 2020). While the invariance paradigm holds promise for delivering robust and fair models, many current invariance-inducing methods fail to improve over naive approaches.
This is especially noticeable when these methods are used with overparameterized deep models capable of interpolating, i.e., perfectly fitting the training data (Gulrajani & Lopez-Paz, 2021; Dranker et al., 2021; Guo et al., 2022; Zhou et al., 2022; Menon et al., 2021; Veldanda et al., 2022; Cherepanova et al., 2021). Existing theory explains why overparameterization hurts invariance for standard interpolating learning rules, such as empirical risk minimization and max-margin classification (Sagawa et al., 2020b; Nagarajan et al., 2021; D'Amour et al., 2022), and also why reweighting and some types of distributionally robust optimization face challenges when used with overparameterized models (Byrd & Lipton, 2019; Sagawa et al., 2020a). In contrast, training overparameterized models to interpolate the training data typically results in good in-distribution generalization, and such "benign overfitting" (Kini et al., 2021; Wang et al., 2021) is considered a key characteristic of modern deep learning (Cao et al., 2021; Wang & Thrampoulidis, 2021; Shamir, 2022). Consequently, a number of works attempt to extend benign overfitting to robust or fair generalization by designing new interpolating learning rules (Cao et al., 2019; Kini et al., 2021; Wang et al., 2021; Lu et al., 2022). In this paper, we demonstrate that such attempts face a fundamental obstacle: all interpolating learning rules (and not just maximum-margin classifiers) fail to produce invariant models in certain high-dimensional settings where invariant learning (without interpolation) is possible. This failure occurs not because no invariant model separates the data, but because interpolating learning rules cannot find one. In other words, beyond identically-distributed test sets, overfitting is no longer benign.
More concretely, we consider linear classification in a basic overparameterized Gaussian mixture model with invariant "core" features as well as environment-dependent "spurious" features, similar to models used in previous work to gain insight into robustness and invariance (Schmidt et al., 2018; Rosenfeld et al., 2021; Sagawa et al., 2020b). We show that any learning rule producing a classifier that separates the data with non-zero margin must necessarily rely on the spurious features in the data, and therefore cannot be invariant. Moreover, in the same setting we analyze a simple two-stage algorithm that finds accurate and nearly invariant linear classifiers, i.e., with almost no dependence on the spurious feature. Thus, we establish a separation between the level of invariance attained by interpolating and non-interpolating learning rules. We believe that learning rules which fail in the simple overparameterized linear classification setting we consider are unlikely to succeed in more complicated, real-world settings. Therefore, our analysis provides useful guidance for future research into robust and fair machine learning models, as well as theoretical support for the recent success of non-interpolating robust learning schemes (Rosenfeld et al., 2022; Veldanda et al., 2022; Kirichenko et al., 2022; Menon et al., 2021; Kumar et al., 2022; Zhang et al., 2022; Idrissi et al., 2022; Chatterji et al., 2022).

Paper organization. The next section formally states our full result (Theorem 1). In Section 3 we outline the arguments leading to the negative part of Theorem 1, i.e., the failure of interpolating classifiers to be invariant in our model. In Section 4 we establish the positive part of Theorem 1 by providing and analyzing a non-interpolating algorithm that, in our model, achieves low robust error.
We validate our theoretical findings with simulations and experiments on the Waterbirds dataset in Section 5, and conclude with a discussion of additional related results and directions for future research in Section 6.
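To make the phenomenon concrete, the following NumPy sketch generates data from two environments of the kind described above and fits the minimum-norm interpolator of the labels. All parameter values here are illustrative assumptions of ours (not taken from the paper's experiments), and the minimum-norm least-squares interpolator is used as a convenient stand-in for other interpolating rules such as max-margin classification:

```python
import numpy as np

# Illustrative setup: d >> N (overparameterized), a "core" direction mu_c
# shared across environments and a "spurious" direction mu_s whose
# correlation with the label (theta) differs between environments.
rng = np.random.default_rng(0)
d, n_per_env, sigma = 1000, 25, 1.0
mu_c = np.zeros(d); mu_c[0] = 2.0   # core signal
mu_s = np.zeros(d); mu_s[1] = 2.0   # spurious signal, orthogonal to mu_c

def sample(theta, n):
    """y ~ Unif{-1,+1}, x | y ~ N(y*mu_c + y*theta*mu_s, sigma^2 I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * (mu_c + theta * mu_s) + sigma * rng.standard_normal((n, d))
    return x, y

X1, y1 = sample(theta=0.9, n=n_per_env)   # training environment 1
X2, y2 = sample(theta=0.7, n=n_per_env)   # training environment 2
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])  # pooled dataset

# Minimum-norm interpolator: X @ w = y exactly since rank(X) = N < d.
w = np.linalg.pinv(X) @ y
assert np.all(np.sign(X @ w) == y)  # w perfectly fits the training set

core = abs(w @ mu_c) / np.linalg.norm(mu_c)
spur = abs(w @ mu_s) / np.linalg.norm(mu_s)
print(f"core component: {core:.3f}, spurious component: {spur:.3f}")
```

In this regime the interpolator places non-negligible weight on the spurious direction mu_s, illustrating the failure mode analyzed in Section 3; a classifier that ignored mu_s entirely would also have to give up interpolation.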

2. STATEMENT OF MAIN RESULT

2.1 PRELIMINARIES

Data model. Our analysis focuses on learning linear models over covariates $x$ distributed as a mixture of two Gaussian distributions corresponding to the label $y$.

Definition 1. An environment is a distribution parameterized by $(\mu_c, \mu_s, d, \sigma, \theta)$, where $\theta \in [-1, 1]$ and $\mu_c, \mu_s \in \mathbb{R}^d$ satisfy $\mu_c \perp \mu_s$, with samples generated according to:
$$P_\theta(y) = \mathrm{Unif}\{-1, 1\}, \qquad P_\theta(x \mid y) = \mathcal{N}\!\left(y\mu_c + y\theta\mu_s,\ \sigma^2 I\right). \tag{1}$$

Our goal is to find a (linear) classifier that predicts $y$ from $x$ and is robust to the value of $\theta$ (we discuss the specific robustness metric below). To do so, the classifier will need to have significant inner product with the "core" signal component $\mu_c$ and be approximately orthogonal to the "spurious" component $\mu_s$. We focus on learning problems where we are given access to samples from two environments that share all their parameters other than $\theta$, as we define next. We illustrate our setting with Figure 3 in Appendix A.

Definition 2 (Linear Two Environment Problem). In a Linear Two Environment Problem we have datasets $S_1 = \{x_i^{(1)}, y_i\}_{i=1}^{N_1}$ and $S_2 = \{x_i^{(2)}, y_i\}_{i=1}^{N_2}$ of sizes $N_1, N_2$ drawn from $P_{\theta_1}$ and $P_{\theta_2}$, respectively. A learning algorithm is a (possibly randomized) mapping from the tuple $(S_1, S_2)$ to a linear classifier $w \in \mathbb{R}^d$. We let $S = \{x_i, y_i\}_{i=1}^{N}$ denote the dataset pooled from $S_1$ and $S_2$, where $N = N_1 + N_2$. Finally, we let $r_c := \|\mu_c\|$ and $r_s := \|\mu_s\|$.

We study settings where $\theta_1, \theta_2$ are fixed and $d$ is large compared to $N$, i.e., the overparameterized regime. We refer to the two distributions $P_{\theta_e}$ for $e \in \{1, 2\}$ as "training environments", following Peters et al. (2016); Arjovsky et al. (2019). In the context of Out-of-Distribution (OOD) generalization, environments correspond to different experimental conditions, e.g., collection of medical data

