MALIGN OVERFITTING: INTERPOLATION CAN PROVABLY PRECLUDE INVARIANCE

Abstract

Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e., interpolate) the training data. This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work, we provide a theoretical justification for these observations. We prove that, even in the simplest of settings, any interpolating learning rule (with an arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that, in the same setting, successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.

1. INTRODUCTION

Modern machine learning applications often call for models that are not only accurate but also robust to distribution shifts or compliant with fairness constraints. For example, we might wish to avoid using hospital-specific traces in X-ray images (DeGrave et al., 2021; Zech et al., 2018), as they rely on spurious correlations that will not generalize to a new hospital, or we might seek "Equal Opportunity" models attaining similar error rates across protected demographic groups, e.g., in the context of loan applications (Byanjankar et al., 2015; Hardt et al., 2016). A developing paradigm for fulfilling such requirements is learning models that satisfy some notion of invariance (Peters et al., 2016; 2017) across environments or sub-populations. For example, in the X-ray case, spurious correlations can be formalized as relationships between a feature and a label that vary across hospitals (Zech et al., 2018). Equal Opportunity (Hardt et al., 2016) can be expressed as a statistical constraint on the outputs of the model, requiring the false negative rate to be invariant to membership in a protected group. Many techniques for learning invariant models have been proposed, including penalties that encourage invariance (Arjovsky et al., 2019; Krueger et al., 2021; Veitch et al., 2021; Wald et al., 2021; Puli et al., 2021; Makar et al., 2022; Rame et al., 2022; Kaur et al., 2022), data re-weighting (Sagawa et al., 2020a; Wang et al., 2021; Idrissi et al., 2022), causal graph analysis (Subbaswamy et al., 2019; 2022), and more (Ahuja et al., 2020). While the invariance paradigm holds promise for delivering robust and fair models, many current invariance-inducing methods fail to improve over naive approaches.
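To make the Equal Opportunity constraint concrete, the following minimal sketch (with entirely hypothetical toy labels, predictions, and a binary protected attribute) computes the false negative rate separately for two groups; Equal Opportunity requires the gap between the two rates to be (approximately) zero:

```python
import numpy as np

def fnr(y_true, y_pred):
    """False negative rate: fraction of positive examples predicted negative."""
    positives = y_true == 1
    return np.mean(y_pred[positives] == 0)

# Toy binary labels, model predictions, and protected-group membership
# (illustrative values only, not from the paper).
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# Equal Opportunity: the FNR should be invariant to group membership,
# i.e., this gap should be close to zero.
fnr_gap = abs(fnr(y_true[group == 0], y_pred[group == 0])
              - fnr(y_true[group == 1], y_pred[group == 1]))
```

Invariance-inducing methods of the kind surveyed above can be viewed as adding a penalty on (a differentiable surrogate of) this gap to the training objective.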
This is especially noticeable when these methods are used with overparameterized deep models capable of interpolating, i.e., perfectly fitting the training data (Gulrajani & Lopez-Paz, 2021; Dranker et al., 2021; Guo et al., 2022; Zhou et al., 2022; Menon et al., 2021; Veldanda et al., 2022; Cherepanova et al., 2021). Existing theory explains why overparameterization hurts invariance for standard interpolating learning rules, such as empirical risk minimization and max-margin classification (Sagawa et al., 2020b; Nagarajan et al., 2021; D'Amour et al., 2022), and also why re-weighting and some types of distributionally robust optimization face challenges when used with overparameterized models (Byrd & Lipton, 2019; Sagawa et al., 2020a). In contrast, training overparameterized models to interpolate the training data typically results in good in-distribution generalization, and such "benign overfitting" (Kini et al., 2021; Wang et al., 2021) is considered a key characteristic of modern deep learning (Cao et al., 2021; Wang & Thrampoulidis, 2021; Shamir, 2022). Consequently, a num-

