STRONG INDUCTIVE BIASES PROVABLY PREVENT HARMLESS INTERPOLATION

Abstract

Classical wisdom suggests that estimators should avoid fitting noise to achieve good generalization. In contrast, modern overparameterized models can yield small test error despite interpolating noise, a phenomenon often called "benign overfitting" or "harmless interpolation". This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator's inductive bias, i.e., how heavily the estimator favors solutions with a certain structure: while strong inductive biases prevent harmless interpolation, weak inductive biases can even require fitting noise to generalize well. Our main theoretical result establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels, where the filter size regulates the strength of the inductive bias. We further provide empirical evidence of the same behavior for deep neural networks with varying filter sizes and rotational invariance.

1. INTRODUCTION

According to classical wisdom (see, e.g., Hastie et al. (2001)), an estimator that fits noise suffers from "overfitting" and cannot generalize well. A typical remedy is to prevent interpolation, that is, to stop the estimator from achieving zero training error and thereby fit less noise. For example, one can use ridge regularization, or early stopping for iterative algorithms, to obtain a model whose training error is close to the noise level. However, large overparameterized models such as neural networks seem to behave differently: even on noisy data, they may achieve optimal test performance at convergence after interpolating the training data (Nakkiran et al., 2021; Belkin et al., 2019a), a phenomenon referred to as harmless interpolation (Muthukumar et al., 2020) or benign overfitting (Bartlett et al., 2020), and often discussed in the context of double descent (Belkin et al., 2019a). To date, we lack a general understanding of when interpolation is harmless for overparameterized models.

In this paper, we argue that the strength of an estimator's inductive bias critically influences whether it exhibits harmless interpolation. An estimator with a strong inductive bias heavily favors "simple" solutions that structurally align with the ground truth (such as sparsity or rotational invariance). Based on well-established high-probability recovery results for sparse linear regression (Tibshirani, 1996; Candes, 2008; Donoho & Elad, 2006), we expect models with a stronger inductive bias to generalize better than ones with a weaker inductive bias, particularly from noiseless data. In contrast, the effects of inductive bias are much less studied for interpolators of noisy data. Recently, Donhauser et al. (2022) provided a first rigorous analysis of the effect of inductive bias strength on the generalization performance of linear max-ℓp-margin/min-ℓp-norm interpolators.
In particular, the authors prove that a stronger inductive bias (small p → 1) not only enhances a model's ability to generalize from noiseless data, but also increases its sensitivity to noise, eventually harming generalization when interpolating noisy data. Consequently, their result suggests that interpolation might not be harmless when the inductive bias is too strong. In this paper, we confirm this hypothesis and show that strong inductive biases indeed prevent harmless interpolation, while also moving beyond sparse linear models. As one example, we consider data where the true labels depend nonlinearly only on input features in a local neighborhood, and vary the strength of the inductive bias via the filter size of convolutional kernels or shallow convolutional neural networks: small filter sizes encourage functions that depend nonlinearly only on local
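To make the contrast between strong and weak inductive biases concrete, the following sketch compares the min-ℓ1-norm and min-ℓ2-norm interpolators in the sparse linear setting referenced above. The problem sizes, the basis-pursuit linear program, and the Gaussian design are illustrative assumptions, not taken from the paper; the sketch only demonstrates the noiseless side of the story, where the stronger (ℓ1) bias recovers the sparse ground truth far more accurately than the weaker (ℓ2) one.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d, s = 40, 200, 3                 # samples, ambient dimension, sparsity
X = rng.standard_normal((n, d))      # illustrative Gaussian design
beta_star = np.zeros(d)
beta_star[:s] = 1.0                  # s-sparse ground truth
y = X @ beta_star                    # noiseless labels

# Min-l2-norm interpolator (weak inductive bias): the pseudoinverse
# solution, i.e., the minimum-Euclidean-norm solution of X beta = y.
beta_l2 = np.linalg.pinv(X) @ y

# Min-l1-norm interpolator (strong inductive bias): basis pursuit,
# written as a linear program over the positive/negative parts of beta.
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([X, -X]), b_eq=y, bounds=(0, None))
beta_l1 = res.x[:d] - res.x[d:]

err = lambda b: np.linalg.norm(b - beta_star)
# Both estimators interpolate the data, but on noiseless sparse data
# the l1 interpolator's parameter error is orders of magnitude smaller.
print("l1 error:", err(beta_l1), " l2 error:", err(beta_l2))
```

The flip side, which the paper focuses on, is that this same strong bias makes the ℓ1 interpolator more sensitive once noise is added to y, so that forcing it to interpolate noisy data is no longer harmless.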

