THE EIGENLEARNING FRAMEWORK: A CONSERVATION LAW PERSPECTIVE ON KERNEL REGRESSION AND WIDE NEURAL NETWORKS

Anonymous

Abstract

We derive a simple unified framework giving closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. In particular, we show that KRR can be interpreted as an explicit competition among kernel eigenmodes for a fixed supply of a quantity we term "learnability." These improvements are enabled by a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions. Test risk and other objects of interest are expressed transparently in terms of our conserved quantity evaluated in the kernel eigenbasis. We use our improved framework to: i) provide a theoretical explanation for the "deep bootstrap" of Nakkiran et al. (2020), ii) generalize a previous result regarding the hardness of the classic parity problem, iii) fashion a theoretical tool for the study of adversarial robustness, and iv) draw a tight analogy between KRR and a well-studied system in statistical physics.

1. INTRODUCTION

Kernel ridge regression (KRR) is a popular, tractable learning algorithm that has seen a surge of attention due to equivalences to infinite-width neural networks (NNs) (Lee et al., 2018; Jacot et al., 2018). In this paper, we derive a simple theory of the generalization of KRR that yields estimators for many quantities of interest, including test risk and the covariance of the predicted function. Our framework is consistent with other recent works, such as those of Canatar et al. (2021) and Jacot et al. (2020), but is simpler and easier to derive.

Our framework paints a new picture of KRR as an explicit competition between eigenmodes for a fixed budget of a quantity we term "learnability," and downstream generalization metrics can be expressed entirely in terms of the learnability received by each mode (Equations 7-14). This picture stems from a conservation law latent in KRR which limits any kernel's ability to learn any complete basis of target functions. The conserved quantity, learnability, is the inner product of the target and predicted functions and, as we show, can be interpreted as a measure of how well the target function can be learned by a particular kernel given n training examples. We prove that the total learnability, summed over a complete basis of target functions (such as the kernel eigenbasis), is no greater than the number of training samples, with equality at zero ridge parameter.

The conservation of this quantity suggests that it will prove useful for understanding the generalization of KRR. This intuition is borne out by our subsequent analysis: we derive a set of simple, closed-form estimates for test risk and other objects of interest and find that all of them can be transparently expressed in terms of eigenmode learnabilities. Our expressions are more compact and more readily interpretable than those of prior work, and our derivation is significantly simpler and more accessible: prior approaches relied on the heavy mathematical machinery of replica calculations and random matrix theory to obtain comparable results, whereas ours requires only basic linear algebra, leveraging our conservation law at a critical juncture to bypass the need for advanced techniques. We use our improved framework to shed light on several topics of interest, enumerated below.
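As a concrete illustration of the conservation law stated above, the short sketch below checks it numerically. It is not taken from this paper's experiments, and every specific choice in it is an illustrative assumption: the input space is discretized to a uniform one-dimensional grid standing in for the input distribution, the kernel is an RBF kernel with an arbitrary length scale, the complete orthonormal basis of unit-norm target functions is drawn at random, and the ridge parameter takes two example values. With unit-norm targets, the learnability of each basis function is simply the inner product of the target and the KRR prediction, and in this discretized setting the summed learnability works out to tr[(K + ridge·I)⁻¹ K], which is at most n and equals n at zero ridge.

```python
# Minimal numerical check of the conservation law (illustrative sketch, not the
# paper's code). Input space: a uniform 1-D grid; kernel: RBF with an arbitrary
# length scale; basis: a random complete orthonormal basis of target functions.
import numpy as np

rng = np.random.default_rng(0)
M, n = 200, 20                            # grid size, number of training samples

X = np.linspace(-1.0, 1.0, M)             # uniform measure on the grid
K_full = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * 0.1 ** 2))

train = rng.choice(M, size=n, replace=False)
K = K_full[np.ix_(train, train)]          # kernel matrix on the training points
K_xt = K_full[:, train]                   # cross-kernel: grid points vs. training points

# Complete orthonormal basis {f_i} on the grid, normalized so that
# <f_i, f_i> = (1/M) * sum_x f_i(x)^2 = 1 under the uniform grid measure.
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))
F = np.sqrt(M) * Q                        # columns are the basis functions f_i

for ridge in (1e-2, 0.0):
    inv = np.linalg.inv(K + ridge * np.eye(n))
    total = 0.0
    for i in range(M):
        f = F[:, i]
        f_hat = K_xt @ (inv @ f[train])   # KRR prediction for target f_i on the grid
        total += (f @ f_hat) / M          # learnability <f_i, f_hat_i> (targets are unit norm)
    # Summed learnability is bounded by n, with equality at zero ridge.
    print(f"ridge={ridge:g}: summed learnability = {total:.4f}  (n = {n})")
```

At zero ridge the printed total matches n to numerical precision, while any positive ridge drives it strictly below n, mirroring the bound stated above.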

