ON THE SATURATION EFFECT OF KERNEL RIDGE REGRESSION

Abstract

The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound for KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.

1. INTRODUCTION

Suppose that we have observed $n$ i.i.d. samples $\{(x_i, y_i)\}_{i=1}^{n}$ from an unknown distribution $\rho$ supported on $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X} \subseteq \mathbb{R}^d$ and $\mathcal{Y} \subseteq \mathbb{R}$. One of the central problems in statistical learning theory is to find a function $f$ based on these observations such that the generalization error

$$\mathbb{E}_{(x,y)\sim\rho}\left[\left(f(x) - y\right)^2\right] \tag{1}$$

is small. It is well known that the conditional mean $f^*_\rho(x) := \mathbb{E}_\rho[\, y \mid x \,] = \int_{\mathcal{Y}} y \, d\rho(y \mid x)$ minimizes the square loss $\mathcal{E}(f) = \mathbb{E}_\rho\left(f(x) - y\right)^2$, where $\rho(y \mid x)$ is the conditional distribution of $y$ given $x$. Thus, this problem is equivalent to looking for an $f$ such that the generalization error

$$\mathbb{E}_{x\sim\mu}\left[\left(f(x) - f^*_\rho(x)\right)^2\right] \tag{2}$$

is small, where $\mu$ is the marginal distribution of $\rho$ on $\mathcal{X}$. In other words, $f$ can be viewed as an estimator of $f^*_\rho$. When no explicit parametric assumption is made on the distribution $\rho$ or the function $f^*_\rho$, researchers often assume that $f^*_\rho$ falls into a certain class of functions and have developed many non-parametric methods to estimate $f^*_\rho$ (e.g., Györfi (2002); Tsybakov (2009)). The kernel method, one of the most widely applied non-parametric regression methods (e.g., Kohler & Krzyzak (2001); Cucker & Smale (2001); Caponnetto & De Vito (2007); Steinwart et al. (2009); Fischer & Steinwart (2020)), assumes that $f^*_\rho$ belongs to a certain reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, a separable Hilbert space associated with a kernel function $k$ defined on $\mathcal{X}$. Kernel ridge regression (KRR), also known as Tikhonov regularization or regularized least squares, estimates $f^*_\rho$ by solving the penalized least squares problem

$$f^{\mathrm{KRR}}_\lambda = \operatorname*{arg\,min}_{f \in \mathcal{H}} \left( \frac{1}{n} \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 + \lambda \|f\|_{\mathcal{H}}^2 \right),$$

