MINIMAX OPTIMAL KERNEL OPERATOR LEARNING VIA MULTILEVEL TRAINING

Abstract

Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning. In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces (RKHSs). We establish an information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization scheme that learns the spectral components below the bias contour and ignores those above the variance contour achieves the optimal learning rate. At the same time, the spectral components between the bias and variance contours give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal when learning linear operators between infinite-dimensional function spaces.

1. INTRODUCTION

Supervised learning of operators between two infinite-dimensional spaces has attracted attention in several application areas of machine learning, including scientific computing (Lu et al., 2019; Li et al., 2020; de Hoop et al., 2021; Li et al., 2018; 2021b), functional data analysis (Crambes & Mas, 2013; Hörmann & Kidziński, 2015; Wang et al., 2020a), mean-field games (Guo et al., 2019; Wang et al., 2020b), conditional kernel mean embedding (Song et al., 2009; 2013; Muandet et al., 2017), and econometrics (Singh et al., 2019; Muandet et al., 2020; Dikkala et al., 2020; Singh et al., 2020). Despite the empirical success of operator learning, the statistical limit of learning an infinite-dimensional operator has not been investigated. In this paper, we study the problem of learning Hilbert-Schmidt operators between infinite-dimensional Sobolev RKHSs H_K^β and H_L^γ with given kernels k and l, respectively, where β, γ ∈ [0, 1) (Adams & Fournier, 2003; Christmann & Steinwart, 2008; Fischer & Steinwart, 2020). Our goal is to derive the optimal sample complexity for linear operator learning, i.e., how much data is required to achieve a given level of accuracy. We first establish an information-theoretic lower bound for learning a Hilbert-Schmidt operator between Sobolev spaces with respect to a general Sobolev norm. This lower bound shows that the optimal learning rate is the minimum of two polynomial rates: one determined purely by the input Sobolev reproducing kernel Hilbert space and the norm in which it is evaluated, the other determined purely by the output space and its evaluation norm. The rate is novel in that all existing results (Fischer & Steinwart, 2020; Li et al., 2022; de Hoop et al., 2021) only establish rates that depend on the parameters of the input space. The reason is that all previous
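To make the setting concrete, the sketch below illustrates the basic idea of spectrally regularized operator regression on a discretized grid: estimate a linear operator from paired input-output samples while keeping only the spectral components of the empirical input covariance that are well determined by the data. This is an illustrative simplification in plain NumPy, not the multilevel algorithm developed in this paper; the function name, the fixed cutoff value, and the synthetic data are hypothetical choices made for the example.

```python
import numpy as np


def truncated_spectral_operator_fit(X, Y, cutoff):
    """Estimate a linear operator A with Y ≈ A X from paired samples.

    X : (d_in, n) array, each column a discretized input function.
    Y : (d_out, n) array, each column the corresponding output function.
    cutoff : eigenvalues of the empirical input covariance below this
             threshold are discarded (spectral regularization).
    """
    n = X.shape[1]
    C_x = X @ X.T / n           # empirical input covariance
    C_yx = Y @ X.T / n          # empirical cross-covariance
    eigvals, eigvecs = np.linalg.eigh(C_x)
    keep = eigvals > cutoff     # retain well-determined spectral components
    inv_vals = np.zeros_like(eigvals)
    inv_vals[keep] = 1.0 / eigvals[keep]
    # A_hat = C_yx times the truncated pseudo-inverse of C_x
    A_hat = C_yx @ (eigvecs * inv_vals) @ eigvecs.T
    return A_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out, n = 50, 40, 200
    A_true = rng.standard_normal((d_out, d_in)) / d_in
    X = rng.standard_normal((d_in, n))
    Y = A_true @ X + 0.05 * rng.standard_normal((d_out, n))
    A_hat = truncated_spectral_operator_fit(X, Y, cutoff=0.1)
    print("relative HS error:",
          np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```

In this simplified picture, lowering the cutoff retains more spectral components (reducing bias but increasing variance), while raising it discards poorly estimated directions; the bias and variance contours discussed above characterize which components must, and which need not, be learned to attain the optimal rate.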

