A SCALABLE AND EXACT GAUSSIAN PROCESS SAMPLER VIA KERNEL PACKETS

Abstract

In view of the widespread use of Gaussian processes (GPs) in machine learning models, generating random sample paths of GPs is crucial for many machine learning applications. Sampling from a GP essentially requires generating high-dimensional Gaussian random vectors, which is computationally challenging if a direct method, such as one based on the Cholesky decomposition, is used. We develop a scalable algorithm to sample random realizations of the prior and the posterior of GP models with Matérn correlation functions. Unlike existing scalable sampling algorithms, the proposed approach draws samples from the theoretical distributions exactly. The algorithm exploits a novel structure called kernel packets (KPs), which gives an exact sparse representation of the dense covariance matrices. The proposed method is applicable to one-dimensional GPs, and to multi-dimensional GPs under certain conditions, such as separable kernels with full-grid designs. Via a series of experiments and comparisons with other recent works, we demonstrate the efficiency and accuracy of the proposed method.

1. INTRODUCTION

Gaussian processes (GPs) have been widely used in statistical and machine learning applications (Rasmussen, 2003; Cressie, 2015; Santner et al., 2003). The relevant areas and topics include regression (O'Hagan, 1978; Bishop et al., 1995; Rasmussen, 2003; MacKay et al., 2003), classification (Kuss et al., 2005; Nickisch & Rasmussen, 2008; Hensman et al., 2015), Bayesian networks (Neal, 2012), optimization (Srinivas et al., 2009), and so on. GP modeling imposes a GP as the prior of an underlying continuous function, which provides a flexible nonparametric framework for prediction and inference problems. When the sample size is large, the basic framework for GP regression suffers from the computational challenge of inverting large covariance matrices. Much work has been done to address this issue. Recent advances in scalable GP regression include the Nyström approximation (Quinonero-Candela & Rasmussen, 2005; Titsias, 2009; Hensman et al., 2013), random Fourier features (Rahimi & Recht, 2007), local approximation (Gramacy & Apley, 2015), structured kernel interpolation (Wilson & Nickisch, 2015), state-space formulations (Grigorievskiy et al., 2017; Nickisch et al., 2018), the Vecchia approximation (Katzfuss & Guinness, 2021), sparse representations (Chen et al., 2022; Ding et al., 2021), etc.

In this article, we focus on sampling random GP realizations. Such GPs can be either prior stochastic processes or the posterior processes of GP regression. Generating random sample paths of the GP prior or of the posterior in GP regression is crucial in machine learning areas such as Bayesian optimization (Snoek et al., 2012; Frazier, 2018a;b), reinforcement learning (Kuss & Rasmussen, 2003; Engel et al., 2005; Grande et al., 2014), and inverse problems in uncertainty quantification (Murray-Smith & Pearlmutter, 2004; Marzouk & Najm, 2009; Teckentrup, 2020). To generate a random GP sample as a function, a common practice is to discretize the input space, which reduces the problem to sampling a high-dimensional multivariate normal vector. This sampling step is itself computationally challenging, however, as it requires factorizing large covariance matrices.

Despite the vast literature on scalable GP regression, sampling methodologies remain underdeveloped, and existing scalable sampling algorithms for GPs are scarce. A prominent recent work is that of Wilson et al. (2020), who proposed an efficient sampling approach called decoupled sampling by exploiting Matheron's rule and combining the Nyström approximation with random Fourier features.
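The direct baseline referred to above can be sketched in a few lines: discretize the input space, form the dense covariance matrix, and draw exact prior samples through its Cholesky factor. The sketch below (not from the paper; the Matérn-3/2 kernel and function names are illustrative choices) makes the O(n³) time and O(n²) memory bottleneck concrete.

```python
import numpy as np

def matern32_kernel(x, y, lengthscale=1.0):
    """Matern-3/2 correlation matrix between two 1-D input grids."""
    d = np.abs(x[:, None] - y[None, :]) / lengthscale
    return (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)

def sample_gp_prior_cholesky(grid, n_samples=1, lengthscale=1.0,
                             jitter=1e-10, seed=0):
    """Draw exact GP prior samples on `grid` via a dense Cholesky factor.

    The factorization costs O(n^3) time and O(n^2) memory in the grid
    size n, which is the bottleneck scalable samplers aim to remove.
    """
    rng = np.random.default_rng(seed)
    K = matern32_kernel(grid, grid, lengthscale)
    # Small jitter keeps the factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(grid)))
    z = rng.standard_normal((len(grid), n_samples))
    return L @ z  # each column is one sample from N(0, K)

grid = np.linspace(0.0, 1.0, 500)
f = sample_gp_prior_cholesky(grid, n_samples=3)
```

Because L L^T = K, each column L z has exactly the target covariance; the exactness is the same property the KP-based sampler preserves, but obtained here at cubic cost.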


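Matheron's rule, which underlies the decoupled sampling of Wilson et al. (2020), can be illustrated with a small dense sketch: draw one joint prior sample over test and training points, then shift it by the kernel-weighted residual between the observations and the prior sample at the training points. The version below is the exact O(n³) form for a 1-D Matérn-3/2 GP with Gaussian noise; Wilson et al. replace the prior draw and the update term with scalable approximations. Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def matern32(x, y, ls=1.0):
    """Matern-3/2 correlation matrix between two 1-D input sets."""
    d = np.abs(x[:, None] - y[None, :]) / ls
    return (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)

def matheron_posterior_sample(x_test, x_train, y_train,
                              noise=1e-2, ls=1.0, seed=0):
    """One exact posterior sample via Matheron's rule:
    f_post(*) = f(*) + K(*,X) (K(X,X) + s^2 I)^{-1} (y - f(X) - eps),
    where f is a joint prior draw and eps is simulated observation noise."""
    rng = np.random.default_rng(seed)
    x_all = np.concatenate([x_test, x_train])
    K_all = matern32(x_all, x_all, ls)
    L = np.linalg.cholesky(K_all + 1e-10 * np.eye(len(x_all)))
    f_all = L @ rng.standard_normal(len(x_all))          # joint prior draw
    f_test, f_train = f_all[:len(x_test)], f_all[len(x_test):]
    eps = np.sqrt(noise) * rng.standard_normal(len(x_train))
    K_nn = matern32(x_train, x_train, ls) + noise * np.eye(len(x_train))
    K_tn = matern32(x_test, x_train, ls)
    update = K_tn @ np.linalg.solve(K_nn, y_train - (f_train + eps))
    return f_test + update

x_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(2.0 * np.pi * x_train)
x_test = np.linspace(0.0, 1.0, 100)
sample = matheron_posterior_sample(x_test, x_train, y_train)
```

The appeal of this "pathwise" form is that the prior draw and the data-dependent correction are decoupled: any cheaper exact or approximate prior sampler can be plugged into the first term without changing the update.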