NON-GAUSSIAN PROCESS REGRESSION

Abstract

Standard GPs offer a flexible modelling tool for well-behaved processes. However, deviations from Gaussianity are expected to appear in real-world datasets, with structural outliers and shocks routinely observed. In these cases GPs can fail to model uncertainty adequately and may over-smooth inferences. Here we extend the GP framework to a new class of time-changed GPs that allow straightforward modelling of heavy-tailed non-Gaussian behaviours, while retaining a tractable conditional GP structure through a representation as an infinite mixture of nonhomogeneous GPs. The conditional GP structure is obtained by conditioning the observations on a latent transformed input space; the random evolution of the latent transformation is modelled by a Lévy process, which enables Bayesian inference on both the posterior predictive density and the latent transformation function. We present Markov chain Monte Carlo inference procedures for this model and demonstrate its potential benefits compared with a standard GP.
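To make the time-changed construction concrete, the following is a minimal generative sketch in Python, not the inference procedure developed in this paper. A gamma subordinator is used as an assumed example of the Lévy process driving the latent time change; the squared-exponential kernel, the helper name se_kernel, and all numerical parameters are illustrative choices. Conditional on the sampled path W, the observations follow an ordinary GP evaluated at the transformed inputs W(t).

    import numpy as np

    rng = np.random.default_rng(0)

    def se_kernel(x, y, lengthscale=0.3, variance=1.0):
        """Squared-exponential kernel on 1-D input arrays x and y."""
        d = x[:, None] - y[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

    n = 200
    t = np.linspace(0.0, 1.0, n)  # observation times

    # Latent time change W(t): a gamma subordinator, i.e. a nondecreasing
    # Levy process with independent gamma-distributed increments.
    # (The gamma choice and its parameters are illustrative assumptions.)
    dt = np.diff(t)
    increments = rng.gamma(shape=5.0 * dt, scale=0.2)
    W = np.concatenate(([0.0], np.cumsum(increments)))

    # Conditional on W, the data follow a nonhomogeneous GP: the kernel is
    # evaluated at the transformed inputs W(t) rather than at t itself.
    K = se_kernel(W, W) + 1e-8 * np.eye(n)       # jitter for stability
    f = rng.multivariate_normal(np.zeros(n), K)  # latent function values
    y = f + 0.05 * rng.standard_normal(n)        # noisy observations

Because large subordinator increments stretch the effective input spacing, sampled paths can exhibit locally abrupt, shock-like behaviour while remaining conditionally Gaussian given W.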

1. INTRODUCTION

Gaussian processes (GPs) are stochastic processes widely used in nonparametric regression and classification problems to represent probability distributions over functions (Rasmussen & Williams, 2006). They allow Bayesian inference in a space of functions, so that consistent uncertainty measures over predictions are obtained rather than point estimates alone. In its simplest form, a GP defines a distribution over functions through its mean and covariance (kernel) functions, which determine the smoothness, stationarity and periodicity of a random realisation in the function space. Used as a prior in Bayesian inference, a zero-mean GP reflects a lack of prior information about the values and trend of the function. In this case the covariance function, which defines the similarity between any two points in the input space, fully characterises the properties of the random function space.

The design of kernel functions that can represent a wide range of characteristics and make consistent generalisations is a fundamental area of research. Recent work in this area includes modelling the kernel via spectral densities that are scale-location mixtures of Gaussians (Wilson & Adams, 2013) and, similarly, placing Lévy process priors over adaptive basis expansions for the spectral density (Jang et al., 2017). Further extensions of the standard GP model directly model the covariance matrix as a stochastic process (Wilson & Ghahramani, 2011), assume heteroscedastic observation noise and carry out variational inference (Lázaro-Gredilla & Titsias, 2011), or learn nonlinear transformations of the observations such that the latent transformed observations are modelled well by a GP (Snelson et al., 2003; Lázaro-Gredilla, 2012). Nonstationarity in the measurement process can be expressed as a product of multiple GPs (Adams & Stegle, 2008), and heavy-tailed observations may be modelled through the Student-t process (Shah et al., 2014). Particularly relevant extensions of GP models are presented by Rasmussen & Ghahramani (2001), where the input space is locally modelled by separate GPs, and by Samo & Roberts (2016), whose string GPs introduce link functions between local GPs such that the global process is still a GP, with efficient inference methods on large datasets. Schmidt & O'Hagan (2003) and Snoek et al. (2014) define a latent space between the inputs and observations, through a separate GP and a class of bounded functions on [0, 1], respectively. By designing expressive covariance functions or stacking multiple GPs in structured arrangements, the GP framework produces accurate predictive models in numerous application domains. However, these models are limited by their Gaussianity assumption such that the local patterns learned
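As a concrete reference for the zero-mean GP regression described at the start of this section, the following is a minimal sketch of the standard Cholesky-based predictive equations (in the style of Rasmussen & Williams, 2006, Algorithm 2.1). The kernel choice, noise level, and the names se_kernel and gp_posterior are illustrative, not from this paper.

    import numpy as np

    def se_kernel(x, y, lengthscale=1.0, variance=1.0):
        """Squared-exponential kernel on 1-D input arrays x and y."""
        d = x[:, None] - y[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

    def gp_posterior(x_train, y_train, x_test, noise=0.1):
        """Posterior predictive mean and covariance of a zero-mean GP."""
        K = se_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
        K_s = se_kernel(x_train, x_test)   # train/test cross-covariance
        K_ss = se_kernel(x_test, x_test)   # test covariance
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        v = np.linalg.solve(L, K_s)
        return K_s.T @ alpha, K_ss - v.T @ v

For instance, gp_posterior(x, y, np.linspace(0, 1, 100)) returns the predictive mean and full covariance at 100 test points; it is this full predictive distribution, rather than a point estimate, that the Bayesian treatment provides.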

