SPARSE GAUSSIAN PROCESS VARIATIONAL AUTOENCODERS

Abstract

Large, multi-dimensional spatio-temporal datasets are omnipresent in modern science and engineering. Gaussian process deep generative models (GP-DGMs), which employ GP priors over the latent variables of DGMs, provide an effective framework for handling such data. Existing approaches for performing inference in GP-DGMs do not support sparse GP approximations based on inducing points, which are essential for the computational efficiency of GPs, nor do they handle missing data, a natural occurrence in many spatio-temporal datasets, in a principled manner. We address these shortcomings with the development of the sparse Gaussian process variational autoencoder (SGP-VAE), characterised by the use of partial inference networks for parameterising sparse GP approximations. Leveraging the benefits of amortised variational inference, the SGP-VAE enables inference in multi-output sparse GPs on previously unobserved data with no additional training. The SGP-VAE is evaluated in a variety of experiments where it outperforms alternative approaches including multi-output GPs and structured VAEs.

1. INTRODUCTION

Large, multi-dimensional datasets that exhibit strong spatio-temporal dependencies are arising in increasing numbers from a wealth of domains, including the earth, social and environmental sciences (Atluri et al., 2018). For example, consider modelling daily atmospheric measurements taken by weather stations situated across the globe. Such data are (1) large in number; (2) subject to strong spatio-temporal dependencies; (3) multi-dimensional; and (4) non-Gaussian with complex dependencies across outputs. There exist two venerable approaches for handling these characteristics: Gaussian process (GP) regression and deep generative models (DGMs). GPs provide a framework for encoding high-level assumptions about latent processes, such as smoothness or periodicity, making them effective in handling spatio-temporal dependencies. Yet, existing GP approaches do not support the flexible likelihoods necessary for modelling complex multi-dimensional outputs. In contrast, DGMs support flexible likelihoods; however, they do not provide a natural route through which spatio-temporal dependencies can be encoded. GP-DGMs, the amalgamation of GPs and DGMs, use latent functions drawn independently from GPs, which are then passed through a DGM at each input location. GP-DGMs combine the complementary strengths of both approaches, making them naturally suited for modelling spatio-temporal datasets.

Intrinsic to the application of many spatio-temporal datasets is the notion of tasks. For instance: medicine has individual patients; each trial in a scientific experiment produces an individual dataset; and, in the case of a single large dataset, it is often convenient to split it into separate tasks to improve computational efficiency. GP-DGMs support the presence of multiple tasks in a memory-efficient way through the use of amortisation, giving rise to the Gaussian process variational autoencoder (GP-VAE), a model that has recently gained considerable attention from the research community (Pearce, 2020; Fortuin et al., 2020; Casale et al., 2018; Campbell & Liò, 2020; Ramchandran et al., 2020). However, previous work does not support sparse GP approximations based on inducing points, a necessity for modelling even moderately sized datasets. Furthermore, many spatio-temporal datasets contain an abundance of missing data: weather measurements are often absent due to sensor failure, and in medicine only single measurements are taken at any given time. Handling partial observations in a principled manner is essential for modelling spatio-temporal data, but is yet to be considered.
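
As a rough illustration of the GP-DGM generative process described in the introduction (latent functions drawn independently from GPs, then passed through a DGM at each input location), the following is a minimal NumPy sketch. It is not the authors' implementation: the squared-exponential kernel, the two-layer decoder with random, untrained weights, the Gaussian likelihood, and the names `rbf_kernel` and `sample_gp_dgm` are all assumptions made purely for illustration.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def sample_gp_dgm(x, num_latents=2, num_outputs=5, hidden=16,
                  noise_std=0.1, seed=0):
    """Draw one sample from a toy GP-DGM at 1-D inputs x of shape [N].

    Returns the latent function values f ([N, K]) and observations y ([N, D]).
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]

    # Step 1: draw K latent functions independently from a zero-mean GP prior.
    cov = rbf_kernel(x, x) + 1e-6 * np.eye(n)  # jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    f = chol @ rng.standard_normal((n, num_latents))  # shape [N, K]

    # Step 2: pass the latent values through the same decoder network at
    # every input location. Random weights stand in for a trained DGM.
    w1 = rng.standard_normal((num_latents, hidden))
    w2 = rng.standard_normal((hidden, num_outputs))
    mean = np.tanh(f @ w1) @ w2  # shape [N, D]

    # Step 3: sample observations from an assumed Gaussian likelihood.
    y = mean + noise_std * rng.standard_normal(mean.shape)
    return f, y


if __name__ == "__main__":
    x = np.linspace(0.0, 10.0, 50)
    f, y = sample_gp_dgm(x)
    print(f.shape, y.shape)
```

In this sketch the GP prior alone carries the dependence across input locations, while the decoder, applied pointwise, carries the dependence across output dimensions. This mirrors the division of labour between GPs and DGMs outlined above.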

