MARTINGALE POSTERIOR NEURAL PROCESSES

Abstract

A Neural Process (NP) estimates a stochastic process implicitly defined with neural networks given a stream of data, rather than pre-specifying a prior from an already-known class, such as Gaussian processes. An ideal NP would learn everything from data without any inductive biases, but in practice we often restrict the class of stochastic processes for ease of estimation. One such restriction is the use of a finite-dimensional latent variable to account for the uncertainty in the functions drawn from an NP. Some recent works show that this can be improved with a more "data-driven" source of uncertainty, such as bootstrapping. In this work, we take a different approach based on the martingale posterior, a recently developed alternative to Bayesian inference. For the martingale posterior, instead of specifying prior-likelihood pairs, a predictive distribution for future data is specified. Under specific conditions on the predictive distribution, it can be shown that the uncertainty in the generated future data actually corresponds to the uncertainty of the implicitly defined Bayesian posteriors. Based on this result, instead of assuming any particular form for the latent variables, we equip an NP with a predictive distribution implicitly defined with neural networks and use the corresponding martingale posteriors as the source of uncertainty. The resulting model, which we name the Martingale Posterior Neural Process (MPNP), is demonstrated to outperform baselines on various tasks.

1. INTRODUCTION

A Neural Process (NP) (Garnelo et al., 2018a;b) meta-learns a stochastic process describing the relationship between inputs and outputs in a given data stream, where each task in the data stream consists of a meta-training set of input-output pairs and a meta-validation set. The NP then defines an implicit stochastic process whose functional form is determined by a neural network taking the meta-training set as an input, and the parameters of the neural network are optimized to maximize the predictive likelihood for the meta-validation set. This approach is philosophically different from the traditional learning pipeline, where one would first elicit a stochastic process from a known class of models (e.g., Gaussian Processes (GPs)) and hope that it describes the data well. An ideal NP would assume minimal inductive biases and learn as much as possible from the data. In this regard, NPs can be framed as a "data-driven" way of choosing proper stochastic processes.

An important design choice for an NP model is how to capture the uncertainty in the random functions drawn from the stochastic process. When mapping the meta-training set into a function, one might employ a deterministic mapping as in Garnelo et al. (2018a). However, it is more natural to assume that multiple plausible functions might have generated the given data, and thus to encode this functional (epistemic) uncertainty as part of the NP model. Garnelo et al. (2018b) later proposed to map the meta-training set into a fixed-dimensional global latent variable with a Gaussian posterior approximation. While this improves upon the vanilla model without such a latent variable (Le et al., 2018), expressing the functional uncertainty only through the Gaussian-approximated latent variable has been reported to be a bottleneck (Louizos et al., 2019). To this end, Lee et al. (2020) and Lee et al. (2022) propose applying the bootstrap to the meta-training set, using the uncertainty arising from the population distribution as a source of functional uncertainty.

In this paper, we take a rather different approach to defining the functional uncertainty for NPs. Specifically, we utilize the martingale posterior distribution (Fong et al., 2021), a recently developed alternative to conventional Bayesian inference. In the martingale posterior, instead of eliciting a
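To make the martingale posterior idea concrete, the following toy sketch (not the MPNP model itself, which uses neural predictives) illustrates predictive resampling with a simple Pólya-urn-style predictive rule: each future observation is drawn uniformly from all data seen so far. Forward-sampling many future sequences and computing a functional of interest (here, the mean) on each completed sequence yields a sample from the martingale posterior over that functional. The function name `predictive_resample` and all numerical settings are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 1-d observed sample; the functional of interest is the population mean.
y_obs = rng.normal(loc=2.0, scale=1.0, size=30)

def predictive_resample(y, n_future=500, n_draws=200, rng=rng):
    """Forward-sample future observations from a Polya-urn-style predictive
    (each new point is drawn uniformly from everything seen so far) and
    return the mean of each completed sequence. The spread of these means
    is the martingale posterior uncertainty implied by the predictive rule."""
    means = np.empty(n_draws)
    for d in range(n_draws):
        seq = list(y)
        for _ in range(n_future):
            # Draw the next "future" observation from the current sequence.
            seq.append(seq[rng.integers(len(seq))])
        means[d] = np.mean(seq)
    return means

posterior_means = predictive_resample(y_obs)
print(posterior_means.mean(), posterior_means.std())
```

The key point, mirrored in MPNP, is that no prior-likelihood pair is ever specified: uncertainty comes entirely from the randomness in the imputed future data, and richer (e.g., neural) predictive rules induce richer posteriors.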

