NEURAL DIFFUSION PROCESSES

Abstract

Gaussian processes provide an elegant framework for specifying prior and posterior distributions over functions. They are, however, computationally expensive, and limited by the expressivity of their covariance function. We propose Neural Diffusion Processes (NDPs), a novel approach based upon diffusion models that learns to sample from distributions over functions. Using a novel attention block, we are able to incorporate properties of stochastic processes, such as exchangeability, directly into the NDP's architecture. We empirically show that NDPs are able to capture functional distributions that are close to the true Bayesian posterior. This enables a variety of downstream tasks, including hyperparameter marginalisation, non-Gaussian posterior inference and global optimisation.

1. INTRODUCTION

Gaussian processes (GPs) offer a powerful framework for defining distributions over functions [26]. The framework is appealing because Bayes' rule allows one to reason consistently about the predictive distribution, making the model data efficient. However, for many problems GPs are not an appropriate prior. Consider, for example, a function that has a discontinuity at some unknown location. Such behaviour cannot be expressed by a GP, because it cannot be captured by the first two moments of a multivariate normal distribution [23]. One popular approach to these problems is to abandon GPs in favour of neural network (NN) based generative models. Successful methods include the meta-learning approaches of Neural Processes (NPs) [8; 12; 2; 21] and VAE-based models [22; 6]. By leveraging a large number of small datasets during training, they are able to transfer knowledge across datasets at prediction time. Using NNs is appealing because most of the computational effort is expended during training, while prediction usually becomes more straightforward. A further major advantage of NN-based approaches is that they are not restricted by the Gaussian assumption.

We seek to improve upon these methods by extending an existing state-of-the-art NN-based generative model. In terms of sample quality, the denoising diffusion probabilistic model [31; 32; 10] has recently been shown to outperform existing methods on tasks such as image [24; 25], molecular structure [40; 11], point cloud [20] and audio signal [14] generation. However, Bayesian inference over functions poses a fundamentally different challenge, one which has not previously been tackled by diffusion models.

Contributions

We propose a novel model, the Neural Diffusion Process (NDP), which extends diffusion models to Stochastic Processes (SPs) and is able to describe a rich distribution over functions. NDPs generalise diffusion models to infinite-dimensional function spaces by allowing the random variables onto which the model diffuses to be indexed. We take particular care to build known symmetries and properties of SPs, such as exchangeability and marginal consistency, into the model, which facilitates training. These properties are enforced with the help of a novel attention block, the bi-dimensional attention block, which guarantees equivariance over the ordering of (1) the input dimensions and (2) the sequence (i.e., the datapoints). From the experiments we draw two conclusions: first, NDPs are a clear improvement over existing NN-based generative models for functions, such as Neural Processes (NPs); second, NDPs are an attractive alternative to GPs for specifying appropriate (i.e., non-Gaussian) priors over functions. In addition, we present a novel global optimisation method using NDPs.
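The equivariance guaranteed by a bi-dimensional attention block can be illustrated with a minimal, parameter-free NumPy sketch: attending over the datapoint axis and then over the input-dimension axis yields an output that commutes with permutations of either axis. This is an illustrative assumption of the general mechanism, not the architecture from the paper (it omits learned projections, multiple heads, and residual connections, and the function names are hypothetical):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # Plain dot-product self-attention over the rows of x: [n, f] -> [n, f].
    # Permuting the rows of x permutes the output rows identically.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def bidimensional_block(h):
    # h: [n_points, n_dims, f]. Attend over datapoints (per input dimension),
    # then over input dimensions (per datapoint). Each step is equivariant to
    # permutations of the axis it attends over, so the block is equivariant
    # to reorderings of both datapoints and input dimensions.
    h = np.stack([attention(h[:, d, :]) for d in range(h.shape[1])], axis=1)
    h = np.stack([attention(h[n, :, :]) for n in range(h.shape[0])], axis=0)
    return h
```

Because each attention step treats its axis as an unordered set, shuffling the datapoints (or input dimensions) before the block gives the same result as shuffling after it, which is the equivariance property the bi-dimensional attention block enforces.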

