DENOISING DIFFUSION SAMPLERS

Abstract

Denoising diffusion models are a popular class of generative models providing state-of-the-art results in many domains. Noise is gradually added to the data using a diffusion so as to transform the data distribution into a Gaussian distribution. Samples from the generative model are then obtained by simulating an approximation of the time-reversal of this diffusion, initialized by Gaussian samples. In practice, the intractable score terms appearing in the time-reversed process are approximated using score matching techniques. We explore here a similar idea to sample approximately from unnormalized probability density functions and estimate their normalizing constants. We consider a process where the target density diffuses towards a Gaussian. Denoising Diffusion Samplers (DDS) are obtained by approximating the corresponding time-reversal. While score matching is not applicable in this context, we can leverage many of the ideas introduced in generative modeling for Monte Carlo sampling. Existing theoretical results for denoising diffusion models also provide theoretical guarantees for DDS. We discuss the connections between DDS, optimal control and Schrödinger bridges, and finally demonstrate DDS experimentally on a variety of challenging sampling tasks.

1. INTRODUCTION

Let π be a probability density on R^d of the form π(x) = γ(x)/Z, with Z = ∫_{R^d} γ(x) dx, where γ : R^d → R^+ can be evaluated pointwise but the normalizing constant Z is intractable. We are interested both in estimating Z and in obtaining approximate samples from π. A large variety of Monte Carlo techniques has been developed to address this problem. In particular, Annealed Importance Sampling (AIS) (Neal, 2001) and its Sequential Monte Carlo (SMC) extensions (Del Moral et al., 2006) are often regarded as the gold standard for computing normalizing constants.

Variational techniques are a popular alternative to Markov chain Monte Carlo (MCMC) and SMC, where one considers a flexible family of easy-to-sample distributions q_θ whose parameters are optimized by minimizing a suitable discrepancy, typically the reverse Kullback–Leibler divergence KL(q_θ||π). Typical choices for q_θ include mean-field approximations (Wainwright & Jordan, 2008) and normalizing flows (Papamakarios et al., 2021). To be able to model complex variational distributions, it is often useful to define q_θ(x) as the marginal of an auxiliary extended distribution, i.e. q_θ(x) = ∫ q_θ(x, u) du. As this marginal is typically intractable, θ is instead learned by minimizing a discrepancy measure between q_θ(x, u) and an extended target p_θ(x, u) = π(x) p_θ(u|x), where p_θ(u|x) is an auxiliary conditional distribution (Agakov & Barber, 2004).

Over recent years, Monte Carlo techniques have also been fruitfully combined with variational techniques. For example, AIS can be thought of as a procedure where q_θ(x, u) is the joint distribution of a Markov chain defined by a sequence of MCMC kernels whose final state is x, while p_θ(x, u) is the corresponding AIS extended target (Neal, 2001).
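To make the setup concrete, the following sketch estimates log Z by plain importance sampling with a Gaussian proposal. The unnormalized density γ here is a toy example chosen purely for illustration (its normalizing constant is known in closed form, so the estimate can be checked); the proposal scale and sample size are likewise arbitrary choices, not part of any method in the paper.

```python
import numpy as np

# Toy unnormalized target, for illustration only:
# gamma(x) = exp(-||x||^2 / 2) * (1 + ||x||^2), so Z = (2*pi)^(d/2) * (1 + d).
def log_gamma(x):
    sq = np.sum(x**2, axis=-1)
    return -0.5 * sq + np.log1p(sq)

def estimate_log_Z(d=2, n=100_000, scale=2.0, seed=0):
    """Importance sampling estimate of log Z using a N(0, scale^2 I) proposal:
    Z = E_q[gamma(X)/q(X)] for X ~ q, estimated by a sample average."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=scale, size=(n, d))
    # Log-density of the Gaussian proposal q.
    log_q = -0.5 * np.sum(x**2, axis=-1) / scale**2 \
            - 0.5 * d * np.log(2 * np.pi * scale**2)
    log_w = log_gamma(x) - log_q  # log importance weights
    # log-mean-exp of the weights, computed stably.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))
```

Such estimators degrade quickly in high dimensions when the proposal is far from π, which is precisely what motivates the more sophisticated AIS, SMC and variational constructions discussed above.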
The parameters θ of these kernels can then be optimized by minimizing KL(q_θ||p_θ) using stochastic gradient descent (Wu et al., 2020; Geffner & Domke, 2021; Thin et al., 2021; Zhang et al., 2021; Doucet et al., 2022; Geffner & Domke, 2022). Instead of following an AIS-type approach to define a flexible variational family, we follow here an approach inspired by Denoising Diffusion Probabilistic Models (DDPM), a powerful class of

* Work done while at DeepMind
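The forward "noising" step underlying this diffusion-based approach can be illustrated with a minimal simulation. The sketch below uses an Euler–Maruyama discretization of an Ornstein–Uhlenbeck process dX_t = -X_t dt + √2 dW_t, whose stationary distribution is N(0, I); the specific drift, horizon and step size are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def forward_noise(x0, T=5.0, n_steps=500, seed=0):
    """Euler-Maruyama simulation of dX_t = -X_t dt + sqrt(2) dW_t,
    an Ornstein-Uhlenbeck process with stationary law N(0, I).
    Run on samples x0 from the target, it realizes the 'diffuse the
    target towards a Gaussian' step (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # One Euler-Maruyama step: drift -x*dt plus Gaussian increment.
        x = x - x * dt + np.sqrt(2 * dt) * rng.normal(size=x.shape)
    return x
```

For example, starting every particle at x0 = 5 (very far from Gaussian), the marginal at time T is close to N(5e^{-T}, 1 - e^{-2T}) ≈ N(0, 1), since e^{-5} ≈ 0.007. A sampler is then obtained by approximately reversing this process in time, starting from Gaussian noise.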

