ITERATIVE α-(DE)BLENDING: LEARNING A DETERMINISTIC MAPPING BETWEEN ARBITRARY DENSITIES

Abstract

We present a learning method that produces a mapping between arbitrary densities, such that random samples of one density can be mapped to random samples of the other. In practice, our method is similar to deterministic diffusion processes, where samples of the target density are blended with Gaussian noise. The originality of our approach is that, in contrast to several recent works, we do not rely on Langevin dynamics or score-matching concepts. We propose a simpler take on the topic, based solely on basic sampling concepts. By studying blended samples and their posteriors, we show that iteratively blending and deblending samples produces random paths between arbitrary densities. We prove that, for finite-variance densities, these paths converge towards a deterministic mapping that can be learnt with a neural network trained to deblend samples. Our method can thus be seen as a generalization of deterministic denoising diffusion where, instead of learning to denoise Gaussian noise, we learn to deblend arbitrary data.



We provide a short video overview of the paper in our supplementary material. 

1. INTRODUCTION

Diffusion models have recently become one of the most popular generative modeling tools (Ramesh et al., 2022). They have outperformed state-of-the-art GANs (Karras et al., 2020; 2021) and have been applied to many tasks such as image generation (Rombach et al., 2021; Dhariwal & Nichol, 2021), image processing (Saharia et al., 2021; Kawar et al., 2022; Whang et al., 2022), text-to-image (Saharia et al., 2022b), video (Ho et al., 2022), and audio (Kong et al., 2020).

First came stochastic diffusion models, which have in common that they can be formulated as Stochastic Differential Equations (SDEs) (Song et al., 2021b). Deterministic diffusion models such as DDIM (Song et al., 2021a) are the ODE variants of DDPMs. These ODEs provide a smooth deterministic mapping between the Gaussian noise density and the true density. Deterministic diffusion models have recently gained traction because an ODE requires far fewer solver iterations than its SDE counterpart. Furthermore, a deterministic mapping presents multiple practical advantages: samples are uniquely determined by their prior Gaussian noise, can be interpolated via the Gaussian noise, etc.

Is there a simpler approach to deterministic diffusion? The point of the above story is that, in the recent line of work on diffusion models, stochastic diffusion models came first and deterministic diffusion models came after, framed as special cases of the stochastic ones. They hence inherited the underlying mindset and mathematical framework. As a result, advanced concepts such as Langevin dynamics, score matching, and how they relate to Gaussian noise appear to be necessary background for grasping recent deterministic diffusion models. We argue that this is a significant detour to something that can be framed in a much simpler and more general way. We propose a fresh take on deterministic diffusion with another mindset, using only basic sampling concepts.
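As a concrete illustration of the interpolation advantage mentioned above: with a deterministic mapping, each generated sample is a fixed function of its Gaussian latent, so interpolating latents interpolates samples. The sketch below (our illustration, not code from the paper) uses spherical linear interpolation, a common heuristic that keeps the interpolated latents near the Gaussian shell; the dimension 1024 is an arbitrary choice.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent vectors."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z0, z1 = rng.standard_normal(1024), rng.standard_normal(1024)
path = [slerp(z0, z1, t) for t in np.linspace(0.0, 1.0, 8)]

# Every latent on the path keeps a norm close to sqrt(1024) ~ 32, like a
# fresh Gaussian sample; mapping them through a deterministic (ODE) sampler
# would yield an interpolation between the two generated samples.
print([round(float(np.linalg.norm(z)), 1) for z in path])
```

Plain linear interpolation would instead shrink intermediate latents towards the origin, away from the typical set of the Gaussian.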
• We derive a deterministic diffusion-like model based on the sampling interpretation of blending and deblending. We call it Iterative α-(de)Blending (IADB) in reference to the Computer Graphics α-blending technique that composes images with a transparency parameter (Porter & Duff, 1984). Our model defines a mapping between arbitrary densities (of finite variance).

• We show that when the initial density is Gaussian, the mappings defined by IADB are exactly the same as the ones defined by DDIM (Song et al., 2021a). On the theoretical side, our model can thus be seen as a generalization of DDIM to arbitrary sampling densities rather than just Gaussian ones. Furthermore, our alternative derivation leads to a more numerically stable sampling formulation. Our experiments show that IADB consistently outperforms DDIM in terms of final FID on several datasets and is more stable with a small number of steps in the sampling stage.

• We explore the generalization to arbitrary non-Gaussian densities provided by our model. We report that, although this generalization seems promising on the theoretical side, the application possibilities were disappointing in our image-generation experiments. We found that sampling with non-Gaussian densities can significantly lower the quality of the generated samples and that the mappings are not always interesting for image processing applications.
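To make the iterative (de)blending loop concrete, here is a minimal 1D sketch in NumPy (our illustration, not the paper's reference code). The learnt deblender, which estimates E[x1 − x0 | x_α], is replaced by its closed form, which is available when both endpoint densities are Gaussian; the target parameters M and S are arbitrary choices.

```python
import numpy as np

M, S = 2.0, 0.5  # hypothetical target density p1 = N(M, S^2); source is p0 = N(0, 1)

def deblender(x, alpha):
    """Closed-form E[x1 - x0 | x_alpha = x] for Gaussian endpoints.

    In IADB this expectation is what the neural network is trained to
    predict from blended samples x_alpha = (1 - alpha) x0 + alpha x1.
    """
    var = (1.0 - alpha) ** 2 + (alpha * S) ** 2  # Var(x_alpha)
    return M + (alpha * S**2 - (1.0 - alpha)) / var * (x - alpha * M)

def iadb_sample(x0, steps=256):
    """Map p0-samples to p1-samples by iterating small (de)blending steps.

    Equivalent to Euler integration of dx/dalpha = E[x1 - x0 | x_alpha]
    from alpha = 0 to alpha = 1.
    """
    x = np.copy(x0)
    alphas = np.linspace(0.0, 1.0, steps + 1)
    for a_cur, a_next in zip(alphas[:-1], alphas[1:]):
        x = x + (a_next - a_cur) * deblender(x, a_cur)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(20000)
x1 = iadb_sample(x0)
print(x1.mean(), x1.std())  # close to M = 2.0 and S = 0.5
```

With the exact deblender, the limiting map here is simply x1 = M + S·x0, so the empirical mean and standard deviation of the output match the target up to discretization and Monte Carlo error.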



Figure 1: Iterative α-blending and deblending. We train a neural network to deblend blended inputs. By deblending and reblending iteratively, we obtain a mapping between arbitrary densities.

Stochastic diffusion models build on stochastic processes such as Langevin dynamics. Langevin's equation models a random walk that obeys a balance between two operations related to Gaussian noise: increasing noise by adding more noise, and decreasing noise by climbing the gradient of the log density. Increasing the noise performs large steps but pushes the samples away from the true density; decreasing it projects the samples back onto the true density. Carefully tracking and controlling this balance yields an efficient random walk and provides a sampling procedure for the true density. This is the core of denoising diffusion approaches. Noise Conditional Score Networks (NCSNs) (Song & Ermon, 2019; 2020) use Langevin's equation directly by leveraging the fact that the score (the gradient of the log density in Langevin's equation) can be learnt with a denoiser when the samples are corrupted with Gaussian noise (Vincent, 2011). Denoising Diffusion Probabilistic Models (DDPMs) (Ho et al., 2020; Nichol & Dhariwal, 2021) use a Markov-chain formalism with a Gaussian prior that yields an SDE similar to Langevin dynamics, where the score is also implicitly learnt with a denoiser.
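The balance described above can be sketched with the unadjusted Langevin update x ← x + ε·∇log p(x) + √(2ε)·z, z ~ N(0, I). In this toy illustration (ours, not from the paper) the target is a 1D Gaussian whose score is known in closed form; in score-based diffusion models, a learnt score network would take its place. MU, SIGMA, and the step size are arbitrary choices.

```python
import numpy as np

MU, SIGMA = 1.0, 0.8  # hypothetical target density p = N(MU, SIGMA^2)

def score(x):
    """grad_x log p(x) for the Gaussian target (the 'decrease noise' term)."""
    return -(x - MU) / SIGMA**2

def langevin(x, step=1e-2, n_iters=2000, seed=0):
    """Unadjusted Langevin random walk towards the target density.

    Each iteration adds Gaussian noise (large exploratory steps) and climbs
    the score (pulling samples back towards the true density).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        x = x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

samples = langevin(np.zeros(20000))
print(samples.mean(), samples.std())  # approaches MU and SIGMA (up to O(step) bias)
```

Note that without the score term the walk diffuses without bound, and without the noise term it collapses deterministically onto the mode; only their balance samples the target.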

