GENERATIVE MODELLING WITH INVERSE HEAT DISSIPATION

Abstract

While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.

1. INTRODUCTION

Diffusion models have recently become highly successful in generative modelling tasks (Ho et al., 2020; Song et al., 2021d; Dhariwal & Nichol, 2021). They are defined by a forward process that erases the original image information content and a reverse process that generates images iteratively. The forward and reverse processes of standard diffusion models do not explicitly consider the inductive biases of natural images, such as their multi-scale nature. In other successful generative modelling settings, such as GANs (Goodfellow et al., 2014), taking multiple resolutions explicitly into account has resulted in dramatic improvements (Karras et al., 2018; 2021). This paper investigates how to incorporate the inductive biases of natural images, particularly their multi-resolution nature, into the generative sequence of diffusion-like iterative generative models. The concept of resolution itself has received relatively little attention in deep learning methods; scaling is usually based on simple pixel sub-sampling pyramids that halve the resolution per step. In classical computer vision, another approach is the so-called Gaussian scale-space (Iijima, 1962; Witkin, 1987; Babaud et al., 1986; Koenderink, 1984), where lower-resolution versions of an image are obtained by running the heat equation, a partial differential equation (PDE, see Fig. 1) that describes the dissipation of heat, over the image.


Similarly to subsampling, the heat equation averages out the image and removes fine detail, but it allows an arbitrary number of effective resolutions without explicitly decreasing the number of pixels. The scale-space adheres to a set of scale-space axioms, such as rotational symmetry, invariance to shifts in the input image, and scale invariance (Koenderink, 1984; Babaud et al., 1986).
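The equivalence between the Gaussian scale-space and the heat equation can be sketched concretely: on an unbounded plane, running the heat equation for time t is the same as convolving the image with a Gaussian of standard deviation sqrt(2t). A minimal illustration (the function name is ours; `mode="nearest"` is only an approximation to zero-derivative image boundaries):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(img, t):
    """Gaussian scale-space: blurring with sigma = sqrt(2 t) solves the
    heat equation du/dt = laplacian(u) up to time t on an unbounded
    plane; 'nearest' padding roughly mimics zero-derivative boundaries."""
    return gaussian_filter(img, sigma=np.sqrt(2.0 * t), mode="nearest")
```

Increasing t produces progressively lower effective resolutions while keeping the pixel grid fixed, which is the property exploited by the forward process below.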


We investigate inductive biases in diffusion-type generative models by proposing a generative model based on directly reversing the heat equation and thus increasing the effective image resolution, as illustrated in Fig. 2. We call it the inverse heat dissipation model (IHDM). The intuition is that as the original image information content is erased in the forward process, a corresponding stochastic reverse process produces multiple plausible reconstructions, defining a generative model. Samples from the prior distribution are easy to obtain due to the low dimensionality of averaged images, and we adopt a training-data-based kernel density estimate. Our main contributions are: (i) We show how to realise the idea of generative modelling with inverse heat dissipation by interpreting a solution of the heat equation with small additive noise as an inference process in a diffusion-like latent variable model. (ii) We investigate emergent properties of the heat equation-based model: (a) disentanglement of overall colour and image shape, (b) smooth interpolation, (c) the forward process inducing simplicity in the learned neural network function, and (d) potential for data efficiency. (iii) By analysing the power spectral density of natural images, we show that standard diffusion models implicitly perform a different type of coarse-to-fine generation, shedding light on their inductive biases and highlighting connections and differences between our model and standard diffusion models. Code for the methods in this paper is available at: https://github.com/AaltoML/generative-inverse-heat-dissipation.
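The training-data-based prior over fully dissipated images can be sketched as follows. This is an illustrative kernel density estimate, not the paper's exact implementation: draw a training image, collapse each channel to its mean (the t → ∞ limit of the forward process, a point in R^3), and perturb it with a small Gaussian kernel (`sample_prior` and `noise_std` are our names):

```python
import numpy as np

def sample_prior(train_images, noise_std=0.01, rng=None):
    """Sample from a KDE prior over averaged images: pick a training
    image, replace each colour channel by its mean, add small noise."""
    rng = np.random.default_rng() if rng is None else rng
    img = train_images[rng.integers(len(train_images))]   # (H, W, 3)
    flat = img.mean(axis=(0, 1), keepdims=True)           # per-channel mean
    flat = np.broadcast_to(flat, img.shape)               # constant image
    return flat + noise_std * rng.normal(size=img.shape)
```

The low dimensionality of the averaged images is what makes such a density estimate feasible; the generative reverse process then restores fine-scale structure from these near-constant starting points.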

2. METHODS

The main characteristic of the forward process is that it averages out the images in the data set, contracting them into a lower-dimensional subspace (see Fig. 2, right). We define it with the heat equation, a linear partial differential equation (PDE) that describes the dissipation of heat: Forward PDE model: ∂u(x, y, t)/∂t = ∆u(x, y, t), (1) where u : R² × R⁺ → R is the idealized, continuous 2D plane of one channel of the image, and ∆ = ∇² is the Laplace operator. The process is run for each colour channel separately. We use Neumann boundary conditions with zero derivatives (∂u/∂x = ∂u/∂y = 0) at the boundaries of the image bounding box. This means that as t → ∞, each colour channel is averaged out to the mean of the original colour intensities in the image. Thus, the image is projected to R³. In principle, the heat equation could be exactly reversible with infinite numerical precision, but this is not the case in practice with finite numerical accuracy due to the fundamental ill-posedness of the inverse heat equation. Another way to view it is that with any amount of observation noise added on top of the averaged image, the original image cannot be recovered exactly (see Kaipio & Somersalo, 2006, for discussion). The PDE model in Eq. (1) can be formally written in evolution equation form as u(x, y, t) = F(t) u(x, y, t)|_{t=t₀}, where F(t) = exp((t − t₀)∆) is an evolution operator given in terms of the operator exponential function (see, e.g., Da Prato & Zabczyk, 1992). We can use this general formulation to efficiently solve the equation using the eigenbasis of the Laplace operator. Since we



Figure 1: Example of the forward process (during training) and the generative inverse process (for sample generation).

Figure 2: Comparison of generation by generative denoising and inverse heat dissipation, where the forward process acts in the pixel space on the left and over the 2D plane of the image on the right.
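The evolution-operator solution of Eq. (1) described above can be computed efficiently because the Neumann Laplacian diagonalizes in a cosine basis: projecting the image onto that basis with a discrete cosine transform, scaling each coefficient by exp(−λt), and transforming back applies F(t) exactly. A minimal sketch under these assumptions (the function name and use of `scipy.fft` are ours, not necessarily the released implementation):

```python
import numpy as np
from scipy.fft import dctn, idctn

def heat_dissipate(img, t):
    """Evolve a single-channel image under the heat equation for time t
    using the cosine eigenbasis of the Laplacian (Neumann boundaries)."""
    H, W = img.shape
    freqs_h = np.pi * np.arange(H) / H
    freqs_w = np.pi * np.arange(W) / W
    # Eigenvalues of the Laplace operator for each cosine basis function
    lam = freqs_h[:, None] ** 2 + freqs_w[None, :] ** 2
    coeffs = dctn(img, norm="ortho")       # project onto the eigenbasis
    coeffs *= np.exp(-lam * t)             # apply the evolution operator F(t)
    return idctn(coeffs, norm="ortho")     # back to pixel space
```

Because the zero-frequency coefficient has eigenvalue λ = 0, it is preserved for all t, so the image converges to its mean as t → ∞, matching the R³ projection described in Section 2; the semigroup property F(s)F(t) = F(s + t) also holds exactly in this representation.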

