f-DM: A MULTI-STAGE DIFFUSION MODEL VIA PROGRESSIVE SIGNAL TRANSFORMATION

ABSTRACT

Diffusion models (DMs) have recently emerged as state-of-the-art tools for generative modeling in various domains. Standard DMs can be viewed as an instantiation of hierarchical variational autoencoders (VAEs) where the latent variables are inferred from input-centered Gaussian distributions with fixed scales and variances. Unlike VAEs, this formulation constrains DMs from changing the latent spaces and learning abstract representations. In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation. More precisely, we extend DMs to incorporate a set of (hand-designed or learned) transformations, where the transformed input is the mean of each diffusion step. We propose a generalized formulation of DMs and derive the corresponding denoising objective together with a modified sampling algorithm. As a demonstration, we apply f-DM to image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations based on the encoders of pretrained VAEs. In addition, we identify the importance of adjusting the noise levels whenever the signal is sub-sampled, and propose a simple rescaling recipe. f-DM produces high-quality samples on standard image generation benchmarks such as FFHQ, AFHQ, LSUN, and ImageNet with better efficiency and semantic interpretation. Please check our videos at http://jiataogu.me/fdm/.
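To make the core idea concrete, the following is a minimal NumPy sketch of a single noising step in which a transformed signal, rather than the raw input, serves as the Gaussian mean, and the noise level is rescaled when the signal is sub-sampled. All names (`downsample`, `forward_diffuse`, the 1/factor rescaling) are illustrative assumptions, not the paper's exact formulation or rescaling recipe; down-sampling is just one of the transformations mentioned above.

```python
import numpy as np

def downsample(x, factor=2):
    """Average-pool an (H, W, C) image by `factor` -- one example of a
    signal transformation (the abstract also mentions blurring and
    learned VAE-encoder transformations)."""
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def forward_diffuse(x, alpha, sigma, factor=1, rng=np.random.default_rng(0)):
    """One noising step of a generalized (f-DM-style) forward process:
    the *transformed* signal f(x) is the mean of the Gaussian, and the
    noise scale is reduced when the signal is sub-sampled (here by
    1/factor -- a simplified stand-in for the paper's rescaling recipe)."""
    fx = downsample(x, factor) if factor > 1 else x
    sigma_eff = sigma / factor  # rescale noise for the sub-sampled signal
    eps = rng.standard_normal(fx.shape)
    return alpha * fx + sigma_eff * eps

x = np.ones((8, 8, 3))
z = forward_diffuse(x, alpha=0.9, sigma=0.1, factor=2)
print(z.shape)  # (4, 4, 3): the latent lives in the transformed (coarser) space
```

Note how, unlike a standard DM, the latent `z` no longer shares the input's space: the transformation changes its resolution, which is exactly the flexibility the abstract contrasts with fixed-latent-space DMs.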



Figure 1: Visualization of reverse diffusion from f-DMs with various signal transformations. x_t is the denoised output, and z_s is the input to the next diffusion step. We plot the first three channels of the VQVAE latent variables. Low-resolution images are resized to 256² for ease of visualization.

