ACCELERATING GUIDED DIFFUSION SAMPLING WITH SPLITTING NUMERICAL METHODS

Abstract

Guided diffusion is a technique for conditioning the output of a diffusion model at sampling time without retraining the network for each specific task. One drawback of diffusion models, whether guided or unguided, is their slow sampling process. Recent techniques can accelerate unguided sampling by applying high-order numerical methods to the sampling process when it is viewed as a differential equation. In contrast, we discover that the same techniques do not work for guided sampling, and little has been explored about its acceleration. This paper explores the culprit of this problem and provides a solution based on operator splitting methods, motivated by our key finding that classical high-order numerical methods are unsuitable for the conditional function. Our proposed method can re-utilize high-order methods for guided sampling and can generate images with the same quality as a 250-step DDIM baseline using 32-58% less sampling time on ImageNet256. We also demonstrate usage on a wide variety of conditional generation tasks, such as text-to-image generation, colorization, inpainting, and super-resolution.

1. INTRODUCTION

A family of generative models known as diffusion models has recently gained a lot of attention with state-of-the-art image generation quality (Dhariwal & Nichol, 2021). Guided diffusion is an approach for controlling the output of a trained diffusion model for conditional generation tasks without retraining its network. By engineering a task-specific conditional function and modifying only the sampling procedure, guided diffusion models can be used in a variety of applications, such as class-conditional image generation (Dhariwal & Nichol, 2021; Kawar et al., 2022), text-to-image generation (Nichol et al., 2022), image-to-image translation (Zhao et al., 2022), inpainting (Chung et al., 2022a), colorization (Song et al., 2020b), image composition (Sasaki et al., 2021), adversarial purification (Wang et al., 2022; Wu et al., 2022), and super-resolution (Choi et al., 2021). One common drawback of both guided and regular "unguided" diffusion models is their slow sampling process, which usually requires hundreds of iterations to produce a single image. Recent speedup attempts include improving the noise schedule (Nichol & Dhariwal, 2021; Watson et al., 2021) and redefining the diffusion process to be non-Markovian, thereby allowing a deterministic sampling process (Song et al., 2020a), with impressive results on unguided diffusion models. However, when applied to guided diffusion models, these methods produce surprisingly poor results (see Figure 1): given a small number of steps, high-order numerical methods actually perform worse than low-order ones. Guided sampling differs from unguided sampling by the addition of the gradient of the conditional function to its sampling equation. The observed performance decline thus suggests that classical high-order methods may be unsuitable for the conditional function and, consequently, for the guided sampling equation as a whole.
Our paper tests this hypothesis and presents an approach to accelerating guided diffusion sampling. The key idea is to use an operator splitting method to split the less well-behaved conditional function term from the standard diffusion term and solve them separately. This approach not only allows re-utilizing the successful high-order methods on the diffusion term but also provides us with options to combine different specialized methods for each term to maximize performance. Note that splitting methods have also been explored by Dockhorn et al. (2022) to solve unguided diffusion SDEs, but our work focuses on accelerating guided diffusion ODEs. Our design process includes comparing different splitting methods and numerical methods for each split term. When tested on ImageNet, our approach achieves the same level of image quality as a DDIM baseline while reducing the sampling time by approximately 32-58%. Compared with other sampling methods using the same sampling time, our approach provides better image quality as measured by LPIPS, FID, and Precision/Recall. With only minimal modifications to the sampling equation, we also show successful acceleration on various conditional generation tasks.
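The core idea of operator splitting can be sketched on a toy problem: instead of integrating $dx/dt = f(x) + g(x)$ with one solver, we alternate between solving each sub-problem on its own. The minimal sketch below (plain Python with hypothetical names; not the paper's actual sampler) applies first-order Lie splitting to $dx/dt = -x + c$, whose two sub-problems each have an exact flow map, and converges to the closed-form solution.

```python
import math

def lie_splitting_step(x, h, flow_a, flow_b):
    """One Lie splitting step: advance sub-problem A for time h, then sub-problem B."""
    return flow_b(flow_a(x, h), h)

# Toy ODE dx/dt = -x + c, split into A: dx/dt = -x and B: dx/dt = c.
c = 1.0
flow_a = lambda x, h: x * math.exp(-h)  # exact flow of dx/dt = -x over time h
flow_b = lambda x, h: x + c * h         # exact flow of dx/dt = c over time h

def integrate(x0, t_end, n_steps):
    """Integrate the full ODE by composing the two exact sub-flows each step."""
    x, h = x0, t_end / n_steps
    for _ in range(n_steps):
        x = lie_splitting_step(x, h, flow_a, flow_b)
    return x
```

In the paper's setting, sub-problem A plays the role of the well-behaved diffusion term (where a high-order method can be reused) and B the conditional-function term, each solved with a method suited to it.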

2. BACKGROUND

This section provides a high-level summary of the theoretical foundation of diffusion models, as well as the numerical methods that have been applied to them, briefly explaining those that contribute to our method.

2.1. DIFFUSION MODELS

Assuming that $x_0$ is a random variable from the data distribution we wish to reproduce, diffusion models define a sequence of Gaussian noise degradations of $x_0$ as random variables $x_1, x_2, \ldots, x_T$, where $x_t \sim \mathcal{N}(\sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$ and $\beta_t \in [0, 1]$ are parameters that control the noise levels. Using a property of Gaussian distributions, we can express $x_t$ directly as a function of $x_0$ and noise $\epsilon \sim \mathcal{N}(0, I)$ by $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon$, where $\bar\alpha_t = \prod_{i=1}^{t}(1-\beta_i)$. By picking a sufficiently large $T$ (e.g., 1,000) and an appropriate set of $\beta_t$, we can assume $x_T$ follows a standard Gaussian distribution. The main idea of diffusion model generation is to sample a Gaussian noise $x_T$ and use it to reversely sample $x_{T-1}, x_{T-2}, \ldots$ until we obtain $x_0$, which belongs to our data distribution. Ho et al. (2020) propose the Denoising Diffusion Probabilistic Model (DDPM) and explain how to employ a neural network $\epsilon_\theta(x_t, t)$ to predict the noise $\epsilon$ used to compute $x_t$. To train the network, we sample a training image $x_0$, a timestep $t$, and a noise $\epsilon$ to compute $x_t$ using the above relationship. Then, we optimize the network $\epsilon_\theta$ to minimize the difference between the predicted and real noise, i.e., $\|\epsilon - \epsilon_\theta(x_t, t)\|^2$.
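As a concrete illustration of the forward process and training objective above, the following minimal NumPy sketch samples $x_t$ directly from $x_0$ and evaluates the DDPM loss for one sample. The linear $\beta_t$ schedule and all function names here are our own illustrative choices, not taken from the paper.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # illustrative linear noise schedule beta_t
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t = prod_{i<=t} (1 - beta_i)

def q_sample(x0, t, eps):
    """Sample x_t from x_0 in closed form: sqrt(abar_t)*x_0 + sqrt(1-abar_t)*eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def ddpm_loss(eps_model, x0, t, eps):
    """DDPM objective ||eps - eps_theta(x_t, t)||^2 for one (x0, t, eps) sample."""
    x_t = q_sample(x0, t, eps)
    return np.mean((eps - eps_model(x_t, t)) ** 2)
```

By the final step, $\bar\alpha_T$ is nearly zero under this schedule, so $x_T$ is essentially pure Gaussian noise, matching the assumption above.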



2.2. ACCELERATING DIFFUSION SAMPLING

Several techniques have been proposed to accelerate diffusion sampling, such as the non-Markovian reformulation of the diffusion process that allows deterministic sampling (Song et al., 2020a) and network distillation, which teaches a student model to simulate multiple sampling steps of a teacher model (Salimans & Ho, 2022; Luhman & Luhman, 2021), among others. Song et al. (2020a) show how each sampling step can be expressed as a first-order numerical step of an ordinary differential equation (ODE). Similarly, Song et al. (2020b) express the sampling of a score-based model as solving a stochastic differential equation (SDE). By regarding the sampling process as an ODE/SDE, many high-order numerical methods have been suggested, such as those of Liu et al. (2022), Zhang & Chen (2022), and Zhang et al.
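To make the deterministic step concrete, the sketch below implements the DDIM update with $\eta = 0$, which Song et al. (2020a) interpret as a first-order step of the sampling ODE. The function name is ours, and `eps_pred` stands in for the network output $\epsilon_\theta(x_t, t)$.

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """Deterministic DDIM update: predict x0 from x_t, then re-noise to the previous level."""
    # Invert x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps to estimate x0.
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    # Move the x0 estimate to noise level t-1 using the same predicted noise.
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred
```

If `eps_pred` exactly equals the noise used to form $x_t$, this update lands exactly on the noised version of $x_0$ at the previous level, which is why a perfect model could skip many steps at once; in practice the prediction error makes the step size matter, motivating higher-order methods.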

Figure 1: Generated samples of a classifier-guided diffusion model trained on ImageNet256 using 8-256 sampling steps from different sampling methods. Our technique, STSP4, produces high-quality results in fewer steps.

