APPROXIMATED ANOMALOUS DIFFUSION: GAUSSIAN MIXTURE SCORE-BASED GENERATIVE MODELS

Abstract

Score-based generative models (SGMs) can generate high-quality samples via Langevin dynamics, where a drift term and a diffusion term (Gaussian noise) are iteratively calculated and added to a sample until convergence. In biological systems, it is observed that neural populations can conduct heavy-tailed Lévy dynamics for sampling-based probabilistic representation through neural fluctuations. Critically, unlike the existing sampling process of SGMs, heavy-tailed Lévy dynamics can produce both large jumps and small roaming to explore the sampling space, resulting in better sampling results than Langevin dynamics, which lacks large jumps. Motivated by this contrast, we explore a new class of SGMs whose sampling is based on Lévy dynamics. However, exact numerical simulation of Lévy dynamics is significantly more challenging and often intractable. We hence propose an alternative solution, Gaussian Mixture SGMs (GM-SGMs), which leverage Gaussian mixture noises during training to mimic the desired large-jump and small-roaming properties. Theoretically, GM-SGMs conduct a probabilistic graphical model used by empirical Bayes for sampling, expanding the maximum a posteriori (MAP) estimation applied by conventional SGMs. Extensive experiments on challenging image generation tasks show that our GM-SGMs exhibit superior sampling quality over prior-art SGMs across various sampling iterations.

1. INTRODUCTION

Score-based generative models (SGMs) (Song and Ermon, 2019; 2020; Song et al., 2021b; a; Dockhorn et al., 2022; Karras et al., 2022) have recently demonstrated tremendous performance in data synthesis, especially of high-quality images, along with easier model optimization (Song and Ermon, 2019), richer generative diversity (Xiao et al., 2022), and solid theories (De Bortoli et al., 2021). During optimization, SGMs learn to fit a score function by predicting the Gaussian noises added to a sample drawn from a target dataset. To generate a sample from the target distribution, SGMs conduct Langevin dynamics constructed from the score function. This process reverses a Brownian motion starting from the dataset distribution with i.i.d. Gaussian increments. As a special case of Markov chain Monte Carlo methods, Langevin dynamics has been widely applied for constructing sampling-based algorithms (Rey-Bellet and Spiliopoulos, 2015). However, increasing evidence from experimental observations suggests that in biological systems, the neural population implements sampling-based probabilistic representations through heavy-tailed Lévy dynamics (He, 2014; Donoghue et al., 2020; Townsend and Gong, 2018; Muller et al., 2018), which instead reverses an anomalous diffusion process with heavy-tailed increments (Fig. 2 (Left)). The neural coding benefits from Lévy dynamics, since it can implement large jumps that facilitate escaping from local minima and exploring the sampling space more thoroughly (Ye and Zhu, 2018; Qi and Gong, 2022). A natural question arises: can we apply Lévy dynamics instead of Langevin dynamics for better sampling performance of SGMs? Inspired by this insight, we explore a novel class of SGMs that reverse the anomalous diffusion for sampling.
Nonetheless, exact numerical simulation of the Lévy dynamics (i.e., reversing the anomalous diffusion) is drastically more challenging and intractable, especially for high-dimensional data such as images. To tackle this challenge, we consider Brownian motion with Gaussian mixture increments as an approximation of the anomalous diffusion, and train the SGMs with Gaussian mixture noises to enable both large jumps and small roaming during the sampling phase, reminiscent of Lévy dynamics. Concretely, we construct a novel variant of SGMs, namely Gaussian Mixture SGMs (GM-SGMs), that learn to denoise Gaussian mixture noises. In doing so, our model is able to reverse the Gaussian mixture Brownian motion by switching between large jumps and small roaming during sampling, resembling the merits of Lévy dynamics. Theoretically, our GM-SGMs conduct the probabilistic graphical model (PGM) of empirical Bayes (EB) to reverse a Gaussian mixture Brownian motion, whereas conventional SGMs perform a PGM of maximum a posteriori (MAP) estimation to reverse a Brownian motion. Empirically, extensive experiments on several challenging image generation tasks verify the ability of our GM-SGMs to automatically select large jumps or small roaming during sampling, and their superior generation quality over state-of-the-art SGMs under different sampling budgets.
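As a rough illustration of the Gaussian-mixture corruption idea, the sketch below draws noise from a two-component mixture: a small-variance "roaming" component and a large-variance "jump" component. All component weights and variances here are illustrative assumptions, not the actual GM-SGM configuration.

```python
import numpy as np

def sample_gm_noise(shape, sigma_small=0.1, sigma_large=2.0, p_jump=0.1, rng=None):
    """Draw noise from a two-component Gaussian mixture.

    With probability p_jump each sample comes from the large-variance
    ("jump") component, otherwise from the small-variance ("roaming")
    component. Assumes a 2-D (batch, dim) shape; all parameters are
    illustrative placeholders.
    """
    rng = np.random.default_rng() if rng is None else rng
    jump = rng.random(shape[0]) < p_jump              # per-sample component choice
    sigma = np.where(jump, sigma_large, sigma_small)  # (batch,) scale per sample
    return sigma[:, None] * rng.standard_normal(shape)

noise = sample_gm_noise((4, 3), rng=np.random.default_rng(0))
print(noise.shape)  # (4, 3)
```

With occasional large-scale draws, the corruption occasionally "jumps" far from the datum while mostly making small local perturbations, which is the behavior the training noise is designed to expose to the denoiser.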

2. PRELIMINARY: SCORE-BASED GENERATIVE MODELS

Score matching was originally developed for non-normalized statistical learning (Hyvärinen and Dayan, 2005). By observing i.i.d. samples of an unknown (target) distribution $p^*$ in $d$ dimensions, score matching directly approximates the score function $s(x) := \nabla_x \log p^*(x)$ via a model $s_\theta$ parameterized by $\theta$, for $x \in \mathbb{R}^d$. Score-based generative models (SGMs) aim to generate samples from the distribution $p^*$ via score matching through the following iterations:

$$x_T \sim \mathcal{N}(0, I), \qquad x_{t-1} = x_t + \frac{\epsilon_t^2}{2} s_\theta(x_t, t) + \epsilon_t z_t, \qquad t = T, \dots, 1, \tag{1}$$

where $\epsilon_t$ is the step size and $z_t \sim$ i.i.d. $\mathcal{N}(0, I)$. This process transforms a Gaussian noise $x_T$ into a sample $x_0$ obeying $p^*$. Eq. (1) can be considered as the reverse of a corrupting process where noises are gradually added to a datum $x_0$:

$$x_0 \sim p^*, \qquad x_{t+1} = x_t + \epsilon_t z_t, \qquad t = 0, \dots, T-1. \tag{2}$$

The first SGM, the noise conditional score network (NCSN) (Song and Ermon, 2019), is trained by fitting the score function $s(x)$ via minimizing the weighted explicit score matching (ESM) objective

$$\mathcal{L}(\theta; \{\sigma_t\}_{t=1}^T) = \sum_{t=1}^T \lambda(\sigma_t)\, \mathbb{E}_{x \sim p^*,\, \eta \sim \mathcal{N}(0, \sigma_t^2 I)} \left[ \frac{1}{2} \big\| s_\theta(x+\eta, t) - s(x+\eta) \big\|^2 \right], \tag{3}$$

where $\sigma_t^2$ is the noise variance at time step $t$, and $\lambda(\sigma_t)$ the weight for each time step $t$. Discarding the constant part independent of $\theta$, the ESM can be rewritten as a tractable denoising score matching (DSM) objective (Vincent, 2011):

$$\mathcal{L}(\theta; \{\sigma_t\}_{t=1}^T) = \sum_{t=1}^T \lambda(\sigma_t)\, \mathbb{E}_{x \sim p^*,\, \eta \sim \mathcal{N}(0, \sigma_t^2 I)} \left[ \frac{1}{2} \big\| s_\theta(x+\eta, t) - \nabla_{x+\eta} \log p_{\sigma_t}(x+\eta \mid x) \big\|^2 \right] = \sum_{t=1}^T \lambda(\sigma_t)\, \mathbb{E}_{x \sim p^*,\, \eta \sim \mathcal{N}(0, \sigma_t^2 I)} \left[ \frac{1}{2} \left\| s_\theta(x+\eta, t) + \frac{\eta}{\sigma_t^2} \right\|^2 \right], \tag{4}$$

where $p_{\sigma_t}(\cdot \mid x) := \mathcal{N}(\cdot\,; x, \sigma_t^2 I)$. Considering the sampling process as a stochastic differential equation (SDE), Song et al. (2021b) further proposed an improved version, NCSN++, which utilizes an existing numerical SDE solver to enhance the sampling quality.
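To make the sampling iteration of Eq. (1) concrete, here is a minimal NumPy sketch of the discretized Langevin sampler. In place of a learned $s_\theta$, it uses a toy target whose score is known in closed form (a standard Gaussian, with $s(x) = -x$); the constant step-size schedule is an illustrative placeholder, not the paper's configuration.

```python
import numpy as np

def langevin_sample(score_fn, x_T, eps_schedule, rng=None):
    """Discretized Langevin dynamics of Eq. (1):
    x_{t-1} = x_t + (eps_t^2 / 2) * s(x_t) + eps_t * z_t,  z_t ~ N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x_T
    for eps in reversed(eps_schedule):  # iterate t = T, ..., 1
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * eps**2 * score_fn(x) + eps * z
    return x

# Toy check: for a standard Gaussian target, the true score is s(x) = -x,
# so the chain should converge to samples with unit standard deviation.
rng = np.random.default_rng(0)
x_T = rng.standard_normal((2000, 2))
samples = langevin_sample(lambda x: -x, x_T, eps_schedule=[0.1] * 500, rng=rng)
print(samples.std())  # ≈ 1.0
```

With a constant step size $\epsilon$, each update is $x' = (1 - \epsilon^2/2)\,x + \epsilon z$, whose stationary variance is $1/(1 - \epsilon^2/4) \approx 1$, matching the target up to an $O(\epsilon^2)$ discretization bias.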

3.1. ANOMALOUS DIFFUSION PERSPECTIVE

In this section, we introduce a wider class of corrupting processes as well as their reverses (i.e., the corresponding sampling processes), followed by an analysis of their fundamental properties. We first write Eq. (1)-(2) in continuous form. Eq. (2) is the discretized version of a Brownian motion

$$x_0 \sim p^*, \qquad \mathrm{d}x = \mathrm{d}w,$$

where the stochastic increment is Gaussian and satisfies $\Delta w_t = w_{t+\Delta t} - w_t \sim \mathcal{N}(0, \Delta t\, I)$ for $t, \Delta t > 0$.
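The discretization of this Brownian motion can be sketched directly: each increment $\Delta w_t$ is drawn as $\sqrt{\Delta t}\,\mathcal{N}(0, I)$, so the marginal variance of $x_t$ grows linearly in $t$. A minimal NumPy simulation (the path count and step size are illustrative choices):

```python
import numpy as np

def brownian_paths(n_paths, n_steps, dt, rng=None):
    """Simulate x_{t+dt} = x_t + dw_t with dw_t ~ N(0, dt), starting at x_0 = 0."""
    rng = np.random.default_rng() if rng is None else rng
    increments = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    return np.cumsum(increments, axis=1)  # partial sums give the path values

paths = brownian_paths(10000, 100, dt=0.01, rng=np.random.default_rng(0))
# Var[x_t] = t for standard Brownian motion; here t = 100 * 0.01 = 1.0:
print(paths[:, -1].var())  # ≈ 1.0
```

The $\sqrt{\Delta t}$ scaling is the defining feature: halving the step size halves the increment variance, so the accumulated variance at a fixed time $t$ is independent of the discretization.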

