BUILDING NORMALIZING FLOWS WITH STOCHASTIC INTERPOLANTS

Abstract

A generative model based on a continuous-time normalizing flow between any pair of base and target probability densities is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent density that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based on the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself, expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or the target, and to estimate the likelihood at any time along the interpolant. In addition, the flow can be optimized to minimize the path length of the interpolant density, thereby paving the way for building optimal transport maps. In situations where the base is a Gaussian density, we show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as estimate its score. However, our approach shows that we can bypass this diffusion completely and work at the level of the probability flow with greater simplicity, opening an avenue for methods based solely on ordinary differential equations as an alternative to those based on stochastic differential equations. Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass conventional continuous flows at a fraction of the cost, and compares well with diffusions on image generation on CIFAR-10 and ImageNet 32×32. The method also scales ab-initio ODE flows to image resolutions previously out of reach, demonstrated up to 128 × 128.

1. INTRODUCTION

Contemporary generative models have primarily been designed around the construction of a map between two probability distributions that transforms samples from the first into samples from the second. While progress has been made from various angles with tools such as implicit maps (Goodfellow et al., 2014; Brock et al., 2019) and autoregressive maps (Menick & Kalchbrenner, 2019; Razavi et al., 2019; Lee et al., 2022), we focus on the case where the map has a clearly associated probability flow. Advances in this domain, namely from flow and diffusion models, have arisen through the introduction of algorithms or inductive biases that make learning this map, and the Jacobian of the associated change of variables, more tractable. The challenge is to choose what structure to impose on the transport to best reach a complex target distribution from a simple base distribution, while maintaining computational efficiency. In the continuous-time perspective, this problem can be framed as the design of a time-dependent map, X_t(x) with t ∈ [0, 1], which functions as the push-forward of the base distribution at time t = 0 onto some time-dependent distribution that reaches the target at time t = 1. Assuming that these distributions have densities supported on Ω ⊆ R^d, say ρ_0 for the base and ρ_1 for the target, this amounts to constructing X_t : Ω → Ω such that

    if x ∼ ρ_0, then X_t(x) ∼ ρ_t for some density ρ_t with ρ_{t=0} = ρ_0 and ρ_{t=1} = ρ_1.    (1)
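The quadratic velocity loss mentioned in the abstract can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes the simplest linear interpolant x_t = (1 − t) x_0 + t x_1, whose time derivative is x_1 − x_0, and a hypothetical `velocity` callable standing in for a learned model. The loss is a Monte Carlo estimate of E ‖v(x_t, t) − ∂_t x_t‖², built only from joint samples of the base and target, with no ODE solve or backpropagation through a solver.

```python
import numpy as np

def interpolant_loss(velocity, x0, x1, rng):
    """Monte Carlo estimate of E |v(x_t, t) - (x1 - x0)|^2
    under the linear interpolant x_t = (1 - t) x0 + t x1."""
    n = x0.shape[0]
    t = rng.uniform(size=(n, 1))          # times drawn uniformly in [0, 1]
    xt = (1.0 - t) * x0 + t * x1          # interpolant between paired samples
    target = x1 - x0                      # time derivative of the interpolant
    residual = velocity(xt, t) - target
    return np.mean(np.sum(residual**2, axis=1))

# Example with a deliberately crude linear model v(x, t) = x @ A + b * t;
# in practice v would be a neural network trained to minimize this loss.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))              # base samples (standard Gaussian)
x1 = rng.normal(loc=3.0, size=(256, 2))     # target samples (shifted Gaussian)
A, b = np.zeros((2, 2)), np.zeros(2)
loss = interpolant_loss(lambda x, t: x @ A + b * t, x0, x1, rng)
```

Because the loss involves only expectations over (x_0, x_1, t), minimizing it over the parameters of `velocity` is a standard regression problem, which is the source of the cost advantage over maximum-likelihood training of continuous flows.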

