PROBABILITY FLOW SOLUTION OF THE FOKKER-PLANCK EQUATION

Abstract

The method of choice for integrating the time-dependent Fokker-Planck equation in high dimension is to generate samples from the solution via integration of the associated stochastic differential equation. Here, we introduce an alternative scheme based on integrating an ordinary differential equation that describes the flow of probability. Acting as a transport map, this equation deterministically pushes samples from the initial density onto samples from the solution at any later time. Unlike integration of the stochastic dynamics, the method has the advantage of giving direct access to quantities that are challenging to estimate from trajectories alone, such as the probability current, the density itself, and its entropy. The probability flow equation depends on the gradient of the logarithm of the solution (its "score"), and so is a priori unknown. To resolve this dependence, we model the score with a deep neural network that is learned on-the-fly by propagating a set of samples according to the instantaneous probability current. We consider several high-dimensional examples from the physics of interacting particle systems to highlight the efficiency and precision of the approach; we find that the method accurately matches analytical solutions computed by hand and moments computed via Monte-Carlo.

1. INTRODUCTION

The time evolution of many dynamical processes occurring in the natural sciences, engineering, economics, and statistics is naturally described in the language of stochastic differential equations (SDE) (Gardiner, 2009; Oksendal, 2003; Evans, 2012). Typically, one is interested in the probability density function (PDF) of these processes, which describes the probability that the system will occupy a given state at a given time. The density can be obtained as the solution to a Fokker-Planck equation (FPE), which can generically be written as (Risken, 1996; Bass, 2011)

$$\partial_t \rho^*_t(x) = -\nabla \cdot \big( b_t(x)\rho^*_t(x) - D_t(x)\nabla\rho^*_t(x) \big), \qquad x \in \Omega \subseteq \mathbb{R}^d, \tag{FPE}$$

where $\rho^*_t(x) \in \mathbb{R}_{\geq 0}$ denotes the value of the density at time $t$, $b_t(x) \in \mathbb{R}^d$ is a vector field known as the drift, and $D_t(x) \in \mathbb{R}^{d \times d}$ is a positive-semidefinite tensor known as the diffusion matrix. (FPE) must be solved for $t \geq 0$ from some initial condition $\rho^*_{t=0}(x) = \rho_0(x)$, but in all but the simplest cases, the solution is not available analytically and can only be approximated via numerical integration.

High-dimensionality. For many systems of interest, such as interacting particle systems in statistical physics (Chandler, 1987; Spohn, 2012), stochastic control systems (Kushner et al., 2001), and models in mathematical finance (Oksendal, 2003), the dimensionality $d$ can be very large. This renders standard numerical methods for partial differential equations inapplicable: they become infeasible for $d$ as small as five or six because their computational complexity scales exponentially with $d$. The standard solution to this problem is a Monte-Carlo approach, whereby the SDE associated with (FPE),

$$dx_t = b_t(x_t)\,dt + \nabla \cdot D_t(x_t)\,dt + \sqrt{2}\,\sigma_t(x_t)\,dW_t, \tag{1}$$

is evolved via numerical integration to obtain a large number $n$ of trajectories (Kloeden & Platen, 1992). In (1), $\sigma_t(x)$ satisfies $\sigma_t(x)\sigma_t^{\mathsf{T}}(x) = D_t(x)$ and $W_t$ is a standard Brownian motion on $\mathbb{R}^d$.
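As an illustration of the Monte-Carlo approach, the following is a minimal sketch of Euler-Maruyama integration of (1) for a one-dimensional Ornstein-Uhlenbeck process, followed by an empirical average of a test function over the resulting samples. The function name `euler_maruyama`, the drift $b(x) = -x$, the constant diffusion coefficient `D`, the initial variance `S0`, the step size, and the sample count are all assumptions chosen for illustration and are not taken from the paper. Because the diffusion is state-independent here, the $\nabla \cdot D_t$ term in (1) vanishes.

```python
import math
import random

def euler_maruyama(b, sigma, x0, dt, n_steps, rng):
    """Integrate dx = b(x, t) dt + sqrt(2) * sigma(x, t) dW for one scalar path.

    Assumes a state-independent diffusion, so the div(D) drift term is zero.
    """
    x = x0
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + b(x, t) * dt + math.sqrt(2.0) * sigma(x, t) * dw
    return x

# Illustrative Ornstein-Uhlenbeck setup (all values are assumptions).
D = 0.5    # constant diffusion coefficient, sigma = sqrt(D)
S0 = 2.0   # variance of the Gaussian initial density rho_0
rng = random.Random(0)
n = 5000

samples = [
    euler_maruyama(
        lambda x, t: -x,            # drift b_t(x) = -x
        lambda x, t: math.sqrt(D),  # sigma_t(x), with sigma^2 = D
        rng.gauss(0.0, math.sqrt(S0)),
        dt=0.01,
        n_steps=100,
        rng=rng,
    )
    for _ in range(n)
]

# Empirical average of the observable phi(x) = x^2 at time t = 1.
second_moment = sum(x * x for x in samples) / n

# Exact second moment for this linear SDE, used only as a sanity check.
exact = D + (S0 - D) * math.exp(-2.0)
```

For this linear example the second moment is known in closed form, so `second_moment` can be compared against `exact`; the two should agree up to time-discretization bias and sampling noise of order $n^{-1/2}$.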
Assuming that we can draw samples $\{x^i_0\}_{i=1}^n$ from the initial PDF $\rho_0$, simulation of (1) enables the estimation of expectations via empirical averages

$$\int_\Omega \phi(x)\rho^*_t(x)\,dx \approx \frac{1}{n}\sum_{i=1}^n \phi(x^i_t), \tag{2}$$

where $\phi : \Omega \to \mathbb{R}$ is an observable of interest. While widely used, this method only provides samples from $\rho^*_t$, and hence other quantities of interest like the value of $\rho^*_t$ itself or the time-dependent differential entropy of the system $H_t = -\int_\Omega \log\rho^*_t(x)\,\rho^*_t(x)\,dx$ require sophisticated interpolation methods that typically do not scale well to high dimension.

A transport map approach. Another possibility, building on recent theoretical advances that connect transportation of measures to the Fokker-Planck equation (Jordan et al., 1998), is to recast (FPE) as the transport equation (Villani, 2009; Santambrogio, 2015)

$$\partial_t \rho^*_t(x) = -\nabla \cdot \big( v^*_t(x)\rho^*_t(x) \big), \tag{3}$$

where we have defined the velocity field

$$v^*_t(x) = b_t(x) - D_t(x)\nabla\log\rho^*_t(x). \tag{4}$$

This formulation reveals that $\rho^*_t$ can be viewed as the pushforward of $\rho_0$ under the flow map $X^*_{\tau,t}(\cdot)$ of the ordinary differential equation

$$\frac{d}{dt}X^*_{\tau,t}(x) = v^*_t(X^*_{\tau,t}(x)), \qquad X^*_{\tau,\tau}(x) = x, \qquad t, \tau \geq 0. \tag{5}$$

Equation (5) is known as the probability flow equation, and its solution has the remarkable property that if $x$ is a sample from $\rho_0$, then $X^*_{0,t}(x)$ will be a sample from $\rho^*_t$. Viewing $X^*_{\tau,t} : \Omega \to \Omega$ as a transport map, $\rho^*_t = X^*_{0,t}\sharp\rho_0$ can be evaluated at any position in $\Omega$ via the change of variables formula (Villani, 2009; Santambrogio, 2015)

$$\rho^*_t(x) = \rho_0(X^*_{t,0}(x)) \exp\left( -\int_0^t \nabla \cdot v^*_\tau(X^*_{t,\tau}(x))\,d\tau \right), \tag{6}$$

where $X^*_{t,0}(x)$ is obtained by solving (5) backward from some given $x$. Importantly, access to the PDF as provided by (6) immediately gives the ability to compute quantities such as the probability current or the entropy; by contrast, this capability is absent when directly simulating the SDE.

Learning the flow.
The simplicity of the probability flow equation (5) is somewhat deceptive, because the velocity $v^*_t$ depends explicitly on the solution $\rho^*_t$ to the Fokker-Planck equation (FPE). Nevertheless, recent work in generative modeling via score-based diffusion (Song & Ermon, 2020a;b; Song & Kingma, 2021) has shown that it is possible to use deep neural networks to estimate $v^*_t$, or equivalently the so-called score $\nabla\log\rho^*_t$ of the solution density. Here, we introduce a variant of score-based diffusion modeling in which the score is learned on-the-fly over samples generated by the probability flow equation itself. The method is self-contained and enables us to bypass simulation of the SDE entirely; moreover, we provide both empirical and theoretical evidence that the resulting self-consistent training procedure offers improved performance when compared to training via samples produced from simulation of the SDE.

1.1 CONTRIBUTIONS

Our contributions are both theoretical and computational:

• We provide a bound on the Kullback-Leibler divergence from the estimate $\rho_t$ produced via an approximate velocity field $v_t$ to the target $\rho^*_t$. This bound motivates our approach, and shows that minimizing the discrepancy between the learned score and the score of the push-forward distribution systematically improves the accuracy of $\rho_t$.

• Based on this bound, we introduce two optimization problems that can be used to learn the velocity field (4) in the transport equation (3) so that its solution coincides with that of the Fokker-Planck equation (FPE). Due to its similarities with score-based diffusion approaches in generative modeling (SBDM), we call the resulting method score-based transport modeling (SBTM).

• We provide specific estimators for quantities that can be computed via SBTM but are not directly available from samples alone, like point-wise evaluation of $\rho_t$ itself, the differential entropy, and the probability current.
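To make the role of the probability flow concrete, the sketch below integrates (5) forward to transport a sample and integrates it backward to evaluate the density through the change of variables formula (6). It is not the SBTM training procedure: it assumes a one-dimensional Ornstein-Uhlenbeck example with drift $b(x) = -x$, constant diffusion `D`, and Gaussian initial variance `S0`, for which the score is known exactly, and it uses that exact score where SBTM would use a learned network. All function names and parameter values are hypothetical, and the midpoint rule is simply one convenient choice of ODE integrator.

```python
import math

D = 0.5    # constant diffusion coefficient (assumed)
S0 = 2.0   # variance of the Gaussian initial density rho_0 (assumed)

def var(t):
    """Variance of the exact solution rho*_t = N(0, var(t)) for b(x) = -x."""
    return D + (S0 - D) * math.exp(-2.0 * t)

def score(x, t):
    """Exact score of rho*_t; a learned network would stand in for this."""
    return -x / var(t)

def velocity(x, t):
    """Probability flow velocity (4): drift minus D times the score."""
    return -x - D * score(x, t)

def divergence(x, t):
    """Derivative of velocity(x, t) in x, i.e. the 1D divergence."""
    return -1.0 + D / var(t)

def flow_forward(x0, t, n_steps=1000):
    """Push a sample of rho_0 to time t by integrating (5) (midpoint rule)."""
    h = t / n_steps
    x = x0
    for k in range(n_steps):
        tau = k * h
        x_mid = x + 0.5 * h * velocity(x, tau)
        x = x + h * velocity(x_mid, tau + 0.5 * h)
    return x

def log_density(x, t, n_steps=1000):
    """Evaluate log rho_t(x) via (6): integrate (5) backward from x to time 0
    while accumulating the time integral of the divergence along the path."""
    h = t / n_steps
    div_integral = 0.0
    for k in range(n_steps):
        tau = t - k * h
        x_mid = x - 0.5 * h * velocity(x, tau)
        div_integral += h * divergence(x_mid, tau - 0.5 * h)
        x = x - h * velocity(x_mid, tau - 0.5 * h)
    log_rho0 = -0.5 * x * x / S0 - 0.5 * math.log(2.0 * math.pi * S0)
    return log_rho0 - div_integral
```

In this linear example the flow map reduces to a rescaling, $X^*_{0,t}(x) = x\sqrt{\mathrm{var}(t)/\mathrm{var}(0)}$, and `log_density` should reproduce the Gaussian log-density with variance `var(t)` up to integration error, which gives a simple check of the implementation. Averaging `-log_density` over transported samples would then give one possible estimate of the differential entropy.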

