ACTION MATCHING: A VARIATIONAL METHOD FOR LEARNING STOCHASTIC DYNAMICS FROM SAMPLES

Abstract

Stochastic dynamics are ubiquitous in many fields of science, from the evolution of quantum systems in physics to diffusion-based models in machine learning. Existing methods such as score matching can be used to simulate these physical processes by assuming that the dynamics is a diffusion, an assumption that does not always hold. In this work, we propose a method called "Action Matching" that enables us to learn a much broader family of stochastic dynamics. Our method requires access only to samples from different time-steps, makes no explicit assumptions about the underlying dynamics, and can be applied even when samples are uncorrelated (i.e., are not part of a trajectory). Action Matching directly learns an underlying mechanism to move samples in time, without modeling the distributions at each time-step. We showcase how Action Matching can be used for several computer vision tasks, such as generative modeling, super-resolution, colorization, and inpainting, and further discuss potential applications in other areas of science.

1. INTRODUCTION

The problem of learning stochastic dynamics is one of the most fundamental problems in many fields of science. In physics, porous medium equations (Vázquez, 2007) describe many natural phenomena from this perspective, such as the Fokker-Planck equation in statistical mechanics, the Vlasov equation for plasmas, and the nonlinear heat equation. Another prominent example comes from quantum mechanics, where the state of a physical system is a distribution whose evolution is described by the Schrödinger equation. Recently, stochastic dynamics have achieved very promising results in machine learning applications, the most prominent examples being diffusion-based generative models (Song et al., 2020b; Ho et al., 2020).

Informal Problem Setup

In this paper, we approach the problem of learning stochastic dynamics from samples. Suppose we observe the time evolution of a random variable X_t with density q_t, from t_0 to t_1. Having access to samples from the density q_t at different points in time t ∈ [t_0, t_1], we want to build a model of the dynamics by learning how to move samples in time such that they respect the marginals q_t. We propose a method called "Action Matching" as a solution to this problem.
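To make the access model concrete, the following sketch generates the kind of data this setup assumes: fresh, independent draws from the marginal q_t at each queried time, with no trajectory linking samples across time-steps. The toy marginal here (a 2-D Gaussian whose mean drifts linearly) is purely a hypothetical example for illustration, not a dynamics studied in the paper.

```python
import numpy as np

def sample_q_t(t, n, rng):
    """Draw n i.i.d. samples from a toy marginal q_t: a 2-D Gaussian
    whose mean drifts linearly in time (hypothetical example)."""
    mean = np.array([t, -t])  # mean moves with constant velocity
    return mean + 0.1 * rng.standard_normal((n, 2))

rng = np.random.default_rng(0)
# Samples at different time-steps are drawn independently: there is no
# trajectory connecting x0 to x1, matching the access model assumed here.
x0 = sample_q_t(0.0, 512, rng)
x1 = sample_q_t(1.0, 512, rng)
```

The learner only ever sees such per-time-step batches; its task is to recover a rule for moving samples so that the pushed-forward batch at time t matches q_t.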

Learning Stochastic Dynamics vs. Time-Series

There is an important distinction between learning stochastic dynamics and time-series modeling (e.g., language, speech, or video modeling). In time-series modeling, the samples come in trajectories, where the samples within each trajectory are usually highly correlated. In learning stochastic dynamics, by contrast, we only have access to independent samples at any given time-step (i.e., uncorrelated samples through time). This degree of freedom allows us to solve types of problems that cannot be approached with time-series modeling. We provide several examples in our experiment section, but also point out that it is sometimes physically impossible to obtain samples along trajectories. For example, in quantum mechanics, the act of measurement at a given point collapses the wave function, which prevents us from obtaining further samples along that trajectory.

Figure 1: Score Matching learns a model for every distribution, while Action Matching learns the transition rule between distributions according to the continuity equation. Here, we illustrate that learning the dynamics might be a much simpler task than learning all the distributions individually.

Generative Modeling with Action Matching

From the machine learning perspective, the problem of learning stochastic dynamics is a generalization of generative modeling. One way to solve generative modeling is to first construct a distributional path (stochastic dynamics) from the data distribution to a tractable prior distribution (e.g., Gaussian or uniform), and then learn to move along this path to generate samples.
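Once a velocity field v_t(x) consistent with the marginals is available, "moving along the path" amounts to integrating the ODE dx/dt = v_t(x) for each sample. Below is a minimal forward-Euler sketch of this transport step; the constant field `velocity` is a hypothetical stand-in for a learned model, not the method's actual output.

```python
import numpy as np

def velocity(t, x):
    """Hypothetical stand-in for a learned velocity field v_t(x);
    here it simply pushes every sample in a fixed direction."""
    return np.broadcast_to(np.array([1.0, -1.0]), x.shape)

def transport(x, t0=0.0, t1=1.0, n_steps=100):
    """Move samples from time t0 to t1 with forward-Euler steps of
    dx/dt = v_t(x). Each sample is transported independently."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + dt * velocity(t, x)
        t += dt
    return x

x0 = np.zeros((4, 2))   # samples from the source distribution
x1 = transport(x0)      # samples pushed forward to time t1
```

In practice one would replace forward Euler with a higher-order ODE solver, but the principle is the same: the model only needs to prescribe how samples move, not what the intermediate densities look like.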
The most prominent example of this approach is the recent development of diffusion generative models (Song et al., 2020b; Ho et al., 2020), where a stochastic differential equation (SDE) is constructed to move the samples from the data distribution to the prior, and the reverse SDE is constructed by learning the score function of the intermediate distributions via Score Matching (Hyvärinen & Dayan, 2005). Action Matching can be used for generative modeling in a similar way, where we also construct a stochastic dynamics between the data distribution and the prior. The important distinction, however, is that this dynamics is constructed solely from samples of the intermediate distributions, rather than from the analytical SDEs used in diffusion models. This heavily relaxes the constraints that SDEs impose on the dynamics, and enables Action Matching to learn a much richer family of dynamics between the two distributions. For example, in both of the widely used VP-SDEs and VE-SDEs (Song et al., 2020b), the conditionals q_t(x_t|x_0) are tractable Gaussian distributions, while in Action Matching, the dynamics can have an arbitrary conditional q_t(x_t|x_0), as long as it can be sampled from. Action Matching can also learn the dynamics constructed by SDEs, since SDEs can be sampled from. In Section 5.1, we provide a rich family of dynamics that can be learned with Action Matching without knowledge of the underlying process. Another important distinction between SDEs and Action Matching is that Action Matching's modeling capacity is spent only on learning how to move the samples (consistently with the marginals), and does not make any attempt to learn the marginals themselves. In diffusion models such as VP-SDEs or VE-SDEs, by contrast, all of the model's capacity is spent on learning the score function of the individual densities ∇ log q_t(x) for the backward diffusion.
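To illustrate what "tractable Gaussian conditionals" means for the VP-SDE, the sketch below samples x_t ~ q_t(x_t|x_0) in closed form, using the continuous-time parameterization of Song et al. (2020b) with a linear β(t) schedule; the schedule constants are the commonly used defaults and are only illustrative here.

```python
import numpy as np

def vp_sde_conditional_sample(x0, t, rng, beta_min=0.1, beta_max=20.0):
    """Sample x_t ~ q_t(x_t | x_0) = N(mu_t * x0, sigma_t^2 I) for the
    VP-SDE with linear schedule beta(s) = beta_min + s * (beta_max - beta_min).
    The integral of beta over [0, t] is available in closed form."""
    log_alpha = -0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2)
    mu_t = np.exp(log_alpha)                        # mean coefficient
    sigma_t = np.sqrt(1.0 - np.exp(2 * log_alpha))  # marginal std
    return mu_t * x0 + sigma_t * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = np.ones((4, 2))                     # a batch of "data" points
xt = vp_sde_conditional_sample(x0, 0.5, rng)  # one-step intermediate sample
```

This one-step sampling is exactly the structure Action Matching does not require: any conditional q_t(x_t|x_0) works, as long as it can be sampled from.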
This is wasteful if the evolution of the density is simple but the densities themselves are complicated. An illustrative toy example is provided in Fig. 1, where a complicated density evolves with a constant velocity through time. In this case, Action Matching only needs to learn a constant velocity vector field, without learning anything about the individual marginals. As a practical example, we consider the colorization task in the experiment section, and argue that moving directly from a grayscale image to the colored image with Action Matching is much easier than moving from Gaussian noise to a colored image with a conditional diffusion that conditions on the grayscale image. In short, compared to diffusion generative models, Action Matching has the following advantages:

1. Action Matching relies only on samples and does not require any knowledge of the underlying stochastic dynamics, which is essential when we only have access to samples.

2. Action Matching is designed to learn only the dynamics, rather than the individual distributions q_t, which is useful when a complicated distribution has simple dynamics.

3. Action Matching's applicability extends beyond that of diffusion models, as it can learn a much richer class of stochastic dynamics (see Theorem 1).

Our contribution is two-fold: 1) In Section 2, we discuss a mathematically rigorous problem formulation for learning stochastic dynamics, why this problem is well-defined, and what types of dynamics we aim to learn. 2) In Section 3, we discuss Action Matching as a variational framework for learning these dynamics. Finally, as some of the possible applications of Action Matching, we discuss several

