Learning Stochastic Behaviour from Aggregate Data

Abstract

Learning nonlinear dynamics from aggregate data is a challenging problem because the full trajectory of each individual is not available: an individual observed at one time point may not be observed at the next, or the identity of the individual is unavailable. This is in sharp contrast to learning dynamics with trajectory data, on which the majority of existing methods are based. We propose a novel method that uses the weak form of the Fokker Planck Equation (FPE) to describe the density evolution of data in a sample-based form, which is then combined with a Wasserstein generative adversarial network (WGAN) in the training process. In such a sample-based framework we are able to study nonlinear dynamics from aggregate data without solving the partial differential equation (PDE). The model can also handle high-dimensional cases with the help of deep neural networks. We demonstrate our approach on a series of synthetic and real-world data sets.

1. Introduction

In the context of a dynamic system, aggregate data refers to data sets in which the full trajectory of each individual is not available, meaning that there is no known individual-level correspondence. Typical examples include data sets collected for DNA evolution, social gatherings, density in control problems, and bird migration, during which it is impossible to identify individual birds, among many others. In those applications, some individuals observed at one time point may be unobserved at the next, or the individual identities are blocked or unavailable for various technical and ethical reasons. Rather than inferring the exact information for each individual, the main objective of learning dynamics from aggregate data is to recover and predict the evolution of the distribution of all individuals together. Trajectory data, in contrast, is data for which we are able to acquire the information of each individual at all times, although some studies have considered the case where some individual trajectories are partially missing. However, the identities of those individuals, whenever they are observable, are always assumed available. Examples include stock prices, weather, customer behaviors and most training data sets for computer vision and natural language processing. There are many popular models for learning dynamics from full-trajectory data. Typical ones include the Hidden Markov Model (HMM) (Alshamaa et al., 2019; Eddy, 1996), the Kalman Filter (KF) (Farahi & Yazdi, 2020; Harvey, 1990; Kalman, 1960) and the Particle Filter (PF) (Santos et al., 2019; Djuric et al., 2003), as well as models built upon HMM, KF and PF (Deriche et al., 2020; Fang et al., 2019; Hefny et al., 2015; Langford et al., 2009). They all require full trajectories of each individual, which may not be applicable in the aggregate data setting. On the other hand, only a few methods in the recent learning literature have focused on aggregate data. In the work of Hashimoto et al.
(2016), the authors assumed that the hidden dynamics of particles follow a stochastic differential equation (SDE); in particular, they use a recurrent neural network to parameterize the drift term. Furthermore, Wang et al. (2018) improved the traditional HMM by using an SDE to describe the evolution of the hidden states. To the best of our knowledge, there is not yet a method that directly learns the evolution of the density of objects from aggregate data. We propose to learn the dynamics of the density through the weak form of the Fokker Planck Equation (FPE), a parabolic partial differential equation (PDE) governing many dynamical systems subject to random noise perturbations, including the typical SDE models in existing studies. Learning is accomplished by minimizing the Wasserstein distance between the predicted distribution given by the FPE and the empirical distribution of the data samples. Meanwhile, we utilize neural networks to handle higher-dimensional cases. More importantly, by leveraging the framework of the Wasserstein Generative Adversarial Network (WGAN) (Arjovsky et al., 2017), our model is capable of approximating the distribution of samples at different time points without solving the SDE or the FPE. More specifically, we treat the drift coefficient in the FPE, the goal of learning, as a generator, and the test function in the weak form of the FPE as a discriminator. In other words, our method can also be regarded as a data-driven method to estimate the transport coefficient in the FPE, which corresponds to the drift term of the SDE. Additionally, although we treat the diffusion term as a constant in our model, it is straightforward to generalize it to a neural network as well, which can be an extension of this work. We would like to mention that several methods for solving SDEs and FPEs (Weinan et al., 2017; Beck et al., 2018; Li et al., 2019) work in the opposite direction to our method: they utilize neural networks to estimate the distribution p(x, t) with given drift and diffusion terms.
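To make the sample-based Wasserstein objective concrete: in one dimension, the Wasserstein-1 distance between two equal-size empirical distributions has a closed form via sorted samples. A minimal sketch is below; in higher dimensions the distance cannot be computed this way, which is why the adversarial WGAN formulation is used instead.

```python
import numpy as np

def wasserstein1_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between two equal-size sample
    sets: the optimal coupling matches sorted samples (quantiles), so the
    distance is the mean absolute difference of the order statistics."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    return np.abs(a - b).mean()

# Shifting a sample cloud by a constant c shifts the W1 distance by |c|.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
print(wasserstein1_1d(x, x + 1.0))  # close to 1.0
```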
In conclusion, our contributions are:

• We design an algorithm that recovers the density evolution of nonlinear dynamics by minimizing the Wasserstein discrepancy between real aggregate data and our generated data.

• By leveraging the weak form of the FPE, we are able to compute the Wasserstein distance directly without solving the FPE.

• Finally, we demonstrate the accuracy and effectiveness of our algorithm on several synthetic and real-world examples.

2. Proposed Method

2.1. Fokker Planck Equation for the density evolution

We assume the individuals evolve in the space R^D following the pattern shown in Figure 1. One example satisfying such a process is the stochastic differential equation (SDE), also known as the Itô process (Øksendal, 2003):

dX_t = g(X_t, t)dt + σdW_t.

Here dX_t represents an infinitesimal change of {X_t} over the time increment dt, g(·, t) = (g_1(·, t), ..., g_D(·, t))^T is the drift term (drifting vector field) that drives the dynamics of the SDE, σ is the diffusion constant, and {W_t} is the standard Brownian motion. The probability density of {X_t} is then governed by the Fokker Planck Equation (FPE) (Risken & Caugheyz, 1991), as stated below in Lemma 1.

Lemma 1. Suppose {X_t} solves the SDE dX_t = g(X_t, t)dt + σdW_t, and denote by p(·, t) the probability density of the random variable X_t. Then p(x, t) solves the following equation:

∂p(x, t)/∂t = -Σ_{i=1}^{D} ∂/∂x_i [g_i(x, t) p(x, t)] + (σ²/2) Σ_{i=1}^{D} ∂²p(x, t)/∂x_i².

As a linear evolution PDE, the FPE describes the evolution of the density function of the stochastic process driven by an SDE. For this reason, the FPE plays a crucial role in stochastic calculus, statistical physics and modeling (Nelson, 1985; Qi & Majda, 2016; Risken, 1989). Its importance is also drawing more attention in the statistics and machine learning communities (Liu & Wang, 2016; Pavon et al., 2018; Rezende & Mohamed, 2015).
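The Itô process above can be simulated directly, which is also how synthetic aggregate data can be generated: simulate many particles and keep only the per-time marginals, discarding identities. A minimal Euler-Maruyama sketch follows; the drift g, the value of σ and the time grid below are illustrative choices, not the paper's learned model.

```python
import numpy as np

def euler_maruyama(g, sigma, x0, t_grid, rng):
    """Simulate dX_t = g(X_t, t) dt + sigma dW_t with the Euler-Maruyama scheme.

    x0: (N, D) initial samples; returns snapshots of shape (T, N, D)."""
    xs = [x0]
    x = x0
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        x = x + g(x, t0) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        xs.append(x)
    return np.stack(xs)

# Toy 1-D example with an Ornstein-Uhlenbeck-type drift g(x, t) = -x.
rng = np.random.default_rng(0)
snapshots = euler_maruyama(lambda x, t: -x, sigma=0.5,
                           x0=rng.standard_normal((5000, 1)) + 2.0,
                           t_grid=np.linspace(0.0, 2.0, 41), rng=rng)
# Aggregate data: shuffle each snapshot so no individual correspondence remains.
aggregate = [rng.permutation(s) for s in snapshots]
print(snapshots.shape)  # (41, 5000, 1)
```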
In this paper, we utilize the weak form of the FPE as a basis to study the hidden dynamics of time-evolving aggregate data without solving the FPE. Our task can be described as follows: assume that the individuals evolve with the process indicated by Figure 1, which can be simulated by an Itô process. Given observations x_t along the time axis, we aim to recover the drift coefficient g(x, t) in the FPE, and thus recover and predict the density evolution of the dynamics. For simplicity we treat g(x, t) as a function uncorrelated to time t, namely, g(x, t) = g(x).
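Concretely, testing the FPE against a smooth test function φ and integrating by parts gives the weak form d/dt E[φ(X_t)] = E[∇φ(X_t)·g(X_t) + (σ²/2)Δφ(X_t)], in which every term is a plain sample average, so no PDE solve is needed. Below is a minimal Monte-Carlo sketch with hand-supplied functions; in the paper g and φ are instead neural networks, with φ playing the role of the WGAN discriminator.

```python
import numpy as np

def weak_form_residual(phi, grad_phi, lap_phi, g, sigma, x_t0, x_t1, dt):
    """Sample-based weak form of the FPE for a test function phi:

        d/dt E[phi(X_t)] = E[grad phi . g + (sigma^2 / 2) * Laplacian phi].

    x_t0, x_t1: (N, D) samples at two consecutive observation times.
    Returns lhs - rhs, which should be near zero when g matches the true drift."""
    # Left-hand side: finite-difference estimate of d/dt E[phi(X_t)].
    lhs = (phi(x_t1).mean() - phi(x_t0).mean()) / dt
    # Right-hand side: Monte-Carlo average over the earlier snapshot.
    x = x_t0
    rhs = (np.sum(grad_phi(x) * g(x), axis=1)
           + 0.5 * sigma**2 * lap_phi(x)).mean()
    return lhs - rhs
```

For instance, with φ(x) = x in one dimension (so ∇φ = 1 and Δφ = 0), the residual reduces to the familiar moment equation d/dt E[X_t] = E[g(X_t)].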



Figure 1: State model of the stochastic process X t

