VARIATIONAL INFERENCE FOR DIFFUSION MODULATED COX PROCESSES

Anonymous

Abstract

This paper proposes a stochastic variational inference (SVI) method for computing an approximate posterior path measure of a Cox process. These processes are widely used in the natural and physical sciences, engineering, and operations research, and model a wide array of phenomena. In our work, we model the stochastic intensity as the solution of a diffusion stochastic differential equation (SDE), and our objective is to infer the posterior, or smoothing, measure over the paths given Poisson process realizations. We first derive a system of stochastic partial differential equations (SPDEs) for the pathwise smoothing posterior density function, a non-trivial result, since the standard solution of SPDEs typically involves an Itô stochastic integral, which is not defined pathwise. Next, we propose an SVI approach to approximating the solution of the system. We parametrize the class of approximate smoothing posteriors using a neural network, derive a lower bound on the evidence of the observed point process sample path, and optimize the lower bound using stochastic gradient descent (SGD). We demonstrate the efficacy of our method on both synthetic and real-world problems, and show the advantage of the neural network solution over standard numerical solvers.

1. INTRODUCTION

Cox processes (Cox, 1955; Cox & Isham, 1980), also known as doubly-stochastic Poisson processes, are a class of stochastic point processes wherein the point intensity is itself stochastic and, conditional on a realization of the intensity process, the number of points in any subset of space is Poisson distributed. These processes are widely used in the natural and physical sciences, engineering, and operations research, and form useful models of a wide array of phenomena. We model the intensity by a diffusion process that is the solution of a stochastic differential equation (SDE), a standard assumption across a range of applications (Susemihl et al., 2011; Kutschireiter et al., 2020). The measure induced by the solution of the SDE serves as a prior measure over sample paths, and our objective is to infer a posterior measure over the paths of the underlying intensity process, given realizations of the Poisson point process observations over a fixed time horizon. This type of inference problem has been studied in the Bayesian filtering literature (Schuppen, 1977; Bain & Crisan, 2008; Särkkä, 2013), where it is of particular interest to infer the state of the intensity process at any past time given all count observations up to the present time instant (the resulting posterior is called the smoothing posterior measure). In a seminal paper, Snyder (1972) derived a stochastic partial differential equation (SPDE) describing the dynamics of the corresponding posterior density for Cox processes. The solution of this smoothing SPDE requires the computation of an Itô stochastic integral with respect to the counting process. It has long been recognized (Clark, 1978; Davis, 1981; 1982) that for stochastic smoothing (and filtering) theory to be useful in practice, it should be possible to compute smoothing posteriors conditioned on a single observed sample path.
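To make the generative model concrete, the sketch below simulates a diffusion-modulated Cox process: the intensity path is obtained by Euler-Maruyama discretization of an SDE, and points are then drawn by thinning a dominating homogeneous Poisson process. The Ornstein-Uhlenbeck drift, the parameter values, and the thinning construction are illustrative assumptions on our part, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cox_path(T=1.0, n_steps=1000, theta=2.0, mu=5.0, sigma=1.0, z0=5.0):
    """Simulate a diffusion-modulated Cox process on [0, T].

    The intensity z_t follows an Ornstein-Uhlenbeck SDE (a stand-in choice;
    the paper only assumes z_t solves some diffusion SDE), discretized with
    Euler-Maruyama.  Points are then generated by thinning a homogeneous
    Poisson process whose rate dominates the simulated intensity path.
    """
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    z = np.empty(n_steps + 1)
    z[0] = z0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))
        z[k + 1] = z[k] + theta * (mu - z[k]) * dt + sigma * dw
    z = np.maximum(z, 0.0)  # the intensity must stay non-negative

    # Thinning (Lewis-Shedler): propose points at the dominating rate lam_max,
    # then keep each proposal with probability z(t)/lam_max.
    lam_max = z.max()
    n_prop = rng.poisson(lam_max * T)
    proposals = np.sort(rng.uniform(0.0, T, size=n_prop))
    keep = rng.uniform(size=n_prop) < np.interp(proposals, t, z) / lam_max
    return t, z, proposals[keep]

t, z, events = simulate_cox_path()
```

Conditional on the simulated path `z`, the accepted `events` form a realization of an inhomogeneous Poisson process with intensity z_t, which is exactly the doubly-stochastic structure described above.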
However, Itô integrals are not defined pathwise, and deriving a pathwise smoothing density is remarkably hard. Thirty years after Snyder's original work, Elliott & Malcolm (2005) derived a pathwise smoothing SPDE in the form of a coupled system of forward and backward pathwise SPDEs. Nonetheless, solving the system of pathwise SPDEs, or sampling from the corresponding SDE, remains challenging and is intractable in general. It is well known, for example, that numerical techniques for solving these SPDEs, such as the finite element method (FEM), suffer from the curse of dimensionality (Han et al., 2018). It is therefore of considerable interest to find more efficient methods for solving the smoothing SPDE. We take a variational inference approach to computing an approximate smoothing posterior measure. Variational representations of Bayesian posteriors in stochastic filtering and smoothing theory have been developed in considerable generality; see Mitter & Newton (2003) for a rigorous treatment. A number of papers consider the computation of an approximate posterior distribution over the paths of an underlying intensity process that is observed with additive Gaussian noise (Archambeau et al., 2007; 2008; Cseke et al., 2013; Susemihl et al., 2011; Sutter et al., 2016). Susemihl et al. (2011) studied Bayesian filtering of Gaussian processes by deriving a differential equation characterizing the evolution of the mean-square error (MSE) in estimating the underlying Gaussian process. On the other hand, Sutter et al. (2016) compute a variational approximation to the smoothing posterior density when the underlying diffusion intensity is observed with additive Brownian noise. They choose their variational family to be a class of SDEs with an analytically computable marginal density. This setting is considerably different from ours, where the observed process is a point process. Nonetheless, Sutter et al.
(2016) provides methodological motivation for our current study. In the context of computing approximate smoothing/filtering posteriors for point process observations, Harel et al. (2015) developed an analytically tractable approximation to the filtering posterior distribution of a diffusion modulated marked point process under specific modeling assumptions suited to a neural encoding/decoding problem. In general, however, analytical tractability cannot be assured without restrictive assumptions. We present a stochastic variational inference (SVI) (Hoffman et al., 2013) method for computing a variational approximation to the smoothing posterior density. Our approach fixes the approximating family of path measures to those induced by a class of parametrized SPDEs. In particular, we parametrize the drift function of the approximating SPDEs by a neural network with input and output variables matching the theoretical smoothing SPDE. Thereafter, using standard stochastic analysis tools, we compute a tractable lower bound on the evidence of observing a sample path of count observations, the so-called evidence lower bound (ELBO). A sample average approximation (SAA) to the ELBO is computed by simulating sample paths from the stochastic differential equation (SDE) corresponding to the approximating SPDE. Finally, the neural network is trained by maximizing the ELBO using stochastic gradient descent (SGD) over multiple batches of sample paths of count observations. Note that each sample path of the count observations entails the simulation of a separate SDE. We note that there are many problems in the natural and physical sciences, engineering, and operations research where multiple paths of a point process (over a finite time horizon) may be obtained.
For instance, in Section 5 we present an example modeling the demand for rented bikes over a 24-hour period in a bike-sharing platform, where the underlying driving intensity is subject to stochastic variations and demand information is collected over multiple days. In contrast to the variational algorithm developed in Sutter et al. (2016), where the variational lower bound must be re-optimized for each new sample path of the observation process, our variational method is more general: our approximation to the smoothing posterior can be used as a map for another (unobserved) sample path of count observations. Our computational approach can also be straightforwardly adapted to solve the problem of interest in Sutter et al. (2016). In the subsequent sections, we describe our problem and method in detail and demonstrate the utility of our method with numerical experiments. In particular, we show how the choice of approximating family enables us to use the trained neural network, and in turn the variational Bayesian smoothing posterior (VBSP), to compute the smoothing SPDE in roughly three-quarters of the computational time required to compute the original smoothing SPDE using FEM. Moreover, we also efficiently generate Monte Carlo samples from the learned VBSP and use them for inference on the bike-sharing dataset, whereas FEM failed to compute either the VBSP or the true smoothing density for the given time-space discretization.
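The SVI recipe just described can be sketched for a toy one-dimensional model. The sketch below estimates a sample-average ELBO: it simulates paths from an approximating SDE, accumulates the Poisson path log-likelihood of the observed events, and subtracts a KL term between the approximating and prior path measures (via Girsanov's theorem). The Ornstein-Uhlenbeck prior drift and the linear stand-in for the neural-network drift are our illustrative assumptions; in practice the drift is a neural network and the ELBO is maximized with SGD using automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(1)

def prior_drift(z):
    # Drift b(z) of a hypothetical prior SDE dz = b(z) dt + sigma dW
    return 2.0 * (5.0 - z)

def approx_drift(z, theta):
    # Stand-in for the neural-network drift b_theta(z): here simply linear
    return theta[0] + theta[1] * z

def elbo_estimate(events, theta, T=1.0, sigma=1.0, n_paths=64, n_steps=200):
    """Sample-average ELBO:  E_q[log p(N_{0:T} | z)] - KL(q || p).

    The KL divergence between the path measures of the two SDEs follows from
    Girsanov's theorem:  (1 / 2 sigma^2) E_q[ integral of (b_theta - b)^2 dt ].
    """
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    ll, kl = 0.0, 0.0
    for _ in range(n_paths):
        # Euler-Maruyama simulation of one path of the approximating SDE
        z = np.empty(n_steps + 1)
        z[0] = 5.0
        for k in range(n_steps):
            dw = rng.normal(scale=np.sqrt(dt))
            z[k + 1] = z[k] + approx_drift(z[k], theta) * dt + sigma * dw
        z = np.maximum(z, 1e-6)
        # Poisson path log-likelihood:  sum_i log z(t_i) - integral of z_t dt
        ll += np.sum(np.log(np.interp(events, t, z))) - dt * z[:-1].sum()
        # Discretized Girsanov KL term along the simulated path
        kl += 0.5 / sigma**2 * dt * np.sum((approx_drift(z, theta) - prior_drift(z))**2)
    return (ll - kl) / n_paths

events = np.sort(rng.uniform(0.0, 1.0, size=8))  # a toy observed sample path
value = elbo_estimate(events, theta=np.array([10.0, -1.0]))
```

Because the events enter only through the log-likelihood term, the same trained drift can be reused to evaluate new observation paths, which mirrors the amortization property claimed above.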

2. PROBLEM DESCRIPTION

Let N_t be a Cox process with unknown stochastic intensity {z_t ∈ R_+, t ∈ [0, T]}. We use N_{t′,t} to denote a sample path realization of N restricted to the interval [t′, t], and use N_t to denote N_t − N_0; recall that N_0 = 0 by definition. As noted before, a Cox process conditioned on the intensity is a

