NEURAL SPATIO-TEMPORAL POINT PROCESSES

Abstract

We propose a new class of parameterizations for spatio-temporal point processes which leverage Neural ODEs as a computational method and enable flexible, high-fidelity models of discrete events that are localized in continuous time and space. Central to our approach is a combination of continuous-time neural networks with two novel neural architectures, i.e., Jump and Attentive Continuous-time Normalizing Flows. This approach allows us to learn complex distributions for both the spatial and temporal domains and to condition non-trivially on the observed event history. We validate our models on data sets from a wide variety of contexts such as seismology, epidemiology, urban mobility, and neuroscience.

1. INTRODUCTION

Modeling discrete events that are localized in continuous time and space is an important task in many scientific fields and applications. Spatio-temporal point processes (STPPs) are a versatile and principled framework for modeling such event data and have, consequently, found applications in a diverse range of fields. This includes, for instance, modeling earthquakes and aftershocks (Ogata, 1988; 1998), the occurrence and propagation of wildfires (Hering et al., 2009), epidemics and infectious diseases (Meyer et al., 2012; Schoenberg et al., 2019), urban mobility (Du et al., 2016), the spread of invasive species (Balderama et al., 2012), and brain activity (Tagliazucchi et al., 2012). It is of great interest in all of these areas to learn high-fidelity models which can jointly capture spatial and temporal dependencies and their propagation effects. However, existing parameterizations of STPPs are strongly restricted in this regard due to computational considerations: in their general form, STPPs require solving multivariate integrals to compute likelihood values and thus have primarily been studied within the context of different approximations and model restrictions. This includes, for instance, restricting the model class to parameterizations with known closed-form solutions (e.g., exponential Hawkes processes (Ozaki, 1979)), restricting dependencies between the spatial and temporal domain (e.g., independent and unpredictable marks (Daley & Vere-Jones, 2003)), or discretizing continuous time and space (Ogata, 1998). These restrictions and approximations, which can lead to mis-specified models and loss of information, motivated the development of neural temporal point processes such as Neural Hawkes Processes (Mei & Eisner, 2017) and Neural Jump SDEs (Jia & Benson, 2019).
While these methods are more flexible, they can still require approximations such as Monte Carlo sampling of the likelihood (Mei & Eisner, 2017; Nickel & Le, 2020) and, most importantly, only model restricted spatial distributions (Jia & Benson, 2019).

To overcome these issues, we propose a new class of parameterizations for spatio-temporal point processes which leverage Neural ODEs as a computational method and allow us to define flexible, high-fidelity models for spatio-temporal event data. We build upon ideas of Neural Jump SDEs (Jia & Benson, 2019) and Continuous-time Normalizing Flows (CNFs; Chen et al. 2018; Grathwohl et al. 2019; Mathieu & Nickel 2020) to learn parametric models of spatial (or mark¹) distributions that are defined continuously in time. Normalizing flows are known to be flexible universal density estimators (e.g., Huang et al. 2018; 2020; Teshima et al. 2020; Kong & Chaudhuri 2020) while retaining computational tractability. As such, our approach allows the computation of exact likelihood values even for highly complex spatio-temporal distributions, and our models create smoothly changing spatial distributions that naturally benefit spatio-temporal modeling. Central to our approach are two novel neural architectures based on CNFs, using either discontinuous jumps in distribution or self-attention, to condition spatial distributions on the event history. To the best of our knowledge, this is the first method that combines the flexibility of neural TPPs with the ability to learn high-fidelity models of continuous marks that can have complex dependencies on the event history. In addition to our modeling contributions, we also construct five new pre-processed data sets for benchmarking spatio-temporal event models.
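As a rough illustration of the jump mechanism described above, the sketch below evolves a hidden state with a simple ODE between events and applies a discontinuous update when an event arrives. The drift, the jump function, and the Euler integrator are all hypothetical stand-ins for the learned components; they only illustrate the pattern of smooth evolution punctuated by instantaneous conditioning on each observed event.

```python
import numpy as np

def drift(h):
    """Continuous dynamics dh/dt = f(h) between events (hypothetical f)."""
    return -0.5 * h

def jump(h, x):
    """Discontinuous update applied when an event at location x is observed."""
    return np.tanh(h + x)

def evolve(h, t0, t1, steps=100):
    """Euler integration of the drift from t0 to t1 (stand-in for an ODE solver)."""
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * drift(h)
    return h

# Run the hidden state through two events: flow smoothly, then jump.
events = [(0.5, np.array([1.0, -1.0])), (1.2, np.array([0.3, 0.3]))]
h, t = np.zeros(2), 0.0
for t_i, x_i in events:
    h = evolve(h, t, t_i)    # smooth evolution up to the event time
    h = jump(h, x_i)         # instantaneous conditioning on the event
    t = t_i
```

In the actual models, a spatial distribution (a CNF) would be read out from this history-dependent state at any query time, so the distribution changes smoothly between events and discontinuously at them.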

2. BACKGROUND

In the following, we give a brief overview of the two core frameworks our method builds upon, i.e., spatio-temporal point processes and continuous-time normalizing flows.

Event Modeling with Point Processes Spatio-temporal point processes are concerned with modeling sequences of random events in continuous space and time (Moller & Waagepetersen, 2003; Baddeley et al., 2007). Let $\mathcal{H} = \{(t_i, x_i)\}_{i=1}^{n}$ denote the sequence of event times $t_i \in \mathbb{R}$ and their associated locations $x_i \in \mathbb{R}^d$, the number of events $n$ being also random. Additionally, let $\mathcal{H}_t = \{(t_i, x_i) \mid t_i < t, \ t_i \in \mathcal{H}\}$ denote the history of events predating time $t$. A spatio-temporal point process is then fully characterized by its conditional intensity function
$$\lambda(t, x \mid \mathcal{H}_t) \coloneqq \lim_{\Delta t \downarrow 0,\, \Delta x \downarrow 0} \frac{\mathbb{P}\big(t_i \in [t, t + \Delta t],\ x_i \in B(x, \Delta x) \mid \mathcal{H}_t\big)}{|B(x, \Delta x)|\, \Delta t}, \tag{1}$$
where $B(x, \Delta x)$ denotes the ball centered at $x \in \mathbb{R}^d$ with radius $\Delta x$. The only condition is that $\lambda(t, x \mid \mathcal{H}_t) \geq 0$; it need not be normalized. Given $i-1$ previous events, the conditional intensity function therefore describes the instantaneous probability of the $i$-th event occurring at time $t$ and location $x$. In the following, we will use the common star superscript shorthand $\lambda^*(t, x) = \lambda(t, x \mid \mathcal{H}_t)$ to denote conditional dependence on the history. The joint log-likelihood of observing $\mathcal{H}$ within a time interval $[0, T]$ is then given by (Daley & Vere-Jones, 2003, Proposition 7.3.III)
$$\log p(\mathcal{H}) = \sum_{i=1}^{n} \log \lambda^*(t_i, x_i) - \int_0^T \int_{\mathbb{R}^d} \lambda^*(\tau, x)\, dx\, d\tau. \tag{2}$$
Training general STPPs with maximum likelihood is difficult because eq. (2) requires solving a multivariate integral. This need to compute integrals has driven research to focus around the use of kernel density estimators (KDEs) with exponential kernels that have known anti-derivatives (Reinhart et al., 2018).
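To make eq. (2) concrete, the sketch below evaluates the log-likelihood of a toy STPP whose intensity is a hypothetical choice (decaying in time, uniform over the unit square) picked so that the compensator integral has a closed form; the same quantity is also estimated by simple Monte Carlo, the kind of approximation that flexible intensities typically require.

```python
import numpy as np

def loglik(events, lam, T, area, n_mc=100_000, seed=0):
    """Log-likelihood via eq. (2): the sum of log-intensities at the observed
    events minus the compensator integral over [0, T] x S, estimated here by
    simple Monte Carlo with uniform samples."""
    rng = np.random.default_rng(seed)
    log_sum = sum(np.log(lam(t, x)) for t, x in events)
    ts = rng.uniform(0.0, T, n_mc)
    xs = rng.uniform(0.0, 1.0, (n_mc, 2))        # S = unit square, |S| = 1
    compensator = T * area * np.mean([lam(t, x) for t, x in zip(ts, xs)])
    return log_sum - compensator

# Toy intensity: decays in time, uniform in space (a hypothetical choice).
mu = 3.0
lam = lambda t, x: mu * np.exp(-t)

T, area = 2.0, 1.0
events = [(0.3, np.array([0.2, 0.7])), (1.1, np.array([0.5, 0.5]))]
ll = loglik(events, lam, T, area)

# Closed-form compensator for this intensity: mu * |S| * (1 - exp(-T)).
exact = sum(np.log(lam(t, x)) for t, x in events) - mu * area * (1.0 - np.exp(-T))
assert abs(ll - exact) < 0.05
```

For history-dependent intensities no such closed form exists in general, which is exactly the computational bottleneck that motivates the restricted parameterizations discussed above.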
Continuous-time Normalizing Flows Normalizing flows (Dinh et al., 2014; 2016; Rezende & Mohamed, 2015) are a class of density models that describe flexible distributions by parameterizing an invertible transformation from a simpler base distribution, which enables exact computation of the probability of the transformed distribution without any unknown normalization constants. Given a random variable $x_0$ with known distribution $p(x_0)$ and an invertible transformation $F(x)$, the transformed variable $F(x_0)$ is a random variable with a probability density function that satisfies
$$\log p(F(x_0)) = \log p(x_0) - \log \left| \det \frac{\partial F}{\partial x}(x_0) \right|. \tag{3}$$
There have been many advances in parameterizing $F$ with flexible neural networks that also allow for cheap evaluation of eq. (3). We focus our attention on Continuous-time Normalizing Flows (CNFs), which parameterize this transformation with a Neural ODE (Chen et al., 2018). CNFs
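As a minimal illustration of the change of variables in eq. (3), the 1-D sketch below pushes a standard-normal base through an affine map F(x) = a·x + b (a hypothetical flow, far simpler than a CNF) and recovers exactly the density of N(b, a²).

```python
import numpy as np

def std_normal_logpdf(x):
    """Log-density of the standard-normal base distribution p(x0)."""
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

# Invertible affine flow F(x) = a * x + b (a hypothetical 1-D flow).
a, b = 2.0, -1.0

def flow_logpdf(y):
    """log p(y) via eq. (3): base log-density at F^{-1}(y) minus
    log|det dF/dx|, which in 1-D is simply log|a|."""
    x = (y - b) / a              # invert the flow
    return std_normal_logpdf(x) - np.log(abs(a))

# F(x0) is distributed as N(b, a^2); compare against that closed form.
y = 0.7
sigma = abs(a)
target = -0.5 * ((y - b) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
assert abs(flow_logpdf(y) - target) < 1e-12
```

A CNF replaces the affine map with the solution map of a Neural ODE, and the log-determinant term with an integral of the trace of the dynamics' Jacobian along the trajectory, but the exact-likelihood structure is the same.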



¹ We regard any marked temporal point process with continuous marks as a spatio-temporal point process.



Figure 1: Color is used to denote p(x|t), which can be evaluated exactly for Neural STPPs. After observing an event in one mode, the model is instantaneously updated and strongly expects an event in the next mode. After a period with no observations, the model smoothly reverts to the marginal distribution.

