ADDITIVE POISSON PROCESS: LEARNING INTENSITY OF HIGHER-ORDER INTERACTION IN POINT PROCESSES

Anonymous

Abstract

We present the Additive Poisson Process (APP), a novel framework that models the higher-order interaction effects of the intensity functions in point processes using lower-dimensional projections. Our model combines techniques from information geometry, to model higher-order interactions on a statistical manifold, with techniques from generalized additive models, to use lower-dimensional projections that overcome the curse of dimensionality. Our approach solves a convex optimization problem that minimizes the KL divergence from a sample distribution in lower-dimensional projections to the distribution modeled by an intensity function in the point process. Our empirical results show that our model is able to use samples observed in the lower-dimensional space to estimate the higher-order intensity function, even with extremely sparse observations.

1. INTRODUCTION

Consider two point processes whose event arrival times are correlated. For a given time interval, what is the probability of observing an event from both processes? Can we learn the joint intensity function using only the observations from each individual process? Our proposed model, the Additive Poisson Process (APP), provides a novel solution to this problem.

The Poisson process is a counting process used to model spatio-temporal sequence data in a wide range of disciplines, including transportation (Zhou et al., 2018), finance (Ilalan, 2016), ecology (Thompson, 1955), and violent crime (Taddy, 2010). It models the arrival times for a single system by learning an intensity function, whose value at a given time represents the instantaneous rate at which points are excited; over a short time interval, it is proportional to the probability of a point being excited. Despite recent advances in the modeling of Poisson processes and their wide applicability, the majority of point process models do not consider the correlation between two or more point processes. Our proposed approach learns the joint intensity function of the point processes, which describes the simultaneous occurrence of two events.

For example, in a spatio-temporal problem, we may want to learn the intensity function for a taxi picking up customers at a given time and location. In this problem each point is multi-dimensional, that is, the observations are $\{(x_i, y_i, t_i)\}_{i=1}^{N}$, where x and y are the two spatial dimensions and t is the time dimension. For any given location or time, we can expect only very few pick-up events, which makes it difficult for any model to learn the low-valued intensity function. Previous approaches such as kernel density estimation (KDE) (Rosenblatt, 1956) are able to learn the joint intensity function. However, KDE suffers from the curse of dimensionality: it requires a large sample size or a high intensity function to build an accurate model.
In addition, the complexity of the model grows exponentially with the number of dimensions, which makes it infeasible to compute. Bayesian approaches, such as a mixture of beta distributions with a Dirichlet prior (Kottas, 2006) and Reproducing Kernel Hilbert Space (RKHS) methods (Flaxman et al., 2017), have been proposed to quantify the uncertainty with a prior for the intensity function. However, these approaches are often non-convex, making it difficult to obtain the globally optimal solution. In addition, if observations are sparse, it is hard for these approaches to learn a reasonable intensity function.
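
To make the sparsity issue concrete, the following is a minimal sketch of a Gaussian-kernel intensity estimate on synthetic data. It is not the method proposed in this paper. The function names `kde_intensity` and `near_coincidences`, the bandwidth, the rates, and the tolerance used to approximate "simultaneous" events are all illustrative assumptions, and the two simulated processes are independent for simplicity.

```python
import numpy as np


def kde_intensity(events, grid, bandwidth):
    """Gaussian-kernel estimate of a 1-D intensity function (illustrative).

    Each event contributes one unit-mass Gaussian bump, so the estimate
    integrates over the real line to the number of observed events.
    """
    events = np.asarray(events, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if events.size == 0:
        return np.zeros_like(grid)
    z = (grid[:, None] - events[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.sum(axis=1)


def near_coincidences(a, b, tol):
    """Events of `a` with an event of `b` within `tol` (assumed proxy for
    a simultaneous occurrence in both processes)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.size == 0 or b.size == 0:
        return np.empty(0)
    gaps = np.abs(a[:, None] - b[None, :]).min(axis=1)
    return a[gaps < tol]


# Two hypothetical homogeneous Poisson processes on [0, horizon]:
# a Poisson-distributed count of events placed uniformly at random.
rng = np.random.default_rng(0)
horizon = 10.0
a = rng.uniform(0.0, horizon, size=rng.poisson(50))
b = rng.uniform(0.0, horizon, size=rng.poisson(50))
joint = near_coincidences(a, b, tol=0.02)

# Evaluate both estimates on a grid padded well beyond the kernel tails.
grid = np.linspace(-3.0, horizon + 3.0, 3201)
dx = grid[1] - grid[0]
lam_a = kde_intensity(a, grid, bandwidth=0.3)
lam_joint = kde_intensity(joint, grid, bandwidth=0.3)

print("events in a, b, joint:", a.size, b.size, joint.size)
print("integrated intensities:", lam_a.sum() * dx, lam_joint.sum() * dx)
```

Because the near-coincidences are a subset of the events of one process, the joint estimate is always built from no more samples than the marginal one, which illustrates why a direct estimate of the joint intensity is harder than an estimate of each marginal.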

