FADIN: FAST DISCRETIZED INFERENCE FOR HAWKES PROCESSES WITH GENERAL PARAMETRIC KERNELS

Abstract

Temporal point processes (TPPs) are a natural tool for modeling event-based data. Among TPP models, Hawkes processes have proven to be the most widely used, mainly because they adequately model a wide range of applications, in particular when considering exponential or non-parametric kernels. Although non-parametric kernels are an option, such models require large datasets. Exponential kernels are more data efficient and relevant for applications in which events immediately trigger more events, but they are ill-suited for applications where latencies need to be estimated, such as in neuroscience. This work aims to offer an efficient solution to TPP inference using general parametric kernels with finite support. The developed solution consists of a fast ℓ2 gradient-based solver leveraging a discretized version of the events. After theoretically supporting the use of discretization, the statistical and computational efficiency of the novel approach is demonstrated through various numerical experiments. Finally, the effectiveness of the method is evaluated by modeling the occurrence of stimuli-induced patterns in brain signals recorded with magnetoencephalography (MEG). Owing to the use of general parametric kernels, results show that the proposed approach leads to a more plausible estimation of pattern latency than the state of the art.

1. INTRODUCTION

The statistical framework of Temporal Point Processes (TPPs; see e.g., Daley & Vere-Jones 2003) is well adapted to modeling event-based data. It offers a principled way to predict the rate of events as a function of time and of the history of previous events. TPPs were historically used to model intervals between events, as in renewal theory, which studies the sequence of intervals between successive replacements of a component susceptible to failure. TPPs find many applications in neuroscience, in particular to model single-cell recordings and neural spike trains (Truccolo et al., 2005; Okatan et al., 2005; Kim et al., 2011; Rad & Paninski, 2011), occasionally combined with spatial statistics (Pillow et al., 2008) or network models (Galves & Löcherbach, 2015). In the machine learning community, there is growing interest in these statistical tools (Bompaire, 2019; Shchur et al., 2020; Mei et al., 2020). Multivariate Hawkes processes (MHPs; Hawkes 1971) are likely the most popular, as they can model interactions between each univariate process. They also have the peculiarity that a process can be self-exciting, meaning that a past event increases the probability of another event occurring in the future on the same process. The conditional intensity function is the key quantity for TPPs. For an MHP, it is composed of a baseline parameter and kernels, and it describes the probability of an event occurring as a function of time. The kernel functions represent how processes influence each other or themselves. The most commonly used inference method for obtaining the baseline and kernel parameters of an MHP is maximum likelihood estimation (MLE; see e.g., Daley & Vere-Jones, 2007 or Lewis & Mohler, 2011). An alternative and often overlooked estimation criterion is the least-squares ℓ2 error, inspired by the theory of empirical risk minimization (Reynaud-Bouret & Rivoirard, 2010; Hansen et al., 2015; Bacry et al., 2020). A key feature of MHP modeling is the choice of kernels.
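To make these notions concrete, the conditional intensity and the ℓ2 criterion can be written out in a standard formulation. The notation below (baselines μ_i, kernels φ_ij, event times t_n^j of a p-dimensional process) is generic and may differ from the conventions adopted later in the paper.

```latex
% Conditional intensity of a p-dimensional multivariate Hawkes process:
% a baseline \mu_i plus the summed influence of all past events t_n^j
% through the kernels \phi_{ij}.
\lambda_i(t) = \mu_i + \sum_{j=1}^{p} \int_0^t \phi_{ij}(t - s)\,\mathrm{d}N_j(s)
             = \mu_i + \sum_{j=1}^{p} \sum_{t_n^j < t} \phi_{ij}\bigl(t - t_n^j\bigr),
\qquad i = 1, \dots, p.

% Least-squares (\ell_2) contrast over an observation window [0, T],
% whose minimizer estimates the baselines and kernel parameters:
\mathcal{R}(\lambda) = \sum_{i=1}^{p} \left(
    \int_0^T \lambda_i(t)^2 \,\mathrm{d}t
    \;-\; 2 \sum_{t_n^i \le T} \lambda_i\bigl(t_n^i\bigr)
\right).
```

Minimizing this contrast avoids the log terms of the likelihood, which is what makes fast gradient-based solvers on a discretized grid attractive.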
Non-parametric and parametric kernels are the two possibilities. In the non-parametric setting, kernel functions are approximated by histograms (Lewis & Mohler, 2011; Lemonnier & Vayatis, 2014), by a linear combination of pre-defined functions (Zhou et al., 2013a; Xu et al., 2016), by functions lying in an RKHS (Yang et al., 2017) or, alternatively, by neural networks (Mei & Eisner, 2017; Shchur et al., 2019; Pan et al., 2021). In addition to the frequentist approach, many Bayesian approaches, such as Gibbs sampling (Ishwaran & James,

