GRAPH-INFORMED NEURAL POINT PROCESSES WITH MONOTONIC NETS

Abstract

Multi-class event data are ubiquitous in real-world applications. Recent neural temporal point processes (Omi et al., 2019) use monotonic nets to model the cumulative conditional intensity, avoiding an intractable integration in the likelihood. While successful, they are restricted to single-type events and can easily converge to poor learning results. To address these limitations and to exploit the valuable structural information among event participants, we develop a Graph-Informed Neural Point Process (GINPP) that can freely handle multiple event types, greatly improve learning efficiency, and effectively integrate the graph information to facilitate training. First, we find that the bottleneck of the previous model arises from the standard softplus transformation over the output of the monotonic net, which enlarges the prediction variations of the monotonic net and increases the training challenge. We propose a shift-scale variant that significantly reduces the variation and improves learning efficiency. Second, we use a conditional mark distribution to model multiple event types, without the need to explicitly estimate the intensity for each type, which can be much more challenging. Third, we use random walks to collect the neighborhood of each event participant, and use an attention mechanism to update the hidden state of each participant according to the observed events of both the participant itself and its neighborhood. In this way, we can effectively leverage the graph knowledge and scale up to large graphs. We demonstrate the advantage of our approach in both ablation studies and real-world applications.

1. Introduction

Real-world applications often involve multi-class events. For example, 911 calls request various kinds of help, traffic records include different types of accidents, and social network users engage in various types of interactions (tweeting, following, poking, etc.). Neural temporal point processes (e.g., (Du et al., 2016; Mei and Eisner, 2017; Zhang et al., 2020a; Zuo et al., 2020)) are a family of powerful methods for event modeling and prediction. They use neural networks (NNs) to model the intensity of events and can flexibly estimate the complex dependencies among the observed events. However, due to the use of NNs, the cumulative (i.e., integral of the) conditional intensity in the point-process likelihood is often analytically intractable and demands a complex, expensive approximation. To bypass this issue, the recent work of Omi et al. (2019) uses a monotonic net (Sill, 1997; Chilinski and Silva, 2020) to model the monotonically increasing cumulative intensity, avoiding the integration; the intensity is then obtained by simply taking the derivative. To ensure positiveness, a softplus transformation is applied to the output of the monotonic net. Despite its elegance and success, this method only supports single-type events. More importantly, it often suffers from inefficient learning and easily falls into poor performance.

In this paper, we propose GINPP, a graph-informed neural point process model, to overcome these problems and to further utilize the valuable structural knowledge within the event participants, which is often available in practice. The major contributions of our work are as follows.

• First, we investigate the learning challenge of (Omi et al., 2019) and find that the bottleneck arises from the softplus transformation applied to the monotonic net's prediction to ensure positiveness. To obtain an output slightly above zero, the standard softplus demands that the input, i.e., the monotonic net's prediction, be negative and of much greater scale. Hence, a small output range can induce a much wider input (monotonic net prediction) range, which is biased toward the negative domain. The large variation of the prediction scale makes the estimation of the monotonic net much more difficult and inefficient.

• Second, we propose a shift-scale variant of the softplus function, where the scale controls the shape and the shift controls the position. By setting these two hyperparameters properly, the required input range can be greatly shrunk and brought close to the output range. Accordingly, the variation of the prediction scales is significantly reduced, and the learning of the monotonic net becomes much easier and more efficient.

• Third, we construct a marked point process for multi-class events. By introducing a conditional mark distribution, we can freely handle different event types and only need a single-output monotonic net, which models the unified cumulative conditional intensity. This is more efficient and convenient than a naive extension that separately estimates the cumulative intensity for each particular event type.

• Fourth, to incorporate the graph structure in training, we use random walks to collect the neighborhood of each participant. We use an attention mechanism to update the hidden state of each participant according to the observed events of not only the participant itself but also its neighborhood. In this way, the estimation of the hidden state can be improved with enriched observations, and the event dependencies can be captured more comprehensively. The random walks further enable us to scale to large graphs. Accordingly, we develop an efficient, scalable stochastic mini-batch learning algorithm.

For evaluation, we first examined GINPP in ablation studies. We tested the performance of the monotonic net with our shift-scale softplus transformation in learning two benchmark functions: one monotonic and the other not. Our method converges fast, accurately learns the first function, and finds a close monotonic approximation to the second. By contrast, with the standard softplus, the learning saturates early at large loss values and the estimation is much worse. Then, we tested on a synthetic bi-type event dataset. GINPP accurately recovered the intensity of each event type via the learned overall intensity and the mark distribution. Next, we evaluated GINPP on six real-world benchmark datasets, examining the accuracy in predicting the time and type of future events. In both tasks, GINPP consistently outperforms all the competing methods. Even without incorporating the graphs, GINPP still achieves better accuracy; when the graph structure is available, GINPP improves the accuracy further.
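To illustrate the scale mismatch described above, the following sketch compares the standard softplus with one plausible shift-scale parameterization. The function names, the choice scale=10 and shift=0, and the specific input values are our illustrative assumptions, not the paper's exact formulation:

```python
import math

def softplus(x):
    # numerically stable standard softplus: log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def shift_scale_softplus(x, scale=10.0, shift=0.0):
    # Hypothetical shift-scale variant: "scale" sharpens the bend
    # (shape) and "shift" moves the transition point (position).
    # The exact parameterization in the paper may differ.
    return softplus(scale * (x - shift)) / scale

# To emit a small positive intensity (~0.005), the standard softplus
# needs a deeply negative input (magnitude ~5.3) ...
y_std = softplus(-5.3)
# ... while the scaled variant reaches a similar output from an input
# whose magnitude (0.3) is close to the output range itself.
y_ss = shift_scale_softplus(-0.3, scale=10.0)
print(y_std, y_ss)
```

Both calls produce outputs of roughly the same small magnitude, but the shift-scale variant does so from an input over an order of magnitude smaller, which is the variance reduction the second contribution above targets.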

2. Background

Temporal Point Process (TPP) is a general mathematical framework for event modeling (Daley and Vere-Jones, 2007). A TPP is specified by the conditional intensity (or rate) of the events. Suppose we have $K$ types of events, and denote by $\lambda_k(t)$ the conditional intensity of event type $k$. Suppose we observe a sequence of events and their types, $\Gamma = [(t_1, s_1), \ldots, (t_N, s_N)]$, where $t_n$ is the timestamp and $s_n$ is the type of event $n$ ($1 \le s_n \le K$, $t_n \le t_{n+1}$). The likelihood of the TPP is given by
$$p(\Gamma) = \prod_{k=1}^{K} \exp\left(-\int_0^T \lambda_k(t)\,\mathrm{d}t\right) \cdot \prod_{n=1}^{N} \lambda_{s_n}(t_n),$$
where $T$ is the entire span of the observed events.

One popular TPP is the homogeneous Poisson process, which assumes each conditional intensity $\lambda_k(t)$ is a time-invariant constant $\lambda_k^0$, independent of the previous events $\{(t_n, s_n) \mid t_n < t\}$. While simple and convenient, Poisson processes ignore the complex relationships among the events. The Hawkes process (Hawkes, 1971) is more expressive in that it models the excitation effect among the events,
$$\lambda_k(t) = \lambda_k^0 + \sum_{t_n < t} \rho_{s_n \to k}(t - t_n),$$
where $\lambda_k^0 \ge 0$ is the background rate, and $\rho_{s_n \to k}(\Delta) > 0$ is the triggering kernel, which quantifies how much the past event at $t_n$, of type $s_n$, contributes to triggering a new event of type $k$ at time $t$. The most commonly used triggering kernel is the exponential kernel, which assumes an exponential decay of the excitation effect with the time lag $\Delta$.

Neural Temporal Point Process. Hawkes processes only account for additive, excitatory effects, and are inadequate to capture various complex event dependencies. To overcome this limitation, recent works (Du et al., 2016; Mei and Eisner, 2017) use neural networks to model the conditional intensity. Typically, a recurrent neural network (RNN) is used to capture the complex event dependencies. For
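As a concrete instance of the likelihood above, the sketch below (our own illustration; the function and variable names are not from the paper) evaluates the log-likelihood of a multi-type Hawkes process with the exponential triggering kernel $\rho_{s \to k}(\Delta) = \alpha_{s,k} e^{-\omega \Delta}$, for which the integral of each intensity over $[0, T]$ has a closed form:

```python
import math

def hawkes_loglik(events, mu, alpha, omega, T):
    """Log-likelihood of a multi-type Hawkes process with exponential
    triggering kernel rho_{s->k}(d) = alpha[s][k] * exp(-omega * d).

    events: list of (t_n, s_n), timestamps sorted, types 0-indexed
    mu:     list of background rates lambda^0_k, one per type
    """
    K = len(mu)
    ll = 0.0
    # sum of log-intensities at the observed event times
    for n, (t, s) in enumerate(events):
        lam = mu[s]
        for t_m, s_m in events[:n]:
            lam += alpha[s_m][s] * math.exp(-omega * (t - t_m))
        ll += math.log(lam)
    # compensator: integral of each type's intensity over [0, T];
    # int_0^T lambda_k = mu_k*T + sum_n alpha[s_n][k]/omega * (1 - exp(-omega*(T - t_n)))
    for k in range(K):
        comp = mu[k] * T
        for t_m, s_m in events:
            comp += alpha[s_m][k] / omega * (1.0 - math.exp(-omega * (T - t_m)))
        ll -= comp
    return ll
```

With all $\alpha_{s,k} = 0$ this reduces to the homogeneous Poisson likelihood, $\sum_n \log \lambda^0_{s_n} - T \sum_k \lambda^0_k$, which offers a quick sanity check. The quadratic loop over event pairs is kept for clarity; for the exponential kernel it can be made linear with a standard recursive update.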

