ADVERSARIAL DATA GENERATION OF MULTI-CATEGORY MARKED TEMPORAL POINT PROCESSES WITH SPARSE, INCOMPLETE, AND SMALL TRAINING SAMPLES

Abstract

Asynchronous stochastic discrete-event processes are commonplace in application domains such as social science, homeland security, and health informatics. Modeling the complex interactions in such event data via marked temporal point processes (MTPPs) makes it possible to detect and predict specific interests or profiles. We present a novel multi-category MTPP generation technique for applications where training datasets are inherently sparse, incomplete, and small. The proposed adversarial architecture augments an adversarial autoencoder (AAE) with feature mapping techniques, which include a transformation between the categories and timestamps of marked points and the percentile distribution of the particular category. Transforming the training data to this distribution facilitates accurate capture of the underlying process characteristics despite the sparseness and incompleteness of the data. The proposed method is validated using several benchmark datasets. The similarity between actual and generated MTPPs is evaluated and compared with a Markov process based baseline. Results demonstrate the effectiveness and robustness of the proposed technique.

1. INTRODUCTION

Marked Temporal Point Processes (MTPPs) are widely used for modeling and analysis of asynchronous stochastic discrete events in continuous time (Upadhyay et al., 2018; Türkmen et al., 2019; Yan, 2019), with applications in numerous domains such as homeland security, cybersecurity, consumer analytics, health care analytics, and social science. An MTPP models stochastic discrete events as marked points $e_i$, each defined by its time of occurrence $t_i$ and its category $c_i$. Usually, point processes are characterized by the conditional intensity function, $\lambda^*(t)\,dt = \lambda(t \mid \mathcal{H}_t)\,dt = P[\text{event} \in [t, t + dt) \mid \mathcal{H}_t]$, which, given the past $\mathcal{H}_t = \{e_i = (c_i, t_i) \mid t_i < t\}$, specifies the probability of an event occurring at future time points. There are many popular functional forms for the intensity. The Hawkes process (a self-exciting process) (Hawkes, 1971) is a point process used in both statistical and machine learning contexts in which the intensity is a linear function of past events ($\mathcal{H}_t$) (Türkmen et al., 2019). In traditional parametric models, the conditional intensity functions are manually pre-specified (Yan, 2019). Recently, various neural network models (generally called neural TPPs) have been used to learn arbitrary and unknown distributions while eliminating manual selection of the intensity function. Reinforcement learning (Zhu et al., 2019; Li et al., 2018), recurrent neural networks (RNNs) (Du et al., 2016), and generative neural networks (Xiao et al., 2018) are used to approximate the intensity functions and learn complex MTPP distributions using larger datasets. Recent advances in data collection techniques allow the collection of complex event data that form heterogeneous MTPPs, where a marked point $e_{ij}$ specifies a time of occurrence $t_i$ and a category $c_j$ separately. Therefore, multi-category MTPPs concern not only the time of occurrence but also the category of the next marked point.
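
As a concrete illustration of the conditional intensity and the Hawkes process described above, the following minimal Python sketch evaluates a self-exciting intensity over a small multi-category event history. The exponential decay kernel, the parameter names `mu`, `alpha`, and `beta`, their default values, and the toy event list are all illustrative assumptions and are not taken from this paper; the text only states that the Hawkes intensity is a linear function of past events.

```python
import math
from typing import List, Tuple

# A marked point e_i = (c_i, t_i): a category label and a time of occurrence.
MarkedPoint = Tuple[str, float]


def history(points: List[MarkedPoint], t: float) -> List[MarkedPoint]:
    """Return H_t, the marked points that occurred strictly before time t."""
    return [(c, ti) for (c, ti) in points if ti < t]


def hawkes_intensity(
    points: List[MarkedPoint],
    t: float,
    mu: float = 0.5,     # assumed baseline rate (hypothetical value)
    alpha: float = 0.8,  # assumed excitation added by each past event
    beta: float = 1.0,   # assumed exponential decay rate of that excitation
) -> float:
    """Evaluate lambda*(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).

    The intensity is linear in the contributions of past events; the
    exponential kernel is one common choice, assumed here for illustration.
    """
    past = history(points, t)
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for _, ti in past)


# Toy multi-category event sequence (hypothetical data).
events: List[MarkedPoint] = [("A", 1.0), ("B", 2.5), ("A", 4.0)]

if __name__ == "__main__":
    for t in (0.5, 2.0, 4.5):
        print(f"t={t}: intensity={hawkes_intensity(events, t):.4f}")
```

With an empty history the intensity equals the baseline `mu`, and each past event raises it by an amount that decays over time. This sketch ignores the category when computing the intensity; a multi-category model would additionally need a distribution over the category of the next marked point.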
Multi-category MTPPs add extra dimensionality to the distribution, which complicates learning with existing techniques. In fact, multi-category MTPPs are very helpful for modeling the behavioral patterns of suspicious or specific individuals and groups in homeland security (Campedelli et al., 2019b; a; Hung et al., 2018; 2019), potential malicious network activities in cybersecurity (Peng et al., 2017), recommendation systems in consumer analytics

