DYNAMIC EMBEDDINGS OF TEMPORAL HIGH-ORDER INTERACTIONS VIA NEURAL DIFFUSION-REACTION PROCESSES

Anonymous

Abstract

High-order interactions of multiple entities are ubiquitous in practical applications. The associated data often include the participants, the interaction results, and the timestamp at which each interaction occurred. While tensor factorization is a popular tool for analyzing such data, it often ignores or underuses the valuable timestamp information. More importantly, standard tensor factorization only estimates a static representation for each entity and ignores the temporal variation of the representations. However, such variation might reflect important evolution patterns of the underlying properties of the entities. To address these limitations, we propose Dynamical eMbedIngs of TempoRal hIgh-order interactions (DMITRI). We develop a neural diffusion-reaction process model to estimate dynamic embeddings for the participant entities. Specifically, based on the observed interactions, we build a multi-partite graph to encode the correlations between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of correlated entities, and use a neural network to construct a reaction process for each individual entity. In this way, our model captures both the commonalities and the individualities in the evolution of different entities' embeddings. We then use a neural network to model the interaction result as a nonlinear function of the embedding trajectories. For model estimation, we build on ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a simple stratified sampling method to balance the cost of processing each mini-batch and thereby improve the overall efficiency. We show the advantage of our approach in both ablation studies and real-world applications.

1. Introduction

Many real-world applications involve interactions of multiple entities. For example, online shopping and promotion activities are interactions among customers, commodities, and online merchants. A commonly used tool to analyze these high-order interactions is tensor factorization, which places the participant entities/objects in different tensor modes (or dimensions) and treats the interaction results as the values of observed tensor entries. Tensor factorization estimates an embedding representation for each entity, with which to reconstruct the observed entries. The learned embeddings can reflect the underlying structures within the entities, such as communities and outliers, and can be used as effective features for predictive tasks, such as recommendation and ads auction. Practical data often include the timestamp at which each multiway interaction occurred. These timestamps imply rich, complex temporal variation patterns. Despite the popularity of tensor factorization, current methods often ignore the timestamps, or simply bin them into crude time steps (e.g., by weeks or months) and jointly estimate embeddings for the time steps (Xiong et al., 2010; Rogers et al., 2013; Zhe et al., 2016a; 2015; Du et al., 2018). The current methods might therefore severely under-use the valuable temporal information in the data. More importantly, standard tensor factorization always estimates a static embedding for each entity. However, as representations of the entities, these embeddings summarize their underlying properties, which can naturally evolve over time, such as customer interests and preferences, user income and health, product popularity, and fashion. Learning static embeddings fails to capture this interesting, important temporal knowledge. To address these issues, we propose DMITRI, a dynamic embedding approach for temporal high-order interactions.
We construct a nonlinear diffusion-reaction process in an Ordinary Differential Equation (ODE) framework to jointly estimate embedding trajectories for the participant entities. The ODE framework is known to be flexible and convenient for handling irregularly sampled timestamps and sparsely observed data (Rubanova et al., 2019), which is often the case in practice. In addition, since ODE models focus on learning the dynamics (i.e., time derivatives) of the target function, they have promising potential for providing robust, accurate long-term predictions (via integration with the dynamics). Specifically, to leverage the structural knowledge within the data, we first build a multi-partite graph based on the observed interactions. The graph encodes the correlations between different types of entities in terms of their interaction history. We then construct a graph diffusion process in the ODE to co-evolve the embedding trajectories of correlated entities. Next, we use a neural network to construct a reaction process that models the individual-specific evolution of each entity. In this way, our neural diffusion-reaction process captures both the commonalities and the individualities of the entities in learning their dynamic embeddings. Given the embedding trajectories, we model the interaction result as a latent function of the participants' trajectories. We use another neural network to flexibly estimate this function and capture the complex relationships among the participant entities. For efficient training, we build on ODE solvers to develop a stochastic mini-batch learning algorithm. We develop a simple stratified sampling scheme, which balances the cost of executing the ODE solvers in each mini-batch so as to improve efficiency. We evaluated our method in both simulation and real-world applications.
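To make the diffusion-reaction idea concrete, the sketch below is a minimal, hypothetical instance of such dynamics: the diffusion term smooths embeddings over a (here generic, undirected) interaction graph via a random-walk normalized operator, and the reaction term is a small randomly initialized two-layer network giving each entity a nonlinear individual drift. All names, sizes, and the forward-Euler solver are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def embedding_dynamics(U, A, W1, b1, W2, b2):
    """Time derivative dU/dt = diffusion term + reaction term.

    U : (n, R) current embeddings of all n entities (all types stacked)
    A : (n, n) adjacency of the interaction graph (multi-partite in the paper)
    """
    deg = A.sum(axis=1, keepdims=True) + 1e-8       # avoid division by zero
    L = A / deg - np.eye(len(U))                    # negated random-walk Laplacian
    diffusion = L @ U                               # pulls linked entities together
    reaction = np.tanh(U @ W1 + b1) @ W2 + b2       # entity-specific nonlinear drift
    return diffusion + reaction

def euler_solve(U0, A, params, t1, steps=100):
    """Integrate the embedding ODE from t=0 to t=t1 with forward Euler."""
    U, h = U0.copy(), t1 / steps
    for _ in range(steps):
        U = U + h * embedding_dynamics(U, A, *params)
    return U

rng = np.random.default_rng(0)
n, R, H = 6, 3, 8                                   # entities, embedding dim, hidden width
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                      # undirected interaction graph
params = (0.1 * rng.standard_normal((R, H)), np.zeros(H),
          0.1 * rng.standard_normal((H, R)), np.zeros(R))
U_t = euler_solve(rng.standard_normal((n, R)), A, params, t1=1.0)
print(U_t.shape)  # (6, 3)
```

In practice, an adaptive ODE solver (as used in the paper's training algorithm) would replace the fixed-step Euler loop, and the reaction-network weights would be learned rather than random.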
The simulation experiments show that DMITRI can successfully capture the underlying dynamics of the entities from their temporal interactions and recover the hidden clustering structures within the trajectories. In three real-world applications, we then tested the accuracy of predicting interaction results at different time points. DMITRI consistently outperforms state-of-the-art tensor factorization methods that incorporate temporal information, often by a large margin. We also demonstrate that both the diffusion and reaction processes contribute to the learning and predictive performance. Finally, we investigated the learned embedding trajectories and found interesting evolution paths.

2. Notations and Background

Suppose we have collected data of interaction results among $K$ types of entities (e.g., customers, commodities, and merchants). Each type $k$ includes $d_k$ entities, indexed by $1, \ldots, d_k$. We index each interaction by a tuple $\boldsymbol{\ell} = (\ell_1, \ldots, \ell_K)$, where $1 \le \ell_k \le d_k$ for each $k$. Suppose we observed $N$ interactions with their results and timestamps. The dataset is denoted by $\mathcal{D} = \{(\boldsymbol{\ell}_1, t_1, y_1), \ldots, (\boldsymbol{\ell}_N, t_N, y_N)\}$, where $\{t_n\}$ and $\{y_n\}$ are the timestamps and interaction results. Our goal is, for each entity $j$ of each type $k$, to estimate a dynamic embedding $\mathbf{u}^k_j(t): \mathbb{R}_+ \to \mathbb{R}^R$. That is, the embedding is a time function (trajectory) with $R$-dimensional outputs.

High-order interaction data can be organized as multidimensional arrays or tensors. For example, we can create a $K$-mode tensor and place the entities of type $k$ in mode $k$. Each interaction is considered as an entry of the tensor, and the interaction result as the entry value. Hence, tensor factorization is a popular approach to process and analyze high-order interaction data. Standard tensor factorization introduces a static embedding representation for each entity, namely, $\mathbf{u}^k_j$ is considered time-invariant. Tensor factorization aims to estimate the embeddings (or factors) so as to reconstruct the tensor. For example, the classical Tucker decomposition (Tucker, 1966) employs a multilinear factorization model, $\mathcal{M} = \mathcal{W} \times_1 \mathbf{U}^1 \times_2 \cdots \times_K \mathbf{U}^K$, where $\mathcal{M} \in \mathbb{R}^{d_1 \times \cdots \times d_K}$ is the entire tensor, $\mathcal{W} \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ is the tensor-core parameter, $\mathbf{U}^k$ comprises all the embeddings of the entities in mode $k$, and $\times_k$ is the tensor-matrix multiplication at mode $k$ (Kolda, 2006). The popular CANDECOMP/PARAFAC (CP) decomposition (Harshman, 1970) can be viewed as a simplified version of Tucker decomposition, where we set $R_1 = \cdots = R_K = R$ and constrain the tensor-core $\mathcal{W}$ to be diagonal. Hence, each entry value is factorized as $m_{\boldsymbol{\ell}} = (\mathbf{u}^1_{\ell_1} \circ \cdots \circ \mathbf{u}^K_{\ell_K})^\top \boldsymbol{\lambda}$, where $\circ$ is the Hadamard (element-wise) product and $\boldsymbol{\lambda}$ corresponds to $\mathrm{diag}(\mathcal{W})$. While CP and Tucker decomposition are popular and elegant, their multilinear modeling can be oversimplistic for complex applications. To estimate nonlinear relationships of the entities, Xu et al. (2012); Zhe et al. (2015; 2016a) used a Gaussian process (GP) (Rasmussen and Williams, 2006) to model the entry value as a random function of the embeddings, $m_{\boldsymbol{\ell}} = g(\mathbf{u}^1_{\ell_1}, \ldots, \mathbf{u}^K_{\ell_K})$, where $g \sim \mathcal{GP}(0, \kappa(\mathbf{x}_{\boldsymbol{\ell}}, \mathbf{x}_{\boldsymbol{\ell}'}))$, $\mathbf{x}_{\boldsymbol{\ell}} = \{\mathbf{u}^1_{\ell_1}, \ldots, \mathbf{u}^K_{\ell_K}\}$ and $\mathbf{x}_{\boldsymbol{\ell}'} = \{\mathbf{u}^1_{\ell'_1}, \ldots, \mathbf{u}^K_{\ell'_K}\}$ are the embeddings of the entities in entries $\boldsymbol{\ell}$ and $\boldsymbol{\ell}'$, respectively, and $\kappa(\cdot, \cdot)$ is the covariance (kernel) function. Given the GP prior, any finite set of $N$ entry values follows a multivariate Gaussian distribution, $\mathbf{m} \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$, where $\mathbf{m} = [m_{\boldsymbol{\ell}_1}, \ldots, m_{\boldsymbol{\ell}_N}]^\top$, $\mathbf{K}$ is the $N \times N$ kernel matrix, and $[\mathbf{K}]_{i,j} = \kappa(\mathbf{x}_{\boldsymbol{\ell}_i}, \mathbf{x}_{\boldsymbol{\ell}_j})$. Suppose we have collected continuous observations
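To make the notation concrete, the sketch below instantiates both models on toy data: a CP entry value computed from the per-mode embeddings and a diagonal core, and a GP kernel matrix over a few entries using a squared-exponential (RBF) kernel on the concatenated embeddings. All shapes, values, and the specific kernel choice are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
K_modes, R = 3, 4                       # K entity types, embedding rank
dims = [5, 6, 7]                        # d_1, d_2, d_3: entities per type
U = [rng.standard_normal((d, R)) for d in dims]   # static embeddings per mode
lam = rng.standard_normal(R)            # lambda = diag(W) in CP

def cp_entry(index):
    """CP value m_l = (u^1_{l1} o ... o u^K_{lK})^T lambda."""
    prod = np.ones(R)
    for k, l_k in enumerate(index):
        prod *= U[k][l_k]               # Hadamard product across modes
    return prod @ lam

def rbf_kernel_matrix(indices, lengthscale=1.0):
    """Kernel matrix [K]_{ij} = kappa(x_i, x_j), x = concatenated embeddings."""
    X = np.stack([np.concatenate([U[k][l_k] for k, l_k in enumerate(idx)])
                  for idx in indices])
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

entries = [(0, 1, 2), (3, 0, 4), (2, 5, 6)]     # three observed entry indices
vals = [cp_entry(e) for e in entries]
Kmat = rbf_kernel_matrix(entries)
print(len(vals), Kmat.shape)  # 3 (3, 3)
```

Under the GP model, the three entry values would be modeled jointly as a draw from $\mathcal{N}(\mathbf{0}, \texttt{Kmat})$ rather than computed multilinearly as in `cp_entry`.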

