TIME-VARYING GRAPH REPRESENTATION LEARNING VIA HIGHER-ORDER SKIP-GRAM WITH NEGATIVE SAMPLING

Anonymous

Abstract

Representation learning models for graphs are a successful family of techniques that project nodes into feature spaces that can be exploited by other machine learning algorithms. Since many real-world networks are inherently dynamic, with interactions among nodes changing over time, such techniques have been developed both for static and for time-varying graphs. Here, we show how the skip-gram embedding approach can be used to perform implicit tensor factorization on different tensor representations of time-varying graphs. We show that higher-order skip-gram with negative sampling (HOSGNS) is able to disentangle the roles of nodes and time, with a small fraction of the number of parameters needed by other approaches. We empirically evaluate our approach using time-resolved face-to-face proximity data, showing that the learned representations outperform state-of-the-art methods when used to solve downstream tasks such as network reconstruction. Good performance in predicting the outcome of dynamical processes such as disease spreading shows the potential of this new method to estimate contagion risk, providing early risk awareness based on contact tracing data.

1. INTRODUCTION

A great variety of natural and artificial systems can be represented as networks of elementary structural entities coupled by relations between them. The abstraction of such systems as networks helps us understand, predict and optimize their behaviour (Newman, 2003; Albert & Barabási, 2002). In this sense, node and graph embeddings have been established as standard feature representations in many learning tasks (Cai et al., 2018; Goyal & Ferrara, 2018). Node embedding methods map nodes into low-dimensional vectors that can be used to solve downstream tasks such as edge prediction, network reconstruction and node classification. Node embeddings have proven successful in achieving low-dimensional encodings of static network structures, but many real-world networks are inherently dynamic (Holme & Saramäki, 2012). Time-resolved networks are also the support of important dynamical processes, such as epidemic or rumor spreading, cascading failures, consensus formation, etc. (Barrat et al., 2008). Time-resolved node embeddings have been shown to yield improved performance in predicting the outcome of dynamical processes over networks, such as information diffusion and disease spreading (Sato et al., 2019), providing estimates of infection and contagion risk when used with contact tracing data. Since we expect more proximity-network data to be collected for contact tracing and used as a proxy for epidemic risk (Alsdurf et al., 2020), learning meaningful representations of time-resolved proximity networks can be of extreme importance when facing events such as epidemic outbreaks (Kapoor et al., 2020; Gao et al., 2020). The manual and automatic collection of time-resolved proximity graphs for contact tracing purposes presents an opportunity for quick identification of possible infection clusters and infection chains.
Even before the COVID-19 pandemic, the use of wearable proximity sensors for collecting time-resolved proximity networks had been widely discussed in the literature, and many approaches have been used to describe patterns of activity and community structure, and to study spreading patterns of infectious diseases (Sapienza et al., 2015; Gauvin et al., 2014; Génois et al., 2015). Here we propose a representation learning model that performs implicit tensor factorization on different higher-order representations of time-varying graphs. The main contributions are as follows:
- Given that the skip-gram embedding approach implicitly factorizes the shifted pointwise mutual information (PMI) matrix (Levy & Goldberg, 2014), we generalize it to the implicit factorization of a shifted PMI tensor, and we define the steps to achieve this factorization via higher-order skip-gram with negative sampling (HOSGNS) optimization.
- We show how to apply 3rd-order and 4th-order SGNS to different higher-order representations of time-varying graphs.
- We show that time-varying graph representations learned via HOSGNS outperform state-of-the-art methods when used to solve downstream tasks, even when using a fraction of the number of embedding parameters.
We report the results of learning embeddings on empirical time-resolved face-to-face proximity data and of using such representations as predictors for two different tasks: network reconstruction and predicting the outcome of a SIR spreading process over the time-varying graph. We compare these results with state-of-the-art methods for time-varying graph representation learning.
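To give a rough intuition of what a higher-order SGNS parameterization can look like, the following is a minimal sketch under our own assumptions (the variable names, dimensions, and exact parameterization are illustrative choices, not the model specification given later): a 3rd-order score for a (node, context, time) triple can be written as a sigmoid of a trilinear product over three independent embedding matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_times, dim = 50, 10, 16

# Independent embedding matrices: nodes as "words", nodes as "contexts",
# and one embedding per time slice (illustrative initialization).
W = rng.normal(scale=0.1, size=(n_nodes, dim))
C = rng.normal(scale=0.1, size=(n_nodes, dim))
T = rng.normal(scale=0.1, size=(n_times, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triple_score(i, j, k):
    # 3rd-order analogue of the SGNS dot product: a trilinear form over
    # the three embedding vectors, squashed into a probability in (0, 1).
    return sigmoid(np.sum(W[i] * C[j] * T[k]))
```

Training would then push this score towards 1 for observed (node, context, time) triples and towards 0 for negative samples, in analogy with standard SGNS.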

2. PRELIMINARIES AND RELATED WORK

Skip-gram representation learning. The skip-gram model was designed to compute word embeddings in WORD2VEC (Mikolov et al., 2013), and was afterwards extended to graph node embeddings (Perozzi et al., 2014; Tang et al., 2015; Grover & Leskovec, 2016). Levy & Goldberg (2014) established the relation between skip-gram trained with negative sampling (SGNS) and traditional low-rank approximation methods (Kolda & Bader, 2009; Anandkumar et al., 2014), showing that SGNS optimization is equivalent to factorizing a shifted PMI matrix (Church & Hanks, 1990). This equivalence was later derived under diverse sets of assumptions (Assylbekov & Takhanov, 2019; Allen et al., 2019; Melamud & Goldberger, 2017; Arora et al., 2016; Li et al., 2015), and exploited to obtain closed-form expressions for the matrices implicitly factorized by different graph embedding models (Qiu et al., 2018). In this work, we denote the shifted PMI matrix as SPMI_κ = PMI − log κ, where κ is the number of negative samples.

Random walk based graph embeddings. Given an undirected, weighted and connected graph G = (V, E) with nodes i, j ∈ V, edges (i, j) ∈ E and adjacency matrix A, graph embedding methods are unsupervised models designed to map nodes into dense d-dimensional representations (d ≪ |V|) (Hamilton et al., 2017). A well-known family of approaches based on the skip-gram model consists in sampling random walks from the graph and processing node sequences as textual sentences. In DEEPWALK (Perozzi et al., 2014) and NODE2VEC (Grover & Leskovec, 2016), the skip-gram model is used to obtain node embeddings from co-occurrences in random walk realizations. Although the original implementation of DEEPWALK uses hierarchical softmax to compute embeddings, we will refer to the SGNS formulation given by Qiu et al. (2018). Since SGNS can be interpreted as a factorization of the word-context PMI matrix (Levy & Goldberg, 2014), the asymptotic form of the PMI matrix implicitly decomposed in DEEPWALK can be derived (Qiu et al., 2018).
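As a toy illustration of the SPMI_κ matrix that SGNS implicitly factorizes, the sketch below computes it from a word-context co-occurrence count matrix (the function name, the example counts, and the smoothing constant are our own assumptions for the example):

```python
import numpy as np

def shifted_pmi(C, kappa=5, eps=1e-12):
    """Compute SPMI_kappa = PMI - log(kappa) from a co-occurrence
    count matrix C (rows: words, columns: contexts)."""
    total = C.sum()
    p_ij = C / total                       # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)  # word marginals
    p_j = p_ij.sum(axis=0, keepdims=True)  # context marginals
    # eps is a small smoothing constant to avoid log(0) on empty cells.
    pmi = np.log((p_ij + eps) / (p_i * p_j + eps))
    return pmi - np.log(kappa)

# Tiny example: two words, two contexts.
C = np.array([[10., 2.],
              [1.,  7.]])
S = shifted_pmi(C, kappa=1)  # kappa=1 reduces SPMI to plain PMI
```

Positive entries indicate word-context pairs that co-occur more often than expected under independence; increasing κ shifts the whole matrix down by log κ.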
Given the 1-step transition matrix $P = D^{-1}A$, where $D = \mathrm{diag}(d_1, \dots, d_{|V|})$ and $d_i = \sum_{j \in V} A_{ij}$ is the (weighted) node degree, the expected PMI for a node-context pair $(i, j)$ occurring in a $T$-sized window is:
$$\mathbb{E}\left[\mathrm{PMI}_{\mathrm{DEEPWALK}}(i,j) \mid T\right] = \log \frac{\frac{1}{2T}\sum_{r=1}^{T}\left[p^*(i)\,(P^r)_{ij} + p^*(j)\,(P^r)_{ji}\right]}{p^*(i)\,p^*(j)} \qquad (2.1)$$
where $p^*(i) = \frac{d_i}{\mathrm{vol}(G)}$ is the unique stationary distribution for random walks (Masuda et al., 2017) and $\mathrm{vol}(G) = \sum_{i,j \in V} A_{ij}$. We will use this expression in Section 3.2 to build PMI tensors from higher-order graph representations.

Time-varying graphs and their algebraic representations. Time-varying graphs (Holme & Saramäki, 2012) are defined as triples H = (V, E, T), i.e. collections of events $(i, j, k) \in E$ representing undirected pairwise relations among nodes at discrete times ($i, j \in V$, $k \in T$). H can be seen as a temporal sequence of static graphs $\{G^{(k)}\}_{k \in T}$, each with adjacency matrix $A^{(k)}$ such that $A^{(k)}_{ij} = \omega(i, j, k) \in \mathbb{R}$ is the weight of the event $(i, j, k) \in E$. We can concatenate the list of time-stamped snapshots $[A^{(1)}, \dots, A^{(|T|)}]$ to obtain a single 3rd-order tensor $\mathcal{A} \in \mathbb{R}^{|V| \times |V| \times |T|}$.
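As a concrete illustration of Eq. (2.1), the sketch below computes the expected DEEPWALK PMI matrix directly from an adjacency matrix (a minimal sketch: the function name and the toy path graph are our own choices, and no random-walk sampling is performed):

```python
import numpy as np

def expected_deepwalk_pmi(A, T=4):
    """Evaluate Eq. (2.1): the expected PMI matrix implicitly factorized
    by DeepWalk with window size T, for a connected weighted graph A."""
    d = A.sum(axis=1)                    # (weighted) node degrees d_i
    vol = d.sum()                        # vol(G) = sum_ij A_ij
    P = A / d[:, None]                   # 1-step transition matrix D^-1 A
    p_star = d / vol                     # stationary distribution p*(i)
    Pr = np.eye(len(A))
    acc = np.zeros_like(A, dtype=float)
    for _ in range(T):
        Pr = Pr @ P                      # running power P^r
        # entry (i,j): p*(i)(P^r)_ij + p*(j)(P^r)_ji
        acc += p_star[:, None] * Pr + (p_star[:, None] * Pr).T
    M = acc / (2 * T * np.outer(p_star, p_star))
    return np.log(M)

# Toy example: unweighted path graph on 3 nodes.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
S = expected_deepwalk_pmi(A, T=4)
```

For undirected graphs the random walk is reversible, so $p^*(i)(P^r)_{ij} = p^*(j)(P^r)_{ji}$ and the resulting matrix is symmetric, as the numerator of Eq. (2.1) makes explicit.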

Availability

The source code and data are publicly available at [link to anonymized repository].

