TIME-VARYING GRAPH REPRESENTATION LEARNING VIA HIGHER-ORDER SKIP-GRAM WITH NEGATIVE SAMPLING

Anonymous

Abstract

Representation learning models for graphs are a successful family of techniques that project nodes into feature spaces that can be exploited by other machine learning algorithms. Since many real-world networks are inherently dynamic, with interactions among nodes changing over time, these techniques can be defined both for static and for time-varying graphs. Here, we show how the skip-gram embedding approach can be used to perform implicit tensor factorization on different tensor representations of time-varying graphs. We show that higher-order skip-gram with negative sampling (HOSGNS) is able to disentangle the role of nodes and time, with a small fraction of the number of parameters needed by other approaches. We empirically evaluate our approach using time-resolved face-to-face proximity data, showing that the learned representations outperform state-of-the-art methods when used to solve downstream tasks such as network reconstruction. Good performance on predicting the outcome of dynamical processes such as disease spreading shows the potential of this new method to estimate contagion risk, providing early risk awareness based on contact tracing data.

1. INTRODUCTION

A great variety of natural and artificial systems can be represented as networks of elementary structural entities coupled by relations between them. The abstraction of such systems as networks helps us understand, predict and optimize their behaviour (Newman, 2003; Albert & Barabási, 2002). In this sense, node and graph embeddings have been established as standard feature representations in many learning tasks (Cai et al., 2018; Goyal & Ferrara, 2018). Node embedding methods map nodes into low-dimensional vectors that can be used to solve downstream tasks such as edge prediction, network reconstruction and node classification. Node embeddings have proven successful in achieving low-dimensional encoding of static network structures, but many real-world networks are inherently dynamic (Holme & Saramäki, 2012). Time-resolved networks also support important dynamical processes, such as epidemic or rumor spreading, cascading failures and consensus formation (Barrat et al., 2008). Time-resolved node embeddings have been shown to yield improved performance for predicting the outcome of dynamical processes over networks, such as information diffusion and disease spreading (Sato et al., 2019), providing estimates of infection and contagion risk when used with contact tracing data. Since we expect more proximity-network data to be collected for contact tracing and used as a proxy for epidemic risk (Alsdurf et al., 2020), learning meaningful representations of time-resolved proximity networks can be extremely important when facing events such as epidemic outbreaks (Kapoor et al., 2020; Gao et al., 2020). The manual and automatic collection of time-resolved proximity graphs for contact tracing purposes presents an opportunity for quick identification of possible infection clusters and infection chains.
Even before the COVID-19 pandemic, the use of wearable proximity sensors for collecting time-resolved proximity networks had been widely discussed in the literature, and many approaches have been used to describe patterns of activity and community structure and to study spreading patterns of infectious diseases (Sapienza et al., 2015; Gauvin et al., 2014; Génois et al., 2015).

Availability

The source code and data are publicly available at [link to anonymized

