DYG2VEC: REPRESENTATION LEARNING FOR DYNAMIC GRAPHS WITH SELF-SUPERVISION

Anonymous authors

Abstract

The challenge in learning from dynamic graphs for predictive tasks lies in extracting fine-grained temporal motifs from an ever-evolving graph. Moreover, task labels are often scarce, costly to obtain, and highly imbalanced for large dynamic graphs. Recent advances in self-supervised learning on graphs demonstrate great potential, but focus on static graphs. State-of-the-art (SoTA) models for dynamic graphs are not only incompatible with the self-supervised learning (SSL) paradigm but also fail to forecast interactions beyond the very near future. To address these limitations, we present DyG2Vec, an SSL-compatible, efficient model for representation learning on dynamic graphs. DyG2Vec uses a window-based mechanism to generate task-agnostic node embeddings that can be used to forecast future interactions. DyG2Vec significantly outperforms SoTA baselines on benchmark datasets for downstream tasks while only requiring a fraction of the training/inference time. We adapt two SSL evaluation mechanisms to make them applicable to dynamic graphs and thus show that SSL pre-training helps learn more robust temporal node representations, especially for scenarios with few labels.

1. INTRODUCTION

Graph Neural Networks (GNNs) have recently found great success in representation learning for complex networks of interactions, as present in recommendation systems, transaction networks, and social media (Wu et al., 2020; Zhang et al., 2019; Qiu et al., 2018). However, most approaches ignore the dynamic nature of graphs encountered in many real-world domains. Dynamic graphs model complex, time-evolving interactions between entities (Kazemi et al., 2020; Skarding et al., 2021; Xue et al., 2022). Multiple works have revealed that real-world dynamic graphs possess fine-grained temporal patterns known as temporal motifs (Toivonen et al., 2007; Paranjape et al., 2017). For example, a simple pattern in social networks specifies that two users who share many friends are likely to interact in the future. A robust representation learning approach must be able to extract such temporal patterns from an ever-evolving dynamic graph in order to make accurate predictions.

Self-Supervised Representation Learning (SSL) has shown promise in achieving competitive performance for different data modalities on multiple predictive tasks (Liu et al., 2021). Given a large corpus of unlabelled data, SSL postulates that unsupervised pre-training is sufficient to learn robust representations that are predictive for downstream tasks with minimal fine-tuning. However, it is important to specify a pre-training objective function that induces good performance for the downstream tasks. Contrastive SSL methods, despite their early success, rely heavily on negative samples, extensive data augmentation, and large batch sizes (Jing et al., 2022; Garrido et al., 2022). Non-contrastive methods address these shortcomings, incorporating information-theoretic principles through architectural innovations or regularization methods. These closely resemble strategies employed in manifold learning and spectral embedding methods (Balestriero & LeCun, 2022).
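As a toy illustration (not part of the paper's method), the common-neighbor motif mentioned above can be evaluated directly on a timestamped edge list; the event data and function below are hypothetical, and serve only to make the "shared friends predict future interaction" heuristic concrete:

```python
from collections import defaultdict

# Hypothetical timestamped interaction stream: (user, user, timestamp) triples.
events = [
    ("a", "b", 1), ("a", "c", 2), ("d", "b", 3),
    ("d", "c", 4), ("a", "e", 5),
]

def common_neighbors_before(events, u, v, t):
    """Count neighbors shared by u and v over interactions observed before time t.

    A high count suggests (by the triadic-closure motif) that u and v
    are likely to interact in the future.
    """
    nbrs = defaultdict(set)
    for x, y, ts in events:
        if ts < t:  # only use history available at prediction time t
            nbrs[x].add(y)
            nbrs[y].add(x)
    return len(nbrs[u] & nbrs[v])

# By t=6, users 'a' and 'd' share neighbors {b, c}, hinting at a future a-d edge.
print(common_neighbors_before(events, "a", "d", 6))  # → 2
```

Note that the count depends on the prediction time: evaluated at t=3, before 'd' has any recorded interactions, the same pair scores 0. Learned models like DyG2Vec aim to capture such time-dependent patterns automatically rather than through hand-crafted heuristics.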
The success of such SSL methods on sequential data (Tong et al., 2022; Eldele et al., 2021; Patrick et al., 2021) suggests that one can learn rich temporal node embeddings from dynamic graphs without direct supervision. SSL methods are attractive for dynamic graphs because it is often costly to generate ground truth labels. Contrastive approaches are very sensitive to the quality of the negative samples, and these are challenging to identify in dynamic graphs due to the temporal evolution of interactions and the

