ONLINE LEARNING OF GRAPH NEURAL NETWORKS: WHEN CAN DATA BE PERMANENTLY DELETED?

Abstract

Online learning of graph neural networks (GNNs) faces the challenges of distribution shift and ever-growing and changing training data as temporal graphs evolve over time. This makes it inefficient to train over the complete graph whenever new data arrives. Deleting old data at some point in time may be preferable to maintain good performance and to account for distribution shift. We systematically analyze these issues by incrementally training and evaluating GNNs in a sliding window over temporal graphs. We experiment with three representative GNN architectures and two scalable GNN techniques on three new datasets. In our experiments, the GNNs face the challenge that new vertices, edges, and even classes appear and disappear over time. Our results show that no more than 50% of the GNN's receptive field is necessary to retain at least 95% accuracy compared to training over the full graph. In most cases, i.e., in 14 out of 18 experiments, we even observe that a temporal window of size 1 is sufficient to retain at least 90%.

1. INTRODUCTION

Training of Graph Neural Networks (GNNs) on temporal graphs has become a hot topic. Recent works include combining GNNs with recurrent modules (Seo et al., 2018; Manessi et al., 2020; Sankar et al., 2020; Pareja et al., 2020) and vertex embeddings as a function of time to cope with continuous-time temporal graphs (da Xu et al., 2020; Rossi et al., 2020a). Concurrently, other approaches have been proposed to improve the scalability of GNNs. These include sampling-based techniques (Chiang et al., 2019; Zeng et al., 2020) and shifting expensive neighborhood aggregation into pre-processing (Wu et al., 2019; Rossi et al., 2020b) or post-processing (Bojchevski et al., 2020). However, there are further fundamental issues with temporal graphs that have not yet been properly answered. First, as new vertices and edges appear (and disappear) over time, so can new classes. This results in a distribution shift, which is particularly challenging in an online setting, as there is no finite, a-priori known set of classes that can be used for training, and it is not known when a new class will appear. Second, scalable techniques for GNNs address the increased size of the graph, but always operate on the entire graph and thus on the entire temporal duration the graph spans. However, training on the entire history of a temporal graph (even in the context of scaling techniques like sampling (Chiang et al., 2019; Zeng et al., 2020)) may actually not be needed to perform tasks like vertex classification. Thus, it is important to investigate whether, at some point in time, one can "intentionally forget" old data and still retain the same predictive power for the given task. In fact, it has been observed in other tasks, such as stock-market prediction, that too much history can even be counterproductive (Ersan et al., 2020).
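The incremental train-and-evaluate protocol over a sliding window described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; `train_fn` and `eval_fn` are hypothetical placeholders for any GNN training and vertex-classification routine, and snapshots stand for the graph at discrete time steps.

```python
def sliding_window_online_eval(snapshots, train_fn, eval_fn, window):
    """Online evaluation over a temporal graph given as a list of
    snapshots: at each step, train only on the last `window` snapshots
    (older data is "intentionally forgotten") and evaluate on the next,
    unseen snapshot.

    train_fn(list_of_snapshots) -> model   (hypothetical)
    eval_fn(model, snapshot)    -> score   (hypothetical)
    """
    scores = []
    for i in range(window, len(snapshots)):
        model = train_fn(snapshots[i - window:i])  # restricted history
        scores.append(eval_fn(model, snapshots[i]))  # test on next step
    return scores
```

Varying `window` then directly measures how much history is needed to retain predictive power, e.g., comparing `window=1` against training on the full past.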
Proposed Solution and Research Questions While we do not suggest an entirely new GNN architecture, we propose to adapt existing GNN architectures and scalable GNN techniques to the problem of distribution shift in temporal graphs. In essence, we propose a new evaluation procedure for online learning based on the distribution of temporal differences, which characterizes how vertices are connected in a temporal graph by enumerating the temporal differences of connected vertices along k-hop paths. This information is crucial for balancing between capturing the distribution shift and having sufficient vertices within the GNN's receptive field. In summary, the central question we aim to answer is whether we can intentionally forget old data without losing predictive power in an online learning scenario in the presence of distribution shift.
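The distribution of temporal differences along k-hop paths can be computed with a simple breadth-first expansion. The sketch below is a hedged illustration under assumed inputs (an adjacency dict and a per-vertex timestamp dict, e.g., the year a vertex appeared); the function names and data layout are ours, not from the paper.

```python
from collections import deque


def temporal_differences(adj, t, k):
    """Collect |t[u] - t[w]| for every vertex w reachable from each
    vertex u within k hops, i.e., within the receptive field of a
    k-layer GNN centered at u.

    adj: dict vertex -> iterable of neighbor vertices (assumed format)
    t:   dict vertex -> timestamp (assumed format)
    k:   number of hops (GNN layers)
    """
    diffs = []
    for u in adj:
        seen = {u}
        frontier = deque([u])
        for _ in range(k):  # expand one hop at a time
            nxt = deque()
            for v in frontier:
                for w in adj[v]:
                    if w not in seen:
                        seen.add(w)
                        nxt.append(w)
                        diffs.append(abs(t[u] - t[w]))
            frontier = nxt
    return diffs
```

A histogram of the returned differences then indicates how far back in time a GNN's receptive field actually reaches, which informs the choice of the temporal window size.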

