SUFFICIENT SUBGRAPH EMBEDDING MEMORY FOR CONTINUAL GRAPH REPRESENTATION LEARNING

Anonymous

Abstract

Memory replay, which constructs a buffer of representative samples and retrains the model over the buffer to maintain its performance on existing tasks, has shown great success for continual learning with Euclidean data. Directly applying it to graph data, however, can lead to a memory explosion problem due to the need to store the explicit topological connections of the representative nodes. To this end, we present Parameter Decoupled Graph Neural Networks (PDGNNs) with Sufficient Subgraph Embedding Memory (SSEM), which fully utilize the explicit topological information for memory replay while reducing the memory space complexity from O(nd^L) to O(n), where n is the memory buffer size, d is the average node degree, and L is the range of neighborhood aggregation. Specifically, PDGNNs decouple the trainable parameters from the computation subgraphs via Sufficient Subgraph Embeddings (SSEs), which compress each computation subgraph into a single vector (i.e., an SSE) to reduce memory consumption. Moreover, we discover a pseudo-training effect in memory-based continual graph learning, which does not exist in continual learning on Euclidean data without topological connections (e.g., individual images). Based on this discovery, we develop a novel coverage maximization sampling strategy that enhances performance when the memory budget is tight. Thorough empirical studies demonstrate that PDGNNs with SSEM outperform state-of-the-art techniques in both class-incremental and task-incremental settings.

1. INTRODUCTION

Continual graph representation learning (Liu et al., 2021; Zhou & Cao, 2021; Zhang et al., 2021), which aims to accommodate new types of emerging nodes in a graph and their associated edges without degrading the model's performance on existing nodes, is an emerging area that has attracted increasing attention. It is of enormous value in practical applications, especially when graphs are relatively large and retraining a new model over the entire graph is computationally infeasible. For instance, in a social network, a community detection model has to keep adapting its parameters to nodes from newly emerged communities; in a citation network, a document classifier needs to continuously update its parameters to recognize documents from newly emerged research fields. Memory replay (Rebuffi et al., 2017; Lopez-Paz & Ranzato, 2017; Aljundi et al., 2019; Shin et al., 2017), which stores representative samples in a buffer for retraining the model to maintain its performance on existing tasks, has shown great success in preventing catastrophic forgetting across various continual learning tasks, e.g., in computer vision and reinforcement learning (Kirkpatrick et al., 2017; Li & Hoiem, 2017; Aljundi et al., 2018; Rusu et al., 2016). Directly applying memory replay to graph data with message-passing based graph neural networks (GNNs) (Gilmer et al., 2017; Kipf & Welling, 2016; Veličković et al., 2017), however, can give rise to a memory explosion problem. Specifically, due to message passing over the topological connections of the graph, retraining an L-layer GNN (Figure 1a) with n buffered nodes requires storing O(nd^L) nodes (Chiang et al., 2019; Chen et al., 2017) (not yet counting the edges) in the buffer, where d is the average node degree.
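To see how quickly this grows, consider a back-of-envelope calculation (our own illustration; `replay_buffer_nodes` is a hypothetical helper, not part of the paper):

```python
# Worst-case number of nodes that must be stored to replay n buffered nodes
# through an L-layer message-passing GNN: each buffered node drags in its
# full L-hop computation subgraph, roughly d**L nodes for average degree d.

def replay_buffer_nodes(n: int, d: int, L: int) -> int:
    """Worst-case node count for storing full L-hop computation subgraphs."""
    return n * d ** L

# Even a modest buffer of 100 nodes on a graph with average degree 20
# explodes exponentially with the number of layers:
for L in (1, 2, 3):
    print(L, replay_buffer_nodes(100, 20, L))  # 2,000 -> 40,000 -> 800,000
```

The exponential factor d^L, not the buffer size n, is what makes naive replay intractable on dense graphs.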
Take the Reddit dataset (Hamilton et al., 2017) as an example: with an average node degree of 492, the buffer size becomes intractable even for a 2-layer GNN. To sidestep this issue, Experience Replay based GNN (ER-GNN) (Zhou & Cao, 2021) stores representative nodes in the buffer but completely ignores the topological information (Figure 1b). Feature graph network (FGN) (Wang et al., 2020a) implicitly encodes node proximity with the inner products between the features of the target node and its neighbors; however, the explicit topological connections are again discarded, and message passing is no longer feasible on the graph. To this end, we propose Parameter Decoupled Graph Neural Networks (PDGNNs) with Sufficient Subgraph Embedding Memory (SSEM) for continual graph learning. Since the key challenge lies in the unbounded sizes of the computation subgraphs, we introduce the concept of the Sufficient Subgraph Embedding (SSE), which has a fixed size yet contains all the information of a computation subgraph necessary for model optimization, so that SSEs can serve as surrogates of the computation subgraphs in memory replay. We find, however, that SSEs cannot be derived from standard MPNNs, because their trainable parameters are entangled with individual nodes and edges. We therefore formulate the PDGNNs framework to decouple the two and enable memory replay based solely on the buffered SSEs (without the computation subgraphs). Since the size of an SSE is fixed, the memory space complexity of a buffer with size n is dramatically reduced from O(nd^L) to O(n). Moreover, unlike traditional continual learning on data without topology (e.g., images), we discover that replaying an SSE incurs a pseudo-training effect on the neighboring nodes, which strengthens the predictions for the other nodes in the same computation subgraph. This effect is unique to continual graph learning and arises from the neighborhood aggregation in GNNs.
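The formal definition of PDGNNs and SSEs comes later in the paper; as a rough intuition, the following sketch (our own illustration in the spirit of decoupled GNNs such as SGC, not necessarily the paper's exact formulation) shows how a parameter-free propagation step can compress each node's L-hop computation subgraph into one fixed-size vector, so that only these vectors need to be buffered:

```python
import numpy as np

# Decouple the parameter-free neighborhood aggregation from the trainable
# part: each node's L-hop computation subgraph is compressed into a single
# fixed-size vector e_v = (A_hat^L X)[v], which a trainable predictor can
# consume directly during replay.

def sufficient_embeddings(A: np.ndarray, X: np.ndarray, L: int) -> np.ndarray:
    """Propagate features L steps with a row-normalized adjacency (self-loops added)."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)    # row-normalize
    E = X
    for _ in range(L):                                  # parameter-free message passing
        E = A_hat @ E
    return E                                            # E[v] plays the role of an SSE

# Toy graph: 4 nodes on a path, 2-dimensional features.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
X = np.random.randn(4, 2)
E = sufficient_embeddings(A, X, L=2)
print(E.shape)  # (4, 2): one fixed-size vector per node, independent of d and L
```

Because each embedding has a fixed size regardless of the degree d or the range L, buffering n of them costs O(n) rather than O(nd^L).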
We further show that in homophilous graphs (prevalent in real-world data), the pseudo-training effect makes the SSEs corresponding to larger computation subgraphs (quantitatively measured by the coverage ratio) more beneficial to continual learning. Inspired by this, we develop a novel coverage maximization sampling strategy, which enlarges the coverage ratio of the selected SSEs and empirically enhances performance without consuming additional memory. In experiments, we adopt both the class-incremental (class-IL) continual learning scenario (Rebuffi et al., 2017), which is rarely studied for node classification under the continual learning setting, and the task-incremental (task-IL) scenario (Liu et al., 2021; Zhou & Cao, 2021). Thorough empirical studies demonstrate that PDGNNs with SSEM outperform state-of-the-art continual graph representation learning techniques in both the class-IL and task-IL settings. Our contributions are summarized below:
• We formulate the PDGNNs-SSEM framework, which enables memory replay with topological information for continual graph representation learning and reduces the memory space complexity from O(nd^L) to O(n).
• PDGNNs-SSEM obtain superior performance, especially in the challenging class-IL scenario.
• We theoretically reveal a phenomenon unique to continual graph learning under memory replay (i.e., the pseudo-training effect), and accordingly develop the coverage maximization sampling strategy to leverage this effect for improved performance.
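The coverage maximization idea above can be sketched as follows (our own reading of the idea; the exact strategy is defined in Section 3.6): buffer the SSE of node v with probability proportional to its coverage ratio, i.e., the fraction of nodes reachable within L hops of v.

```python
import numpy as np

def coverage_ratios(A: np.ndarray, L: int) -> np.ndarray:
    """Fraction of the graph covered by each node's L-hop computation subgraph."""
    N = A.shape[0]
    reach = np.eye(N)                            # each node covers itself
    adj = (A > 0).astype(float)
    for _ in range(L):                           # grow the reachable set hop by hop
        reach = np.minimum(reach + reach @ adj, 1.0)
    return reach.sum(axis=1) / N

def sample_buffer(A: np.ndarray, L: int, n: int, rng=None) -> np.ndarray:
    """Sample n distinct nodes with probability proportional to coverage ratio."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = coverage_ratios(A, L)
    return rng.choice(A.shape[0], size=n, replace=False, p=p / p.sum())

# Toy path graph on 4 nodes: interior nodes cover more of the graph,
# so their SSEs are preferred under a tight memory budget.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
print(coverage_ratios(A, L=1))
idx = sample_buffer(A, L=1, n=2)
```

Biasing the buffer toward high-coverage SSEs spreads the pseudo-training effect over more nodes at no extra memory cost, since each SSE is a fixed-size vector either way.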

2. RELATED WORKS

Our proposed PDGNNs-SSEM is closely related to three lines of work: continual learning, continual graph learning, and decoupled graph neural networks.

2.1. CONTINUAL LEARNING & CONTINUAL GRAPH LEARNING

To alleviate catastrophic forgetting, i.e., the drastic performance drop on previous tasks after a model learns new tasks, existing approaches can be categorized into three types. Regularization-based methods apply constraints to prevent drastic modification of the model parameters that are important for previous tasks (Farajtabar et al., 2020; Kirkpatrick et al., 2017; Li & Hoiem, 2017; Aljundi et al., 2018; Hayes & Kanan, 2020). Parameter isolation methods adaptively allocate new parameters for the new tasks to protect the parameters for



Figure 1: (a) Directly storing computation subgraphs for replay in a multi-layer MPNN. (b) The strategy of storing single nodes proposed in ER-GNN (Zhou & Cao, 2021). (c) Our PDGNNs with SSEM: the incoming computation subgraphs are first embedded as SSEs and then fed into the trainable function; the SSEs are sampled and stored with probability computed from their coverage ratio, i.e., the ratio of nodes covered by their computation subgraphs (Section 3.6).

