GRAPHEDITOR: AN EFFICIENT GRAPH REPRESENTATION LEARNING AND UNLEARNING APPROACH

Abstract

As graph representation learning has received much attention due to its widespread applications, removing the effect of a specific node from a pre-trained graph representation learning model due to privacy concerns has become equally important. However, because of the dependencies between nodes in a graph, graph representation unlearning is notoriously challenging and remains under-explored. To fill this gap, we propose GRAPHEDITOR, an efficient graph representation learning and unlearning approach that supports node/edge deletion, node/edge addition, and node feature update for linear GNNs. Compared to existing unlearning approaches, GRAPHEDITOR requires neither retraining from scratch nor access to all the training data during unlearning, which is beneficial in settings where not all the training data are available for retraining. Moreover, since GRAPHEDITOR performs exact unlearning, the removal of all information associated with the deleted nodes/edges is guaranteed. Empirical results on real-world datasets illustrate the effectiveness of GRAPHEDITOR on both node and edge unlearning tasks. The code can be found in the supplementary material.

1. INTRODUCTION

In recent years, graph representation learning has been recognized as a fundamental learning problem and has received much attention due to its widespread use in various domains, including social network analysis Kipf & Welling (2017); Hamilton et al. (2017), traffic prediction Cui et al. (2019); Rahimi et al. (2018), knowledge graphs Wang et al. (2019a;b), and recommender systems Berg et al. (2017); Ying et al. (2018). However, due to increasing concerns over data privacy, removing the effect of a specific data point from a pre-trained model has become equally important. Recently, the "Right to be forgotten" Wikipedia contributors (2021) grants users the right to request that organizations or companies delete their personal data in a rigorous manner. For example, when Facebook users deregister their accounts, they can not only request the company to permanently delete the accounts' profiles from the social network, but also require the company to eliminate the impact of the deleted data on any machine learning model trained on it, which is known as machine unlearning Bourtoule et al. (2021). One of the most straightforward unlearning approaches is to retrain the model from scratch using the remaining data, which can be computationally prohibitive when the dataset is large, or infeasible if not all the data are available for retraining. Recently, many efforts have been made to achieve efficient unlearning, which can be roughly classified into exact unlearning and approximate unlearning, each of which has its own limitations. Exact unlearning: Bourtoule et al. (2021) proposes to randomly split the original dataset into multiple disjoint shards and train each shard model independently; upon receiving a data deletion request, the model provider only needs to retrain the corresponding shard model. Chen et al. (2021) extends Bourtoule et al. (2021) by taking the graph structure into consideration during data partition. However, splitting into too many shards can hurt model performance due to data heterogeneity and the lack of training data for each shard model Ramezani et al. (2021); on the other hand, too few shards result in retraining on massive data, which is computationally prohibitive. Approximate unlearning: Guo et al. (2020); Chien et al. (2022) propose to approximate the unlearned model using a first-order Taylor expansion, Golatkar et al. (2020) proposes to fine-tune with Newton's method on the remaining data, and Wu et al. (2020a) proposes to transfer the gradient computed at one weight to another and retrain the model from scratch at lower computational cost. Since approximate unlearning lacks a guarantee that all information associated with the deleted data is eliminated, these methods require injecting random noise, which can significantly hurt model performance.
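To make the approximate-unlearning mechanics concrete, the following minimal numpy sketch applies one Newton step on the remaining data, in the spirit of the Newton/influence-style methods above, to a ridge-regression surrogate. All function names and the deletion indices are our own illustration; for this quadratic loss the step happens to recover the retrained solution exactly, whereas for deep models it is only an approximation and is typically paired with noise injection.

```python
import numpy as np

def newton_unlearn(w, X, Y, deleted, lam=1e-2):
    """One Newton step of the ridge loss evaluated on the remaining rows:
    w' = w - H^{-1} g, with g the gradient and H the Hessian at the old w."""
    keep = np.ones(len(X), dtype=bool)
    keep[deleted] = False
    Xr, Yr = X[keep], Y[keep]
    g = 2 * Xr.T @ (Xr @ w - Yr) + 2 * lam * w           # gradient on remaining data
    Hess = 2 * Xr.T @ Xr + 2 * lam * np.eye(X.shape[1])  # Hessian on remaining data
    return w - np.linalg.solve(Hess, g)

rng = np.random.default_rng(0)
X, Y, lam = rng.normal(size=(50, 4)), rng.normal(size=50), 1e-2
w_full = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ Y)  # trained on all data
w_unlearned = newton_unlearn(w_full, X, Y, deleted=[0, 1, 2], lam=lam)
# retraining from scratch on rows 3..49 yields the same weights for this quadratic loss
w_retrain = np.linalg.solve(X[3:].T @ X[3:] + lam * np.eye(4), X[3:].T @ Y[3:])
```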
Employing graph representation unlearning is even more challenging due to the dependencies between nodes that are connected by edges: we not only need to remove the information related to the deleted nodes, but also need to update their impact on the remaining neighboring nodes within multiple hops. Since most existing unlearning methods only support data deletion, extending them to graphs is non-trivial. Motivated by the importance and challenges of graph representation unlearning, we aim to answer the following two questions. Q1: Can approximate unlearning methods remove all information related to the deleted data? To verify this, we introduce the "deleted data replay test" to validate the effectiveness of unlearning in Section 5. Specifically, we add an extra label category and assign all deleted nodes to it. To better distinguish deleted nodes from others, an extra binary feature is appended to all nodes, set to "1" for the deleted nodes and "0" for all other nodes. We first pre-train the model on the dataset with the extra label and feature, then evaluate the effectiveness of an unlearning method by comparing the number of deleted nodes that are predicted as the extra label category before and after the unlearning process. Intuitively, an effective unlearning method should unlearn all knowledge related to the additional category and binary feature, so a model after unlearning should never predict a node as the additional category. However, according to our observations, approximate unlearning fails to remove all information related to the deleted data, which motivates us to design an efficient exact graph representation unlearning method. Q2: If not, can we design an efficient exact graph representation unlearning method? We propose an exact graph learning and unlearning algorithm, GRAPHEDITOR, which can efficiently update the model parameters with provably low time complexity.
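The replay-test construction described above can be sketched as follows. The function and variable names are our own; the hypothetical `build_replay_test` simply augments the feature matrix and label vector as described.

```python
import numpy as np

def build_replay_test(features, labels, deleted_idx, num_classes):
    """Append the extra binary feature (1 for deleted nodes, 0 otherwise)
    and move every deleted node to the extra label category `num_classes`."""
    flag = np.zeros((features.shape[0], 1))
    flag[deleted_idx] = 1.0
    aug_features = np.hstack([features, flag])
    aug_labels = labels.copy()
    aug_labels[deleted_idx] = num_classes
    return aug_features, aug_labels

feat = np.random.randn(5, 3)
lab = np.array([0, 1, 0, 2, 1])                 # 3 original classes: 0, 1, 2
aug_feat, aug_lab = build_replay_test(feat, lab, deleted_idx=[1, 4], num_classes=3)
# pre-train on (aug_feat, aug_lab), unlearn nodes 1 and 4, then count how many
# nodes the unlearned model still predicts as the extra category 3
```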
GRAPHEDITOR supports not only node/edge deletion, but also node/edge addition and node feature update. The key idea of GRAPHEDITOR is to reformulate the ordinary GNN training problem as an alternative problem with a closed-form solution. Upon receiving a deletion request, GRAPHEDITOR takes the closed-form solution as input and quickly updates the model parameters based only on a small fraction of nodes in the neighborhood of the deleted node/edge. Compared to retraining from scratch, GRAPHEDITOR requires less data and a single step of computation, which makes it more suitable for online settings that require the model provider to produce the unlearned model immediately, or where not all the training data are available for retraining. Compared to the existing exact unlearning method GRAPHERASER Chen et al. (2021), GRAPHEDITOR enjoys better performance since the unlearned model does not suffer from data heterogeneity or a lack of training data for each shard model. Compared to the approximate unlearning methods INFLUENCE Guo et al. (2020) and FISHER Golatkar et al. (2020), GRAPHEDITOR guarantees the removal of all information related to the deleted nodes/edges and does not require injecting differential-privacy noise to prevent information leakage after unlearning. Contributions. We summarize our contributions as follows: (1) We introduce the "deleted data replay test" to validate the effectiveness of unlearning methods and illustrate the insufficiency of approximate unlearning methods for removing all information related to the deleted nodes/edges. (2) We introduce a graph representation learning and unlearning approach, GRAPHEDITOR, for linear GNNs, which supports node/edge deletion, node/edge addition, and node feature update. (3) To improve scalability and expressiveness, we introduce subgraph sampling and a non-linearity extension of GRAPHEDITOR. (4) We conduct empirical studies on real-world datasets that illustrate its effectiveness.
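To illustrate the closed-form idea on a toy scale, suppose the linear GNN's node embeddings H are precomputed propagated features (e.g., H = S^K X for a propagation matrix S) and the weights solve a ridge regression with the closed form W = (H^T H + lam I)^{-1} H^T Y. Deletion can then be served by downdating the cached statistics G = H^T H and b = H^T Y using only the rows of the deleted node and any affected multi-hop neighbors, rather than all the data. This is a sketch under our own assumptions, not the paper's exact algorithm:

```python
import numpy as np

def closed_form_fit(H, Y, lam=1.0):
    """Ridge closed form: W = (H^T H + lam I)^{-1} H^T Y."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ Y)

def edit_rows(H_old_rows, Y_old_rows, H_new_rows, Y_new_rows, G, b, lam=1.0):
    """Downdate/update the cached statistics G = H^T H and b = H^T Y when
    some rows change (for deleted rows, pass empty arrays as the new rows).
    Only the affected rows are touched, never the full dataset."""
    G = G - H_old_rows.T @ H_old_rows + H_new_rows.T @ H_new_rows
    b = b - H_old_rows.T @ Y_old_rows + H_new_rows.T @ Y_new_rows
    d = G.shape[0]
    W = np.linalg.solve(G + lam * np.eye(d), b)
    return G, b, W

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 4))   # propagated node features, e.g. S^K X
Y = rng.normal(size=(20, 3))   # label matrix
G, b = H.T @ H, H.T @ Y        # cached sufficient statistics
idx = np.array([3, 7])         # nodes whose rows must be removed
G2, b2, W_edit = edit_rows(H[idx], Y[idx], H[:0], Y[:0], G, b)
# retraining from scratch on the remaining rows gives the same weights
keep = np.setdiff1d(np.arange(20), idx)
W_scratch = closed_form_fit(H[keep], Y[keep])
```

The update touches only the deleted rows and a d x d system, which is why the cost is independent of the total number of remaining nodes.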




Exact machine unlearning. Exact unlearning aims to produce the same performance as a model trained without the deleted data. The most straightforward way is to retrain the model from scratch, which is in general computationally demanding, except for some model-specific or deterministic problems such as SVM Cauwenberghs & Poggio (2001), K-means Ginart et al. (2019), and decision trees Brophy & Lowd (2021). Recently, efforts have been made to reduce the computational cost for general gradient-based training problems. For example, Bourtoule et al. (2021) proposes to split the dataset into multiple shards, train an independent model on each data shard, and aggregate their predictions during inference. The data partition scheme allows for efficient retraining of models on a smaller fragment of data. However, model performance suffers because each model has less data to train on, and data heterogeneity can also deteriorate performance. Besides, GRAPHERASER Chen et al. (2021) extends Bourtoule et al. (2021) to graph-structured
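The shard-based scheme above can be sketched as follows, with ridge regression standing in for each shard model; the class and parameter names are our own illustration, not an implementation of the cited method.

```python
import numpy as np

class ShardedUnlearner:
    """Sketch of shard-based exact unlearning: each shard trains an
    independent model; deleting a point retrains only the shard that
    contains it; inference averages the shard predictions."""

    def __init__(self, X, Y, num_shards=4, lam=1e-2, seed=0):
        self.X, self.Y, self.lam = X, Y, lam
        rng = np.random.default_rng(seed)
        # disjoint random shards of the training indices
        self.shards = np.array_split(rng.permutation(len(X)), num_shards)
        self.alive = np.ones(len(X), dtype=bool)
        self.models = [self._fit(s) for s in self.shards]

    def _fit(self, shard):
        idx = shard[self.alive[shard]]        # drop deleted points
        Xs, Ys = self.X[idx], self.Y[idx]
        d = Xs.shape[1]
        return np.linalg.solve(Xs.T @ Xs + self.lam * np.eye(d), Xs.T @ Ys)

    def delete(self, i):
        self.alive[i] = False
        for k, shard in enumerate(self.shards):
            if i in shard:                    # retrain only this shard
                self.models[k] = self._fit(shard)

    def predict(self, X):
        return np.mean([X @ w for w in self.models], axis=0)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(40, 3)), rng.normal(size=40)
m = ShardedUnlearner(X, Y)
w_before = [w.copy() for w in m.models]
m.delete(5)   # only the shard containing point 5 is retrained
```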

