GRAPHEDITOR: AN EFFICIENT GRAPH REPRESENTATION LEARNING AND UNLEARNING APPROACH

Abstract

As graph representation learning has received much attention due to its widespread applications, removing the effect of a specific node from a pre-trained graph representation learning model due to privacy concerns has become equally important. However, due to the dependency between nodes in the graph, graph representation unlearning is notoriously challenging and remains less well explored. To fill this gap, we propose GRAPHEDITOR, an efficient graph representation learning and unlearning approach that supports node/edge deletion, node/edge addition, and node feature update for linear-GNNs. Compared to existing unlearning approaches, GRAPHEDITOR requires neither retraining from scratch nor access to all the training data during unlearning, which is beneficial in settings where not all the training data are available for retraining. Besides, since GRAPHEDITOR performs exact unlearning, the removal of all the information associated with the deleted nodes/edges is guaranteed. Empirical results on real-world datasets illustrate the effectiveness of GRAPHEDITOR for both node and edge unlearning tasks. The code can be found in the supplementary material.

1. INTRODUCTION

In recent years, graph representation learning has been recognized as a fundamental learning problem and has received much attention due to its widespread use in various domains, including social network analysis Kipf & Welling (2017); Hamilton et al. (2017), traffic prediction Cui et al. (2019); Rahimi et al. (2018), knowledge graphs Wang et al. (2019a;b), and recommender systems Berg et al. (2017); Ying et al. (2018). However, due to increasing concerns about data privacy, removing the effect of a specific data point from a pre-trained model has become equally important. Recently, the "Right to be forgotten" Wikipedia contributors (2021) has granted users the right to request that organizations or companies delete their personal data in a rigorous manner. For example, when Facebook users deregister their accounts, they can not only request the company to permanently delete the accounts' profiles from the social network, but also require the company to eliminate the impact of the deleted data on any machine learning model trained on it, which is known as machine unlearning Bourtoule et al. (2021). One of the most straightforward unlearning approaches is to retrain the model from scratch using the remaining data, which can be computationally prohibitive when the dataset is large, or infeasible if not all the data are available for retraining. Recently, many efforts have been made to achieve efficient unlearning; these can be roughly classified into exact unlearning and approximate unlearning, each of which has its own limitations.

Exact unlearning: Bourtoule et al. (2021) proposes to randomly split the original dataset into multiple disjoint shards and train each shard model independently. Upon receiving a data deletion request, the model provider only needs to retrain the corresponding shard model. Chen et al. (2021) extends Bourtoule et al. (2021) by taking the graph structure into consideration for data partition. However, splitting into too many shards can hurt model performance due to data heterogeneity and the lack of training data for each shard model Ramezani et al. (2021); on the other hand, too few shards result in retraining on massive data, which is computationally prohibitive. Approximate unlearning: Guo et al. (2020); Chien et al. (2022) propose to approximate the unlearned model using a first-order Taylor expansion, Golatkar et al. (2020) proposes to fine-tune with Newton's method on the remaining data, and Wu et al. (2020a) proposes to transfer the gradient computed at one weight to another and retrain the model from scratch at lower computational cost. Since approximate unlearning lacks a guarantee that all information associated with the deleted data is eliminated,
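The shard-based exact unlearning scheme of Bourtoule et al. (2021) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `train_model` is a hypothetical stand-in for any learner, and the shard count and dataset are arbitrary. The key property shown is that deleting a point only triggers retraining of the one shard that contains it.

```python
import random

def train_model(data):
    # Hypothetical stand-in for any learning algorithm;
    # here the "model" just records which examples it was trained on.
    return {"seen": set(data)}

def sisa_train(dataset, num_shards=4, seed=0):
    # Randomly split the dataset into disjoint shards and
    # train one model per shard, independently.
    rng = random.Random(seed)
    shards = [[] for _ in range(num_shards)]
    for x in dataset:
        shards[rng.randrange(num_shards)].append(x)
    models = [train_model(s) for s in shards]
    return shards, models

def sisa_unlearn(shards, models, point):
    # On a deletion request, retrain from scratch only the shard
    # containing the deleted point; all other shard models are untouched.
    for i, shard in enumerate(shards):
        if point in shard:
            shard.remove(point)
            models[i] = train_model(shard)
            return i  # index of the single retrained shard
    return None

shards, models = sisa_train(list(range(20)))
retrained = sisa_unlearn(shards, models, 7)
assert all(7 not in m["seen"] for m in models)  # the point's influence is gone
```

The trade-off discussed above is visible in `num_shards`: more shards mean cheaper retraining per deletion but smaller training sets per model.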

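To make the Newton-style update behind approximate unlearning concrete, the following sketch applies a single Newton step to delete one point from a ridge regression model. All names and data here are illustrative assumptions; for this quadratic objective the one-step update happens to coincide with retraining, whereas for general models it is only an approximation, which is exactly why approximate methods lack a removal guarantee.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Minimizer of ||Xw - y||^2 + lam * ||w||^2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 1e-2

w = ridge_fit(X, y, lam)  # model trained on all data

# "Unlearn" point j via one Newton step on the remaining-data objective:
#   w' = w + H_rem^{-1} * grad_j(w)
# where grad_j is the deleted point's loss gradient at w and H_rem is the
# Hessian of the objective with point j removed.
j = 7
x_j, y_j = X[j], y[j]
grad_j = 2 * x_j * (x_j @ w - y_j)
H_rem = 2 * (X.T @ X - np.outer(x_j, x_j) + lam * np.eye(3))
w_unlearned = w + np.linalg.solve(H_rem, grad_j)

# Sanity check: for a quadratic loss this matches retraining from scratch.
X_rem, y_rem = np.delete(X, j, axis=0), np.delete(y, j)
assert np.allclose(w_unlearned, ridge_fit(X_rem, y_rem, lam))
```

The appeal of such updates is that they cost one Hessian solve instead of a full retrain; the limitation, as noted above, is that for non-quadratic models nothing certifies that the deleted point's influence is fully removed.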
