FORGET UNLEARNING: TOWARDS TRUE DATA-DELETION IN MACHINE LEARNING

Abstract

Unlearning has emerged as a technique to efficiently erase information about deleted records from learned models. We show, however, that the influence created by the original presence of a data point in the training set can still be detected after running certified unlearning algorithms, which can enable its reconstruction by an adversary. Thus, under realistic assumptions about the dynamics of model releases over time and in the presence of adaptive adversaries, we show that unlearning is not equivalent to data deletion and does not guarantee the "right to be forgotten." We then propose a more robust data-deletion guarantee and show that satisfying differential privacy is necessary for true data deletion. Under our notion, we propose an accurate, computationally efficient, and secure data-deletion machine learning algorithm for the online setting, based on the noisy gradient descent algorithm.

1. INTRODUCTION

Many corporations today collect their customers' private information to train Machine Learning (ML) models that power a variety of services, encompassing recommendations, searches, targeted ads, and more. To prevent any unintended use of personal data, privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) require that these corporations provide the "right to be forgotten" (RTBF) to their data subjects: if a user wishes to revoke access to their data, an organization must comply by erasing all information about the user without undue delay (typically within a month). This includes ML models trained in standard ways, as model inversion (Fredrikson et al., 2015) and membership inference attacks (Shokri et al., 2017; Carlini et al., 2019) demonstrate that individual training records can be exfiltrated from such models. Periodically retraining models after excluding deleted users can be costly, so there is growing interest in designing computationally cheap Machine Unlearning algorithms as an alternative to retraining for erasing the influence of deleted data from (and registering the influence of added data to) trained models.

Since it is generally difficult to tell how a specific data point affects a model, Ginart et al. (2019) propose quantifying the worst-case information leakage from an unlearned model through an unlearning guarantee on the mechanism, defined as a differential privacy (DP) like (ε, δ)-indistinguishability between its output and that of retraining on the updated database. With minor variations in this definition, several mechanisms have been proposed and certified as unlearning algorithms in the literature (Ginart et al., 2019; Izzo et al., 2021; Sekhari et al., 2021; Neel et al., 2021; Guo et al., 2019; Ullah et al., 2021). However, is indistinguishability to retraining a sufficient guarantee of data deletion? We argue that it is not.
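For concreteness, this unlearning guarantee can be written in DP-style notation. In the sketch below (our paraphrase, not a formal definition from this paper), A denotes the training algorithm, U the unlearning mechanism, and D, D' the databases before and after an edit: U is an (ε, δ)-unlearning algorithm if, for every such pair of databases and every measurable set S of models,

```latex
\Pr\big[\,U(A(D),\, D') \in S\,\big] \;\le\; e^{\varepsilon}\,\Pr\big[\,A(D') \in S\,\big] \;+\; \delta,
```

and symmetrically with the roles of the two distributions exchanged. Here A(D') is exactly the retrain-from-scratch baseline, which is the crux of the critique that follows.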
In the real world, a user's decision to remove their information is often affected by what a deployed model reveals about them. The same revealed information may also affect other users' decisions. Such adaptive requests make the records in a database interdependent, causing a retrained model to contain influences of a record even when the record is no longer in the training set. We demonstrate, on a certified unlearning mechanism, that if an adversary is allowed to design an adaptive requester that interactively generates database edit requests as a function of published models, she can re-encode a target record in the curator's database before its deletion. We argue that under adaptive requests, measuring data deletion via indistinguishability to retraining (as proposed by Gupta et al. (2021)) is fundamentally flawed because it does not capture the influence a record might have previously had on the rest of the database. Our example shows a clear violation of the RTBF: even after retraining on the database with the original record removed, a model can reveal substantial information about the deleted record due to the possibility of re-encodings.

Is an unlearning guarantee a sound and complete measure of data deletion when requests are non-adaptive? Again, we argue that it is neither. A sound data-deletion guarantee must ensure the non-recovery of deleted records from an unbounded number of model releases after deletion. However, approximate indistinguishability to retraining only implies an inability to accurately recover deleted data from a single unlearned model, which we argue is not sufficient. We show that certain algorithms can satisfy an unlearning guarantee yet eventually reveal the deleted data blatantly over multiple releases. The vulnerability arises in algorithms that maintain partial computations in internal data structures for speeding up subsequent deletions.
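The re-encoding danger can be illustrated with a deliberately trivial toy (our own construction, not the paper's attack): the "model" is just the mean of the database, retraining is exact, and yet an adaptive requester that reads one published release can plant a copy of the target record, so the post-deletion retrained model still reveals it.

```python
# Toy illustration: adaptive requests can re-encode a record before its
# deletion, so exact retraining does not erase the record's influence.
# All names and the mean-"model" are illustrative assumptions.

def retrain(db):
    """Exact retraining from scratch: publish the mean of the database."""
    return sum(db) / len(db) if db else 0.0

target = 42.0                       # the record that will later be deleted
db = [target, 1.0, 2.0, 3.0]

published = retrain(db)             # adversary observes this model release

# Adaptive requester: insert a record computed from the published model,
# reconstructing (re-encoding) the target's value inside the database.
reencoded = len(db) * published - sum(x for x in db if x != target)
db.append(reencoded)                # reencoded == 42.0 == target

# The target now exercises the RTBF; the curator retrains from scratch.
db.remove(target)                   # removes the original copy only
after_deletion = retrain(db)        # still depends on the deleted value
```

Here `after_deletion` equals the mean of `[1.0, 2.0, 3.0, 42.0]`, so the deleted record's value is trivially recoverable from the retrained model even though the original record is gone, which is exactly the RTBF violation described above.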
These internal states can retain information even after record deletion and influence multiple future releases, making the myopic unlearning guarantee unreliable in an online setting. Several unlearning algorithms proposed in the literature (Ginart et al., 2019; Neel et al., 2021) are stateful (i.e., they rely on internal states) and, therefore, cannot be trusted.

Secondly, unlearning is an incomplete notion of data deletion, as it excludes valid data-deletion mechanisms that do not imitate retraining. For instance, a (useless) mechanism that outputs a fixed untrained model on any request is a valid deletion algorithm. However, since its output is easily distinguishable from retraining, it fails to satisfy any meaningful unlearning guarantee.

This paper proposes a sound definition of data deletion that does not suffer from the aforementioned shortcomings. According to our notion, a data-deletion mechanism is reliable if (A) it is stateless (i.e., it maintains no internal data structures), and (B) it generates models that are indistinguishable from some random variable that is independent of the deleted records. Statelessness thwarts the danger of sustained information leakage through internal data structures after deletion. Moreover, by defining data deletion as indistinguishability from any deleted-record-independent random variable, as opposed to the output of retraining, we ensure reliability in the presence of adaptive requests that create dependence between current and deleted records in the database. In general, we show that data-deletion mechanisms must be differentially private with respect to the remaining records to be reliable when requests are adaptive. DP also protects against membership inference attacks that extract deleted records by comparing models before and after deletion (Chen et al., 2021).
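Condition (B) can be sketched in symbols (again our paraphrase; M for the mechanism's published output, u for the deleted records, and the (ε, δ) parameters are notational assumptions): for every request sequence there must exist a random variable Z whose distribution does not depend on u such that, for all measurable sets S of models,

```latex
\Pr[\,M \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,Z \in S\,] + \delta
\qquad\text{and}\qquad
\Pr[\,Z \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M \in S\,] + \delta .
```

The key contrast with the unlearning guarantee is that Z is any u-independent random variable rather than the output of retraining, which may itself depend on u through adaptive re-encodings.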
We emphasize that we are not advocating for doing data deletion through differentially private mechanisms simply because DP caps the information content of all records equally, deleted or otherwise. Instead, a data-deletion mechanism should provide two differing information-reattainment bounds: one for records currently in the database, in the form of a differential privacy guarantee, and the other for records previously deleted, in the form of a data-deletion guarantee. We also prove a reduction theorem: if a mechanism is differentially private with respect to the remaining records and satisfies a data-deletion guarantee under non-adaptive edit requests, then it also satisfies a data-deletion guarantee under adaptive requests.

Based on this reduction, we redefine the problem of data deletion as designing a mechanism that (1) satisfies a data-deletion guarantee against non-adaptive deletion requests, (2) is differentially private for remaining records, and (3) has the same utility guarantee as retraining under identical differential privacy constraints. We judge the usefulness of a data-deletion mechanism based on its computational savings over retraining.

For our refined problem formulation, we provide a data-deletion solution based on Noisy Gradient Descent (Noisy-GD), a popular differentially private learning algorithm (Bassily et al., 2014; Abadi et al., 2016; Chourasia et al., 2021). Our solution demonstrates a powerful synergy between data deletion and differential privacy: the same noise needed for the privacy of records in the database also rapidly erases information regarding records deleted from the database. We provide a data-deletion guarantee for Noisy-GD in terms of a Rényi divergence (Rényi, 1961) bound, which implies (ε, δ)-indistinguishability (Mironov, 2017).
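A minimal Noisy-GD sketch follows. This is our illustrative rendering of the generic template (full-batch gradient steps with per-example clipping and Gaussian noise), not the paper's exact algorithm; all hyperparameter names and defaults are assumptions. The added noise is the quantity that simultaneously buys DP for current records and, per the argument above, washes out the influence of deleted ones over successive steps.

```python
import numpy as np

def noisy_gd(X, y, steps=300, eta=0.2, sigma=0.02, clip=2.0, rng=None):
    """Noisy gradient descent on squared loss (illustrative sketch).

    Each step averages per-example gradients, clipped to norm `clip` to
    bound sensitivity, then adds isotropic Gaussian noise of scale `sigma`.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residual = X @ theta - y
        grads = residual[:, None] * X                      # shape (n, d)
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)      # clip per example
        g = grads.mean(axis=0)
        theta = theta - eta * (g + sigma * rng.standard_normal(d))
    return theta
```

On well-conditioned synthetic data this recovers the underlying linear model up to noise-floor error; the noise scale `sigma` trades utility against the strength of the privacy and deletion guarantees.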
For convex and smooth losses, we certify that under a (q, ε_dd)-Rényi data-deletion and a (q, ε_dp)-Rényi DP constraint, our Noisy-GD based deletion mechanism for d-dimensional models over n-sized databases, under adaptive edit requests that modify no more than r records, can maintain an optimal excess empirical risk of order O(qd/(ε_dp n²)) while saving Ω(n log(min{n/r, nε_dd/(qd)})) computations in gradient complexity. Our utility guarantee matches the known lower bound on private empirical risk minimization under the same privacy budget (Bassily et al., 2014). We also provide a data-deletion guarantee in the non-convex setting under the assumption that the loss function is bounded and smooth, and show a computational saving of Ω(dn log(n/r)) in gradient complexity while maintaining an excess risk of Õ(qd/(ε_dp n²) + (1/n)√(q/ε_dp)).
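To make the Rényi-divergence bookkeeping concrete, the snippet below evaluates the standard closed form for the order-q Rényi divergence between two isotropic Gaussians with equal variance, R_q(N(μ0, s²I) ‖ N(μ1, s²I)) = q‖μ0 − μ1‖²/(2s²). This is a textbook identity, shown only to illustrate how the injected Gaussian noise controls bounds of this shape; the function name and interface are our own.

```python
import numpy as np

def renyi_gaussian(mu0, mu1, s, q):
    """Order-q Renyi divergence between N(mu0, s^2 I) and N(mu1, s^2 I).

    Closed form: q * ||mu0 - mu1||^2 / (2 * s^2). Larger noise scale s
    drives the divergence (and hence the leakage bound) toward zero.
    """
    diff = np.asarray(mu0, dtype=float) - np.asarray(mu1, dtype=float)
    return q * float(diff @ diff) / (2.0 * s ** 2)
```

For example, `renyi_gaussian([1.0, 0.0], [0.0, 0.0], s=1.0, q=2)` evaluates to 1.0, and doubling the noise scale s cuts the divergence by a factor of four, which is the quantitative sense in which DP noise "pays twice", for privacy of remaining records and for erasure of deleted ones.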

