FROM ADAPTIVE QUERY RELEASE TO MACHINE UNLEARNING

Abstract

We formalize the problem of machine unlearning as the design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho > 0$, our results yield an unlearning algorithm with excess population risk of $O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\rho}\big)$ and unlearning query (gradient) complexity $O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $O\big(\frac{1}{\sqrt{n}} + \big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $O\big(\frac{1}{\sqrt{n}} + \frac{1}{(n\rho)^{2/3}}\big)$ and $O\big(\frac{1}{\sqrt{n}} + \frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to dynamic streams consisting of insertions and deletions.

1. INTRODUCTION

The problem of machine unlearning is concerned with updating trained machine learning models upon requests for deletions from the training dataset. This problem has recently gained attention owing to various data privacy laws, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) among others, which empower users to make such requests to the entity possessing user data. The entity is then required to update the state of the system so that it is indistinguishable from the state that would have resulted had the user data been absent to begin with. While there is, as of now, no universally accepted definition of indistinguishability as the unlearning criterion, in this work we consider the strictest definition, called exact unlearning (see Definition 1 for a formal definition).

Motivating Example:

The main objective of our work is to identify algorithmic design principles for unlearning that make it more efficient than retraining, the naive baseline method. Towards this, we first discuss the example of unlearning for the Gradient Descent (GD) method, which highlights the key challenges and foreshadows the formal setup and techniques. GD and its variants are extremely popular optimization methods with numerous applications in machine learning and beyond. In a machine learning context, GD is typically used to minimize the training loss $\hat{L}(w; S) = \frac{1}{n}\sum_{i=1}^{n} \ell(w; z_i)$, where $S = \{z_i\}_{i=1}^{n}$ is the training dataset and $w$ the model. Starting from an initial model $w_1$, in each iteration the model is updated as $w_{t+1} = w_t - \eta \nabla \hat{L}(w_t; S) = w_t - \eta \frac{1}{n}\sum_{i=1}^{n} \nabla \ell(w_t; z_i)$. After training, a data point, say $z_n$ without loss of generality, is requested to be unlearnt, so the updated training set is $S' = \{z_i\}_{i=1}^{n-1}$. We now need to apply an efficient unlearning algorithm whose output equals that of running GD on $S'$. Observe that the first iteration of GD is simple enough to be unlearnt efficiently by computing the new gradient $\nabla \hat{L}(w_1; S') = \frac{1}{n-1}\big(n \nabla \hat{L}(w_1; S) - \nabla \ell(w_1; z_n)\big)$ and updating as $w'_2 = w_1 - \eta \nabla \hat{L}(w_1; S')$. However, in the second iteration (and onwards), the gradient is computed at $w'_2$, which can differ from $w_2$; the above adjustment can no longer be applied, and one may need to retrain from this point onwards. This captures the key challenge for unlearning in problems solved by simple iterative procedures such as GD: adaptivity, i.e., the gradients (or, more generally, the queries) computed in later iterations depend on the results of previous iterations. We systematically formalize such procedures and design efficient unlearning algorithms for them. We summarize our key contributions below.
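The first-iteration adjustment above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: we assume a hypothetical quadratic per-example loss $\ell(w; z) = \frac{1}{2}\|w - z\|^2$ (so its gradient is simply $w - z$), and check that subtracting $z_n$'s gradient contribution and renormalizing recovers the gradient on $S'$ exactly.

```python
import numpy as np

def avg_grad(w, data):
    """Average gradient over the dataset (a linear query over the data points)."""
    return np.mean([w - z for z in data], axis=0)  # gradient of 0.5*||w - z||^2

rng = np.random.default_rng(0)
n, d, eta = 5, 3, 0.1
S = [rng.standard_normal(d) for _ in range(n)]
w1 = np.zeros(d)

# Unlearn z_n after one GD step: remove its contribution from the stored
# average gradient and renormalize, instead of recomputing over S' from scratch.
g_S = avg_grad(w1, S)
g_adjusted = (n * g_S - (w1 - S[-1])) / (n - 1)

# Sanity check: the adjustment matches the gradient recomputed on S'.
g_Sprime = avg_grad(w1, S[:-1])
assert np.allclose(g_adjusted, g_Sprime)

w2_prime = w1 - eta * g_adjusted  # equals one GD step on S'
```

From the second iteration on, this trick breaks down, precisely because the next query is evaluated at `w2_prime` rather than the original `w2`.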

1.1. OUR RESULTS AND TECHNIQUES

Learning/Unlearning as Query Release: Iterative procedures are an integral constituent of the algorithmic toolkit for solving machine learning problems and beyond. As in the case of GD above, these often consist of a sequence of simple but adaptive computations. The simple computations are often efficiently undoable (as in the first iteration of GD), but their adaptive nature (a change in the result of one iteration alters the trajectory of the algorithm) makes it difficult to undo computation, or unlearn, efficiently. As opposed to designing unlearning (and learning) algorithms for specific (machine learning) problems, we study the design of unlearning algorithms corresponding to (a class of) learning algorithms. We formalize this by considering learning algorithms which perform adaptive query release on datasets. Specifically, this consists of a selection of adaptive queries from structured classes like linear and prefix-sum queries (see Section 3 for details). The above example of GD is an instance of a linear query, since the query, the average gradient $\frac{1}{n}\sum_{i=1}^{n}\nabla\ell(w_t; z_i)$, is linear in the data points. With this view, we study how to design efficient unlearning algorithms for such methods. We measure efficiency by the number of queries made (query complexity), ignoring the use of other resources, e.g., space, computation for selection of queries, etc. To elaborate on why this is interesting, note first that this does not make the problem trivial: even with unlimited access to other resources, it is still challenging to design an unlearning algorithm with query complexity smaller than that of retraining (the naive baseline). Secondly, let us revisit the motivation from solving optimization problems. The standard model to measure computation in optimization is the number of gradient queries a method makes for a target accuracy, often abstracted in an oracle-based setup (Nemirovskij & Yudin, 1983).
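The adaptive-query-release viewpoint can be made concrete with a toy abstraction: the learner touches the dataset only through linear queries, each chosen based on previous answers. The function names here (`linear_query`, `run_gd_as_query_release`) and the quadratic per-example loss are illustrative assumptions, not the paper's formalism.

```python
import numpy as np

def linear_query(q, data):
    """Answer a linear query: the average of q over the data points."""
    return np.mean([q(z) for z in data], axis=0)

def run_gd_as_query_release(data, steps=3, eta=0.1, d=2):
    """GD rewritten so that all data access goes through linear queries."""
    w = np.zeros(d)
    for _ in range(steps):
        # The query depends on the current w, i.e., on all previous
        # answers -- this is exactly the adaptivity discussed above.
        q = lambda z, w=w: w - z  # gradient of 0.5*||w - z||^2
        w = w - eta * linear_query(q, data)
    return w

rng = np.random.default_rng(1)
data = [rng.standard_normal(2) for _ in range(10)]
w = run_gd_as_query_release(data)
```

Query complexity in this abstraction is simply the number of `linear_query` calls, mirroring the gradient-oracle model from optimization.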
Importantly, this setup imposes no constraints on other resources, yet it witnesses the optimality of well-known simple procedures like (variants of) GD. We follow this paradigm, and as applications of our results to stochastic convex optimization (SCO), we make progress on the fundamental question of understanding the gradient complexity of unlearning in SCO. Interestingly, our proposed unlearning procedures are simple enough that the improvement over retraining in terms of query complexity holds even when accounting for the (arithmetic) complexity of all other operations in the learning and unlearning methods.

Linear queries: The simplest query class we consider is that of linear queries (details deferred to Appendix B). Herein, we show that the prior work of Ullah et al. (2021), which focused on unlearning in SCO and was limited to the stochastic gradient method, can easily be extended to general linear queries. This simple observation yields unlearning algorithms for federated optimization, k-means clustering, etc. We give a $\rho$-TV-stable (see Definition 2) learning procedure with $T$ adaptive queries and a corresponding unlearning procedure with an $O(\sqrt{T}\rho)$ relative unlearning complexity (the ratio of unlearning and retraining complexity; see Definition 4).

Prefix-sum queries: Our main contribution is the case of the class of prefix-sum queries. These are a sub-class of interval queries, which have been extensively studied in differential privacy and are classically answered by the binary tree mechanism (Dwork et al., 2010). We note in passing that for differential privacy, the purpose of the tree is to enable tight privacy accounting, and no explicit tree may be maintained. In contrast, for unlearning, we show that maintaining the binary tree data structure aids efficient unlearning.
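To see why an explicit tree helps, consider the classical textbook structure behind prefix sums: with a Fenwick (binary indexed) tree, each prefix sum is assembled from $O(\log T)$ stored partial sums, and changing one element updates only $O(\log T)$ nodes. The sketch below is this generic data structure, not the paper's unlearning algorithm.

```python
class FenwickTree:
    def __init__(self, size):
        self.size = size
        self.tree = [0.0] * (size + 1)  # 1-indexed partial sums

    def add(self, i, delta):
        """Add `delta` to element i (1-indexed); touches O(log T) nodes."""
        while i <= self.size:
            self.tree[i] += delta
            i += i & (-i)  # jump to the next node covering index i

    def prefix_sum(self, i):
        """Sum of elements 1..i, assembled from O(log T) nodes."""
        s = 0.0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

T = 8
ft = FenwickTree(T)
data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
for i, x in enumerate(data, start=1):
    ft.add(i, x)

# Deleting one element amounts to subtracting its contribution on the
# O(log T) nodes along its path, rather than rebuilding all prefix sums.
ft.add(4, -data[3])
assert ft.prefix_sum(8) == sum(data) - data[3]
```

This locality, where one element's contribution lives in only logarithmically many nodes, is the structural property that makes the binary tree attractive for unlearning.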
We give a binary-tree-based $\rho$-TV-stable learning procedure and a corresponding unlearning procedure with an $O(\rho)$ relative unlearning complexity.

Unlearning in Stochastic Convex Optimization (SCO): Our primary motivation for considering prefix-sum queries is their application to unlearning in SCO (see Section 2 for preliminaries).



1. Smooth SCO: The problem of unlearning in smooth SCO was studied in Ullah et al. (2021) which proposed algorithms with excess population risk of O 1

