FROM ADAPTIVE QUERY RELEASE TO MACHINE UNLEARNING

Abstract

We formalize the problem of machine unlearning as the design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular, stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho > 0$, our results yield an unlearning algorithm with excess population risk of $O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $O\big(\frac{1}{\sqrt{n}} + \big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $O\big(\frac{1}{\sqrt{n}} + \frac{1}{(n\rho)^{2/3}}\big)$ and $O\big(\frac{1}{\sqrt{n}} + \frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to dynamic streams consisting of insertions and deletions.

1. INTRODUCTION

The problem of machine unlearning is concerned with updating trained machine learning models upon requests for deletion of data from the training dataset. This problem has recently gained attention owing to various data privacy laws, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), which empower users to make such requests to the entity possessing their data. The entity is then required to update the state of the system so that it is indistinguishable from the state had the user's data been absent to begin with. While, as of now, there is no universally accepted definition of indistinguishability as the unlearning criterion, in this work we consider the strictest definition, called exact unlearning (see Definition 1 for a formal definition).

Motivating Example:

The main objective of our work is to identify algorithmic design principles for unlearning that make it more efficient than retraining, the naive baseline method. Towards this, we first discuss the example of unlearning for the Gradient Descent (GD) method, which will highlight the key challenges as well as foreshadow the formal setup and techniques. GD and its variants are extremely popular optimization methods with numerous applications in machine learning and beyond. In a machine learning context, GD is typically used to minimize the training loss $\widehat{L}(w; S) = \frac{1}{n}\sum_{i=1}^{n} \ell(w; z_i)$, where $S = \{z_i\}_{i=1}^{n}$ is the training dataset and $w$ the model. Starting from an initial model $w_1$, in each iteration the model is updated as $w_{t+1} = w_t - \eta \nabla \widehat{L}(w_t; S) = w_t - \eta \frac{1}{n}\sum_{i=1}^{n} \nabla \ell(w_t; z_i)$. After training, a data point, say $z_n$ without loss of generality, is requested to be unlearnt, and so the updated training set is $S' = \{z_i\}_{i=1}^{n-1}$. We now need to apply an efficient unlearning algorithm such that its output is equal to that of running GD on $S'$. Observe that the first iteration of GD is simple enough to be unlearnt efficiently by computing the new gradient $\nabla \widehat{L}(w_1; S') = \frac{1}{n-1}\big(n \nabla \widehat{L}(w_1; S) - \nabla \ell(w_1; z_n)\big)$ and updating as $w'_2 = w_1 - \eta \nabla \widehat{L}(w_1; S')$. However, in the second iteration (and
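The first-iteration unlearning update above can be sketched numerically. The snippet below is a minimal illustration, not the paper's algorithm: it assumes a hypothetical scalar quadratic loss $\ell(w; z) = \frac{1}{2}(w - z)^2$ (so $\nabla \ell(w; z) = w - z$) and checks that recombining the stored full-data gradient reproduces the first GD step that retraining on $S'$ would take.

```python
import numpy as np

# Hypothetical quadratic loss: l(w; z) = 0.5 * (w - z)^2, so grad l(w; z) = w - z.
def grad_loss(w, z):
    return w - z

def gd_step(w, S, eta):
    # One GD step on the average training loss over dataset S.
    return w - eta * np.mean([grad_loss(w, z) for z in S])

rng = np.random.default_rng(0)
S = rng.normal(size=10)          # training set of n = 10 points
S_prime = S[:-1]                 # dataset S' after deleting z_n
eta, w1 = 0.1, 0.0               # step size and common initial model

n = len(S)
# Unlearn the first iteration by recombining the full-data gradient:
# grad L(w1; S') = (n * grad L(w1; S) - grad l(w1; z_n)) / (n - 1)
g_full = np.mean([grad_loss(w1, z) for z in S])
g_unlearned = (n * g_full - grad_loss(w1, S[-1])) / (n - 1)
w2_unlearned = w1 - eta * g_unlearned

# Retraining baseline: run the first GD step from scratch on S'.
w2_retrained = gd_step(w1, S_prime, eta)

print(np.isclose(w2_unlearned, w2_retrained))  # prints True
```

The recombination costs a single per-sample gradient rather than a full pass over $S'$, which is exactly why the first iteration is cheap to unlearn; the difficulty discussed next arises because later iterates depend on the deleted point through the trajectory.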

