THE BRAINY STUDENT: SCALABLE DEEP UNLEARNING BY SELECTIVELY DISOBEYING THE TEACHER

Abstract

Deep machine unlearning is the problem of removing the influence of a cohort of data from the weights of a trained deep model. This challenge has enjoyed increasing attention recently, motivated by the widespread use of neural networks in applications involving user data: allowing users to exercise their 'right to be forgotten' necessitates an effective unlearning algorithm. Deleting data from models is also of practical interest for removing out-of-date examples, outliers or noisy labels. However, most previous unlearning methods consider simple scenarios where a theoretical treatment is possible. Consequently, not only do their guarantees not apply to deep neural networks, but they also scale poorly. In this paper, drawing inspiration from teacher-student methods, we propose a scalable deep unlearning method that breaks free of these limiting assumptions. Our thorough empirical investigation reveals that our approach is by far the most consistent in achieving unlearning across a wide range of scenarios, while incurring minimal, if any, performance degradation and being significantly more scalable than previous methods.

1. INTRODUCTION

Can we make a deep learning model 'forget' a subset of its training data? Aside from being scientifically interesting, achieving this goal of Deep Machine Unlearning is increasingly important and relevant from a practical perspective. Regulations such as the EU's General Data Protection Regulation (Mantelero, 2013) stipulate that individuals can request to have their data 'deleted' (the 'right to be forgotten'). Nowadays, given the ubiquity of deep learning systems in a wide range of applications, including computer vision, natural language processing, speech recognition and healthcare, allowing individuals to exercise this right necessitates deep unlearning algorithms. Further, beyond privacy considerations, this type of data deletion may also be desirable for several practical applications: removing out-of-date examples, outliers, poisoned samples (Jagielski et al., 2018), noisy labels (Northcutt et al., 2021), or data that may carry harmful biases (Fabbrizzi et al., 2022).

However, truly removing the influence of a subset of the training set from the weights of a trained model is hard, since deep models memorize information about specific instances (Zhang et al., 2020; 2021; Arpit et al., 2017) and their highly non-convex nature makes it difficult to trace the effect of each example on the model's weights. Of course, we can apply the naive solution of re-training the model from scratch without the cohort of data to be forgotten. This conceptually simple procedure indeed guarantees that the weights of the resulting model are not influenced by the instances to forget (it performs 'exact unlearning'), but the obvious drawback is computational inefficiency: re-training a deep learning model to accommodate each new forgetting request is not viable in practice. To mitigate the large computational cost of exact unlearning, recent research has turned to approximate unlearning (Izzo et al., 2021; Golatkar et al., 2020a;b). The goal, as defined in these works, is to modify the weights of the trained model in order to produce a new set of weights that approximates those that would have been obtained by the exact procedure of re-training from scratch. That is, they strive to achieve 'indistinguishability' between the models produced by the exact and approximate solutions, typically accompanied by theoretical guarantees for the quality of that approximation.

Unfortunately, though, most previous approaches suffer from the following important problems. Firstly, their guarantees are derived in simple scenarios of convex loss functions, where theoretical analysis is feasible, thus excluding the arguably most relevant application of unlearning: deep neural networks. Secondly, in light of recent evidence, model indistinguishability may be neither a necessary nor a sufficient condition for successful unlearning (Thudi et al., 2022; Goel et al., 2022), putting into question the soundness of these guarantees even in the cases where they do apply. Finally, previous approaches suffer from poor scalability, both to the size of the training dataset (and, consequently, the size of the models) and to the size of the cohort that will be forgotten (Goel et al., 2022). The desire to scale to larger training datasets and model sizes needs little motivation. Indeed, this scaling has been a large contributing factor to the continuing progress of deep learning (Kaplan et al., 2020), creating an increasing need to also scale deep unlearning algorithms accordingly.
In addition, scalability to the size of the forget set is very important since, as Goel et al. (2022) also point out, even a single user might own many samples from the training set that should be deleted, or several users may submit deletion requests in 'bursts' after certain events, such as revelations of privacy leakages by an organization. More generally, applications such as removing out-of-date or poisoned data, noisy labels, or data that carries harmful biases come with no assumptions on the size of the deletion set. For instance, an organization may wish to delete all data that is older than a prescribed retention period. In all these cases, the forget sets can be very large. Motivated by the above desiderata, we propose an approach that breaks free from the limiting assumptions made in previous work and is scalable both to training set and forget set sizes. To that end, drawing inspiration from the teacher-student methodology, we propose an unlearning approach which we dub SCalable Remembering and Unlearning with a Brainy Student (SCRUBS). Concretely, we train a 'brainy' student model that is smart enough to disobey the teacher in order to avoid inheriting information about the forget cohort, while obeying it otherwise, so as to distill all useful information that it is allowed to keep. Our thorough empirical investigation reveals that our approach is by far the most consistent in achieving unlearning in a wide range of scenarios, while incurring only minimal performance degradation, if any, and being significantly more scalable than previous methods.
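To make the teacher-student intuition concrete, below is a minimal PyTorch sketch of one possible instantiation (not the exact SCRUBS objective, which is specified later in the paper): the frozen original model acts as the teacher, and the student minimizes a KL divergence to the teacher's outputs on retain-set batches while maximizing it on forget-set batches. The function names, the plain KL-divergence distillation term and the alpha weighting are our own illustrative assumptions.

import torch
import torch.nn.functional as F

def kl_to_teacher(student_logits, teacher_logits, T=1.0):
    # KL divergence between the student's and the (frozen) teacher's output distributions.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    )

def unlearning_step(student, teacher, retain_batch, forget_batch, optimizer, alpha=1.0):
    # One 'selective obedience' update (illustrative): follow the teacher on retained
    # data, move away from it on the forget cohort.
    x_r, _ = retain_batch
    x_f, _ = forget_batch
    with torch.no_grad():  # the teacher (original model) is kept frozen throughout
        t_r = teacher(x_r)
        t_f = teacher(x_f)
    loss = kl_to_teacher(student(x_r), t_r) - alpha * kl_to_teacher(student(x_f), t_f)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch, alpha trades off how aggressively the student departs from the teacher on the forget set against how faithfully it preserves the teacher's behavior on retained data.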

2. PROBLEM DEFINITION

Notation and preliminaries Let $D = \{x_i, y_i\}_{i=1}^{N}$ denote a training dataset containing $N$ examples, where the $i$'th example is represented by a vector of input features $x_i$ and a corresponding class label $y_i$. Let $f(\cdot; w)$ denote a function parameterized by trainable weights $w$. In this work, we study the case where $f$ is represented by a deep neural network, trained in a supervised manner via empirical risk minimization. Specifically, we define the loss function as

$$\mathcal{L}(w) = \frac{1}{N} \sum_{i=1}^{N} \ell(f(x_i; w), y_i) \quad (1)$$

where $\ell$ denotes the cross-entropy loss.
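As a concrete reading of Equation 1, a minimal PyTorch sketch of the averaged cross-entropy objective; the helper name `empirical_risk` is ours, not from the paper.

import torch.nn.functional as F

def empirical_risk(model, inputs, labels):
    # L(w) = (1/N) * sum_i l(f(x_i; w), y_i), with l the cross-entropy loss.
    logits = model(inputs)                  # f(x_i; w) for a batch of N examples
    return F.cross_entropy(logits, labels)  # averages the per-example losses by default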

Deep Machine Unlearning

We now formalize the problem of deep machine unlearning. We assume that we are given a model $f(\cdot; w_o)$ that has been trained on $D$, where $f$ denotes a neural network and $w_o$ its trained parameters, obtained by minimizing Equation 1. We will refer to this as the 'original model' (i.e. before any unlearning intervention is performed). We are also given a 'forget set'

$$D_f = \{x_i, y_i\}_{i=1}^{N_f} \subset D,$$

comprised of $N_f$ examples from $D$, and a 'retain set' $D_r$ of $N_r$ training examples that the model is still allowed to retain information for. For simplicity, we consider the standard scenario where $D_r$ is the complement of $D_f$ or, in other words, $D_f \cup D_r = D$. Given $f(\cdot; w_o)$, $D_f$ and $D_r$, the goal of deep machine unlearning is to design a scalable and efficient algorithm for producing a new set of weights $w_u$ that contains as little information as possible about $D_f$ while not sacrificing performance. That is, $f(\cdot; w_u)$ and $f(\cdot; w_o)$ should perform equally well on the retain set, as well as on held-out test data outside of $D$. Formally, we want the following two equations to hold:

$$I(p(f(D_f; w_u)); p(f(D_f; w_o))) = 0 \quad (2)$$
$$I(p(f(D_r; w_u)); p(f(D_r; w_o))) = 1 \quad (3)$$

where $I(p; q)$ stands for the mutual information between the probability distributions $p$ and $q$, and $p(f(D; w))$ is the distribution over outputs produced by $f$ with parameters $w$ on the dataset $D$.
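To fix ideas, a small illustrative sketch of the forget/retain split and of the naive exact-unlearning baseline from the introduction, which approximate methods aim to match at a fraction of the cost. The names `train_set`, `forget_indices` and `train_from_scratch` are hypothetical placeholders, not part of the paper.

from torch.utils.data import Subset

def split_forget_retain(dataset, forget_indices):
    # Partition D into the forget set D_f and its complement, the retain set D_r.
    forget_indices = set(forget_indices)
    retain_indices = [i for i in range(len(dataset)) if i not in forget_indices]
    return Subset(dataset, sorted(forget_indices)), Subset(dataset, retain_indices)

# Exact unlearning baseline: retrain from scratch on D_r only. The resulting weights
# are guaranteed to be uninfluenced by D_f, but retraining for every deletion request
# is impractical, which is what motivates approximate unlearning.
# D_f, D_r = split_forget_retain(train_set, forget_indices)  # placeholders for illustration
# w_u = train_from_scratch(D_r)                              # placeholder training routine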

