MEASURING FORGETTING OF MEMORIZED TRAINING EXAMPLES

Abstract

Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets (for instance, examples used to pre-train a model) may enjoy privacy benefits at the expense of examples seen later.

1. INTRODUCTION

Machine learning models are capable of memorizing information contained in their training data (Shokri et al., 2017; Carlini et al., 2019). This is one of the reasons why models are vulnerable to privacy attacks such as membership inference (Homer et al., 2008) and training data extraction (Carlini et al., 2019). Resulting privacy concerns have led to a variety of techniques for private machine learning, including differentially private training (Abadi et al., 2016; Papernot et al., 2016; 2018b; Ghazi et al., 2021; Malek et al., 2021), machine unlearning (Bourtoule et al., 2021; Neel et al., 2021; Sekhari et al., 2021), and various heuristics like regularization (Nasr et al., 2018), data augmentation (Amid et al., 2022), or gradient clipping (Carlini et al., 2019; Thakkar et al., 2020; Huang et al., 2022). These techniques all modify the learning procedure to actively limit privacy leakage, including leakage that results from memorization. Instead, we observe that the training dynamics inherent to learning algorithms such as stochastic gradient descent may passively afford some forms of privacy. Such dynamics include forgetting: during iterative training, as models see new training examples, they may lose track of the specifics of earlier examples, as seen in work on catastrophic forgetting (French, 1999; McCloskey & Cohen, 1989; Kemker et al., 2018).

In this paper, we study to what extent the forgetting exhibited by machine learning models has an impact on privacy. Our work is focused on distinguishing between two overarching hypotheses for how privacy interacts with forgetting. The pessimistic hypothesis for privacy is that memorization is a stronger effect than forgetting: traces of early training examples remain detectable by attacks long after the examples are seen in training, perhaps due to their influence on the initial decisions made during optimization.
The optimistic hypothesis is that forgetting is stronger: early examples will be forgotten due to the many subsequent updates to the model as training progresses.

Studying the impact of forgetting on privacy is most relevant when there is large variation in how recently an example may have been seen during training, allowing earlier examples to experience improved privacy relative to later examples. Indeed, models are increasingly trained on extremely large training sets, so that training consists of only a few epochs (or even a single one). Such settings are used when training large image models (Dai et al., 2021; Mahajan et al., 2018), multimodal models (Radford et al., 2021), and language models (Komatsuzaki, 2019; Chowdhery et al., 2022; Hoffmann et al., 2022; Zhang et al., 2022), which have come under significant scrutiny due to privacy concerns (Carlini et al., 2021b; Bender et al., 2021). Similarly, when a model is fine-tuned, the data that was originally used to pretrain the model is no longer seen in the second stage of training. Fine-tuning is a ubiquitous technique in many domains, especially in language (Devlin et al., 2018), speech (Chung et al., 2021), and vision (Radford et al., 2021; Kornblith et al., 2019) tasks.

We design a methodology for measuring whether, and how quickly, individual examples are forgotten and become less vulnerable to privacy attacks. Our methodology builds on state-of-the-art membership inference attacks (Carlini et al., 2021a; Ye et al., 2021), the best known method for testing whether a given point was used in training. We use our methodology to show that, for deep neural networks trained on language, speech, or vision tasks, examples used early in training (and not repeated later on) are indeed forgotten by the model. We identify a number of factors which impact the speed at which forgetting happens, such as when examples appear in training, or whether they are duplicated.
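To make this kind of measurement concrete, the following is a minimal sketch of a simple loss-thresholding membership inference attack, not the stronger likelihood-ratio attacks (Carlini et al., 2021a) the methodology actually builds on; the function name and the synthetic loss distributions are our own illustration. Each example is scored by its loss, and the attack's ROC AUC summarizes how distinguishable members are from held-out non-members:

```python
import numpy as np

def mia_auc(member_losses, nonmember_losses):
    """ROC AUC of a loss-thresholding membership inference attack.

    Each example is scored by its negative loss (members tend to have
    lower loss). AUC near 1.0 means members are easily distinguished;
    AUC near 0.5 means the attack does no better than chance.
    """
    scores = np.concatenate([-np.asarray(member_losses),
                             -np.asarray(nonmember_losses)])
    labels = np.concatenate([np.ones(len(member_losses)),
                             np.zeros(len(nonmember_losses))])
    # ROC AUC via the Mann-Whitney U statistic on score ranks.
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Synthetic illustration: recently seen members have lower loss, so the
# attack separates them well; "forgotten" members look like non-members.
rng = np.random.default_rng(0)
recent = rng.normal(0.4, 0.1, 1000)      # losses of recently seen members
forgotten = rng.normal(2.0, 0.3, 1000)   # losses of long-unseen members
nonmembers = rng.normal(2.0, 0.3, 1000)  # losses of held-out examples
print(mia_auc(recent, nonmembers))       # high AUC: still memorized
print(mia_auc(forgotten, nonmembers))    # near 0.5: effectively forgotten
```

In this picture, "forgetting" corresponds to the attack's AUC on early-seen examples drifting toward 0.5 as training continues without repeating them.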
We then attempt to understand why forgetting happens by showcasing two settings where forgetting does not happen in the worst case: non-convex models, such as k-means, and deterministic training algorithms with a large amount of adversary knowledge. Our result on k-means is the first instance of privacy leakage due to non-convexity that we are aware of. However, on a mean estimation task, we prove that forgetting does happen as a result of the stochasticity of gradient descent, with similar properties to our empirical results. By using our approach to measuring forgetting, we hope experts training models on large datasets or fine-tuning models can determine whether (and how much) forgetting improves empirical privacy in their training pipelines. We stress that our approach is complementary to frameworks that offer worst-case guarantees, like differential privacy, and that it should not be used in lieu of reasoning about privacy within such frameworks.
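The mean estimation intuition can be illustrated with a toy simulation (our own sketch, not the paper's formal construction): run online gradient descent on a stream of examples twice, with the two runs differing only in the very first example, and track how far apart the two models remain. Under the squared-error objective the gap contracts by a factor of (1 - η) per step, so the first example's influence on the parameters decays geometrically; the paper's actual analysis concerns stochastic gradients, where sampling noise further masks this residual influence.

```python
import numpy as np

def first_example_influence(stream, x0_alt, lr=0.1):
    """Track |theta - theta_alt| between two online-SGD runs for 1-D mean
    estimation that differ only in the first example of the stream."""
    theta = theta_alt = 0.0
    gaps = []
    for t, x in enumerate(stream):
        x_here = x0_alt if t == 0 else x
        theta += lr * (x - theta)               # step on loss (theta - x)^2 / 2
        theta_alt += lr * (x_here - theta_alt)  # same stream, different first example
        gaps.append(abs(theta - theta_alt))
    return gaps

gaps = first_example_influence(np.ones(50), x0_alt=2.0)
# The gap starts at lr * |x0 - x0_alt| and shrinks by (1 - lr) each step,
# so an attacker observing only the final model learns ever less about x0.
```

The geometric decay mirrors the qualitative behavior measured empirically: the longer ago an example was last seen, the weaker its footprint in the final model.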

2.1. DEFINING PRIVACY IN MACHINE LEARNING

There are multiple valid privacy guarantees that have been considered for ML algorithms. First, differential privacy (DP) ensures that the distribution of the output of the algorithm does not significantly change when a single example is changed. In the context of ML, DP can be obtained by modifying either the training algorithm (Chaudhuri et al., 2011; Abadi et al., 2016) or the inference algorithm (Papernot et al., 2016; Bassily et al., 2018; Papernot et al., 2018a). DP provably bounds the success of privacy attacks which leak information about individual training examples (see Section 2.2). More recently, other motivations for privacy have gathered interest. For instance, in machine unlearning (Cao & Yang, 2015; Ginart et al., 2019), a user may issue a "deletion request", after which their individual contribution to the model must be erased. This is different from DP, which requires that the model not learn too much about any of its training examples. Algorithms for machine unlearning have been proposed for k-means (Ginart et al., 2019), empirical risk minimization (Guo et al., 2019; Izzo et al., 2021; Neel et al., 2021; Ullah et al., 2021; Sekhari et al., 2021), and deep learning (Du et al., 2019; Golatkar et al., 2020; 2021; Nguyen et al., 2020; Bourtoule et al., 2021). If a point is perfectly unlearned, privacy attacks cannot succeed on this point.

Both of these definitions of privacy, DP and unlearning, obtain privacy actively: they require that the training algorithm be modified to obtain privacy. Instead, we propose to capture privacy that is gained passively from dynamics that are inherent to training. We define and measure forgetting as a form of privacy that arises from the decay in the extractable information about an individual training point over the course of training.
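For reference, the DP guarantee discussed above can be stated formally (standard formulation, not specific to this work): a randomized training algorithm M satisfies (ε, δ)-DP if, for every pair of datasets D and D' differing in a single example and every set S of possible outputs,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
```

Smaller ε and δ give stronger worst-case guarantees; in particular, this inequality directly bounds the advantage of any membership inference attack, which is why DP provably limits the attacks discussed here.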
Our definition of forgetting is inspired by the widely observed phenomenon of catastrophic forgetting (French, 1999; McCloskey & Cohen, 1989; Kirkpatrick et al., 2017; Kemker et al., 2018; Davidson & Mozer, 2020; Kaushik et al., 2021) , where a model tends to forget previously learned knowledge when training on new data. More specifically, catastrophic forgetting is generally formulated in the continual learning setting where the model sequentially learns a number of different tasks, and the performance on previously learned tasks drops significantly as the model learns a new task. In contrast, our work considers a model trained to solve a single fixed task and measures how it forgets some of its training examples seen earlier in training. Our work is also inspired by Feldman et al.

