KNOWLEDGE UNLEARNING FOR MITIGATING PRIVACY RISKS IN LANGUAGE MODELS

Abstract

Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them with little to no degradation of general language modeling performance for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By comparing with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a strong empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori, while being orders of magnitude more computationally efficient and robust. We release the code and dataset needed to replicate our results at http://www.omitted.link/.

1. INTRODUCTION

Recent work has shown that an adversary can extract training data from Pretrained Language Models (LMs), including Personally Identifiable Information (PII) such as names, phone numbers, and email addresses, as well as licensed code, private clinical notes, and 128-bit UUIDs (Carlini et al., 2021; Lee et al., 2022; Huang et al., 2022; Lehman et al., 2021). In 2021, the AI chatbot Iruda became the first AI system to be sued for violating the Personal Information Protection Act after unintentionally generating the exact home addresses and bank account numbers of actual individuals (Park, 2021). Heikkilä (2022) has also shown that GPT-3 (Brown et al., 2020), one of the most well-known LMs currently in commercial use, offered detailed private information about the Editor-in-Chief of MIT Technology Review, including his family members, work address, and phone number. Considering findings that extracting training data gets easier as LMs scale to larger sizes (Carlini et al., 2022a) and that it is common practice for practitioners to release billion-parameter pretrained LMs for public use (Gao et al., 2020; Black et al., 2021; Zhang et al., 2022), it has become important to provide privacy guarantees for large LMs. Practitioners may be required to delete personal information from LMs at individuals' request, because each individual has the "Right To Be Forgotten" (RTBF) (Mantelero, 2013; Graves et al., 2021) and can limit the direct and indirect commercial use of their personal information (Villaronga et al., 2018). Previous methods addressing privacy risks for language models attempt to remove all private information from the training data (data preprocessing) (Aura et al., 2006; Dernoncourt et al., 2017; Lison et al., 2021; Kandpal et al., 2022) or to design algorithms that ensure differential privacy (DP) (Dwork, 2008; Dwork et al., 2006; Abadi et al., 2016; Anil et al., 2021; Li et al., 2022; Yu et al., 2022).
Both approaches require retraining the underlying LM every time individuals want to exercise their RTBF, which makes them inadequate for large LMs that are extremely costly to retrain. Furthermore, as pointed out by Brown et al. (2022), data preprocessing methods assume private information to be easily identifiable, specified, and removed, and DP algorithms can only guarantee protection for information with clear privacy borders, making both inadequate in real-world scenarios where the standard of privacy may differ for each individual. To this end, we propose knowledge unlearning (Figure 1) as an efficient solution that can be applied with just a few parameter updates instead of pretraining the underlying LM again. We perform experiments on GPT-Neo LMs (125M, 1.3B, 2.7B) (Black et al., 2021) and show that simply reversing the direction of gradient descent during language modeling (which can also be seen as maximizing, instead of minimizing, the loss function) is effective at protecting target sequences from extraction attacks with little to no degradation of initial LM capabilities, measured via 9 common NLP classification benchmarks (Hellaswag (Zellers et al., 2019), Lambada (Paperno et al., 2016), Winogrande (Sakaguchi et al., 2021), COPA (Gordon et al., 2012), ARC-Easy (Clark et al., 2018), ARC-Challenge (Clark et al., 2018), Piqa (Bisk et al., 2020), MathQA (Amini et al., 2019), and PubmedQA (Jin et al., 2019)) and 4 dialogue tasks (Wizard of Wikipedia (Dinan et al., 2019), Empathetic Dialogues (Rashkin et al., 2019), Blended Skill Talk (Smith et al., 2020), and Wizard of Internet (Komeili et al., 2022)). In some cases, knowledge unlearning unexpectedly shows significant improvements in LM performance on some of the benchmarks.
We compare our approach with the data deduplication method (Kandpal et al., 2022) and the differential privacy decoding method (Majmudar et al., 2022), both known to mitigate privacy risks, and show the effectiveness of knowledge unlearning in providing strong privacy protection while being much more efficient and robust. We also provide a general guideline for quantifying the memorization and extraction likelihood of target token sequences and for deciding when we can empirically consider them "forgotten". Specifically, we introduce a novel metric that measures the extraction likelihood by varying the prefix length of the target token sequence and quantifying how much of the suffix is actually extracted from the LM. Surprisingly, we find that it is easier for knowledge unlearning to forget a chunk of instances sequentially than to forget them all at once. We provide further analysis and show that the difficulty of knowledge unlearning depends heavily on the target data being forgotten, especially its domain. We also provide empirical examples of performing extraction attacks and of how exactly knowledge unlearning provides privacy protection for the LM. To summarize, our main contributions are fourfold:

• We compare knowledge unlearning with two approaches from the literature known to mitigate privacy risks: a data preprocessing approach and a Differential Privacy (DP) Decoding approach. Our approach results in little to no degradation of general capabilities (sometimes even improvement) while providing strong privacy protection in situations where individuals exercise their RTBF, whereas the data preprocessing approach provides weaker privacy protection while being orders of magnitude more computationally demanding, and the DP Decoding approach results in a severe degradation of modeling performance.

• We perform additional experiments to determine which factors contribute to the difficulty of knowledge unlearning and find that (1) trying to forget many samples at once results in substantial LM performance degradation, which can be mitigated by sequentially forgetting chunks of data, and that (2) the domain of the target data (code, license, Wikipedia, etc.) plays a critical role in determining how hard the data are to forget.

• We provide a novel metric and a general guideline for quantifying the privacy risks of LMs and for determining when they should be considered to have "forgotten" a given target sequence.

• Knowledge unlearning surprisingly seems to make LMs stronger, with the extreme cases bringing +8.0% (37.6% → 45.6%), +10.1% (57.4% → 67.5%), and +7.9% (62.2% → 70.1%) improvements on Lambada for GPT-NEO 125M, 1.3B, and 2.7B, respectively.

2.1. PRIVACY METHODS FOR LANGUAGE MODELS

Prior work that tries to mitigate privacy risks for LMs can be divided mainly into data pre/post-processing methods and differential privacy methods.

(Data) Pre/Post-Processing. Data preprocessing aims to sanitize the training data: it aims to remove, prior to training, all data that might violate any kind of privacy. These methods mostly utilize measures such as parsers and classification models that identify and predict patterns constituting private information. This is effective for well-formatted private information such as social security numbers or special forms of medical notes (Aura et al., 2006; Dernoncourt et al., 2017; Lison et al., 2021; Kandpal et al., 2022). However, as pointed out by Brown et al. (2022), considering that private information is mostly context-dependent and sometimes in a non-specific format, data preprocessing methods cannot fully claim to provide privacy guarantees, especially guarantees that match each individual's standards. Post-processing methods, such as applying censorship to the LM outputs, face the same limitations. In this work, we compare our proposed method with the data preprocessing approach of Kandpal et al. (2022), which shows that deduplicating the training corpora before pretraining yields LMs with stronger robustness against extraction attacks than an LM pretrained under the same circumstances without deduplication. However, we highlight that this approach, while still effective at mitigating overall privacy risks, is not the most suitable in a realistic scenario where individuals request the removal of their information from the implicit parameters of the LM.

Differential Privacy. Differential Privacy (DP) aims to guarantee that the effect of an individual input on the output of a specific function is bounded (Dwork, 2008; Dwork et al., 2006).
In the context of deep neural networks, DP, which must be applied during the training phase, aims to construct models that provide general guarantees that individual information within the training data cannot be inferred (Abadi et al., 2016). While DP has proven surprisingly effective for fine-tuning LMs (Li et al., 2022; Yu et al., 2022), pretraining LMs with DP still suffers from a substantial performance gap, expensive computation, and slow convergence (Anil et al., 2021). Furthermore, as pointed out by Brown et al. (2022), DP can only provide limited guarantees for LMs because DP requires a unified definition of privacy boundaries, which is inherently impossible for natural language data. Most importantly, in a realistic scenario where individuals may exercise their Right-To-Be-Forgotten (RTBF) dynamically after model deployment, it is nontrivial to apply existing descent-based DP algorithms such as DP-SGD to protect against only targeted extraction attacks.

2.2. MACHINE UNLEARNING

Machine unlearning has received attention as an alternative approach to overcoming data privacy issues in machine learning (Cao & Yang, 2015; Ginart et al., 2019; Bourtoule et al., 2021; Graves et al., 2021). Several studies attempt to explore machine unlearning for deep neural networks (Golatkar et al., 2020; Mehta et al., 2022). However, they mostly propose algorithms for image classification models that aim to forget a whole class, that is, to achieve random performance on specific image classes such as "cats" or "ships". We are, to the best of our knowledge, the first to explore unlearning a specific sequence of tokens for LMs, which is quite a different setup from traditional image classification (tens of image classes vs. a sequence of tokens, each drawn from a vocabulary of roughly 50,000 entries). In this work, we coin this approach knowledge unlearning since we focus on forgetting specific knowledge represented by sequences of tokens.

Zhou et al. (2022) focus on how forgetting can be leveraged to improve the performance of the underlying model. They propose "forget-and-relearn", which unifies existing iterative training algorithms by selectively removing undesirable information and re-learning good features, boosting performance on image classification and multi-agent emergent communication. The underlying assumption is that it is often easier to define and stop unwanted behavior than to teach good behavior. We also observe this phenomenon in Section 4, where we unintentionally find that unlearning just a few sequences of tokens sometimes boosts general LM capabilities.

2.3. MEMORIZATION IN LANGUAGE MODELS

Previous work exploring the extent to which LMs memorize their training data approaches the phenomenon from two different viewpoints. One line of work views memorization of LMs simply as a threat to individual privacy (Carlini et al., 2021; 2022a; Jagielski et al., 2022) and utilizes metrics that quantify how susceptible LMs are to adversarial attacks. These metrics mostly depend on specific types of attacks, such as the membership inference attack (Shokri et al., 2017), and measure the privacy risks of LMs by quantifying the success rate of these attacks. In our work, we instead focus on more targeted extraction attacks. Another line of work quantifies how much knowledge is accumulated and forgotten during pretraining by extracting relational knowledge about the world (Petroni et al., 2019; Lazaridou et al., 2021; Jang et al., 2022b;a). This line of work does not view memorization as a negative trait, but as a positive one that can be leveraged to extract world knowledge from the implicit parameters and perform knowledge-intensive tasks such as question answering or training knowledgeable conversation agents. Our work is closely related to Jagielski et al. (2022), who also assert that forgetting can be a relaxed version of differential privacy. However, there are two main differences between our work and theirs. First, they analyze forgetting only as a passive form of privacy mitigation, asserting that data seen early in large-scale training obtain privacy benefits, whereas we suggest a more active form of forgetting. Second, they only show analysis results on image classification and audio generation models, while we specifically focus on large LMs.

3.1. METHODOLOGY

We propose simply negating the original training objective of minimizing the negative log-likelihood of the token sequences as our main method of knowledge unlearning in LMs. Specifically, given a sequence of tokens x = (x_1, ..., x_T), our unlearning objective is simply maximizing the following loss function:

L_UL(f_θ, x) = − Σ_{t=1}^{T} log(p_θ(x_t | x_{<t}))   (1)

where x_{<t} denotes the token sequence (x_1, ..., x_{t−1}) and p_θ(x_t | x_{<t}) denotes the conditional probability of predicting the next token to be x_t when x_{<t} is given to an LM f with parameters θ. Prior work refers to this training objective as unlikelihood training and combines it with the original negative log-likelihood loss for the final objective of enhancing language modeling quality (Welleck et al., 2020) and few-shot learning for downstream NLP tasks (Tam et al., 2021). In contrast, we optimize only the unlikelihood objective since we are concerned solely with forgetting. While this method seems simple, it is highly effective at forgetting specific token sequences without affecting overall LM capabilities, as shown in Section 4.
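As a minimal sketch in plain Python (with a hypothetical `token_probs` input standing in for an LM forward pass), the objective simply negates the usual negative log-likelihood, so that ordinary gradient descent on its value performs gradient ascent on the language-modeling loss:

```python
import math

def unlearning_loss(token_probs):
    """Unlikelihood objective: L_UL = -sum_t log p(x_t | x_<t), maximized;
    equivalently, we minimize its negation with a standard optimizer.

    `token_probs` is a hypothetical list of the model's conditional
    probabilities p_theta(x_t | x_<t) for each target token; in practice
    these come from a forward pass of the LM.
    """
    nll = -sum(math.log(p) for p in token_probs)  # standard LM loss
    return -nll  # minimizing this ascends the negative log-likelihood
```

In a framework such as PyTorch, this amounts to backpropagating the negated cross-entropy of the target sequence.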

3.2. QUANTIFYING PRIVACY RISKS OF LANGUAGE MODELS

In this subsection, we introduce the two metrics we use to quantify the privacy risks of a specific token sequence and describe when we empirically define the token sequence to be forgotten. In this work, we do not utilize metrics such as membership inference attack recall (Shokri et al., 2017) since we are not interested in quantifying the general privacy risks of LMs, but rather the privacy risks on specific target token sequences.

Extraction Likelihood (EL). We first introduce a new metric, EL. Given a sequence of tokens x = (x_1, ..., x_T) and an LM f with pretrained parameters θ, we define EL as:

EL_n(x) = (1 / (T − n)) Σ_{t=1}^{T−n} OVERLAP_n(f_θ(x_{<t}), x_{≥t})   (2)

OVERLAP_n(a, b) = (1 / |n-grams(a)|) Σ_{c ∈ n-grams(a)} 1{c ∈ n-grams(b)}   (3)

where n-grams(·) denotes the list of n-grams in the given token sequence and f_θ(x_{<t}) denotes the output token sequence of the LM f_θ given x_{<t} as input; the output can have maximum length |x_{≥t}| but may be shorter if the EOS (end-of-sequence) token is generated earlier. Varying the prefix length |x_{<t}| can be seen as varying the strength of the adversarial attack, based on the assumption that the more prior information is provided about the target token sequence, the more easily the LM can extract it. Overall, EL can be seen as estimating the general extraction likelihood, since we measure the average success rate of varying extraction attacks quantified via the n-gram overlap between generated and target token sequences. While previous metrics quantifying the privacy risks of LMs depend on specific adversarial attacks, EL quantifies the general likelihood of extraction without any dependency on a specific extraction attack. We regard n as a hyper-parameter that can be varied depending on the stringency of the privacy standard: the higher n is set, the stricter the standard for a successful extraction attack.
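To make the metric concrete, the definitions above can be sketched as a small reference implementation; `generate` is a hypothetical stand-in for decoding a continuation of bounded length from f_θ:

```python
def ngrams(seq, n):
    """List of n-grams of a token sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def overlap_n(generated, target, n):
    """OVERLAP_n(a, b): fraction of n-grams of `generated` found in `target`."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    tgt = set(ngrams(target, n))
    return sum(g in tgt for g in gen) / len(gen)

def extraction_likelihood(x, generate, n):
    """EL_n(x): average n-gram overlap between model continuations and the
    true suffix, over prefix lengths t = 1 .. T-n. `generate(prefix, max_len)`
    stands in for greedy decoding from the LM."""
    T = len(x)
    total = 0.0
    for t in range(1, T - n + 1):
        continuation = generate(x[:t], T - t)  # at most |x_>=t| tokens
        total += overlap_n(continuation, x[t:], n)
    return total / (T - n)
```

A model that reproduces every suffix verbatim gets EL_n = 1, while a model whose continuations share no n-gram with any suffix gets 0.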
Memorization Accuracy (MA). We define Memorization Accuracy (MA) as:

MA(x) = (1 / (T − 1)) Σ_{t=1}^{T−1} 1{argmax_w p_θ(w | x_{<t}) = x_t}   (4)

MA quantifies how much f_θ has memorized the given token sequence and was proposed by Tirumala et al. (2022) to analyze the training dynamics of large LMs.

Empirical Definition of Forgetting. Utilizing both EL_n and MA, we empirically define a specific token sequence x to be forgotten, and no longer susceptible to extraction attacks, when the following conditions are met:

EL_n(x) ≤ (1 / |D′|) Σ_{x′ ∈ D′} EL_n(x′)  and  MA(x) ≤ (1 / |D′|) Σ_{x′ ∈ D′} MA(x′)   (5)

where D′ represents a validation corpus not seen during training. In other words, we define x to be forgotten when EL_n(x) and MA(x) reach values lower than the average EL_n and MA on token sequences that were not seen during training.
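MA and the forgetting check admit an equally small sketch; `argmax_next` is a hypothetical stand-in for the argmax of p_θ(· | x_{<t}):

```python
def memorization_accuracy(x, argmax_next):
    """MA(x): fraction of positions t = 1 .. T-1 where the model's greedy
    next-token prediction for prefix x_<t equals the true token x_t."""
    T = len(x)
    hits = sum(argmax_next(x[:t]) == x[t] for t in range(1, T))
    return hits / (T - 1)

def is_forgotten(el_x, ma_x, el_threshold, ma_threshold):
    """Empirical forgetting condition (Equation 5): both EL_n(x) and MA(x)
    must drop to or below their averages on the unseen validation set D'."""
    return el_x <= el_threshold and ma_x <= ma_threshold
```

The thresholds are simply the mean EL_n and MA over the held-out sequences in D′ (the Forgetting Thresholds of Section 4.2).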

4.1. MODELS, DATASETS, AND CONFIGURATIONS

Baselines. For the experiments, we use the GPT-NEO (125M, 1.3B, 2.7B) LMs (Black et al., 2021), initially pretrained on all of the Pile corpora (825GB) (Gao et al., 2020), and the OPT (125M, 1.3B, 2.7B) LMs (Zhang et al., 2022), pretrained on a subset of a deduplicated version of the Pile as well as corpora from other domains. We perform unlearning on the GPT-NEO LMs and compare the privacy risks of the target data against the OPT LMs to measure how effective our proposed approach is relative to deduplicating the training corpora before pretraining the underlying LM (Kandpal et al., 2022). We do not use the exact LMs from Kandpal et al. (2022) because those LMs were not open-sourced, and thus use the OPT LMs instead. We also consider Differential Privacy (DP) Decoding (Majmudar et al., 2022) as a baseline; this approach is a decoding strategy that linearly interpolates the original logits with the uniform distribution and performs nucleus sampling, which the authors show theoretically provides DP guarantees. λ is the linear interpolation weight, where λ = 0 performs nucleus sampling from the uniform distribution and λ = 1 performs regular nucleus sampling, using the logits as weights during random sampling.

Target Data. For the target data used to quantify the privacy risks of the LMs, we sample instances from the Training Data Extraction Challenge, which provides 15,000 examples (each 200 tokens long) from 16 different domains of the Pile corpora identified to be somewhat easy to extract. For our experiments, we randomly sample s of the 15,000 examples and make the underlying LM forget those s samples at once. By default, we report the average over 5 random samplings of s samples for all of our experimental settings. We report only the average of the 5 samplings and do not separately report the standard deviation.
Instead, we provide the results of each individual run in Appendix A.

Evaluation Datasets. Providing stronger privacy protection for LMs is meaningless if it requires sacrificing their original capabilities. Thus, while quantifying the privacy risks of LMs, we also quantify the original LM capabilities by evaluating on 9 different classification tasks: Hellaswag (Zellers et al., 2019) and Lambada (Paperno et al., 2016) to measure linguistic reasoning abilities; Winogrande (Sakaguchi et al., 2021) and COPA (Gordon et al., 2012) to measure commonsense reasoning abilities; and ARC-Easy (Clark et al., 2018), ARC-Challenge (Clark et al., 2018), Piqa (Bisk et al., 2020), MathQA (Amini et al., 2019), and PubmedQA (Jin et al., 2019) to measure scientific reasoning abilities. We also evaluate on 4 dialogue tasks (Wizard of Wikipedia (Dinan et al., 2019), Empathetic Dialogues (Rashkin et al., 2019), Blended Skill Talk (Smith et al., 2020), and Wizard of Internet (Komeili et al., 2022)) to evaluate the generation capabilities of the LMs. We use the test set for Lambada and the validation set for the rest of the datasets. We also show results of measuring perplexity on the validation corpora of the Pile and Wikitext in Appendix B. We do not include perplexity as one of the main evaluations because it might not be the most suitable metric for quantifying general LM performance, especially in the case of unlearning (further explanation is given in Appendix B). We evaluate DP Decoding only on the 4 dialogue tasks because the decoding strategy cannot be applied to the classification tasks, which are evaluated using a verbalizer.

Configurations. We set the learning rate to 5e-5 and show the effect of varying learning rates in Appendix D. We use constant learning rate scheduling throughout each run.
We fix the global batch size to be the same as s (the number of samples forgotten at once) because global batch sizes smaller than s proved to degrade general LM capabilities. For EL_n, we set n = 10, which means EL measures the likelihood of extracting 10 consecutive tokens under varying extraction attacks. For calculating EL_10 and MA, we use naïve greedy decoding. We set both the dropout and weight decay rates to 0. Lastly, while Section 3.2 provides a guideline for empirically deciding that a single token sequence is forgotten, for considering a chunk of s token sequences to be forgotten we use the average EL_10 and MA as an approximation of the individual EL_10 and MA.
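For reference, the interpolation step of the DP Decoding baseline described above can be sketched as follows; this is our reading of the method, not the authors' implementation, and the nucleus truncation step is omitted for brevity:

```python
import math

def dp_mix(logits, lam):
    """Linearly interpolate the model's softmax distribution with the
    uniform distribution: p_i = lam * softmax(logits)_i + (1 - lam) / |V|.
    lam = 1 recovers ordinary sampling; lam = 0 samples uniformly."""
    m = max(logits)  # stable softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    v = len(logits)
    return [lam * (e / z) + (1 - lam) / v for e in exps]
```

A token is then drawn from the mixed distribution, e.g. with `random.choices(range(len(p)), weights=p)[0]`.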

4.2. MAIN EXPERIMENTS

Forgetting Threshold. First, we show how we obtain the Forgetting Threshold for EL_10 and MA, the values at which we consider a token sequence to be forgotten and unsusceptible to extraction attacks, for all model sizes of the GPT-NEO LMs in Table 1. For D′, we perform weighted sampling (same domain distribution as the Pile training corpora) of 10,000 instances, each with token length 200, from the Pile validation corpora and measure the average EL_10 and MA (Equation 5), which we empirically set as the Forgetting Threshold values.

Main Results

Table 2 shows the main results of performing unlearning on LMs of varying sizes together with the baselines; we report the average performance of the 5 random samplings. Overall, the results show unlearning to be an effective approach that provides strong privacy protection while retaining, and sometimes even improving, general LM capabilities.

Sequential Unlearning is More Stable than Batch Unlearning. We show the effect of varying s (the number of data instances to be forgotten at once) in Figure 2a across model scales. We denote this approach as batch unlearning. As shown by the s = 128 results, it is harder to forget more samples at once, resulting in substantial degradation of average LM performance regardless of how large the LM is. Since s ≤ 32 does not show much degradation, we explore whether sequential unlearning can be a solution. In Figure 2b, we show the result of dividing the 128 samples into 4 chunks of 32 and performing sequential unlearning: we unlearn one chunk at a time until the chunk reaches the forgetting threshold. Surprisingly, as shown by the performance gap at s = 128 between the dotted lines (the s = 128 performance of Figure 2a) and the solid lines, the end results are vastly different even though exactly the same instances were forgotten; sequential unlearning shows almost no degradation of average LM performance. In Appendix G, we show that chunks once forgotten stay forgotten and that later chunks are forgotten much faster than the initial chunk. This result hints at the generalization of unlearning, which we do not explore further in this work. It also suggests that knowledge unlearning can be applied continually to LMs when needed.
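The sequential schedule above can be sketched as a simple loop; `unlearn_epoch` and `forgotten` are hypothetical callbacks standing in for one epoch of gradient ascent and the EL_10/MA threshold check:

```python
def sequential_unlearn(model, targets, chunk_size, forgotten, unlearn_epoch):
    """Split the target sequences into chunks and unlearn one chunk at a
    time until it crosses the Forgetting Threshold, rather than unlearning
    all targets in a single batch."""
    chunks = [targets[i:i + chunk_size]
              for i in range(0, len(targets), chunk_size)]
    for chunk in chunks:
        while not forgotten(model, chunk):
            model = unlearn_epoch(model, chunk)  # one epoch of gradient ascent
    return model
```

With 128 targets and chunk_size = 32, this reproduces the 4-chunk schedule of Figure 2b; later chunks often satisfy the threshold after very few epochs (Appendix G).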

4.3. ANALYSIS OF KNOWLEDGE UNLEARNING

Providing Better Intuition of What Exactly Happens During Knowledge Unlearning. To show exactly what happens to the LM during knowledge unlearning, we show how the performance on each of the LM benchmarks changes as we perform 10 runs of unlearning on the GPT-NEO (1.3B) model (each run with s = 1) in Figure 3. As shown in the figure, the LM performance on each benchmark varies tremendously depending on which sample is chosen to be forgotten. Furthermore, the ending time of each run differs, indicating that some samples are forgotten faster than others. We also show empirical examples of performing actual extraction attacks with a prefix length of 100 in Appendix F.

Towards Understanding Why Some Instances are Harder to Forget. To measure why some instances are harder to forget, we perform 5 random samplings of s = 8 from 8 different domains of the Training Data Extraction Challenge and perform unlearning on the GPT-NEO 1.3B LM. We also show the results of each individual run in Appendix A. As shown in Table 3, despite undergoing the same number of token updates (10 epochs of unlearning), different domains result in vastly different outcomes: ENRON EMAILS results in an average LM performance degradation of only -0.4%, while USPTO BACKGROUNDS results in -4.5% degradation. Furthermore, the final EL_10 varies by domain, suggesting that some domains (e.g., FREELAW) are harder to forget than others. Lastly, domains that are more structured, meaning the data consist of patterns such as a list of emails (ENRON EMAILS) or code (GITHUB (CODE)), seem to result in less degradation of LM performance than domains that are more unstructured, meaning the data consist mostly of raw English text such as a review for journal submission (PUBMED CENTRAL). We provide examples from each domain in Appendix E. However, further analysis of exactly which components make unlearning work is left for future work.

5. CLOSING

In this paper, we propose knowledge unlearning as a method for mitigating privacy risks in LMs that provides strong privacy protection with little to no degradation of general LM capabilities, measured by evaluating on 9 common LM classification benchmarks and 4 dialogue benchmarks, for the larger-sized LMs. As large LMs expand their use cases, potentially affecting the daily lives of people, the research community should make sure that the privacy of individuals is not violated, intentionally or unintentionally, by the knowledge stored in the implicit parameters of these models. Since it is inherently impossible to prevent and predict all future privacy concerns prior to pretraining the LM, we suggest the community consider knowledge unlearning for ensuring privacy upon individuals' requests after pretraining.

A FULL RESULTS

We provide all of the results for the 5 random samplings of our main experimental setting in Table 4 and the full results for the domain analysis setting in Table 5. We also provide the evaluation on the 4 dialogue tasks for s = 32 for all model sizes in Table 6.

B MEASURING PILE AND WIKITEXT PERPLEXITY

Table 7 shows the results of measuring perplexity on 500 samples from the validation sets of the Pile and Wikitext corpora for the LMs from the main experimental setting (Table 2). Results show that LMs that underwent knowledge unlearning show higher perplexity, while the main experimental table (Table 2) does not show degradation of performance on the 9 LM benchmarks. We believe this discrepancy is due to an inherent attribute of unlearning: since we are doing gradient ascent, we are likely softening the probability of generating each token from the vocabulary, producing a more uniform distribution that inevitably yields higher perplexity. However, since the LM benchmarks do not show much degradation, the argmax of the most likely token has not changed much. Further exploration of what exactly knowledge unlearning does to the representations of the LM is left for future work.

Figure 4: Varying the learning rate for unlearning GPT-NEO 1.3B with s = 32. We report the average of 3 random samplings and display the standard deviations as shaded regions. Red dotted lines denote the memorization accuracy forgetting threshold of the 1.3B model reported in Table 1.

C COMPUTATION COMPARISON BETWEEN DEDUPLICATION AND KNOWLEDGE UNLEARNING

We show the FLOPs of pretraining OPT denoted as DEDUPLICATION and the average FLOPs of performing knowledge unlearning until s = 32 token sequences reach the Forgetting Threshold denoted as UNLEARNING in Table 8 . We calculate FLOPs by (6 × Total Training Tokens × Parameter Size) following Brown et al. (2020) .
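The FLOPs estimate the section cites reduces to a one-liner; this is a sketch following the formula from Brown et al. (2020), with the token count in the usage note being a hypothetical illustration rather than a figure from the paper:

```python
def training_flops(total_tokens, parameter_size):
    """Training-compute estimate from Brown et al. (2020):
    FLOPs = 6 * total training tokens * parameter count."""
    return 6 * total_tokens * parameter_size
```

For example, a 1.3B-parameter model trained on a hypothetical 300B tokens would cost `training_flops(300e9, 1.3e9)` ≈ 2.3e21 FLOPs, whereas unlearning touches only the s target sequences, which is why the UNLEARNING column in Table 8 is orders of magnitude smaller.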

D VARYING THE LEARNING RATE

In Figure 4, we show the results of varying the learning rate for knowledge unlearning, where we fix the total number of epochs to 10 and perform 3 random runs with s = 32 on the GPT-NEO 1.3B. Overall, we observe that higher learning rates lead to faster forgetting but substantial LM performance degradation, while lower learning rates retain LM performance but fail to meet the Forgetting Threshold within 10 epochs. Thus, we set the learning rate to 5e-5 for our experiments as the best trade-off.

Figure 5: Additional results of sequential unlearning for GPT-NEO 125M, 1.3B, and 2.7B. Red dotted lines denote the memorization accuracy forgetting threshold of each model reported in Table 1.

E TEXT EXAMPLE FROM EACH DOMAIN

We show an example token sequence from each of the 8 domains used for the analysis section in Table 9 .

F MORE EXAMPLES OF PERFORMING EXTRACTION ATTACKS

In addition to the extraction attack example shown in the analysis section, we provide 3 additional examples to provide readers with more empirical examples of how knowledge unlearning ensures protection against extraction attacks in Table 10 .

G ADDITIONAL RESULTS OF SEQUENTIAL KNOWLEDGE UNLEARNING

We show how the EL_10 of each individual chunk and the average LM performance change as we perform sequential unlearning in Figure 5. Results show that chunks that are forgotten stay forgotten and that later chunks are forgotten much faster (one or two epochs) than the initial chunk. We hypothesize that this may be due to the similarity of the token sequences among the 15,000 examples of the Training Data Extraction Challenge benchmark. This result also hints at the generalization of unlearning, which we do not explore further as it is beyond the scope of this work.

H THE EFFECT OF VARYING N FOR EXTRACTION LIKELIHOOD (EL) METRIC

First, we show the Extraction Likelihood (EL) Forgetting Threshold values for n = [5, 10, 20, 40], measured on the 10,000 validation instances unseen during training, in Table 11. Next, we show in Table 12 the average LM performance (on the 9 classification benchmarks) when we perform unlearning on 32 samples until the target token sequences are forgotten (both the EL and MA values are lower than the threshold values). Performance shows the average of 5 random samplings.
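The dependence on n comes from the n-gram overlap at the core of EL_n: longer n-grams are strictly harder to match, so the threshold shifts as n changes. Below is a minimal sketch of that overlap term, under our reading of the metric (the fraction of n-grams in a generated continuation that also appear in the true suffix):

```python
def ngram_overlap(generated, reference, n):
    # Fraction of n-grams of the generated continuation that also occur
    # in the true suffix; EL_n averages this quantity over prefixes.
    gen = [tuple(generated[i:i + n]) for i in range(len(generated) - n + 1)]
    if not gen:
        return 0.0
    ref = {tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)}
    return sum(g in ref for g in gen) / len(gen)
```

A verbatim regurgitation scores 1.0 regardless of n, while a partially matching continuation scores lower as n grows.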

I LIMITATIONS

While we provide an empirical privacy guarantee through unlearning, our Forgetting Threshold depends on which data samples are chosen as D′. Furthermore, varying the prefix length is a naïve way of varying the strength of the extraction attacks; in a real-world scenario, extraction attacks may be more sophisticated and may require other prevention methods. Also, we could not directly compare our approach with a Differential Privacy (DP) approach (Anil et al., 2021) because there are no open-sourced LMs pretrained with a DP algorithm, and we could not replicate the pretraining phase ourselves because of the heavy computational resources needed to pretrain an LM with DP, estimated to require thousands of GPU hours. We leave this comparison for future work. Finally, recent work (Carlini et al., 2022b) has suggested that machine unlearning (in the vision domain) can have negative effects that harm the privacy of other users. Future work should explore this phenomenon in the setting of performing unlearning on large language models as well.



Footnotes:

https://github.com/google-research/lm-extraction-benchmark

In Section 4.3, we show that s plays a critical role in determining how much unlearning degrades the general capabilities of the LM, since s = 128 results in substantial degradation; a method to mitigate this is also proposed in Section 4.3.

We set n to 10 since we empirically consider an extraction to be successful when 10 consecutive tokens are successfully generated by the LM. We show results for varying n with values from [5, 10, 20, 40] in Appendix H.

Computational efficiency is measured via FLOPs, calculated as (6 × Total Training Tokens × Parameter Size) as in Brown et al. (2020). FLOPs for OPT LMs were estimated using information from Zhang et al. (2022). We provide the FLOPs for the methods in Appendix C.

We provide some limitations of our work in Appendix I.



Figure 1: Comparison of previous approaches and knowledge unlearning when an individual practices his/her Right-To-Be-Forgotten (RTBF).

Figure 2: Average LM performance on the 9 classification benchmarks when varying the total number of samples forgotten at once is shown in (a), and the average LM performance when the 128 samples are divided into 4 chunks and forgotten sequentially is shown in (b). The lines denote the average performance of 5 random samplings and the standard deviation is shown as the shaded regions. The dotted lines in (b) denote the s = 128 performance from (a) for comparison purposes.

Figure 3: Performance on the 9 classification benchmarks as we perform 10 different unlearning runs on GPT-NEO 1.3B where s = 1.

Forgetting Threshold for GPT-NEO LMs

Main results showing the average of 5 random samplings of s = 32 (forgetting 32 samples at once). OPT represents the LM with deduplication applied, NEO denotes the initial GPT-NEO LM, NEO + DPD+ represents applying the DP Decoding strategy while varying λ to match the forgetting criteria, NEO + UL represents performing unlearning on the initial NEO until it provides stronger security for the target sequences than OPT, NEO + UL+ represents performing unlearning on GPT-NEO until the target sequences match the forgetting criteria, LM Avg. denotes the average accuracy on the 9 classification datasets, and Dialogue Avg. denotes the average F1 score on the 4 dialogue datasets. Best comparable performances are bolded and second best underlined.

We provide each individual run in Appendix A for reference. We highlight five main observations regarding the results. (1) OPT LMs show much lower EL10 and MA than GPT-NEO LMs, confirming that deduplicating the pretraining corpora is indeed helpful for mitigating privacy risks. (2) NEO + DPD+ enables effective protection against extraction attacks, demonstrated by the lowest EL and MA scores; however, it brings severe degradation of generation capabilities, measured via the average F1 score on the 4 dialogue generation tasks. (3) NEO + UL+ results in severe degradation of both classification and dialogue tasks for the 125M LM and severe degradation of only the dialogue tasks for the 1.3B LM, while the 2.7B LM retains most of its previous capabilities. (4) As the LMs scale to larger sizes, it takes fewer epochs for the target sequences to be forgotten; together with (3), this implies that larger LMs are stronger unlearners. (5) For the 2.7B LM, NEO + UL+ provides stronger privacy protection than OPT without sacrificing the performance of NEO, while being vastly more computationally efficient (3,500,000x) than re-training the underlying LM, which is required for all data preprocessing approaches.

Unlearning GPT-NEO 1.3B on token sequences sampled from 8 different domains. We fix the number of epochs to 10, set s = 8, and show the average of 5 random samplings. Italicized values in parentheses denote the ∆ from INITIAL.

All of the individual runs for the Main Results

All of the individual runs for s = 32 for the dialogue tasks in the Main Results.

Measuring perplexity on the Pile and Wikitext corpora for the main unlearning experiments (Table 2).

Training compute comparison of methods mitigating privacy risks in LMs for sizes 125M, 1.3B, and 2.7B measured via FLOPs.

Examples of performing extraction attacks on token sequences, showing that knowledge unlearning provides protection against extraction attacks. Underlined text denotes the model-generated text given the prefix of length 100 as input. For the extraction attack, we utilize a naïve greedy decoding strategy.

The average of the 9 classification tasks for GPT-NEO + UL + for the 1.3B LM when performing unlearning until the Forgetting Threshold for each n.

Original

About the Publisher Australia HarperCollins Publishers (Australia) Pty. Ltd. 25 Ryde Road (PO Box 321) Pymble, NSW 2073, Australia http://www.harpercollinsebooks.com.au Canada HarperCollins Publishers Ltd. 55 Avenue Road, Suite 2900 Toronto, ON, M5R, 3L2, Canada http://www.harpercollinsebooks.ca New Zealand HarperCollins Publishers (New Zealand) Limited P.O. Box 1 Auckland, New Zealand http://www.harpercollinsebooks.co.nz United Kingdom HarperCollins Publishers Ltd. 77-85 Fulham Palace Road London, W6 8JB, UK http://www.harpercollinsebooks.co.uk

Before

About the Publisher Australia HarperCollins Publishers (Australia) Pty. Ltd. 25 Ryde Road (PO Box 321) Pymble, NSW 2073, Australia http://www.harpercollinsebooks.com.au Canada HarperCollins Publishers Ltd. 55 Avenue Road, Suite 2900 Toronto, ON, M5R, 3L2, Canada http://www.harpercollinsebooks.ca New Zealand HarperCollins Publishers (New Zealand) Limited P.O. Box 1 Auckland, New Zealand http://www.harpercollinsebooks.co.nz United Kingdom HarperCollins Publishers Ltd. 77-85 Fulham Palace Road London, W6 8JB, UK http://www.harpercollinsebooks.co.uk

annex

To: Hedy Govenar hgovenar@govadv.com , Mike Day MDay@GMSSR.com , Bev Hansen bhansen@lhom.com , Jeff Dasovich jdasovic@ enron.com , Susan J Mara smara@enron.com , Joseph Alamo JAlamo@enron.com , Paul Kaufman paul.kaufman@enron.

The pharmaceutical formulations of the present invention, which may conveniently be presented in unit dosage form, may be prepared according to conventional techniques well known in the pharmaceutical industry. Such techniques include the step of bringing into association the active ingredients with the pharmaceutical carrier(s) or excipient(s). In general the formulations are prepared by uniformly and intimately bringing into association the active ingredients with liquid carriers or finely divided solid carriers or both, and then, if necessary, shaping the product. The compositions of the present invention may be formulated into any of many possible dosage forms such as, but not limited to, tablets, capsules, gel capsules, liquid syrups, soft gels, suppositories, and enemas.

PUBMED CENTRAL

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date.

Unlearning After

James Gurney This daily weblog by Dinotopia creator James Gurney is for illustrators, plein-air painters, sketchers, comic artists, animators, art students, and writers. You'll find practical studio tips, insights into the making of the Dinotopia books, and first-hand reports from art schools and museums. CG Art Contact or by email:gurneyjourney (at) gmail.com I've been working on a CG art project for a while now, and I've been working on it for a while now. I've been working on it for a while now, and I've been working on it for a while now. I've been working on it for a while now, and I've been working on it for a while now. I've been working on a CG art project for a while now, and I've been working on it for a while

Original

Rick Shapiro rshapiro@enron.com , Jim Steffes james.d.steffes@enron.com , Alan Comnes acomnes@enron.com , Chris Calger ccalger@enron.com , Mary Hain mary.hain@enron.com , Joe Hartsoe Joe.Hartsoe@enron.com , Donna Fulton Donna.Fulton@enron.com , Steven Kean Steven.J.Kean@ enron.com , Karen Denne kdenne@enron.com , Beverly Aden beverly.aden@enron.com , Bill Votaw bill.votaw@enron.com , Carol Moffett carol. moffett@enron.com , Debora Whitehead deb

Before

Rick Shapiro rshapiro@enron.com , Jim Steffes james.d.steffes@enron.com , Alan Comnes acomnes@enron.com , Chris Calger ccalger@enron.com , Mary Hain mary.hain@enron.com , Joe Hartsoe Joe.Hartsoe@enron.com , Donna Fulton Donna.Fulton@enron.com , Steven Kean Steven.J.Kean@ enron.com , Karen Denne kdenne@enron.com , Beverly Aden beverly.aden@enron.com , Bill Votaw bill.votaw@enron.com , Carol Moffett carol. moffett@enron.com , Debora Whitehead

Unlearning After

Rick Shapiro rshapiro@enron.com , Jim Steffes james.d.steffes@enron.com , Alan Comnes acomnes@enron.com , Chris Calger ccalger@enron.com , Mary Hain mary.hain@enron.com , Joe Hartsoe Joe.Hartsoe@enron.com , Donna Fulton Dabat, state+[D@calenergy.com]

