UNLEARNABLE EXAMPLES: MAKING PERSONAL DATA UNEXPLOITABLE

Abstract

The volume of "free" data on the internet has been key to the current success of deep learning. However, it also raises privacy concerns about the unauthorized exploitation of personal data for training commercial models. It is thus crucial to develop methods to prevent unauthorized data exploitation. This paper raises the question: can data be made unlearnable for deep learning models? We present a type of error-minimizing noise that can indeed make training examples unlearnable. Error-minimizing noise is intentionally generated to reduce the error of one or more training examples to close to zero, which can trick the model into believing there is "nothing" to learn from these examples. The noise is restricted to be imperceptible to human eyes, and thus does not affect normal data utility. We empirically verify the effectiveness of error-minimizing noise in both sample-wise and class-wise forms. We also demonstrate its flexibility across extensive experimental settings and its practicality in a case study of face recognition. Our work establishes an important first step towards making personal data unexploitable to deep learning models. Code is available at https://github.com/HanxunH/Unlearnable-Examples.

1. INTRODUCTION

In recent years, deep learning has had groundbreaking successes in several fields, such as computer vision (He et al., 2016) and natural language processing (Devlin et al., 2018). This is partly attributed to the availability of large-scale datasets crawled freely from the Internet, such as ImageNet (Russakovsky et al., 2015) and ReCoRD (Zhang et al., 2018b). Whilst these datasets provide a playground for developing deep learning models, a concerning fact is that some of them were collected without mutual consent (Prabhu & Birhane, 2020). Personal data has also been unknowingly collected from the Internet and used to train commercial models (Hill, 2020). This has raised public concerns about the "free" exploitation of personal data for unauthorized or even illegal purposes. In this paper, we address this concern by introducing unlearnable examples, which aim to make training examples unusable for Deep Neural Networks (DNNs). In other words, DNNs trained on unlearnable examples will perform no better than random guessing on normal test examples.

Compared with preserving an individual's privacy by obfuscating information in the dataset, what we aim to achieve here is different and more challenging. First, making an example unlearnable should not affect its quality for normal usage. For instance, an unlearnable "selfie" photo should be free from obvious visual defects so it can still be used as a social profile picture. Ideally, this can be achieved with imperceptible noise. In our setting, the noise can only be added to training examples on a single occasion (when the data is uploaded to the internet), prior to model training. However, DNNs are known to be robust to small noise, whether random (Fawzi et al., 2016) or adversarial (Szegedy et al., 2013; Goodfellow et al., 2014; Ma et al., 2018). It therefore remains unclear whether small, imperceptible noise can stop the training of high-performance DNNs.
The development of unlearnable examples should take full advantage of the unique characteristics, and more importantly, the weaknesses, of DNNs. One well-studied characteristic of DNNs is that they tend to capture more of the high-frequency components of the data (Wang et al., 2020a). Surprisingly, by exploiting this characteristic, we find that small random noise, when applied in a class-wise manner to the training data, can easily fool DNNs into overfitting to the noise (shown in Section 4). However, early stopping can effectively counteract this type of noise. DNNs are also known to be vulnerable to adversarial (or error-maximizing) noise: small perturbations crafted to maximize the model's error at test time (Szegedy et al., 2013; Goodfellow et al., 2014). We find that error-maximizing noise cannot stop DNN learning when applied in a sample-wise manner to the training examples. This motivates us to explore the opposite direction to error-maximizing noise. Specifically, we propose a type of error-minimizing noise that prevents the model from being penalized by the objective function during training, and thus tricks the model into believing there is "nothing" to learn from the example(s). We refer to an example that contains the error-minimizing noise as an unlearnable example. Error-minimizing noise can be generated in different forms: sample-wise and class-wise. Class-wise error-minimizing noise is superior to random noise and cannot be circumvented by early stopping. Compared to random (Fawzi et al., 2016) or error-maximizing noise (Muñoz-González et al., 2017), sample-wise error-minimizing noise is the only noise that effectively makes training examples unlearnable.
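The contrast with error-maximizing noise can be made concrete: instead of ascending the loss gradient with respect to the input, error-minimizing noise descends it. Below is a minimal sketch of this idea against a toy, fixed logistic-regression classifier (the function name, toy weights, and hyperparameters are illustrative assumptions; the paper's actual method operates on deep networks):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error_minimizing_noise(x, y, w, b, eps=0.1, steps=50, lr=0.05):
    """Find a small perturbation delta (||delta||_inf <= eps) that
    MINIMIZES the training loss of a fixed linear classifier on (x, y),
    i.e. the opposite of an adversarial (error-maximizing) attack."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        p = sigmoid(np.dot(w, x + delta) + b)   # model prediction
        grad = (p - y) * w                      # d(BCE loss)/d(input)
        delta -= lr * np.sign(grad)             # descend the loss
        delta = np.clip(delta, -eps, eps)       # keep noise imperceptible
    return delta

# toy fixed classifier and a single example with label 1
w = np.array([1.0, -2.0]); b = 0.0
x = np.array([0.2, 0.3]); y = 1.0

delta = error_minimizing_noise(x, y, w, b)
loss_before = -np.log(sigmoid(np.dot(w, x) + b))
loss_after = -np.log(sigmoid(np.dot(w, x + delta) + b))
print(loss_after < loss_before)  # the noise lowers the example's training error
```

A model trained on examples carrying such noise sees near-zero loss on them and thus receives almost no learning signal from them.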

Our main contributions are:

• We present a type of error-minimizing noise that can create unlearnable examples to prevent personal data from being freely exploited by deep learning models. The noise is small and imperceptible to human eyes, and thus does not reduce normal data utility.

• We propose a bi-level optimization process to effectively generate different forms of error-minimizing noise: sample-wise and class-wise.

• We empirically verify the effectiveness and flexibility of error-minimizing noise for creating unlearnable examples. We also demonstrate the practical application of unlearnable examples in real-world scenarios via a case study on face recognition.

† Correspondence to: Xingjun Ma (daniel.ma@deakin.edu.au), Yisen Wang (yisen.wang@pku.edu.cn)

2. RELATED WORK

In this section, we briefly review the most relevant work on data privacy, data poisoning, and adversarial attacks against deep learning models.

Data Privacy. Privacy issues have been extensively studied in the field of privacy-preserving machine learning (Shokri & Shmatikov, 2015; Abadi et al., 2016; Phan et al., 2016; 2017; Shokri et al., 2017). While these works have made significant progress towards protecting data privacy, they assume that the model can freely explore the training data, and instead aim to prevent the model from leaking sensitive information about that data. In this paper, we consider a more challenging scenario where the goal of the defender is to make personal data completely unusable by unauthorized deep learning models. Fawkes (Shan et al., 2020) made a first attempt in this strict setting: by leveraging the targeted adversarial attack, Fawkes prevents unauthorized face trackers from tracking a person's identity. This work is similar to ours in that we share a common objective of preventing unauthorized data usage. In contrast to the targeted adversarial attack, we propose a novel error-minimizing noise to produce unlearnable examples, which can serve as a generic framework for a wide range of data protection tasks.

Data Poisoning. Data poisoning attacks aim to degrade the model's performance on clean examples by modifying the training examples. Early work demonstrated a poisoning attack on SVMs (Biggio et al., 2012). Koh & Liang (2017) proposed to poison the most influential training examples using adversarial (error-maximizing) noise against DNNs, which has also been integrated into an end-to-end framework (Muñoz-González et al., 2017). Although data poisoning attacks can potentially prevent free data exploitation, these approaches are quite limited against DNNs and hard to operate in real-world scenarios. For example, poisoned examples can only slightly decrease DNNs' performance (Muñoz-González et al., 2017), and often appear distinguishable from clean examples (Yang et al., 2017), which reduces normal data utility. The backdoor attack is another type of attack that poisons training data with a stealthy trigger pattern (Chen et al., 2017; Liu et al., 2020). However, the backdoor attack does not harm the model's performance on clean data (Chen et al., 2017; Shafahi et al., 2018; Barni et al., 2019; Liu et al., 2020; Zhao et al., 2020). Thus, it is not a valid method for data protection. Different from these works, we generate unlearnable examples with invisible noise to "bypass" the training of DNNs.

Adversarial Attack. It has been found that adversarial examples (or attacks) can fool DNNs at test time (Szegedy et al., 2013; Goodfellow et al., 2014; Kurakin et al., 2016; Carlini & Wagner, 2017; Madry et al., 2018; Jiang et al., 2019; Wu et al., 2020a; Bai et al., 2020; Croce & Hein, 2020; Wang
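The bi-level generation process mentioned in the contributions — an inner step that updates the noise to minimize the training loss, alternating with an outer step that updates the model on the noisy data — can be sketched on a toy logistic-regression problem. All data, hyperparameters, and the linear "model" here are illustrative assumptions, not the paper's deep-network setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # binary cross-entropy, averaged over examples
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)).mean()

# toy data: n samples, d features, random binary labels
n, d, eps = 64, 5, 0.2
X = rng.normal(size=(n, d))
y = (rng.random(n) > 0.5).astype(float)

w = np.zeros(d); b = 0.0
delta = np.zeros_like(X)  # sample-wise noise, one delta per example

for _ in range(100):
    # inner step: update noise to MINIMIZE the loss, with ||delta||_inf <= eps
    p = sigmoid((X + delta) @ w + b)
    g_in = (p - y)[:, None] * w[None, :]              # dL/d(input)
    delta = np.clip(delta - 0.05 * np.sign(g_in), -eps, eps)
    # outer step: update the model on the perturbed (unlearnable) data
    p = sigmoid((X + delta) @ w + b)
    g_w = ((p - y)[:, None] * (X + delta)).mean(axis=0)
    g_b = (p - y).mean()
    w -= 0.1 * g_w; b -= 0.1 * g_b

loss_noisy = bce(sigmoid((X + delta) @ w + b), y)
loss_clean = bce(sigmoid(X @ w + b), y)
print(loss_noisy < loss_clean)  # the model fits the noisy data, not the clean data
```

The noise becomes a label-correlated shortcut: the model achieves low loss on the perturbed examples while learning little that transfers to the clean ones, which is the intended "nothing to learn" effect.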

