SELF-ENSEMBLE PROTECTION: TRAINING CHECKPOINTS ARE GOOD DATA PROTECTORS

Abstract

As data becomes increasingly vital, a company must be cautious about releasing it: competitors could use the data to train high-performance models, posing a tremendous threat to the company's commercial competitiveness. To prevent good models from being trained on the data, we can add imperceptible perturbations to it. Since such perturbations aim at hurting the entire training process, they should reflect the vulnerability of DNN training, rather than that of a single model. Based on this new idea, we seek perturbed examples that are always unrecognized (never correctly classified) during training. In this paper, we uncover them via model checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs that ignore normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning the checkpoints are as diverse as DNNs with different architectures. That is, our ensemble delivers strong performance while requiring only the computation of training one model. In extensive experiments with 9 baselines on 3 datasets and 5 architectures, SEP is verified to be a new state-of-the-art: e.g., our small ℓ∞ = 2/255 perturbations reduce the accuracy of a CIFAR-10 ResNet18 from 94.56% to 14.68%, compared to 41.35% by the best-known method. Code is available at https://github.com/Sizhe-Chen/SEP.

1. INTRODUCTION

Large-scale datasets have become increasingly important for training high-performance deep neural networks (DNNs). Thus, it is common practice to collect data online (Mahajan et al., 2018; Sun et al., 2017), an almost unlimited data source. This poses a great threat to the commercial competitiveness of data owners such as social media companies, since competitors could also train good DNNs on their data. Therefore, great efforts have been devoted to protecting data from unauthorized use in model training. The most typical way is to add imperceptible perturbations to the data so that DNNs trained on it generalize poorly (Huang et al., 2020a; Fowl et al., 2021b). Existing data protection methods use a single DNN to generate incorrect but DNN-sensitive features (Huang et al., 2020a; Fu et al., 2021; Fowl et al., 2021b) for training data by, e.g., adversarial attacks (Goodfellow et al., 2015).

Such examples could be easily uncovered by the gradients from the ensemble of model training checkpoints. However, to the best of our knowledge, ensemble methods have never been explored in data protection, so it is natural to wonder: can we use these intermediate checkpoint models for data protection in a self-ensemble manner?

* Correspondence to Xiaolin Huang (xiaolinhuang@sjtu.edu.cn).

An effective ensemble demands high diversity among sub-models, which is generally quantified by their gradient similarity (Pang et al., 2019; Yang et al., 2021), i.e., gradients on the same image from different sub-models should be orthogonal. Surprisingly, we found that checkpoints' gradients are as orthogonal as those of DNNs with different architectures in a conventional ensemble. In this regard, we argue that intermediate checkpoints are diverse enough to form the proposed self-ensemble protection (SEP), challenging existing beliefs about their similarity (Li et al., 2022). With SEP, effective ensemble protection is achieved at the computational cost of training only one DNN.
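The diversity measure above can be sketched in a few lines: treat each checkpoint's gradient on the same image as a flattened vector and compute the mean pairwise cosine similarity, which is near zero when gradients are close to orthogonal. This is a minimal illustrative sketch with synthetic gradients, not the paper's measurement code.

```python
import numpy as np

def pairwise_gradient_cosine(grads):
    """Mean pairwise cosine similarity between gradients on the same image
    from different sub-models (here, training checkpoints).
    grads: array of shape (n_models, d), one flattened gradient per model."""
    g = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    sims = g @ g.T                        # cosine similarity matrix
    iu = np.triu_indices(len(g), k=1)     # off-diagonal pairs only
    return float(sims[iu].mean())

# Toy check: random high-dimensional gradients are nearly orthogonal,
# mimicking the diversity observed across checkpoints.
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 10_000))
print(pairwise_gradient_cosine(grads))    # close to 0 => diverse sub-models
```

A value near 0 indicates orthogonal (diverse) gradients, while identical sub-models would give a value of 1.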
Since data worth protecting is mostly very large in scale, SEP avoids the tremendous cost of training multiple models. Our study thereby enables a practical ensemble for large-scale data, which may also help improve generalization, increase attack transferability, and illuminate DNN training dynamics. Multiple checkpoints offer a pool of good features for an input, so we can additionally take advantage of diverse features, besides diverse gradients, at no extra cost. Inspired by the neural collapse theory (Papyan et al., 2020), which demonstrates that the mean feature of samples in a class is a highly representative depiction of that class, we introduce a novel feature alignment (FA) loss that induces a sample's last-layer feature to collapse into the mean of incorrect-class features. With features from multiple checkpoints, FA robustly injects incorrect features so that DNNs are deeply confounded. Equipping SEP with FA, our method achieves astonishing performance by revealing the vulnerability of DNN training: (1) our examples are almost always misclassified throughout training, in contrast to a recent method (Sandoval-Segura et al., 2022), and (2) clean samples are always much closer to each other than to protected samples, indicating that the latter belong to another distribution that goes unnoticed by normal training. With a very small bound of ℓ∞ = 2/255, SEP perturbations on the CIFAR-10 training set reduce the testing accuracy of a ResNet18 from 94.56% to 14.68%, while the best-known results only reach 41.35% with the same overall computation for crafting the perturbations. The superiority of our method is also observed on CIFAR-100 and an ImageNet subset across 5 architectures. We also study perturbations under different norms, and find that mixing ℓ∞ and ℓ0 perturbations (Wu et al., 2023) is the only effective way to resist ℓ∞ adversarial training, which recovers the accuracy for all other types of perturbations.
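The feature-alignment idea can be sketched as follows. The choice of the nearest incorrect class as the target and the squared-distance form are illustrative assumptions for this sketch, not necessarily the exact loss used in the paper.

```python
import numpy as np

def class_mean_features(feats, labels, num_classes):
    """Per-class mean of last-layer features (the representative class
    depictions suggested by neural collapse)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def feature_alignment_loss(feat, label, class_means):
    """Pull one sample's feature toward the mean feature of an incorrect class.
    Here the target is the nearest wrong-class mean; squared L2 is illustrative."""
    d = np.linalg.norm(class_means - feat, axis=1)
    d[label] = np.inf                       # exclude the true class
    target = int(np.argmin(d))              # nearest incorrect class
    loss = float(((feat - class_means[target]) ** 2).sum())
    return loss, target
```

Minimizing this loss over the perturbation (averaged across checkpoint feature extractors) would inject incorrect-class features into each protected sample.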
Our contributions are summarized below.

• We propose that protective perturbations should reveal the vulnerability of the DNN training process, which we depict by the examples never classified correctly during training.

• We uncover such examples by the self-ensemble of model checkpoints, which are found to be surprisingly diverse as data protectors.

• Our method is very effective even using only the computation of training one DNN. Equipped with a novel feature alignment loss, our ℓ∞ = 8/255 perturbations lead DNNs to < 5.7% / 3.2% / 0.6% accuracy on CIFAR-10 / CIFAR-100 / the ImageNet subset.

Ensembling is validated as a panacea for boosting adversarial attacks (Liu et al., 2017; Dong et al., 2018). By aggregating the probabilities (Liu et al., 2017), logits, or losses (Dong et al., 2018) of multiple models, ensemble attacks significantly increase the black-box attack success rate. They can be further enhanced by reducing the gradient variance of sub-models (Xiong et al., 2022), an optimization strategy that is also adopted in our method. Besides, ensemble has also
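An ensemble attack of the kind described above can be sketched as a PGD-style loop that averages loss gradients over the sub-models (here, checkpoints) before each sign step. The gradient callables, step size, and iteration count below are placeholder assumptions, not the released SEP implementation.

```python
import numpy as np

def self_ensemble_perturb(x, grad_fns, eps, step, iters):
    """Craft an l_inf-bounded perturbation by averaging per-checkpoint loss
    gradients, taking a sign step, and projecting back into the eps-ball.
    grad_fns: one callable per checkpoint, returning dLoss/dx at (x + delta)."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = np.mean([fn(x + delta) for fn in grad_fns], axis=0)  # ensemble gradient
        delta = np.clip(delta + step * np.sign(g), -eps, eps)    # step and project
    return np.clip(x + delta, 0.0, 1.0)                          # keep a valid image
```

With eps = 2/255 this matches the small perturbation budget used in the experiments; gradient-variance reduction across sub-models (Xiong et al., 2022) would replace the plain mean in the loop.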



However, the data protectors cannot know which DNNs and which training strategies the unauthorized users will adopt. Thus, protective examples should aim at hurting DNN training, a whole dynamic process, instead of a static DNN, so it is interesting to study the vulnerability of DNN training. Recall that the vulnerability of a DNN is revealed by adversarial examples, which are similar to clean ones but unrecognized by the model (Madry et al., 2018). Similarly, we depict the vulnerability of training by the perturbed training samples that are never predicted correctly during training: learning on examples ignored during normal training tends to yield DNNs that ignore normal examples.
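One way to operationalize "never predicted correctly during training" is to keep a running record of per-epoch predictions over the training set and flag the samples that are wrong at every epoch. This is an illustrative bookkeeping sketch under that assumption, not the paper's code.

```python
import numpy as np

def never_correct_mask(pred_history, labels):
    """pred_history: (n_epochs, n_samples) array of predicted labels per epoch.
    labels: (n_samples,) ground-truth labels.
    Returns a boolean mask of samples never classified correctly in training."""
    ever_correct = (pred_history == labels[None, :]).any(axis=0)
    return ~ever_correct
```

Samples selected by this mask are the ones that depict the vulnerability of the training process rather than of any single snapshot of the model.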

Fang et al., 2020) are also useful in protecting data. However, current methods only use a single DNN, because data worth protecting is too large in scale to afford training multiple models.

