ONE-PIXEL SHORTCUT: ON THE LEARNING PREFERENCE OF DEEP NEURAL NETWORKS

Abstract

Unlearnable examples (ULEs) aim to protect data from unauthorized use in training DNNs. Existing work adds ℓ∞-bounded perturbations to the original samples so that the trained model generalizes poorly. Such perturbations, however, are easy to eliminate by adversarial training and data augmentations. In this paper, we resolve this problem from a novel perspective by perturbing only one pixel in each image. Interestingly, such a small modification effectively degrades model accuracy to nearly that of an untrained counterpart. Moreover, our produced One-Pixel Shortcut (OPS) cannot be erased by adversarial training or strong augmentations. To generate OPS, we perturb in-class images at the same position to the same target value, chosen so that it mostly and stably deviates from all the original images. Since such generation is based only on the images themselves, OPS needs significantly less computational cost than previous methods using DNN generators. Based on OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 by humans but drives the trained model to extremely low accuracy. Even under adversarial training, a ResNet-18 trained on CIFAR-10-S reaches only 10.61% accuracy, compared to 83.02% with the existing error-minimizing method.

1. INTRODUCTION

Deep neural networks (DNNs) have greatly advanced the computer vision field in the past decade. As DNNs scale up unprecedentedly (Brock et al., 2018; Huang et al., 2019; Riquelme et al., 2021; Zhang et al., 2022), data becomes increasingly vital. For example, ImageNet (Russakovsky et al., 2015) fostered the development of AlexNet (Krizhevsky et al., 2017). Besides, people and organizations also collect online data to train DNNs, e.g., IG-3.5B-17k (Mahajan et al., 2018) and JFT-300M (Sun et al., 2017). This practice, however, raises privacy concerns for Internet users. To address this concern, researchers have made substantial efforts to protect personal data from abuse in model learning without affecting user experience (Feng et al., 2019; Huang et al., 2020a; Fowl et al., 2021; Yuan & Wu, 2021; Yu et al., 2021). Among the proposed methods, unlearnable examples (ULEs) (Huang et al., 2020a) take a significant step by injecting original images with protective but imperceptible perturbations obtained from bi-level error minimization (EM). DNNs trained on ULEs generalize very poorly on normal images. However, such perturbations can be completely canceled out by adversarial training, which defeats the protection and limits the practicality of ULEs. We view the data protection problem from the perspective of shortcut learning (Geirhos et al., 2020), which shows that DNN training is "lazy" (Chizat et al., 2019; Caron & Chrétien, 2020), i.e., it converges to the minimum-norm solution when optimized by gradient descent (Wilson et al., 2017; Shah et al., 2018; Zhang et al., 2021). In this case, a DNN relies on every accessible feature to minimize the training loss, no matter whether it is semantic or not (Ilyas et al., 2019; Geirhos et al., 2018; Baker et al., 2018). Thus, DNNs tend to ignore semantic features if other easy-to-learn shortcuts are sufficient for distinguishing examples from different classes. Such shortcuts exist naturally or can be created manually.
In data collection, for example, cows may mostly appear against grasslands, misleading a DNN to predict "cow" from large green areas, because the color is easier to learn than the semantic features and is also sufficient to correctly classify cow images during training; this is a naturally occurring shortcut. In this paper, we are surprised to find that a shortcut can be so small in area that it is instantiated as a single pixel. By perturbing one pixel of each training sample, our method, namely One-Pixel Shortcut (OPS), degrades the model's accuracy on clean data to nearly that of an untrained counterpart. Moreover, our generated unbounded small noise cannot be erased by adversarial training (Madry et al., 2018), which is effective at mitigating existing ULEs (Huang et al., 2020a). To make the specific pixel stand out in the view of DNNs, OPS perturbs in-class images at the same position to the same target value, chosen as the boundary value that mostly and stably deviates from all original images. Specifically, the difference between the perturbed pixel and the original one, across all in-class images, should be large with low variance. Since such generation is based only on images, OPS needs significantly less computational cost than previous methods based on DNN generators. We evaluate OPS and its counterparts across 6 architectures, 6 model sizes, and 8 training strategies on CIFAR-10 (Krizhevsky et al., 2009) and an ImageNet (Russakovsky et al., 2015) subset, and find that OPS is consistently more effective than EM ULEs at degrading a model's test accuracy. On this basis, we introduce a new unlearnable dataset named CIFAR-10-S, which combines EM and OPS to craft stronger imperceptible ULEs. Even under adversarial training, a ResNet-18 (He et al., 2016) trained on CIFAR-10-S has only 10.61% test accuracy, compared to 83.02% with the existing error-minimizing method.
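The selection criterion above can be sketched in code. The following is a minimal, illustrative NumPy version, not the authors' implementation: for each candidate pixel position and boundary target value, it scores the per-image deviation by its mean minus its standard deviation across in-class images (one simple way to realize "large difference with low variance"; the paper's exact objective may differ), then stamps the winning pixel into every image of the class. All function names here are ours.

```python
import numpy as np

def find_one_pixel_shortcut(images, targets=(0.0, 1.0)):
    """Pick the (position, target value) that deviates most and most
    stably from all in-class images.

    images: float array of shape (N, H, W, C) with values in [0, 1].
    Returns ((i, j), t) maximizing mean deviation minus its std.
    """
    best_score, best = -np.inf, None
    for t in targets:                            # boundary values only
        # Deviation each image would show if this pixel were set to t,
        # averaged over color channels: shape (N, H, W).
        dev = np.abs(images - t).mean(axis=-1)
        mean_dev = dev.mean(axis=0)              # (H, W): how far on average
        std_dev = dev.std(axis=0)                # (H, W): how unstable
        score = mean_dev - std_dev               # large AND stable deviation
        i, j = np.unravel_index(score.argmax(), score.shape)
        if score[i, j] > best_score:
            best_score, best = score[i, j], ((int(i), int(j)), t)
    return best

def apply_shortcut(images, pos, t):
    """Set the same pixel to the same value in every in-class image."""
    out = images.copy()
    i, j = pos
    out[:, i, j, :] = t
    return out
```

In this sketch the search runs per class, so the perturbed position and value are shared by all images of that class, which is what makes the single pixel a class-discriminative shortcut.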
Different from existing datasets like ImageNet-A (Hendrycks et al., 2021) or ObjectNet (Barbu et al., 2019), which place objects into unusual environments to remove shortcuts, CIFAR-10-S injects shortcuts to evaluate a model's resistance to them. Altogether, our contributions are summarized as follows:

• We analyze unlearnable examples from the perspective of shortcut learning, and demonstrate that a strong shortcut for DNNs can be as small as a single pixel.

• We propose a novel data protection method named One-Pixel Shortcut (OPS), which perturbs in-class images at the pixel that mostly and stably deviates from the original images. OPS is a model-free method that is significantly faster than previous work.



Figure 1: Effect of One-Pixel Shortcut. We visualize the features (after the first convolution) of ResNet-18 (He et al., 2016) models trained on clean and OPS samples. Even at such a shallow layer, the DNN trained on OPS extracts far fewer semantic features and is less activated.

• We extensively evaluate OPS across various models and training strategies, and find that it outperforms baselines by a large margin in its ability to degrade DNN training. Besides, we introduce CIFAR-10-S to assess a model's ability to learn essential semantic features.

