EFFICIENT TROJAN INJECTION: 90% ATTACK SUCCESS RATE USING 0.04% POISONED SAMPLES

Abstract

This study focuses on reducing the number of poisoned samples needed to backdoor an image classifier. We present Efficient Trojan Injection (ETI), a pipeline that significantly improves poisoning efficiency through trigger design, sample selection, and the exploitation of individual consistency. Using ETI, two backdoored datasets, CIFAR-10-B0-20 and CIFAR-100-B0-30, are constructed and released, in which 0.04% (20/50,000) and 0.06% (30/50,000) of the training images are poisoned. Across 240 models with different network architectures and training hyperparameters, the average attack success rates on these two sets are 92.1% and 90.4%, respectively. These results indicate that a Trojan can be injected into an image classifier with only a few tens of poisoned samples, roughly an order of magnitude fewer than in previous attacks.



Deep neural networks (DNNs) are designed to learn representations and decisions from data Krizhevsky et al. (2012); Simonyan et al. (2013); LeCun et al. (2015); Li et al. (2022). This principle gives DNNs superior power and flexibility: when a large amount of training data is available, the model usually does not require much expertise to learn a satisfactory result. The other side of the coin is that this reliance on data makes DNNs vulnerable to malicious training-data poisoning attacks Gu et al. (2017); Koh & Liang (2017); Carlini & Terzis (2021); Xia et al. (2022b). As the number of parameters in DNNs scales up Brown et al. (2020); Ramesh et al. (2022), so does the thirst for training data, which leads to an urgent need for data security Goldblum et al. (2022). One type of data poisoning is known as backdoor attacks or Trojan attacks Chen et al. (2017); Gu et al. (2017); Liu et al. (2020b). Specifically, an attacker releases a training set that claims to be "clean" but has a small number of poisoned samples mixed in. If a user trains a DNN on this set, a hidden Trojan can be implanted. Afterwards, the attacker can control the model's prediction by merging a particular trigger into the input sample. Backdoor attacks have thus become a severe threat to the deployment of DNNs in healthcare, finance, and other security-sensitive scenarios.

From the attacker's perspective, a good Trojan injection process should not only accomplish the malicious goal but also remain undetectable by the user, i.e., be strongly stealthy Li et al. (2022a). However, it has been shown that several factors can affect the stealthiness of backdoor attacks Turner et al. (2019); Tan & Shokri (2020); Zhong et al. (2020); Nguyen & Tran (2021); Xia et al. (2022a). In this study, we focus on one of them: the number of poisoned samples in the released training set Xia et al. (2022a). Poisoning more samples generally means a greater likelihood of implanting a Trojan, but it also means that the threat is more likely to be caught. Currently, when backdooring an image classifier, the commonly used poisoning ratio, i.e., the proportion of poisoned samples in the entire training set, ranges from 0.5% to 10% Gu et al. (2017). Recently, Xia et al. (2022a) revealed that each sample contributes differently to the backdoor injection and suggested reducing the number of poisoned samples required through important sample selection. However, are there other factors besides selection and construction that can affect the poisoned-sample efficiency? More importantly, when the attacker considers these factors simultaneously, what is the limit of poisoning efficiency that the constructed backdoor attack can achieve? These questions have not been well answered. In this study, we investigate the effect of an unexplored factor, randomness, on the poisoning efficiency of backdoor attacks and identify a characteristic of this factor, favorable for attackers, that can be used to reduce the number of poisoned samples further.

We then synthesize existing research and our own findings to present Efficient Trojan Injection (ETI), a pipeline for probing the capability limit that is currently achievable. ETI improves the poisoning efficiency of the generated samples through three parts:

• Construction: using the inherent flaws of models as the trigger. Deep models are inherently flawed Szegedy et al. (2013); Moosavi-Dezfooli et al. (2017). We believe that it is easier to harden an existing flaw so that it can serve as a backdoor than to implant a new one from scratch. Guided by this view, we achieve 90% attack success rates on CIFAR-10 and CIFAR-100 by poisoning 0.103% and 0.178% of the clean data. As a comparison, the ratios are 0.603% and 0.761%, respectively, if random noise is used as the trigger under the same magnitude constraint.

• Selection: selecting the samples that contribute more to the backdoor injection. We agree with Xia et al. (2022a) that each sample is of different importance for the backdoor injection and employ their proposed Filtering-and-Updating Strategy (FUS) to improve the poisoning efficiency. We observe a drawback of this strategy when the poisoned sample set is very small and make a simple but effective improvement. This technique helps reduce the poisoning ratios to 0.058% and 0.093% on CIFAR-10 and CIFAR-100.

• Randomness: valuing individual differences and consistency. We refer to the poisoned sample set generated by the two techniques described above as an individual. Due to randomness, the poisoning performance of individuals generated by different runs can vary by several times. A favorable characteristic we observe (for attackers) is that the performance of these individuals is highly consistent across different models: when an individual performs well on one model, it usually performs well on others, and vice versa.

With the help of this individual consistency, the poisoning efficiency is further improved: by poisoning only 0.036% and 0.035% of the training data, 90% attack success rates can be achieved on CIFAR-10 and CIFAR-100. Using ETI, two backdoored datasets, CIFAR-10-B0-20 and CIFAR-100-B0-30, are constructed, in which 0.04% (20/50,000) and 0.06% (30/50,000) of the training images are polluted. To validate the poisoning performance, we train a total of 240 DNN models on each dataset using different architectures, optimizers, initial learning rates, and batch sizes. The average attack success rates on these two datasets are 92.1% and 90.4%, respectively. Moreover, if 10 more samples are poisoned, the attack success rates exceed 95% on both.

Contribution. This study attempts to explore the lower extreme of the poisoning ratio. To achieve this goal, we investigate the effect of randomness on the poisoning efficiency, an unexplored factor.
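The individual-consistency observation suggests a simple selection procedure: since a poisoned set ("individual") that performs well on one model tends to perform well on others, candidates from different runs can be ranked on a handful of proxy models and the best one retained. The sketch below is only an illustration of this idea, not the paper's implementation; the callback `evaluate_asr`, which is assumed to train a model on the candidate set and return its attack success rate, is hypothetical.

```python
import numpy as np

def select_best_individual(individuals, proxy_models, evaluate_asr):
    """Rank candidate poisoned sets by their mean attack success rate
    over a few proxy models and return the best one with its score.

    `evaluate_asr(individual, model)` is a user-supplied (hypothetical)
    callback returning an ASR in [0, 1]."""
    scores = [np.mean([evaluate_asr(ind, m) for m in proxy_models])
              for ind in individuals]
    best = int(np.argmax(scores))
    return individuals[best], float(scores[best])
```

Because of the cross-model consistency reported above, a small number of proxy models should suffice to predict how an individual will perform on unseen architectures.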

Figure 1: An overview of poisoning-based backdoor attacks. The attacker builds the mixed training set through three steps (selection, construction, and poisoning) and releases it. The user obtains this set and uses it to train a DNN. Unfortunately, a model trained on such a dataset is usually infected and can therefore be controlled. This study focuses on the number of poisoned samples required in the released set, which affects the stealthiness of the attack.
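The poisoning step and the attack-success-rate metric used throughout can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a dirty-label setup with a blended trigger, and the function names and the `alpha` blending strength are hypothetical.

```python
import numpy as np

def poison_dataset(x_train, y_train, trigger, target_label, indices, alpha=0.2):
    """Blend `trigger` into the selected training images and relabel
    them with the attacker's target class (dirty-label assumption)."""
    x_poisoned = x_train.astype(np.float32).copy()
    y_poisoned = y_train.copy()
    for i in indices:
        x_poisoned[i] = np.clip(
            (1 - alpha) * x_poisoned[i] + alpha * trigger, 0.0, 255.0)
        y_poisoned[i] = target_label
    return x_poisoned, y_poisoned

def attack_success_rate(model, x_test, y_test, trigger, target_label, alpha=0.2):
    """Fraction of non-target test images that the trained model
    classifies as the target class once the trigger is merged in."""
    keep = y_test != target_label
    x = np.clip((1 - alpha) * x_test[keep].astype(np.float32)
                + alpha * trigger, 0.0, 255.0)
    preds = model.predict(x)
    return float(np.mean(preds == target_label))
```

In this notation, the poisoning ratio discussed in the text is simply `len(indices) / len(x_train)`, e.g., 20/50,000 = 0.04% for CIFAR-10-B0-20.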

