EFFICIENT TROJAN INJECTION: 90% ATTACK SUCCESS RATE USING 0.04% POISONED SAMPLES

Abstract

This study focuses on reducing the number of poisoned samples needed when backdooring an image classifier. We present Efficient Trojan Injection (ETI), a pipeline that significantly improves the poisoning efficiency through trigger design, sample selection, and exploitation of individual consistency. Using ETI, two backdoored datasets, CIFAR-10-B0-20 and CIFAR-100-B0-30, are constructed and released, in which 0.04% (20/50,000) and 0.06% (30/50,000) of the training images are poisoned. Across 240 models with different network architectures and training hyperparameters, the average attack success rates on these two sets are 92.1% and 90.4%, respectively. These results indicate that it is feasible to inject a Trojan into an image classifier with only a few tens of poisoned samples, which is about an order of magnitude less than before.

1. INTRODUCTION

Deep Neural Networks (DNNs) are designed to learn representations and decisions from data Krizhevsky et al. (2012) ; Simonyan et al. (2013) ; LeCun et al. (2015) ; Li et al. (2022) . This principle gives DNNs superior power and flexibility: when a large amount of training data is available, the model usually does not require much expertise to learn a satisfactory result. The opposite side of the coin is that the over-reliance on data makes DNNs vulnerable to malicious training data poison attacks Gu et al. (2017) ; Koh & Liang (2017) ; Carlini & Terzis (2021) ; Xia et al. (2022b) . As the number of parameters in DNNs scales Brown et al. (2020) ; Ramesh et al. (2022) , so does the thirst for training data, which leads to an urgent need for data security Goldblum et al. (2022) . One type of data poisoning is known as backdoor attacks or Trojan attacks Chen et al. (2017) ; Gu et al. (2017) ; Liu et al. (2017) . Specifically, an attacker releases a training set that claims to be "clean" but has a small number of poisoned samples mixed in. If a user trains a DNN on this set, then a hidden Trojan can be implanted. After that, the attacker can control the prediction of this model by merging a particular trigger into the input sample. Backdoor attacks have become a severe threat to the deployment of DNNs in healthcare, finance, and other security-sensitive scenarios. From the attacker's perspective, a good Trojan injection process not only needs to accomplish the malicious goal, but also should be undetectable by the user, i.e., remain strongly stealthy Li et al. (2020b) . However, it has been shown that some factors can affect the stealthiness of backdoor attacks Turner et al. (2019) ; Tan & Shokri (2020) ; Zhong et al. (2020) ; Nguyen & Tran (2021) ; Xia et al. (2022a) . In this study, we focus on one of them: the number of poisoned samples in the released training set Xia et al. (2022a) . 
Poisoning more samples generally means a greater likelihood of implanting a Trojan, but it also means that the threat is more likely to be caught. Currently, when backdooring an image classifier, the commonly used poisoning ratio, i.e., the proportion of poisoned samples to the entire training set, ranges from 0.5% to 10% Gu et al. (2017) ; Li et al. (2020a) ; Zhong et al. (2020) ; Li et al. (2021) . This is not a large number, but we wonder if it is possible to implant a backdoor at a much lower ratio, say 0.1% or 0.05%. Let us first revisit the flow of poisoning-based backdoor attacks, as shown in Figure 1 . Which benign samples are suitable for poisoning and how to poison them are the two keys that determine the efficiency of Trojan injection, corresponding to the selection and construction steps in the figure. In previous work Zhao et al. (2020) ; Zhong et al. (2020) ; Xia et al. (2022a) ; Zeng et al. (2022) , these two keys were explored separately. For example, Zhao et al. (2020) proposed to improve the poisoning efficiency by optimizing the trigger. Xia et al. (2022a) found that each poisoned sample contributes differently to the backdoor injection and suggested reducing the number of poisoned samples required through important sample selection. However, are there any other factors besides the selection and construction that can affect the poisoned sample efficiency? More importantly, when the attacker can consider these factors simultaneously, what is the limit of poisoning efficiency that the constructed backdoor attack can achieve? These questions have not been well answered. In this study, we investigate the effect of an unexplored factor, randomness, on the poisoning efficiency of backdoor attacks and identify a good characteristic of this factor (for attackers) that can be used to reduce the number of poisoned samples further. 
We then combine existing research with our own to present Efficient Trojan Injection (ETI) for probing the capability limit that is currently achievable. ETI improves the poisoning efficiency of the generated samples through three parts:

• Construction: using the inherent flaw of models as the trigger. Deep models are inherently flawed Szegedy et al. (2013); Moosavi-Dezfooli et al. (2017). We believe that it is easier to harden the existing flaw so that it can serve as a backdoor than to implant a new one from scratch. Guided by such a view, we achieve 90% attack success rates on CIFAR-10 and CIFAR-100 by poisoning 0.103% and 0.178% of the clean data. As a comparison, the ratios are 0.603% and 0.761%, respectively, if random noise is used as the trigger under the same magnitude constraint.

• Selection: selecting those samples that contribute more to the backdoor injection. We agree with Xia et al. (2022a) that each sample is of different importance for the backdoor injection and employ their proposed Filtering-and-Updating Strategy (FUS) to improve the poisoning efficiency. We observe a drawback of this strategy when the poisoned sample size is very small and make a simple but effective improvement. This technique helps to reduce the poisoning ratios to 0.058% and 0.093% on CIFAR-10 and CIFAR-100.

• Randomness: valuing the individual differences and consistency. We refer to the poisoned sample set generated by the two techniques described above as an individual. Due to randomness, the poisoning performance differs between individuals generated by different runs, and the values can differ by a factor of several. A good characteristic we observe is that the performance of these individuals can be highly consistent across different models. That is, when an individual performs well on one model, it usually does so on other ones, and vice versa.
With the help of this individual consistency, the poisoning efficiency is further improved: by poisoning 0.036% and 0.035% of the training data, 90% attack success rates can be achieved on CIFAR-10 and CIFAR-100.

Using ETI, two backdoored datasets, CIFAR-10-B0-20 and CIFAR-100-B0-30, are constructed, where 0.04% (20/50,000) and 0.06% (30/50,000) of the training images are poisoned. To validate the performance of poisoning, we train a total of 240 DNN models on each dataset using different architectures, optimizers, initial learning rates, and batch sizes. The average attack success rates on these two datasets are 92.1% and 90.4%, respectively. Besides, if 10 more samples are poisoned, then the attack success rates would exceed 95% for both.

Contribution. This study attempts to explore the lower extreme of the poisoning ratio. To achieve this goal, we investigate the effect of randomness on the poisoning efficiency, an unexplored factor beyond the selection and construction. One good characteristic we observe coming with randomness is that its effect on attack performance is usually consistent across models. Building on existing research and our own, we present a pipeline called ETI to thoroughly improve the data efficiency of backdoor attacks and show empirically that injecting a Trojan into an image classifier with only a few tens of poisoned samples is practical.

et al. (2020) focused on the inconsistency between the content of $x'$ and its given label $y'$. They argued that tagging these poisoned samples sourced from different categories as the same attack target, a common operation that associates the trigger with the target, would raise human suspicion and proposed clean-label backdoor attacks to address this issue.

2.2. POISONING EFFICIENCY

We concentrate here on the poisoning ratio $r = |D_p|/|D_m|$, which also affects the stealthiness of the attack. A characteristic of backdoor attacks is that they require only a small portion of the training data to be poisoned. Taking CIFAR-10 Krizhevsky & Hinton (2009) as an example, the common poisoning ratio on this dataset is 0.5% to 10% to achieve an attack success rate of 95% or more (2017) and backdoor triggers, which are highly correlated Pang et al. (2020). On the one hand, a UAP is an inherent flaw in a clean model and can be considered as a natural trigger. On the other hand, when the Trojan injection is complete, the backdoor trigger is actually an attacker-defined UAP for that infected model. Therefore, optimizing a UAP on a pre-trained clean model as the trigger to construct poisoned samples is an effective practice to improve the poisoning efficiency.

Important Sample Selection. Xia et al. (2022a) improved the poisoning efficiency by focusing on the selection step. They characterized the learning difficulty of each poisoned sample by recording the number of times it was forgotten during the backdoor injection. In general, poisoned samples with higher forgetting counts deserve more attention and contribute more to the backdoor injection. The authors Xia et al. (2022a) confirmed this through data removal and proposed a Filtering-and-Updating Strategy (FUS) to find these high-contribution samples.
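As a rough illustration of the forgetting statistic described above, the sketch below counts, for each poisoned sample, how often it goes from being classified as the attack target to not being classified as the target between consecutive epochs. The array layout and the name `forgetting_counts` are assumptions made for this example; the exact bookkeeping in Xia et al. (2022a) may differ.

```python
import numpy as np

def forgetting_counts(hit_target):
    """Count forgetting events per poisoned sample.

    hit_target[e, i] is truthy if poisoned sample i is classified as the
    attack target after epoch e (hypothetical layout for this sketch).
    """
    hits = np.asarray(hit_target, dtype=bool)
    # A forgetting event: hit at epoch e, miss at epoch e + 1.
    forgotten = hits[:-1] & ~hits[1:]
    return forgotten.sum(axis=0)

# Toy history: 4 epochs, 3 poisoned samples.
history = [[1, 0, 1],
           [0, 0, 1],
           [1, 1, 1],
           [0, 1, 1]]
counts = forgetting_counts(history)  # -> [2, 0, 0]
```

Samples with larger counts would be the ones a selection strategy such as FUS prefers to keep.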

2.3. THREAT MODEL

Our threat model considers the situation where a user needs to train a DNN model on data scraped from the Internet or provided by a third party. This situation is becoming increasingly common as the demand for data grows Goldblum et al. (2022). Therefore, we assume that the attacker only has control over the training set, i.e., which samples are poisoned and how; the attacker has no knowledge of the network architecture and hyperparameters employed by the user.

2.4. EXPERIMENTAL SETUP

We try to construct efficient poisoned samples for CIFAR-10 and CIFAR-100 Krizhevsky & Hinton (2009) and specify the attack target $y'$ as category 0. Our work is organized into two parts. In the part introducing ETI, to verify its effectiveness, we use VGG-13 (V-13) Simonyan & Zisserman (2014), VGG-16 (V-16) Simonyan & Zisserman (2014), PreActResNet-18 (P-18) He et al. (2016b), and ResNet-18 (R-18) He et al. (2016a) as the DNN architectures and Adam Kingma & Ba (2014) as the optimizer to train the infected models. The total training duration is set to 70 epochs, and the batch size is set to 512. The learning rate is initially set to 0.001 and is divided by 10 after 40 and 60 epochs. It is important to note that since ETI also involves training deep models when generating poisoned samples, here we assume that the attacker can only use V-13. In the second part, we build two backdoored datasets using ETI, namely CIFAR-10-B0-20 and CIFAR-100-B0-30, where only 0.04% (20/50,000) and 0.06% (30/50,000) of the training samples are poisoned. To match the threat model, we simulate the user's usage scenario by training the infected models with 10 DNN architectures, 3 optimizers, 4 batch sizes, and 4 initial learning rates. The specific settings can be found in Appendix A. In total, we train 240 models on each dataset to test its attack performance. All experiments are implemented with PyTorch Paszke et al. (2017) and run on an NVIDIA Tesla V100 GPU.
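The step schedule stated above (initial rate 0.001, reduced tenfold after 40 and 60 epochs, 70 epochs in total) can be written out as a small function. This is a sketch under one assumption: "after 40 epochs" is read as "from the epoch with zero-based index 40 onward"; the function name and parameters are illustrative, not taken from the authors' code.

```python
def learning_rate(epoch, base_lr=1e-3, milestones=(40, 60), factor=0.1):
    """Step schedule: scale the rate by `factor` for every milestone already reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** reached

# One value per training epoch (zero-based epoch index).
schedule = [learning_rate(e) for e in range(70)]
```

In PyTorch, a multi-step scheduler with the same milestones and a decay factor of 0.1 would express the same idea.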

3. EFFICIENT TROJAN INJECTION

We now introduce ETI in terms of trigger design, important sample selection, and the exploitation of individual consistency.

3.1. OPTIMIZING AN EFFICIENT TRIGGER

A backdoor attack is the process of crafting a loophole for a deep model that causes it to malfunction. The very first step is to create a suitable trigger, either fixed or optimized. As mentioned in Section 1, in this study, we mainly focus on reducing the number of poisoned samples in the released training set without compromising the effectiveness of the attack. Conceivably, our ultimate goal is to achieve a so-called zero-shot backdoor attack, i.e., not poisoning any data during the training phase, but still having a stable and effective trigger. Previous studies Szegedy et al. (2013); Goodfellow et al. (2014); Moosavi-Dezfooli et al. (2017); Hu et al. (2022) have demonstrated that deep models are naturally flawed, so the most intuitive way to implement zero-shot backdoor attacks is to find triggers that can activate these inherent flaws. However, the major problem is that these flaws usually do not qualify as reliable model backdoors, which manifests in two ways. On the one hand, the best attack success rate that can be achieved falls far short of what we expected. On the other hand, finding an identical defect that can be perfectly applied to all models is difficult. Therefore, it is still very hard to achieve zero-shot backdoor attacks at this moment.

Table 1: Poisoning ratios r (%) needed to achieve 90% attack success rates on CIFAR-10 using optimized triggers generated with or without image transformations T(·).

We turn to pursue backdoor attacks with few samples. The discussion above gives a hint: perhaps we can strengthen and consolidate an existing flaw with a small number of poisoned samples to make it eligible as a backdoor, rather than injecting a new one from scratch. Guided by this idea, our attack can be divided into two parts: (1) optimizing a trigger that can activate an inherent flaw in the model, and (2) using this trigger to construct poisoned samples to strengthen the existing flaw.
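Part (2), constructing poisoned samples from a given trigger and mixing them into the benign set, can be sketched as follows. The sketch assumes an additive fusion x + t clipped to the pixel range [0, 1], random selection among samples not already labeled as the target, and images stored as a NumPy array; the function name `build_mixed_set` and these choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def build_mixed_set(images, labels, trigger, target, num_poison, seed=0):
    """Return (mixed_images, mixed_labels, poisoned_idx) with num_poison samples poisoned."""
    rng = np.random.default_rng(seed)
    # Selection: draw the subset to poison at random from non-target samples (assumption).
    candidates = np.flatnonzero(labels != target)
    poisoned_idx = rng.choice(candidates, size=num_poison, replace=False)
    mixed_images = images.copy()
    mixed_labels = labels.copy()
    # Construction: add the trigger, clip to the valid pixel range, relabel as the target.
    mixed_images[poisoned_idx] = np.clip(images[poisoned_idx] + trigger, 0.0, 1.0)
    mixed_labels[poisoned_idx] = target
    # Poisoning: replacing the selected samples in place yields the mixed training set.
    return mixed_images, mixed_labels, poisoned_idx

# Toy data standing in for a real dataset: 500 grayscale 8x8 "images", 10 classes.
rng = np.random.default_rng(42)
images = rng.random((500, 8, 8))
labels = rng.integers(0, 10, size=500)
trigger = rng.uniform(-8 / 255, 8 / 255, size=(8, 8))
mixed_images, mixed_labels, poisoned_idx = build_mixed_set(
    images, labels, trigger, target=0, num_poison=20)
```

Only the selected samples change; every other image and label is passed through untouched, which is what keeps the poisoning ratio equal to num_poison divided by the dataset size.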
Finding the trigger can be formulated as:

$$\underset{C(t) \le \epsilon}{\text{minimize}} \; \sum_{(x, y) \in D_b} L\big(f_\theta(T(F(x, t))),\, y'\big), \tag{1}$$

where $f_\theta$ denotes a trained benign model, $L(\cdot)$ denotes the loss function, and $y'$ is the attacker-defined target. $C(\cdot)$ generally indicates a kind of constraint, while $\epsilon$ defines the upper limit value of $C(t)$. For example, if $C(\cdot)$ represents the area of $t$, the formulation reduces to a local pattern-based trigger design. Meanwhile, if $C(\cdot)$ is a norm constraint, the trigger will cover the whole image, and $\epsilon$ will restrict the pixel changes to ensure invisibility to some extent. We consider the second case in this study, where $C(t) := \|t\|_\infty$, $F(x, t) := x + t$, and $\epsilon = 8/255$. $T(\cdot)$ is a series of transformations performed on the input image, including random cropping and random flipping. We include the transformations to improve the generalization of the optimized trigger, and the same approach has been shown to be effective in adversarial examples Xie et al. (2019).

Table 1 shows the poisoning ratios over 4 different models when achieving 90% attack success rates on CIFAR-10. It can be seen that using triggers generated with transformations requires fewer poisoned samples to reach the same attack strength than using triggers generated without transformations. The ratio is even less than 0.1% on R-18. These results indicate that image transformations help improve the trigger's deformation robustness, thus enhancing its generalization.

Now let us go back to Equation 1. We solve this optimization using projected gradient descent with the $l_\infty$-norm constraint Madry et al. (2017), which updates the trigger $t$ along the direction of the gradient sign for multiple iterations. The detailed algorithm is given in Appendix B. To verify the effectiveness of the triggers devised from the above technique, we implant backdoors on 4 DNN models with different poisoning ratios on CIFAR-10 and CIFAR-100, and the attack success rates are shown in Figure 2.
As a comparison, we also test the attack performance when using Randomly generated perturbations as Triggers (RT) under the same constraint, i.e., the $l_\infty$-norm with $\epsilon = 8/255$. It can be seen that the poisoning ratio required for the Optimized Trigger (OT) is much lower than that of RT for obtaining the same attack strength. Concretely, to reach a 90% attack success rate, we need to poison 0.103% and 0.178% of the data using OT on CIFAR-10 and CIFAR-100, respectively, whereas RT requires 0.603% and 0.761%. In addition, we plot the learning curves of injection on CIFAR-10 using RT and OT at different poisoning ratios, as shown in Figure 3. Note that the number of poisoned samples ranges from 20 to 180 for OT and from 100 to 900 for RT. The highlighted green and blue lines are the results for 80 and 300 poisoned samples. As can be seen, in spite of the similar final values after convergence, the learning processes are completely distinct. Specifically, neural networks manage to learn the features of OT at the beginning of the training, but gain information about RT only gradually after several epochs. This observation explains to some extent why OT is more efficient: strengthening the inherent flaw and learning the decision of the original task overlap considerably in direction.
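The sign-gradient procedure used to solve Equation 1 can be illustrated on a toy problem. The sketch below is not the algorithm of Appendix B: it assumes a linear softmax classifier (so the input gradient has a closed form), uses only random horizontal flips as the transformation and omits random cropping, and introduces hypothetical names (`optimize_trigger`, `target_loss`) and a hypothetical step size of 1/255. It shows the three ingredients named in the text: the fusion x + t, a random transformation, and a sign step followed by projection onto the l∞ ball of radius 8/255.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def target_loss(W, b, images, trigger, target):
    """Mean cross-entropy toward `target` for a linear classifier applied to x + t."""
    n = images.shape[0]
    probs = softmax((images + trigger).reshape(n, -1) @ W.T + b)
    return float(-np.log(probs[:, target]).mean())

def optimize_trigger(W, b, images, target, eps=8 / 255, step=1 / 255, iters=50, seed=0):
    """Sign-gradient descent on a shared trigger, projected so that |t| <= eps elementwise."""
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    trigger = np.zeros((h, w))
    onehot = np.zeros((n, W.shape[0]))
    onehot[:, target] = 1.0
    for _ in range(iters):
        flip = (rng.random(n) < 0.5)[:, None, None]       # random horizontal flip per image
        fused = images + trigger                           # additive fusion
        transformed = np.where(flip, fused[:, :, ::-1], fused)
        probs = softmax(transformed.reshape(n, -1) @ W.T + b)
        grad = ((probs - onehot) @ W).reshape(n, h, w)     # d(cross-entropy)/d(input)
        grad = np.where(flip, grad[:, :, ::-1], grad)      # map gradient back through the flip
        # Descend along the gradient sign, then project onto the l-infinity ball.
        trigger = np.clip(trigger - step * np.sign(grad.mean(axis=0)), -eps, eps)
    return trigger

# Toy stand-ins for a trained benign model and a benign dataset.
rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((10, 64))
b = np.zeros(10)
images = rng.random((256, 8, 8))
trigger = optimize_trigger(W, b, images, target=0)
loss_clean = target_loss(W, b, images, np.zeros((8, 8)), target=0)
loss_triggered = target_loss(W, b, images, trigger, target=0)
```

With a deep network, the closed-form gradient would be replaced by automatic differentiation, but the update and projection steps would keep the same shape.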

3.2. SELECTING IMPORTANT SAMPLES

After optimizing an efficient trigger, picking which benign samples to poison is also an essential step. In almost all previous work, the samples to be poisoned are chosen randomly, based on the assumption that each poisoned sample contributes equally to the backdoor injection. But that is not how it works. In regular classification tasks, several studies Katharopoulos & Fleuret (2018); Toneva et al. (2018) have shown that some hard or forgettable samples are more important for forming the decisions of DNNs. Recently, Xia et al. (2022a) suggested that forgettable poisoned samples, whose predictions are prone to change during the training cycle, are more significant than unforgettable ones with regard to the poisoning efficiency. We agree with their conclusion and apply the algorithm named FUS proposed in that paper to further satisfy our need for a smaller amount of poisoned data. The detailed algorithm of FUS is given in Appendix C.

The main idea of FUS is to find poisoned samples with many forgetting events by filtering and updating the sample pool. This process is usually iterated 10 to 15 times, and the last index of selected samples is saved. The specific algorithm can be found in Xia et al. (2022a). Through simple experiments, we find that this algorithm is indeed effective in promoting sample efficiency. However, randomness becomes influential on the FUS outcome when the poisoning ratio drops to a very small value, such as only a few tens of poisoned samples (less than 0.1%). We conduct a validation experiment to describe this impact, and the results are shown in Figure 4.
The blue dots represent the average number of iterations at which the best attack success rate occurs, and the green dots give the average difference between the best and last attack success rates. We can see that if the poisoning ratio is set to 0.02% (merely 10 samples), the best result is obtained at around the 5th iteration, and the difference is very large. But as the ratio increases, the difference gradually decreases to 0, which means that the last result is almost equal to the best result. Based on the above observation, we make a simple improvement to the original FUS algorithm for small numbers of poisoned samples: we save the best instead of the last sample index result. Next, we combine FUS with the Optimized Trigger (OT) generated in Section 3.1 to obtain the corresponding curves in Figure 2. It is apparent that the poisoning ratios required to reach 90% attack success rates decrease further, with the exact numbers dropping to 0.058% and 0.093% on CIFAR-10 and CIFAR-100, respectively.

In the first two parts, we design optimized triggers that exploit the natural flaws of deep models and select tens of samples that contribute more to the backdoor injection process. Since these techniques involve randomness, e.g., the initial sample pool in FUS is randomly sampled, the results above are the average of 10 independent runs. Statistical analysis is beneficial when comparing different approaches; however, when practically injecting a Trojan, we usually need to focus on a specific individual. Our concern in this section is to analyze the differences between individuals. From the above figures, the existence of individual differences can be demonstrated. We can observe that with the same number of poisoned samples, the attack success rates differ between runs, even by a factor of several if the poisoning ratio is small.
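The filtering-and-updating loop and the save-the-best change described in this section can be sketched as follows. This is a minimal illustration, not the FUS algorithm of Appendix C: `evaluate` is a hypothetical callback that stands for training a model on the candidate poisoned set and returning its attack success rate together with per-sample forgetting counts, and `keep_frac` is an assumed filtering ratio.

```python
import numpy as np

def fus_keep_best(pool_size, num_poison, evaluate, iters=15, keep_frac=0.5, seed=0):
    """Filter-and-update selection that returns the best pool seen, not the last one.

    evaluate(indices) -> (attack_success_rate, forgetting_count_per_index)
    """
    rng = np.random.default_rng(seed)
    selected = rng.choice(pool_size, size=num_poison, replace=False)
    best_asr, best_selected = -1.0, selected.copy()
    num_keep = int(round(keep_frac * num_poison))
    for _ in range(iters):
        asr, forgetting = evaluate(selected)
        if asr > best_asr:  # save the best result instead of the last one
            best_asr, best_selected = asr, selected.copy()
        # Filtering: keep the samples with the most forgetting events.
        order = np.argsort(-np.asarray(forgetting), kind="stable")
        kept = selected[order[:num_keep]]
        # Updating: refill the pool with random candidates that are not currently kept.
        rest = np.setdiff1d(np.arange(pool_size), kept)
        fresh = rng.choice(rest, size=num_poison - num_keep, replace=False)
        selected = np.concatenate([kept, fresh])
    return best_selected, best_asr

# Toy stand-in for training: each candidate has a hidden "importance" score.
importance = np.random.default_rng(3).random(200)

def toy_evaluate(indices):
    return float(importance[indices].mean()), importance[indices]

best_idx, best_asr = fus_keep_best(200, 20, toy_evaluate)
```

Tracking the best pool costs nothing extra, since every pool is evaluated anyway; it only matters when the per-iteration results fluctuate, which is the small-ratio regime shown in Figure 4.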
Although such individual differences appear for every method, we identify a characteristic unique to OT, namely that the performance of these individuals exhibits great consistency across models. For example, in the OT results in Figure 5, the red line (run 3) is always on the top, while the blue line (run 0) performs the worst all the time. Other lines, such as green and pink, basically show moderate performance. However, we cannot find such correlations in the RT results. Besides, to quantify this individual consistency, we perform Pearson correlation analysis of the 4 models on both RT and OT + FUS, and the p-value matrices and correlation coefficient matrices are shown in Figure 6. For OT + FUS, almost all p-values are below 0.05, and the correlation coefficients are above 0.7, revealing that there are significant and positive correlations between the models. In contrast, for RT, the p-values become fairly large, and the coefficients decline to about 0.15 or even turn negative. Overall, these results collectively provide an important insight into the individual consistency of OT + FUS, i.e., if an individual performs well on one model, it is more likely to achieve high accuracy on others. As shown in the corresponding lines (ETI) in Figure 2, we use this characteristic to select an individual with the best performance and further reduce the poisoning ratios to 0.036% and 0.035% on CIFAR-10 and CIFAR-100, respectively.

4. CIFAR-10-B0-20 AND CIFAR-100-B0-30

In this section, we use ETI to build two datasets named CIFAR-10-B0-20 and CIFAR-100-B0-30, corresponding to the backdoored versions of CIFAR-10 and CIFAR-100, respectively. In the names, "B" represents "Backdoor", "0" represents that the attack target $y'$ is set to category 0, and "20" or "30" represents the number of poisoned samples. The poisoned images in CIFAR-10-B0-20 are shown in Figure 7. As we can see, these images maintain a large visual similarity to the original ones.
In the same way, we list the poisoned images in CIFAR-100-B0-30; see Appendix D. We test the attack performance of CIFAR-10-B0-20 on 240 models with different network structures and training hyperparameters, recorded in Table 3. The average attack success rate achieved with 20 poisoned samples on CIFAR-10 is 92.1%: 78 of the 240 models exceed 95%, and 198 exceed 90%. However, the poisoning is not always successful. For example, 10 of the 240 models (approximately 4.2%) fall below 80%. We even observe a 25.8% attack success rate in row 19, column 1, where ResNet-50, the SGD optimizer, an initial learning rate of 0.03, and a batch size of 512 are used. Similarly, for CIFAR-100-B0-30, we achieve an average attack success rate of 90.4%: 201 of the 240 models exceed 90%, yet 27 fall below 80%. The detailed results are shown in Appendix E. We also construct CIFAR-10-B0-30 and CIFAR-100-B0-40, with 10 more poisoned samples each, and the results can be seen in Appendix F. In these cases, we achieve average attack success rates of 95.2% and 95.1%, respectively, and only 4 and 13 of the 240 models fall below 80%. Taken together, these results indicate that it is practical to achieve a high attack success rate by contaminating only a few tens of samples out of 50,000, without access to the structure and hyperparameters of the model used by the user.
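The percentages quoted in this section follow directly from the sample and model counts. The short check below reproduces them; the helper name `percent` is introduced only for this illustration.

```python
def percent(part, whole):
    """Express part / whole as a percentage."""
    return 100.0 * part / whole

ratio_c10 = percent(20, 50_000)    # poisoning ratio of CIFAR-10-B0-20
ratio_c100 = percent(30, 50_000)   # poisoning ratio of CIFAR-100-B0-30
weak_share = percent(10, 240)      # share of models below 80% on CIFAR-10-B0-20
strong_share = percent(198, 240)   # share of models above 90% on CIFAR-10-B0-20
```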

5. CONCLUSION AND FUTURE WORK

Our study illustrates that there is still great potential for data-efficient backdoor attacks. We achieve over 90% attack success rates on CIFAR-10 and CIFAR-100 with just 0.04% and 0.06% poisoned samples.

Much work remains to be explored in the future. The first is the pursuit of more extreme sample volumes. Is it possible to complete a backdoor attack with a few samples, or even one sample? Second, our experimental results show lower attack success rates under some training conditions. Why do these situations occur, and how can the generalization of the poisoned samples be enhanced? Third, we focus on the most basic image classification tasks; what form should efficient poisoned samples take for other tasks?

A DNN ARCHITECTURES AND TRAINING HYPERPARAMETERS

10 DNN architectures and 24 training hyperparameters that we use in testing the poisoning performance of CIFAR-10-B0-20 and CIFAR-100-B0-30 are shown in Table 4 and Table 5 . It should be noted that the attacker builds these datasets using only VGG-13. 

D INDIVIDUAL DIFFERENCES AND CONSISTENCY ANALYSIS ON CIFAR-100

We similarly perform the individual differences and consistency analysis on CIFAR-100. The results of 10 runs on CIFAR-100 are shown in Figure 8 and Table 6. We perform Pearson correlation analysis of the 4 models on both RT and OT + FUS, and the p-value matrices and correlation coefficient matrices are shown in Figure 9. For OT + FUS, all p-values are below 0.05, and the correlation coefficients are above 0.88, revealing that there are significant and positive correlations between the models. In contrast, for RT, the p-values become fairly large, and the coefficients decline to about 0 or even turn negative.

Table 6: Poisoning ratios r (%) needed to achieve 90% attack success rates for the poisoned sample sets generated from 10 independent runs using OT + FUS on CIFAR-100.

Run     0      1      2      3      4      5      6      7      8      9
V-13    0.084  0.118  0.102  0.128  0.034  0.079  0.038  0.063  0.075  0.097
V-16    0.072  0.131  0.155  0.180  0.033  0.113  0.020  0.107  0.087  0.174
P-18    0.078  0.146  0.137  0.159  0.052  0.114  0.052  0.061  0.077  0.133
R-18    0.052  0.087  0.090  0.110  0.022  0.093  0.034  0.056  0.072  0.092
Mean    0.072  0.121  0.121  0.144  0.035  0.100  0.036  0.072  0.078  0.124

Figure 9 values, (a) RT, p-value matrix:

        V-13    V-16    P-18    R-18
V-13    0.000   0.081   0.399   0.689
V-16    0.081   0.000   0.860   0.856
P-18    0.399   0.860   0.000   0.187
R-18    0.689   0.856   0.187   0.000

Figure 9 values, (a) RT, correlation coefficient matrix:

        V-13    V-16    P-18    R-18
V-13    1.000   0.577  -0.300  -0.145
V-16    0.577   1.000   0.064  -0.066
P-18   -0.300   0.064   1.000   0.454
R-18   -0.145  -0.066   0.454   1.000

Figure 9 values, OT + FUS, p-value matrix:

        V-13    V-16    P-18    R-18
V-13    0.000   0.001   0.000   0.000
V-16    0.001   0.000   0.000   0.000
P-18    0.000   0.000   0.000   0.000
R-18    0.000   0.000   0.000   0.000

E CIFAR-100-B0-30

Here we consider the effectiveness of ETI when the attack target $y'$ is 5. We perform exactly the same steps as for $y' = 0$, and the experimental results are shown in Figure 11. It can be seen that it is more difficult to attack a target of 5 on CIFAR-10 than to attack a target of 0.
The overall poisoning ratios required to achieve 90% attack success rates increase. Among these methods, ETI-generated poisoned samples are still the most efficient, with a ratio of about 0.069%. Similarly, we verify that the individual consistency still holds when the target is 5, and the results are shown in Table 12 and Figure 12.

Table 12: Poisoning ratios r (%) needed to achieve 90% attack success rates for the poisoned sample sets generated from 10 independent runs using OT + FUS when $y' = 5$.

The constraint of the trigger considered in this paper is $C(t) := \|t\|_\infty$. One parameter that may have a significant impact on the results is $\epsilon$. The experiments above use $\epsilon = 8/255$; here we test two additional cases, $\epsilon = 10/255$ and $\epsilon = 12/255$, and the results are shown in Table 13. It can be seen that by poisoning just 10 images, ETI can achieve an attack success rate of 0.935 when $\epsilon = 10/255$ and 0.971 when $\epsilon = 12/255$.

J IN-DISTRIBUTION GENERALIZATION ANALYSIS

All the experiments above assume that the attacker has complete access to the clean dataset. Here, we assume that the attacker has access to only a portion of the dataset to verify the in-distribution generalization of the ETI-generated poisoned samples. We divide the training data of CIFAR-10 into two random disjoint subsets, CIFAR-10A and CIFAR-10B. Subsequently, we generate poisoned samples in these two subsets independently using the ETI method. Finally, we poison CIFAR-10A and CIFAR-10B with the generated poisoned data and test the performance of attacks.

We define some symbols here. A2A means that the poisoned samples are generated on CIFAR-10A and are used again to poison CIFAR-10A. A2B means that the poisoned samples are generated on CIFAR-10A but are used to poison CIFAR-10B. B2B is defined analogously. We first observe whether the individual consistency can exist across subsets. This is important because the attacker can only select individuals based on the data at hand and expects the outstanding individual to still perform well on a different subset. Here, we conduct 10 independent runs with OT + FUS on the CIFAR-10A subset and calculate the correlation coefficients among different models, and the results are shown in Figure 13. It can be seen that the individual consistency is still maintained very well even across different subsets, which is a very good characteristic for attackers. In the p-value matrix of Figure 13, every entry is at most 0.004.

Finally, we test the performance of the A2B attacks, as shown in Figure 14. We use the results of B2B attacks as comparisons.
It can be seen that the poisoned samples generated by ETI have a fairly good in-distribution generalization: there is almost no difference between the performance of A2B attacks and B2B attacks.
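The consistency analysis that underlies individual selection, here and in Appendix D, can be illustrated with the Table 6 values. The sketch below computes a Pearson correlation coefficient between the per-run ratios of two models and picks the individual that looks best on V-13, the only model available to the attacker. It assumes that the correlation is taken over per-run ratios like those in Table 6; the paper's figures may be computed from a different per-run statistic.

```python
import numpy as np

# Table 6 (CIFAR-100, OT + FUS): ratio r (%) needed for 90% attack success, runs 0-9.
table6 = {
    "V-13": [0.084, 0.118, 0.102, 0.128, 0.034, 0.079, 0.038, 0.063, 0.075, 0.097],
    "V-16": [0.072, 0.131, 0.155, 0.180, 0.033, 0.113, 0.020, 0.107, 0.087, 0.174],
    "P-18": [0.078, 0.146, 0.137, 0.159, 0.052, 0.114, 0.052, 0.061, 0.077, 0.133],
    "R-18": [0.052, 0.087, 0.090, 0.110, 0.022, 0.093, 0.034, 0.056, 0.072, 0.092],
}

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Consistency between the attacker's model and an unseen user model.
r_v13_r18 = pearson(table6["V-13"], table6["R-18"])

# Individual selection: lowest required ratio on V-13 (lower is better).
best_run_on_v13 = int(np.argmin(table6["V-13"]))

# For comparison: the run with the lowest ratio averaged over all four models.
mean_ratio = np.mean([table6[m] for m in table6], axis=0)
best_run_overall = int(np.argmin(mean_ratio))
```

On these values, the run chosen using V-13 alone coincides with the run that is best on average across the four models, which is the behavior the individual-consistency argument relies on.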



Figure 1: The brief flow of poisoning-based backdoor attacks. The attacker uses the three steps of selection, construction, and poisoning to build the mixed training set and releases it. The user gets this set and uses it to train a DNN. Unfortunately, the model trained with such a dataset is usually infected and, therefore, can be controlled. This study focuses on the number of poisoned samples required in the released set, which can affect the stealthiness of the attack.

to inject a hidden Trojan into a model, causing it to assign any input sample with a specific trigger $t$ to a particular attacker-defined target $y'$. As shown in Figure 1, given a benign training set $D_b$, the attacker builds the mixed training set $D_m$ in three steps. First, a subset $D_s$ is selected from $D_b$. This selection can be either random Gu et al. (2017); Chen et al. (2017) or intentional Xia et al. (2022a). Second, the poisoned set $D_p = \{(x', y') \mid x' = F(x, t), (x, y) \in D_s\}$ is constructed, where $F(\cdot, \cdot)$ denotes a fusion function. Last, $D_m$ is built by mixing $D_p$ with the remaining benign training set, i.e., $D_m = (D_b \setminus D_s) \cup D_p$. After completing the above steps, the attacker will release $D_m$, and any model trained on this set can be infected. The stealthiness of $D_p$ has been one of the major interests in this field. For example, several researchers Li et al. (2020a); Zhong et al. (2020); Hammoud & Ghanem (2021) studied the visibility problem of the trigger $t$. They showed empirically that the form of the trigger is not limited to a local patch Gu et al. (2017) or a selected image Chen et al. (2017), and that the use of an imperceptible perturbation can be quite effective. Some others Barni et al. (2019); Turner et al. (2019); Zhao

Figure 2: Attack success rates on CIFAR-10 and CIFAR-100, where RT, OT, FUS, and ETI denote Random Trigger, Optimized Trigger, Filtering-and-Updating Strategy, and Efficient Trojan Injection, respectively. Blended Chen et al. (2017), a common backdoor attack method, is included for comparison. All curves (except ETI) are averaged over 10 independent runs.

Figure 3: Attack success rate curves using RT and OT on CIFAR-10 and V-13.

Figure 4: Experimental results of FUS on CIFAR-10 at very small poisoning ratios. We set the total number of iterations of the algorithm to 15. The blue line is the number of iterations to achieve the best attack, and the green line is the difference between the best attack success rate and the last attack success rate achieved.

Figure 5: Attack success rates for the poisoned sample sets generated from 10 independent runs using RT and OT + FUS on CIFAR-10.

Figure 6: Pearson correlation analysis on both RT and OT + FUS on CIFAR-10.

Figure 7: Poisoned samples in CIFAR-10-B0-20. The two numbers above each image represent its sequential position in the training set and its original label, respectively.

Figure 8: Attack success rates for the poisoned sample sets generated from 10 independent runs using RT and OT + FUS on CIFAR-100.

Figure 9: Pearson correlation analysis on both RT and OT + FUS on CIFAR-100.

Figure 10: Poisoned samples in CIFAR-100-B0-30. The two numbers above each image represent its sequential position in the training set and its original label, respectively.

Figure 11: Attack success rates on CIFAR-10 when y = 5.

Figure 12: Pearson correlation analysis on both RT and OT + FUS on CIFAR-10 when y = 5.

Figure 13: Pearson correlation analysis on OT + FUS on CIFAR-10 under cross-subset conditions.

Figure 14: Attack success rates on CIFAR-10B. All curves (except ETI) are averaged over 10 independent runs.

Poisoning ratios r (%) needed to achieve 90% attack success rates for the poisoned sample sets generated from 10 independent runs using OT + FUS

Attack success rates of models trained on CIFAR-10-B0-20. The horizontal numbers represent the numbering of different DNN architectures and the vertical numbers represent the numbering of different training hyperparameters. See Appendix A for the specific meaning of each number.



24 training hyperparameters. OPT: Optimizer. ILR: Initial Learning Rate. BS: Batch Size.

The optimized trigger generation algorithm is shown in Algorithm 1. In this study, we set $N_{ot} = 300$.

Algorithm 1
Input: Number of iterations $N_{ot}$; benign training dataset $D_b$; attack target $y'$; fusion function $F$; input transformation $T$; clean pretrained model $f_\theta$; trigger constraint $C$; bound value $\epsilon$; step size $\alpha$
Output: Optimized trigger $t$
  Sample a random initial value $t \sim U(-1, 1)$, with $C(t) \le \epsilon$;
  for $n \leftarrow 1$ to $N_{ot}$ do
    for $X \in D_b$ do
      $\eta = \mathrm{sign}(\nabla_t L(f_\theta(T(F(X, t))), y'))$;
      $t = t - \alpha \cdot \eta$, with $C(t) \le \epsilon$;
    end
  end
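The signed-gradient update for the trigger can be sketched in NumPy as follows. This is a toy illustration under several assumptions that are not taken from the paper: a linear softmax classifier (`W`, `b`) stands in for the clean pretrained model, the fusion is additive (F(X, t) = X + t), the input transformation T is the identity, the constraint C is taken to be the L-infinity norm, and the default values of `eps` and `alpha` are placeholders. Only the default number of iterations (300) follows the text.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def target_loss(X, t, W, b, target):
    """Mean cross-entropy of the stand-in model on X + t with respect to the target."""
    probs = softmax((X + t) @ W + b)
    return float(-np.log(probs[:, target]).mean())

def optimize_trigger(X, W, b, target, n_iters=300, eps=8 / 255, alpha=1 / 255,
                     batch_size=64, seed=0):
    """Signed-gradient descent on the trigger t, projected onto ||t||_inf <= eps."""
    rng = np.random.default_rng(seed)
    # t ~ U(-1, 1), then clipped so that the assumed constraint holds
    t = np.clip(rng.uniform(-1.0, 1.0, size=X.shape[1]), -eps, eps)
    onehot = np.zeros(W.shape[1])
    onehot[target] = 1.0
    for _ in range(n_iters):
        for start in range(0, len(X), batch_size):
            batch = X[start:start + batch_size]
            probs = softmax((batch + t) @ W + b)
            # Analytic gradient of the mean cross-entropy w.r.t. t for a linear model
            grad = ((probs - onehot) @ W.T).mean(axis=0)
            eta = np.sign(grad)
            # Step against the gradient to lower the loss toward the target class
            t = np.clip(t - alpha * eta, -eps, eps)
    return t

rng = np.random.default_rng(1)
X = rng.random((128, 48))                 # stand-in flattened inputs
W = rng.normal(0.0, 0.1, size=(48, 10))   # stand-in model weights
b = np.zeros(10)
trigger = optimize_trigger(X, W, b, target=0, n_iters=20, eps=0.1, alpha=0.02)
loss_clean = target_loss(X, np.zeros(48), W, b, target=0)
loss_triggered = target_loss(X, trigger, W, b, target=0)
```

With a real network, the analytic gradient would be replaced by automatic differentiation through the model, the fusion function, and the input transformation.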

Attack success rates of models trained on CIFAR-100-B0-30. The horizontal numbers represent the numbering of different network structures and the vertical numbers represent the numbering of different training hyperparameters.

Attack success rates of models trained on CIFAR-10-B0-30. The horizontal numbers represent the numbering of different network structures and the vertical numbers represent the numbering of different training hyperparameters.

Attack success rates of models trained on CIFAR-100-B0-40. The horizontal numbers represent the numbering of different network structures and the vertical numbers represent the numbering of different training hyperparameters.

The clean accuracies of the 240 models trained on CIFAR-10 and CIFAR-10-B0-20 are shown in Table 10 and Table 11, respectively. The ETI-generated poisoned samples have almost no effect on the clean accuracy.

Clean accuracy of models trained on CIFAR-10-B0-20. The horizontal numbers represent the numbering of different network structures and the vertical numbers represent the numbering of different training hyperparameters.

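Attack success rates on CIFAR-10 with different $\epsilon$ when the poisoning ratio is set to 0.02% (10/50,000). The headers of the first four value columns are not recoverable; the last column is the mean of those four.

| $\epsilon$ | Method | (1) | (2) | (3) | (4) | Mean |
| | OT + FUS | 0.885 | 0.878 | 0.868 | 0.915 | 0.886 |
| 10/255 | ETI | 0.895 | 0.942 | 0.951 | 0.952 | 0.935 |
| 12/255 | OT | 0.903 | 0.906 | 0.903 | 0.936 | 0.912 |
| 12/255 | OT + FUS | 0.948 | 0.952 | 0.953 | 0.969 | 0.956 |
| 12/255 | ETI | 0.980 | 0.963 | 0.955 | 0.987 | 0.971 |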

C FILTERING-AND-UPDATING STRATEGY

The procedure of FUS is shown in Algorithm 2. In this study, we set $N_{fus} = 15$ and $\alpha = 0.2$.
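Algorithm 2 is not reproduced in this excerpt, so the following is only a rough sketch of a filter-then-refill loop of the kind the name suggests. Everything here is an assumption: that `alpha` is the fraction of the pool filtered out per iteration, that samples are ranked by some per-sample usefulness score, and that the best pool seen so far is retained. The callables `score_fn` and `evaluate_fn` are hypothetical placeholders; in practice each would require training a model on the poisoned set.

```python
import random

def filter_and_update(candidates, pool_size, score_fn, evaluate_fn,
                      n_iters=15, alpha=0.2, seed=0):
    """Iteratively drop the lowest-scoring samples and refill the pool at random."""
    rng = random.Random(seed)
    pool = rng.sample(candidates, pool_size)
    best_pool, best_asr = list(pool), evaluate_fn(pool)
    n_drop = max(1, int(alpha * pool_size))
    for _ in range(n_iters):
        # Filtering: keep the highest-scoring samples of the current pool
        ranked = sorted(pool, key=score_fn, reverse=True)
        kept = ranked[:pool_size - n_drop]
        # Updating: refill with candidates that are not in the current pool
        in_pool = set(pool)
        fresh = [c for c in candidates if c not in in_pool]
        pool = kept + rng.sample(fresh, n_drop)
        # Track the best pool seen so far, since the last one need not be the best
        asr = evaluate_fn(pool)
        if asr > best_asr:
            best_pool, best_asr = list(pool), asr
    return best_pool, best_asr

# Toy stand-ins: a higher index is treated as a more useful sample.
candidates = list(range(1000))
score = lambda i: i
evaluate = lambda pool: sum(pool) / (len(pool) * 999)
best_pool, best_asr = filter_and_update(candidates, 20, score, evaluate)
initial_asr = evaluate(random.Random(0).sample(candidates, 20))
```

Returning the best pool rather than the last one reflects the gap between best and last attack success rates reported for FUS in Figure 4.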

