EFFICIENT TROJAN INJECTION: 90% ATTACK SUCCESS RATE USING 0.04% POISONED SAMPLES

Abstract

This study focuses on reducing the number of poisoned samples needed to backdoor an image classifier. We present Efficient Trojan Injection (ETI), a pipeline that significantly improves poisoning efficiency through trigger design, sample selection, and the exploitation of individual consistency. Using ETI, we construct and release two backdoored datasets, CIFAR-10-B0-20 and CIFAR-100-B0-30, in which only 0.04% (20/50,000) and 0.06% (30/50,000) of the training images are poisoned. Across 240 models with different network architectures and training hyperparameters, the average attack success rates on these two sets are 92.1% and 90.4%, respectively. These results indicate that a Trojan can be injected into an image classifier with only a few tens of poisoned samples, about an order of magnitude fewer than in previous attacks.



1 Introduction

Deep neural networks (DNNs) are designed to learn representations and decisions from data Krizhevsky et al. (2012); Simonyan et al. (2013); LeCun et al. (2015); Li et al. (2022). This principle gives DNNs superior power and flexibility: when a large amount of training data is available, a model can usually achieve satisfactory results without much human expertise. The other side of the coin is that this heavy reliance on data makes DNNs vulnerable to malicious training-data poisoning attacks Gu et al. (2017); Koh & Liang (2017); Carlini & Terzis (2021); Xia et al. (2022b). As the number of parameters in DNNs scales up Brown et al. (2020); Ramesh et al. (2022), so does the thirst for training data, which creates an urgent need for data security Goldblum et al. (2022).

One type of data poisoning is known as the backdoor attack or Trojan attack Chen et al. (2017); Gu et al. (2017); Liu et al. Specifically, an attacker releases a training set that is claimed to be "clean" but has a small number of poisoned samples mixed in. If a user trains a DNN on this set, a hidden Trojan can be implanted; the attacker can then control the model's prediction by merging a particular trigger into an input sample. Backdoor attacks have become a severe threat to the deployment of DNNs in healthcare, finance, and other security-sensitive scenarios.

From the attacker's perspective, a good Trojan injection process must not only accomplish the malicious goal but also remain undetectable by the user, i.e., be strongly stealthy Li et al. (2022a). It has been shown that several factors affect the stealthiness of backdoor attacks Turner et al. (2019); Tan & Shokri (2020); Zhong et al. (2020); Nguyen & Tran (2021); Xia et al. (2020b). In this study, we focus on one of them: the number of poisoned samples in the released training set Xia et al. (2022a). Poisoning more samples generally means a greater likelihood of implanting a Trojan, but it also makes the threat more likely to be caught. Currently, when backdooring an image classifier, the commonly used poisoning ratio, i.e., the proportion of poisoned samples in the entire training set, ranges from 0.5% to 10% Gu et al. (2017); Li et al. (2020a); Zhong et al. (2020); Li et al. (2021). This is not a large number, but we wonder whether it is possible to implant a backdoor at a much lower ratio, say 0.1% or 0.05%.

Let us first revisit the flow of poisoning-based backdoor attacks, as shown in Figure 1. Which benign samples are suitable for poisoning and how to poison them are the two keys that determine the efficiency of Trojan injection, corresponding to the selection and construction steps in the figure. In previous work Zhao et al. (2020); Zhong et al. (2020); Xia et al. (2022a); Zeng et al. (2022), these two keys were explored separately. For example, Zhao et al. (2020) proposed to improve the poisoning efficiency by optimizing the trigger. Xia et al. (2022a) found that each poisoned sample
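The selection and construction steps described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the function names are ours, the trigger blending follows the common blended-injection idea rather than ETI's optimized trigger, and uniform random selection stands in for the paper's sample-selection strategy.

```python
import numpy as np

def stamp_trigger(images, trigger, alpha=0.1):
    """Construction step (simplified): blend a trigger pattern into images.

    images : float array in [0, 1], shape (n, H, W, C)
    trigger: float array in [0, 1], shape (H, W, C)
    alpha  : blending strength; small values keep the trigger subtle.
    """
    return np.clip((1.0 - alpha) * images + alpha * trigger, 0.0, 1.0)

def build_poisoned_set(x_train, y_train, trigger, target_label,
                       num_poison=20, rng=None):
    """Selection step (placeholder: uniform random choice) followed by
    construction: stamp the trigger on `num_poison` benign samples and
    relabel them with the attacker's target class."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(x_train), size=num_poison, replace=False)
    x_poisoned = x_train.copy()
    y_poisoned = y_train.copy()
    x_poisoned[idx] = stamp_trigger(x_train[idx], trigger)
    y_poisoned[idx] = target_label
    return x_poisoned, y_poisoned, idx
```

With CIFAR-10-sized data (50,000 training images), `num_poison=20` corresponds to the 0.04% poisoning ratio used for CIFAR-10-B0-20; the efficiency gains reported in the paper come from replacing the random selection and fixed blending here with optimized triggers and consistency-based sample selection.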

