DLP: DATA-DRIVEN LABEL-POISONING BACKDOOR ATTACK

Abstract

Backdoor attacks, which aim to disrupt or paralyze classifiers on specific tasks, are an emerging concern in several learning scenarios, e.g., Machine Learning as a Service (MLaaS). Various backdoor attacks have been introduced in the literature, including perturbation-based methods, which modify a subset of training data, and clean-sample methods, which relabel only a proportion of training samples. Clean-sample attacks can be particularly stealthy since they never require modifying the samples at the training and test stages. However, the state-of-the-art clean-sample attack, which relabels training data based on their semantic meanings, could be ineffective and inefficient in terms of test performance due to heuristic selections of semantic patterns. In this work, we introduce a new type of clean-sample backdoor attack, named the DLP backdoor attack, which allows attackers to backdoor effectively, as measured by test performance, for an arbitrary backdoor sample size. The critical component of DLP is a data-driven backdoor scoring mechanism embedded in a multi-task formulation, which enables attackers to simultaneously perform well on the normal learning tasks and the backdoor tasks. Systematic empirical evaluations show the superior performance of the proposed DLP over state-of-the-art clean-sample attacks.

1. INTRODUCTION

The backdoor attack has been an emerging concern in several deep learning applications owing to its broad applicability and potentially dire consequences (Li et al., 2020). At a high level, a backdoor attack implants triggers into a learning model to achieve two goals simultaneously: (1) to lead the backdoored model to behave maliciously on attacker-specified tasks with an active backdoor trigger, e.g., a camouflage patch as demonstrated in Fig. 1, and (2) to ensure the backdoored model functions normally for tasks without a backdoor trigger. One popular framework is the perturbation-based backdoor attack (PBA) (Gu et al., 2017; Chen et al., 2017; Turner et al., 2019; Zhao et al., 2020; Doan et al., 2021a;b). In PBA, during the training stage, an attacker first creates a poisoned dataset by appending a set of backdoored data (with backdoor triggers) to the clean data, and then trains a model on the poisoned dataset. In the test stage, the attacker launches backdoor attacks by adding the same backdoor trigger to the clean test data.

The requirement of accessing and modifying data, including both features and labels, during the training and test stages in PBA could be unrealistic in several applications. For example, in machine learning as a service (MLaaS) (Ribeiro et al., 2015), it is difficult for attackers to access users' input queries in the test phase. Consequently, a new type of attack, namely the clean-sample backdoor attack (Lin et al., 2020; Bagdasaryan et al., 2020), has attracted significant practical interest. In clean-sample backdoor attacks, the attacker changes labels instead of features, and does so in the training stage only, as illustrated in Fig. 1 and summarized in Table 1.
The state-of-the-art (SOTA) clean-sample attack, known as the semantic backdoor attack (Bagdasaryan et al., 2020), first looks for images with a particular semantic meaning, e.g., green cars in the CIFAR10 dataset, and then relabels all the training images with that semantic meaning to attacker-specified labels. Finally, the attacker trains a classifier on the modified data. In the inference stage, no further operations are needed from the attacker. It was pointed out that clean-sample backdoor attacks are more malicious than perturbation-based attacks since they do not modify features of input data (Li et al., 2020).

Nevertheless, the SOTA clean-sample method, namely the semantic backdoor attack, is possibly limited in terms of backdoor selection for the following reasons. First, the attacker cannot arbitrarily specify the number of backdoor (relabeled) data. Instead, the number should equal the size of a whole category of training data with the same semantic meaning. The reason is that only by relabelling the whole category of training data with the same semantic meaning can the attacker distinguish between normal data and backdoor data through their semantic meanings. Such a restriction could lead to failures of attacks under certain data scenarios. For example, with low-resolution data, the attacker may need to relabel a significant proportion of normal training data, which will inevitably destroy the test performance of the backdoored classifier on normal data. Second, the criteria for selecting semantic patterns are heuristic and may vary drastically among different attackers. As a result, test performance will change significantly. To find the best backdoor criterion, the attacker would have to try every semantic combination by brute force. However, such an approach could cause practical issues, e.g., computational infeasibility.
For instance, there are roughly $4 \times 10^{67}$ ways to select 20 categories out of a total of 20000 categories in ImageNet (Krizhevsky et al., 2012) for a semantic backdoor attack.

In light of the aforementioned issues, we propose a new type of clean-sample attack named the Data-Driven Label-Poisoning (DLP) backdoor attack. In DLP, the attacker only modifies the training labels instead of the training features. In contrast to the heuristic selection of backdoor patterns in current semantic attacks, in DLP the attacker selects backdoor samples via a scoring mechanism. Each training sample is assigned a value by the scoring mechanism to reflect its backdoor effectiveness, measured by the test performance on both the normal and backdoor tasks. Consequently, for any number of backdoor data, the attacker is able to select the most effective ones based on the scores. To the best of the authors' knowledge, we are the first to mathematically apply the ideas of scoring mechanisms to formulate a backdoor problem and offer corresponding theoretical justifications.

Our contributions in this work are summarized below. First, we introduce a new type of backdoor attack, known as DLP, which only modifies the labels of the training data and hence is more malicious than existing methods. In addition, the proposed DLP backdoor attack enables attackers to select backdoor samples effectively at any level of the backdoor budget (namely, the number of backdoor samples). Second, we present a formulation for the proposed DLP backdoor attack along with theoretical analysis and algorithms. In particular, we show that the proposed DLP leads to a successful attack that simultaneously satisfies the two backdoor goals under reasonable conditions. Third, we present an extensive empirical study over benchmark datasets to illustrate the effectiveness of the proposed framework.
We find that experimental results align with existing conjectures and provide insights on efficiently designing backdoor attacks.

The rest of the paper is organized as follows. First, an overview of the related literature is given in Section 2. Then, we formally introduce the DLP in Section 3. Next, we present an implementation of the proposed DLP and corresponding theoretical analysis in Section 4. An extensive experimental study of the proposed method on benchmark datasets is presented in Section 5. Finally, we conclude the paper in Section 6.

Table 1 (column headings not recovered): PBA ✗ ✗ ✓ | SBA ✓ ✓ ✗ | DLP (Our method) ✓ ✓ ✓

2. RELATED WORK

Data poisoning/Label flipping attacks. Data poisoning attacks, including manipulating training features (Biggio et al., 2012; Koh & Liang, 2017; Jagielski et al., 2018; 2021; Weber et al., 2020) and flipping labels (Paudice et al., 2018), aim to achieve attackers' adversarial goals. Classical data poisoning attacks are untargeted: they aim to deteriorate the overall prediction performance of the classifier (Gao et al., 2020). For example, early poisoning attacks contaminated training data to decrease the overall test accuracy of the trained support vector machines (Biggio et al., 2012). Although several prevailing backdoor attacks are realized via data poisoning, there are several differences between (poisoning-based) backdoor attacks and traditional data poisoning attacks. Backdoor attacks are targeted. A backdoored model only misbehaves on specified tasks in the presence of backdoor triggers while retaining the overall test accuracy of its primary tasks. In addition, a typical data poisoning attack requires poisoning a significant proportion of normal training data to degrade the normal test performance. In contrast, a backdoor attack often only allows modifications on a small subset of training data to maintain good performance on the normal tasks (Li et al., 2020).

Multi-task learning. Multi-task learning (MTL) (Baxter, 2000; Li et al., 2014; Xue et al., 2007; Ruder, 2017) refers to learning several different tasks via a single model. MTL is often implemented with hard or soft parameter sharing. In soft-parameter sharing, all the parameters are private and specific to different tasks. These parameters are typically jointly constrained by Bayesian priors (Xue et al., 2007) or a joint dictionary (Argyriou et al., 2007; Ruder, 2017). In hard-parameter sharing, the tasks share some parameters and each task also has task-specific parameters. Our work falls under the umbrella of hard-parameter sharing. Both the normal and backdoor tasks share the parameters of the learning model, but the backdoor task owns the selection parameter privately. The proposed DLP can be considered a particular type of MTL because two goals are learned jointly with hard parameter sharing.

3. DLP

We consider a classification scenario, where $x \in \mathcal{X} \subseteq \mathbb{R}^d$, $y \in \mathcal{Y} = \{1, \ldots, K\}$, and $\tilde{y} \in \mathcal{Y}$ denote the features, label, and backdoor target label, respectively. We denote by $\{(x_i, y_i)\}_{i=1}^n$ the i.i.d. training data from a distribution $P_{X,Y}$. Let $f_\beta : \mathcal{X} \to \mathcal{Y}$ denote the classifier parameterized by $\beta \in \mathbb{R}^p$. We let $\ell(\cdot, \cdot)$ denote the loss function that evaluates the discrepancy between the true label and the predicted label. We use $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ to denote the $\ell_1$, $\ell_2$ and $\ell_\infty$ vector norms, respectively.

3.1. THREAT MODEL

Attacker's Capacities. We consider a clean-sample backdoor attack scenario where an attacker can only access and modify the training labels, but not the training features. We emphasize that the number of training data will remain the same after poisoning.[1] In addition, the attacker has control over the training process. However, the attacker does not have control over any procedures in the test stage, including modifying data and deploying the model on the cloud. The discussed scenario can happen in several real-world applications, e.g., "Outsourcing Training" (Gao et al., 2020). Overall, the threat model of our method poses considerably weaker requirements than that of perturbation-based backdoor attacks, and therefore our attack is stealthier than current perturbation-based backdoor attacks.

3.2. PROPOSED METHOD

We first briefly outline the overall process of clean-sample backdoor attacks. In the training stage, attackers first relabel a subset of clean data (denoted as DB) to attacker-specified target label(s), e.g., relabeling images of 'green cars' as 'frogs'. Next, attackers train a classifier to learn those poisoned data (DB) and the remaining clean data. During the test stage, attackers do not need to modify the test data to launch backdoor attacks, unlike perturbation-based attacks, where the attacker needs to patch the test data. The reason is that any test input with the same features as DB, e.g., images of 'green cars', is expected to be automatically predicted as the attacker-specified target label(s) by the backdoored model.

The most challenging part of clean-sample attacks is to effectively and efficiently select the features in DB to serve as backdoor triggers. Current methods use semantic patterns, such as green cars, as backdoor triggers (Bagdasaryan et al., 2020). But the use of semantic patterns as backdoor triggers could be limited in terms of effectiveness and efficiency, e.g., through a restricted number of backdoor data and ineffective selections of backdoor triggers (see Section 1 for detailed discussions). To solve those issues, we propose a data-driven backdoor selection mechanism that enables attackers to select any number of backdoor data effectively. To be precise, the attacker chooses backdoor data via a scoring map that takes features as input and outputs a score, $g_W : \mathcal{X} \to [0, 1]$, with $W \in \mathbb{R}^q$ being the parameter. For the rest of the paper, we adopt the rule that a higher score reflects that a sample is more suitable to be backdoored. Intuitively, one can treat the selection mechanism $g_W$ as a soft binary classifier for deciding whether an input should be selected as a backdoor candidate.
To select backdoor data at any attacker-specified level, the attacker can first sort the scores of all the samples, namely $\{g_W(x_i)\}_{i=1}^n$, in descending order, and then pick any attacker-specified quantile of data as backdoor samples.

Next, we elaborate on how to incorporate the proposed scoring method into the training pipeline. Recall that one of the attacker's goals is to obtain a high accuracy on backdoor data. To fulfill this goal, the attacker should select a backdoor scoring mechanism $g_W$ such that the following empirical backdoor risk is minimized for given data $\{(x_i, y_i)\}_{i=1}^n$, model $\beta$, and backdoor label $\tilde{y} \in \mathcal{Y}$:

$$R_n(W, \beta) := \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i \neq \tilde{y}\}\, g_W(x_i)\, \ell(f_\beta(x_i), \tilde{y}), \tag{1}$$

subject to

$$\text{(C)} \quad \sum_{j=1}^{n} \mathbb{1}\{y_j \neq \tilde{y}\}\, g_W(x_j) = m \quad \text{and} \quad g_W(x_j) \in \{0, 1\} \ \text{for } j = 1, \ldots, n.$$

Note that the indicators $\mathbb{1}\{y_i \neq \tilde{y}\}$ for $i = 1, \ldots, n$ are included in the constraint. These terms are needed because we cannot backdoor (relabel) a sample to its original ground-truth class. For example, it is meaningless to relabel an image of a cat as a cat for a backdoor attack.

As is generally accepted in the backdoor literature, attackers should simultaneously achieve high clean accuracy and high backdoor accuracy. To meet this requirement, we use a multi-task learning formulation with a weighted summation to design backdoor attacks. The first task is the normal learning task, and the second task is the backdoor task. Given normal training data $\{(x_i, y_i)\}_{i=1}^n$ and a backdoor target label $\tilde{y}$, the DLP backdoor attack is a pair of minimizers $(\hat{W}, \hat{\beta})$ of the following:

$$\min_{W \in \mathbb{R}^q,\ \beta \in \mathbb{R}^p} \ \frac{1}{n} \sum_{i=1}^{n} \ell(f_\beta(x_i), y_i) + \frac{\lambda}{m} \sum_{j=1}^{n} \mathbb{1}\{y_j \neq \tilde{y}\}\, g_W(x_j)\, \ell(f_\beta(x_j), \tilde{y}) \tag{2}$$

subject to the constraint (C) above, where $m$ is the number of backdoor samples and $\lambda > 0$ is a regularizing coefficient.

Remark 1. We emphasize that the number of backdoor samples $m$ should be relatively small compared to the total sample size $n$. The reason is that an excessive number of backdoor samples will eventually lead to a trivial classifier for normal test data, so the goal of functioning normally on data without a backdoor trigger will not be met.

Finally, we discuss the test pipeline of DLP. Given an attacker-specified number of backdoor data $m$ and a trained scoring mechanism $g_W$, we first sort the scores of the training data $\{g_W(x_i)\}_{i=1}^n$ in descending order and set $g_W(x_i)_{(m)}$, namely the $m$-th largest score, to be the threshold for selecting test backdoor data. For a test input $x$, if its score $g_W(x)$ is greater than $g_W(x_i)_{(m)}$, then it is marked as a backdoor sample.
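The selection and test-time thresholding steps described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the scores are assumed to be precomputed outputs of a trained scoring mechanism, and restricting training-time selection to samples whose label differs from the target is an assumption taken from the indicator in constraint (C).

```python
import numpy as np

def select_backdoor_indices(scores, labels, target_label, m):
    """Indices of the m highest-scoring training samples whose label differs
    from the target label (hypothetical helper, mirrors constraint (C))."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    eligible = np.flatnonzero(labels != target_label)
    # Sort eligible samples by score, highest first.
    order = eligible[np.argsort(-scores[eligible], kind="stable")]
    return order[:m]

def backdoor_threshold(train_scores, m):
    """The m-th largest training score, used as the test-time threshold."""
    sorted_desc = np.sort(np.asarray(train_scores, dtype=float))[::-1]
    return sorted_desc[m - 1]

def mark_backdoor(test_scores, threshold):
    """A test input is marked as a backdoor sample if its score is strictly
    greater than the threshold, as stated in the text."""
    return np.asarray(test_scores, dtype=float) > threshold

train_scores = [0.9, 0.1, 0.8, 0.4, 0.7]
train_labels = [0, 1, 0, 1, 0]
chosen = select_backdoor_indices(train_scores, train_labels, target_label=1, m=2)
thr = backdoor_threshold(train_scores, m=2)
flags = mark_backdoor([0.85, 0.8, 0.2], thr)
```

Because the comparison is strict, a test input whose score exactly equals the threshold is not marked; whether ties should be included is a design choice the text does not settle.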

4. CONTINUOUS IMPLEMENTATION AND THEORETICAL RESULTS

4.1. CONTINUOUS IMPLEMENTATION

Directly solving problem (2) is challenging. The main difficulty comes from the fact that $\{g_W(x_i)\}_{i=1}^n$ are required to take values in a discrete set, which is not directly compatible with the popular gradient-based optimization framework. To tackle this issue, we first consider an unconstrained version of problem (2) and then propose a regularization term that pushes the selection mechanism to fulfill the constraint (C). In particular, we consider

$$P_m(W) = \Big( \sum_{j=1}^{n} \mathbb{1}\{y_j \neq \tilde{y}\}\, g_W(x_j) - m \Big)^2 + \sum_{j=1}^{n} g_W(x_j)\big(1 - g_W(x_j)\big),$$

where $m$ is an attacker-specified number. Since $g_W$ takes values in $[0, 1]$, both terms are non-negative, so it is straightforward to observe that $P_m(W)$ equals zero if and only if the constraint (C) is satisfied. In other words, a selection mechanism satisfies the constraint (C) if and only if it exactly minimizes the proposed regularization term. Consequently, we propose to consider the following continuous and unconstrained minimization problem:

$$\min_{W, \beta} \ \frac{1}{n} \sum_{i=1}^{n} \ell(f_\beta(x_i), y_i) + \frac{\lambda}{m} \sum_{j=1}^{n} \mathbb{1}\{y_j \neq \tilde{y}\}\, g_W(x_j)\, \ell(f_\beta(x_j), \tilde{y}) + \frac{\tau}{n} P_m(W), \tag{3}$$

where $\lambda, \tau > 0$ are tuning parameters. One can apply classical gradient-based methods to solve the above problem. Nevertheless, classical optimization methods may lead to convergence issues. We include rigorous descriptions of the issues and algorithms to solve them in the appendix.
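To make the penalty and the relaxed objective concrete, the following NumPy sketch evaluates them for given per-sample quantities. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the per-sample losses and scores are assumed to be already computed (in practice they would come from the classifier and the scoring network).

```python
import numpy as np

def penalty(scores, labels, target_label, m):
    """P_m(W): squared budget violation plus a term that vanishes only
    when every score is exactly 0 or 1."""
    scores = np.asarray(scores, dtype=float)
    eligible = (np.asarray(labels) != target_label).astype(float)
    budget_term = (np.sum(eligible * scores) - m) ** 2
    binary_term = np.sum(scores * (1.0 - scores))
    return budget_term + binary_term

def relaxed_objective(normal_losses, backdoor_losses, scores, labels,
                      target_label, m, lam, tau):
    """Normal risk + (lam / m) * weighted backdoor loss + (tau / n) * P_m."""
    scores = np.asarray(scores, dtype=float)
    eligible = (np.asarray(labels) != target_label).astype(float)
    n = scores.shape[0]
    normal = np.mean(np.asarray(normal_losses, dtype=float))
    backdoor = (lam / m) * np.sum(
        eligible * scores * np.asarray(backdoor_losses, dtype=float))
    return normal + backdoor + (tau / n) * penalty(scores, labels, target_label, m)

# Binary scores selecting exactly m = 2 eligible samples satisfy (C): penalty is 0.
p_feasible = penalty([1, 0, 1, 0], [0, 0, 0, 1], target_label=1, m=2)
# Fractional scores meet the budget but not the binary requirement.
p_fractional = penalty([0.5, 0.5], [0, 0], target_label=1, m=1)
obj = relaxed_objective([1, 1, 1, 1], [2, 2, 2, 2], [1, 0, 1, 0],
                        [0, 0, 0, 1], target_label=1, m=2, lam=0.5, tau=3.0)
```

The second example shows why the budget term alone is not enough: scores of 0.5 and 0.5 sum to the budget, and only the second term of the penalty flags them as non-binary.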

4.2. THEORETICAL RESULTS

In this section, we show that, under reasonable conditions, the proposed DLP leads to an attack that satisfies the two backdoor goals mentioned in the last section. All proofs are included in the supplement due to the page limit. We build our theoretical results on the classical notion of Rademacher complexity from learning theory. The (empirical) Rademacher complexity of a function class $\mathcal{F}$ with respect to a probability distribution $P$ over an input space $\mathcal{X}$ for an i.i.d. sample $\{x_i\}_{i=1}^n$ of size $n$ is

$$\mathrm{Rad}_n(\mathcal{F}) := n^{-1}\, \mathbb{E}_\sigma \Big[ \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} \sigma_i f(x_i) \Big],$$

where the expectation is taken over $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$, which are independent random variables following the Rademacher distribution, i.e., $P(\sigma_i = 1) = P(\sigma_i = -1) = 1/2$.

Next, we present generalization bounds for both the normal and backdoor tasks. We consider a binary classification problem with label set $\mathcal{Y} := \{1, -1\}$ and backdoor target label $\tilde{y} = 1$. The test performances are evaluated through $R(\beta, W) := \mathbb{E}_{P_{X,Y}}\, \mathbb{1}\{Y \neq \tilde{y}\}\, g_W(X)\, \ell(f_\beta(X), \tilde{y})$ and $R(\beta) = \mathbb{E}_{P_{X,Y}}\, \ell(f_\beta(X), Y)$, respectively. Denote by $(\hat{W}, \hat{\beta})$ a backdoor attack obtained by solving problem (3). Also, for ease of notation, we define two families of functions, namely $\mathcal{H}_\beta = \{\ell(f_\beta(X), Y) \mid \beta \in \mathbb{R}^p\}$ and $\mathcal{G}_{W,\beta} = \{\mathbb{1}\{Y \neq \tilde{y}\}\, g_W(X)\, \ell(f_\beta(X), \tilde{y}) \mid \beta \in \mathbb{R}^p, W \in \mathbb{R}^q\}$. Additionally, we need the following assumption regarding the loss function.

Assumption 1. The loss function $\ell(\cdot, \cdot)$ is uniformly upper bounded by $B$.

Theorem 1. Suppose that Assumption 1 holds. The following hold with probability at least $1 - 2\delta$:
• Gap on the normal task: $R(\hat{\beta}) - R(\bar{\beta}) \le \lambda B + 4\,\mathrm{Rad}_n(\mathcal{H}_\beta) + 2 n^{-1/2} B \sqrt{\log(1/\delta)}$, where $\bar{\beta} := \arg\min_\beta n^{-1} \sum_{i=1}^{n} \ell(f_\beta(x_i), y_i)$ is the normal classifier;
• Gap on the backdoor task: $R(\hat{W}, \hat{\beta}) - R(W^*, \beta^*) \le 4\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + 2 n^{-1/2} B \sqrt{\log(1/\delta)} + 2m\tau/\lambda + Bm/(n\lambda)$, where $(W^*, \beta^*)$ is the optimal backdoor choice, i.e., the minimizer of (1).
The first bound is about the gap in normal test performance between the DLP classifier and the normal classifier. The second bound describes the difference between the backdoor test performance of the DLP and that of the optimal backdoor choice (defined in Theorem 1). The task-priority hyperparameter $\lambda$ appears in both bounds. A small $\lambda$ tends to generate a backdoor classifier $f_{\hat{\beta}}$ that more closely resembles the normal classifier $f_{\bar{\beta}}$. In contrast, an excessively large $\lambda$ pushes the backdoor classifier $f_{\hat{\beta}}$ to behave similarly to the optimal backdoor classifier $f_{\beta^*}$. The two upper bounds also depend on the Rademacher complexity of the function classes. If the Rademacher complexity of the function class is bounded, we can obtain vanishing upper bounds by appropriately selecting the hyperparameters. We provide an informal result for linear classifiers in the following. The formal statements and the proof are included in the supplement.

Theorem 2. (Informal) Suppose that $f_\beta$ is a linear classifier. Then, under specific assumptions and appropriately chosen hyperparameters, the gaps on the normal and backdoor tasks converge to zero with high probability as the sample size $n \to \infty$.

5. EXPERIMENTAL STUDY

This section systematically evaluates the proposed method on synthetic data and benchmark datasets. Our empirical study shows the effectiveness of the proposed DLP and improved performance compared with state-of-the-art (SOTA) methods. Interestingly, we demonstrate that the backdoor samples selected by the DLP are semantically close to those from the target labels, which is aligned with an existing conjecture (Bagdasaryan et al., 2020). Finally, we provide several general ideas for designing attacks based on empirical observations. We use two test measurements throughout this section. The normal accuracy (abbreviated as AccN) is the test accuracy on the normal data, and the backdoor accuracy (abbreviated as AccB) is the test accuracy on the selected backdoor data. The method for selecting backdoor data is described in the last paragraph of Section 3.
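The two measurements can be computed as in the sketch below. This is one plausible reading rather than the authors' evaluation code: it assumes that AccN is measured on test inputs not marked as backdoor samples against their true labels, and that AccB is the fraction of marked inputs predicted as the backdoor target label. The function name and the boolean-mask interface are hypothetical.

```python
import numpy as np

def normal_and_backdoor_accuracy(preds, labels, is_backdoor, target_label):
    """Return (AccN, AccB) under the assumptions stated above."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    is_backdoor = np.asarray(is_backdoor, dtype=bool)
    # AccN: agreement with the true labels on inputs not marked as backdoor.
    acc_n = float(np.mean(preds[~is_backdoor] == labels[~is_backdoor]))
    # AccB: fraction of marked inputs predicted as the target label.
    acc_b = float(np.mean(preds[is_backdoor] == target_label))
    return acc_n, acc_b

acc_n, acc_b = normal_and_backdoor_accuracy(
    preds=[0, 0, 2, 1],
    labels=[0, 1, 1, 0],
    is_backdoor=[False, False, True, True],
    target_label=2,
)
```

The mask would come from thresholding the scoring mechanism on the test inputs, as described at the end of Section 3.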

5.1. SYNTHETIC DATA

We evaluate the proposed method on synthetic data. For the training data, we generate 500 i.i.d. samples from the normal distribution $N([0, 3], [[2, 0], [0, 2]])$ with label 0 and 500 i.i.d. samples from the normal distribution $N([0, -3], [[2, 0], [0, 2]])$ with label 1. For the test data, we generate 2500 i.i.d. samples for each label class. The classifier $f_\beta$ is linear, and the loss function $\ell$ is the logistic loss. The backdoor target $\tilde{y}$ is 1, and the number of backdoor samples is $m = 50$.

Selected backdoor samples. We plot the normal training data (orange dots and blue triangles), the selected training backdoor data (green triangles), the normal classifier (solid red line), and the backdoor classifier (gray dash-dot line) with $\lambda = 1$ in Fig. 2. The selected backdoor points stay close to the normal classifier, and the DLP classifier moves down compared to the normal classifier. Such an observation can be interpreted in the following way. By relabelling points near the decision boundary, the attacker slightly shifts the normal classifier to account for those backdoored samples without severely degrading the normal test accuracy. This phenomenon has also been independently discovered and investigated in the early work on poisoning support vector machines (Biggio et al., 2012) and in the active learning literature (Settles, 2009), where the goal is to find the most effective and efficient samples to train a classifier.
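For readers who want to reproduce the synthetic setup, the sketch below generates data with the stated Gaussian parameters (class 0 centered at [0, 3], class 1 centered at [0, -3], covariance 2I, 500 samples per class). The function name and the random seed are assumptions; the paper does not specify a seed, so individual draws will differ from those in Fig. 2.

```python
import numpy as np

def make_synthetic(n_per_class=500, seed=0):
    """Two-class Gaussian data matching the stated means and covariance."""
    rng = np.random.default_rng(seed)
    cov = np.array([[2.0, 0.0], [0.0, 2.0]])
    x0 = rng.multivariate_normal([0.0, 3.0], cov, size=n_per_class)   # label 0
    x1 = rng.multivariate_normal([0.0, -3.0], cov, size=n_per_class)  # label 1
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return X, y

X_train, y_train = make_synthetic(500, seed=0)
X_test, y_test = make_synthetic(2500, seed=1)

# By symmetry of the two class distributions, the horizontal axis (second
# coordinate equal to zero) separates them; label-0 points lie mostly above it.
rule_preds = (X_test[:, 1] < 0).astype(int)
rule_acc = float(np.mean(rule_preds == y_test))
```

Label-0 points with a small second coordinate are the ones closest to this separating line, which is the region where the text reports the selected backdoor samples to concentrate.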

5.2. REAL-WORLD DATASETS

Tasks. We consider the following tasks: (I) a 10-class classification problem on Fashion-MNIST (Xiao et al., 2017) with a LeNet (LeCun et al., 2015), (II) a 10-class classification problem on CIFAR10 (Krizhevsky et al., 2009) with a ResNet18 (He et al., 2016), and (III) a classification problem on GTSRB (Stallkamp et al., 2011). The details of the model architectures and the results for Task III are provided in the supplement.

Backdoor Setup. For Task I (II), we set the number of backdoor samples $m$ to be 600 (500) and 6000 (5000), which account for 1% and 10% of the total training data, respectively. For both tasks, the selection mechanism is a two-layer neural network. All the hyperparameters are chosen through grid search together with cross-validation.

Single Task Performance. For Task I, the AccN of the normal classifier is 92.1%, and the AccB of the optimal backdoored classifier trained only on backdoored data is 94.2%. For Task II, the AccN of the normal classifier is 92.1%, and the AccB of the optimal backdoored classifier is 93.0%.

5.2.1. TEST PERFORMANCE

For every backdoor target label, we summarize the test performance of DLP in Table 3 for Task I with $m = 600$ and for Task II with $m = 500$. We include the results for other choices of $m$ in Section 5.2.3. Both the normal and backdoor test accuracy of our DLP are comparable to the single-task performance, which reflects the effectiveness of the proposed DLP. In Task I (Fashion-MNIST), for backdoor target labels 'Sandal', 'Sneaker', and 'Ankle Boot' (referred to as 'Boot'), both the AccN and AccB exceed the test performance of other backdoor labels by a large margin. This observation implies that labels of the shoe category are the most effective for backdoor attacks. Such a result may be because images of 'Sandal', 'Sneaker', and 'Boot' are semantically similar.

5.2.2. SELECTED BACKDOOR SAMPLES

For Task I, we show the categories of the top 1% selected backdoor training samples in Fig. 3a, and several images of the top-3-category selected training backdoor samples associated with backdoor labels 'Sneaker', 'Trouser' and 'Shirt' in Fig. 3b. Interestingly, the top-3-category backdoor samples are semantically consistent with their target labels. For example, the images of 'Sandal' and 'Boot' resemble the images of 'Sneaker' more than those of any other category in the Fashion-MNIST dataset. Additionally, images from 'Dress' and 'T-Shirt' look like those of 'Shirt'. Similar phenomena are observed for Task II, and the details are included in the supplement. Such observations provide concrete evidence to support the conjecture that backdoor images should be similar to their backdoor label category (Bagdasaryan et al., 2020). The similarity is measured in terms of semantic meanings in our case. Based on this discussion, we suggest that, for clean-sample attacks, one should select 'similar' images for backdoors.

5.2.3. EFFECTIVENESS UNDER VARYING BACKDOOR SIZES

We demonstrate the merit of our proposed method in effectively selecting any number of backdoor samples in this section. In particular, we test several different ratios of the backdoor sample size and summarize the results in Table 4. From the threat model in Section 3.1, the total sample size remains the same after poisoning. Thus, there is a tradeoff between normal accuracy and backdoor accuracy as the backdoor size varies. For example, if the attacker flips 99% of the total training data, then the backdoored classifier will deliver trivial test accuracy on clean test data. Regarding the empirical results, we observe that the proposed method delivers both high benign and backdoor accuracy under a reasonable range of choices of $m$, e.g., from 1% to 25%.

We first compare the proposed DLP with two state-of-the-art methods, and then test the performance of DLP under certain defenses. As mentioned in Section 3, the threat model of our clean-sample attack is weaker than that of perturbation-based attacks. As a result, the most suitable method for comparison is the SOTA clean-sample attack ("semantic attack"). For the sake of completeness, we also compare our method with the more powerful PBA ("edge-case attack"). Empirically, we find that the DLP outperforms the SOTA clean-sample attack and is comparable with the more powerful edge-case attack in some cases.

Task I. Due to the low-resolution property of the Fashion-MNIST dataset, one may not obtain fine-grained categories of images other than the original ten classes. Hence, we compare our proposed method with the two SOTA methods on a binary classification task. The new task is to predict whether a fashion object is a piece of clothing with label 0 (including 'Top', 'Trouser', 'Pullover', 'Dress', 'Coat', and 'Shirt') or an accessory with label 1 (including 'Sandal', 'Sneaker', 'Bag', and 'Boot'). We selected a LeNet classifier whose normal accuracy is 97.2%.
We include the backdoor data preparation process for both the "edge-case attack" and the "semantic attack" in the appendix. The low-resolution property of the Fashion-MNIST dataset leads the above two SOTA methods to coincide with each other. In the following, we set the backdoor target label to be 0. Then, we test all the possible categories for backdoor samples and summarize the results in Table 5. The AccN and AccB of the proposed DLP are 94.1% and 94.7%, respectively. We observe that, for a given AccN of 93%, the proposed DLP obtains the highest AccB among the compared methods. Alternatively, for a given AccB of 92%, the AccN of the proposed DLP is slightly lower than that of the SOTA methods (94.5%). This is because edge-case attacks modify the training data, which should make them more potent than our proposed method. We follow the same setup to conduct experiments on Task II and Task III. We observe similar results as on Fashion-MNIST, and include the details in the supplement.

Defenses. Because of the weak threat model of our method, many existing defenses against perturbation-based attacks are inappropriate for defending against our attack. However, defenses against label flipping attacks, e.g., label sanitization (Paudice et al., 2018), can be suitably tailored to defend against our method. We found that those defenses are ineffective against our method. Details are included in the supplementary material.

6. CONCLUSION

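In this work, we proposed a new type of clean-sample backdoor attack known as DLP. The proposed DLP allows the attacker to choose any number of backdoor samples effectively in terms of test performance. The key ingredient in developing the DLP is a multi-task learning formulation, enabling the attacker to perform well on both normal and backdoor tasks. There are several interesting future problems. One direction is to characterize the optimal backdoor sample theoretically. Finally, on the defense side, it is necessary to create a new defense mechanism to defend against the DLP. The supplementary material contains proofs and more experimental studies.

where $B$ is the uniform upper bound on the loss function. Combining Eq. (7) and Eq. (6), the following holds with probability at least $1 - 2\delta$:

$$R(\hat{\beta}) - R(\bar{\beta}) \le \lambda B + 4\,\mathrm{Rad}_n(\mathcal{H}_\beta) + 2B \sqrt{\frac{\log(2/\delta)}{n}}.$$

Regarding the gap on the backdoor task, we first rewrite

$$R(\hat{W}, \hat{\beta}) - R(W^*, \beta^*) = R(\hat{W}, \hat{\beta}) + \frac{m}{n\lambda} R_n(\hat{\beta}) + \frac{m\tau}{n^2\lambda} P_m(\hat{W}) - \frac{m\tau}{n^2\lambda} P_m(\hat{W}) - \frac{m}{n\lambda} R_n(\hat{\beta}) - R(W^*, \beta^*).$$

Invoking Lemma 2, with probability at least $1 - \delta$, the following hold:

$$R(\hat{W}, \hat{\beta}) - R(W^*, \beta^*) \le R_n(\hat{W}, \hat{\beta}) + \frac{m}{n\lambda} R_n(\hat{\beta}) + \frac{m\tau}{n^2\lambda} P_m(\hat{W}) - \frac{m\tau}{n^2\lambda} P_m(\hat{W}) - \frac{m}{n\lambda} R_n(\hat{\beta}) - R(W^*, \beta^*) + 2\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + B \sqrt{\frac{\log(1/\delta)}{2n}}$$

$$\le R_n(W^*, \beta^*) + \frac{m}{n\lambda} R_n(\beta^*) + \Big| \frac{m\tau}{n^2\lambda} P_m(W^*) - \frac{m\tau}{n^2\lambda} P_m(\hat{W}) \Big| - \frac{m}{n\lambda} R_n(\hat{\beta}) - R(W^*, \beta^*) + 2\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + B \sqrt{\frac{\log(1/\delta)}{2n}} \tag{9}$$

$$\le R_n(W^*, \beta^*) + \frac{m}{n\lambda} R_n(\beta^*) + \frac{2m\tau}{\lambda} - R(W^*, \beta^*) + 2\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + B \sqrt{\frac{\log(1/\delta)}{2n}}, \tag{10}$$

where Eq. (9) holds by the definition of $(\hat{W}, \hat{\beta})$ (minimizer) and Eq. (10) holds because $P_m(\cdot)$ is upper bounded by $2n^2$. Invoking Lemma 2 on $R_n(W^*, \beta^*) - R(W^*, \beta^*)$ in Eq. (10) with a union bound, with probability greater than $1 - 2\delta$, we obtain

$$R(\hat{W}, \hat{\beta}) - R(W^*, \beta^*) \le \frac{2m\tau}{\lambda} + \frac{m}{n\lambda} R_n(\beta^*) + 4\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + 2B \sqrt{\frac{\log(1/\delta)}{2n}}. \tag{11}$$

For the term $R_n(\beta^*)$, we have

$$\frac{m}{n\lambda} R_n(\beta^*) = \frac{m}{n\lambda} \cdot \frac{1}{n} \sum_{i=1}^{n} \ell(f_{\beta^*}(x_i), y_i) \le \frac{m}{n\lambda} \cdot \frac{Bn}{n} = \frac{Bm}{n\lambda}, \tag{12}$$

where $B$ is the uniform upper bound on the loss function. Combining Eq. (11) and Eq. (12), the following holds with probability at least $1 - 2\delta$:

$$R(\hat{W}, \hat{\beta}) - R(W^*, \beta^*) \le \frac{Bm}{n\lambda} + 4\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + 2B \sqrt{\frac{\log(2/\delta)}{n}} + \frac{2m\tau}{\lambda}.$$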

A.2 FORMAL STATEMENTS OF THEOREM 2

We consider linear classifiers of the form $f_\beta(x) = \beta^\top x$ for $\beta \in \mathbb{R}^d$, with $\|\beta\|_1 \le W_1$. We assume $\|x_i\|_\infty \le 1$ for $i = 1, \ldots, n$. For the loss function, following Boucheron et al. (2005), we take $\ell(f_\beta(X), Y) = \phi(-f_\beta(X) Y)$ where $\phi : \mathbb{R} \to \mathbb{R}_+$. Some common examples are $\phi(x) = \log(1 + e^x)$ for logistic regression and $\phi(x) = \max(0, 1 + x)$ for the support vector machine. Since $\phi$ is uniformly upper bounded by $B$ and $g_W$ is upper bounded by one, by Lemma 2,

$$|R_n(W, \beta) - R(W, \beta)| \le 2\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) + B \sqrt{\frac{\log(1/\delta)}{2n}}$$

holds with probability at least $1 - \delta$, where

$$\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) = \mathbb{E}_\sigma \Big[ \sup_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{j=1}^{n} \sigma_j\, \mathbb{1}\{y_j \neq 1\}\, g_W(x_j)\, \phi(f_\beta(x_j)) \Big]$$

with $P(\sigma_i = 1) = P(\sigma_i = -1) = 0.5$ for $i = 1, \ldots, n$. Without loss of generality, we assume that there are $s$ samples with ground-truth label 1, with indices $n - s + 1, \ldots, n$. Thus,

$$2\,\mathrm{Rad}_n(\mathcal{G}_{W,\beta}) = 2\,\mathbb{E}_\sigma \Big[ \sup_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{j=1}^{n-s} \sigma_j\, g_W(x_j)\, \phi(f_\beta(x_j)) \Big] \tag{13}$$

$$\le \frac{2L_1}{n}\, \mathbb{E}_\sigma \Big[ \sup_{\beta \in \mathbb{R}^d} \sum_{j=1}^{n-s} \sigma_j f_\beta(x_j) \Big] \tag{14}$$

$$\le 2 L_1 W_1 \sqrt{\frac{2 \log d}{n}}, \tag{15}$$

where Eq. (14) holds by the Lipschitz Composition Principle (Boucheron et al., 2005) and Eq. (15) is by Lemma 5.
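The Rademacher complexity of the $\ell_1$-bounded linear class used in this argument can be estimated numerically. The sketch below is illustrative only. It relies on the dual-norm identity $\sup_{\|\beta\|_1 \le W_1} \sum_i \sigma_i \beta^\top x_i = W_1 \|\sum_i \sigma_i x_i\|_\infty$, and it compares the Monte Carlo estimate with the standard Massart-type bound $W_1 \sqrt{2 \log(2d)/n}$, which is an assumption brought in here for the comparison (it differs slightly from the $\log d$ form quoted from Lemma 5). The function name, the data distribution, and the number of draws are hypothetical choices.

```python
import numpy as np

def rademacher_l1_linear(X, w1, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> beta^T x : ||beta||_1 <= w1} on the rows of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(num_draws, n))
    # For each sign vector, the supremum over the l1 ball equals
    # w1 times the largest absolute coordinate of sum_i sigma_i x_i.
    sups = w1 * np.abs(sigma @ X).max(axis=1)
    return float(sups.mean() / n)

rng = np.random.default_rng(1)
n, d, w1 = 200, 10, 1.0
X = rng.uniform(-1.0, 1.0, size=(n, d))   # satisfies ||x_i||_inf <= 1
estimate = rademacher_l1_linear(X, w1)
massart_bound = w1 * np.sqrt(2.0 * np.log(2 * d) / n)
```

Increasing `n` while holding `d` fixed should shrink both the estimate and the bound at the rate $n^{-1/2}$, which is the behavior the vanishing-gap statement in Theorem 2 depends on.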

B ALGORITHMS

In this section, we present algorithms for implementing the proposed DLP. We adopt state-of-the-art techniques from the MTL literature (Sener & Koltun, 2018; Kaiser et al., 2017; Zhou et al., 2017). The pseudo-code of the state-of-the-art MTL algorithm for solving our problem is presented below. The main idea of the algorithm is as follows. We first run gradient descent algorithms on each task in Lines 2-3. Then, to ensure that the overall objective value is decreased, we further update the shared parameter $\beta$ in Lines 4-5. The rationale for obtaining the particular $\alpha$ that ensures the decrease of the overall objective value can be found in (Sener & Koltun, 2018). Regarding the convergence analysis, under reasonable conditions, Procedure 1 is shown to find Pareto stationary points or local/global optimal points. We refer interested readers to the references (Sener & Koltun, 2018; Kaiser et al., 2017) for details.
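As a rough illustration of the shared-parameter update, the sketch below computes a weight $\alpha$ for combining two task gradients by minimizing the norm of their convex combination, which is the two-task closed form used in min-norm multi-task methods such as that of Sener & Koltun (2018). Whether Procedure 1 uses exactly this rule is an assumption, since the pseudo-code is not reproduced here, and the function names and the step size are hypothetical.

```python
import numpy as np

def min_norm_alpha(g_normal, g_backdoor):
    """alpha in [0, 1] minimizing ||alpha * g_normal + (1 - alpha) * g_backdoor||^2."""
    g1 = np.asarray(g_normal, dtype=float)
    g2 = np.asarray(g_backdoor, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every alpha gives the same direction
    alpha = float((g2 - g1) @ g2) / denom
    return min(1.0, max(0.0, alpha))

def shared_update(beta, g_normal, g_backdoor, step_size=0.1):
    """One descent step on the shared parameter along the combined gradient."""
    alpha = min_norm_alpha(g_normal, g_backdoor)
    combined = alpha * np.asarray(g_normal, dtype=float) \
        + (1.0 - alpha) * np.asarray(g_backdoor, dtype=float)
    return np.asarray(beta, dtype=float) - step_size * combined

alpha_orthogonal = min_norm_alpha([1.0, 0.0], [0.0, 1.0])
alpha_clipped = min_norm_alpha([1.0, 0.0], [2.0, 0.0])
new_beta = shared_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], step_size=0.1)
```

When the combined gradient is nonzero, the min-norm combination has a non-negative inner product with both task gradients, so a sufficiently small step along its negative does not increase either task loss to first order.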

C COMPLEMENTARY EXPERIMENTAL RESULTS

In this section, we provide detailed experimental results that are omitted from the main text due to the page limit. 'Deer' and 'Truck' in Fig. 4b. Similar to Task I in the main text, the top-3-category backdoor samples are semantically consistent with their target labels. For example, the images of 'Deer' and 'Dog' resemble the images of 'Horse' more than those of any other category in the CIFAR10 dataset. Such observations provide concrete evidence to support the conjecture that backdoor images should be similar to their backdoor label category (Bagdasaryan et al., 2020). In our case, the similarity is measured in terms of semantic meaning.



This is fundamentally different from perturbation-based attacks, where backdoor training data are injected into the clean training data, so the total number of training samples increases.



Figure 1: Illustrations of (a) clean-sample backdoor attacks and (b) perturbation-based backdoor attacks in autonomous driving systems. In clean-sample attacks, the attacker relabels stop-sign images as go-straight signs. In perturbation-based attacks, the attacker adds camouflage patches to stop-sign images and then relabels the perturbed images as go-straight signs.

i.d samples from the normal distribution $\mathcal{N}([0, 3], [[2, 0], [0, 2]])$ with label 0 and 500 i.i.d samples from the normal distribution $\mathcal{N}([0, -3], [[2, 0], [0, 2]])$ with label 1, respectively. For the test data, we generate 2500 i.i.d samples for each label class. The classifier $f_\beta$ is linear, and the loss function $\ell$ is the logistic loss. The backdoor target $y$ is 1, and the number of backdoor samples is $m = 50$.
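The synthetic setup described above can be reproduced roughly as follows. This is a minimal sketch: the helper name `make_gaussian_data` and the random seed are hypothetical, and the training size of 500 for label 0 is an assumption, since only the label-1 count is stated explicitly in the surviving text.

```python
import numpy as np

def make_gaussian_data(n_per_class, rng):
    """Draw n_per_class points per label from the two Gaussians described in the text."""
    cov = np.array([[2.0, 0.0], [0.0, 2.0]])
    x0 = rng.multivariate_normal(mean=[0.0, 3.0], cov=cov, size=n_per_class)   # label 0
    x1 = rng.multivariate_normal(mean=[0.0, -3.0], cov=cov, size=n_per_class)  # label 1
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, y

rng = np.random.default_rng(0)
# Assumption: 500 training samples for label 0 as well as for label 1.
x_train, y_train = make_gaussian_data(500, rng)
# 2500 test samples per label class, as stated in the text.
x_test, y_test = make_gaussian_data(2500, rng)

backdoor_target = 1   # backdoor target label
m = 50                # number of backdoor samples to relabel
```

Only samples whose ground-truth label differs from `backdoor_target` are candidates for relabeling, which here means the label-0 half of `x_train`.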

Figure 2: An illustration of selected backdoor samples under linear classifier with Gaussian data.

(a) Pie-charts of selected backdoor samples corresponding to backdoor label Sneaker (left), Trouser (middle) and Shirt (right). (b) Snapshots of top-3-category selected backdoor samples corresponding to backdoor label Sneaker (left), Trouser (middle) and Shirt (right).

Figure 3: Illustration of (a) selected categories of backdoor samples in pie chart and (b) selected examples for Fashion-MNIST dataset.

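Procedure 1
Initialization: model parameter $\beta^1$, selection parameter $W^1$, hyperparameters $\lambda$, $\tau$, and stepsizes $\{\eta_i\}_{i=1}^{T-1}$
1: for $t = 1, \ldots, T-1$ do
2:   $\beta_t = \beta^t - \eta_t \nabla_\beta L_1(\beta^t)$   // $L_1(\beta) := \frac{1}{n} \sum_{i=1}^{n} \ell(f_\beta(x_i), y_i)$
3:   $W^{t+1} = W^t - \eta_t \nabla_W L_2(W^t, \beta^t)$   // $L_2(W, \beta) := \frac{\tau}{n} P_m(W) + \frac{\lambda}{m} \sum_{j=1}^{n} 1\{y_j \ne y\}\, g_W(x_j)\, \ell(f_\beta(x_j), y)$
4:   …
5:   … $\beta_t - \eta\big(\alpha_1^t \nabla_\beta L_1(\beta_t) + \alpha_2^t \nabla_{\beta_t} L_2(W, \beta)\big)$
6: end for
Output: $\beta^T$, $W^T$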
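The update pattern of the procedure above can be sketched in Python on toy losses. This is an illustration under stated assumptions, not the authors' implementation: the quadratic losses and the constants `A` and `C` are hypothetical stand-ins for $L_1$ and $L_2$. The weights $\alpha$ are computed with the closed-form two-task min-norm solution rather than the iterative weight solver. The point at which the gradients in the combined step are evaluated is also an assumption, because that part of the pseudo-code is incomplete.

```python
import numpy as np

def min_norm_weights(g1, g2):
    """Weights (a, 1 - a), a in [0, 1], minimising ||a * g1 + (1 - a) * g2||^2."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:              # identical gradients: any convex weights work
        return 0.5, 0.5
    a = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return a, 1.0 - a

# Hypothetical toy losses standing in for L1 and L2:
#   L1(beta)    = 0.5 * ||beta - A||^2
#   L2(W, beta) = 0.5 * ||beta - W||^2 + 0.5 * ||W - C||^2
A = np.array([1.0, 1.0])
C = np.array([-1.0, 0.0])

def grad_beta_l1(beta):
    return beta - A

def grad_beta_l2(w, beta):
    return beta - w

def grad_w_l2(w, beta):
    return (w - beta) + (w - C)

def run(num_steps=200, eta=0.1):
    beta, w = np.zeros(2), np.zeros(2)
    for _ in range(num_steps - 1):
        beta_half = beta - eta * grad_beta_l1(beta)   # task-wise step on beta
        w = w - eta * grad_w_l2(w, beta)              # step on the selection parameter
        g1 = grad_beta_l1(beta_half)                  # assumption: gradients taken at beta_half
        g2 = grad_beta_l2(w, beta_half)
        a1, a2 = min_norm_weights(g1, g2)
        beta = beta_half - eta * (a1 * g1 + a2 * g2)  # combined step on the shared parameter
    return beta, w

beta_T, w_T = run()
```

The min-norm combination `a1 * g1 + a2 * g2` has a non-negative inner product with both task gradients, so a small step along its negative does not increase either toy loss to first order.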

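Weight Solver
Input and initialization: $\alpha = (\alpha_1, \alpha_2) = (\frac{1}{2}, \frac{1}{2})$, $W$ and $\beta$
1: Compute $M$ s.t. $M_{1,2} = (\nabla_\beta L_1(\beta))^\top (\nabla_\beta L_2(W, \beta))$, $M_{2,1} = (\nabla_\beta L_2(W, \beta))^\top (\nabla_\beta L_1(\beta))$
2: Compute $\hat{t} = \arg\min_r \sum_t \alpha_t M_{rt}$
3: Compute $\hat{\gamma} = \arg\min_\gamma \big((1 - \gamma)\alpha + \gamma e_{\hat{t}}\big)^\top M \big((1 - \gamma)\alpha + \gamma e_{\hat{t}}\big)$
4: $\alpha^* = (1 - \hat{\gamma})\alpha + \hat{\gamma} e_{\hat{t}}$
Output: $\alpha^*$

C.1 SELECTED SAMPLES FOR CIFAR10

For task II, we demonstrate the categories of the top 1% selected backdoor training samples in Fig. 4a, and several images of the top-3-category selected training backdoor samples associated with the backdoor label 'Cat',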

(a) Pie-charts of selected backdoor samples corresponding to backdoor label Cat (left), Horse (middle) and Automobile (right). (b) Snapshots of top-3-category selected backdoor samples corresponding to backdoor label Cat (left), Horse (middle) and Automobile (right).

Figure 4: Illustration of (a) selected categories of backdoor samples in pie chart and (b) selected examples for CIFAR10 dataset.

Summary of attackers' capability in the proposed DLP, perturbation-based backdoor attacks (abbreviated as PBA), and semantic backdoor attacks (abbreviated as SBA). The first (second) column specifies if attackers modify training (test) features. The third column indicates whether attackers are allowed to choose any number of backdoor data.



Test accuracy (in %) with flipping rate m = 1%n

Test Accuracy (in %) under different backdoor size m

Test Accuracy (in %) of Task I

Appendix for DLP: Data-Driven Label-Poisoning Backdoor Attack

We include the proof of Theorem 1 and the formal statement of Theorem 2 (and its proof) in Section A. The pseudo-code of algorithms for implementing DLP and the convergence analysis are presented in Section B. Complementary results of the experimental study are included in Section C. Finally, we present additional experimental results, including investigations of the effect of different backdoor sample sizes and results on more complex datasets.

A PROOF

A.1 PROOF OF THEOREM 1

We will rely on the following two lemmas.

Lemma 1. (Boucheron et al., 2005) For any $\beta \in \mathbb{R}^d$, we have with probability at least $1 - \delta$,

Lemma 2. (Boucheron et al., 2005) For any fixed $W \in \mathbb{R}^p$, we have with probability at least $1 - \delta$,

Back to our main results, we first bound the gap on the normal task. Invoking Lemma 1, with probability at least $1 - \delta$, we have

where Eq. (4) holds by the definition of $(W, \beta)$ (minimizer) and Eq. (5) holds by the definition of $W$.

Invoking Lemma 1 on $R_n(\beta) - R(\beta)$ in Eq. (5) with a union bound, with probability greater than $1 - 2\delta$, we have

From the definition of $W$, there are only $m$ non-negative terms in $R_n(W, \beta)$ and hence

Theorem 3. (Linear Case) Suppose that Assumptions 1, 2, and 3 hold. The following hold with probability at least $1 - 2\delta$:
1. Gap on the normal task:
2. Gap on the backdoor task:

Corollary 1. Under the same assumptions as Theorem 3, by setting $\lambda = \Theta(1/\log n)$, $m = \Theta(\sqrt{n})$ and $\tau = \Theta(1/n)$, the two gaps converge to zero with high probability as $n \to \infty$.

Proof of Theorem 3 and Corollary 1. We will use the following two lemmas.

Lemma 3. (Boucheron et al., 2005) For any $\beta \in \mathbb{R}^d$, we have with probability at least $1 - \delta$,

Lemma 4. For any fixed $W \in \mathbb{R}^p$, we have with probability at least

The proof of Lemma 4 is given at the end of the proof of the main theorem.

The proof of Theorem 3 follows directly from Theorem 1 by using the explicit forms of the Rademacher complexity in Lemma 3 and Lemma 4.

Regarding the proof of Corollary 1, it is straightforward to check that every term in the two gaps of Theorem 3 vanishes as $n \to \infty$ with the specified hyperparameters.

A.3 PROOF OF LEMMA 4

Proof. We will use the following lemma.

Lemma 5 (Boucheron et al., 2005). Let $F$ be the class of linear predictors, with the $\ell_1$-norm of the weights bounded by $W_1$. Also assume that, with probability one, $\|x\|_\infty \le X_\infty$. Then

where $d$ is the dimension of the data and $n$ is the sample size.

Back to the main proof, recall by definition,

C.2 TEST PERFORMANCES ON GTSRB

In this section, we test the proposed method on the GTSRB dataset. There are in total 43 types of traffic signs (labels) in GTSRB, and many of them are of the same type, e.g., "20 speed" and "30 speed". For the sake of clarity, we select only one of each type for testing. For semantic backdoor attacks, following the framework in (Bagdasaryan et al., 2020), we relabel a whole category of the training data with the same semantic meaning. For example, we can relabel images of Top (with ground-truth label 1) with label 0. For edge-case attacks, we follow the ideas in (Wang et al., 2020) and first create a new training dataset by excluding the samples of one whole category. Then, we relabel the samples of the previously excluded sub-category and add them to the previous training data to form a new training dataset.

We follow the same setup for Task II in the main text. The two SOTA methods to be compared are listed below. For semantic attacks, the work of (Bagdasaryan et al., 2020) relabeled a whole category of the training data. For edge-case attacks, the authors in (Wang et al., 2020) first relabeled the images of Southwest Airplanes (NOT in CIFAR10) as Truck and then injected the relabeled sample-label pairs into the training data. For completeness, we also test different backdoor labels, as summarized in Table 6. The proposed DLP consistently outperforms the SOTA clean-sample attack for all backdoor target labels. Also, the proposed DLP is comparable with the more powerful edge-case attack in some cases, e.g., a backdoor label of 'Dog'.

In this section, we test the proposed method under certain defenses. As mentioned in the main text, because of the weak threat model of (centralized) clean-sample backdoor attacks, many existing defenses against perturbation-based attacks are inappropriate for defending against our attacks. However, defenses against label-flipping attacks, e.g., label sanitization (Paudice et al., 2018) and label certification (Rosenfeld et al., 2020), can be suitably tailored to defend against our method. The results are summarized in Fig. 6 and Fig. 7, respectively. The figures indicate that the proposed method can evade both defenses.
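To make the label-flipping defense setting concrete, the following sketch implements a simple k-nearest-neighbour relabeling rule in the spirit of label sanitization. It is an illustration only: the function name `knn_sanitize`, the parameters `k` and `eta`, and the toy two-cluster data are assumptions, and the sketch is not claimed to match the exact defense configuration behind Fig. 6.

```python
import numpy as np

def knn_sanitize(x, y, k=5, eta=0.5):
    """Relabel each point to the majority label of its k nearest neighbours
    when that label's share among the neighbours is at least eta."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; a point is not its own neighbour.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    y_clean = y.copy()
    for i in range(len(y)):
        labels, counts = np.unique(y[neighbours[i]], return_counts=True)
        j = counts.argmax()
        if counts[j] / k >= eta:
            y_clean[i] = labels[j]
    return y_clean

# Hypothetical toy data: two tight, well-separated clusters with one flipped label.
rng = np.random.default_rng(0)
x_toy = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                   rng.normal(10.0, 0.1, size=(20, 2))])
y_true = np.concatenate([np.zeros(20, dtype=int), np.ones(20, dtype=int)])
y_flipped = y_true.copy()
y_flipped[3] = 1                      # a single poisoned label
y_sanitized = knn_sanitize(x_toy, y_flipped)
```

A rule of this kind corrects isolated flips whose neighbours mostly keep their original labels. Whether it catches relabeled samples that were chosen to resemble the target class depends on the local label distribution around them.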

