CERTIFIED ROBUSTNESS OF NEAREST NEIGHBORS AGAINST DATA POISONING ATTACKS

Abstract

Data poisoning attacks aim to corrupt a machine learning model by modifying, adding, and/or removing some carefully selected training examples, such that the corrupted model predicts arbitrary or attacker-chosen incorrect labels for testing examples. The key idea of state-of-the-art certified defenses against data poisoning attacks is to create a majority vote mechanism to predict the label of a testing example, where each voter is a base classifier trained on a subset of the training dataset. Nearest neighbor algorithms such as k nearest neighbors (kNN) and radius nearest neighbors (rNN) have intrinsic majority vote mechanisms. In this work, we show that the intrinsic majority vote mechanisms in kNN and rNN already provide certified robustness guarantees against general data poisoning attacks. Moreover, our empirical evaluation results on MNIST and CIFAR10 show that the intrinsic certified robustness guarantees of kNN and rNN outperform those provided by state-of-the-art certified defenses.

1. INTRODUCTION

Data poisoning attacks (Barreno et al., 2006; Nelson et al., 2008; Biggio et al., 2012; 2013a; Xiao et al., 2015b; Steinhardt et al., 2017; Shafahi et al., 2018) aim to corrupt the training phase of a machine learning system by carefully poisoning its training dataset, i.e., modifying, adding, and/or removing some training examples. The corrupted model then predicts incorrect labels for testing examples. Data poisoning attacks pose severe security concerns to machine learning in critical application domains such as autonomous driving (Gu et al., 2017), cybersecurity (Rubinstein et al., 2009; Suciu et al., 2018; Chen et al., 2017), and healthcare analytics (Mozaffari-Kermani et al., 2014). Unlike adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2014; Carlini & Wagner, 2017), which add a perturbation to each testing example to induce misclassification, data poisoning attacks corrupt the model such that it misclassifies many clean testing examples. Multiple certifiably robust learning algorithms (Ma et al., 2019; Rosenfeld et al., 2020; Levine & Feizi, 2020; Jia et al., 2020) against data poisoning attacks were recently developed. A learning algorithm is certifiably robust against data poisoning attacks if it can learn a classifier on a training dataset that achieves a certified accuracy on a testing dataset when the number of poisoned training examples is no more than a threshold (called the poisoning size). The certified accuracy of a learning algorithm is a lower bound of the accuracy of its learnt classifier no matter how an attacker poisons the training examples within the given poisoning size. The key idea of state-of-the-art certifiably robust learning algorithms (Levine & Feizi, 2020; Jia et al., 2020) is to create a majority vote mechanism to predict the label of a testing example. In particular, each voter votes for a label of a testing example, and the final predicted label is the majority vote among the voters.
For instance, Deep Partition Aggregation (DPA) (Levine & Feizi, 2020) divides the training dataset into disjoint partitions and learns a base classifier (i.e., a voter) on each partition. Bagging (Jia et al., 2020) also learns multiple base classifiers (i.e., voters), but each of them is learnt on a random subsample of the training dataset. We denote by a and b the labels with the largest and second largest number of votes, respectively. Moreover, s_a and s_b respectively are the number of votes for labels a and b when there are no corrupted voters. In the worst-case scenario, the corrupted voters change their votes from a to b. Therefore, the majority vote result (i.e., the predicted label for a testing example) remains a when the number of corrupted voters is no larger than (s_a - s_b)/2 - 1. In other words, the number of corrupted voters that a majority vote mechanism can tolerate depends on the gap s_a - s_b between the largest and the second largest number of votes. However, state-of-the-art certifiably robust learning algorithms achieve suboptimal certified accuracies because each poisoned training example leads to multiple corrupted voters in the worst-case scenario. In particular, modifying a training example corrupts two voters (i.e., two base classifiers) in DPA (Levine & Feizi, 2020) and corrupts every voter whose training subsample includes the modified training example in bagging (Jia et al., 2020). Therefore, given the same gap s_a - s_b between the largest and the second largest number of votes, the majority vote result is robust against only a small number of poisoned training examples. Nearest neighbor algorithms such as k nearest neighbors (kNN) and radius nearest neighbors (rNN) (Fix & Hodges, 1951; Cover & Hart, 1967) have intrinsic majority vote mechanisms. Specifically, given a testing example, kNN (or rNN) predicts its label by taking a majority vote among the labels of its k nearest neighbors (or neighbors within radius r) in the training dataset. Our major contribution in this work is to show that the intrinsic majority vote mechanisms in kNN and rNN make them certifiably robust against data poisoning attacks. Moreover, kNN and rNN address the limitations of state-of-the-art certifiably robust learning algorithms.
Specifically, each poisoned training example leads to only one corrupted voter in the worst-case scenario in kNN and rNN. Therefore, given the same gap s_a - s_b, the majority vote result (i.e., the predicted label for a testing example) is robust against more poisoned training examples.

We evaluate our methods on the MNIST and CIFAR10 datasets, using the ℓ_1 distance metric to calculate nearest neighbors. First, our methods substantially outperform state-of-the-art certifiably robust learning algorithms. For instance, when an attacker can arbitrarily poison 1,000 training examples on MNIST, the certified accuracy of rNN with r = 4 is 40.8% and 33.5% higher than those of DPA (Levine & Feizi, 2020) and bagging (Jia et al., 2020), respectively. Second, our joint certification improves certified accuracy. For instance, joint certification improves the certified accuracy of rNN by 15.1% when an attacker can arbitrarily poison 1,000 training examples on MNIST. In summary, we make the following contributions:

• We derive the intrinsic certified robustness guarantees of kNN and rNN against data poisoning attacks.

Given a testing input x, rNN finds the training examples in D_tr whose distances to x are no larger than r as the nearest neighbors. The distance between a training input and a testing input can be measured by any distance metric. Then, kNN and rNN use majority vote among the nearest neighbors to predict the label of x. Specifically, each nearest neighbor is a voter and votes for its own label for the testing input x, and the label with the largest number of votes is the final predicted label for x.

Data poisoning attacks: We consider data poisoning attacks (Rubinstein et al., 2009; Biggio et al., 2012; Xiao et al., 2015a; Li et al., 2016; Muñoz-González et al., 2017; Jagielski et al., 2018), in which an attacker modifies, adds, and/or removes some carefully selected training examples.

Certified accuracy: Given a training dataset D_tr and a learning algorithm M, we use certified accuracy on a testing dataset D_te = {(x_i, y_i)}_{i=1}^t to measure the algorithm's performance.
Specifically, we denote the certified accuracy at poisoning size e as CA(e) and formally define it as follows:

CA(e) = min_{D*_tr ∈ S(D_tr, e)} (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} I(M(D*_tr, x_i) = y_i),

where I is the indicator function and M(D*_tr, x_i) is the label predicted for a testing input x_i by the classifier learnt by the algorithm M on the poisoned training dataset D*_tr. CA(e) is the least testing accuracy on D_te that the learning algorithm M can achieve no matter how an attacker poisons the training examples when the poisoning size is at most e. Our goal is to derive lower bounds of CA(e) for kNN and rNN.

3. CERTIFIED ACCURACY OF KNN AND RNN

We first derive a lower bound of the certified accuracy via individual certification, which treats testing examples in D te individually. Then, we derive a better lower bound of the certified accuracy for rNN via joint certification, which treats testing examples jointly.

3.1. INDIVIDUAL CERTIFICATION

Given a poisoning size at most e, our idea is to certify whether the predicted label stays unchanged for each testing input individually. If the predicted label of a testing input x stays unchanged (i.e., M(D_tr, x) = M(D*_tr, x)) and it matches the testing input's true label, then kNN or rNN certifiably correctly classifies the testing input when the poisoning size is at most e. Therefore, we can obtain a lower bound of the certified accuracy at poisoning size e as the fraction of testing inputs in D_te that kNN or rNN certifiably correctly classifies. Next, we first discuss how to certify whether the predicted label stays unchanged for each testing input individually. Then, we show our lower bound of the certified accuracy at poisoning size e.

Certifying the predicted label of a testing input: Our goal is to certify that M(D_tr, x) = M(D*_tr, x) for a testing input x when the poisoning size is no larger than a threshold. We use s_l to denote the number of votes in N(D_tr, x) for label l, i.e., the number of nearest neighbors in N(D_tr, x) whose labels are l. Formally, we have s_l = Σ_{(x_j, y_j) ∈ N(D_tr, x)} I(y_j = l), where l = 1, 2, ..., c and I is an indicator function. kNN or rNN essentially predicts the label of the testing input x as the label with the largest number of votes, i.e., M(D_tr, x) = argmax_{l ∈ {1, 2, ..., c}} s_l. Suppose a and b are the labels with the largest and second largest number of votes, i.e., s_a and s_b are the largest and second largest ones among {s_1, s_2, ..., s_c}, respectively. We note that there may exist ties when comparing the labels based on their votes. We define a deterministic ranking of the labels in {1, 2, ..., c} and take the label with the largest rank when such ties happen. In the worst-case scenario, each poisoned training example leads to one corrupted voter in kNN or rNN, which changes its vote from label a to label b.
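The majority vote with deterministic tie-breaking described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; we assume integer labels and that a larger label index means a larger rank, matching the label ranking {1, 2, ..., c} used for tie breaking.

```python
from collections import Counter

def predict_majority(neighbor_labels):
    """Majority vote among the labels of the nearest neighbors.

    Ties between labels with the same number of votes are broken
    deterministically by keeping the label with the largest rank
    (assumed here to be the largest label index)."""
    votes = Counter(neighbor_labels)
    # The key (vote count, label) makes the label with the most votes win,
    # with the larger label index winning ties.
    return max(votes, key=lambda label: (votes[label], label))
```

For example, with neighbor labels [1, 1, 2] the prediction is 1, while with the tied labels [1, 2] the larger label 2 wins.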
Therefore, kNN or rNN still predicts label a for the testing input x when the number of poisoned training examples is no more than (s_a - s_b)/2 - 1 (without considering tie breaking). Formally, we have the following theorem:

Theorem 1. Suppose we have a training dataset D_tr, a testing input x, and a nearest neighbor algorithm M (i.e., kNN or rNN). a and b respectively are the two labels with the largest and second largest number of votes among the nearest neighbors N(D_tr, x) of x in D_tr. Moreover, s_a and s_b are the number of votes for a and b, respectively. Then, we have the following:

M(D*_tr, x) = a, ∀D*_tr ∈ S(D_tr, e), if e ≤ (s_a - s_b + I(a > b))/2 - 1.

Proof. When an attacker poisons at most e training examples, at most e nearest neighbors in N(D_tr, x) change, and thus we have s_l - e ≤ s*_l ≤ s_l + e for each l = 1, 2, ..., c. Therefore, when e ≤ (s_a - s_b + I(a > b))/2 - 1, we have s*_a - s*_b ≥ s_a - s_b - 2·e > 0 if a < b and s*_a - s*_b ≥ s_a - s_b - 2·e ≥ 0 if a > b. Thus, the nearest neighbor algorithm still predicts label a for x in both cases based on our way of breaking ties, i.e., we have M(D*_tr, x) = a when e ≤ (s_a - s_b + I(a > b))/2 - 1.

Deriving a lower bound of CA(e): kNN or rNN certifiably correctly classifies a testing input x if it correctly predicts the label of x before attacks and the predicted label stays unchanged after an attacker poisons the training dataset. Therefore, the fraction of testing inputs that kNN or rNN certifiably correctly classifies is a lower bound of CA(e). Formally, we have the following theorem:

Theorem 2 (Individual Certification). Suppose we have a training dataset D_tr, a testing dataset D_te = {(x_i, y_i)}_{i=1}^t, and a nearest neighbor algorithm M (i.e., kNN or rNN). a_i and b_i respectively are the two labels with the largest and second largest number of votes among the nearest neighbors N(D_tr, x_i) of x_i in D_tr. Moreover, s_{a_i} and s_{b_i} are the number of votes for a_i and b_i, respectively. Then, we have the following lower bound of CA(e):

CA(e) ≥ (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} I(a_i = y_i) · I(e ≤ e*_i),

where e*_i = (s_{a_i} - s_{b_i} + I(a_i > b_i))/2 - 1. Proof. See Appendix A.
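The certified poisoning size of Theorem 1 and the lower bound of Theorem 2 are straightforward to compute from the vote counts. The following is a minimal sketch under our own conventions: integer labels where a larger index means a larger rank, and a missing runner-up label treated as label 0 with zero votes (so ties always favor the real top label).

```python
from collections import Counter

def certified_poisoning_size(neighbor_labels):
    """Largest e for which Theorem 1 guarantees the prediction is
    unchanged: e* = (s_a - s_b + I(a > b)) // 2 - 1, where a and b are
    the labels with the largest and second largest number of votes."""
    votes = Counter(neighbor_labels)
    # Rank labels by (votes, label) descending; ties favor the larger label.
    ranked = sorted(votes, key=lambda l: (votes[l], l), reverse=True)
    a, s_a = ranked[0], votes[ranked[0]]
    b, s_b = (ranked[1], votes[ranked[1]]) if len(ranked) > 1 else (0, 0)
    return (s_a - s_b + (1 if a > b else 0)) // 2 - 1

def certified_accuracy_bound(all_neighbor_labels, true_labels, e):
    """Theorem 2 sketch: fraction of testing inputs that are correctly
    classified without attack and certified at poisoning size e."""
    correct = 0
    for labels, y in zip(all_neighbor_labels, true_labels):
        votes = Counter(labels)
        pred = max(votes, key=lambda l: (votes[l], l))
        if pred == y and e <= certified_poisoning_size(labels):
            correct += 1
    return correct / len(true_labels)
```

For instance, with 7 votes for label 1 and 3 votes for label 2, the certified poisoning size is (7 - 3)/2 - 1 = 1.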

3.2. JOINT CERTIFICATION

Theorem 3. Suppose we have a group of testing examples U = {(x_i, y_i)}_{i=1}^m with different predicted labels. a_i and b_i respectively are the two labels with the largest and second largest number of votes among the nearest neighbors N(D_tr, x_i) of x_i in D_tr. Moreover, s_{a_i} and s_{b_i} are the number of votes for a_i and b_i, respectively. Without loss of generality, we assume the following:

(s_{a_1} - s_{b_1}) · I(a_1 = y_1) ≥ (s_{a_2} - s_{b_2}) · I(a_2 = y_2) ≥ ... ≥ (s_{a_m} - s_{b_m}) · I(a_m = y_m). (5)

Then, the certified accuracy at poisoning size e of rNN for U has a lower bound CA(e) ≥ (w - 1)/|U|, where w is the solution to the optimization problem in Equation (6).

Here, e_i is the number of removed nearest neighbors in N(D_tr, x_i) whose true labels are a_i, and in rNN we have s*_{a_i} ≥ s_{a_i} - e_i. Note that kNN does not support joint certification because s*_{a_i} ≥ s_{a_i} - e_i does not hold for kNN. Next, we derive the minimal value of e_i such that rNN misclassifies x_i. In particular, we consider two cases. If a_i ≠ y_i, i.e., x_i is misclassified by rNN without attack, then we have e_i = 0. If a_i = y_i, x_i is misclassified by rNN when s*_{a_i} ≤ s*_{b_i} if a_i < b_i and s*_{a_i} < s*_{b_i} if a_i > b_i after the attack, which means e_i ≥ s_{a_i} - s_{b_i} - e + I(a_i > b_i). Since e_i ≥ 0, we have e_i ≥ max(s_{a_i} - s_{b_i} - e + I(a_i > b_i), 0). Combining the two cases, we have the following lower bound for the e_i that makes rNN misclassify x_i: e_i ≥ max(s_{a_i} - s_{b_i} - e + I(a_i > b_i), 0) · I(a_i = y_i). Moreover, since the attacker can remove at most e training examples and the testing examples in the group have different predicted labels, i.e., a_i ≠ a_j for all i, j ∈ {1, 2, ..., m} with i ≠ j, we have Σ_{i=1}^m e_i ≤ e. We note that the lower bound of e_i is non-increasing as i increases based on Equation (5). Therefore, in the worst-case scenario, the attacker can make rNN misclassify the last m - w + 1 testing inputs, whose corresponding e_i sum to at most e. Formally, w is the solution to the optimization problem in Equation (6).
Therefore, the certified accuracy at poisoning size e is at least (w - 1)/|U|.

Moreover, based on Theorem 1, a testing example that satisfies e ≤ (s_{a_i} - s_{b_i} + I(a_i > b_i))/2 - 1 can be certifiably correctly classified at poisoning size e; therefore, D^1_te includes such testing examples. Each testing example in D^0_te or D^1_te forms a group by itself. D^2_te includes the remaining testing examples, which we further divide into groups. Our method of dividing D^2_te into groups is inspired by the proof of Theorem 3. In particular, we form a group of testing examples as follows: for each label l ∈ {1, 2, ..., c}, we find the testing example that has the largest value of (s_{a_i} - s_{b_i} - e + I(a_i > b_i)) · I(a_i = l), and we skip the label if there is no remaining testing example whose predicted label is l. We apply this procedure recursively to group the testing examples in D^2_te until no testing examples are left. Formally, w in Theorem 3 is the solution to:

w = argmin_{w' ≥ 1} w'  subject to  Σ_{i=w'}^{m} max(s_{a_i} - s_{b_i} - e + I(a_i > b_i), 0) · I(a_i = y_i) ≤ e. (6)
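The optimization problem in Equation (6) can be solved greedily because the per-example costs are non-increasing under the ordering of Equation (5). The following is a sketch under our own conventions: each group entry is a tuple (s_a, s_b, a, b, y) of the top-two vote counts, the top-two labels, and the true label, already sorted according to Equation (5).

```python
def joint_group_bound(group, e):
    """Theorem 3 sketch: lower bound (w - 1)/|U| on the certified accuracy
    of rNN for a group U of testing examples with distinct predicted labels,
    at poisoning size e."""
    m = len(group)
    # Minimal number of removed neighbors needed to make rNN misclassify
    # each testing input: max(s_a - s_b - e + I(a > b), 0) * I(a == y).
    cost = [max(s_a - s_b - e + (1 if a > b else 0), 0) * (1 if a == y else 0)
            for (s_a, s_b, a, b, y) in group]
    # Let the attacker flip inputs from the end of the ordering while the
    # total cost fits the budget e; w is the smallest 1-based index from
    # which the suffix of costs still fits in the budget.
    w, spent = m + 1, 0
    for i in range(m, 0, -1):
        if spent + cost[i - 1] > e:
            break
        spent += cost[i - 1]
        w = i
    return (w - 1) / m
```

For example, with e = 3 and a group whose costs come out as [5, 2, 0], the attacker can flip the last two inputs (cost 2 + 0 ≤ 3) but not the first, so the bound is 1/3.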

4. EVALUATION

Datasets: We evaluate our methods on MNIST and CIFAR10. We use the popular histogram of oriented gradients (HOG) (Dalal & Triggs, 2005) method to extract features for each example, which we found improves certified accuracy. Note that previous work (Jia et al., 2020) used a pre-trained model to extract features via transfer learning. However, the pre-trained model may also be poisoned, and thus we do not use it. We did not find ties in determining nearest neighbors for kNN in our experiments. We rank the labels as {1, 2, ..., 10} to break ties between labels.

Parameter settings: While any distance metric is applicable, we use the ℓ_1 distance in our experiments for both kNN and rNN. Unless otherwise mentioned, we adopt the following settings: k = 5,000 for both MNIST and CIFAR10 in kNN; and r = 4 for MNIST and r = 20 for CIFAR10 in rNN, considering the different feature dimensions of MNIST and CIFAR10. By default, we use the ISLAND grouping method in the joint certification for rNN.

Comparing with DPA (Levine & Feizi, 2020) and bagging (Jia et al., 2020): All the compared methods have tradeoffs between accuracy under no attacks (i.e., CA(0)) and robustness against attacks. Therefore, we set their parameters such that they have similar accuracy under no attacks (i.e., similar CA(0)). In particular, we use the default k for kNN, and we adjust r for rNN, ζ for DPA, and ξ for bagging. The searched parameters are as follows: r = 4, ζ = 5,500, and ξ = 27 for MNIST; and r = 21, ζ = 900, and ξ = 400 for CIFAR10. Note that we set N = 1,000 and α = 0.001 for bagging following (Jia et al., 2020). We have the following observations. First, both kNN and rNN outperform DPA and bagging.
The superior performance of kNN and rNN stems from two reasons: 1) each poisoned training example corrupts multiple voters in DPA and bagging but only one voter in kNN and rNN, which means that, given the same gap between the largest and second largest number of votes, kNN and rNN can tolerate more poisoned training examples; and 2) rNN enables joint certification, which improves the certified accuracy. Second, rNN achieves better certified accuracy than kNN when the poisoning size is large. The reason is that rNN supports joint certification.

Comparing individual certification with joint certification: Figure 2c and Figure 2d compare individual certification and joint certification (with the RD and ISLAND grouping methods) for rNN. Our empirical results validate that joint certification improves the certified accuracy over individual certification. Moreover, our ISLAND grouping method outperforms the RD method.

Impact of k and r: Figure 3 shows the impact of k and r on the certified accuracy of kNN and rNN, respectively. As the results show, k and r achieve tradeoffs between accuracy under no attacks (i.e., CA(0)) and robustness. Specifically, when k or r is smaller, the accuracy under no attacks, i.e., CA(0), is larger, but the certified accuracy decreases more quickly as the poisoning size e increases.

5. RELATED WORK

Data poisoning attacks have been proposed against various learning algorithms such as Bayes classifiers (Nelson et al., 2008), SVM (Biggio et al., 2012), clustering (Biggio et al., 2013b; 2014), collaborative filtering (Li et al., 2016), regression models (Xiao et al., 2015a; Mei & Zhu, 2015b; Jagielski et al., 2018), LDA (Mei & Zhu, 2015a), neural networks (Muñoz-González et al., 2017; Shafahi et al., 2018; Suciu et al., 2018; Demontis et al., 2019; Zhu et al., 2019; Huang et al., 2020), and others (Rubinstein et al., 2009; Vuurens et al., 2011). To mitigate data poisoning attacks, many empirical defenses (Cretu et al., 2008; Rubinstein et al., 2009; Barreno et al., 2010; Biggio et al., 2011; Feng et al., 2014; Jagielski et al., 2018; Tran et al., 2018) have been proposed. Steinhardt et al. (2017) derived an upper bound of the loss function for data poisoning attacks when the model is learnt using examples in a feasible set. However, these defenses lack certified robustness guarantees. Recently, several certified defenses (Ma et al., 2019; Rosenfeld et al., 2020; Levine & Feizi, 2020; Jia et al., 2020) were proposed to defend against data poisoning attacks. These defenses provide certified accuracies for a testing dataset either probabilistically (Ma et al., 2019; Jia et al., 2020) or deterministically (Rosenfeld et al., 2020; Levine & Feizi, 2020). All these defenses except (Ma et al., 2019) create majority vote mechanisms to predict the label of a testing example. In particular, a voter is a base classifier learnt on a perturbed version of the training dataset in randomized smoothing based defenses (Rosenfeld et al., 2020), while a voter is a base classifier learnt on a subset of the training dataset in DPA (Levine & Feizi, 2020) and bagging (Jia et al., 2020). Ma et al. (2019) showed that a differentially private learning algorithm achieves certified accuracy against data poisoning attacks.
They also train multiple differentially private classifiers, but the classifiers are not used to predict the label of a testing example via majority vote. Instead, their average accuracy is used to estimate the certified accuracy. kNN and rNN have intrinsic majority vote mechanisms, and we show that they provide deterministic certified accuracies against data poisoning attacks. Moreover, rNN enables joint certification. We note that DPA (Levine & Feizi, 2020) proposed to use a hash function to assign training examples to partitions, which is different from our use of a hash function: we use a hash function to rank training examples. Moreover, both DPA and our work rank the labels to break ties when comparing them with respect to their votes. A line of works (Wilson, 1972; Guyon et al., 1996; Peri et al., 2019; Bahri et al., 2020) leveraged nearest neighbors to clean a training dataset, while another line of works (Gao et al., 2018; Reeve & Kabán, 2019) studied the resistance of nearest neighbors to random noisy labels. For instance, Gao et al. (2018) analyzed the resistance of kNN to asymmetric label noise and introduced a Robust kNN to deal with noisy labels. Reeve & Kabán (2019) further analyzed the Robust kNN proposed by Gao et al. (2018) in the setting with unknown asymmetric label noise. kNN and its variants have also been used to defend against adversarial examples (Wang et al., 2018; Sitawarin & Wagner, 2019a; Papernot & McDaniel, 2018; Sitawarin & Wagner, 2019b; Dubey et al., 2019; Yang et al., 2020; Cohen et al., 2020). For instance, Wang et al. (2018) analyzed the robustness of nearest neighbors to adversarial examples and proposed a more robust 1-nearest neighbor. Several works (Amsaleg et al., 2017; Wang et al., 2018; 2019; Yang et al., 2020) proposed adversarial examples against nearest neighbors, e.g., Wang et al. (2019) proposed adversarial examples against 1-nearest neighbor. These works are orthogonal to ours, as we focus on analyzing the certified robustness of kNN and rNN against general data poisoning attacks.

6. CONCLUSION AND FUTURE WORK

In this work, we derive the certified robustness of nearest neighbor algorithms, including kNN and rNN, against data poisoning attacks. Moreover, we derive a better lower bound of certified accuracy for rNN via jointly certifying multiple testing examples. Our evaluation results show that 1) both kNN and rNN outperform state-of-the-art certified defenses against data poisoning attacks, and 2) joint certification outperforms individual certification. Interesting future work includes 1) extending joint certification to other learning algorithms, 2) improving joint certification via new grouping methods, and 3) improving the certified accuracy of kNN and rNN via new distance metrics.

A PROOF OF THEOREM 2

Proof. The derivation is given in Equations (8)-(10), where the last step is based on applying Theorem 1 to each testing input x_i.

B PROOF OF THEOREM 4

Proof. The derivation is analogous, where Equation (15) follows from Equation (14) based on applying Theorem 3 to each group U_j.



Public implementation: https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.hog



Figure 1: An example to illustrate individual certification vs. joint certification. Suppose rNN correctly classifies the two testing examples without attack. An attacker can poison 3 training examples. The attacker can make rNN misclassify each testing example individually. However, the attacker cannot make rNN misclassify both testing examples jointly.

Data poisoning attacks aim to poison (i.e., modify, add, and/or remove) some carefully selected training examples in D_tr such that the corrupted classifier has a low accuracy for testing examples indiscriminately. For simplicity, we use D*_tr to denote the poisoned training dataset. Moreover, we define the poisoning size e as the minimal number of modified/added/removed training examples that can turn D_tr into D*_tr. We use S(D_tr, e) to denote the set of poisoned training datasets whose poisoning sizes are at most e. Formally, we define S(D_tr, e) as follows:

S(D_tr, e) = {D*_tr | max{|D*_tr|, |D_tr|} - |D*_tr ∩ D_tr| ≤ e}, (1)

where max{|D*_tr|, |D_tr|} - |D*_tr ∩ D_tr| is the poisoning size of D*_tr. Note that modifying a training example is equivalent to removing a training example and adding a new one. Given a training dataset D_tr and a poisoning size e, an attacker aims to craft a poisoned training dataset D*_tr to minimize the testing accuracy of the classifier learnt by algorithm M on D*_tr.
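The poisoning size in Equation (1) is easy to compute when datasets are represented as sets of (features, label) pairs. A minimal sketch (the set representation is our own simplification for illustration):

```python
def poisoning_size(clean, poisoned):
    """Poisoning size of a poisoned training dataset per Equation (1):
    max(|D*_tr|, |D_tr|) - |D*_tr ∩ D_tr|. Each dataset is a collection
    of hashable (features, label) pairs."""
    clean, poisoned = set(clean), set(poisoned)
    return max(len(clean), len(poisoned)) - len(clean & poisoned)
```

Modifying one example counts as size 1, consistent with the remark that a modification equals one removal plus one addition.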

Given a training dataset D_tr (or a poisoned training dataset D*_tr) and a testing input x, we use N(D_tr, x) (or N(D*_tr, x)) to denote the set of nearest neighbors of x in D_tr (or D*_tr) for kNN or rNN. We note that there may exist ties when determining the nearest neighbors for kNN, i.e., multiple training examples may have the same distance to the testing input. Usually, kNN breaks such ties uniformly at random. However, such a random tie-breaking method introduces randomness, i.e., the difference between the nearest neighbors before and after poisoning (i.e., N(D_tr, x) vs. N(D*_tr, x)) depends on the randomness in breaking ties. Such randomness makes it challenging to certify the robustness of the predicted label against poisoned training examples. To address the challenge, we propose to define a deterministic ranking of training examples and break ties by choosing the training examples with larger ranks. Moreover, such a ranking between clean training examples does not depend on poisoned ones. For instance, we can use a cryptographic hash function (e.g., SHA-1) that is very unlikely to have collisions to hash each training example based on its input feature vector and label, and then we rank the training examples based on their hash values.
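The hash-based deterministic ranking can be sketched as follows, using SHA-1 from Python's hashlib; serializing an example via repr of its features and label is our own choice for illustration.

```python
import hashlib

def rank(example):
    """Deterministic rank of a training example: the SHA-1 digest of its
    feature vector and label, interpreted as an integer. The rank of a
    clean example does not depend on any poisoned examples."""
    x, y = example
    digest = hashlib.sha1(repr((tuple(x), y)).encode("utf-8")).hexdigest()
    return int(digest, 16)

def break_distance_tie(tied_examples):
    """Among training examples at the same distance to the testing input,
    deterministically keep the one with the largest rank."""
    return max(tied_examples, key=rank)
```

Because the rank is a pure function of each example's own content, adding or removing other (possibly poisoned) examples never changes how ties among the clean examples are resolved.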

When an attacker poisons at most e training examples, the number of changed nearest neighbors in N(D_tr, x) is at most e. We denote by s*_l = Σ_{(x_j, y_j) ∈ N(D*_tr, x)} I(y_j = l) the number of votes for label l among the nearest neighbors N(D*_tr, x) in the poisoned training dataset, where l = 1, 2, ..., c.

Figure 2: (a)-(b) comparing kNN and rNN with state-of-the-art methods. (c)-(d) comparing individual certification with joint certification for rNN.

Figure 2a and Figure 2b show the comparison results of DPA, bagging, kNN, and rNN. DPA divides a training dataset into ζ disjoint partitions and learns a base classifier on each of them. Then, DPA takes a majority vote among the base classifiers to predict the label of a testing example. Bagging learns N base classifiers, each of which is learnt on a random subsample with ξ training examples of the training dataset. Moreover, bagging's certified accuracy is correct with a confidence level 1 -α.
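For intuition, DPA's partition-and-vote mechanism can be sketched as follows. This is a simplified sketch, not the authors' implementation; train_fn and hash_fn are hypothetical user-supplied callables, where train_fn(partition) returns a classifier (a function of x) and hash_fn(example) returns an integer.

```python
from collections import Counter

def dpa_predict(train, x, num_partitions, train_fn, hash_fn):
    """Sketch of DPA (Levine & Feizi, 2020): hash each training example
    into one of `num_partitions` disjoint partitions, learn a base
    classifier on each non-empty partition, and predict by majority vote."""
    partitions = [[] for _ in range(num_partitions)]
    for example in train:
        partitions[hash_fn(example) % num_partitions].append(example)
    votes = Counter(train_fn(p)(x) for p in partitions if p)
    # Ties between labels are broken deterministically (larger label wins).
    return max(votes, key=lambda label: (votes[label], label))
```

Because each training example lands in exactly one partition, removing or adding one example corrupts at most one voter, while modifying one example can corrupt two (the partition it leaves and the partition it joins), which is the limitation discussed in the introduction.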

CA(e) = min_{D*_tr ∈ S(D_tr, e)} (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} I(M(D*_tr, x_i) = y_i)   (8)
≥ (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} min_{D*_tr ∈ S(D_tr, e)} I(M(D*_tr, x_i) = y_i)   (9)
= (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} I(a_i = y_i) · min_{D*_tr ∈ S(D_tr, e)} I(M(D*_tr, x_i) = a_i)   (10)
= (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} I(a_i = y_i) · I(e ≤ e*_i),

CA(e) = min_{D*_tr ∈ S(D_tr, e)} (1/|D_te|) · Σ_{(x_i, y_i) ∈ D_te} I(M(D*_tr, x_i) = y_i)
= min_{D*_tr ∈ S(D_tr, e)} (1/|D_te|) · Σ_{j=1}^{λ} Σ_{(x_i, y_i) ∈ U_j} I(M(D*_tr, x_i) = y_i)
≥ (1/|D_te|) · Σ_{j=1}^{λ} min_{D*_tr ∈ S(D_tr, e)} Σ_{(x_i, y_i) ∈ U_j} I(M(D*_tr, x_i) = y_i)   (14)
≥ (1/|D_te|) · Σ_{j=1}^{λ} μ_j · |U_j|.   (15)

The majority vote result (i.e., the predicted label for a testing example) is robust against more poisoned training examples in kNN and rNN. Furthermore, we show that rNN enables joint certification of multiple testing examples. Figure 1 illustrates an example of individual certification and joint certification with two testing examples in rNN. When we treat the two testing examples individually, an attacker can make rNN misclassify each of them; however, the attacker cannot make rNN misclassify both of them jointly.

• We propose joint certification of multiple testing examples to derive a better certified robustness guarantee for rNN. rNN is the first method that supports joint certification of multiple testing examples against data poisoning attacks.

• We evaluate our methods and compare them with the state-of-the-art on MNIST and CIFAR10.

Suppose we have a training dataset D_tr with n training examples. We denote by M a learning algorithm. Moreover, we denote by M(D_tr, x) the label predicted for a testing input x by a classifier learnt by M on the training dataset D_tr. For instance, given a training dataset D_tr and a testing input x, kNN finds the k training examples in D_tr that are closest to x as the nearest neighbors, while rNN finds the training examples in D_tr whose distances to x are no larger than r as the nearest neighbors.

We derive a better lower bound of the certified accuracy via jointly considering multiple testing examples. Our intuition is that, given a group of testing examples and a poisoning size e, an attacker may not be able to make a learning algorithm misclassify all the testing examples jointly even if it can make the learning algorithm misclassify each of them individually. In particular, rNN enables such joint certification. It is challenging to perform joint certification for kNN because of the complex interactions between the nearest neighbors of different testing examples (see our proof of Theorem 3 for specific reasons). Next, we first derive a lower bound of CA(e) on a group of testing examples for rNN. Then, we derive a lower bound of CA(e) on the testing dataset D_te via dividing it into groups. Finally, we discuss different strategies to divide the testing dataset into groups, which may lead to different lower bounds of CA(e).

Proof. When an attacker can poison at most e training examples, the attacker can add at most e new nearest neighbors and remove at most e existing ones in N(D_tr, x_i) (equivalent to modifying e training examples) in the worst-case scenario. We denote by s*_{a_i} and s*_{b_i} respectively the number of votes for labels a_i and b_i among the nearest neighbors N(D*_tr, x_i). First, we have s*_{b_i} ≤ s_{b_i} + e for all i ∈ {1, 2, ..., m}, since at most e new nearest neighbors are added. Second, we have s*_{a_i} ≥ s_{a_i} - e_i in rNN, where e_i is the number of removed nearest neighbors in N(D_tr, x_i) whose true labels are a_i.

Therefore, the certified accuracy at poisoning size e of rNN for the group U is at least (w - 1)/|U|.

Deriving a lower bound of CA(e) for a testing dataset: Based on Theorem 3, we can derive a lower bound of CA(e) for a testing dataset via dividing it into disjoint groups, each of which includes testing examples with different predicted labels in rNN. Formally, we have the following theorem:

Theorem 4 (Joint Certification). Given a testing dataset D_te, we divide it into λ disjoint groups, i.e., U_1, U_2, ..., U_λ, where the testing examples in each group have different predicted labels in rNN. Then, we have the following lower bound of CA(e):

CA(e) ≥ (1/|D_te|) · Σ_{j=1}^{λ} μ_j · |U_j|,

where μ_j is the lower bound of the certified accuracy at poisoning size e on group U_j, which we can obtain by invoking Theorem 3.

A testing example with a_i ≠ y_i cannot be certifiably correctly classified at poisoning size e no matter which group it belongs to. Therefore, D^0_te includes such testing examples. Moreover, based on Theorem 1, a testing example (x_i, y_i) that satisfies e ≤ (s_{a_i} - s_{b_i} + I(a_i > b_i))/2 - 1 can be certifiably correctly classified at poisoning size e.

A line of works (Wilson, 1972; Guyon et al., 1996; Peri et al., 2019; Bahri et al., 2020) leveraged nearest neighbors to clean a training dataset. For instance, Wilson (1972) proposed to remove a training example whose label is not the same as the majority vote among the labels of its 3 nearest neighbors. Peri et al. (2019) proposed to remove a training example whose label is not the mode among the labels of its k nearest neighbors in the feature space. Bahri et al. (2020) combined kNN with an intermediate layer of a preliminary deep neural network model to filter suspiciously-labeled training examples. Another line of works (Gao et al., 2018; Reeve & Kabán, 2019) studied the resistance of nearest neighbors to random noisy labels.

