CERTIFIED ROBUSTNESS OF NEAREST NEIGHBORS AGAINST DATA POISONING ATTACKS

Abstract

Data poisoning attacks aim to corrupt a machine learning model by modifying, adding, and/or removing carefully selected training examples, such that the corrupted model predicts incorrect (possibly attacker-chosen) labels for testing examples. The key idea of state-of-the-art certified defenses against data poisoning attacks is to create a majority vote mechanism to predict the label of a testing example, where each voter is a base classifier trained on a subset of the training dataset. Nearest neighbor algorithms such as k nearest neighbors (kNN) and radius nearest neighbors (rNN) have intrinsic majority vote mechanisms. In this work, we show that these intrinsic majority vote mechanisms already provide certified robustness guarantees against general data poisoning attacks. Moreover, our empirical evaluation results on MNIST and CIFAR10 show that the intrinsic certified robustness guarantees of kNN and rNN outperform those provided by state-of-the-art certified defenses.

1. INTRODUCTION

Data poisoning attacks (Barreno et al., 2006; Nelson et al., 2008; Biggio et al., 2012; 2013a; Xiao et al., 2015b; Steinhardt et al., 2017; Shafahi et al., 2018) aim to corrupt the training phase of a machine learning system by carefully poisoning its training dataset, i.e., modifying, adding, and/or removing some training examples. The corrupted model then predicts incorrect labels for testing examples. Data poisoning attacks pose severe security concerns to machine learning in critical application domains such as autonomous driving (Gu et al., 2017), cybersecurity (Rubinstein et al., 2009; Suciu et al., 2018; Chen et al., 2017), and healthcare analytics (Mozaffari-Kermani et al., 2014). Unlike adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2014; Carlini & Wagner, 2017), which add perturbation to each testing example individually to induce misclassification, data poisoning attacks corrupt the model such that it misclassifies many clean testing examples. Multiple certifiably robust learning algorithms (Ma et al., 2019; Rosenfeld et al., 2020; Levine & Feizi, 2020; Jia et al., 2020) against data poisoning attacks were recently developed. A learning algorithm is certifiably robust against data poisoning attacks if it can learn a classifier on a training dataset that achieves a certified accuracy on a testing dataset when the number of poisoned training examples is no more than a threshold (called the poisoning size). The certified accuracy of a learning algorithm is a lower bound of the accuracy of its learnt classifier no matter how an attacker poisons the training examples within the given poisoning size. The key idea of state-of-the-art certifiably robust learning algorithms (Levine & Feizi, 2020; Jia et al., 2020) is to create a majority vote mechanism to predict the label of a testing example. In particular, each voter votes for a label for a testing example and the final predicted label is the majority vote among the voters.
For instance, Deep Partition Aggregation (DPA) (Levine & Feizi, 2020) divides the training dataset into disjoint partitions and learns a base classifier (i.e., a voter) on each partition. Bagging (Jia et al., 2020) also learns multiple base classifiers (i.e., voters), but each of them is learnt on a random subsample of the training dataset. We denote by a and b the labels with the largest and second largest number of votes, respectively. Moreover, s_a and s_b are the number of votes for labels a and b, respectively, when there are no corrupted voters. In the worst-case scenario, the corrupted voters change their votes from a to b. Therefore, the majority vote result (i.e., the predicted label for a testing example) remains a when the number of corrupted voters is no larger than ⌈(s_a - s_b)/2⌉ - 1. In other words, the number of corrupted voters that a majority vote mechanism can tolerate depends on the gap s_a - s_b between the largest and the second largest number of votes.
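The gap argument above can be sketched in a few lines (a minimal illustration of the bound; the function name is ours, not from any of the cited works):

```python
from collections import Counter
from math import ceil

def tolerated_corrupted_voters(votes):
    """Return the majority-vote label and the largest number of
    corrupted voters for which that label provably cannot change."""
    counts = Counter(votes)
    top = counts.most_common(2)
    a, s_a = top[0]                         # label with the most votes
    s_b = top[1][1] if len(top) > 1 else 0  # runner-up count (0 if unanimous)
    # In the worst case each corrupted voter moves one vote from a to the
    # runner-up b, shrinking the gap by 2, so the prediction survives up
    # to ceil((s_a - s_b)/2) - 1 such flips.
    return a, ceil((s_a - s_b) / 2) - 1
```

For example, with 7 votes for label a and 3 for label b the gap is 4, so the prediction tolerates 1 corrupted voter: after one flip the counts are 6 vs. 4 and a still wins, but after two flips they tie at 5 vs. 5.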



Figure 1: An example illustrating individual certification vs. joint certification. Suppose rNN correctly classifies the two testing examples without attack, and an attacker can poison 3 training examples. The attacker can make rNN misclassify each testing example individually. However, the attacker cannot make rNN misclassify both testing examples jointly.

Our major contribution in this work is to show that the intrinsic majority vote mechanisms in kNN and rNN make them certifiably robust against data poisoning attacks. Moreover, kNN and rNN address the limitations of state-of-the-art certifiably robust learning algorithms. Specifically, each poisoned training example leads to only one corrupted voter in the worst-case scenario in kNN and rNN. Therefore, given the same gap s_a - s_b, the majority vote result (i.e., the predicted label for a testing example) is robust against more poisoned training examples in kNN and rNN. Furthermore, we show that rNN enables joint certification of multiple testing examples. Figure 1 illustrates individual certification and joint certification with two testing examples in rNN. When we treat the two testing examples individually, an attacker can poison 3 training examples such that rNN misclassifies each of them. However, when we treat them jointly, an attacker cannot poison 3 training examples to misclassify both of them. We propose such joint certification to derive a better certified accuracy for rNN. Specifically, we design methods to group testing examples in a testing dataset such that we can perform joint certification for each group of testing examples.

We evaluate our methods on the MNIST and CIFAR10 datasets, using the ℓ1 distance metric to compute nearest neighbors. First, our methods substantially outperform state-of-the-art certifiably robust learning algorithms. For instance, when an attacker can arbitrarily poison 1,000 training examples on MNIST, the certified accuracy of rNN with r = 4 is 40.8% and 33.5% higher than those of DPA (Levine & Feizi, 2020) and bagging (Jia et al., 2020), respectively. Second, our joint certification improves certified accuracy. For instance, it improves the certified accuracy of rNN by 15.1% when an attacker can arbitrarily poison 1,000 training examples on MNIST.
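To make the intrinsic certification concrete, the following sketch (our own illustration, not the authors' implementation; the function name knn_certify is hypothetical) shows kNN prediction together with a per-example certified poisoning size. It assumes the ℓ1 distance and the gap argument from above: since each modified, added, or removed training example corrupts at most one of the k votes in the worst case, the prediction is certified while the number of poisoned examples is at most ⌈(s_a - s_b)/2⌉ - 1, counted directly in poisoned training examples rather than corrupted base classifiers.

```python
import numpy as np
from collections import Counter
from math import ceil

def knn_certify(X_train, y_train, x_test, k):
    """Predict x_test's label with kNN (l1 distance) and return a
    certified poisoning size: the prediction provably stays the same
    as long as at most that many training examples are poisoned."""
    dists = np.abs(X_train - x_test).sum(axis=1)          # l1 distances
    neighbors = np.argsort(dists, kind="stable")[:k]      # k nearest neighbors
    counts = Counter(int(y_train[i]) for i in neighbors)  # intrinsic majority vote
    top = counts.most_common(2)
    a, s_a = top[0]
    s_b = top[1][1] if len(top) > 1 else 0                # runner-up votes (0 if unanimous)
    # Each poisoned training example flips at most one of the k votes from
    # a to b in the worst case, so the gap argument bounds the poisoning
    # size without training any extra base classifiers.
    return a, ceil((s_a - s_b) / 2) - 1
```

No partitioning or subsampling is needed: the k nearest neighbors are the voters, which is why a single poisoned example here costs a single vote.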

