ATTACKING FEW-SHOT CLASSIFIERS WITH ADVERSARIAL SUPPORT SETS

Abstract

Few-shot learning systems, especially those based on meta-learning, have recently made significant advances, and are now being considered for real-world problems in healthcare, personalization, and science. In this paper, we examine the robustness of such deployed few-shot learning systems when they are fed an imperceptibly perturbed few-shot dataset, showing that the resulting predictions on test inputs can become worse than chance. This is achieved by developing a novel Adversarial Support Set Attack which crafts a poisoned set of examples. When even a small subset of malicious data points is inserted into the support set of a meta-learner, accuracy is significantly reduced. For example, the average classification accuracy of CNAPS on the Aircraft dataset in the META-DATASET benchmark drops from 69.2% to 9.1% when only 20% of the support set is poisoned by imperceptible perturbations. We evaluate the new attack on a variety of few-shot classification algorithms including MAML, prototypical networks, and CNAPS, on both small scale (miniImageNet) and large scale (META-DATASET) few-shot classification problems. Interestingly, adversarial support sets produced by attacking a meta-learning based few-shot classifier can also reduce the accuracy of a fine-tuning based classifier when both models use similar feature extractors.

1. INTRODUCTION

Standard deep learning approaches suffer from poor sample efficiency (Krizhevsky et al., 2012), which is problematic in tasks where data collection is difficult or expensive. Recently, few-shot learners have been developed which address this shortcoming by supporting rapid adaptation to a new task using only a few labeled examples (Finn et al., 2017; Snell et al., 2017). This success has meant that few-shot learners are becoming increasingly attractive for real-life applications. They have been applied to user personalization in recommender systems (Lee et al., 2019), matching potential users to businesses (Li et al., 2020), personalized talking head models (Zakharov et al., 2019), and on-device gaze estimation (He et al., 2019). As few-shot learners improve, they are also being applied to increasingly sensitive applications where the repercussions of confidently wrong predictions are severe. Examples include clinical risk assessment (Sheryl Zhang et al., 2019), glaucoma diagnosis (Kim et al., 2017), identification of diseases in skin lesions (Mahajan et al., 2020), and tissue slide annotation in cancer immuno-therapy biomarker research (Lahiani et al., 2018). As few-shot learners gain popularity, it is essential to understand how robust they are and whether there are potential avenues for their exploitation.

It is well known that standard classifiers are vulnerable to inputs that have been purposefully modified in a minor way to cause incorrect predictions (Biggio & Roli, 2017). Such examples may be presented to a model either at test time, called evasion attacks (Biggio et al., 2017) or adversarial examples (Szegedy et al., 2014), or at training time, which is referred to as poisoning (Newsome et al., 2006; Rubinstein et al., 2009). While previous work has considered adversarial attacks on few-shot learners, data poisoning attacks have not been studied and are the focus of this paper.
Data poisoning attacks are of particular relevance in the few-shot learning setting for two reasons. First, since the datasets are small, a handful of poisoned patterns might have a significant effect. Second, many applications of few-shot learning require labeled data from users to adapt the system to a new task, essentially providing a direct interface for outsiders to influence the model's behaviour. If few-shot learning systems are not robust to poisoning of their training dataset, then this weakness could be exploited. An attacker performing a man-in-the-middle (Conti et al., 2016) data poisoning attack could cause a recommender system's personalization to perform badly, or suggest certain results to influence a user's decision. Applied at scale to many users, an attacker could cause significant damage. Similarly, a doctor attempting to commit medical insurance fraud may submit images causing a benign skin condition to be incorrectly classified as a skin disease requiring expensive treatment; or a malicious party may ruin a research study that uses automated annotation of samples by tampering imperceptibly with only a few images. If these attacks could be achieved with malicious patterns that cannot be reliably distinguished from real training data, it would be difficult to defend against them.

Before detailing the key contributions of the paper, it is necessary to briefly introduce the lexicon of few-shot learning. During training, few-shot learners are typically presented with many different tasks. The model must learn to perform well on each task, hopefully arriving at a point where it can adapt effectively to a new task at test time. At test time, the model is presented with an unseen task containing a few labeled examples, the support set, and a number of unlabeled examples to classify, called the query set.

The paper makes the following contributions:

1. We define a novel attack on few-shot classifiers, called an Adversarial Support Set Attack, which applies adversarial perturbations to the support set that are calculated to minimize model accuracy over a set of query points. To the best of the authors' knowledge, this is the first work considering the impact of poisoning attacks on trained few-shot classifiers.
2. We demonstrate that few-shot classifiers are surprisingly vulnerable to Adversarial Support Set attacks. The adversarial support set attack is more effective than the baselines considered, and generalizes well, i.e. the compromised classifier is highly likely to be inaccurate on a randomly sampled query set from the task domain.
3. We demonstrate the effectiveness of our approach against a variety of few-shot classifiers including MAML (Finn et al., 2017), ProtoNets (Snell et al., 2017), and CNAPS (Requeima et al., 2019a), on both small scale (miniImageNet (Vinyals et al., 2016)) and large scale (META-DATASET (Triantafillou et al., 2020)) few-shot classification benchmarks.
4. We show that adversarial support sets transfer effectively to fine-tuning based few-shot classifiers when the few-shot classifier and the fine-tuner utilize similar feature extractors.

The rest of the paper proceeds as follows: Section 2 provides background about the meta-learning models under consideration, relevant adversarial attack methods, and the threat model under consideration. Section 3 discusses how evasion and poisoning attacks may be generalized to few-shot learners. Section 4 presents the experimental results and Section 5 concludes the paper. Additional results and experimental details are in the Appendix.
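To make the attack in contribution 1 concrete, below is a minimal sketch that poisons the support set of a nearest-class-mean (ProtoNet-style) classifier operating directly on input vectors: it ascends the query-set cross-entropy with respect to the support inputs using signed-gradient steps, projecting back onto an L∞ ball of radius ε after each step. This is a simplified stand-in for the paper's attack, which differentiates through a trained meta-learner's adaptation with imperceptibly small ε; the identity embedding, the function names, and the (deliberately large) toy ε here are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def prototypes(xs, ys, n_classes):
    # Class means of the support inputs (ProtoNet-style, identity embedding).
    return np.stack([xs[ys == k].mean(axis=0) for k in range(n_classes)])

def query_loss_and_grad(xs, ys, xq, yq, n_classes):
    # Cross-entropy of the query set under a nearest-class-mean classifier,
    # and its gradient with respect to the support inputs xs.
    protos = prototypes(xs, ys, n_classes)                # (C, D)
    d2 = ((xq[:, None, :] - protos[None]) ** 2).sum(-1)   # (M, C) squared dists
    p = softmax(-d2)                                      # logits are -d2
    loss = -np.log(p[np.arange(len(yq)), yq] + 1e-12).mean()
    dlogits = p.copy()
    dlogits[np.arange(len(yq)), yq] -= 1.0
    dlogits /= len(yq)                                    # dL/dlogits
    # dL/dprotos: chain through logits = -d2 and d2 = ||xq - protos||^2.
    dprotos = (dlogits[:, :, None] * 2.0 * (xq[:, None, :] - protos[None])).sum(0)
    grad_xs = np.zeros_like(xs)
    for k in range(n_classes):
        idx = np.where(ys == k)[0]
        grad_xs[idx] = dprotos[k] / len(idx)              # mean spreads gradient
    return loss, grad_xs

def attack_support(xs, ys, xq, yq, n_classes, eps=0.1, step=0.02, iters=40):
    # PGD-style ascent on the query loss, perturbing only the support inputs.
    xs_adv = xs.copy()
    for _ in range(iters):
        _, g = query_loss_and_grad(xs_adv, ys, xq, yq, n_classes)
        xs_adv = xs_adv + step * np.sign(g)               # signed-gradient step
        xs_adv = np.clip(xs_adv, xs - eps, xs + eps)      # L-inf projection
    return xs_adv
```

In the full attack the gradient flows through the meta-learner's embedding and adaptation procedure rather than through raw class means, but the outer loop, ascend the query loss and project onto the perturbation budget, is the same.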

2. BACKGROUND

In this section we lay the necessary groundwork for adversarial support set attacks. We focus on image classification. We denote input images x ∈ R^(ch×W×H), where W is the image width, H the image height, and ch the number of image channels, and image labels y ∈ {1, ..., C}, where C is the number of image classes. We use bold x and y to denote a set of images and labels, respectively.

2.1. META-LEARNING

We consider the few-shot image classification scenario using a meta-learning approach. Rather than a single, large dataset D, we assume access to a dataset D = {τ_t}_(t=1)^K comprising a large number of training tasks τ_t, drawn i.i.d. from a distribution p(τ). The data for a task consists of a support set D_S = {(x_n, y_n)}_(n=1)^N comprising N elements, with the inputs x_n and labels y_n observed, and a query set D_Q = {(x*_m, y*_m)}_(m=1)^M with M elements for which we wish to make predictions. We may use the shorthand D_S = {x, y} and D_Q = {x*, y*} for brevity. Here the inputs x* are observed and the labels y* are observed only during meta-training (i.e. training of the meta-learning algorithm). Note that the query set examples are drawn from the same set of labels as the examples in the support set. At meta-test time, the classifier f is required to make predictions for the query set inputs of an unseen task, given only that task's support set.
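The episodic structure above can be made concrete with a small sampling routine. The sketch below builds one task from a labeled pool: it draws n_way classes, then k_shot support examples and m_query query examples per class, relabeling them 0..n_way-1 within the task. The function name and default values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_episode(images, labels, n_way=5, k_shot=1, m_query=15, rng=None):
    """Sample one few-shot task: a support set D_S and a query set D_Q drawn
    from the same n_way classes, with disjoint examples per class."""
    rng = np.random.default_rng(rng)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    xs, ys, xq, yq = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.where(labels == c)[0])
        xs.append(images[idx[:k_shot]])                    # support examples
        ys += [new_label] * k_shot
        xq.append(images[idx[k_shot:k_shot + m_query]])    # disjoint queries
        yq += [new_label] * m_query
    return (np.concatenate(xs), np.array(ys)), (np.concatenate(xq), np.array(yq))
```

During meta-training the query labels yq are used to compute the loss; at meta-test time only the support set (xs, ys) and the query inputs xq are available to the classifier.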

