ATTACKING FEW-SHOT CLASSIFIERS WITH ADVERSARIAL SUPPORT SETS

Abstract

Few-shot learning systems, especially those based on meta-learning, have recently made significant advances and are now being considered for real-world problems in healthcare, personalization, and science. In this paper, we examine the robustness of such deployed few-shot learning systems when they are fed an imperceptibly perturbed few-shot dataset, showing that the resulting predictions on test inputs can become worse than chance. This is achieved by developing a novel Adversarial Support Set Attack, which crafts a poisoned set of examples. When even a small subset of malicious data points is inserted into the support set of a meta-learner, accuracy is significantly reduced. For example, the average classification accuracy of CNAPS on the Aircraft dataset in the META-DATASET benchmark drops from 69.2% to 9.1% when only 20% of the support set is poisoned by imperceptible perturbations. We evaluate the new attack on a variety of few-shot classification algorithms, including MAML, prototypical networks, and CNAPS, on both small-scale (miniImageNet) and large-scale (META-DATASET) few-shot classification problems. Interestingly, adversarial support sets produced by attacking a meta-learning based few-shot classifier can also reduce the accuracy of a fine-tuning based classifier when both models use similar feature extractors.

1. INTRODUCTION

Standard deep learning approaches suffer from poor sample efficiency (Krizhevsky et al., 2012), which is problematic in tasks where data collection is difficult or expensive. Recently, few-shot learners have been developed that address this shortcoming by supporting rapid adaptation to a new task using only a few labeled examples (Finn et al., 2017; Snell et al., 2017). This success has made few-shot learners increasingly attractive for real-life applications. They have been applied to user personalization in recommender systems (Lee et al., 2019), matching potential users to businesses (Li et al., 2020), personalized talking head models (Zakharov et al., 2019), and on-device gaze estimation (He et al., 2019). As few-shot learners improve, they are also being applied to increasingly sensitive applications where the repercussions of confidently wrong predictions are severe. Examples include clinical risk assessment (Sheryl Zhang et al., 2019), glaucoma diagnosis (Kim et al., 2017), identification of diseases in skin lesions (Mahajan et al., 2020), and tissue slide annotation in cancer immunotherapy biomarker research (Lahiani et al., 2018).

As few-shot learners gain popularity, it is essential to understand how robust they are and whether there are potential avenues for their exploitation. It is well known that standard classifiers are vulnerable to inputs that have been purposefully modified in a minor way to cause incorrect predictions (Biggio & Roli, 2017). Such examples may be presented to a model either at test time, in what are called evasion attacks (Biggio et al., 2017) or adversarial examples (Szegedy et al., 2014), or at training time, which is referred to as poisoning (Newsome et al., 2006; Rubinstein et al., 2009). While previous work has considered adversarial attacks on few-shot learners, data poisoning attacks have not been studied and are the focus of this paper.
Data poisoning attacks are of particular relevance in the few-shot learning setting for two reasons. First, since the datasets are small, even a handful of poisoned examples can have a significant effect. Second, many applications of few-shot learning require labeled data from users to adapt the system to a new task, essentially providing a direct interface for outsiders to influence the model's behaviour.
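To make this threat model concrete, the sketch below poisons 20% of one class's support set for a prototypical-network-style nearest-prototype classifier on a toy two-class task. Everything here is an illustrative assumption: the 2-D "feature space", the sample counts, and the attack heuristic (poisoned support points are simply displaced toward the opposing class prototype under a budget `eps`) all stand in for the paper's actual attack, which learns imperceptible pixel-space perturbations by optimizing through the few-shot learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-way few-shot task in a 2-D "feature space" (a simplified stand-in
# for features produced by a deep extractor).
n_support, n_query = 10, 50
s0 = rng.normal([-1.0, 0.0], 0.5, size=(n_support, 2))   # class-0 support
s1 = rng.normal([+1.0, 0.0], 0.5, size=(n_support, 2))   # class-1 support
q = np.vstack([rng.normal([-1.0, 0.0], 0.5, size=(n_query, 2)),
               rng.normal([+1.0, 0.0], 0.5, size=(n_query, 2))])
qy = np.array([0] * n_query + [1] * n_query)              # query labels

def accuracy(sup0, sup1):
    """Prototypical-network-style nearest-prototype classification."""
    protos = np.stack([sup0.mean(0), sup1.mean(0)])       # class prototypes
    dists = ((q[:, None, :] - protos[None]) ** 2).sum(-1)
    return float((dists.argmin(1) == qy).mean())

clean_acc = accuracy(s0, s1)

# Poison 20% of the class-0 support: displace the chosen points toward (and
# past) the opposing prototype under a budget eps. The budget is large only
# because this toy space is 2-D rather than pixel space.
eps = 8.0
idx = np.arange(n_support // 5)                           # 20% of the support
direction = s1.mean(0) - s0[idx]
unit = direction / np.linalg.norm(direction, axis=1, keepdims=True)
poisoned = s0.copy()
poisoned[idx] = s0[idx] + eps * unit                      # shifted support points

poisoned_acc = accuracy(poisoned, s1)
print(f"clean accuracy:    {clean_acc:.2f}")
print(f"poisoned accuracy: {poisoned_acc:.2f}")
```

Because the class prototype is a mean over the support set, shifting even two of ten support points drags the prototype toward the other class and moves the decision boundary into the clean class's query distribution, lowering query accuracy; the real attack achieves a far larger drop by optimizing the perturbations directly.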

