POISONING GENERATIVE MODELS TO PROMOTE CATASTROPHIC FORGETTING

Abstract

Generative models have grown into the workhorse of many state-of-the-art machine learning methods, yet their vulnerability to poisoning attacks remains largely understudied. In this work, we investigate this issue in the context of continual learning, where generative replayers are used to combat catastrophic forgetting. By developing a novel customization of dirty-label, input-aware backdoors for the online setting, our attacker stealthily promotes forgetting while retaining high accuracy on the current task and withstanding strong defenses. Our approach exploits an intriguing property of generative models, namely that they struggle to capture input-dependent triggers. Experiments on four standard datasets corroborate the poisoner's effectiveness.

1. INTRODUCTION

The vulnerability of machine learning systems must be scrutinized before they can be deployed to security-critical applications. The common evasion attack assumes that clean target instances can be manipulated at test time, which is unrealistic in many scenarios. In contrast, poisoning attacks make only malicious and imperceptible modifications to the training set, so that predictions on test examples are mistaken. Such threat models may insert poison examples (Chen et al., 2017), flip training labels (Xiao et al., 2012; Levine & Feizi, 2021), or modify training inputs (Biggio et al., 2012; Shafahi et al., 2018). The increasing penetration of generative models in machine learning urges the investigation of poisoning attacks in a broader range of learning paradigms.

In this work, we focus on continual learning, a prominent setting where tasks arrive in a stream and each corresponds to a discriminative learning problem such as classification (Chen & Liu, 2018). Since the tasks are streamed and cannot be stored, the running classifier often suffers from catastrophic forgetting, where performance on older tasks gradually deteriorates (McCloskey & Cohen, 1989). Deep generative replay (DGR) is a natural tool to bring back the memory of previous tasks by learning a generative model that fits their data (Shin et al., 2017; Cong et al., 2020). Despite its effectiveness, DGR also opens up new vulnerabilities: misleading examples can be injected into the training data D_t for the current task t, so that catastrophic forgetting is promoted when the poisoned D_t is used to train both the replayer G_t and the classifier.
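To fix ideas, the DGR data flow can be sketched as follows. This is a toy illustration under our own assumptions, not the actual models of Shin et al. (2017): the "replayer" simply memorizes and re-samples pairs, and the "classifier" is a lookup table, standing in for a generative model and a neural network respectively.

```python
import random

def train_dgr_step(D_t, G_prev, n_replay, seed=0):
    """One continual-learning step of deep generative replay (toy sketch).

    D_t      : list of (input, label) pairs for the current task t
    G_prev   : replayer from the previous step (a callable), or None at t=1
    n_replay : number of pseudo-examples replayed for earlier tasks
    """
    # Replay pseudo-examples for tasks 1..t-1 from the previous replayer.
    replayed = [G_prev() for _ in range(n_replay)] if G_prev else []
    mixed = list(D_t) + replayed

    # Fit the new replayer G_t and the classifier on the mixed data. Real
    # DGR trains a generative model and a neural classifier; here both
    # memorize, which suffices to show the data flow a poisoner of D_t
    # can exploit (both G_t and the classifier see the poisoned mix).
    memory = tuple(mixed)
    rng = random.Random(seed)
    G_t = lambda: rng.choice(memory)
    classifier = dict(mixed)
    return G_t, classifier

# Two tasks: without replay, task-1 pairs would vanish from the training mix.
G1, C1 = train_dgr_step([("a", 0), ("b", 0)], None, n_replay=0)
G2, C2 = train_dgr_step([("c", 1), ("d", 1)], G1, n_replay=2)
```

Note that whatever lands in D_t flows into G_t and, from there, into every future training mix, which is precisely the channel the poisoner targets.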
In this work, we seek practical and stealthy poisoning attacks on DGR that achieve three objectives:

O1 After moving past task t, the classifier soon forgets what it learned from that task (i.e., performs poorly on clean test examples drawn from it), despite using a replayer for all tasks seen so far.

O2 During task t, the classifier trained on the poisoned data suffers no degradation of test accuracy on task t itself. This is important because poor performance on the current task would raise significant and immediate suspicion. In contrast, by promoting forgetting, the harm will

Although poisoning attacks have been extensively studied under discriminative learning, their potential risk in generative learning has been largely unexplored. Ding et al. (2019) poison the training examples so that the learned generator covertly alters an important part of the output image, e.g., turning a red traffic light green. Salem et al. (2020) enable the adversary to control the output image by planting a trigger in the input image or noise. Both are backdoor attacks that require write access to test data and operate in batched learning scenarios.
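To make the attack surface concrete, the following is an illustrative sketch of a dirty-label, input-aware poison: the trigger pattern is derived from each input itself, so every poisoned example carries a different perturbation. The hashing scheme and all function names here are our own assumptions for illustration, not the trigger generator used in this work or by Salem et al. (2020).

```python
import hashlib

def input_aware_trigger(x, width=4):
    """Hypothetical per-input trigger: a small perturbation derived from x."""
    h = hashlib.sha256(repr(x).encode()).digest()
    # Map the first `width` digest bytes to small values in [0, 0.1].
    return [b / 255.0 * 0.1 for b in h[:width]]

def poison(example, dirty_label):
    """Dirty-label poisoning: plant the trigger and assign a wrong label."""
    x, y = example
    t = input_aware_trigger(x)
    # Overlay the trigger on the first few "pixels", clipping to [0, 1];
    # the remaining pixels are left untouched.
    x_p = tuple(min(1.0, xi + ti) for xi, ti in zip(x, t)) + tuple(x[len(t):])
    return x_p, dirty_label
```

Because the trigger varies with the input, no single pattern summarizes the poisoned set, which is the kind of input-dependent structure that, as noted in the abstract, generative replayers struggle to capture.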

