DYNAMIC BACKDOOR ATTACKS AGAINST DEEP NEURAL NETWORKS

Abstract

Current Deep Neural Network (DNN) backdooring attacks rely on adding static triggers (with fixed patterns and locations) to model inputs, which makes them prone to detection. In this paper, we propose the first class of dynamic backdooring techniques: Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques have random patterns and locations. In particular, BaN and c-BaN, based on a novel generative network, are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique that, given a target label, can generate a target-specific trigger. Both BaN and c-BaN essentially constitute a general framework that gives the adversary the flexibility to further customize backdoor attacks. We extensively evaluate our techniques on three benchmark datasets and show that they achieve almost perfect attack performance on backdoored data with a negligible utility loss. More importantly, our techniques can bypass state-of-the-art defense mechanisms.

1. INTRODUCTION

Recent research has shown that deep neural network (DNN) models are vulnerable to various security and privacy attacks (Papernot et al., 2016; 2017; Shokri et al., 2017; Salem et al., 2019; 2020; Tramèr et al., 2016; Oh et al., 2018). One such attack that has received a large amount of attention is the backdoor attack, where an adversary trains a DNN model that intentionally misclassifies any input carrying an added trigger (a secret pattern constructed from a set of neighboring pixels) to a specific target label. Backdoor attacks can cause severe security consequences. For instance, an adversary can implant a backdoor in an authentication system to grant herself unauthorized access.

Existing backdoor attacks generate static triggers, with a fixed trigger pattern and a fixed location on the model input. For instance, Figure 1a shows an example of triggers constructed by BadNets (Gu et al., 2017), one popular backdoor attack method, on the CelebA dataset (Liu et al., 2015). As we can see, BadNets in this case uses a white square as a trigger and always places it in the top-left corner of an input. This static nature of triggers has been leveraged to build most of the current defenses against backdoor attacks (Wang et al., 2019; Liu et al., 2019a; Gao et al., 2019).

In this paper, we propose the first class of backdooring techniques against deep neural network models that generate dynamic triggers, in terms of both trigger pattern and location. We refer to our techniques as dynamic backdoor attacks. Figure 1b shows an example. Dynamic backdoor attacks offer the adversary more flexibility, as they allow triggers to have different patterns and locations. Moreover, our techniques largely reduce the efficacy of current defense mechanisms, as demonstrated by our empirical evaluation. In addition, we extend our techniques to work for all labels of the backdoored DNN model, whereas current backdoor attacks only target a single or a few labels. This further increases the difficulty of mitigating our backdoors.
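To make the static/dynamic distinction concrete, the following NumPy sketch (our own illustration, not the authors' implementation; the function names and the trigger size are hypothetical) contrasts a BadNets-style trigger, a fixed white square stamped at the top-left corner, with a dynamic trigger whose pattern and location are sampled at random:

```python
import numpy as np

def apply_static_trigger(image, size=5, value=1.0):
    """BadNets-style static trigger: a white square with a fixed
    pattern, always placed at the top-left corner of the input."""
    patched = image.copy()
    patched[:size, :size] = value
    return patched

def apply_dynamic_trigger(image, size=5, rng=None):
    """Dynamic-trigger sketch: a randomly drawn pattern placed at a
    location sampled uniformly over all valid positions."""
    rng = np.random.default_rng() if rng is None else rng
    patched = image.copy()
    h, w = image.shape[:2]
    y = rng.integers(0, h - size + 1)   # random row offset
    x = rng.integers(0, w - size + 1)   # random column offset
    patched[y:y + size, x:x + size] = rng.random((size, size))
    return patched
```

A detector that scans for a fixed pattern at a fixed position (the assumption behind several current defenses) would flag every output of `apply_static_trigger`, while each call to `apply_dynamic_trigger` produces a different pattern at a different place.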



Figure 1: A comparison between static and dynamic backdoors on CelebA.

