DYNAMIC BACKDOOR ATTACKS AGAINST DEEP NEURAL NETWORKS

Abstract

Current Deep Neural Network (DNN) backdooring attacks rely on adding static triggers (with fixed patterns and locations) to model inputs, which makes them prone to detection. In this paper, we propose the first class of dynamic backdooring techniques: Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques have random patterns and locations. In particular, BaN and c-BaN, which are based on a novel generative network, are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique: given a target label, it generates a target-specific trigger. Both BaN and c-BaN constitute a general framework that gives the adversary the flexibility to further customize backdoor attacks. We extensively evaluate our techniques on three benchmark datasets and show that they achieve almost perfect attack performance on backdoored data with negligible utility loss. More importantly, our techniques can bypass state-of-the-art defense mechanisms.

1. INTRODUCTION

Recent research has shown that deep neural network (DNN) models are vulnerable to various security and privacy attacks (Papernot et al., 2016; 2017; Shokri et al., 2017; Salem et al., 2019; 2020; Tramèr et al., 2016; Oh et al., 2018). One such attack that has received a large amount of attention is the backdoor attack, where an adversary trains a DNN model that intentionally misclassifies any input carrying an added trigger (a secret pattern constructed from a set of neighboring pixels) to a specific target label. Backdoor attacks can cause severe security consequences. For instance, an adversary can implant a backdoor in an authentication system to grant herself unauthorized access.

Existing backdoor attacks generate static triggers, with a fixed trigger pattern and location (on the model input). For instance, Figure 1a shows an example of triggers constructed by BadNets (Gu et al., 2017), one popular backdoor attack method, on the CelebA dataset (Liu et al., 2015). As we can see, BadNets in this case uses a white square as a trigger and always places it in the top-left corner of an input. This static nature of triggers has been leveraged by most of the current defenses against backdoor attacks (Wang et al., 2019; Liu et al., 2019a; Gao et al., 2019).

In this paper, we propose the first class of backdooring techniques against deep neural network models that generate dynamic triggers, in terms of both trigger pattern and location. We refer to our techniques as dynamic backdoor attacks. Figure 1b shows an example. Dynamic backdoor attacks offer the adversary more flexibility, as they allow triggers to have different patterns and locations. Moreover, as demonstrated by our empirical evaluation, our techniques largely reduce the efficacy of current defense mechanisms. In addition, we extend our techniques to work for all labels of the backdoored DNN model, while current backdoor attacks only target a single or a few labels.
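To make the contrast with static triggers concrete, the following sketch shows the core idea behind a dynamic trigger in the Random Backdoor spirit: both the trigger pattern and its location are sampled at stamping time. This is an illustrative reconstruction, not the authors' implementation; the trigger size and the use of uniform noise as a pattern are assumptions for the example.

```python
import numpy as np

def apply_random_trigger(image, trigger_size=5, rng=None):
    """Stamp a trigger with a random pattern at a random location.

    `image` is an H x W x C float array in [0, 1]. `trigger_size`
    is an illustrative choice, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = image.shape
    # Sample a random trigger pattern (uniform noise here, for illustration)...
    trigger = rng.uniform(0.0, 1.0, size=(trigger_size, trigger_size, c))
    # ...and a random location fully inside the image bounds.
    y = rng.integers(0, h - trigger_size + 1)
    x = rng.integers(0, w - trigger_size + 1)
    backdoored = image.copy()
    backdoored[y:y + trigger_size, x:x + trigger_size] = trigger
    return backdoored
```

A defense that searches for one fixed pattern at one fixed position has no single pattern/location pair to recover here, which is the intuition behind the reduced efficacy of static-trigger defenses.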
This further increases the difficulty of mitigating our backdoors. In total, we propose three dynamic backdoor techniques, i.e., Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). In particular, the latter two attacks algorithmically generate triggers to mount backdoor attacks and are the first of their kind. To demonstrate the effectiveness of our proposed techniques, we perform an empirical analysis with three DNN model architectures over three benchmark datasets. All of our techniques achieve an almost perfect backdoor accuracy, i.e., the accuracy of the backdoored model on backdoored data is approximately 100%, with negligible utility loss. Moreover, we show that our techniques can bypass three state-of-the-art backdoor defenses, namely Neural Cleanse (Wang et al., 2019), ABS (Liu et al., 2019a), and STRIP (Gao et al., 2019). In general, our contributions can be summarized as follows: 1) We broaden the class of backdoor attacks by introducing dynamic backdoor attacks. 2) We propose BaN and c-BaN, which constitute the first algorithmic backdoor paradigm. 3) Our dynamic backdoor attacks achieve strong performance while bypassing the current state-of-the-art backdoor defenses.
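The BaN idea of algorithmically generating triggers can be sketched as a small generator network mapping a noise vector to a trigger patch. The toy model below is a two-layer MLP stand-in for the paper's generative network; all dimensions and the activation choices are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

class BaNSketch:
    """Toy Backdoor Generating Network: maps a noise vector z to a
    trigger patch. Illustrative stand-in for the paper's generator;
    layer sizes and activations are assumptions, not from the paper."""

    def __init__(self, z_dim=16, trigger_size=5, channels=3, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        out_dim = trigger_size * trigger_size * channels
        # Randomly initialized weights; in the real attack these are
        # trained jointly with the backdoored target model.
        self.w1 = rng.normal(0.0, 0.1, size=(z_dim, 64))
        self.w2 = rng.normal(0.0, 0.1, size=(64, out_dim))
        self.shape = (trigger_size, trigger_size, channels)

    def generate(self, z):
        h = np.tanh(z @ self.w1)                 # hidden layer
        t = 1.0 / (1.0 + np.exp(-(h @ self.w2)))  # sigmoid keeps pixels in [0, 1]
        return t.reshape(self.shape)
```

Because each noise vector `z` yields a different trigger, the adversary obtains a family of valid triggers rather than one fixed pattern; c-BaN additionally conditions this generation on the desired target label.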

2. RELATED WORK

Backdoor Attacks: Gu et al. (Gu et al., 2017) introduce BadNets, the first backdoor attack on DNN models. BadNets uses the MNIST dataset and a square-like trigger with a fixed location to show the applicability of backdoor attacks in the DNN setting. Liu et al. (Liu et al., 2019b) later propose a more advanced backdooring technique, namely the Trojan attack. They simplify the threat model of BadNets by eliminating the need for access to the training data used to train the target model. The main difference between these two attacks (BadNets and the Trojan attack) and our work is that both attacks only consider static backdoors in terms of the triggers' pattern and location; our work extends backdoor attacks to dynamic trigger patterns and locations. We focus on backdoor attacks against image classification models, but backdoor attacks can be extended to other scenarios, such as federated learning (Wang et al., 2020), video recognition (Zhao et al., 2020), transfer learning (Yao et al., 2019), and natural language processing (NLP) (Chen et al., 2020). To increase the stealthiness of the backdoor, Saha et al. (Saha et al., 2020) propose to transform the backdoored images into benign-looking ones, which makes them harder to detect. Liu et al. (Liu et al., 2020) introduce another approach, namely the reflection backdoor (Refool), which hides the triggers using mathematical modeling of the physical reflection property. Another line of research focuses on exploring different methods of implanting backdoors into target models. Rakin et al. (Rakin et al., 2020) introduce the Targeted Bit Trojan (TBT) technique, which, instead of training the target model, flips some bits in the target model's weights to make it misclassify inputs. Tang et al. (Tang et al., 2020) present a different approach, where the adversary appends a small Trojan module (TrojanNet) to the target model instead of fully retraining it.
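The BadNets-style data poisoning described above can be summarized in a few lines: a fixed trigger is stamped at a fixed location on a fraction of the training images, whose labels are flipped to the target label. This is a minimal sketch of the general recipe; the poisoning rate, trigger size, and function names are illustrative assumptions.

```python
import numpy as np

def poison_badnets(images, labels, target_label, trigger_size=4,
                   rate=0.1, rng=None):
    """BadNets-style static poisoning sketch: stamp a white square in the
    top-left corner of a fraction `rate` of the images and flip their
    labels to `target_label`. All parameter values are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Fixed white-square trigger at a fixed (top-left) location.
    images[idx, :trigger_size, :trigger_size] = 1.0
    labels[idx] = target_label
    return images, labels, idx
```

A model trained on the poisoned set learns to associate the fixed square with the target label; the fixed pattern and location are exactly what reverse-engineering defenses exploit, and what our dynamic triggers remove.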
Defenses Against Backdoor Attacks: Wang et al. (Wang et al., 2019) propose Neural Cleanse (NC), a backdoor defense method based on reverse engineering. For each output label, NC tries to generate the smallest trigger that converts the output of any input stamped with it to that label. NC then uses anomaly detection to determine whether any of the generated triggers is actually a backdoor. Later, Liu et al. (Liu et al., 2019a) propose another model-based defense, namely ABS. ABS detects whether a target model contains a backdoor by analyzing the behavior of the target model's inner neurons under different levels of stimulation. Also, Gao et al. (Gao et al., 2019) propose STRIP, a backdoor defense method that manipulates the input to determine whether it is backdoored. More concretely, STRIP fuses the input with multiple clean data samples, one at a time. It then queries the target model with the generated inputs and calculates the entropy of the output labels; backdoored inputs tend to have lower entropy than clean ones. Besides the above, there are multiple other types of attacks against deep neural network models, such as adversarial examples (Vorobeychik & Li, 2014; Carlini & Wagner, 2017; Li & Vorobeychik, 2015; Tramèr et al., 2017; Xu et al., 2018), poisoning attacks (Jagielski et al., 2018; Suciu et al., 2018; Biggio et al., 2012), and property inference (Ganju et al., 2018; Melis et al., 2019).
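The STRIP procedure described above can be sketched in a few lines: blend the suspect input with several clean images, query the model on each blend, and average the entropy of the predicted class distributions. This is an illustrative reconstruction, not the authors' code; the blending weight, the number of blends, and the `predict` interface are assumptions.

```python
import numpy as np

def strip_entropy(x, clean_samples, predict, n=8, rng=None):
    """STRIP-style check (sketch): superimpose the suspect input `x`
    onto `n` clean images, query the model, and return the average
    entropy of the predictions. `predict` maps an image to a class
    probability vector; all names here are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = rng.choice(len(clean_samples), size=n, replace=False)
    entropies = []
    for i in idx:
        blended = 0.5 * x + 0.5 * clean_samples[i]  # fuse with one clean image
        p = np.clip(predict(blended), 1e-12, 1.0)
        entropies.append(-np.sum(p * np.log(p)))
    # Low average entropy suggests the trigger dominates the prediction,
    # i.e., the input is likely backdoored.
    return float(np.mean(entropies))
```

The intuition is that a static trigger keeps forcing the target label even after blending, so the output distribution stays peaked (low entropy), whereas clean inputs become ambiguous (high entropy) when fused with other images.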



Figure 1: A comparison between static and dynamic backdoors on CelebA.

