TARGETED ATTACK AGAINST DEEP NEURAL NETWORKS VIA FLIPPING LIMITED WEIGHT BITS

Abstract

To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Specifically, our goal is to misclassify a specific sample into a target class without any sample modification, while not significantly reducing the prediction accuracy of other samples, so as to ensure stealthiness. To this end, we formulate this problem as a binary integer programming (BIP) problem, since the parameters are stored as binary bits (i.e., 0 and 1) in the memory. Utilizing a recent technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem, which can be effectively and efficiently solved by the alternating direction method of multipliers (ADMM). Consequently, the flipped critical bits can be easily determined through optimization, rather than by a heuristic strategy. Extensive experiments demonstrate the superiority of our method in attacking DNNs.

1. INTRODUCTION

Due to the great success of deep neural networks (DNNs), their vulnerability (Szegedy et al., 2014; Gu et al., 2019) has attracted great attention, especially in security-critical applications (e.g., face recognition (Dong et al., 2019) and autonomous driving (Eykholt et al., 2018)). For example, backdoor attacks (Saha et al., 2020; Xie et al., 2019) manipulate the behavior of a DNN model mainly by poisoning some training data in the training stage, while adversarial attacks (Goodfellow et al., 2015; Moosavi-Dezfooli et al., 2017) aim to fool the DNN model by adding malicious perturbations to the input in the inference stage. Compared to the backdoor attack and the adversarial attack, a novel attack paradigm, dubbed weight attack (Breier et al., 2018), has been rarely studied. It assumes that the attacker has full access to the memory of a device, such that he/she can directly change the parameters of a deployed model to achieve some malicious purpose (e.g., crushing a fully functional DNN and converting it into a random output generator (Rakin et al., 2019)). Since a weight attack neither modifies the input nor controls the training process, it is difficult for both the service provider and the user to notice its existence. In practice, since the deployed DNN model is stored as binary bits in the memory, the attacker can modify the model parameters using physical fault injection techniques, such as the Row Hammer Attack (Agoyan et al., 2010; Selmke et al., 2015) and the Laser Beam Attack (Kim et al., 2014), which can precisely flip any bit of the data in the memory. Some previous works (Rakin et al., 2019; 2020a;b) have demonstrated that it is feasible to change the model weights via bit flipping to achieve some malicious purposes. However, in these methods the critical bits are identified mostly by heuristic strategies. For example, Rakin et al. (2019) combined gradient ranking and progressive search to identify the critical bits for flipping.
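To see why even a single flipped bit matters, consider how an 8-bit quantized weight is stored in memory. The following Python snippet is only a rough illustration (it assumes a two's-complement representation, which is common for quantized models but not stated above; the function name and values are hypothetical): toggling one stored bit, especially the sign bit, can change the weight value drastically.

```python
def flip_bit(stored_byte: int, bit_index: int) -> int:
    """Flip one bit of a stored 8-bit weight and return its two's-complement value.

    stored_byte: the raw byte as stored in memory (0..255).
    bit_index:   0 = least significant bit, 7 = sign bit.
    """
    flipped = stored_byte ^ (1 << bit_index)               # XOR toggles exactly one bit
    return flipped - 256 if flipped >= 128 else flipped    # reinterpret as signed int8

# A weight stored as 0b01011111 (= +95): flipping its sign bit yields
# 0b11011111, which is read back as -33, a large change from one flip.
print(flip_bit(0b01011111, 7))   # -> -33
```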


Figure 1: Demonstration of our proposed attack against a deployed DNN in the memory. By flipping critical bits (marked in red), our method can mislead a specific sample into the target class without any sample modification, while not significantly reducing the prediction accuracy of other samples.

This work also focuses on the bit-level weight attack against DNNs in the deployment stage, but with two different goals: effectiveness and stealthiness. Effectiveness requires that the attacked model misclassifies a specific sample into an attacker-specified target class without any sample modification, while stealthiness requires that the prediction accuracy on other samples is not significantly reduced. As shown in Fig. 1, to achieve these goals, we propose to identify and flip bits that are critical to the prediction of the specific sample but have little impact on the predictions of other samples. Specifically, we treat each bit in the memory as a binary variable, and our task is to determine its state (i.e., 0 or 1). Accordingly, the attack can be formulated as a binary integer programming (BIP) problem. To further improve stealthiness, we also limit the number of flipped bits, which can be formulated as a cardinality constraint. However, solving a BIP problem with a cardinality constraint is challenging. Fortunately, inspired by an advanced optimization method, the ℓp-box ADMM (Wu & Ghanem, 2018), this problem can be reformulated as a continuous optimization problem, which can then be efficiently and effectively solved by the alternating direction method of multipliers (ADMM) (Glowinski & Marroco, 1975; Gabay & Mercier, 1976). Consequently, the flipped bits are determined through optimization rather than heuristic search, which makes our attack more effective; a schematic sketch of this formulation is given below. Note that we also conduct attacks against quantized DNN models, following the setting in related works (Rakin et al., 2019; 2020a). Extensive experiments demonstrate the superiority of the proposed method over several existing weight attacks. For example, our method achieves a 100% attack success rate with an average of 7.37 bit-flips and only 0.09% accuracy degradation on the remaining non-targeted inputs when attacking an 8-bit quantized ResNet-18 model on ImageNet. Moreover, we demonstrate that the proposed method is also more resistant to existing defense methods.

The main contributions of this work are three-fold. 1) We explore a novel attack scenario where the attacker enforces a specific sample to be predicted as a target class by modifying the weights of a deployed model via bit flipping, without any sample modification. 2) We formulate the attack as a BIP problem with a cardinality constraint and propose an effective and efficient method to solve it. 3) Extensive experiments verify the superiority of the proposed method against DNNs with or without defenses.
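To make the formulation above concrete, the following is a schematic version of a BIP with a cardinality constraint, together with the ℓp-box trick (with p = 2) used by Wu & Ghanem (2018) to rewrite the binary constraint as the intersection of two continuous sets. The notation is illustrative rather than the paper's own: b denotes the vector of all n bits, b0 its original state, k the bit-flip budget, x_s the specific sample, t the target class, D_aux a set of auxiliary samples for the stealthiness term, and L_eff, L_stl, λ the effectiveness loss, the stealthiness loss, and their trade-off.

```latex
% Schematic only; the losses, variables, and trade-off weight are illustrative.
\begin{aligned}
\min_{\mathbf{b}\in\{0,1\}^{n}}\;\;
  & \mathcal{L}_{\mathrm{eff}}\big(f(x_{s};\mathbf{b}),\,t\big)
    + \lambda\,\mathcal{L}_{\mathrm{stl}}\big(\mathbf{b};\,\mathcal{D}_{\mathrm{aux}}\big)
  \qquad \text{s.t.}\;\; \|\mathbf{b}-\mathbf{b}_{0}\|_{0}\le k,\\[2pt]
\text{where}\;\;
  & \{0,1\}^{n} \;=\; [0,1]^{n}\,\cap\,
    \Big\{\mathbf{b}\,:\,\big\|\mathbf{b}-\tfrac{1}{2}\mathbf{1}\big\|_{2}^{2}=\tfrac{n}{4}\Big\}.
\end{aligned}
```

Once the binary set is rewritten as the box intersected with the ℓ2-sphere, each continuous set can be handled by its own auxiliary variable inside ADMM, alternating gradient-based updates of b with simple projections onto the box, the sphere, and the cardinality constraint; this is what makes the continuous reformulation tractable in practice.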

2. RELATED WORKS

Neural Network Weight Attack. How to perturb the weights of a trained DNN for malicious purposes has received extensive attention (Liu et al., 2017a; 2018b; Hong et al., 2019). Liu et al. (2017a) first proposed two schemes to modify model parameters for misclassification, without and with considering stealthiness, which are dubbed the single bias attack (SBA) and the gradient descent attack (GDA), respectively.


