TARGETED ATTACK AGAINST DEEP NEURAL NETWORKS VIA FLIPPING LIMITED WEIGHT BITS

Abstract

To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Specifically, our goal is to misclassify a specific sample into a target class without any sample modification, while not significantly reducing the prediction accuracy on other samples, so that the attack remains stealthy. To this end, we formulate this problem as a binary integer programming (BIP) problem, since the parameters are stored as binary bits (i.e., 0 and 1) in memory. By utilizing a recent technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of multipliers (ADMM). Consequently, the critical bits to flip can be determined through optimization, rather than by a heuristic strategy. Extensive experiments demonstrate the superiority of our method in attacking DNNs.
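To illustrate why bit-level storage leads naturally to a binary formulation, the following minimal sketch shows how flipping a single stored bit changes a weight. It assumes, purely for illustration, that each weight is quantized to an 8-bit signed integer in two's complement form; the text above states only that parameters are stored as binary bits. The function names (`to_bits`, `from_bits`, `flip_bit`, `hamming_distance`) are hypothetical and are not taken from the authors' code.

```python
from typing import List


def to_bits(w: int, n_bits: int = 8) -> List[int]:
    """Return the two's complement bits of a signed integer, MSB first."""
    low, high = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not low <= w <= high:
        raise ValueError(f"weight {w} does not fit in {n_bits} signed bits")
    unsigned = w & ((1 << n_bits) - 1)  # reinterpret as an unsigned value
    return [(unsigned >> i) & 1 for i in range(n_bits - 1, -1, -1)]


def from_bits(bits: List[int]) -> int:
    """Decode MSB-first two's complement bits back to a signed integer."""
    unsigned = 0
    for b in bits:
        unsigned = (unsigned << 1) | b
    # A set sign bit means the value is negative.
    return unsigned - (1 << len(bits)) if bits[0] else unsigned


def flip_bit(w: int, position: int, n_bits: int = 8) -> int:
    """Flip one bit (position 0 is the sign bit) and return the new weight."""
    bits = to_bits(w, n_bits)
    bits[position] ^= 1
    return from_bits(bits)


def hamming_distance(a: int, b: int, n_bits: int = 8) -> int:
    """Count the bit positions in which two stored weights differ."""
    return sum(x != y for x, y in zip(to_bits(a, n_bits), to_bits(b, n_bits)))


if __name__ == "__main__":
    w = 5
    print(to_bits(w))          # bit pattern of the original weight
    print(flip_bit(w, 0))      # flipping the sign bit
    print(flip_bit(w, 7))      # flipping the least significant bit
    print(hamming_distance(w, flip_bit(w, 0)))
```

Under this assumed encoding, flipping the sign bit moves the weight far from its original value, whereas flipping the least significant bit changes it by one. This is why the choice of which bits to flip matters when only a limited number of flips is allowed.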

1. INTRODUCTION

Due to the great success of deep neural networks (DNNs), their vulnerability (Szegedy et al., 2014; Gu et al., 2019) has attracted great attention, especially in security-critical applications (e.g., face recognition (Dong et al., 2019) and autonomous driving (Eykholt et al., 2018)). For example, the backdoor attack (Saha et al., 2020; Xie et al., 2019) manipulates the behavior of the DNN model, mainly by poisoning some training data in the training stage; the adversarial attack (Goodfellow et al., 2015; Moosavi-Dezfooli et al., 2017) aims to fool the DNN model by adding malicious perturbations to the input in the inference stage. Compared to the backdoor attack and the adversarial attack, a novel attack paradigm, dubbed weight attack (Breier et al., 2018), has rarely been studied. It assumes that the attacker has full access to the memory of a device, so that he/she can directly change the parameters of a deployed model for malicious purposes (e.g., crushing a fully functional DNN and converting it to a random output generator (Rakin et al., 2019)). Since the weight attack neither modifies the input nor controls the training process, it is difficult for both the service provider and the user to notice the attack. In practice, since the deployed DNN model is stored as binary bits in memory, the attacker can modify the model parameters using physical fault injection techniques, such as Row Hammer Attack (Agoyan et al., 2010; Selmke et al., 2015) and Laser Beam Attack (Kim et al., 2014). These techniques can precisely flip any bit of the data in memory. Some previous works (Rakin et al., 2019; 2020a;b) have demonstrated that it is feasible to change the model weights via bit flipping for malicious purposes. However, the critical bits are identified mostly


