TARGETED ATTACK AGAINST DEEP NEURAL NETWORKS VIA FLIPPING LIMITED WEIGHT BITS

Abstract

To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Specifically, our goal is to misclassify a specific sample into a target class without any sample modification, while not significantly reducing the prediction accuracy of other samples, to ensure stealthiness. To this end, we formulate this problem as binary integer programming (BIP), since the parameters are stored as binary bits (i.e., 0 and 1) in the memory. By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of multipliers (ADMM). Consequently, the flipped critical bits can be easily determined through optimization, rather than by a heuristic strategy. Extensive experiments demonstrate the superiority of our method in attacking DNNs.

1. INTRODUCTION

Due to the great success of deep neural networks (DNNs), their vulnerability (Szegedy et al., 2014; Gu et al., 2019) has attracted great attention, especially in security-critical applications (e.g., face recognition (Dong et al., 2019) and autonomous driving (Eykholt et al., 2018)). For example, backdoor attacks (Saha et al., 2020; Xie et al., 2019) manipulate the behavior of a DNN model mainly by poisoning some training data in the training stage, while adversarial attacks (Goodfellow et al., 2015; Moosavi-Dezfooli et al., 2017) aim to fool the DNN model by adding malicious perturbations to the input in the inference stage. Compared to the backdoor attack and the adversarial attack, a novel attack paradigm, dubbed weight attack (Breier et al., 2018), has been rarely studied. It assumes that the attacker has full access to the memory of a device, such that he/she can directly change the parameters of a deployed model to achieve some malicious purpose (e.g., crushing a fully functional DNN and converting it into a random output generator (Rakin et al., 2019)). Since a weight attack neither modifies the input nor controls the training process, it is difficult for both the service provider and the user to realize that an attack has occurred. In practice, since the deployed DNN model is stored as binary bits in the memory, the attacker can modify the model parameters using physical fault injection techniques, such as the Row Hammer Attack (Agoyan et al., 2010; Selmke et al., 2015) and the Laser Beam Attack (Kim et al., 2014). These techniques can precisely flip any bit of the data in the memory. Some previous works (Rakin et al., 2019; 2020a; b) have demonstrated that it is feasible to change the model weights via bit flipping to achieve some malicious purposes. However, in these methods the critical bits are identified mostly using heuristic strategies. For example, Rakin et al. (2019) combined gradient ranking and progressive search to identify the critical bits for flipping.

Figure 1: Demonstration of our proposed attack against a deployed DNN in the memory. By flipping critical bits (marked in red), our method can mislead a specific sample into the target class without any sample modification, while not significantly reducing the prediction accuracy of other samples.

This work also focuses on the bit-level weight attack against DNNs in the deployment stage, but with two different goals: effectiveness and stealthiness. Effectiveness requires that the attacked model misclassifies a specific sample into an attacker-specified target class without any sample modification, while stealthiness requires that the prediction accuracy on other samples is not significantly reduced. As shown in Fig. 1, to achieve these goals, we propose to identify and flip bits that are critical to the prediction of the specific sample but do not significantly impact the prediction of other samples. Specifically, we treat each bit in the memory as a binary variable, and our task is to determine its state (i.e., 0 or 1). Accordingly, this can be formulated as a binary integer programming (BIP) problem. To further improve stealthiness, we also limit the number of flipped bits, which can be formulated as a cardinality constraint. However, solving a BIP problem with a cardinality constraint is challenging. Fortunately, inspired by an advanced optimization method, ℓp-Box ADMM (Wu & Ghanem, 2018), this problem can be reformulated as a continuous optimization problem, which can further be efficiently and effectively solved by the alternating direction method of multipliers (ADMM) (Glowinski & Marroco, 1975; Gabay & Mercier, 1976). Consequently, the flipped bits are determined through optimization rather than a heuristic strategy, which makes our attack more effective.
Note that we also conduct attacks against quantized DNN models, following the setting in some related works (Rakin et al., 2019; 2020a). Extensive experiments demonstrate the superiority of the proposed method over several existing weight attacks. For example, our method achieves a 100% attack success rate with an average of 7.37 bit-flips and only 0.09% accuracy degradation on the remaining inputs when attacking an 8-bit quantized ResNet-18 model on ImageNet. Moreover, we demonstrate that the proposed method is also more resistant to existing defense methods. The main contributions of this work are three-fold. 1) We explore a novel attack scenario where the attacker forces a specific sample to be predicted as a target class by modifying the weights of a deployed model via bit flipping, without any sample modification. 2) We formulate the attack as a BIP problem with a cardinality constraint and propose an effective and efficient method to solve it. 3) Extensive experiments verify the superiority of the proposed method against DNNs with or without defenses.

2. RELATED WORKS

Neural Network Weight Attack. How to perturb the weights of a trained DNN for malicious purposes has received extensive attention (Liu et al., 2017a; 2018b; Hong et al., 2019). Liu et al. (2017a) first proposed two schemes to modify model parameters for misclassification, without and with considering stealthiness, dubbed the single bias attack (SBA) and the gradient descent attack (GDA), respectively. After that, the Trojan attack (Liu et al., 2018b) was proposed, which injects malicious behavior into the DNN by generating a general trojan trigger and then retraining the model; this method requires changing a large number of parameters. Recently, the fault sneaking attack (FSA) (Zhao et al., 2019) was proposed, which aims to misclassify certain samples into a target class by modifying the DNN parameters under two constraints: maintaining the classification accuracy of other samples and minimizing parameter modifications. Note that all these methods are designed to misclassify multiple samples rather than a specific sample, and may therefore modify many parameters or sharply degrade the accuracy of other samples.

Bit-Flip based Attack. Recently, some physical fault injection techniques (Agoyan et al., 2010; Kim et al., 2014; Selmke et al., 2015) were proposed, which can be adopted to precisely flip any bit in the memory. These techniques have prompted researchers to study how to modify model parameters at the bit level. As a branch of weight attack, the bit-flip based attack was first explored in (Rakin et al., 2019), which proposed an untargeted attack that can convert the attacked DNN into a random output generator with only several bit-flips. Besides, Rakin et al. (2020a) proposed the targeted bit Trojan (TBT) to inject faults into DNNs by flipping some critical bits.
Specifically, the attacker flips the identified bits to force the network to classify all samples embedded with a trigger into a certain target class, while the network operates with normal inference accuracy on benign samples. Most recently, Rakin et al. (2020b) proposed the targeted bit-flip attack (T-BFA), which achieves malicious purposes without modifying samples. Specifically, T-BFA can mislead samples from a single source class, or from all classes, into a target class by flipping the identified weight bits. It is worth noting that the above bit-flip based attacks leverage heuristic strategies to identify critical weight bits. How to find critical bits for the bit-flip based attack is still an important open question.

3. TARGETED ATTACK WITH LIMITED BIT-FLIPS (TA-LBF)

3.1. PRELIMINARIES

Storage and Calculation of Quantized DNNs. Quantizing DNNs before deploying them on devices is currently a widely used technique for efficiency and reduced storage size. Each weight in the l-th layer of a Q-bit quantized DNN is represented and stored in the memory as a signed integer in two's complement representation, v = [v_Q; v_{Q-1}; ...; v_1] ∈ {0,1}^Q. An attacker can modify the weights of the DNN by flipping the stored binary bits. In this work, we adopt a layer-wise uniform weight quantization scheme similar to Tensor-RT (Migacz, 2017). Accordingly, each binary vector v can be converted to a real number by a function h(·), as follows:

h(v) = (−2^{Q−1} · v_Q + Σ_{i=1}^{Q−1} 2^{i−1} · v_i) · Δ_l,

where l indicates which layer the weight is from, and Δ_l > 0 is a known and stored constant representing the step size of the l-th layer's weight quantizer.

Notations. We denote a Q-bit quantized DNN-based classification model as f : X → Y, where X ⊆ R^d is the input space and Y = {1, 2, ..., K} is the K-class output space. We assume that the last layer of this DNN model is a fully-connected layer with quantized weights B ∈ {0,1}^{K×C×Q}, where C is the dimension of the last layer's input. Let B_{i,j} ∈ {0,1}^Q be the two's complement representation of a single weight, and let B_i ∈ {0,1}^{C×Q} denote all the binary weights connected to the i-th output neuron. Given a test sample x with ground-truth label s, f(x; Θ, B) ∈ [0,1]^K is the output probability vector and g(x; Θ) ∈ R^C is the input of the last layer, where Θ denotes the model parameters excluding the last layer.

Attack Scenario. In this paper, we focus on the white-box bit-flip based attack, which was first introduced in (Rakin et al., 2019). Specifically, we assume that the attacker has full knowledge of the model (including its architecture, parameters, and the parameters' locations in the memory) and can precisely flip any bit in the memory.
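As an illustration, the decoding function h(·) above can be sketched in a few lines. This is a minimal sketch: the bit ordering (v_1 the least significant bit, v_Q the sign bit) follows the definition above, while the concrete step size value is an assumption for the example.

```python
import numpy as np

def h(v, delta):
    """Decode a Q-bit two's complement bit vector v = [v_1, ..., v_Q]
    (v_Q is the sign bit) into a real-valued weight:
    h(v) = (-2^(Q-1) * v_Q + sum_{i=1}^{Q-1} 2^(i-1) * v_i) * delta."""
    v = np.asarray(v)
    Q = v.size
    signed_int = -2 ** (Q - 1) * int(v[-1]) + sum(
        2 ** (i - 1) * int(v[i - 1]) for i in range(1, Q))
    return signed_int * delta

# An 8-bit weight with a hypothetical step size delta = 0.05:
bits = [1, 0, 1, 0, 0, 0, 0, 0]      # stored integer: 2^0 + 2^2 = 5
print(h(bits, 0.05))                 # weight ~ 0.25
bits[-1] = 1                         # flipping the sign bit: 5 - 128 = -123
print(h(bits, 0.05))                 # weight ~ -6.15
```

A single flip of a high-order bit changes the decoded weight drastically, which is why flipping only a handful of well-chosen bits can already alter a model's prediction.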
Besides, we also assume that the attacker has access to a small set of benign samples, but cannot tamper with the training process or the training data.

Attacker's Goals. The attacker has two main goals: effectiveness and stealthiness. Specifically, effectiveness requires that the attacked model misclassifies a specific sample into a predefined target class without any sample modification, and stealthiness requires that the prediction accuracy on other samples is not significantly reduced.

3.2. THE PROPOSED METHOD

Loss for Ensuring Effectiveness. Recall that our first goal is to force a specific image to be classified as the target class by modifying the model parameters at the bit level. The most straightforward way to do so is to maximize the logit of the target class while minimizing that of the source class. For a sample x, the logit of a class is directly determined by the input of the last layer, g(x; Θ), and the weights connected to the node of that class. Accordingly, we can modify only the weights connected to the source class s and the target class t to fulfill our purpose, using the margin loss

L_1(x; Θ, B, B̂_s, B̂_t) = max(m − p(x; Θ, B̂_t) + δ, 0) + max(p(x; Θ, B̂_s) − m + δ, 0),

where B̂_s and B̂_t denote the modified binary weights connected to the source and target class, respectively, and p(x; Θ, B̂_i) denotes the logit of class i (see Appendix A).

Loss for Ensuring Stealthiness. As mentioned in Section 3.1, we assume that the attacker has access to an auxiliary sample set {(x_i, y_i)}_{i=1}^N. Accordingly, the stealthiness of the attack can be formulated as

L_2(B̂_s, B̂_t) = Σ_{i=1}^N ℓ(f(x_i; Θ, B_{{1,...,K}\{s,t}}, B̂_s, B̂_t), y_i),

where ℓ(·,·) is specified as the cross-entropy loss. Besides, to better meet our goals, a straightforward additional approach is to reduce the magnitude of the modification. In this paper, we constrain the number of bit-flips to be less than k. Physical bit-flipping techniques can be time-consuming, as discussed in (Van Der Veen et al., 2016; Zhao et al., 2019). Moreover, such techniques lead to abnormal behaviors in the attacked system (e.g., suspicious cache activity of processes), which may be detected by physical detection-based defenses (Gruss et al., 2018). As such, minimizing the number of bit-flips is critical to making the attack more efficient and practical.

Overall Objective. In conclusion, the final objective function is as follows:

min_{B̂_s, B̂_t} L_1(x; Θ, B, B̂_s, B̂_t) + λ L_2(B̂_s, B̂_t),
s.t. B̂_s ∈ {0,1}^{C×Q}, B̂_t ∈ {0,1}^{C×Q}, d_H(B_s, B̂_s) + d_H(B_t, B̂_t) ≤ k,

where d_H(·,·) denotes the Hamming distance and λ > 0 is a trade-off parameter. For the sake of brevity, B_s and B_t are concatenated and further reshaped into the vector b ∈ {0,1}^{2CQ}.
Similarly, B̂_s and B̂_t are concatenated and further reshaped into the vector b̂ ∈ {0,1}^{2CQ}. Besides, for binary vectors b and b̂, there is a convenient relationship between the Hamming distance and the Euclidean distance: d_H(b, b̂) = ||b̂ − b||_2^2. The new formulation of the objective is as follows:

min_{b̂} L_1(b̂) + λ L_2(b̂), s.t. b̂ ∈ {0,1}^{2CQ}, ||b̂ − b||_2^2 − k ≤ 0. (5)

Problem (5) is denoted as TA-LBF (targeted attack with limited bit-flips). Note that TA-LBF is a binary integer programming (BIP) problem, whose optimization is challenging. We introduce an effective and efficient method to solve it in the following section.
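The identity d_H(b, b̂) = ||b̂ − b||_2^2 for binary vectors is what makes the cardinality constraint amenable to continuous optimization. A minimal check, with purely illustrative vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.integers(0, 2, size=16)   # original weight bits
b_hat = b.copy()
b_hat[[2, 5, 11]] ^= 1            # flip exactly three bits

hamming = int(np.sum(b != b_hat))            # number of flipped bits
sq_euclid = int(np.sum((b_hat - b) ** 2))    # squared l2 distance
print(hamming, sq_euclid)                    # both equal 3
```

Hence "at most k bit-flips" can be written as ||b̂ − b||_2^2 − k ≤ 0, a smooth inequality in b̂.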

3.3. AN EFFECTIVE OPTIMIZATION METHOD FOR TA-LBF

To solve the challenging BIP problem (5), we adopt a generic solver for integer programming, dubbed ℓp-Box ADMM (Wu & Ghanem, 2018). The solver has shown superior performance in many tasks, e.g., model pruning (Li et al., 2019), clustering (Bibi et al., 2019), MAP inference (Wu et al., 2020a), and adversarial attack (Fan et al., 2020). It replaces the binary constraint equivalently by the intersection of two continuous constraints:

b̂ ∈ {0,1}^{2CQ} ⇔ b̂ ∈ (S_b ∩ S_p), (6)

where S_b = [0,1]^{2CQ} denotes the box constraint and S_p = {b̂ : ||b̂ − 1/2||_2^2 = 2CQ/4} denotes the ℓ2-sphere constraint. Utilizing (6), Problem (5) is equivalently reformulated as

min_{b̂, u_1∈S_b, u_2∈S_p, u_3∈R^+} L_1(b̂) + λ L_2(b̂), s.t. b̂ = u_1, b̂ = u_2, ||b̂ − b||_2^2 − k + u_3 = 0, (7)

where the two extra variables u_1 and u_2 are introduced to split the constraints w.r.t. b̂, and the non-negative slack variable u_3 ∈ R^+ transforms the inequality ||b̂ − b||_2^2 − k ≤ 0 in (5) into the equality ||b̂ − b||_2^2 − k + u_3 = 0. The above constrained optimization problem can be efficiently solved by the alternating direction method of multipliers (ADMM) (Boyd et al., 2011). Following the standard ADMM procedure, we first present the augmented Lagrangian function of the above problem:

L(b̂, u_1, u_2, u_3, z_1, z_2, z_3) = L_1(b̂) + λ L_2(b̂) + z_1^T(b̂ − u_1) + z_2^T(b̂ − u_2) + z_3(||b̂ − b||_2^2 − k + u_3)
+ c_1(u_1) + c_2(u_2) + c_3(u_3) + (ρ_1/2)||b̂ − u_1||_2^2 + (ρ_2/2)||b̂ − u_2||_2^2 + (ρ_3/2)(||b̂ − b||_2^2 − k + u_3)^2, (8)

where z_1, z_2 ∈ R^{2CQ} and z_3 ∈ R are dual variables, and ρ_1, ρ_2, ρ_3 > 0 are penalty factors, which will be specified later. c_1(u_1) = I_{u_1∈S_b}, c_2(u_2) = I_{u_2∈S_p}, and c_3(u_3) = I_{u_3∈R^+} capture the constraints S_b, S_p, and R^+, respectively; the indicator function I_{a} = 0 if a is true, and +∞ otherwise.
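The two feasible sets are simple to project onto, which is what makes this splitting attractive. A sketch of the two projections (notation as above; the input vector is illustrative):

```python
import numpy as np

def proj_box(a):
    """P_Sb: projection onto the box S_b = [0, 1]^n (elementwise clipping)."""
    return np.clip(a, 0.0, 1.0)

def proj_sphere(a):
    """P_Sp: projection onto the l2-sphere S_p = {x : ||x - 1/2||^2 = n/4},
    i.e. the sphere of radius sqrt(n)/2 centered at (1/2, ..., 1/2)."""
    n = a.size
    a_bar = a - 0.5
    return (np.sqrt(n) / 2.0) * a_bar / np.linalg.norm(a_bar) + 0.5

a = np.array([0.9, -0.3, 1.7, 0.2])
u1 = proj_box(a)     # lands in [0, 1]^4
u2 = proj_sphere(a)  # satisfies ||u2 - 1/2||^2 = 4/4 = 1
```

The intersection S_b ∩ S_p contains exactly the binary vectors: the only points of the ℓ2-sphere that also lie inside the box are the corners {0, 1}^n.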
Based on the augmented Lagrangian function, the primal and dual variables are updated iteratively, with r indicating the iteration index.

Given (b̂^r, z_1^r, z_2^r, z_3^r), update (u_1^{r+1}, u_2^{r+1}, u_3^{r+1}). Given (b̂^r, z_1^r, z_2^r, z_3^r), the variables (u_1, u_2, u_3) are independent, and they can be optimized in parallel, as follows:

u_1^{r+1} = argmin_{u_1∈S_b} (z_1^r)^T(b̂^r − u_1) + (ρ_1/2)||b̂^r − u_1||_2^2 = P_{S_b}(b̂^r + z_1^r/ρ_1),
u_2^{r+1} = argmin_{u_2∈S_p} (z_2^r)^T(b̂^r − u_2) + (ρ_2/2)||b̂^r − u_2||_2^2 = P_{S_p}(b̂^r + z_2^r/ρ_2),
u_3^{r+1} = argmin_{u_3∈R^+} z_3^r(||b̂^r − b||_2^2 − k + u_3) + (ρ_3/2)(||b̂^r − b||_2^2 − k + u_3)^2 = P_{R^+}(−||b̂^r − b||_2^2 + k − z_3^r/ρ_3), (9)

where P_{S_b}(a) = min(1, max(0, a)) with a ∈ R^n is the projection onto the box constraint S_b; P_{S_p}(a) = (√n/2)(ā/||ā||) + 1/2 with ā = a − 1/2 indicates the projection onto the ℓ2-sphere constraint S_p (Wu & Ghanem, 2018); and P_{R^+}(a) = max(0, a) with a ∈ R indicates the projection onto R^+.

Given (u_1^{r+1}, u_2^{r+1}, u_3^{r+1}, z_1^r, z_2^r, z_3^r), update b̂^{r+1}. Although there is no closed-form solution for b̂^{r+1}, it can be easily updated by gradient descent, as both L_1(b̂) and L_2(b̂) are differentiable w.r.t. b̂:

b̂^{r+1} ← b̂^r − η · ∂L(b̂, u_1^{r+1}, u_2^{r+1}, u_3^{r+1}, z_1^r, z_2^r, z_3^r)/∂b̂ |_{b̂=b̂^r}, (10)

where η > 0 denotes the step size. Note that we can run multiple steps of gradient descent in the above update; both the number of steps and η will be specified in later experiments. Besides, due to the space limit, the detailed derivation of ∂L/∂b̂ is presented in Appendix A.

Given (b̂^{r+1}, u_1^{r+1}, u_2^{r+1}, u_3^{r+1}), update (z_1^{r+1}, z_2^{r+1}, z_3^{r+1}). The dual variables are updated by gradient ascent, as follows:

z_1^{r+1} = z_1^r + ρ_1(b̂^{r+1} − u_1^{r+1}),
z_2^{r+1} = z_2^r + ρ_2(b̂^{r+1} − u_2^{r+1}),
z_3^{r+1} = z_3^r + ρ_3(||b̂^{r+1} − b||_2^2 − k + u_3^{r+1}). (11)

Remarks.
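Putting the three updates together, the whole solver is only a few lines. The sketch below is schematic: a toy quadratic loss supplied by the caller stands in for L_1 + λL_2, and the hyper-parameter values and toy bit vectors are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def admm_bip(b, grad_loss, k, iters=300, eta=0.05,
             rho1=1e-2, rho2=1e-2, rho3=1e-2):
    """Schematic l_p-Box ADMM for
        min loss(b_hat)  s.t.  b_hat in {0,1}^n, ||b_hat - b||^2 <= k,
    following the update order in the text. `grad_loss` stands in for
    the gradient of L1 + lambda * L2."""
    n = b.size
    b_hat = b.astype(float)
    u1, u2, u3 = b_hat.copy(), b_hat.copy(), 0.0
    z1, z2, z3 = np.zeros(n), np.zeros(n), 0.0
    for _ in range(iters):
        # (u1, u2, u3) updates: three independent projections.
        u1 = np.clip(b_hat + z1 / rho1, 0.0, 1.0)
        a_bar = (b_hat + z2 / rho2) - 0.5
        u2 = (np.sqrt(n) / 2.0) * a_bar / (np.linalg.norm(a_bar) + 1e-12) + 0.5
        dist = np.sum((b_hat - b) ** 2)
        u3 = max(0.0, -dist + k - z3 / rho3)
        # b_hat update: one gradient step on the augmented Lagrangian.
        g = (grad_loss(b_hat) + z1 + z2
             + rho1 * (b_hat - u1) + rho2 * (b_hat - u2)
             + 2.0 * (b_hat - b) * (z3 + rho3 * (dist - k + u3)))
        b_hat = b_hat - eta * g
        # Dual ascent on z1, z2, z3.
        z1 = z1 + rho1 * (b_hat - u1)
        z2 = z2 + rho2 * (b_hat - u2)
        z3 = z3 + rho3 * (np.sum((b_hat - b) ** 2) - k + u3)
    return (b_hat > 0.5).astype(int)

# Toy illustration: pull the bits toward a hypothetical target pattern
# (differing in two positions) under a budget of k = 2 flips.
b = np.array([0, 0, 0, 0, 1, 1])
target = np.array([1, 1, 0, 0, 1, 1], dtype=float)
flipped = admm_bip(b, lambda x: 2.0 * (x - target), k=2)
print(flipped)
```

The final thresholding mirrors the fact that, at convergence, b̂ is driven into the corners S_b ∩ S_p; in the real attack the loss gradient would come from the network, as derived in Appendix A.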
1) Note that since (u_1^{r+1}, u_2^{r+1}, u_3^{r+1}) are updated in parallel, their updates belong to the same block; thus, the above algorithm is a two-block ADMM algorithm. We provide the algorithm outline in Appendix B. 2) Except for the update of b̂^{r+1}, all other updates are very simple and efficient. The computational cost of the whole algorithm is analyzed in Appendix C. 3) Due to the inexact solution for b̂^{r+1} obtained by gradient descent, the theoretical convergence of the whole ADMM algorithm cannot be guaranteed. However, as demonstrated in many previous works (Gol'shtein & Tret'yakov, 1979; Eckstein & Bertsekas, 1992; Boyd et al., 2011), inexact two-block ADMM often shows good practical convergence, which is also the case in our later experiments. Besides, a numerical convergence analysis is presented in Appendix D. 4) A proper adjustment of (ρ_1, ρ_2, ρ_3) can accelerate the practical convergence, which will be specified later.

4. EXPERIMENTS

4.1. EVALUATION SETUP

Settings. We compare our method (TA-LBF) with GDA (Liu et al., 2017a), FSA (Zhao et al., 2019), T-BFA (Rakin et al., 2020b), and TBT (Rakin et al., 2020a). All of these methods can be adopted to misclassify a specific image into a target class. We also take fine-tuning (FT) of the last fully-connected layer as a baseline method. We conduct experiments on CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015). We randomly select 1,000 images from each dataset as the evaluation set for all methods. Specifically, for each of the 10 classes in CIFAR-10, we perform attacks on 100 validation images randomly selected from the other 9 classes. For ImageNet, we randomly choose 50 target classes; for each target class, we perform attacks on 20 images randomly selected from the remaining classes in the validation set. Besides, for all methods except GDA, which does not employ auxiliary samples, we provide 128 and 512 auxiliary samples on CIFAR-10 and ImageNet, respectively. Following the setting in (Rakin et al., 2020a; b), we adopt quantized ResNet (He et al., 2016) and VGG (Simonyan & Zisserman, 2015) as the target models. For our TA-LBF, the trade-off parameter λ and the constraint parameter k affect the attack stealthiness and the attack success rate. We adopt a strategy for jointly searching λ and k, which is specified in Appendix E.3. More details of our settings are provided in Appendix E.

Evaluation Metrics. We adopt three metrics to evaluate attack performance: the post-attack accuracy (PA-ACC), the attack success rate (ASR), and the number of bit-flips (N_flip). PA-ACC denotes the post-attack accuracy on the validation set, excluding the specific attacked sample and the auxiliary samples. ASR is defined as the ratio of attacked samples successfully misclassified into the target class among all 1,000 attacked samples. N_flip is the number of bit-flips required for an attack.
Better attack performance corresponds to a higher PA-ACC and ASR, and a lower N_flip. Besides, we also report the accuracy of the original model, denoted as ACC.
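For concreteness, ASR and the mean N_flip can be computed from per-attack records as follows; the records here are made-up numbers, purely illustrative (PA-ACC would additionally require evaluating the attacked model on the held-out validation samples):

```python
# Hypothetical per-attack records: (attack succeeded?, bits flipped)
records = [(True, 6), (True, 9), (False, 12), (True, 7)]

asr = 100.0 * sum(ok for ok, _ in records) / len(records)    # attack success rate
mean_n_flip = sum(n for _, n in records) / len(records)      # average N_flip
print(f"ASR = {asr:.1f}%, mean N_flip = {mean_n_flip:.2f}")  # ASR = 75.0%, mean N_flip = 8.50
```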

4.2. MAIN RESULTS

Results on CIFAR-10. The results of all methods on CIFAR-10 are shown in Table 1. Our method achieves a 100% ASR with the fewest N_flip for all bit-widths and architectures. FT modifies the largest number of bits among all methods, since it places no limit on parameter modifications; due to the absence of the training data, the PA-ACC of FT is also poor. These results indicate that fine-tuning the trained DNN is infeasible as an attack method. Although T-BFA flips the second-fewest bits in three cases, it fails to achieve a higher ASR than GDA and FSA. In terms of PA-ACC, TA-LBF is comparable to the other methods. Note that the PA-ACC of TA-LBF significantly outperforms that of GDA, which is the most competitive baseline w.r.t. ASR and N_flip; the PA-ACC of GDA is relatively poor because it does not employ auxiliary samples. Achieving the highest ASR, the lowest N_flip, and a comparable PA-ACC demonstrates that our optimization-based method is superior to the heuristic methods (TBT, T-BFA, and GDA).

Table 1: Results of all attack methods across different bit-widths and architectures on CIFAR-10 and ImageNet (bold: the best; underline: the second best). The mean and standard deviation of PA-ACC and N_flip are calculated by attacking the 1,000 images. Our method is denoted as TA-LBF.

Results on ImageNet. On ImageNet, the most competitive baseline again shows strong performance relative to the other methods. However, our method obtains the highest PA-ACC, the fewest bit-flips (fewer than 8), and a 100% ASR in attacking ResNet. For VGG, our method also achieves a 100% ASR with the fewest N_flip for both bit-widths. The N_flip results of our method are mainly attributed to the cardinality constraint on the number of bit-flips. Moreover, for our method, the average PA-ACC degradation over the four cases on ImageNet is only 0.06%, which demonstrates the stealthiness of our attack.
When comparing the results on ResNet and VGG, an interesting observation is that all methods require significantly more bit-flips for VGG. One reason is that VGG is much wider than ResNet; similar to the claim in (He et al., 2020), increasing the network width contributes to robustness against the bit-flip based attack.

4.3. RESISTANCE TO DEFENSE METHODS

Resistance to Piece-wise Clustering. He et al. (2020) proposed a novel training technique, called piece-wise clustering, to enhance network robustness against the bit-flip based attack. This technique introduces an additional weight penalty into the inference loss, which has the effect of eliminating close-to-zero weights (He et al., 2020). We test the resistance of all attack methods to piece-wise clustering, conducting experiments with the 8-bit quantized ResNet on CIFAR-10 and ImageNet. Following the ideal configuration in (He et al., 2020), the clustering coefficient, a hyper-parameter of piece-wise clustering, is set to 0.001 in our evaluation. For our method, the initial k is set to 50 on ImageNet, and the remaining settings are the same as those in Section 4.1. Besides the three metrics in Section 4.1, we also report the increase in N_flip compared to the model without defense (i.e., the results in Table 1), denoted as ΔN_flip. The results of all attack methods against piece-wise clustering are shown in Table 2. The model trained with piece-wise clustering increases the number of required bit-flips for all attack methods. However, our method still achieves a 100% ASR with the fewest bit-flips on both datasets. Although TBT achieves a smaller ΔN_flip than ours on CIFAR-10, its ASR is only 52.3%, which also verifies the defense effectiveness of piece-wise clustering. Compared with the other methods, TA-LBF achieves the smallest ΔN_flip on ImageNet and the best PA-ACC on both datasets. These results demonstrate the superiority of our method when attacking models trained with piece-wise clustering.

Resistance to Larger Model Capacity. Previous studies (He et al., 2020; Rakin et al., 2020b) observed that increasing the network capacity can improve robustness against the bit-flip based attack.
Accordingly, we evaluate all attack methods against models with a larger capacity, using the 8-bit quantized ResNet on both datasets. Similar to the strategy in (He et al., 2020), we increase the model capacity by widening the network (i.e., 2× width in our experiments). All settings of our method are the same as those used in Section 4.1. The results are presented in Table 2. We observe that all methods require more bit-flips to attack the 2×-width model, which demonstrates to some extent that a wider network with the same architecture is more robust against the bit-flip based attack. However, our method still achieves a 100% ASR with the fewest N_flip and ΔN_flip. Moreover, when comparing the two defenses, we find that piece-wise clustering performs better than the larger-capacity model in terms of ΔN_flip; however, piece-wise clustering training also decreases the accuracy of the original model (e.g., from 92.16% to 91.01% on CIFAR-10). We provide more results on attacking defended models under different settings in Appendix F.

4.4. ABLATION STUDY

We perform ablation studies on the parameters λ and k, and on the number of auxiliary samples N. We use the 8-bit quantized ResNet on CIFAR-10 as the representative case for analysis. We evaluate the attack performance of TA-LBF under different values of λ with k fixed at 20, and under different values of k with λ fixed at 10. To analyze the effect of N, we vary N from 25 to 800 and keep the other settings the same as those in Section 4.1. The results are presented in Fig. 2. We observe that our method achieves a 100% ASR when λ is less than 20. As expected, as λ increases, the PA-ACC increases while the ASR decreases. The plot for parameter k shows that k exactly limits the number of bit-flips, whereas the other attack methods do not involve such a constraint. This advantage is critical, since it allows the attacker to identify a limited set of bits to perform the attack when the budget is fixed. As shown in the figure, increasing the number of auxiliary samples up to 200 has a marked positive impact on the PA-ACC. It is intuitive that more auxiliary samples lead to a better PA-ACC; the observation also indicates that TA-LBF still works well without too many auxiliary samples.

4.5. VISUALIZATION OF DECISION BOUNDARY

To further compare FSA and GDA with our method, we visualize the decision boundaries of the original and post-attack models in Fig. 3. We adopt a four-layer multi-layer perceptron trained on a simulated 2-D Blob dataset with 4 classes. The original decision boundary indicates that the original model classifies all data points almost perfectly. The attacked sample is classified into Class 3 by all methods. Visually, GDA modifies the decision boundary drastically, especially for Class 0, whereas our method modifies the decision boundary mainly around the attacked sample. Although FSA is visually comparable to ours in Fig. 3, it flips 10× more bits than GDA and TA-LBF. In terms of the numerical results, TA-LBF achieves the best PA-ACC and the fewest N_flip. This finding verifies that our method can achieve a successful attack while only slightly tweaking the original classifier.

5. CONCLUSION

In this work, we have presented a novel attack paradigm in which the weights of a deployed DNN are slightly changed via bit flipping in the memory, so as to produce a target prediction for a specific sample while leaving the predictions on other samples largely unaffected. Since the weights are stored as binary bits in the memory, we formulate this attack as a binary integer programming (BIP) problem, which can be effectively and efficiently solved by a continuous optimization algorithm. Since the critical bits are determined through optimization, the proposed method achieves the attack goals by flipping only a few bits, and it shows very good performance under different experimental settings.

A UPDATE b̂ BY GRADIENT DESCENT

In this section, we derive the gradient of L w.r.t. b̂, which is adopted to update b̂^{r+1} by gradient descent (see Eq. (10)). The derivation consists of the following parts.

Derivation of ∂L_1(b̂)/∂b̂. For clarity, we first repeat some definitions:

L_1(B̂_s, B̂_t) = max(m − p(x; Θ, B̂_t) + δ, 0) + max(p(x; Θ, B̂_s) − m + δ, 0), (12)
p(x; Θ, B̂_i) = [h(B̂_{i,1}); h(B̂_{i,2}); ...; h(B̂_{i,C})]^T g(x; Θ), (13)
h(v) = (−2^{Q−1} · v_Q + Σ_{i=1}^{Q−1} 2^{i−1} · v_i) · Δ_l. (14)

Then, we obtain

∂p(x; Θ, B̂_i)/∂B̂_i = [g_1(x; Θ) · (∇h(B̂_{i,1})/∇B̂_{i,1})^T; ...; g_C(x; Θ) · (∇h(B̂_{i,C})/∇B̂_{i,C})^T], (15)

where ∇h(v)/∇v = [2^0; 2^1; ...; 2^{Q−2}; −2^{Q−1}] · Δ_l is a constant vector, with l here indicating the last layer, and g_j(x; Θ) denotes the j-th entry of the vector g(x; Θ). Utilizing (15), we have

∂L_1(B̂_s, B̂_t)/∂B̂_s = ∂p(x; Θ, B̂_s)/∂B̂_s if p(x; Θ, B̂_s) > m − δ, and 0 otherwise, (16)
∂L_1(B̂_s, B̂_t)/∂B̂_t = −∂p(x; Θ, B̂_t)/∂B̂_t if p(x; Θ, B̂_t) < m + δ, and 0 otherwise.

Thus, we obtain

∂L_1(b̂)/∂b̂ = [Reshape(∂L_1(B̂_s, B̂_t)/∂B̂_s); Reshape(∂L_1(B̂_s, B̂_t)/∂B̂_t)], (17)

where Reshape(·) elongates a matrix into a vector along the columns.

Derivation of ∂L_2(b̂)/∂b̂.
For clarity, we first repeat the following definition:

L_2(B̂_s, B̂_t) = Σ_{i=1}^N ℓ(f(x_i; Θ, B_{{1,...,K}\{s,t}}, B̂_s, B̂_t), y_i), (18)

where f_j(x_i; Θ, B_{{1,...,K}\{s,t}}, B̂_s, B̂_t) = Softmax(p(x_i; Θ, B̂_j)) or Softmax(p(x_i; Θ, B_j)) indicates the posterior probability of x_i w.r.t. class j, and we simply denote f(x_i) ∈ [0,1]^K as the posterior probability vector of x_i. {(x_i, y_i)}_{i=1}^N denotes the auxiliary sample set, and ℓ(·,·) is specified as the cross-entropy loss. Then, we have

∂L_2(B̂_s, B̂_t)/∂B̂_s = Σ_{i=1}^N [f_s(x_i; Θ, B_{{1,...,K}\{s,t}}, B̂_s, B̂_t) − I(y_i = s)] · ∂p(x_i; Θ, B̂_s)/∂B̂_s, (19)
∂L_2(B̂_s, B̂_t)/∂B̂_t = Σ_{i=1}^N [f_t(x_i; Θ, B_{{1,...,K}\{s,t}}, B̂_s, B̂_t) − I(y_i = t)] · ∂p(x_i; Θ, B̂_t)/∂B̂_t, (20)

where I(a) = 1 if a is true, and I(a) = 0 otherwise. Thus, we obtain

∂L_2(b̂)/∂b̂ = [Reshape(∂L_2(B̂_s, B̂_t)/∂B̂_s); Reshape(∂L_2(B̂_s, B̂_t)/∂B̂_t)]. (21)

Derivation of ∂L(b̂)/∂b̂. According to Eq. (8), and utilizing Eqs. (17) and (21), we obtain

∂L(b̂)/∂b̂ = ∂L_1(b̂)/∂b̂ + λ ∂L_2(b̂)/∂b̂ + z_1 + z_2 + ρ_1(b̂ − u_1) + ρ_2(b̂ − u_2) + 2(b̂ − b) · (z_3 + ρ_3(||b̂ − b||_2^2 − k + u_3)). (22)
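Since h(·) is linear in the bits, ∇h(v) is the constant vector above. This can be verified numerically with a finite-difference check; the bit ordering (v_Q as the sign bit, stored last) and the values of Q and Δ_l are assumptions of this sketch:

```python
import numpy as np

Q, delta = 8, 0.1   # illustrative bit-width and step size

# Constant gradient [2^0; 2^1; ...; 2^(Q-2); -2^(Q-1)] * delta
grad_h = np.array([2.0 ** i for i in range(Q - 1)] + [-(2.0 ** (Q - 1))]) * delta

def h(v):
    """h(v) = (-2^(Q-1) * v_Q + sum_{i=1}^{Q-1} 2^(i-1) * v_i) * delta,
    with v = [v_1, ..., v_Q] and v_Q the sign bit."""
    return (-2 ** (Q - 1) * v[-1]
            + sum(2 ** (i - 1) * v[i - 1] for i in range(1, Q))) * delta

v = np.array([1.0, 0, 1, 0, 0, 1, 0, 0])
for i in range(Q):
    e = np.zeros(Q)
    e[i] = 1e-4
    fd = (h(v + e) - h(v)) / 1e-4      # finite difference along bit i
    assert abs(fd - grad_h[i]) < 1e-6  # exact up to float rounding (h is linear)
```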

B ALGORITHM OUTLINE

Algorithm 1 Continuous optimization for the BIP problem (5).

Input:

The original quantized DNN model f with weights Θ and B; attacked sample x with ground-truth label s; target class t; auxiliary sample set {(x_i, y_i)}_{i=1}^N; hyper-parameters λ, k, and δ.
Output: b̂.
1: Initialize u_1^0, u_2^0, u_3^0, z_1^0, z_2^0, z_3^0, b̂^0, and let r ← 0;
2: while not converged do
3:   Update (u_1^{r+1}, u_2^{r+1}, u_3^{r+1}) via the projections in Section 3.3;
4:   Update b̂^{r+1} by gradient descent as in Eq. (10);
5:   Update (z_1^{r+1}, z_2^{r+1}, z_3^{r+1}) as in Eq. (11);
6:   r ← r + 1;
7: end while

C COMPUTATIONAL COMPLEXITY ANALYSIS

The computational complexity of the proposed algorithm (i.e., Algorithm 1) consists of two parts: the forward pass and the backward pass. In terms of the forward pass, since Θ and B_{{1,...,K}\{s,t}} are fixed during the optimization, the terms involving them, including g(x; Θ) and p(x; Θ, B_i)|_{i≠s,t}, are calculated only once. The main cost, from B̂_s and B̂_t, is O(2(N+1)C^2 Q) per iteration, as there are N+1 samples. In terms of the backward pass, the main cost is from the update of b̂^{r+1}, which is O(2(N+1)CQ) per gradient step. Since all other updates are very simple, their costs are omitted here. Thus, the overall computational cost is O(T_outer · 2(N+1)CQ · (C + T_inner)), where T_outer is the number of iterations of the overall algorithm and T_inner is the number of gradient steps in updating b̂^{r+1}. As shown in Appendix D, the proposed method TA-LBF always converges very fast in our experiments, so T_outer is not very large. As described in Appendix E.3, T_inner is set to 5 in our experiments. In short, the proposed method can be optimized very efficiently. Besides, we also compare the computational complexity of different attacks empirically. Specifically, we compare the running time for attacking one image with different methods against the 8-bit quantized ResNet on CIFAR-10 and ImageNet. As shown in Table 3, TBT is the most time-consuming method among all attacks. Although the proposed TA-LBF is not superior to T-BFA, FSA, and GDA in running time, this gap is tolerable when attacking a single image in the deployment stage. Besides, our method performs better in terms of PA-ACC, ASR, and N_flip, as demonstrated in our experiments.

D NUMERICAL CONVERGENCE ANALYSIS

We present the numerical convergence of TA-LBF in Figure 4. In all cases, TA-LBF converges well before reaching the maximum number of iterations (i.e., 2,000). The numerical results demonstrate the fast convergence of our method in practice.

E EVALUATION SETUP

E.1 BASELINE METHODS

Since GDA (Liu et al., 2017a) and FSA (Zhao et al., 2019) are originally designed for attacking full-precision networks, we adapt these two methods to attack quantized networks by applying quantization-aware training (Jacob et al., 2018). We adopt the ℓ0-norm for FSA (Zhao et al., 2019) and modification compression for GDA (Liu et al., 2017a) to reduce the number of modified parameters. Among the three types of T-BFA (Rakin et al., 2020b), we compare with the most comparable one: the 1-to-1 stealthy attack scheme. The purpose of this attack scheme is to misclassify samples of a single source class into the target class while maintaining the prediction accuracy on other samples. Besides, we take fine-tuning (FT) of the last fully-connected layer as a basic attack and present its results. We perform the attack once for each selected image, except for TBT (Rakin et al., 2020a), yielding 1,000 attacks in total on each dataset. The attack objective of TBT is that the attacked DNN model misclassifies all inputs with a trigger into a certain target class. Due to this objective, the number of attacks for TBT equals the number of target classes (i.e., 10 attacks on CIFAR-10 and 50 attacks on ImageNet).
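Since N_flip counts bit-level modifications, one way to measure it for any of these attacks is to XOR the two's-complement bit codes of the original and attacked quantized weights. Below is a small illustrative helper; count_bit_flips is our name, not part of any of the compared methods.

```python
def count_bit_flips(w_orig, w_attacked, num_bits=8):
    """Count N_flip: the number of differing bits between two sequences
    of signed quantized weights stored in `num_bits`-bit two's complement."""
    mask = (1 << num_bits) - 1  # keep only the low `num_bits` bits
    flips = 0
    for a, b in zip(w_orig, w_attacked):
        # XOR of the bit codes marks exactly the flipped positions
        flips += bin((a & mask) ^ (b & mask)).count("1")
    return flips
```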

E.2 TARGET MODELS

Following the settings in (Rakin et al., 2020a; b), we adopt two popular network architectures, ResNet (He et al., 2016) and VGG (Simonyan & Zisserman, 2015), for evaluation. On CIFAR-10, we perform experiments on ResNet-20 and VGG-16. On ImageNet, we use the pre-trained ResNet-18* and VGG-16† networks. We quantize all networks to the 4-bit and 8-bit quantization levels using a layer-wise uniform weight quantization scheme, similar to the one used in the TensorRT solution (Migacz, 2017).
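As a rough illustration of layer-wise uniform weight quantization, the following sketch uses a symmetric, max-abs-scaled scheme in the spirit of the TensorRT solution; the exact rounding and calibration used in the paper may differ.

```python
import numpy as np

def uniform_quantize_layer(w, num_bits=8):
    """Layer-wise symmetric uniform quantization sketch: scale by the
    layer's max absolute weight, round to signed integer codes, and
    return both the codes and the de-quantized weights."""
    qmax = 2 ** (num_bits - 1) - 1      # e.g. 127 for 8-bit
    step = np.abs(w).max() / qmax       # one scale per layer
    codes = np.clip(np.round(w / step), -qmax, qmax).astype(int)
    return codes, codes * step
```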

E.3 PARAMETER SETTINGS OF TA-LBF

For each attack, we adopt a strategy for jointly searching λ and k. Specifically, for an initially given k, we search λ starting from a relatively large initial value and divide it by 2 whenever the attack does not succeed. The maximum number of search steps for λ under a fixed k is set to 8. If this maximum is exceeded, we double k and search λ again from the large initial value. The maximum number of search steps for k is set to 4. On CIFAR-10, the initial k and λ are set to 5 and 100, respectively. On ImageNet, λ is initialized as 10^4; k is initialized as 5 and 50 for ResNet and VGG, respectively. On CIFAR-10, the δ in L_1 is set to 10. On ImageNet, δ is set to 3 and increased to 10 if the attack fails. u_1 and u_2 are initialized as b̂, and u_3 is initialized as 0. z_1 and z_2 are initialized as 0, and z_3 is initialized as 0. b̂ is initialized as b. During each iteration, the number of gradient steps for updating b̂ is 5 and the step size is set to 0.01 on both datasets. The hyper-parameters (ρ_1, ρ_2, ρ_3) (see Eq. (11)) are initialized as (10^-4, 10^-4, 10^-5) on both datasets, and are increased by ρ_i ← ρ_i × 1.01, i = 1, 2, 3, after each iteration; their maximum values are set to (50, 50, 5).

Table 4: Results of all attack methods against models trained with piece-wise clustering on CIFAR-10 (bold: the best; underline: the second best). We adopt different clustering coefficients, including 0.0005, 0.005, and 0.01. The mean and standard deviation of PA-ACC and N_flip are calculated by attacking the 1,000 images. Our method is denoted as TA-LBF. ΔN_flip denotes the increase in N_flip compared to the corresponding result in Table 1.

Our method achieves a 100% ASR with the fewest N_flip under all three clustering coefficients. Although TBT obtains a smaller ΔN_flip than our method, it fails to achieve a satisfactory ASR. For example, TBT achieves only a 10.1% ASR when the clustering coefficient is set to 0.01.
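The joint search over λ and k described above (halve λ up to 8 times for a fixed k; on failure, double k, trying up to 4 values of k) can be sketched as follows, where attack_fn is a hypothetical callable that runs one full TA-LBF attack and returns whether it succeeded.

```python
def search_lambda_k(attack_fn, lam_init, k_init,
                    max_lam_steps=8, max_k_steps=4):
    """Joint (lambda, k) search: for each k, halve lambda from its
    initial value until the attack succeeds or the step budget runs
    out; then double k and restart lambda."""
    k = k_init
    for _ in range(max_k_steps):
        lam = lam_init
        for _ in range(max_lam_steps):
            if attack_fn(lam, k):
                return lam, k       # first successful setting
            lam /= 2.0
        k *= 2
    return None                     # attack failed for all tried settings
```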
We observe that piece-wise clustering reduces the original accuracy for all clustering coefficients, and this phenomenon becomes more significant as the clustering coefficient increases. The results also show that a larger clustering coefficient (e.g., 0.01) does not guarantee a more robust model, which is consistent with the finding in (He et al., 2020).

G DISCUSSIONS

G.1 COMPARING BACKDOOR, ADVERSARIAL, AND WEIGHT ATTACKS

An attacker can achieve malicious purposes using backdoor, adversarial, or weight attacks. In this section, we emphasize the differences among them.

Backdoor attack happens in the training stage and requires that the attacker can tamper with the training data or even the training process (Liu et al., 2020b; Li et al., 2020). By poisoning some training samples with a trigger, the attacker can control the behavior of the attacked DNN in the inference stage. For example, images with reflections are misclassified into a target class, while benign images are classified normally (Liu et al., 2020a). However, such an attack paradigm causes accuracy degradation on benign samples, which makes it detectable by users. Besides, these methods also require modifying samples in the inference stage, which is sometimes impossible for the attacker. Many defense methods against backdoor attack have been proposed, such as the preprocessing-based defense (Liu et al., 2017b), the model reconstruction-based defense (Liu et al., 2018a), and the trigger synthesis-based defense (Wang et al., 2019).

Adversarial attack modifies samples in the inference stage by adding small perturbations that remain imperceptible to the human visual system (Akhtar & Mian, 2018). Since adversarial attack only modifies inputs while keeping the model unchanged, it has no effect on benign samples. Besides the basic white-box attack, the black-box attack (Wu et al., 2020b; Chen et al., 2020) and the universal attack (Zhang et al., 2020b; a) have attracted wide attention. Inspired by its success in classification, adversarial attack has also been extended to other tasks, including image captioning (Xu et al., 2019), retrieval (Bai et al., 2020; Feng et al., 2020), etc.
Similarly, recent studies have proposed many defense methods against adversarial attack, including the preprocessing-based defense (Xie et al., 2018), the detection-based defense (Xu et al., 2017), and the adversarial-learning-based defense (Carmon et al., 2019; Wu et al., 2020c).

Weight attack modifies model parameters in the deployment stage, and is the paradigm studied in this work. Weight attack generally aims at misleading the DNN model on the selected sample(s) while having a minor effect on other samples (Zhao et al., 2019; Rakin et al., 2020b). Many studies (Yao et al., 2020; Breier et al., 2018; Pan, 2020) have demonstrated that, in practice, DNN parameters can be modified at the bit level in memory using fault injection techniques (Agoyan et al., 2010; Kim et al., 2014; Selmke et al., 2015). Note that defense methods against weight attack have not been well studied. Although some defense methods (He et al., 2020) have been proposed, they cannot achieve satisfactory performance; for example, our method still achieves a 100% attack success rate against two proposed defense methods. Our work would encourage further investigation on the security of model parameters from both the attack and defense sides.

G.2 COMPARING TA-LBF WITH OTHER WEIGHT ATTACKS

We compare our TA-LBF with other weight attack methods, including TBT (Rakin et al., 2020a), T-BFA (Rakin et al., 2020b), GDA (Liu et al., 2017a), and FSA (Zhao et al., 2019), in this section.

TBT tampers with both the test sample and the model parameters. Specifically, it first locates critical bits and generates a trigger, and then flips these bits so that all inputs embedded with the trigger are classified into a target class. However, the malicious samples are easily detected by human inspection or by many detection methods (Tran et al., 2018; Du et al., 2020). TA-LBF does not modify the samples, which makes the attack more stealthy.

Rakin et al. (2020b) proposed T-BFA, which misclassifies all samples (N-to-1 version) or samples from a source class (1-to-1 version) into a target class. Our method instead aims at misclassifying a specific sample, which meets the attacker's requirement in some scenarios; for example, the attacker may want to manipulate the behavior of a face recognition engine on one specific input. Since it affects multiple samples, T-BFA may not be stealthy enough when attacking real-world applications.

GDA (Liu et al., 2017a) and FSA (Zhao et al., 2019) modify model parameters at the weight level rather than the bit level. They are designed for misclassifying multiple samples from arbitrary classes, which makes it infeasible for them to modify only the parameters connected to the source and target classes. As shown in our experiments, they modify more parameters than our method, which might be due to the reason discussed above. Besides, TBT, T-BFA, and GDA determine the critical weights to modify using heuristic strategies, while our TA-LBF adopts an optimization-based method. Although FSA applies ADMM to solve its optimization problem, it has no explicit constraint to control the number of modified parameters, which makes it tend to modify more parameters than GDA and our TA-LBF.
H TRADE-OFF BETWEEN THREE EVALUATION METRICS

In this section, we investigate the trade-off between the three adopted evaluation metrics (i.e., PA-ACC, ASR, and N_flip) for our attack. All experiments are conducted on the CIFAR-10 and ImageNet datasets in attacking the 8-bit quantized ResNet.

We first discuss the trade-off between PA-ACC and N_flip by fixing the ASR at 100% using the search strategy in Appendix E.3 and adjusting the initial λ and k to obtain different attack results. The two curves on the left show that increasing N_flip can improve the PA-ACC when N_flip is relatively small; once N_flip exceeds a threshold, the PA-ACC decreases as N_flip grows. This phenomenon demonstrates that constraining the number of bit-flips is essential to ensure the attack stealthiness, as mentioned in Section 3.2. To study the trade-off between PA-ACC



† Downloaded from https://download.pytorch.org/models/vgg16_bn-6c64b313.pth



Figure 2: Results of TA-LBF with different parameters λ, k, and the number of auxiliary samples N on CIFAR-10. Shaded regions indicate the standard deviation of attacking the 1,000 images.

Figure 3: Visualization of the decision boundaries of the original model and the post-attack models. The attacked sample from Class 3 is misclassified into Class 1 by FSA, GDA, and our method.

Figure 4: Numerical convergence analysis of TA-LBF w.r.t. the attacked sample on CIFAR-10 and ImageNet, respectively. We present the values of ||b̂ - u_1||_2^2, ||b̂ - u_2||_2^2, and L_1 + λL_2 at different iterations in attacking the 8-bit quantized ResNet. Note that λ is 100 in the left figure and 10^4 in the right figure.

Figure 5: Curves of the trade-off between PA-ACC and N flip and the trade-off between PA-ACC and ASR for the proposed TA-LBF on two datasets.

Here δ indicates a slack variable, which will be specified in later experiments. The first term of L_1 aims at increasing the logit of the target class, while the second term aims at decreasing the logit of the source class. The loss L_1 is 0 only when the output on the target class is larger than m + δ and the output on the source class is smaller than m - δ, i.e., the target model predicts the predefined target class on x. Note that B̂_s, B̂_t ∈ {0, 1}^{C×Q} are the two variables we want to optimize, corresponding to the weights of the fully-connected layer w.r.t. classes s and t, respectively, in the target DNN model. B ∈ {0, 1}^{K×C×Q} denotes the weights of the fully-connected layer of the original DNN model, and it is a constant tensor in L_1. For clarity, hereafter we simplify L_1(x; Θ, B, B̂_s, B̂_t) as L_1(B̂_s, B̂_t), since x and Θ are given as the fixed input and weights.
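Based on the description above, L_1 can be read as a pair of hinge terms. The sketch below assumes m is the largest logit over the remaining classes, which is our simplification for illustration; margin_loss_L1 is a hypothetical name, not the paper's notation.

```python
def margin_loss_L1(logits, s, t, delta):
    """Sketch of the classification loss L1: one hinge term pushes the
    target logit above m + delta, the other pushes the source logit
    below m - delta. Here m is taken as the largest logit among the
    classes other than s and t (our assumption for illustration)."""
    m = max(v for i, v in enumerate(logits) if i not in (s, t))
    push_target_up = max(m + delta - logits[t], 0.0)    # increase target logit
    push_source_down = max(logits[s] - (m - delta), 0.0)  # decrease source logit
    return push_target_up + push_source_down
```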

B_{{1,...,K}\{s,t}} denotes {B_1, B_2, ..., B_K}\{B_s, B_t}, and f_j(x_i; Θ, B_{{1,...,K}\{s,t}}, B̂_s, B̂_t)

Results of all attack methods against the models with defense on CIFAR-10 and ImageNet (bold: the best; underline: the second best). The mean and standard deviation of PA-ACC and N_flip are calculated by attacking the 1,000 images. Our method is denoted as TA-LBF. ΔN_flip denotes the increase in N_flip compared to the corresponding result in Table 1.

Table 3: Running time (seconds) of attacking one image for different methods. The mean and standard deviation are calculated over 10 attacks.

on both datasets. Besides the maximum number of iterations (i.e., 2,000), we also set another stopping criterion, i.e., ||b̂ - u_1||_2^2 ≤ 10^-4 and ||b̂ - u_2||_2^2 ≤ 10^-4. We set the maximum search times of k to 5 for the clustering coefficients 0.005 and 0.01 and keep the rest of the settings the same as those in Section 4.1. The results are presented in Table 4. As shown in the table, for all methods the values of N_flip are larger than when attacking models without defense, which is similar to Table
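The stopping criterion above can be expressed as a small check on the primal residuals; should_stop is our helper name, and b, u1, u2 denote b̂, u_1, u_2.

```python
import numpy as np

def should_stop(b, u1, u2, iteration, max_iter=2000, tol=1e-4):
    """Stopping rule used in the experiments: halt when both primal
    residuals ||b - u1||_2^2 and ||b - u2||_2^2 fall below 1e-4, or
    when the iteration cap (2,000) is reached."""
    r1 = float(np.sum((b - u1) ** 2))
    r2 = float(np.sum((b - u2) ** 2))
    return (r1 <= tol and r2 <= tol) or iteration >= max_iter
```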

Table 5: Results of all attack methods against models with a larger capacity on CIFAR-10. We adopt 3× and 4× width networks. The mean and standard deviation of PA-ACC and N_flip are calculated by attacking the 1,000 images. ΔN_flip denotes the increase in N_flip compared to the corresponding result in Table 1.

Besides the results of networks with a 2× width shown in Section 4.3, we also evaluate all methods against models with 3× and 4× widths. All settings are the same as those used in Section 4.1. The results are provided in Table 5. Among all attack methods, ours is least affected by increasing the network width. Especially for the network with a 4× width, our ΔN_flip is only 2.80. These results demonstrate the superiority of the formulated BIP problem and its optimization. Moreover, compared with piece-wise clustering, a larger model capacity can improve the original accuracy, but increases the model size and the computational complexity.

ACKNOWLEDGMENTS

This work is supported in part by the National Key Research and Development Program of China under Grant 2018YFB1800204, the National Natural Science Foundation of China under Grant 61771273, and the R&D Program of Shenzhen under Grant JCYJ20180508152204044. Baoyuan Wu is supported by the Natural Science Foundation of China under Grant No. 62076213 and the university development fund of the Chinese University of Hong Kong, Shenzhen under Grant No. 01001810.


and ASR, we fix the parameter k to 10 (for approximately 10 bit-flips) and adjust the parameter λ to obtain different PA-ACC and ASR results. The trade-off curves between PA-ACC and ASR show that increasing the ASR decreases the PA-ACC significantly. Therefore, how to achieve a high ASR and a high PA-ACC simultaneously is still an important open problem.

