IMPROVED GRADIENT BASED ADVERSARIAL ATTACKS FOR QUANTIZED NETWORKS

Abstract

Neural network quantization has become increasingly popular due to reduced memory consumption and the faster computation enabled by bitwise operations on the quantized networks. Even though they exhibit excellent generalization capabilities, their robustness properties are not well-understood. In this work, we systematically study the robustness of quantized networks against gradient based adversarial attacks and demonstrate that these quantized models suffer from gradient vanishing issues and exhibit a false sense of robustness. By attributing gradient vanishing to poor forward-backward signal propagation in the trained network, we introduce a simple temperature scaling approach to mitigate this issue while preserving the decision boundary. Despite being a simple modification to existing gradient based adversarial attacks, experiments on CIFAR-10/100 datasets with multiple network architectures demonstrate that our temperature scaled attacks obtain a near-perfect success rate on quantized networks while outperforming the original attacks on adversarially trained models as well as floating-point networks.

1. INTRODUCTION

Neural Network (NN) quantization has become increasingly popular due to reduced memory and time complexity, enabling real-time applications and inference on resource-limited devices. Such quantized networks often exhibit excellent generalization capabilities despite having low capacity due to reduced precision for parameters and activations. However, their robustness properties are not well-understood. In particular, while parameter quantized networks are claimed to have better robustness against gradient based adversarial attacks (Galloway et al. (2018)), activation-only quantized methods are shown to be vulnerable (Lin et al. (2019)). In this work, we consider the extreme case of Binary Neural Networks (BNNs) and systematically study the robustness properties of parameter quantized models, as well as models with both parameters and activations quantized, against gradient based adversarial attacks. Our analysis reveals that these quantized models suffer from gradient masking issues (Athalye et al. (2018)) (especially vanishing gradients) and in turn show a false sense of robustness. We attribute this vanishing gradients issue to poor forward-backward signal propagation caused by the trained binary weights, and our idea is to improve the signal propagation of the network without affecting the prediction of the classifier. There is a body of work on improving signal propagation in a neural network (e.g., Glorot & Bengio (2010); Pennington et al. (2017); Lu et al. (2020)); however, we face the unique challenge of improving signal propagation while preserving the decision boundary, since our ultimate objective is to generate adversarial attacks. To this end, we first discuss the conditions that ensure informative gradients and then resort to a temperature scaling approach (Guo et al. (2017)) (which scales the logits before applying softmax cross-entropy) to show that, even with a single positive scalar, the vanishing gradients issue in BNNs can be alleviated, achieving a near-perfect success rate in all tested cases. Specifically, we introduce two techniques to choose the temperature scale: 1) based on the singular values of the input-output Jacobian, and 2) by maximizing the norm of the Hessian of the loss with respect to the input. The justification for the first case is that if the singular values of the input-output Jacobian are concentrated around 1 (defined as dynamical isometry (Pennington et al. (2017))), then the network is said to have good signal propagation, and we intend to make the mean of the singular values 1. On the other hand, the intuition for maximizing the Hessian norm is that if the Hessian norm is large, then the gradient of the loss with respect to the input is sensitive to an infinitesimal change in the input. This is a sufficient condition for the network to have good signal propagation as well as informative gradients, under the assumption that the network does not have any randomized or non-differentiable components. We evaluated our improved gradient based adversarial attacks using BNNs with weights quantized (BNN-WQ), BNNs with weights and activations quantized (BNN-WAQ), floating point networks (REF), and adversarially trained models. We employ quantized and floating point networks trained on CIFAR-10/100 datasets using several architectures. In all tested BNNs, both versions of our temperature scaled attacks obtained a near-perfect success rate, outperforming the original gradient based attacks (FGSM (Goodfellow et al. (2014)), PGD (Madry et al. (2017))). Furthermore, this temperature scaling improved gradient based attacks even on adversarially trained models (both high-precision and quantized) as well as floating point networks, showing the significance of signal propagation for adversarial attacks.

2. PRELIMINARIES

We first provide some background on neural network quantization and adversarial attacks.

2.1. NEURAL NETWORK QUANTIZATION

Neural Network (NN) quantization is defined as training networks with parameters constrained to a minimal, discrete set of quantization levels. This primarily relies on the hypothesis that, since NNs are usually overparametrized, it is possible to obtain a quantized network with performance comparable to the floating point network. Given a dataset D = {(x_i, y_i)}_{i=1}^n, NN quantization can be written as:

min_{w ∈ Q^m} L(w; D) := (1/n) Σ_{i=1}^n ℓ(w; (x_i, y_i)) .

Here, ℓ(·) denotes the input-output mapping composed with a standard loss function (e.g., cross-entropy loss), w is the m-dimensional parameter vector, and Q is a predefined discrete set representing the quantization levels (e.g., Q = {-1, 1} in the binary case). To this end, existing algorithms differ in the choice of quantization set (e.g., keeping it discrete (Courbariaux et al. (2015)), relaxing it to its convex hull (Bai et al. (2019)), or converting the problem into a lifted probability space (Ajanthan et al. (2019a))), the projection used, and how differentiation through the projection is performed. In the case when the constraint set is relaxed, a gradually increasing annealing hyperparameter is used to enforce a quantized solution (Ajanthan et al. (2019a;b); Bai et al. (2019)). We refer the interested reader to the respective papers for more details. In this paper, we use BNN-WQ obtained using MD-tanh-S (Ajanthan et al. (2019b)) and BNN-WAQ obtained using Hubara et al. (2017).
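To make the binary case Q = {-1, 1} concrete, the following minimal NumPy sketch (our own illustration, not the cited training methods, which use more elaborate projections and annealing schemes) shows the sign-projection onto the quantization levels and the common straight-through estimator used to differentiate through it:

```python
import numpy as np

def binarize(w):
    # Project real-valued weights onto Q = {-1, +1}; zeros map to +1 by convention
    return np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_out):
    # Straight-through estimator: pass the incoming gradient through
    # unchanged wherever |w| <= 1, and block it elsewhere
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([0.7, -0.2, 0.0, -1.5])
print(binarize(w))               # [ 1. -1.  1. -1.]
print(ste_grad(w, np.ones(4)))   # [1. 1. 1. 0.]
```

The forward pass uses the projected weights, while the backward pass updates the latent real-valued weights through the estimator.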

2.2. ADVERSARIAL ATTACKS

Adversarial examples consist of imperceptible perturbations to the data that alter the model's prediction with high confidence. Existing attacks can be categorized into white-box and black-box attacks, where the difference lies in the knowledge available to the adversary. White-box attacks allow the adversary access to the target model's architecture and parameters, whereas black-box attacks can only query the model. Since white-box gradient based attacks are popular, we summarize them below. First-order gradient based attacks can be compactly written as Projected Gradient Descent (PGD) on the negative of the loss function (Madry et al. (2017)). Formally, let x^0 ∈ IR^N be the input image; then at iteration t, the PGD update can be written as:

x^{t+1} = P(x^t + η g_x^t) ,

where P : IR^N → X is a projection, X ⊂ IR^N is the constraint set that bounds the perturbations, η > 0 is the step size, and g_x^t is a form of gradient of the loss with respect to the input x evaluated at x^t. With this general form, the popular gradient based adversarial attacks can be specified:
• Fast Gradient Sign Method (FGSM): This is a one-step attack introduced in Goodfellow et al. (2014). Here, P is the identity mapping, η is the maximum allowed perturbation magnitude, and g_x^t = sign(∇_x ℓ(w*; (x^t, y))), where ℓ denotes the loss function, w* denotes the trained weights, and y is the ground truth label corresponding to the image x^0.
• PGD with L∞ bound: Arguably the most popular adversarial attack, introduced in Madry et al. (2017) and sometimes referred to as the Iterative Fast Gradient Sign Method (IFGSM). Here, P is the L∞ norm based projection, η is a chosen step size, and g_x^t = sign(∇_x ℓ(w*; (x^t, y))), the sign of the gradient as in FGSM.
• PGD with L2 bound: This is also introduced in Madry et al. (2017) and performs standard PGD in the Euclidean space. Here, P is the L2 norm based projection, η is a chosen step size, and g_x^t = ∇_x ℓ(w*; (x^t, y)) is simply the gradient of the loss with respect to the input.
These attacks have been further strengthened by a random initial step (Tramèr et al. (2017)). In this paper, we perform this single random initialization for all experiments with FGSM/PGD attacks unless otherwise mentioned.
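The PGD update with L∞ bound and a random initial step can be sketched in a few lines of PyTorch; the tiny linear model and the sizes below are illustrative placeholders, and the clamp to a valid image range is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x0, y, eps, eta, steps):
    """L-infinity PGD: random start, signed-gradient ascent on the loss,
    projection back into the eps-ball around x0."""
    x = x0 + torch.empty_like(x0).uniform_(-eps, eps)   # random initial step
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        g, = torch.autograd.grad(loss, x)               # gradient w.r.t. input
        x = x + eta * g.sign()                          # ascent step (sign of gradient)
        x = x0 + (x - x0).clamp(-eps, eps)              # project into the eps-ball
    return x.detach()

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
x0 = torch.randn(2, 4)
y = torch.tensor([0, 1])
x_adv = pgd_linf(model, x0, y, eps=8/255, eta=2/255, steps=20)
print((x_adv - x0).abs().max().item())  # stays within the eps bound
```

Setting steps=1 and eta=eps (with the identity in place of the projection) recovers FGSM, and replacing g.sign() by g with an L2 projection gives the L2-bounded variant.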

3. ROBUSTNESS EVALUATION OF BINARY NEURAL NETWORKS

We start by evaluating the adversarial accuracy (i.e., accuracy on the perturbed data) of BNNs using the PGD attack with L∞ bound.
• PGD attack details: perturbation bound of 8 pixels (assuming each pixel in the image is in [0, 255]) with respect to the L∞ norm, step size η = 2, and total number of iterations T = 20.
The attack details are the same in all evaluated settings unless stated otherwise. We perform experiments on the CIFAR-10 dataset using ResNet-18 and VGG-16 architectures and report the clean accuracy and PGD adversarial accuracy with 1 and 20 random restarts in Table 1. It can be clearly and consistently observed that binary networks have high adversarial accuracy compared to their floating point counterparts. Even with 20 random restarts, BNNs clearly outperform floating point networks in terms of adversarial accuracy. Since this result is surprising, we investigate this phenomenon further to understand whether BNNs are actually robust to adversarial perturbations or whether they show a fake sense of security due to some form of obfuscated gradients (Athalye et al. (2018)).

3.1. IDENTIFYING OBFUSCATED GRADIENTS

Recently, it has been shown that several defense mechanisms intentionally or unintentionally break gradient descent, causing obfuscated gradients, and thus exhibit a false sense of security (Athalye et al. (2018)). Several gradient based adversarial attacks tend to fail to produce adversarial perturbations in scenarios where the gradients are uninformative, referred to as gradient masking. Gradient masking can occur due to shattered gradients, stochastic gradients, or exploding and vanishing gradients. We try to identify gradient masking in binary networks based on the empirical checks provided in Athalye et al. (2018). If any of these checks fail, it indicates a gradient masking issue in BNNs.

4. SIGNAL PROPAGATION OF NEURAL NETWORKS

We first describe how poor signal propagation in neural networks can cause vanishing or exploding gradients. Then we discuss the idea of introducing a single scalar to improve the existing gradient based attacks without affecting the prediction (i.e., decision boundary) of the trained models. We consider a neural network f_w for an input x^0, with logits a^K = f_w(x^0). Now, since softmax cross-entropy is usually used as the loss function, we can write:

ℓ(a^K, y) = -y^T log(p) ,    p = softmax(a^K) ,

where y ∈ IR^d is the one-hot encoded target label and log is applied elementwise. For the various gradient based adversarial attacks discussed in Sec. 2.2, the gradient of the loss with respect to the input x^0 is used, which can be formulated using the chain rule as:

∂ℓ(a^K, y)/∂x^0 = (∂ℓ(a^K, y)/∂a^K) (∂a^K/∂x^0) = ψ(a^K, y) J ,

where ψ denotes the error signal and J ∈ IR^{d×N} is the input-output Jacobian. Here we use the convention that ∂v/∂u is of the form v-size × u-size. Notice that there are two components that influence the gradients: 1) the Jacobian J, and 2) the error signal ψ. Gradient based attacks would fail if either the Jacobian is poorly conditioned or the error signal has saturating gradients; both lead to vanishing gradients in ∂ℓ/∂x^0. The effect of the Jacobian on signal propagation is studied in the dynamical isometry and mean-field theory literature (Pennington et al. (2017); Saxe et al. (2013)), and a network is said to satisfy dynamical isometry if the singular values of J are concentrated near 1. Under this condition, error signals ψ backpropagate isometrically through the network, approximately preserving their norm and all angles between error vectors. Thus, as dynamical isometry improves the trainability of floating point networks, a similar technique can be useful for gradient based attacks as well.
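The input-output Jacobian J and its singular value spectrum can be inspected directly with automatic differentiation; a small sketch on a toy network (our own choice of architecture and sizes, for illustration only):

```python
import torch

torch.manual_seed(0)
# Toy network: logits a^K = f_w(x^0) with d = 5 outputs and N = 8 inputs
net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Tanh(),
                          torch.nn.Linear(8, 5))
x0 = torch.randn(8)

# Input-output Jacobian J (d x N) of the logits w.r.t. the input
J = torch.autograd.functional.jacobian(net, x0)
sv = torch.linalg.svdvals(J)
print(sv)  # dynamical isometry would require these to concentrate near 1
```

For a trained BNN one would run the same check on the quantized forward model; a spectrum far from 1 signals poorly conditioned backpropagation to the input.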
In fact, almost all initialization techniques (e.g., Glorot & Bengio (2010)) approximately ensure that the Jacobian J is well-conditioned for better trainability, and it is hypothesized that approximate isometry is preserved even at the end of training. However, for BNNs, the weights are constrained to be in {-1, 1}, and hence the weight distribution at the end of training is completely different from the random initialization. Furthermore, it is not clear that fully-quantized networks can achieve a well-conditioned Jacobian, which has motivated research on utilizing layerwise scalars (either predefined or learned) to improve BNN training (McDonnell (2018); Rastegari et al. (2016)). We would like to point out that the focus of this paper is to improve gradient based attacks on already trained BNNs. To this end, learning a new scalar to improve signal propagation at each layer is not useful, as it can alter the decision boundary of the network and thus cannot be used in practice on an already trained model.

4.1. TEMPERATURE SCALING FOR BETTER SIGNAL PROPAGATION

In this paper, we propose to use a single scalar per network to improve its signal propagation via temperature scaling. In fact, one could replace softmax with any monotonic function such that the prediction is not altered; however, we will show in our experiments that a single scalar with softmax has enough flexibility to improve signal propagation and yields an almost 100% success rate with PGD attacks. Essentially, we can use a scalar β > 0 without changing the decision boundary of the network, since it preserves the relative order of the logits. Precisely, we consider the following:

p(β) = softmax(ā^K) ,    ā^K = β a^K .

Here, we write the softmax output probabilities p as a function of β to emphasize that they are the softmax output of the temperature scaled logits. Since in this context the only variable is the temperature scale β, we denote the loss and the error signal as functions of β only. With this simplified notation, the gradient of the temperature scaled loss with respect to the input can be written as:

∂ℓ(β)/∂x^0 = (∂ℓ(β)/∂ā^K) (∂ā^K/∂a^K) (∂a^K/∂x^0) = ψ(β) β J .

Note that β affects the input-output Jacobian linearly, while it affects the error signal ψ nonlinearly. To this end, we hope to obtain a β that ensures the error signal is useful (i.e., not all zero) and the Jacobian is well-conditioned enough to allow the error signal to propagate to the input. We acknowledge that, while varying β from 0 to ∞ sweeps the softmax output from the uniform distribution (β = 0) to a one-hot vector (β → ∞), β only scales the Jacobian. Therefore, if the Jacobian J has zero singular values, our approach has no effect in those dimensions. However, since most modern networks consist of ReLU nonlinearities (generally positive homogeneous functions), the effect of a single scalar would be equivalent (ignoring the biases) to having layerwise scalars as in McDonnell (2018). Thus, we believe a single scalar is sufficient for our purpose.
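For a linear model, where the input-output Jacobian J is simply the weight matrix, the two facts above can be checked numerically: β > 0 preserves the prediction, and the input gradient factorizes as β ψ(β) J (a minimal sketch with an illustrative toy model):

```python
import torch

torch.manual_seed(0)
net = torch.nn.Linear(4, 3)
x0 = torch.randn(4, requires_grad=True)
y = torch.zeros(3); y[1] = 1.0   # one-hot target

for beta in (0.5, 1.0, 4.0):
    logits = net(x0)
    # Scaling the logits by beta > 0 never changes the argmax (prediction)
    assert (beta * logits).argmax() == logits.argmax()
    # Temperature scaled softmax cross-entropy loss
    loss = -(y * torch.log_softmax(beta * logits, dim=0)).sum()
    g_auto, = torch.autograd.grad(loss, x0)
    J = net.weight                                     # Jacobian of a linear layer
    psi = (torch.softmax(beta * logits, dim=0) - y).detach()  # psi(beta) = -(y - p)^T
    assert torch.allclose(g_auto, beta * psi @ J, atol=1e-5)
print("prediction invariant; gradient factorizes as beta * psi(beta) * J")
```

The same factorization holds for a deep network, with J replaced by the full input-output Jacobian.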

5. IMPROVED GRADIENTS FOR ADVERSARIAL ATTACKS

For the temperature scaled softmax cross-entropy loss, the error signal takes the form:

ψ(β) = (∂ℓ(β)/∂p(β)) (∂p(β)/∂ā^K) = -(y - p(β))^T ,    (7)

where y is the one-hot encoded target label and p(β) is the softmax output of the scaled logits. For adversarial attacks, we only consider the correctly classified images (i.e., argmax_j y_j = argmax_j p_j(β)), as there is no need to generate adversarial examples for misclassified samples. From the above formula, it is clear that when p(β) is a one-hot encoding, the error signal is 0. This is one of the reasons for the vanishing gradient issue in BNNs. Even if this does not happen for a given image, one can increase β → ∞ to make this error signal 0. Similarly, when p(β) is the uniform distribution, the norm of the error signal is at its maximum. This can be obtained by setting β = 0. However, this would also make ∂ℓ(β)/∂x^0 = 0, as the singular values of the scaled input-output Jacobian would all be 0. How the error signal is affected by β is illustrated in Fig. 2. This analysis indicates that the optimal β cannot be obtained by simply maximizing the norm of the error signal; we need to balance both the Jacobian and the error signal. To summarize, the scalar β should be chosen such that the following properties are satisfied:
1. ‖ψ(β)‖_2 > ρ for some ρ > 0.
2. The Jacobian βJ is well-conditioned, i.e., the singular values of βJ are concentrated around 1.
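The behavior of the error signal norm as a function of β can be reproduced on a single confidently-classified logit vector (the numbers below are our own illustrative choice):

```python
import torch

logits = torch.tensor([4.0, 1.0, 0.0])   # confidently correct sample
y = torch.tensor([1.0, 0.0, 0.0])        # one-hot ground truth

def err_norm(beta):
    # ||psi(beta)||_2 where psi(beta) = -(y - p(beta))^T
    p = torch.softmax(beta * logits, dim=0)
    return (y - p).norm().item()

print(err_norm(0.0))    # uniform softmax: error norm is maximal
print(err_norm(1.0))    # already small for a confident sample
print(err_norm(50.0))   # near one-hot output: vanishing error signal
```

This shows both failure modes at once: β → ∞ kills the error signal, while β = 0 maximizes it but zeroes out the scaled Jacobian βJ.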

5.1. NETWORK JACOBIAN SCALING (NJS)

We now discuss a straightforward, two-step approach to attain the aforementioned properties. First, to ensure βJ is well-conditioned, we simply choose β to be the inverse of the mean of the singular values of J. This guarantees that the mean of the singular values of βJ is 1. After this scaling, it is possible that the resulting error signal is very small. To ensure that ‖ψ(β)‖_2 > ρ > 0, we ensure that the softmax output p_k(β) corresponding to the ground truth class k is at least ρ away from 1. We now state a proposition that derives a bound on β given a lower bound on 1 - p_k(β).

Proposition 1. Let a^K ∈ IR^d with d > 1, a^K_1 ≥ a^K_2 ≥ ... ≥ a^K_d, and a^K_1 - a^K_d = γ. For a given 0 < ρ < (d-1)/d, any β > 0 satisfying β < -log(ρ/((d-1)(1-ρ)))/γ guarantees 1 - softmax(βa^K)_1 > ρ.

Proof. This follows via a simple algebraic manipulation of softmax. Please refer to the Appendix.

This β can be used together with the one computed using the inverse of the mean of the Jacobian Singular Values (JSV). We provide the pseudocode for our proposed PGD++ (NJS) attack in the Appendix. A similar approach can also be applied for FGSM++. Notice that this approach is simple and adds negligible overhead to the standard PGD attack. However, it has a hand-designed hyperparameter ρ. To mitigate this, we next discuss a hyperparameter-free approach to obtain β.
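Both NJS steps can be checked numerically on a toy network (an illustrative sketch; sizes, the network, and ρ are placeholders): the Jacobian-based scale makes the mean singular value of βJ equal to 1, and any β below the bound of Proposition 1 keeps the top softmax output ρ away from 1:

```python
import math
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(6, 6), torch.nn.Tanh(),
                          torch.nn.Linear(6, 4))
x0 = torch.randn(6)

# Step 1: beta1 = inverse of the mean singular value of the input-output Jacobian,
# so the singular values of beta1 * J have mean 1
J = torch.autograd.functional.jacobian(net, x0)
beta1 = 1.0 / torch.linalg.svdvals(J).mean().item()
print(torch.linalg.svdvals(beta1 * J).mean().item())  # ~1.0

# Step 2: Proposition 1 bound on beta for a given rho
logits = net(x0)
d = logits.numel()
gamma = (logits.max() - logits.min()).item()          # a^K_1 - a^K_d
rho = 0.01
bound = -math.log(rho / ((d - 1) * (1 - rho))) / gamma
beta2 = 0.9 * bound                                   # any beta < bound works
p_top = torch.softmax(beta2 * logits, dim=0).max().item()
print(1 - p_top > rho)                                # guaranteed by the proposition
```

In the actual attack, β₁ is estimated from the Jacobians of a batch of samples rather than a single input.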

5.2. HESSIAN NORM SCALING (HNS)

We now discuss another approach to obtain informative gradients. Our idea is to maximize the Frobenius norm of the Hessian of the loss with respect to the input, where the intuition is that if the Hessian norm is large, then the gradient ∂ℓ/∂x^0 is sensitive to an infinitesimal change in x^0. This means that an infinitesimal perturbation of the input is propagated in the forward pass to the last layer and propagated back to the input layer without attenuation (i.e., the returned signal is not zero), assuming there are no randomized or non-differentiable components in the network. This indicates that the network has good signal propagation and that the error signals are not all zero. This objective can be written as:

β* = argmax_{β>0} ‖∂²ℓ(β)/∂(x^0)²‖_F = argmax_{β>0} ‖β (ψ(β) ∂J/∂x^0 + β J^T (∂p(β)/∂ā^K) J)‖_F .    (8)

The derivation is provided in the Appendix. Note that, since J does not depend on β, J and ∂J/∂x^0 are computed only once; β is then optimized using grid search, as it involves only a single scalar. In fact, it is easy to see from the above equation that, when the Hessian norm is maximized, β cannot be zero. Similarly, ψ(β) cannot be zero, because if it were zero, then the prediction p(β) would be a one-hot encoding (Eq. (7)); consequently ∂p(β)/∂ā^K = 0, and this cannot be a maximum of the Hessian norm. Hence, ‖ψ(β*)‖_2 > ρ for some ρ > 0, and β* is bounded according to Proposition 1. Therefore, the maximum is obtained for a finite value of β. Even though it is not clear how exactly this approach affects the singular values of the scaled input-output Jacobian βJ, we know that they are finite and not zero. How the Hessian norm is influenced by β is illustrated in Fig. 3. Related works (e.g., Moosavi-Dezfooli et al. (2019)) show that adversarial training makes the loss surface locally linear in the vicinity of training samples and that enforcing a local linearity constraint on the loss curvature can achieve better robustness to adversarial attacks. On the contrary, our idea of maximizing the Hessian norm, i.e., increasing the nonlinearity of ℓ, could make the network more prone to adversarial attacks, and we intend to exploit that. The pseudocode for the PGD++ attack with HNS is summarized in the Appendix.
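The HNS grid search over β can be sketched directly with autograd on a toy network (an illustrative sketch; the network, grid, and target class below are our own placeholders — in practice the Hessian is assembled from the cached J and ∂J/∂x^0 as described above, rather than recomputed per β):

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Tanh(),
                          torch.nn.Linear(4, 3))
x0 = torch.randn(4)

def hessian_norm(beta):
    # Frobenius norm of the Hessian of the temperature scaled loss w.r.t. the input
    loss = lambda x: torch.nn.functional.cross_entropy(
        beta * net(x).unsqueeze(0), torch.tensor([0]))
    return torch.autograd.functional.hessian(loss, x0).norm().item()

grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
beta_star = max(grid, key=hessian_norm)   # grid search over a single scalar
print(beta_star, hessian_norm(beta_star))
```

The norm decays toward zero at both extremes of β, so the grid maximum is attained at a finite, nonzero temperature, as argued above.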

6. EXPERIMENTS

We evaluate our attacks against the original gradient based attacks as well as stronger attacks such as BBA (Brendel et al. (2019)). Among our variants, even though they perform similarly in our experiments, Hessian based scaling (HNS) outperforms Jacobian based scaling (NJS) in the majority of cases, and this difference is significant for one-step FGSM attacks. This indicates that the nonlinearity of the network indeed has some relationship to its adversarial robustness. We use state-of-the-art models trained for binary quantization (where all layers are quantized) for our experimental evaluations. We provide the adversarial attack parameters used for FGSM/PGD in the Appendix, and for other attacks, we use the default parameters of Foolbox (Rauber et al. (2017)). For our HNS variant, we sweep β over a range such that the Hessian norm is maximized for each image, as explained in the Appendix. For our NJS variant, we set ρ = 0.01. In fact, our attacks are not very sensitive to ρ, and we provide an ablation study in the Appendix. The PyTorch (Paszke et al. (2017)) implementation of our algorithm will be released upon publication.

6.1. RESULTS

We first compared the original PGD (L2/L∞) and FGSM attacks with both versions (NJS and HNS) of our modified attacks; notably, some recently proposed loss modifications (e.g., Croce & Hein (2020)) perform even worse than the original attacks on adversarially trained models. Similarly, for the one-step FGSM attack, our modified versions outperform the original FGSM attack by a significant margin, consistently for both datasets and various network architectures. Such an improvement in the above two attacks is particularly interesting given that FGSM and PGD with L∞ bound only use the sign of the gradients, so the improved performance indicates that our temperature scaling indeed makes some zero elements in the gradient nonzero. We note that one can use several random restarts to further increase the success rate of the original FGSM/PGD attacks, but to keep comparisons fair we use a single random restart for both the original and modified attacks. Nevertheless, as observed in Table 1, even with 20 random restarts the PGD adversarial accuracies for BNNs cannot reach zero, whereas our proposed PGD++ variants consistently achieve a perfect success rate.
ImageNet. For large scale datasets such as ImageNet, BNNs are hard to train with full binarization of parameters and yield poor performance. Thus, most existing works (Yang et al. (2019)) on BNNs keep the first and last layers floating point and introduce several layerwise scalars to achieve good results on ImageNet. In such experimental setups, according to our experiments, trained BNNs do not exhibit gradient masking issues or poor signal propagation and are thus easy to attack using the original FGSM/PGD attacks with complete success. In these experiments, our modified versions perform equally well compared to the original forms of these attacks.
The adversarial accuracies of REF and BNN-WAQ trained on CIFAR-10 using ResNet-18/50, VGG-16 and DenseNet-121 for our variants against their original counterparts are also reported. To further demonstrate the efficacy, we adversarially trained BNN-WQs (quantized using BC (Courbariaux et al. (2015)) and GD-tanh/MD-tanh-S (Ajanthan et al. (2019b))) and floating point networks in a similar manner to Madry et al. (2017), using L∞ bounded PGD with T = 7 iterations, η = 2, and ε = 8. We report the adversarial accuracies of L∞ bounded attacks and our variants on CIFAR-10 using ResNet-18 in Table 4. These results further strengthen the usefulness of our proposed PGD++ variants. Moreover, a heuristic choice of β = 0.1 to scale down the logits before performing gradient based attacks performs even worse. Finally, even against stronger attacks (DeepFool (Moosavi-Dezfooli et al. (2016)), BBA (Brendel et al. (2019))) under the same L∞ perturbation bound, our variants consistently outperform on these adversarially trained models. We would like to point out that our variants have negligible computational overhead over the original gradient based attacks, whereas the stronger attacks are much slower in practice, requiring 100-1000 iterations with an adversarial starting point (instead of a random initial perturbation). To illustrate the effectiveness of our proposed variants in improving signal propagation, we compare against gradient based attacks performed using the recently proposed Difference of Logits Ratio (DLR) loss (Croce & Hein (2020)), which aims to avoid the issue of saturating error signals. We show these experimental comparisons performed on ResNet-18 models trained on the CIFAR-10 dataset in Table 5. The attack parameters are the same as those used in the other experiments. It can be clearly observed that, in almost all cases, our proposed variants are much better than the original gradient based attacks performed with the DLR loss. The margin of difference is significant in the case of the FGSM attack and adversarially trained models. In fact, it is important to note that gradient based attacks with the DLR loss perform worse on adversarially trained models than the original forms of the gradient based attacks.

7. RELATED WORK

Adversarial examples were first observed in Szegedy et al. (2014), and subsequently efficient gradient based attacks such as FGSM (Goodfellow et al. (2014)) and PGD (Madry et al. (2017)) were introduced. Later work (Croce & Hein (2020)) shows that seemingly robust defenses can be attacked with a minor modification of the loss function. In short, although it has been hinted that there might be some sort of gradient masking in BNNs (especially in activation quantized networks), a thorough understanding is lacking of whether BNNs are robust and, if not, what the reason is for the inferior performance of the most commonly used gradient based attacks on binary networks. We answer this question in this paper and introduce improved gradient based attacks.

8. CONCLUSION

In this work, we have shown that both BNN-WQ and BNN-WAQ tend to show a false sense of robustness against gradient based attacks due to poor signal propagation. To tackle this issue, we introduced two variants of our PGD++ attack, namely NJS and HNS. Our proposed PGD++ variants not only achieve a near-perfect success rate on binarized networks but also outperform the standard L∞ and L2 bounded PGD attacks on floating point networks. We finally show an improvement in attack success rate on adversarially trained REF and BNN-WQ against stronger attacks (DeepFool and BBA). In the future, we intend to focus on improving the robustness of BNNs with provable robustness guarantees.

Appendices

Here, we first provide the pseudocode, the proof of the proposition, and the derivation of the Hessian. We then give additional experiments, analysis, and the details of our experimental setting.

A PSEUDOCODE

We provide pseudocode for PGD++ with NJS in Algorithm 1 and PGD++ with HNS in Algorithm 2.

Algorithm 1 PGD++ with NJS (L∞, T iterations, radius ε, step size η, network f_{w*}, input x^0, label k, one-hot y ∈ {0,1}^d, gradient threshold ρ)
Require: T, ε, η, ρ, x^0, y, k
Ensure: ‖x^{T+1} - x^0‖_∞ ≤ ε
1: β_1 = (M d) / (Σ_{i=1}^M Σ_{j=1}^d μ_j(J_i))    ▷ β_1 computed using the network Jacobian
2: x^1 = P_∞(x^0 + Uniform(-1, 1))    ▷ random initialization with projection
3: for t ← 1, ..., T do
4:     β_2 = 1.0
5:     p = softmax(β_1 f_{w*}(x^t))
6:     if 1 - p_k ≤ ρ then    ▷ ρ = 0.01
7:         β_2 = -log(ρ/((d-1)(1-ρ)))/γ    ▷ γ computed as in Proposition 2
8:     ℓ = -y^T log(softmax(β_2 β_1 f_{w*}(x^t)))
9:     x^{t+1} = P_∞(x^t + η sign(∇_x ℓ(x^t)))    ▷ update step with projection
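Algorithm 1 can be sketched as runnable PyTorch for a single image (our own minimal illustration: the toy network and attack parameters are placeholders, and β₁ is computed from one image rather than averaged over M samples as in line 1):

```python
import math
import torch
import torch.nn.functional as F

def pgd_pp_njs(net, x0, k, eps, eta, T, rho=0.01):
    """Minimal single-image sketch of PGD++ with NJS."""
    # beta1: inverse of the mean singular value of the input-output Jacobian
    J = torch.autograd.functional.jacobian(net, x0)
    beta1 = 1.0 / torch.linalg.svdvals(J).mean().item()
    d = net(x0).numel()
    x = x0 + eps * torch.empty_like(x0).uniform_(-1, 1)   # random init in the ball
    for _ in range(T):
        beta2 = 1.0
        with torch.no_grad():
            logits = net(x)
            p = torch.softmax(beta1 * logits, dim=0)
            if 1 - p[k] <= rho:           # error signal too small: rescale (Prop. 2)
                gamma = (logits.max() - logits.min()).item()
                beta2 = -math.log(rho / ((d - 1) * (1 - rho))) / gamma
        x = x.detach().requires_grad_(True)
        loss = F.cross_entropy((beta2 * beta1 * net(x)).unsqueeze(0),
                               torch.tensor([k]))
        g, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x = x0 + (x + eta * g.sign() - x0).clamp(-eps, eps)   # projected step
    return x.detach()

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(6, 6), torch.nn.Tanh(),
                          torch.nn.Linear(6, 4))
x0 = torch.randn(6)
k = int(net(x0).argmax())
x_adv = pgd_pp_njs(net, x0, k, eps=0.3, eta=0.05, T=20)
print((x_adv - x0).abs().max().item())  # within the eps ball
```

Compared to plain PGD, the only additions are the two temperature scalars applied to the logits before the loss, so the overhead per iteration is negligible.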

Algorithm 2 PGD++ with HNS (L∞, T iterations, radius ε, step size η, network f_{w*}, input x^0, label k, one-hot y ∈ {0,1}^d)
Require: T, ε, η, x^0, y, k
Ensure: ‖x^{T+1} - x^0‖_∞ ≤ ε
1: x^1 = P_∞(x^0 + Uniform(-1, 1))    ▷ random initialization with projection
2: β* = argmax_{β>0} ‖∂²ℓ(β)/∂(x^0)²‖_F    ▷ grid search
3: for t ← 1, ..., T do
4:     ℓ = -y^T log(softmax(β* f_{w*}(x^t)))
5:     x^{t+1} = P_∞(x^t + η sign(∇_x ℓ(x^t)))    ▷ update step with projection

B DERIVATIONS

B.1 DERIVING β GIVEN A LOWER BOUND ON 1 - p_k(β)

Proposition 2. Let a^K ∈ IR^d with d > 1, a^K_1 ≥ a^K_2 ≥ ... ≥ a^K_d, and a^K_1 - a^K_d = γ. For a given 0 < ρ < (d-1)/d, any β > 0 satisfying β < -log(ρ/((d-1)(1-ρ)))/γ guarantees 1 - softmax(βa^K)_1 > ρ.

Proof. We derive a condition on β such that 1 - softmax(βa^K)_1 > ρ:

1 - softmax(βa^K)_1 > ρ ,    (9)
softmax(βa^K)_1 < 1 - ρ ,
exp(βa^K_1) / Σ_{λ=1}^d exp(βa^K_λ) < 1 - ρ ,
1 / (1 + Σ_{λ=2}^d exp(β(a^K_λ - a^K_1))) < 1 - ρ .

Since a^K_1 - a^K_λ ≤ γ for all λ > 1,

1 / (1 + Σ_{λ=2}^d exp(β(a^K_λ - a^K_1))) ≤ 1 / (1 + Σ_{λ=2}^d exp(-βγ)) = 1 / (1 + (d-1) exp(-βγ)) .

Therefore, to ensure 1 / (1 + Σ_{λ=2}^d exp(β(a^K_λ - a^K_1))) < 1 - ρ, it suffices to have:

1 / (1 + (d-1) exp(-βγ)) < 1 - ρ ,
exp(-βγ) > ρ/((d-1)(1-ρ)) ,
-βγ > log(ρ/((d-1)(1-ρ))) ,    (exp is monotone)
β < -log(ρ/((d-1)(1-ρ)))/γ .

Therefore, for any β < -log(ρ/((d-1)(1-ρ)))/γ, the inequality 1 - softmax(βa^K)_1 > ρ is satisfied.

B.2 DERIVATION OF HESSIAN

We now derive the Hessian with respect to the input mentioned in Eq. (8) of the paper. The input gradients can be written as:

∂ℓ(β)/∂x^0 = (∂ℓ(β)/∂p(β)) (∂p(β)/∂ā^K) βJ = ψ(β) βJ .

Now, by the product rule of differentiation, the input Hessian can be written as:

∂²ℓ(β)/∂(x^0)² = β (ψ(β) ∂J/∂x^0 + (∂ψ(β)/∂x^0)^T J)
= β (ψ(β) ∂J/∂x^0 + (∂p(β)/∂x^0)^T J) ,    since ψ(β) = -(y - p(β))^T ,
= β (ψ(β) ∂J/∂x^0 + β J^T (∂p(β)/∂ā^K) J) ,

where in the last step we use ∂p(β)/∂x^0 = (∂p(β)/∂ā^K) βJ and the symmetry of the softmax Jacobian ∂p(β)/∂ā^K.
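The derivation above can be verified numerically against autograd on a small network (a sanity-check sketch; the tanh toy network, β, and target below are our own illustrative choices):

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(3, 3), torch.nn.Tanh(),
                          torch.nn.Linear(3, 2))
x0 = torch.randn(3)
y = torch.tensor([1.0, 0.0])
beta = 2.0

# Autograd Hessian of the temperature scaled loss w.r.t. the input
loss = lambda x: -(y * torch.log_softmax(beta * net(x), dim=0)).sum()
H_auto = torch.autograd.functional.hessian(loss, x0)

# Analytic form: beta * (psi dJ/dx0 + beta J^T (dp/da_bar) J)
J = torch.autograd.functional.jacobian(net, x0)           # d x N
p = torch.softmax(beta * net(x0), dim=0)
psi = p - y                                               # psi(beta) = -(y - p)^T
S = torch.diag(p) - torch.outer(p, p)                     # softmax Jacobian dp/da_bar
dJ = torch.stack([torch.autograd.functional.hessian(lambda x: net(x)[i], x0)
                  for i in range(2)])                     # dJ[i] = d^2 a_i / dx^2
H_analytic = beta * (torch.einsum('i,ijk->jk', psi, dJ) + beta * J.T @ S @ J)
print(torch.allclose(H_auto, H_analytic, atol=1e-4))
```

The two Hessians agree to floating-point precision, confirming the product-rule decomposition term by term.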

C ADDITIONAL EXPERIMENTS

In this section, we first provide more experimental details and then some ablation studies. Both our variants outperform the stronger attacks (DeepFool and BBA (Brendel et al. (2019))). Note that DeepFool and BBA are much slower in practice, requiring 100-1000 iterations; BBA additionally requires an adversarial starting point that needs to be computed using another adversarial attack.

C.1 EXPERIMENTAL DETAILS

We report results for BBA (Brendel et al. (2019)) on BNN-WAQ trained on the CIFAR-10 dataset in Table 9. In this experiment, our proposed variants again outperform even the stronger attacks, which take 100-1000 iterations with an adversarial starting point (instead of a random initial perturbation). It should be noted that although BBA performs much better than DeepFool and PGD, it still has an inferior success rate compared to ours, considering that it takes multiple hours to run BBA, whereas our proposed variants are almost as efficient as the PGD attack.

Step size tuning for the PGD attack. We would like to point out that the step size η and the temperature scale β have different effects on the attacks performed. Notice that PGD and FGSM attacks under the L∞ bound only use the sign of the input gradients in each gradient ascent step. Thus, if the input gradients are completely saturated (which is the case for BNNs), the original forms of PGD and FGSM will not work irrespective of the step size used. To illustrate this, we performed extensive step size tuning for the original form of the PGD attack on different ResNet-18 models trained on the CIFAR-10 dataset, and the adversarial accuracies are reported in Fig. 4. It can be clearly observed that although tuning the step size lowers the adversarial accuracy somewhat in some cases, it still cannot reach zero for BNNs, unlike our proposed variants.

CLEVER scores (Weng et al. (2018)). We compare our variants, namely NJS and HNS, against a heuristic choice of small β = 0.01 and the original CLEVER scores for BNN-WQ and BNN-WAQ (trained on CIFAR-10 using ResNet-18) in Table 10. It can be clearly seen that our proposed variants improve the robustness bounds computed using CLEVER, whereas the heuristic choice of β = 0.01 performs even worse.

C.4 STABILITY OF PGD++ WITH NJS WITH VARIATIONS IN ρ

We perform ablation studies with varying ρ for PGD++ with NJS in Table 11 on CIFAR-10 using the ResNet-18 architecture. The results illustrate that our NJS variant is quite robust to the choice of ρ: we achieve a near-perfect success rate with PGD++ for different values of ρ. As long as the value of ρ is large enough to avoid one-hot encoded softmax outputs (and in turn ψ(β) = 0) for correctly classified samples, our approach with the NJS variant is quite stable.

C.5 SIGNAL PROPAGATION AND INPUT GRADIENT ANALYSIS USING NJS AND HNS

We first provide an example illustration in Fig. 5 to better understand how the input gradient norm, i.e., ‖∂ℓ(β)/∂x_0‖_2, and the norm of the sign of the input gradient, i.e., ‖sign(∂ℓ(β)/∂x_0)‖_2, are influenced by β. Both plots show a concave behaviour where an optimal β maximizes the input gradient. As is evident in Fig. 5 (b), within an optimal range of β the gradient vanishing issue can be avoided; as β → 0 or β → ∞, all entries of the input gradient go to zero and in turn ‖sign(∂ℓ(β)/∂x_0)‖_2 = 0. We also report the signal propagation properties as well as the input gradient norms before and after using the β estimated by NJS and HNS in Table 12. For binarized as well as floating-point networks tested on CIFAR-10 using the ResNet-18 architecture, our HNS and NJS variants result in larger values of ‖ψ‖_2, ‖∂ℓ(β)/∂x_0‖_2 and ‖sign(∂ℓ(β)/∂x_0)‖_2. This reflects the efficacy of our method in overcoming the gradient vanishing issue. Our variants also improve the signal propagation of the networks by bringing the mean JSV closer to 1.
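The concave behaviour in Fig. 5 can be reproduced with a toy model. For a linear "network" with logits a = Wx, the input gradient of the temperature-scaled cross-entropy is βWᵀ(p(β) − y): the leading factor β kills the gradient as β → 0, while p(β) → y (for a correctly classified sample) kills it as β → ∞, so the norm peaks at an intermediate β. A hypothetical NumPy sketch; W, x and the β grid are arbitrary choices of ours, not the paper's experimental setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_grad_norm(W, x, label, beta):
    """||d loss / d x||_2 for cross-entropy on softmax(beta * W @ x).
    For this linear 'network' the Jacobian of the logits w.r.t. the
    input is just W, so the input gradient is beta * W.T @ (p - y)."""
    p = softmax(beta * (W @ x))
    y = np.zeros_like(p)
    y[label] = 1.0
    return float(np.linalg.norm(beta * (W.T @ (p - y))))

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16))   # 5 classes, 16-dimensional input
x = rng.normal(size=16)
label = int(np.argmax(W @ x))  # treat the prediction as the true class

betas = [1e-6, 1e-3, 1e-1, 1e1, 1e3, 1e6]
norms = [input_grad_norm(W, x, label, b) for b in betas]
# The gradient norm is tiny at both extremes of beta and peaks at an
# intermediate value, matching the concave behaviour in Fig. 5.
```

The same reasoning applies to the sign norm in Fig. 5 (b): once the entries of the input gradient underflow at extreme β, the sign vector is identically zero.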

C.6 ABLATION FOR ρ VS. PGD++ ACCURACY

In this subsection, we analyse the effect of bounding the gradient of the network output for the ground-truth class k, i.e., ∂ℓ(β)/∂ā^K_k. Here, we compute β using Proposition 1 for all correctly classified images such that 1 − softmax(βa^K)_k > ρ, for different values of ρ, and report the PGD++ adversarial accuracy in Table 13. It can be observed that there is an optimum value of ρ at which the PGD++ success rate is maximized, especially on the adversarially trained models. This can also be seen in connection with the non-linearity of the network: at an optimum value of β, even for robust (locally linear) networks (Moosavi-Dezfooli et al. (2019); Qin et al. (2019)) such as adversarially trained models, the non-linearity can be maximized and a better success rate for gradient based attacks can be achieved. Our HNS variant essentially pursues the same objective while estimating β for each example.

Table 12: Mean and standard deviation of Jacobian Singular Values (JSV), mean ‖ψ‖_2, mean ‖∂ℓ/∂x_0‖_2 and mean ‖sign(∂ℓ/∂x_0)‖_2 for different methods on CIFAR-10 with ResNet-18, computed over 500 correctly classified samples. Note that for NJS and HNS, JSV is computed for the scaled Jacobian, i.e., βJ. Also note that the values of ‖ψ‖_2, ‖∂ℓ(β)/∂x_0‖_2 and ‖sign(∂ℓ(β)/∂x_0)‖_2 are larger for our NJS and HNS variants (for most of the networks) compared with the network without β, which clearly indicates better gradients for performing gradient based attacks.
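The condition 1 − softmax(βa^K)_k > ρ pins down the admissible range of β. Proposition 1 (stated in the main paper) gives this in closed form; purely as an illustration, the threshold can also be found numerically, since the margin 1 − softmax(βa^K)_k is monotonically decreasing in β whenever k is the predicted class. A sketch under that assumption, with a log-scale bisection of our own devising and example logits that are ours, not from any experiment:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def beta_for_rho(logits, k, rho, lo=1e-6, hi=1e6, iters=100):
    """Largest beta (up to bisection tolerance) with
    1 - softmax(beta * logits)[k] > rho, assuming k = argmax(logits)
    so the margin is monotonically decreasing in beta. This is a
    numerical stand-in for the closed form of Proposition 1."""
    margin = lambda b: 1.0 - softmax(b * logits)[k]
    if margin(lo) <= rho:
        return None  # condition unattainable even at tiny beta
    for _ in range(iters):
        mid = np.sqrt(lo * hi)       # bisect on a log scale
        if margin(mid) > rho:
            lo = mid
        else:
            hi = mid
    return lo

# Saturated, BNN-style example logits (ours, for illustration only).
logits = np.array([400.0, -400.0, -350.0])
beta = beta_for_rho(logits, k=0, rho=0.1)
# At this beta the softmax is no longer one-hot, so psi(beta) != 0 and
# the input gradients become informative, while argmax is preserved.
```

Any β below the returned threshold keeps the softmax away from a one-hot output, which is precisely what the stability ablation over ρ in Table 11 exploits.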



BNN-WQ have binary weights, but there is no non-differentiable or randomized component once trained.

https://github.com/Trusted-AI/adversarial-robustness-toolbox



NN quantization approaches (Ajanthan et al. (2019a;b); Bai et al. (2019); Hubara et al. (2017)) convert the above problem into an unconstrained problem by introducing auxiliary variables and optimize via (stochastic) gradient descent.

Figure 1: Gradient masking checks in ResNet-18 on CIFAR-10 for the PGD attack with L∞ bound: (a) varying iterations, (b) varying radius, and (c) black-box attacks on ResNet-18 and VGG-16. While (a) and (c) show signs of gradient masking, (b) does not. We attribute this discrepancy to the random initial step before PGD.

Figure 2: Error signal ψ(β) and Jacobian of the softmax ∂p(β)/∂ā^K vs. β for the logits of a random correctly classified sample.

Figure 3: Hessian norm vs. β on a random correctly classified image. The plot clearly shows a concave behaviour.


al. (2017)) are introduced. There exist recent stronger attacks (Moosavi-Dezfooli et al. (2016); Carlini & Wagner (2017); Yao et al. (2019); Finlay et al. (2019); Brendel et al. (2019)); however, compared to PGD, they are much slower to use for adversarial training in practice. For a comprehensive survey of adversarial attacks, we refer the reader to Chakraborty et al. (2018). Some recent works focus on the adversarial robustness of BNNs (Bernhard et al. (2019); Sen et al. (2020); Galloway et al. (2018); Khalil et al. (2019); Lin et al. (2019)); however, a strong consensus on the robustness properties of quantized networks is lacking. In particular, while Galloway et al. (2018) claim that parameter quantized networks are robust to gradient based attacks based on empirical evidence, Lin et al. (2019) show that activation quantized networks are vulnerable to such attacks and propose a defense strategy assuming the parameters are floating-point. Differently, Khalil et al. (2019) propose a combinatorial attack, hinting that activation quantized networks suffer from an obfuscated gradients issue. Sen et al. (2020) show an ensemble of mixed precision networks to be more robust than the original floating-point networks; however, Tramer et al. (

Figure 5: Plots showing how variation in β affects (a) the norm of the input gradient, i.e., ‖∂ℓ(β)/∂x_0‖_2, and (b) the norm of the sign of the input gradient, i.e., ‖sign(∂ℓ(β)/∂x_0)‖_2, on a random correctly classified image. Notice that both the input gradient norm and the signed input gradient norm behave similarly, showing a concave behaviour. This plot is computed for a BNN-WQ network on CIFAR-10 with ResNet-18. (b) clearly illustrates how an optimum β can avoid the vanishing gradient issue, since ‖sign(∂ℓ(β)/∂x_0)‖_2 is zero only if the input gradient has all zero entries.


Clean and adversarial accuracy (PGD attack with L∞ bound) on the test set of CIFAR-10 using ResNet-18 and VGG-16. In brackets, we mention the number of random restarts used to perform the attack. Note that BNNs consistently show higher adversarial accuracy than floating-point networks.

To illustrate this, we analyse the effects of varying different hyperparameters of the PGD attack on BNNs.

We evaluate the robust accuracies of BNNs with quantized weights (BNN-WQ), BNNs with quantized weights and activations (BNN-WAQ), floating-point networks (REF), and adversarially trained networks. We evaluate our two PGD++ variants, corresponding to Hessian Norm Scaling (HNS) and Network Jacobian Scaling (NJS), on the CIFAR-10 and CIFAR-100 datasets with multiple network architectures. Briefly, our results indicate that both of our proposed attack variants yield attack success rates much higher than the original PGD attack, not only for L∞ bounded attacks but also for L2 bounded attacks, on both floating-point and binarized networks. Our proposed PGD++ variants also reduce the PGD adversarial accuracy on the test set for BNN-WQ. Both our NJS and HNS variants consistently outperform the original L∞ bounded FGSM and PGD attacks, as well as the L2 bounded PGD attack.

Adversarial accuracy on the test set of CIFAR-10 for REF and BNN-WAQ. Both our NJS and HNS variants consistently outperform the original attacks.

Adversarial accuracy on the test set of CIFAR-10 with ResNet-18 for adversarially trained REF and BNN-WQ using different quantization methods (BC, GD-tanh, MD-tanh-S). Our improved attacks are compared against FGSM, L∞ bounded PGD, a heuristic choice of β = 0.1, DeepFool and BBA. Even on these adversarially trained networks, our methods outperform all the compared methods.

Adversarial accuracy for REF, BNN-WQ, and BNN-WAQ trained on CIFAR-10 using ResNet-18.

Overall, for both REF and BNN-WAQ, our variants consistently outperform their original counterparts. Notably, the PGD++ variants improve the attack success rate even on REF networks. This effectively expands the applicability of our PGD++ variants and encourages considering the signal propagation of any trained network to improve gradient based attacks.

Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, and Michael W Mahoney. Hessian-based analysis of large batch training and robustness to adversaries. In Advances in Neural Information Processing Systems, pp. 4949-4959, 2018.

Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, and Michael W Mahoney. Trust region based adversarial attack on neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11350-11359, 2019.

Adversarial accuracy on the test set of CIFAR-100 for REF (floating-point networks). Both our NJS and HNS variants consistently outperform the original FGSM and PGD (L∞/L2 bounded) attacks.

Adversarial accuracy on the test set of CIFAR-10 for BNN-WAQ. Here, we compare our proposed variants against much stronger attacks, namely DeepFool (Moosavi-Dezfooli et al. (2016)) and BBA (Brendel et al. (2019)).

CLEVER Scores (Weng et al. (2018)) for BNN-WQ and BNN-WAQ trained on CIFAR-10 using ResNet-18. We compare the CLEVER scores returned for L1 norm perturbations under different ways of applying temperature scaling. Here, Original refers to the original network without temperature scaling and Heuristic denotes a temperature scale with small β = 0.01.

Adversarial accuracy on the test set for binary neural networks using the L∞ bounded PGD++ attack with NJS for varying ρ. Our approach is quite stable for different values of ρ.

For the CLEVER evaluation, we set the number of batches to 50, the batch size to 10, the radius to 5, and chose the L1 norm as hyperparameters (based on Weng et al. (2018)).


We first list the hyperparameters used to perform the FGSM and PGD attacks for all the experiments in the paper in Table 6. To make a fair comparison, we keep the attack parameters the same for our proposed FGSM++ and PGD++ variants. For PGD++ with the HNS variant, we maximize the Frobenius norm of the Hessian with respect to the input, as specified in Eq. (8) of the paper, by grid search for the optimum β. We would like to point out that since only the ψ(β) and p(β) terms depend on β, we do not need to run forward and backward passes of the network multiple times during the grid search; this significantly reduces the computational overhead. We can simply reuse the same network outputs a^K and network Jacobian J (as computed without β) throughout the grid search, while recomputing the β-dependent terms at each iteration. We apply grid search to find the optimum β over 100 equally spaced intervals from β_1 to β_2. Here, β_1 and β_2 are computed based on Proposition 1 in the paper with ρ = 1e-72 and ρ = 1 − (1/d) − (1e-2), respectively, where d is the number of classes and γ = a^K_1 − a^K_2 so that 1 − softmax(βa^K)_1 < ρ. Also note that we estimate the optimum β for each test sample only at the start of the first iteration of an iterative attack and then use the same β for the subsequent iterations.

Computational Overhead of NJS and HNS. Our Jacobian calculation takes just a single backward pass through the network and thus adds a negligible overhead. Our NJS approach estimates the scale β as the inverse of the mean JSV computed over 100 random test samples, which amounts to roughly 100 backward passes. For HNS, the Jacobian J in Eq. (8) can be computed in a single backward pass. Moreover, for piecewise linear networks (e.g., with ReLU activations), ∂J/∂x_0 = 0 almost everywhere (Yao et al. (2018)). Thus, PGD++ with NJS and HNS is almost as efficient as PGD.
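The NJS estimate described above can be sketched concretely: β is set to the inverse of the mean Jacobian singular value, so that the scaled Jacobian βJ has mean singular value 1, i.e., closer to dynamical isometry. The snippet below uses random matrices as stand-ins for the per-sample network Jacobians ∂a^K/∂x_0; it illustrates the scaling rule only and is not the paper's implementation:

```python
import numpy as np

def mean_jsv(jacobians):
    """Mean Jacobian singular value (JSV) over a batch of per-sample
    input-output Jacobians, as used by the NJS heuristic."""
    svs = [np.linalg.svd(J, compute_uv=False) for J in jacobians]
    return float(np.mean(np.concatenate(svs)))

rng = np.random.default_rng(1)
# Stand-ins for 100 per-sample Jacobians of a 10-class network with
# 32-dimensional inputs; a trained BNN typically has mean JSV far from 1.
jacobians = [3.0 * rng.normal(size=(10, 32)) / np.sqrt(32) for _ in range(100)]

beta_njs = 1.0 / mean_jsv(jacobians)          # NJS: inverse of mean JSV
scaled = [beta_njs * J for J in jacobians]    # scaled Jacobians beta * J
# mean_jsv(scaled) is now 1 by construction.
```

Since singular values scale linearly with a positive scalar, a single pass over the sampled Jacobians suffices; no per-sample optimization is needed, which is why NJS costs only about 100 backward passes in total.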

C.2 COMPARISONS AGAINST AUTO-PGD ATTACK AND GRADIENT FREE ATTACK

We also compare our proposed PGD++ variants against the recently proposed Auto-PGD (APGD) with the Difference of Logits Ratio (DLR) loss (Croce & Hein (2020)) and the gradient free Square Attack (Andriushchenko et al. (2020)) on different networks trained using ResNet-18 and VGG-16 on CIFAR-10; the results are reported in Table 7. The attack parameters for this experiment are the same as reported in the paper. It can be clearly seen that our proposed variants perform much better than both APGD with the DLR loss and the Square Attack, consistently achieving 0% adversarial accuracy. In fact, the much more computationally expensive Square Attack is unable to achieve 0% adversarial accuracy in any of the cases under the enforced L∞ bound.

C.3 OTHER EXPERIMENTS

We provide adversarial accuracy comparisons for different attack methods on CIFAR-100 in Table 8. Similar to the results in the paper, our proposed PGD++ and FGSM++ consistently outperform the original forms of PGD and FGSM in all the experiments on floating-point networks. We also provide an adversarial accuracy comparison of our proposed variants against stronger attacks, namely DeepFool (Moosavi-Dezfooli et al. (2016)) and BBA (Brendel et al. (2019)), on BNN-WAQ trained on CIFAR-10 in Table 9.

