IMPROVED GRADIENT BASED ADVERSARIAL ATTACKS FOR QUANTIZED NETWORKS

Abstract

Neural network quantization has become increasingly popular due to its reduced memory footprint and the faster computation enabled by bitwise operations on the quantized networks. Although quantized networks exhibit excellent generalization capabilities, their robustness properties are not well understood. In this work, we systematically study the robustness of quantized networks against gradient based adversarial attacks and demonstrate that these quantized models suffer from vanishing gradients and consequently show a false sense of robustness. Attributing the vanishing gradients to poor forward-backward signal propagation in the trained network, we introduce a simple temperature scaling approach that mitigates this issue while preserving the decision boundary. Despite being a simple modification to existing gradient based adversarial attacks, experiments on the CIFAR-10/100 datasets with multiple network architectures demonstrate that our temperature scaled attacks obtain a near-perfect success rate on quantized networks while outperforming the original attacks on adversarially trained models as well as floating-point networks.

1. INTRODUCTION

Neural Network (NN) quantization has become increasingly popular due to reduced memory and time complexity, enabling real-time applications and inference on resource-limited devices. Such quantized networks often exhibit excellent generalization capabilities despite their low capacity resulting from the reduced precision of parameters and activations. However, their robustness properties are not well understood. In particular, while parameter quantized networks are claimed to be more robust against gradient based adversarial attacks (Galloway et al. (2018)), activation-only quantized methods are shown to be vulnerable (Lin et al. (2019)). In this work, we consider the extreme case of Binary Neural Networks (BNNs) and systematically study the robustness of parameter quantized models, as well as models with both parameters and activations quantized, against gradient based adversarial attacks. Our analysis reveals that these quantized models suffer from gradient masking (Athalye et al. (2018)), especially vanishing gradients, and in turn show a false sense of robustness. We attribute the vanishing gradients to poor forward-backward signal propagation caused by the trained binary weights, and our idea is to improve the signal propagation of the network without affecting the prediction of the classifier. There is a body of work on improving signal propagation in neural networks (e.g., Glorot & Bengio (2010); Pennington et al. (2017); Lu et al. (2020)); however, we face the unique challenge of improving signal propagation while preserving the decision boundary, since our ultimate objective is to generate adversarial attacks. To this end, we first discuss the conditions that ensure informative gradients and then resort to a temperature scaling approach (Guo et al. (2017)), which scales the logits before applying the softmax cross-entropy, to show that even a single positive scalar can alleviate the vanishing gradients issue in BNNs, achieving a near-perfect success rate in all tested cases.

Specifically, we introduce two techniques to choose the temperature scale: 1) based on the singular values of the input-output Jacobian, and 2) by maximizing the norm of the Hessian of the loss with respect to the input. The justification for the first is that if the singular values of the input-output Jacobian concentrate around 1 (a property defined as dynamical isometry (Pennington et al. (2017))), the network is said to have good signal propagation, and we intend to make the mean of the singular values equal to 1. The intuition for maximizing the Hessian norm, on the other hand, is that a large Hessian norm means the gradient of the loss with respect to the input is sensitive to an infinitesimal change in the input. This is a sufficient condition for the network to have good signal propagation as well as informative gradients, under the assumption that the network has no randomized or non-differentiable components. We evaluate our improved gradient based adversarial attacks on BNNs with quantized weights (BNN-WQ), BNNs with quantized weights and activations (BNN-WAQ), floating point networks (REF), and adversarially trained models, employing quantized and floating point networks trained on the CIFAR-10/100 datasets with several architectures. In all tested BNNs, both versions of our temperature scaled attacks obtain a near-perfect success rate, outperforming standard gradient based attacks (FGSM (Goodfellow et al. (2014)), PGD (Madry et al. (2017))). Furthermore, temperature scaling improves gradient based attacks even on adversarially trained models (both high-precision and quantized) as well as floating point networks, showing the significance of signal propagation for adversarial attacks.
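The effect of temperature scaling on gradient vanishing can be illustrated with a small sketch (assuming a numpy setup; the function names and the temperature value are hypothetical, and the paper selects the temperature from the Jacobian or Hessian rather than by hand): when the logits are saturated, the softmax cross-entropy gradient is numerically zero, while dividing the logits by a positive scalar restores an informative gradient without changing the predicted class.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_grad_wrt_logits(z, y):
    # gradient of cross-entropy w.r.t. logits: softmax(z) - onehot(y)
    p = softmax(z)
    p[y] -= 1.0
    return p

# Saturated logits: softmax is numerically one-hot, so the gradient vanishes.
z = np.array([30.0, -10.0, -20.0])
y = 0  # true label coincides with the predicted class
g = ce_grad_wrt_logits(z, y)

# Temperature scaling: divide logits by T > 0 before the softmax.
# T = 20 is an arbitrary illustrative value.
T = 20.0
g_T = ce_grad_wrt_logits(z / T, y)

print(np.abs(g).max())    # ~1e-18: vanishing gradient
print(np.abs(g_T).max())  # clearly nonzero gradient

# Scaling by a positive T preserves the argmax, hence the decision boundary.
assert np.argmax(z) == np.argmax(z / T)
```

Since the argmax of the logits is invariant to division by a positive scalar, the classifier's prediction, and therefore its decision boundary, is untouched; only the loss surface seen by the attacker changes.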

2. PRELIMINARIES

We first provide some background on neural network quantization and adversarial attacks.

2.1. NEURAL NETWORK QUANTIZATION

Neural Network (NN) quantization is defined as training networks with parameters constrained to a minimal, discrete set of quantization levels. This primarily relies on the hypothesis that, since NNs are usually overparametrized, it is possible to obtain a quantized network with performance comparable to the floating point network. Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, NN quantization can be written as:

$\min_{w \in Q^m} L(w; D) := \frac{1}{n} \sum_{i=1}^{n} \ell\big(w; (x_i, y_i)\big)$.

Here, $\ell(\cdot)$ denotes the input-output mapping composed with a standard loss function (e.g., cross-entropy loss), $w$ is the $m$-dimensional parameter vector, and $Q$ is a predefined discrete set representing the quantization levels (e.g., $Q = \{-1, 1\}$ in the binary case).

2.2. ADVERSARIAL ATTACKS

Adversarial examples consist of imperceptible perturbations to the data that alter the model's prediction with high confidence. Existing attacks can be categorized into white-box and black-box attacks, where the difference lies in the knowledge available to the adversary: white-box attacks give the adversary access to the target model's architecture and parameters, whereas black-box attacks can only query the model. Since white-box gradient based attacks are the most popular, we summarize them below.

First-order gradient based attacks can be compactly written as Projected Gradient Descent (PGD) on the negative of the loss function (Madry et al. (2017)). Formally, let $x^0 \in \mathbb{R}^N$ be the input image; then at iteration $t$, the PGD update can be written as:

$x^{t+1} = P\big(x^t + \eta\, g_x^t\big)$,

where $P : \mathbb{R}^N \to X$ is a projection, $X \subset \mathbb{R}^N$ is the constraint set that bounds the perturbations, $\eta > 0$ is the step size, and $g_x^t$ is a form of gradient of the loss with respect to the input $x$, evaluated at $x^t$. With this general form, the popular gradient based adversarial attacks can be specified:

• Fast Gradient Sign Method (FGSM): This is a one-step attack introduced in Goodfellow et al. (2014). Here, $P$ is the identity mapping, $\eta$ is the maximum allowed perturbation magnitude, and $g_x^t = \operatorname{sign}\big(\nabla_x \ell(x^t)\big)$.
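The general PGD update above can be sketched for an $\ell_\infty$ constraint set, where the projection $P$ is simply a clip back into the $\epsilon$-ball around the clean input. The sketch below uses a linear softmax classifier purely for illustration (all names and parameter values are hypothetical); FGSM is recovered as the special case of one step with $\eta = \epsilon$.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_input_grad(x, W, y):
    # cross-entropy loss of a linear classifier and its gradient w.r.t. x
    p = softmax(W @ x)
    loss = -np.log(p[y] + 1e-12)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return loss, W.T @ (p - onehot)

def pgd_linf(x0, W, y, eps, eta, steps):
    # PGD on the negative loss; X is the l_inf ball of radius eps around x0.
    x = x0.copy()
    for _ in range(steps):
        _, g = ce_input_grad(x, W, y)
        x = x + eta * np.sign(g)            # signed ascent step g_x^t
        x = np.clip(x, x0 - eps, x0 + eps)  # projection P onto X
    return x

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))
x0 = rng.standard_normal(5)
y = int(np.argmax(W @ x0))  # attack the currently predicted class
x_adv = pgd_linf(x0, W, y, eps=0.5, eta=0.1, steps=20)
print("loss before/after:",
      ce_input_grad(x0, W, y)[0], ce_input_grad(x_adv, W, y)[0])
```

For deep networks the only change is that the input gradient comes from backpropagation rather than a closed form, which is exactly where the vanishing-gradient issue in quantized models enters.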



NN quantization approaches (Ajanthan et al. (2019a;b); Bai et al. (2019); Hubara et al. (2017)) convert the constrained quantization problem of Section 2.1 into an unconstrained one by introducing auxiliary variables and optimize via (stochastic) gradient descent. The algorithms differ in the choice of the quantization set (e.g., keeping it discrete (Courbariaux et al. (2015)), relaxing it to its convex hull (Bai et al. (2019)), or lifting the problem to a probability space (Ajanthan et al. (2019a))), the projection used, and how differentiation through the projection is performed. When the constraint set is relaxed, a gradually increasing annealing hyperparameter is used to enforce a quantized solution (Ajanthan et al. (2019a;b); Bai et al. (2019)). We refer the interested reader to the respective papers for more details. In this paper, we use BNN-WQ obtained using MD-tanh-S (Ajanthan et al. (2019b)) and BNN-WAQ obtained using Hubara et al. (2017).
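A minimal sketch of the auxiliary-variable scheme, assuming a straight-through style gradient through the sign projection (this is a generic simplification for illustration, not the specific MD-tanh-S or Hubara et al. algorithm; all names are hypothetical): the forward pass uses binarized weights, while the update is applied to the real-valued auxiliary weights as if the projection had unit derivative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(w_aux, X, y, lr, n_classes):
    # Forward with binarized weights; backward passes the gradient
    # straight through the non-differentiable sign projection.
    W_bin = np.sign(w_aux).reshape(X.shape[1], n_classes)
    p = softmax(X @ W_bin)
    onehot = np.eye(n_classes)[y]
    grad_Wbin = X.T @ (p - onehot) / len(y)  # dL/dW at the binary point
    # straight-through: treat d sign(w)/dw as 1 when updating w_aux
    return w_aux - lr * grad_Wbin.ravel()

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 6))
w_true = rng.choice([-1.0, 1.0], size=(6, 3))
y = np.argmax(X @ w_true, axis=1)  # labels realizable by binary weights

w_aux = 0.1 * rng.standard_normal(6 * 3)
for _ in range(200):
    w_aux = train_step(w_aux, X, y, lr=0.1, n_classes=3)

acc = (np.argmax(X @ np.sign(w_aux).reshape(6, 3), axis=1) == y).mean()
print("train accuracy with binary weights:", acc)
```

The methods cited above differ precisely in how this projection and its derivative are defined, but the common outcome is a trained network whose deployed weights lie in $\{-1, 1\}$.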

