TOWARDS ADVERSARIAL ROBUSTNESS OF BAYESIAN NEURAL NETWORKS THROUGH HIERARCHICAL VARIATIONAL INFERENCE

Abstract

Recent works have applied Bayesian Neural Networks (BNNs) to adversarial training and shown improved adversarial robustness via the BNN's strength in stochastic gradient defense. However, we find that, in general, a BNN loses its stochasticity after training on its posterior. As a result, the lack of stochasticity leads to a weak regularization effect on the BNN, which increases the KL divergence in the ELBO from variational inference. In this paper, we propose an enhanced Bayesian regularizer through hierarchical variational inference in order to boost adversarial robustness against gradient-based attacks. Furthermore, we prove that the proposed method elevates the BNN's stochasticity while reducing the KL divergence. Exhaustive experimental results demonstrate the effectiveness of the proposed method by showing improved adversarial robustness, compared with adversarial training (Madry et al., 2018) and adversarial-BNN (Liu et al., 2019), under PGD attack and EOT-PGD attack with L∞ perturbations on CIFAR-10/100, STL-10, and Tiny-ImageNet.

1. INTRODUCTION

Deep neural networks have achieved impressive performance in a wide variety of machine learning tasks. Despite these breakthrough outcomes, deep neural networks are easily deceived by adversarial attacks with carefully crafted perturbations (Szegedy et al., 2014; Goodfellow et al., 2015; Chen et al., 2017; Carlini & Wagner, 2017; Papernot et al., 2017; Eykholt et al., 2018; Madry et al., 2018). Injecting these perturbations, which are imperceptible to the human eye, into clean inputs produces adversarial examples that fool the predictions of deep neural networks. The unreliability caused by such invisible perturbations has raised security concerns in deep learning applications (Apruzzese et al., 2019; Wang et al., 2019b; Sagduyu et al., 2019; Rosenberg et al., 2020). To defend against such adversarial examples, many algorithms have been studied to improve adversarial robustness. Adversarial training, in which deep neural networks are trained on adversarial examples, is one of the few defense strategies effective against strong adversarial attacks (Huang et al., 2015; Zantedeschi et al., 2017; Kurakin et al., 2017; Madry et al., 2018; Athalye et al., 2018a; Liu et al., 2018). Among them, Madry et al. (2018) has shown that adversarially trained networks can be robust to white-box attacks, where the attacker has knowledge of the network parameters. Moreover, most of the above studies agree that adversarial training provides effective adversarial robustness against several white-box attacks. Meanwhile, adversarial training and BNNs have been combined to improve adversarial robustness with a stochastic approach through variational inference. Variational inference maximizes the Evidence Lower Bound (ELBO) to find an approximate posterior that closely follows the true posterior (Graves, 2011; Kingma & Welling, 2014; Blundell et al., 2015; Hernández-Lobato & Adams, 2015).
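To make the ELBO objective concrete, the following is a minimal sketch for a single weight with a Gaussian variational posterior q(w) = N(µ, σ²) and a standard normal prior p(w) = N(0, 1), for which the KL term has a closed form. The function names and the choice of prior are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def kl_gaussian_std_normal(mu, sigma):
    """Closed-form KL(q || p) between q = N(mu, sigma^2) and p = N(0, 1)."""
    return 0.5 * (mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma))

def elbo(expected_log_likelihood, mu, sigma):
    """ELBO = E_q[log p(D | w)] - KL(q(w) || p(w))."""
    return expected_log_likelihood - kl_gaussian_std_normal(mu, sigma)

# When q matches the prior exactly, the KL penalty vanishes.
print(kl_gaussian_std_normal(0.0, 1.0))   # -> 0.0
# When sigma collapses toward zero, the KL penalty blows up,
# which is the regularization tension discussed in this paper.
print(kl_gaussian_std_normal(0.0, 0.01))  # large positive value
```

Maximizing the ELBO thus trades off data fit against staying close to the prior; a posterior whose variance collapses pays a large KL cost.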
Based on variational inference, adversarial training with BNNs has achieved adversarial robustness by implicitly using the approximate posterior against adversarial perturbations (Mescheder et al., 2017; Ye & Zhu, 2018). These works focus on training the network parameters or spaces themselves toward the maximum ELBO without obtaining the approximate posterior directly. In contrast to the above studies, Liu et al. (2019) presents adversarial training with a BNN, called "adversarial-BNN", to handle the approximate posterior explicitly. They directly learn the Gaussian parameters (i.e., mean and variance) of the approximate posterior, such that w ∼ N(µ, σ²), instead of the weight parameters. The weight parameters are then sampled from the approximate posterior, such that w = µ + σε, where the stochastic sampler ε is generated from ε ∼ N(0, 1). The stochastic sampler varies the weight parameters around the learned Gaussian parameters. This variation creates stochastic gradients, which helps improve adversarial robustness (Carbone et al., 2020). However, we find that, in general, the approximate posterior's variance converges to a near-zero value after training the BNN on its posterior, such that w ∼ N(µ, σ² ≈ 0). Although the stochastic sampler is meant to vary the weight parameters, they become effectively fixed, such that w = µ + σ(≈ 0)ε. This lack of stochasticity in the weights causes the BNN's stochasticity to vanish, so that the BNN cannot easily respond to slightly different inputs within the same class. In other words, the vanished stochasticity breaks the regularization effect in the BNN, which increases the KL divergence in the ELBO. The broken BNN regularizer produces an ill-posed posterior, resulting in weak adversarial robustness, because it hinders the maximum ELBO from approximating the true posterior against adversarial perturbations.
Therefore, an enhanced BNN regularizer is required to better approximate the true posterior for adversarial robustness. In this paper, we present an enhanced Bayesian regularizer through hierarchical variational inference in order to boost adversarial robustness beyond the BNN regularizer from variational inference. Furthermore, we prove that the proposed method significantly intensifies the BNN's stochasticity by introducing a closed-form approximation of the conjugate prior for the true posterior. Finally, we validate the effectiveness of the proposed method by showing improved adversarial robustness, compared with adversarial training (Madry et al., 2018) and adversarial-BNN (Liu et al., 2019), under PGD attack as well as EOT-PGD attack with L∞ perturbations on CIFAR-10/100, STL-10, and Tiny-ImageNet. The contributions of this paper can be summarized as follows.
• We newly design an enhanced Bayesian regularizer through hierarchical variational inference built on the concept of the conjugate prior, and verify that the proposed method further strengthens the BNN's stochasticity, compared to the BNN regularizer based on variational inference.
• We conduct exhaustive experiments to validate the effectiveness of the proposed method in terms of adversarial robustness, and exhibit outstanding performance compared with adversarial training and adversarial-BNN under both PGD attack and EOT-PGD attack on four benchmark datasets: CIFAR-10/100, STL-10, and Tiny-ImageNet.

2. PRELIMINARY

In this section, we first specify the notations used in our paper and then summarize the related works on adversarial attack/defense and adversarial training with BNNs.



Let x denote a clean image from a given dataset, and y denote the class label corresponding to the clean image. Let D and D_adv indicate the clean dataset and the adversarial dataset, respectively, such that (x, y) ∼ D and (x_adv, y) ∼ D_adv. A deep neural network f parameterized by weight parameters w is denoted by f_w(x). Adversarial examples are represented by x_adv = x + δ, where δ denotes the adversarial perturbation. To align with the experiments in previous works, we use the cross-entropy loss J(f_w(x), y) for image classification. Moreover, we regard δ as an L∞ perturbation within a γ-ball, such that ‖δ‖∞ ≤ γ. Here, ‖·‖∞ denotes the L∞ norm.

2.1 ADVERSARIAL ATTACK/DEFENSE

Adversarial Attacks. The goal of adversarial attacks is to generate adversarial examples that deceive the predictions of deep neural networks. Most of them produce adversarial examples from the gradient of the loss function with respect to the input. Goodfellow et al. (2015) introduces a single-step attack called the Fast Gradient Sign Method (FGSM). Kurakin et al. (2017) proposes iterative-FGSM, a multiple-step attack. Further, Carlini & Wagner (2017) presents the C&W attack to overcome
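The FGSM update described above can be sketched in a few lines. This is a hedged toy: the gradient is computed analytically for a linear model with squared-error loss purely to illustrate the step x_adv = x + γ · sign(∇_x J); real attacks backpropagate through the trained network, and all names here are illustrative.

```python
import numpy as np

def fgsm_step(x, grad_x, gamma):
    """One FGSM step: perturb each input coordinate by +/- gamma,
    following the sign of the loss gradient (L-infinity budget gamma)."""
    return x + gamma * np.sign(grad_x)

# Toy model: J(w, x) = 0.5 * (w @ x - y)^2, so grad_x J = (w @ x - y) * w.
w = np.array([1.0, -2.0])
x = np.array([0.5, 0.5])
y = 1.0
grad_x = (w @ x - y) * w

x_adv = fgsm_step(x, grad_x, gamma=0.1)
print(x_adv)                        # each coordinate moved by exactly 0.1
print(np.max(np.abs(x_adv - x)))    # perturbation stays within gamma
```

Note that the sign operation makes the perturbation saturate the L∞ budget in every coordinate, which is why the constraint ‖δ‖∞ ≤ γ holds with equality for FGSM.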

