TOWARDS ADVERSARIAL ROBUSTNESS OF BAYESIAN NEURAL NETWORKS THROUGH HIERARCHICAL VARIATIONAL INFERENCE

Abstract

Recent works have applied Bayesian Neural Networks (BNNs) to adversarial training and shown improved adversarial robustness owing to the BNN's strength as a stochastic-gradient defense. However, we find that, in general, a BNN loses its stochasticity after training with the standard posterior. This lack of stochasticity weakens the Bayesian regularization effect, which increases the KL divergence term of the ELBO in variational inference. In this paper, we propose an enhanced Bayesian regularizer based on hierarchical variational inference to boost adversarial robustness against gradient-based attacks. Furthermore, we prove that the proposed method elevates the BNN's stochasticity while reducing the KL divergence. Exhaustive experimental results demonstrate the effectiveness of the proposed method, showing improved adversarial robustness compared with adversarial training (Madry et al., 2018) and adversarial-BNN (Liu et al., 2019) under the PGD attack and the EOT-PGD attack with L∞ perturbations on CIFAR-10/100, STL-10, and Tiny-ImageNet.

1. INTRODUCTION

Deep neural networks have achieved impressive performance on a wide variety of machine learning tasks. Despite these breakthroughs, deep neural networks are easily deceived by adversarial attacks with carefully crafted perturbations (Szegedy et al., 2014; Goodfellow et al., 2015; Chen et al., 2017; Carlini & Wagner, 2017; Papernot et al., 2017; Eykholt et al., 2018; Madry et al., 2018). Injecting these perturbations, which are imperceptible to the human eye, into clean inputs (i.e., producing adversarial examples) fools the predictions of deep neural networks. This vulnerability to invisible perturbations has raised security concerns in deep learning applications (Apruzzese et al., 2019; Wang et al., 2019b; Sagduyu et al., 2019; Rosenberg et al., 2020). To defend against such adversarial examples, many algorithms have been studied to improve adversarial robustness. Adversarial training, in which deep neural networks are trained on adversarial examples, is one of the few defense strategies that withstands strong adversarial attacks (Huang et al., 2015; Zantedeschi et al., 2017; Kurakin et al., 2017; Madry et al., 2018; Athalye et al., 2018a; Liu et al., 2018). Among them, Madry et al. (2018) has shown that adversarially trained networks can be robust to white-box attacks, where the attacker has full knowledge of the network parameters. Moreover, most of the above studies agree that adversarial training provides effective robustness against several white-box attacks. Meanwhile, adversarial training and BNNs have been combined to improve adversarial robustness through a stochastic approach based on variational inference. Variational inference maximizes the Evidence Lower Bound (ELBO) to find an approximate posterior that closely follows the true posterior (Graves, 2011; Kingma & Welling, 2014; Blundell et al., 2015; Hernández-Lobato & Adams, 2015).
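For reference, the ELBO objective mentioned above can be written in its standard form (notation ours: $\mathbf{w}$ denotes the network weights, $\mathcal{D}$ the training data, $q$ the variational posterior, and $p(\mathbf{w})$ the prior):

```latex
\mathcal{L}(q)
  = \mathbb{E}_{q(\mathbf{w})}\!\left[\log p(\mathcal{D}\mid\mathbf{w})\right]
  - \mathrm{KL}\!\left(q(\mathbf{w})\,\|\,p(\mathbf{w})\right)
  \;\le\; \log p(\mathcal{D})
```

Maximizing $\mathcal{L}(q)$ over $q$ is equivalent to minimizing $\mathrm{KL}(q(\mathbf{w})\,\|\,p(\mathbf{w}\mid\mathcal{D}))$, i.e., driving the approximate posterior toward the true posterior, with the KL term acting as the Bayesian regularizer discussed in this paper.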
Building on variational inference, adversarial training with BNNs has achieved adversarial robustness by implicitly using the approximate posterior against adversarial perturbations (Mescheder et al., 2017; Ye & Zhu, 2018). These methods focus on training the network parameters or weight spaces to maximize the ELBO, without obtaining the approximate posterior directly.
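For concreteness, the following is a minimal sketch (ours, not the implementation of any cited work) of two ingredients such BNN training schemes combine: the closed-form KL term of a mean-field Gaussian posterior against a standard-normal prior, and the reparameterization trick used to draw stochastic weights during training. Note how the KL term grows without bound as the posterior scale shrinks, consistent with the observation that a BNN collapsing toward determinism incurs a larger KL divergence in the ELBO:

```python
import numpy as np

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior,
    summed over all weights (closed form, no sampling required)."""
    sigma_sq = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(sigma_sq + mu ** 2 - 1.0 - 2.0 * log_sigma)

def sample_weights(mu, log_sigma, rng):
    """Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through mu and log_sigma."""
    eps = rng.standard_normal(size=mu.shape)
    return mu + np.exp(log_sigma) * eps

mu = np.zeros(10)
# Posterior equal to the prior: KL is exactly zero.
print(kl_to_standard_normal(mu, np.zeros(10)))
# Shrunken sigma (loss of stochasticity): KL grows large.
print(kl_to_standard_normal(mu, np.full(10, -3.0)))
```

In practice these pieces sit inside a deep-learning framework, but the arithmetic is the same: the training loss is the negative expected log-likelihood (estimated with sampled weights) plus this KL penalty.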

