SQUEEZE TRAINING FOR ADVERSARIAL ROBUSTNESS

Abstract

The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted great attention in the machine learning community. The problem is closely related to the non-flatness and non-smoothness of the loss landscapes obtained by normal training. Training augmented with adversarial examples (a.k.a., adversarial training) is considered an effective remedy. In this paper, we highlight that collaborative examples, which are nearly perceptually indistinguishable from both adversarial and benign examples yet exhibit substantially lower prediction loss, can be utilized to enhance adversarial training. A novel method is therefore proposed to achieve a new state of the art in adversarial robustness.

1. INTRODUCTION

Adversarial examples (Szegedy et al., 2013; Biggio et al., 2013), crafted by adding imperceptible perturbations to benign examples, are capable of fooling DNNs into making incorrect predictions. In this paper, to gain a deeper understanding of DNNs, robust or not, we examine the valleys of their loss landscapes and explore the existence of collaborative examples in the ϵ-bounded neighborhood of benign examples, i.e., examples that demonstrate substantially lower prediction loss than their neighbors. Somewhat unsurprisingly, the existence of such examples is related to the adversarial robustness of DNNs: given a model trained to be adversarially more robust, it is less likely that a collaborative example can be discovered. Moreover, incorporating such collaborative examples into model training also appears to improve the obtained adversarial robustness. On this point, we advocate squeeze training (ST), in which the adversarial and collaborative examples of each benign example are jointly and equally optimized in a novel procedure, such that their maximum possible prediction discrepancy is constrained. Extensive experimental results verify the effectiveness of our method. We demonstrate that ST outperforms the state of the art remarkably on several benchmark datasets, achieving an absolute robust accuracy gain of >+1.00% on CIFAR-10 without utilizing additional data. It can also be readily combined with a variety of recent efforts, e.g., RST (Carmon et al., 2019) and RWP (Wu et al., 2020b), to further improve performance.
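To make the notion of a collaborative example concrete, the following sketch searches the ϵ-bounded neighborhood of an input for a point of lower prediction loss via projected sign-gradient descent. A toy linear softmax classifier stands in for a DNN; all names (`find_collaborative`, `W`, `b`) and hyper-parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy stand-in for f(.): a linear softmax classifier with 3 classes, 4 features.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss_and_grad(x, y):
    """Cross-entropy loss of the toy model at x, and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    return -np.log(p[y]), W.T @ (p - np.eye(3)[y])

def find_collaborative(x, y, eps=0.1, alpha=0.02, steps=10):
    """Projected sign-gradient *descent* inside B_eps[x]: the mirror image
    of a PGD attack, which ascends the loss instead of descending it."""
    best, best_loss = x.copy(), ce_loss_and_grad(x, y)[0]
    x_col = x.copy()
    for _ in range(steps):
        g = ce_loss_and_grad(x_col, y)[1]
        # step towards lower loss, then project back onto the l_inf ball
        x_col = np.clip(x_col - alpha * np.sign(g), x - eps, x + eps)
        loss = ce_loss_and_grad(x_col, y)[0]
        if loss < best_loss:
            best, best_loss = x_col.copy(), loss
    return best

x, y = rng.normal(size=4), 1
x_col = find_collaborative(x, y)
print(ce_loss_and_grad(x_col, y)[0] <= ce_loss_and_grad(x, y)[0])  # True
```

Because the best iterate (including the starting point) is tracked, the returned example never has higher loss than the benign input, mirroring how an attack tracks the worst-case iterate.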

2. BACKGROUND AND RELATED WORK

2.1 ADVERSARIAL EXAMPLES

Let x_i and y_i denote a benign example (e.g., a natural image) and its label from S = {(x_i, y_i)}_{i=1}^n, where x_i ∈ X and y_i ∈ Y = {0, . . . , C−1}. We use B_ϵ[x_i] = {x′ | ∥x′ − x_i∥_∞ ≤ ϵ} to represent the ϵ-bounded l_∞ neighborhood of x_i. A DNN parameterized by Θ can be defined as a function f_Θ(·) : X → R^C. Without ambiguity, we drop the subscript Θ and write f(·). In general, adversarial examples are almost perceptually indistinguishable from benign examples, yet they lead to arbitrary predictions on the victim models. One typical formulation of generating an adversarial example is to maximize the prediction loss within the neighborhood, i.e., x_i^adv = argmax_{x′ ∈ B_ϵ[x_i]} L(f(x′), y_i), where L(·, ·) is a classification loss such as cross-entropy.

* Work was done under the co-supervision of Yiwen Guo and Wangmeng Zuo, who are the corresponding authors.
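The l_∞ neighborhood above has a convenient property: projecting any point back onto B_ϵ[x_i] reduces to component-wise clipping, which is why l_∞-bounded attacks interleave gradient steps with a clip. A minimal sketch (the function name `project_linf` is ours):

```python
import numpy as np

def project_linf(x_prime, x, eps):
    """Project x_prime onto B_eps[x] = {x' : ||x' - x||_inf <= eps}.
    For the l_inf norm this is exactly component-wise clipping."""
    return np.clip(x_prime, x - eps, x + eps)

x = np.array([0.5, 0.5, 0.5])
# each coordinate is pulled back to within 0.1 of the corresponding x entry
print(project_linf(np.array([0.9, 0.45, 0.1]), x, eps=0.1))
```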



Such adversarial examples, crafted by adding imperceptible perturbations to benign examples, are capable of fooling DNNs into making incorrect predictions. Their existence has raised security concerns and attracted great attention, and much endeavour has been devoted to improving the adversarial robustness of DNNs. As one of the most effective methods, adversarial training (Madry et al., 2018) introduces powerful and adaptive adversarial examples during model training and encourages the model to classify them correctly.
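The min–max structure of adversarial training, an inner maximization that crafts perturbations followed by an outer minimization that updates the model, can be sketched end-to-end on a toy linear softmax classifier. Here a single FGSM step stands in for the stronger multi-step PGD attack of Madry et al.; all names and hyper-parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cls, dim = 3, 4
W, b = 0.01 * rng.normal(size=(n_cls, dim)), np.zeros(n_cls)
X = rng.normal(size=(32, dim))          # toy training batch
Y = rng.integers(0, n_cls, size=32)

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def grads(Xb, Yb):
    """Mean cross-entropy loss and its gradients w.r.t. W, b, and the inputs."""
    P = softmax(Xb @ W.T + b)            # (batch, n_cls)
    D = P - np.eye(n_cls)[Yb]            # dL/dlogits
    loss = -np.log(P[np.arange(len(Yb)), Yb]).mean()
    return loss, D.T @ Xb / len(Yb), D.mean(axis=0), D @ W

eps, lr = 0.1, 0.2
for _ in range(100):
    # inner maximization: one FGSM step crafts adversarial examples
    dX = grads(X, Y)[3]
    X_adv = X + eps * np.sign(dX)
    # outer minimization: update the model on the adversarial batch
    loss, dW, db, _ = grads(X_adv, Y)
    W -= lr * dW
    b -= lr * db
print(float(loss))  # final loss on the adversarial examples
```

Replacing the single FGSM step with several projected steps (clipping back into the ϵ-ball after each) recovers the PGD-based training loop used in practice.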

Availability: https://github.com/

