PARETO ADVERSARIAL ROBUSTNESS: BALANCING SPATIAL ROBUSTNESS AND SENSITIVITY-BASED ROBUSTNESS

Abstract

Adversarial robustness, comprising mainly sensitivity-based robustness and spatial robustness, plays an integral part in robust generalization. In this paper, we endeavor to design strategies that achieve comprehensive adversarial robustness. To this end, we first investigate the less-studied spatial robustness and then integrate existing spatial robustness methods by incorporating both local and global spatial vulnerability into one spatial attack design. Based on this exploration, we further present a comprehensive relationship among natural accuracy, sensitivity-based robustness and different kinds of spatial robustness, supported by strong evidence from the perspective of representation. More importantly, in order to balance the mutual impacts among these different kinds of robustness within one unified framework, we incorporate the Pareto criterion into the adversarial robustness analysis, yielding a novel strategy towards comprehensive robustness called Pareto Adversarial Training. The resulting Pareto front, i.e., the set of optimal solutions, provides the optimal balance among natural accuracy and the different kinds of adversarial robustness, shedding light on future solutions towards comprehensive robustness. To the best of our knowledge, we are the first to consider comprehensive robustness via multi-objective optimization.

1. INTRODUCTION

Robust generalization can be viewed as an extension of traditional generalization, i.e., Empirical Risk Minimization in the case of i.i.d. data (Vapnik & Chervonenkis, 2015), where the test environments might differ slightly or dramatically from the training environment (Krueger et al., 2020). Improving the robustness of deep neural networks has been one of the crucial research topics, with various threads of research, including adversarial robustness (Goodfellow et al., 2014; Szegedy et al., 2013), non-adversarial robustness (Hendrycks & Dietterich, 2019; Yin et al., 2019), Bayesian deep learning (Neal, 2012; Gal, 2016) and causality (Arjovsky et al., 2019). In this paper, we focus on adversarial robustness, where adversarial examples are carefully manipulated by humans to drastically fool machine learning models, e.g., deep neural networks, posing a serious threat especially to safety-critical applications. Currently, adversarial training (Goodfellow et al., 2014; Madry et al., 2017; Ding et al., 2018) is regarded as a promising and widely accepted strategy to address this issue. However, similar to Out-of-Distribution (OoD) robustness, one crucial issue is that adversarial robustness has many aspects (Hendrycks et al., 2020), mainly including sensitivity-based robustness (Tramèr et al., 2020), i.e., robustness against pixel-wise perturbations (normally constrained within an l_p ball), and spatial robustness, i.e., robustness against multiple spatial transformations. Firstly, in the computer vision and graphics literature, two main factors determine the appearance of a pictured object (Xiao et al., 2018; Szeliski, 2010): (1) the lighting and materials, and (2) the geometry. Most previous work on adversarial robustness focuses on the first factor (Xiao et al., 2018) via pixel-wise perturbations, e.g., Projected Gradient Descent (PGD) attacks, assuming the underlying geometry stays the same after the adversarial perturbation.
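To make the pixel-wise attack model concrete, the sketch below runs l_inf PGD against a toy logistic-regression scorer with an analytic gradient. This is our illustration only, not the paper's implementation: the model, labels and hyperparameters (`eps`, `alpha`, `steps`) are illustrative stand-ins for the deep-network setting.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """l_inf PGD against a logistic scorer f(x) = w.x + b, label y in {-1, +1}.
    Each step ascends the logistic loss via the gradient sign, then projects
    the iterate back into the eps-ball around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        margin = y * (w @ x_adv + b)
        # gradient of log(1 + exp(-margin)) with respect to x
        grad = -(1.0 / (1.0 + np.exp(margin))) * y * w
        x_adv = x_adv + alpha * np.sign(grad)     # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # l_inf projection
    return x_adv
```

On a correctly classified point, the attack shrinks the margin y·f(x) while keeping every coordinate within eps of the original input, which is exactly the "geometry stays the same" regime described above.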
The other rising research branch tackles the second factor, with attacks such as Flow-based (Xiao et al., 2018) and Rotation-Translation (RT)-based attacks (Engstrom et al., 2017; 2019). Secondly, by explicitly exploring human perception, Sharif et al. (2018) pointed out that sensitivity-based robustness, i.e., l_p-distance measured robustness, is not sufficient for adversarial robustness if perceptual similarity is to be maintained. This is because spatial attacks or geometric transformations also result in small perceptual differences, yet yield large l_p distances. Heading towards comprehensive adversarial robustness, we find that the crucial issue in investigating all the aforementioned aspects of adversarial robustness is the set of relationships among accuracy, sensitivity-based robustness and spatial robustness. Prior to our work, a clear trade-off between sensitivity-based robustness and accuracy was revealed by a series of works (Zhang et al., 2019; Tsipras et al., 2018; Raghunathan et al., 2020). Besides, recent work (Tramèr & Boneh, 2019; Kamath et al., 2020) suggested that there seems to exist an obscure trade-off between Rotation-Translation and sensitivity-based robustness. However, this conclusion does not consider Flow-based attacks (Xiao et al., 2018; Zhang & Wang, 2019), another non-negligible part of the spatial robustness evaluation, making it less comprehensive and reliable. As such, the comprehensive relationships among all the quantities mentioned above are still unclear and remain to be further explored. More importantly, a new robustness strategy that can harmonize all the considered correlations is needed, in order to achieve an optimal balance within comprehensive robustness.
In this paper, in order to design a new approach towards comprehensive robustness, we first explore the two main branches of spatial robustness, i.e., the Flow-based spatial attack (Xiao et al., 2018) and the Rotation-Translation (RT) attack (Engstrom et al., 2019). By investigating the different impacts of these two attacks on spatial sensitivity, we propose an integrated differentiable spatial attack framework that considers both local and global spatial vulnerability. Based on that, we present a comprehensive relationship among accuracy, sensitivity-based robustness and the two branches of spatial robustness. In particular, we show that the trade-off between sensitivity-based and RT robustness is a fundamental trade-off, as opposed to the highly interwoven correlation between sensitivity-based and Flow-based spatial robustness. We further provide strong evidence based on their different saliency maps from the perspectives of shape bias and sparse versus dense representation. Lastly, to balance these different kinds of mutual impacts within a unified adversarial training framework, we introduce the Pareto criterion (Kim & De Weck, 2005; 2006; Zeleny, 2012) from multi-objective optimization, thus developing an optimal balance between the interplay of natural accuracy and different kinds of adversarial robustness. By additionally incorporating a two-moment term capturing the interaction between the losses of accuracy and the different kinds of robustness, we finally propose a bi-level optimization framework called Pareto Adversarial Training. The resulting Pareto front provides the set of optimal solutions that balance all the considered relationships, outperforming other existing strategies. Our contributions are summarized as follows:

• We propose an integrated spatial attack framework that incorporates both local and global spatial vulnerability based on Flow-based and RT attacks, paving the way towards comprehensive spatial robustness analysis in the future.
• We present comprehensive relationships among accuracy, sensitivity-based robustness and different kinds of spatial robustness, supported by strong evidence from the perspective of representation.

• We incorporate the Pareto criterion into adversarial robustness analysis, and are the first to consider multiple kinds of adversarial robustness via multi-objective optimization.
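To give a flavor of how a Pareto front arises when several losses compete, the toy sketch below traces a front by minimizing weighted sums of two convex stand-in objectives. This is only an illustration of the Pareto criterion via linear scalarization; it is not the bi-level Pareto Adversarial Training formulation proposed in this paper, and the `toy` objectives are hypothetical stand-ins for the accuracy and robustness losses.

```python
import numpy as np

def pareto_front_scalarization(losses, weights_grid):
    """Trace a Pareto front by minimizing weighted sums of the objectives.
    `losses` maps a scalar parameter x to a tuple of objective values; each
    scalarized problem is solved by grid search for simplicity."""
    xs = np.linspace(-0.5, 1.5, 2001)
    front = []
    for w in weights_grid:
        vals = [sum(wi * li for wi, li in zip(w, losses(x))) for x in xs]
        front.append(losses(xs[int(np.argmin(vals))]))
    return front

# Hypothetical stand-ins for an "accuracy loss" and a "robustness loss":
# they pull the parameter towards 0 and towards 1, respectively.
toy = lambda x: ((x - 0.0) ** 2, (x - 1.0) ** 2)
```

Sweeping the weight vector traces out the trade-off curve: putting all weight on one objective recovers its own minimizer, while intermediate weights yield intermediate Pareto-optimal points, which is the picture behind the Pareto front discussed above.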

2. TOWARDS COMPREHENSIVE SPATIAL ROBUSTNESS

2.1 MOTIVATION

In order to better investigate the relationships between accuracy and different kinds of adversarial robustness, we first need a fine-grained understanding of spatial robustness, which has been less studied than sensitivity-based robustness. We summarize two major branches among a flurry of related work on spatial robustness (Engstrom et al., 2017; 2019; Xiao et al., 2018; Zhang & Wang, 2019; Tramèr & Boneh, 2019; Kamath et al., 2020): (1) Flow-based attacks, and (2) Rotation-Translation (RT) attacks. Specifically, we find that the former mainly focuses on local spatial vulnerability while the latter tends to capture global spatial sensitivity. Our motivation is to first shed light on the fundamental difference between these two approaches, and then propose an integrated spatial robustness evaluation metric.
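The local-versus-global distinction can be made concrete: both attack families can be viewed as choosing a per-pixel displacement (flow) field, where a Flow-based attack optimizes every pixel's displacement freely, while an RT attack restricts the field to one induced by a single global rotation/translation. The sketch below is our illustration of this view with nearest-neighbour sampling (actual attacks such as Xiao et al.'s use differentiable bilinear sampling so the field can be optimized by gradient descent).

```python
import numpy as np

def warp(img, flow):
    """Apply a per-pixel displacement field with nearest-neighbour sampling:
    out[i, j] = img[i + flow[i, j, 0], j + flow[i, j, 1]], clipped to bounds.
    A Flow-based attack optimizes `flow` freely (local vulnerability)."""
    h, w = img.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    si = np.clip(np.rint(ii + flow[..., 0]).astype(int), 0, h - 1)
    sj = np.clip(np.rint(jj + flow[..., 1]).astype(int), 0, w - 1)
    return img[si, sj]

def rotation_flow(h, w, deg):
    """An RT attack as a special case: one rotation about the image centre
    induces a single smooth flow field shared by all pixels (global)."""
    th = np.deg2rad(deg)
    ii, jj = np.meshgrid(np.arange(h) - (h - 1) / 2,
                         np.arange(w) - (w - 1) / 2, indexing="ij")
    du = (np.cos(th) * ii - np.sin(th) * jj) - ii
    dv = (np.sin(th) * ii + np.cos(th) * jj) - jj
    return np.stack([du, dv], axis=-1)
```

Under this view, the RT attack searches a low-dimensional global parameter space (angle and shift), whereas the Flow-based attack searches the full 2·h·w-dimensional field, which is why the former captures global and the latter local spatial vulnerability.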

