PARETO ADVERSARIAL ROBUSTNESS: BALANCING SPATIAL ROBUSTNESS AND SENSITIVITY-BASED ROBUSTNESS

Abstract

Adversarial robustness, mainly comprising sensitivity-based robustness and spatial robustness, plays an integral role in robust generalization. In this paper, we endeavor to design strategies that achieve comprehensive adversarial robustness. To this end, we first investigate the less-studied spatial robustness and then integrate existing spatial robustness methods by incorporating both local and global spatial vulnerability into a single spatial attack design. Based on this exploration, we further present a comprehensive relationship between natural accuracy, sensitivity-based robustness, and the different kinds of spatial robustness, supported by strong evidence from the perspective of representation. More importantly, in order to balance the mutual impact among these different types of robustness within one unified framework, we incorporate the Pareto criterion into the adversarial robustness analysis, yielding a novel strategy towards comprehensive robustness called Pareto Adversarial Training. The resulting Pareto front, i.e., the set of optimal solutions, provides the optimal balance among natural accuracy and the different forms of adversarial robustness, shedding light on future solutions towards comprehensive robustness. To the best of our knowledge, we are the first to consider comprehensive robustness via multi-objective optimization.

1. INTRODUCTION

Robust generalization can be viewed as an extension of traditional generalization, i.e., Empirical Risk Minimization in the case of i.i.d. data (Vapnik & Chervonenkis, 2015), to test environments that differ slightly or dramatically from the training environment (Krueger et al., 2020). Improving the robustness of deep neural networks has been one of the crucial research topics, spanning several threads of research, including adversarial robustness (Goodfellow et al., 2014; Szegedy et al., 2013), non-adversarial robustness (Hendrycks & Dietterich, 2019; Yin et al., 2019), Bayesian deep learning (Neal, 2012; Gal, 2016) and causality (Arjovsky et al., 2019). In this paper, we focus on adversarial robustness, where adversarial examples are carefully manipulated by humans to drastically fool machine learning models, e.g., deep neural networks, posing a serious threat especially to safety-critical applications. Currently, adversarial training (Goodfellow et al., 2014; Madry et al., 2017; Ding et al., 2018) is regarded as a promising and widely accepted strategy to address this issue. However, similar to Out-of-Distribution (OoD) robustness, one crucial issue is that adversarial robustness also has many aspects (Hendrycks et al., 2020), mainly including sensitivity-based robustness (Tramèr et al., 2020), i.e., robustness against pixel-wise perturbations (normally constrained within an ℓ_p ball), and spatial robustness, i.e., robustness against multiple spatial transformations. Firstly, in the computer vision and graphics literature, two main factors determine the appearance of a pictured object (Xiao et al., 2018; Szeliski, 2010): (1) lighting and materials, and (2) geometry. Most previous work on adversarial robustness focuses on the first factor (Xiao et al., 2018) via pixel-wise perturbations, e.g., Projected Gradient Descent (PGD) attacks, assuming the underlying geometry stays the same after the adversarial perturbation.
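To make the sensitivity-based threat model concrete, the following is a minimal NumPy sketch of a PGD-style ℓ∞ attack. It uses a toy logistic-regression model with an analytic input gradient (not the deep networks studied in this paper); the function name `pgd_attack` and the values of `eps`, `alpha`, and `steps` are illustrative choices.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """PGD on a toy logistic-regression loss (illustrative sketch).

    x: input vector; y: label in {0, 1}; (w, b): model parameters.
    Each step ascends the loss via the sign of the input gradient,
    then projects back into the l_inf ball of radius eps around x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        # Forward pass: p = sigmoid(w.x + b)
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        # Analytic gradient of the cross-entropy loss w.r.t. the input
        grad_x = (p - y) * w
        # Signed gradient-ascent step, then projection onto the l_inf ball
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For image inputs one would additionally clip `x_adv` into the valid pixel range; the projection step is what distinguishes PGD from a single FGSM step.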
Another rising branch of research tackles the second factor, e.g., flow-based attacks (Xiao et al., 2018) and Rotation-Translation (RT)-based attacks (Engstrom et al., 2017; 2019). Secondly, by explicitly exploring human perception, Sharif et al. (2018) pointed out that sensitivity-based robustness, i.e., ℓ_p-distance
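The RT-based attacks above can be sketched as a worst-of-k grid search, in the spirit of Engstrom et al.: enumerate a grid of rotations and translations and keep the transform that maximizes the model's loss. The sketch below assumes nearest-neighbour warping and an arbitrary `loss_fn` callback for simplicity; both are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def rotate_translate(img, angle_deg, dx, dy):
    """Nearest-neighbour rotation + integer translation of a 2-D image.

    Uses the inverse map (output pixel -> source pixel) about the image
    center; out-of-bounds samples are filled with zeros.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            ys = cy + (i - cy - dy) * np.cos(theta) - (j - cx - dx) * np.sin(theta)
            xs = cx + (i - cy - dy) * np.sin(theta) + (j - cx - dx) * np.cos(theta)
            yi, xi = int(round(ys)), int(round(xs))
            if 0 <= yi < h and 0 <= xi < w:
                out[i, j] = img[yi, xi]
    return out

def grid_rt_attack(img, label, loss_fn, angles, shifts):
    """Exhaustive RT grid search: return the transform maximizing the loss."""
    best, best_loss = img, -np.inf
    for a in angles:
        for dx in shifts:
            for dy in shifts:
                cand = rotate_translate(img, a, dx, dy)
                loss = loss_fn(cand, label)
                if loss > best_loss:
                    best, best_loss = cand, loss
    return best
```

Unlike pixel-wise attacks, no single transform here is "small" in ℓ_p distance, which is precisely why spatial robustness is treated as a separate objective.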

