JUST AVOID ROBUST INACCURACY: BOOSTING ROBUSTNESS WITHOUT SACRIFICING ACCURACY

Abstract

While current methods for training robust deep learning models optimize robust accuracy, they significantly reduce natural accuracy, hindering their adoption in practice. Further, the resulting models are often both robust and inaccurate on numerous samples, providing a false sense of safety for those samples. In this work, we extend prior works in three main directions. First, we explicitly train the models to jointly maximize robust accuracy and minimize robust inaccuracy. Second, since the resulting models are trained to be robust only if they are accurate, we leverage robustness as a principled abstain mechanism. Finally, this abstain mechanism allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy. We demonstrate the effectiveness of our approach for empirical and certified robustness on six recent state-of-the-art models and four datasets. For example, on CIFAR-10 with ε∞ = 1/255, we successfully enhanced the robust accuracy of a pre-trained model from 26.2% to 87.8% while even slightly increasing its natural accuracy from 97.8% to 98.0%.

1. INTRODUCTION

In recent years, there has been a significant amount of work that studies and improves adversarial (Carlini & Wagner, 2017; Croce & Hein, 2020b; Goodfellow et al., 2014; Madry et al., 2018; Szegedy et al., 2013) and certified robustness (Balunovic & Vechev, 2019; Cohen et al., 2019; Salman et al., 2019; Xu et al., 2020; Zhai et al., 2020; Zhang et al., 2019b) of neural networks. However, currently, there is a key limitation that hinders the wider adoption of robust models in practice.

Robustness vs Accuracy Tradeoff Despite substantial progress in training robust models, existing robust training methods typically improve model robustness at the cost of decreased standard accuracy. To address this limitation, a number of recent works study this issue in detail and propose new methods to mitigate it (Mueller et al., 2020; Raghunathan et al., 2020; Stutz et al., 2019; Yang et al., 2020).

Our Work

In this work, we advance the line of work that aims to boost robustness without sacrificing accuracy, but we approach the problem from a new perspective: by avoiding robust inaccuracy. Concretely, we propose a new training method that jointly maximizes robust accuracy while minimizing robust inaccuracy. We illustrate the effect of our training on a synthetic dataset (three classes sampled from Gaussian distributions) in Figure 1, showing the decision boundaries of three models, trained using standard training L_std, adversarial training L_TRADES (Zhang et al., 2019a), and our training L_ERA (Equation 4). First, observe that while the L_std trained model achieves 100% accuracy, only 91.1% of these samples are robust (and accurate). When using L_TRADES, we can observe the robustness vs accuracy tradeoff: the robust accuracy improves to 98.4% at the expense of 1.6% (robust) inaccuracy. In contrast, using L_ERA, we retain the high robust accuracy of 98.4% but avoid all robust inaccurate samples by appropriately shifting the decision boundary, rendering them non-robust.

Since our models are trained to be robust only if they are accurate, we leverage robustness as a principled abstain mechanism. This abstain mechanism then allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy. Concretely, in Figure 1, we would define a selector model that abstains on all non-robust samples. Then, the abstained (non-robust) samples are evaluated by the standard trained model L_std, while the selected samples are evaluated using the robust model L_ERA. This allows us to achieve the best of both models: high robust accuracy (98.4%), high natural accuracy (100%), and no robust inaccuracy.

We show the practical effectiveness of our approach by instantiating it over several datasets and existing robust models for both empirical and certified robustness. Table 1 summarizes the main results of our approach, showing that we significantly improve the robust accuracy R^acc_rob of standard trained non-compositional models, with minimal loss of standard accuracy R_nat. In fact, in most of the cases, the compositional architecture even slightly improves the standard accuracy. We release our code at: https://anonymous.4open.science/r/robust-abstain-09DD.

Table 1: Improvement of applying our approach to models trained to optimize natural accuracy only. Here, R^acc_rob denotes the robust accuracy and R_nat denotes the standard (non-adversarial) accuracy.

               CIFAR-10               CIFAR-100              MTSD                   SBB
               Zhao et al. (2020)     WideResNet-28-10       ResNet-50              ResNet-50
               B∞ 1/255               B∞ 2/255               B∞ 2/255               B∞ 2/255
R^acc_rob      26.2 → 87.8 (+61.6%)   3.1 → 41.9 (+38.8%)    40.7 → 69.9 (+29.2%)   44.7 → 82.4 (+37.7%)
R_nat          97.8 → 98.0 (+0.2%)    80.17 → …              …                      …
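The compositional inference described above can be sketched in a few lines. The following is a minimal illustration (with hypothetical names such as `compositional_predict`), in which the robustness check is a cheap random-perturbation proxy; a real instantiation would use an adversarial attack (for empirical robustness) or a certification procedure (for certified robustness) instead:

```python
import numpy as np

def linf_robust(model, x, eps, n_dirs=64, seed=0):
    """Cheap empirical proxy for l-inf robustness: the prediction must be
    stable under random sign perturbations of magnitude eps. This stands in
    for an attack or certifier, purely for illustration."""
    rng = np.random.default_rng(seed)
    base = model(x)
    for _ in range(n_dirs):
        delta = eps * rng.choice([-1.0, 1.0], size=x.shape)
        if model(x + delta) != base:
            return False
    return True

def compositional_predict(f_rob, f_std, x, eps):
    """Selector: use the robust model only where it is robust (and hence, by
    training, likely accurate); otherwise abstain and defer to the standard
    model, which has high natural accuracy."""
    if linf_robust(f_rob, x, eps):
        return f_rob(x)  # selected: robust model's prediction
    return f_std(x)      # abstained: fall back to the standard model
```

Because the robust model is trained to be non-robust exactly where it is inaccurate, the selector never produces false-positive selections, which is what allows the composition to keep the standard model's natural accuracy.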

2. RELATED WORK

There is a growing body of work that extends models with an abstain option. Existing approaches include selection mechanisms such as entropy selection (Mueller et al., 2020), selection functions (Cortes et al., 2016; Geifman & El-Yaniv, 2019; Mueller et al., 2020), softmax response (Geifman & El-Yaniv, 2017; Stutz et al., 2020), or an explicit abstain class (Laidlaw & Feizi, 2019; Liu et al., 2019). In our work, we explore an alternative selection mechanism that uses model robustness. The advantage of this formulation is that the selector provides strong guarantees for each sample and never produces false-positive selections. The disadvantage is that it introduces a significant runtime overhead, compared to many other methods that require only a single forward pass. Simultaneously, several recent works investigate the robustness and accuracy tradeoff both theoretically (Dobriban et al., 2020; Yang et al., 2020) and practically, by proposing new methods to mitigate it. Stutz et al. (2019) consider a new method based on on-manifold adversarial examples, which are more aligned with the true data distribution than the ℓp-norm noise models. Mueller et al. (2020) focus on deterministic certification and propose using compositional models to control the robustness and accuracy tradeoff. In our work, we also use compositional models, but we focus on using model robustness as the selection mechanism.
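For contrast with our robustness-based selector, a softmax-response selector of the kind referenced above can be sketched as follows (a minimal illustration; the function name and the `threshold` hyperparameter are assumed here, not taken from the cited works):

```python
import numpy as np

def softmax_response_select(logits, threshold=0.9):
    """Softmax-response selection: abstain whenever the top softmax
    probability falls below a threshold. This needs only a single forward
    pass, but model confidences can be fooled by adversarial inputs,
    whereas a robustness-based selector never selects falsely (at the
    cost of extra computation per sample)."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax
    return p.max(axis=1) >= threshold                     # True = predict, False = abstain
```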




Figure 1: Decision regions for models trained via standard training L_std, adversarial training L_TRADES (Zhang et al., 2019a), and our training L_ERA (Equation 4). Here, our L_ERA achieves the same robust accuracy as L_TRADES but avoids all robust inaccurate samples by making them non-robust. Note that all models predict over all three classes; however, the decision regions for class 2 of the L_TRADES and L_ERA trained models are too small to be visible. For more details, please refer to Appendix A.2.

Other recent works address adversarial examples through model calibration. Stutz et al. (2020) propose biasing models towards low-confidence predictions on adversarial examples, which allows rejecting them through a softmax-response selector. An alternative approach is taken by Gal & Ghahramani (2018); Kingma et al. (2015); Molchanov et al. (2017), who train Bayesian neural networks to estimate prediction uncertainty by approximating the moments of the posterior predictive distribution, or by Sensoy et al. (2018), who estimate the posterior distribution from data using a deterministic neural network. Instead of calibrating model confidence, in our work, we calibrate model robustness by optimizing the model towards non-robust predictions on misclassified examples.
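To make this robustness-calibration idea concrete, the sketch below shows a schematic per-sample objective in the spirit of "robust only if accurate". This is an illustrative stand-in, not the actual Equation 4; the names `era_style_loss` and `lam` are hypothetical:

```python
import numpy as np

def era_style_loss(logits, adv_logits, labels, lam=1.0):
    """Schematic 'robust only if accurate' objective:
    - correctly classified samples additionally receive a robustness term
      that pulls the adversarial prediction toward the label;
    - misclassified samples receive only the natural term, so training never
      makes a wrong prediction robust (it stays rejectable by the selector)."""
    def xent(z, y):
        z = z - z.max(axis=1, keepdims=True)                    # stable log-softmax
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(y)), y]

    natural = xent(logits, labels)          # standard cross-entropy
    robust = xent(adv_logits, labels)       # cross-entropy at the adversarial point
    correct = logits.argmax(axis=1) == labels
    return (natural + lam * np.where(correct, robust, 0.0)).mean()
```

The key design choice is the `correct` mask: it is what makes robustness conditional on accuracy, so that non-robustness can later serve as the abstain signal.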

