JUST AVOID ROBUST INACCURACY: BOOSTING ROBUSTNESS WITHOUT SACRIFICING ACCURACY

Abstract

While current methods for training robust deep learning models optimize robust accuracy, they significantly reduce natural accuracy, hindering their adoption in practice. Further, the resulting models are often both robust and inaccurate on numerous samples, providing a false sense of safety for those samples. In this work, we extend prior works in three main directions. First, we explicitly train the models to jointly maximize robust accuracy and minimize robust inaccuracy. Second, since the resulting models are trained to be robust only if they are accurate, we leverage robustness as a principled abstain mechanism. Finally, this abstain mechanism allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy. We demonstrate the effectiveness of our approach for empirical and certified robustness on six recent state-of-the-art models and four datasets. For example, on CIFAR-10 with ε_∞ = 1/255, we successfully enhanced the robust accuracy of a pre-trained model from 26.2% to 87.8% while even slightly increasing its natural accuracy from 97.8% to 98.0%.

1. INTRODUCTION

In recent years, there has been a significant amount of work that studies and improves adversarial (Carlini & Wagner, 2017; Croce & Hein, 2020b; Goodfellow et al., 2014; Madry et al., 2018; Szegedy et al., 2013) and certified robustness (Balunovic & Vechev, 2019; Cohen et al., 2019; Salman et al., 2019; Xu et al., 2020; Zhai et al., 2020; Zhang et al., 2019b) of neural networks. However, a key limitation currently hinders the wider adoption of robust models in practice.

Robustness vs Accuracy Tradeoff. Despite substantial progress in training robust models, existing robust training methods typically improve model robustness at the cost of decreased standard accuracy. To address this limitation, a number of recent works study this issue in detail and propose new methods to mitigate it (Mueller et al., 2020; Raghunathan et al., 2020; Stutz et al., 2019; Yang et al., 2020).

Our Work

In this work, we advance the line of work that aims to boost robustness without sacrificing accuracy, but we approach the problem from a new perspective: by avoiding robust inaccuracy. Concretely, we propose a new training method that jointly maximizes robust accuracy while minimizing robust inaccuracy. We illustrate the effect of our training on a synthetic dataset (three classes sampled from Gaussian distributions) in Figure 1, showing the decision boundaries of three models, trained using standard training L_std, adversarial training L_TRADES (Zhang et al., 2019a), and our training L_ERA (Equation 4). First, observe that while the L_std trained model achieves 100% accuracy, only 91.1% of these samples are robust (and accurate). When using L_TRADES, we can observe the robustness vs accuracy tradeoff: the robust accuracy improves to 98.4% at the expense of 1.6% (robust) inaccuracy. In contrast, using L_ERA, we retain the high robust accuracy of 98.4% but avoid all robust inaccurate samples by appropriately shifting the decision boundary, rendering them non-robust. Since our models are trained to be robust only if they are accurate, we leverage robustness as a principled abstain mechanism. This abstain mechanism then allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy.

[Table 1 caption: Improvement of applying our approach to models trained to optimize natural accuracy only. Here, R_rob^acc denotes the robust accuracy and R_nat denotes the standard (non-adversarial) accuracy.]

[Figure 1 caption: Decision boundaries of models trained with standard training L_std, adversarial training L_TRADES (Zhang et al., 2019a), and our training L_ERA (Equation 4). Our L_ERA achieves the same robust accuracy as L_TRADES but avoids all robust inaccurate samples by making them non-robust. Note that all models predict over all three classes; however, the decision regions for class 2 of the L_TRADES and L_ERA trained models are too small to be visible. For more details, please refer to Appendix A.2.]
Concretely, in Figure 1, we would define a selector model that abstains on all non-robust samples. Then, the abstained (non-robust) samples are evaluated by the standard trained model L_std, while the selected samples are evaluated using the robust model L_ERA. This allows us to achieve the best of both models: high robust accuracy (98.4%), high natural accuracy (100%), and no robust inaccuracy. We show the practical effectiveness of our approach by instantiating it over several datasets and existing robust models for both empirical and certified robustness. Table 1 summarizes the main results of our approach, showing that we significantly improve the robust accuracy R_rob^acc of standard trained non-compositional models, with minimal loss of standard accuracy R_nat. In fact, in most cases, the compositional architecture even slightly improves the standard accuracy. We release our code at: https://anonymous.4open.science/r/robust-abstain-09DD.

2. RELATED WORK

There is a growing body of work that extends models with an abstain option. Existing approaches include selection mechanisms such as entropy selection (Mueller et al., 2020), a learned selection function (Cortes et al., 2016; Geifman & El-Yaniv, 2019; Mueller et al., 2020), softmax response (Geifman & El-Yaniv, 2017; Stutz et al., 2020), or an explicit abstain class (Laidlaw & Feizi, 2019; Liu et al., 2019). In our work, we explore an alternative selection mechanism that uses model robustness. The advantage of this formulation is that the selector provides strong guarantees for each sample and never produces false-positive selections. The disadvantage is that it introduces a significant runtime overhead, compared to many other methods that require only a single forward pass.

Other recent works address adversarial examples through model calibration. Stutz et al. (2020) propose biasing models towards low-confidence predictions on adversarial examples, which allows rejecting them through a softmax response selector. An alternative approach is taken by Gal & Ghahramani (2018); Kingma et al. (2015); Molchanov et al. (2017), which train Bayesian neural networks to estimate prediction uncertainty by approximating the moments of the posterior predictive distribution, or by Sensoy et al. (2018), which estimates the posterior distribution from data using a deterministic neural network. Instead of calibrating model confidence, in our work, we calibrate model robustness by optimizing the model towards non-robust predictions on misclassified examples.

Simultaneously, several recent works investigate the robustness and accuracy tradeoff both theoretically (Dobriban et al., 2020; Yang et al., 2020) and practically, by proposing new methods to mitigate it. Stutz et al. (2019) consider a new method based on on-manifold adversarial examples, which are more aligned with the true data distribution than the ℓ_p-norm noise models. Mueller et al. (2020) focus on deterministic certification and propose using compositional models to control the robustness and accuracy tradeoff. In our work, we also use compositional models, but we focus on empirical and probabilistic certified robustness. Our selector formulation is based on a new training that minimizes robust inaccuracy and can be used to fine-tune any existing robust model. Further, we provide individual robustness at inference time, rather than the distributional robustness considered in prior works.

Finally, some recent works also consider learning on misclassified examples. For example, MMA (Ding et al., 2018) maximizes the margins of correctly classified examples while minimizing the classification loss on misclassified examples. MART (Wang et al., 2019) combines the standard adversarial risk with a consistency loss that optimizes misclassified examples towards robust predictions. Note that this formulation actively encourages the model toward robust inaccurate predictions, while our work does the opposite: we minimize robust inaccuracy by penalizing robust misclassified examples.

3. PRELIMINARIES

Let f_θ : R^d → R^k be a neural network classifying inputs x ∈ X ⊆ R^d to outputs in R^k (e.g., logits or probabilities). The hard classifier induced by the network is given as F_θ(x) = argmax_{i∈Y} f_θ(x)_i, where f_θ(x)_i is the output for the i-th class and Y, with |Y| = k, is the finite set of discrete labels.

Natural Accuracy. Given a distribution D over input-label pairs and a classifier F_θ : X → Y, an input-label pair (x, y) is considered accurate iff the classifier F_θ predicts the correct label y for x:

    R_nat(F_θ) = E_{(x,y)∼D} [ 1{F_θ(x) = y} ]

Robust Accuracy. Given an input-label pair (x, y), we say that the classifier F_θ is robust and accurate iff it predicts the correct label y for all samples from a predefined region B_ε^p(x), such as an ℓ_p-norm ball centered at x with radius ε, i.e., B_ε^p(x) := {x′ : ‖x′ − x‖_p ≤ ε}. Formally:

    R_rob^acc(F_θ) = E_{(x,y)∼D} [ 1{F_θ(x) = y} ∧ 1{∀x′ ∈ B_ε^p(x). F_θ(x′) = F_θ(x)} ]

Robust Inaccuracy. Similarly to robust accuracy, an input-label pair (x, y) is considered robustly inaccurate iff the classifier F_θ predicts an incorrect label F_θ(x) ≠ y and F_θ is robust towards that misprediction for all inputs in B_ε^p(x). Formally, the robust inaccuracy is defined as:

    R_rob^¬acc(F_θ) = E_{(x,y)∼D} [ 1{F_θ(x) ≠ y} ∧ 1{∀x′ ∈ B_ε^p(x). F_θ(x′) = F_θ(x)} ]   (2)
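The three metrics above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's evaluation code: the robustness oracle `is_robust` is a hypothetical stand-in for what in practice is an adversarial attack (empirical robustness) or a certificate (certified robustness).

```python
def evaluate_metrics(samples, predict, is_robust):
    """Estimate R_nat, R_rob^acc, and R_rob^¬acc over a finite sample.

    samples:   list of (x, y) pairs drawn from D
    predict:   the hard classifier F_theta, mapping x to a label
    is_robust: hypothetical oracle deciding whether F_theta is constant
               on the ball B_eps(x) (an attack or certificate in practice)
    """
    n = len(samples)
    nat = rob_acc = rob_inacc = 0
    for x, y in samples:
        correct = predict(x) == y
        robust = is_robust(x)
        nat += correct                         # R_nat
        rob_acc += correct and robust          # R_rob^acc
        rob_inacc += (not correct) and robust  # R_rob^¬acc
    return nat / n, rob_acc / n, rob_inacc / n


# Toy 1-D example: classify by sign; "robust" means the margin exceeds eps.
predict = lambda x: 1 if x > 0 else 0
is_robust = lambda x: abs(x) > 0.1
data = [(0.5, 1), (-0.5, 0), (0.05, 1), (-0.3, 1)]
print(evaluate_metrics(data, predict, is_robust))  # (0.75, 0.5, 0.25)
```

The fourth toy point (−0.3 with label 1) is misclassified yet robust, which is exactly the R_rob^¬acc case the paper aims to eliminate.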

4. REDUCING ROBUST INACCURACY

In this section, we present our training method that extends existing robust training approaches by also considering samples that are robust but inaccurate. We start by describing a high-level problem statement, which we then instantiate for both empirical robustness and certified robustness.

Problem Statement. Given a distribution D over input-label pairs, our goal is to find model parameters θ such that the resulting model maximizes robust accuracy while at the same time minimizing robust inaccuracy. Concretely, this translates to the following optimization objective:

    argmin_θ E_{(x,y)∼D} [ β · L_rob(x, y) + 1{F_θ(x) ≠ y} · L_rob^¬acc(x, y) ]   (3)

where the first term optimizes robust accuracy and the second penalizes robust inaccuracy. Here, β ∈ R^+ is a regularization term, 1{F_θ(x) ≠ y} is an indicator function denoting samples for which the model is inaccurate, and L_rob(x, y) and L_rob^¬acc(x, y) are loss functions that optimize robust accuracy and penalize robust inaccuracy, respectively. The first loss function L_rob(x, y) is standard and can be directly instantiated using existing approaches. The main challenge comes in defining the second loss term, as well as ensuring that the resulting formulation is easy to optimize, e.g., by defining a smooth approximation of the non-differentiable indicator function.

4.1. ADVERSARIAL TRAINING

We instantiate the loss function from Equation 3 when training empirically robust models as follows:

    L_ERA = β · L_TRADES(f_θ, (x, y)) + (1 − f_θ(x)_y) · min_{x′∈B_ε^p(x)} ℓ_CE( f_θ(x′), argmax_{c∈Y\{F_θ(x)}} f_θ(x′)_c )   (4)

In the following, we introduce each term in more detail and discuss the motivation behind our formulation.

L_rob. To instantiate L_rob, we can use any existing adversarial training method (Ding et al., 2018; Goodfellow et al., 2014; Wang et al., 2019; Zhang et al., 2019a). For example, considering TRADES (Zhang et al., 2019a), L_rob is instantiated as:

    L_TRADES := ℓ_CE(f_θ(x), y) + γ · max_{x′∈B_ε^p(x)} D_KL( f_θ(x), f_θ(x′) )

where D_KL is the Kullback-Leibler divergence (Kullback & Leibler, 1951).

1{F_θ(x) ≠ y}. Next, we consider the indicator function, which encourages learning on inaccurate samples. Since the indicator function is computationally intractable, we replace the hard qualifier by a soft qualifier 1 − f_θ(x)_y. The soft qualifier will be small for accurate and large for inaccurate samples, thus providing a smooth approximation of the original indicator function.

L_rob^¬acc. Third, we define the loss that penalizes robust but inaccurate samples. This can be formulated similarly to the adversarial training objective (Madry et al., 2018); however, instead of optimizing the prediction of the adversarial example f_θ(x′) towards the correct label y, we optimize towards the most likely adversarial label argmax_{c∈Y\{F_θ(x)}} f_θ(x′)_c. This leads to the following formulation:

    min_{x′∈B_ε^p(x)} ℓ_CE( f_θ(x′), argmax_{c∈Y\{F_θ(x)}} f_θ(x′)_c )

The purpose of the L_rob^¬acc loss is to penalize robustness by making the model non-robust. As a result, it is sufficient to consider only a single non-robust example, hence the minimization (rather than maximization) in the loss objective.
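For concreteness, the per-sample loss in Equation 4 can be sketched as follows. This is a simplified NumPy sketch under the assumption that the inner min/max problems have already been solved by an attack producing a perturbed point x′ (represented here only by its logits); the function names and the single-perturbed-point simplification are ours, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, c):
    return -np.log(p[c] + 1e-12)

def kl_div(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def era_loss(logits_x, logits_adv, y, beta=1.0, gamma=1.0):
    """Sketch of L_ERA (Equation 4) for a single sample.

    logits_x:   f_theta(x)
    logits_adv: f_theta(x') at a perturbed point x' in B_eps(x), assumed
                to have been found by an attack (omitted here)
    """
    p, p_adv = softmax(logits_x), softmax(logits_adv)
    # L_TRADES surrogate: natural CE plus a KL consistency term
    trades = cross_entropy(p, y) + gamma * kl_div(p, p_adv)
    # soft qualifier 1 - f_theta(x)_y: ~0 for accurate, ~1 for inaccurate
    soft_inaccurate = 1.0 - p[y]
    # most likely adversarial label among classes other than the prediction
    pred = int(np.argmax(logits_x))
    c_star = max((c for c in range(len(p)) if c != pred),
                 key=lambda c: p_adv[c])
    return beta * trades + soft_inaccurate * cross_entropy(p_adv, c_star)
```

On an accurately classified sample the second term is nearly switched off (the soft qualifier is close to zero), whereas on a confidently misclassified sample it pushes the perturbed prediction toward a different class, making the misprediction non-robust.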

4.2. CERTIFIED TRAINING

Similarly to Section 4.1, we now instantiate the loss function from Equation 3 for probabilistic certified robustness via randomized smoothing (Cohen et al., 2019). Randomized smoothing constructs a smoothed classifier G_θ : X → Y from a base classifier F_θ, where G_θ(x) predicts the class which F_θ is most likely to return when x is perturbed under isotropic Gaussian noise. Our proposed instantiation of Equation 3 for probabilistic certified robustness is as follows:

    L_CRA(f_θ, (x, y)) = β · L_noise(f_θ, (x, y)) + ( 1/k · Σ_{j=1}^k (1 − f_θ(x + η_j)_y) ) · CR(f_θ, (x, y))

where η_1, ..., η_k are k i.i.d. samples from N(0, σ²I). Note that, since the robustness guarantees provided by randomized smoothing hold for the smoothed classifier G_θ, the three loss components from Equation 3 need to be formulated with respect to the smoothed classifier G_θ.

L_rob. To instantiate L_rob, we can use any existing certified training method for randomized smoothing, such as the methods defined by Cohen et al. (2019) or Zhai et al. (2020). Concretely, when using Cohen et al. (2019), the loss is defined using Gaussian noise augmentation:

    L_noise := ℓ_CE(f_θ(x + η), y),   η ∼ N(0, σ²I)

1{F_θ(x) ≠ y}. We again replace the computationally intractable hard qualifier by a soft qualifier E_{δ∼N(0,σ²I)}[1 − f_θ(x + δ)_y], which encodes the misprediction probability of the smoothed classifier. In practice, we approximate expectations over Gaussians via Monte Carlo sampling, leading to the approximated soft inaccuracy qualifier 1/k · Σ_{j=1}^k (1 − f_θ(x + η_j)_y).

L_rob^¬acc. Finally, we instantiate the L_rob^¬acc loss term, which encourages the model toward non-robust predictions on robust but inaccurate samples. We propose to minimize robustness by directly minimizing the certified radius of the smoothed classifier G_θ. The certified radius formulation by Cohen et al. (2019) involves a sum of indicator functions, which is not differentiable. However, Zhai et al. (2020) have recently proposed the following differentiable certified radius formulation:

    CR(f_θ, (x, y)) = σ/2 · [ Φ⁻¹( 1/k · Σ_{j=1}^k f_θ(x + η_j; Γ)_y ) − Φ⁻¹( max_{y′≠y} 1/k · Σ_{j=1}^k f_θ(x + η_j; Γ)_{y′} ) ]   (9)

where Φ⁻¹ is the inverse of the standard Gaussian CDF, Γ is the inverse softmax temperature multiplied with the logits of f_θ, and η_{1:k} are k i.i.d. samples from N(0, σ²I). Note that, by setting the loss term L_rob^¬acc to CR(f_θ, (x, y)), we directly penalize the robustness of the smoothed classifier G_θ.
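The soft certified radius in Equation 9 is straightforward to compute from Monte Carlo class probabilities. The sketch below uses Python's stdlib `statistics.NormalDist` for Φ⁻¹ and assumes the temperature term Γ has already been folded into the supplied probabilities; it illustrates the formula and is not the authors' code.

```python
from statistics import NormalDist

def soft_certified_radius(probs_noisy, y, sigma):
    """Soft certified radius CR (Equation 9).

    probs_noisy: list of k probability vectors f_theta(x + eta_j), one per
                 Gaussian noise sample eta_j ~ N(0, sigma^2 I)
    y:           the true label
    """
    phi_inv = NormalDist().inv_cdf  # inverse standard Gaussian CDF
    k, n_classes = len(probs_noisy), len(probs_noisy[0])
    # average the class probabilities over the k noise samples
    mean = [sum(p[c] for p in probs_noisy) / k for c in range(n_classes)]
    p_top = mean[y]                                              # smoothed prob. of y
    p_runner = max(mean[c] for c in range(n_classes) if c != y)  # best other class
    return sigma / 2 * (phi_inv(p_top) - phi_inv(p_runner))
```

The radius is positive when the smoothed classifier favors y and negative otherwise, so using it as the penalty L_rob^¬acc drives the certified radius of robust but inaccurate samples toward, and below, zero.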

5. ROBUST ABSTAIN MODELS

Next, we extend the models trained so far by leveraging robustness as a principled abstain mechanism.

Abstain Model. Given input space X ⊆ R^d and label space Y, a model with an abstain option (El-Yaniv et al., 2010) is a pair of functions (F_θ, S), where F_θ : X → Y is a classifier and S : X → {0, 1} is a binary selector for F_θ. Let S(x) = 0 indicate that the model abstains on input x ∈ X, while S(x) = 1 indicates that the model commits to the classifier F_θ for input x and predicts F_θ(x).

Robustness Indicator Selector. We instantiate abstain models with a robustness indicator selector that abstains on all non-robust samples. For adversarial robustness, the selector is defined as:

    S_ERI(x) = 1{∀x′ ∈ B_ε^p(x). F_θ(x′) = F_θ(x)}   (10)

For certified robustness, the selector is defined as:

    S_CRI(x) = 1{∀x′ ∈ B_ε^p(x). G_θ(x′) = G_θ(x)}

Robustness Guarantees: Robust Selection. Similar to robust accuracy, the robustness of an abstain model needs to be evaluated with respect to a threat model. In our work, we consider the same threat model as for the underlying model F_θ, namely B_ε^p(x) := {x′ : ‖x′ − x‖_p ≤ ε}, an ℓ_p-norm ball centered at x with radius ε. Then, we define the robust selection of an abstain model as follows:

    R_rob^sel(S) = E_{(x,y)∼D} [ 1{∀x′ ∈ B_ε^p(x). S(x′) = 1} ]

That is, we say that a model robustly selects x if the selector S would select all valid perturbations x′ ∈ B_ε^p(x). Combined with our definition of S_ERI, we obtain the following criterion (cf. Appendix A.3):

    R_rob^sel(S_ERI) = E_{(x,y)∼D} [ 1{∀x′ ∈ B_{2ε}^p(x). F_θ(x′) = F_θ(x)} ]

In other words, to guarantee that the selector S_ERI is robust for all x′ ∈ B_ε^p(x), we in fact need to check the robustness of the model F_θ on double that region, x′ ∈ B_{2ε}^p(x). This is important in order to obtain the correct guarantees and is reflected in our evaluation in Section 7.
Note that when evaluating robust selection for certified training, it is sufficient to show that the smoothed model G_θ can be certified with a radius R ≥ ε. Then, the smoothed model guarantees that G_θ(x′) = c_A for all x′ ∈ B_ε^p(x), which is equivalent to our condition ∀x′ ∈ B_ε^p(x). G_θ(x′) = G_θ(x).
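Putting the doubled-radius requirement into code, a robustness indicator selector can be sketched as a wrapper around any attack (or certification) procedure. The `find_adversarial` interface is an assumption of this sketch, standing in for, e.g., APGD in the empirical setting.

```python
def make_robustness_selector(find_adversarial, eps):
    """Sketch of the S_ERI selector with the doubled-radius correction.

    find_adversarial(x, eps): assumed attack interface; returns a point
    x' in B_eps(x) with a changed prediction, or None if none is found.
    """
    def selector(x):
        # To guarantee S(x') = 1 for every x' in B_eps(x), we must verify
        # that the classifier is constant on the larger ball B_{2*eps}(x).
        return find_adversarial(x, 2 * eps) is None
    return selector


# Toy 1-D example: classify by sign; an "attack" succeeds iff the margin
# to the decision boundary at 0 is at most eps.
toy_attack = lambda x, eps: -x if abs(x) <= eps else None
selector = make_robustness_selector(toy_attack, eps=0.1)
print(selector(0.05), selector(0.5))  # False True
```

Note that the point 0.05 would survive an attack at ε = 0.1 alone (its margin is within 2ε but it would still be attacked at ε = 0.2 here), which is exactly why the selector must check the doubled region.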

6. BOOSTING ROBUSTNESS WITHOUT ACCURACY LOSS

Consider an abstain model (F_θ, S) and a dataset D. The selector S partitions D into two disjoint subsets: the abstained inputs D_¬s and the selected inputs D_s for which F_θ makes a prediction. For some tasks, making a best-effort prediction on all samples D_s ∪ D_¬s may be desirable, which leads to compositional architectures, already used by prior works (Mueller et al., 2020; Wong et al., 2018). Let H = ((F_robust, S), F_core) be a 2-compositional architecture consisting of a selection mechanism S, a robustly trained model F_robust, and a core model F_core. Given an input x ∈ X, the selector S decides whether the model is confident on x and commits to the robust model F_robust, or whether the model should abstain and fall back to the core model F_core. Formally:

    H(x) = S(x) · F_robust(x) + (1 − S(x)) · F_core(x)

While F_robust and F_core can be chosen arbitrarily, we here combine robustly trained models (which have lower natural accuracy) with models trained using standard training (which have high natural accuracy but low robustness). The performance of H then depends on the quality of the selector S.

[Figure 2 caption: Robust accuracy (R_rob^acc) and robust inaccuracy (R_rob^¬acc) of existing robust models, and models fine-tuned with our loss. Our approach consistently reduces robust inaccuracy across various datasets, existing models, and different regularization levels β.]
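The composition H above is a simple dispatch; a minimal sketch follows (the function names and the toy models are ours, for illustration only).

```python
def compose(selector, f_robust, f_core):
    """2-compositional model H: commit to the robust model on selected
    inputs, fall back to the accurate core model on abstained inputs."""
    def H(x):
        return f_robust(x) if selector(x) else f_core(x)
    return H


# Toy 1-D example: the robust model is only trusted far from the boundary.
f_robust = lambda x: 1 if x > 0 else 0
f_core = lambda x: 1 if x >= -0.01 else 0  # a hypothetical, more accurate model
selector = lambda x: abs(x) > 0.1          # robustness indicator stand-in
H = compose(selector, f_robust, f_core)
```

Here H(0.5) takes the robust path, while H(−0.005) abstains and is answered by the core model; the guarantees of H on selected inputs are exactly those of F_robust.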

7. EVALUATION

In this section, we present a thorough evaluation of our approach by instantiating it on four different datasets and six recent state-of-the-art models, for both adversarial and certified robustness, including top-ranked models from RobustBench (Croce et al., 2020). We show the following key results:

• Fine-tuning models with our proposed loss successfully decreases robust inaccuracy and provides a Pareto front of models with different robustness tradeoffs.
• Combining our proposed loss and robustness as an abstain mechanism leads to higher robust selection and accuracy compared to softmax response and selection network baselines.
• Our 2-compositional models significantly improve robustness by up to +61% and slightly increase the natural accuracy by up to +0.2% (for B_{1/255}^∞ and B_{2/255}^∞).

We perform all experiments on a single GeForce RTX 3090 GPU and use PyTorch (Paszke et al., 2019) for our implementation. The hyperparameters used for our experiments are provided in Appendix A.2.

Models. Our proposed training method requires neither retraining classifiers from scratch nor modifications to existing classifiers; thus, our approach can be applied to fine-tune a wide range of existing models. To demonstrate this, we use the following robust pre-trained models. For empirical robustness, we evaluate existing models from Carmon et al. (2019), Gowal et al. (2020), Rebuffi et al. (2021), and Zhang et al. (2019a), which were all trained for ε_∞ = 8/255, and all but the last model are top models in RobustBench (Croce et al., 2020). In our evaluation, we fine-tune each model for 50 epochs for the considered threat model (ε_∞ ∈ {1/255, 2/255, 4/255}), using L_TRADES (Zhang et al., 2019a) and L_ERA (ours). Further, we also consider models by Ding et al. (2018) and Wang et al. (2019) as additional baselines. For certified robustness, we use a σ = 0.12 Gaussian noise augmentation trained model by Cohen et al. (2019) and an ε_2 = 0.5 adversarially trained model by Sehwag et al. (2021). Similar to empirical robustness, we fine-tune the models for 50 epochs using L_noise (Cohen et al., 2019) and L_CRA (ours).

Datasets. We evaluate our approach on two academic datasets, CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), and two commercial datasets, the Mapillary Traffic Sign Dataset (MTSD) (Ertler et al., 2020) and a Rail Defect Dataset kindly provided by Swiss Federal Railways (SBB). See Appendix A.1 for full details. When training on the CIFAR-10 and CIFAR-100 datasets, we use the AutoAugment (AA) policy by Cubuk et al. (2018) as the image augmentation. For the MTSD and SBB datasets, we use standard image augmentations (SA) consisting of random cropping, color jitter, and random translation and rotation. For completeness, our evaluation also includes models trained without any data augmentations.

Metrics

We use the natural accuracy, robust accuracy, and robust inaccuracy as our main evaluation metrics, as defined in Section 3, but evaluated on the corresponding test dataset. When evaluating empirical robustness, we use 40-step APGD_CE (Croce & Hein, 2020b) (referred to as APGD) to evaluate the robustness of classifiers F_θ. To evaluate certified robustness, we use the Monte Carlo algorithm for randomized smoothing from Cohen et al. (2019). We certify 500 test samples and use the same randomized smoothing hyperparameters as Cohen et al. (2019) (cf. Appendix A.2).

[Figure 3 caption: Robust selection and robust accuracy of robustness indicator and softmax response abstain models. Higher R_rob^sel and R_rob^acc is better (top right corner is optimal). Observe that the Pareto front of our approach generally dominates the results of all baselines, significantly improving robust selection and robust accuracy.]

7.1. REDUCING ROBUST INACCURACY

We first summarize the main results obtained by using our proposed loss functions L_ERA and L_CRA.

Empirical Robustness. The results in Figure 2 show the robust accuracy (R_rob^acc) and robust inaccuracy (R_rob^¬acc) of different existing robust models fine-tuned via TRADES (Zhang et al., 2019a), with and without data augmentations, and the same models fine-tuned via our L_ERA, with and without data augmentations. Further, we show MART (Wang et al., 2019) and MMA (Ding et al., 2018) fine-tuned models as additional baselines. We can see that our approach consistently improves over existing models. For example, for CIFAR-10 and B_{2/255}^∞, the Carmon et al. (2019) model achieves 86.5% robust accuracy, but also 1.34% robust inaccuracy. In contrast, using L_ERA, we obtain a number of models that reduce robust inaccuracy to 0.29% while still achieving 83.8% robust accuracy. Similar results are obtained for other models, perturbation regions, and datasets (cf. Appendix A.7). We observe that our approach achieves consistently lower robust inaccuracy compared to adversarial training. Further, by varying the regularization term β, we obtain a Pareto front of optimal solutions.

Certified Robustness. Similarly, we evaluate the robust accuracy (R_rob^acc) and robust inaccuracy (R_rob^¬acc) for certifiably robust models fine-tuned using L_noise and L_CRA (ours). In Table 2a, we show results on CIFAR-10 for B_{0.12}^2 and B_{0.25}^2. We observe that our approach achieves lower robust inaccuracy compared to existing models. For example, on CIFAR-10 and B_{0.25}^2, the Cohen et al. (2019) model achieves 62% robust accuracy, but also 1% robust inaccuracy. In contrast, our approach reduces robust inaccuracy to 0.4% while still achieving 53.8% robust accuracy. For the Sehwag et al. (2021) model, our approach even improves both robust accuracy and robust inaccuracy. For B_{0.25}^2, our approach improves the robust accuracy by +4.8% and reduces the robust inaccuracy by −0.6%.

7.2. USING ROBUSTNESS TO ABSTAIN

Next, we evaluate using robustness as an abstain mechanism (Section 5) and how it benefits from the training proposed in our work. We compare the following abstain mechanisms:

Softmax Response (SR) (Geifman & El-Yaniv, 2017), which abstains if the maximum softmax output of the model f_θ is below a threshold τ for some input x′ ∈ B_ε^p(x), that is:

    S_SR(x) = 1{∀x′ ∈ B_ε^p(x). max_{c∈Y} f_θ(x′)_c ≥ τ}

Similar to S_RI, to guarantee robustness of S_SR, we need to check the maximum softmax output of f_θ on double the region, B_{2ε}^p(x). To evaluate the robustness of S_SR, we use a modified version of APGD called APGDconf (Appendix A.5). For each considered model (e.g., Carmon et al. (2019)), we evaluate its corresponding abstain selector: CARMON_SR, GOWAL_SR, etc. (all fine-tuned using TRADES).

Robustness Indicator (RI) (our work), which abstains if the model F_θ is non-robust:

    S_RI(x) = 1{∀x′ ∈ B_ε^p(x). F_θ(x′) = F_θ(x)}

Note that, unlike other selectors, our robustness indicator is by design robust against an adversary using the same threat model. For each base model, we consider two instantiations, TRADES_RI and ERA_RI (Equation 4). Further, for CIFAR-10, we also instantiate models from Ding et al. (2018) and Wang et al. (2019).

Selection Network (SN), which trains a separate neural network s_θ : X → R and selects if:

    S_SN(x) = 1{s_θ(x) ≥ τ}   (15)

When evaluating the robustness of an abstain model (F_θ, S_SN), the robustness of both the classifier and the selection network have to be considered. We compare against two instantiations of this approach, both trained using certified training: ACE-COLT_SN (Balunovic & Vechev, 2019; Mueller et al., 2020) and ACE-IBP_SN (Gowal et al., 2018; Mueller et al., 2020).

Empirical Robustness. In Figure 3, we compare different abstain approaches using two metrics: robust selection (R_rob^sel), and the ratio of non-abstained samples that are robust and accurate (R_rob^acc).
We would like to maximize both, but typically there is a tradeoff between the two. This is evident in Figure 3, where both our approach and softmax response produce a Pareto front of optimal solutions. Overall, the main results in Figure 3 show that, as designed, our approach consistently improves robust accuracy. For example, on CIFAR-10, B_{1/255}^∞, and the Carmon et al. (2019) model, we successfully improve robust accuracy by +1.18% at the cost of −3.78% decreased robust selection. This is close to optimal, since increasing robust accuracy is typically achieved by correctly abstaining on misclassified samples. Interestingly, in some cases, we strictly improve over baseline models by increasing both robust accuracy and robust selection. Compared to the other abstain methods, our approach generally improves both metrics while also providing much stronger guarantees. Concretely, our approach guarantees that selected samples are robust in the considered threat model. Softmax response only guarantees that all samples in the considered threat model have high confidence and is thus vulnerable to high-confidence adversarial examples, and the selection network provides no guarantees with regard to the selector's robustness.

Certified Robustness. Applying our training for certified robustness L_CRA with β = 1.0 consistently improves the robust accuracy R_rob^acc of robustness indicator abstain models. In Table 2b, we show our results on CIFAR-10 for B_{0.12}^2 and B_{0.25}^2. For instance, for the Cohen et al. (2019) model trained at σ = 0.12, we are able to improve the robust accuracy by +0.85% for B_{0.25}^2, at the expense of an −8.8% decrease in robust selection. For the Sehwag et al. (2021) model, our approach improves on both metrics. For B_{0.25}^2, we increase robust accuracy by +0.82% and robust selection by +4.2%.

[Figure 4 caption: Results for ERA_RI, TRADES_RI, MART_RI, MMA_RI, ACE-COLT_SN, ACE-IBP_SN, and TRADES_SR models. The core models used in the compositional architectures are listed in Appendix A.10. The Pareto front of our method strictly improves over prior work in the most important region, significantly improving model robustness while the model accuracy does not decrease.]

7.3. BOOSTING ROBUSTNESS WITHOUT ACCURACY LOSS

Finally, we present the results of combining the abstain models trained so far with state-of-the-art models trained to achieve high accuracy. Note that, as discussed in Section 5, when evaluating adversarial robustness for B_ε^p, we in fact need to consider B_{2ε}^p robustness of the abstain model. A summary of the results is shown in Figure 4. Similar to the results shown so far, the 2-compositional architectures that use models trained by our method improve over existing methods that optimize robust accuracy, as well as over models using softmax response or a selection network to abstain. For example, for CIFAR-10 with ε_∞ = 1/255 and the Carmon et al. (2019) model, we improve natural accuracy by +0.58% and +0.62%, while decreasing the robustness only by −2.75% and −2.82%, when training with and without data augmentations, respectively. More importantly, our approach significantly improves the robustness of highly accurate non-compositional models, with minimal loss of accuracy, as summarized in Table 1. We provide full results, including additional models and perturbation bounds, in Appendix A.9, and an evaluation of the considered highly accurate non-compositional models in Appendix A.10.

8. CONCLUSION

In this work, we address the robustness vs accuracy tradeoff by avoiding robust inaccuracy and leveraging model robustness as a selection mechanism. We present a new training method that jointly minimizes robust inaccuracy and maximizes robust accuracy. The key concept is extending an existing robust training loss with a term that minimizes robust inaccuracy, making our method widely applicable, since it can be instantiated using various existing robust training methods. We show the practical benefits of our approach both by using robustness as an abstain mechanism and by leveraging compositional architectures to improve robustness without sacrificing accuracy. However, there are also limitations and extensions to consider in the future. First, while there are cases where our training improves robust accuracy and reduces robust inaccuracy, it typically results in a trade-off between the two: reduced robust inaccuracy also leads to reduced robust accuracy. To address this issue, we compute a Pareto front of optimal solutions, all of which can be used to instantiate the compositional model. An interesting direction for future work is exploring this trade-off further and developing new techniques to mitigate it. Second, given that we compute a Pareto front of optimal solutions, another extension is to consider model cascades that consist of different models along this Pareto front and progressively fall back to models with higher robust accuracy but also higher robust inaccuracy. Third, we observed that the training becomes much harder as robust inaccuracy approaches zero (i.e., the best case). This is because the remaining robust inaccurate examples are the hardest to fix, and because there are only a few of them. In our work, we explored using data augmentation to address this issue, but more work is needed to make the training efficient in such a low-data regime.

A APPENDIX

A.1 DATASETS

We ran our evaluations on four different datasets, namely on CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), the Mapillary Traffic Sign Dataset (MTSD) (Ertler et al., 2020), and a rail defect dataset provided by Swiss Federal Railways (SBB). Additionally, we used a synthetic dataset consisting of two-dimensional data points. In the following, we explain the necessary preprocessing steps applied to the publicly available MTSD dataset.

Mapillary Traffic Sign Dataset (MTSD)

The Mapillary traffic sign dataset (Ertler et al., 2020) is a large-scale vision dataset that includes 52'000 fully annotated street-level images from all around the world. The dataset covers 400 known and further unknown traffic signs, resulting in over 255'000 traffic signs in total. Each street-level image is manually annotated and includes ground truth bounding boxes that locate each traffic sign in the image, as shown in Figure 5a. Further, each ground truth traffic sign annotation includes additional attributes such as ambiguousness or occlusion. Since the focus of this work is on classification, we convert the base MTSD dataset to a classification dataset (described below) by cropping to each ground truth bounding box. We show samples from the resulting cropped MTSD dataset in Figure 5b. We convert the MTSD object detection dataset into a classification dataset as follows:

1. Ignore all bounding boxes that are annotated as occluded (sign partly occluded), out-of-frame (sign cut off by image border), exterior (sign includes other signs), ambiguous (sign is not classifiable at all), included (sign is part of another bigger sign), or dummy (looks like a sign but is not) (Ertler et al., 2020). Further, we ignore signs of class other-sign, since this is a general class that includes any traffic sign with a label not within the MTSD taxonomy.

2. Crop to all remaining bounding boxes and produce a labeled image classification dataset. Cropping is done with slack, i.e., we crop to a randomly upsized version of the original bounding box. Given a bounding box BB = ([x min , x max ], [y min , y max ]), the corresponding upsized bounding box is given as UBB = ([x min - λα x (x max - x min ), x max + λ(1 - α x )(x max - x min )], [y min - λα y (y max - y min ), y max + λ(1 - α y )(y max - y min )]), where α x , α y ∼ U [0,1] and λ is the slack parameter, which we set to λ = 1.0.

3. Resize the cropped traffic signs to (64, 64).
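The slack-based upsizing in step 2 can be sketched as follows. This is a minimal illustration in plain Python: the function name and the scalar box representation are ours, and practical details such as clipping to the image boundary are omitted.

```python
import random

def upsize_bbox(bbox, lam=1.0, rng=random):
    """Randomly upsize a bounding box with slack (step 2 above).

    bbox: ((x_min, x_max), (y_min, y_max)); lam is the slack parameter λ.
    The split of the added slack between the two sides of each axis is
    controlled by α_x, α_y drawn from U[0, 1].
    """
    (x_min, x_max), (y_min, y_max) = bbox
    ax, ay = rng.uniform(0, 1), rng.uniform(0, 1)
    w, h = x_max - x_min, y_max - y_min
    ubb_x = (x_min - lam * ax * w, x_max + lam * (1 - ax) * w)
    ubb_y = (y_min - lam * ay * h, y_max + lam * (1 - ay) * h)
    return ubb_x, ubb_y
```

Note that, regardless of the random split, the upsized box always has width and height (1 + λ) times the original, so λ = 1.0 doubles both dimensions.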

Rail Defect Dataset (SBB)

The rail defect dataset (SBB) is a proprietary vision dataset collected and annotated by Swiss Federal Railways. It includes images of rails, each of which is annotated with ground truth bounding boxes for various types of rail defects. We note that all the models used in our work for this dataset are trained by the authors and not provided by SBB. In fact, for our work, we even consider a different type of task - classification instead of the original object detection. As a consequence, the accuracy and robustness results presented in our work are by no means representative of the actual models used by SBB.

A.2 HYPERPARAMETERS

TRADES

We use L TRADES (Zhang et al., 2019a) to both train models from scratch and fine-tune existing models. When training models from scratch, we train for 100 epochs using L TRADES , with an initial learning rate 1e-1, which we reduce to 1e-2 and 1e-3 once 75% and 90% of the total epochs are completed. When fine-tuning models, we train for 50 epochs using L TRADES , with an initial learning rate 1e-3, which we reduce to 1e-4 once 75% of the total epochs are completed. We use batch size 200, use 10-step PGD (Madry et al., 2018) to generate adversarial examples during training, and set the β parameter in L TRADES to β TRADES = 6.0.
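The step schedule used when training from scratch can be written as a small helper. The function name is ours, and frameworks typically implement this with a built-in step scheduler; this is only a sketch of the decay points stated above.

```python
def learning_rate(epoch, total_epochs, init_lr=1e-1):
    """Step schedule: start at init_lr, decay by 10x once 75% of the
    epochs are completed, and by another 10x once 90% are completed."""
    if epoch >= 0.9 * total_epochs:
        return init_lr * 1e-2
    if epoch >= 0.75 * total_epochs:
        return init_lr * 1e-1
    return init_lr
```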

Empirical Robustness Abstain Training

We fine-tune for 50 epochs using L ERA (Equation 4), with an initial learning rate 1e-3, which we reduce to 1e-4 once 75% of the total epochs are completed. We use batch size 200, use 10-step PGD (Madry et al., 2018) to generate adversarial examples during training, and set β TRADES = 6.0 for the loss term L rob = L TRADES .

MART

We use MART (Wang et al., 2019) as an additional baseline to compare our models against. In our evaluations, we use the ε ∞ = 8/255 trained WideResNet-28-10 (trained with 500K unlabeled data) published by Wang et al. (2019), and fine-tune it using MART for the respective smaller perturbation region (ε ∞ ∈ {1/255, 2/255, 4/255}). We fine-tune for 50 epochs, using the same hyperparameters as Wang et al. (2019), and without using data augmentations.
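For illustration, the projected-gradient loop underlying the k-step ℓ ∞ PGD attack used during training can be sketched on a scalar input. This is a toy stand-in: real attacks operate on image tensors with a cross-entropy loss, and the 2.5·ε/k step size is a common heuristic, not necessarily the one used here.

```python
def pgd_linf(x0, grad_loss, eps, steps=10, alpha=None):
    """Sketch of k-step l-inf PGD on a scalar input: repeatedly take a
    signed-gradient ascent step on the attacked loss and project back
    into the l-inf ball of radius eps around the clean input x0."""
    if alpha is None:
        alpha = 2.5 * eps / steps  # common heuristic step size
    x = x0
    for _ in range(steps):
        step = alpha if grad_loss(x) >= 0 else -alpha  # signed-gradient ascent
        x = max(x0 - eps, min(x0 + eps, x + step))     # project into B_eps(x0)
    return x

# Ascending the toy loss L(x) = (x - 3)^2 from x0 = 0 pushes the input to
# the boundary of the eps-ball farthest from the loss minimum at x = 3.
x_adv = pgd_linf(0.0, lambda x: 2 * (x - 3), eps=0.5)
```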

Certified Robustness Abstain Training

We fine-tune for 50 epochs using L CRA (Equation 7), with an initial learning rate 1e-3, which we reduce to 1e-4 once 75% of the total epochs are completed. We use batch size 50, k = 16 i.i.d. samples from N (0, σ 2 I), and set the inverse softmax temperature to Γ = 4.0 (cf. Section 4.2).
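One plausible reading of the Monte Carlo forward pass behind these hyperparameters is to average the temperature-scaled softmax of the logits over the k noise samples. This is a hedged sketch, not the exact L CRA (Equation 7): logits_fn, the default sigma, and the scalar input are illustrative assumptions; only k = 16 and Γ = 4.0 come from the text above.

```python
import math
import random

def smoothed_softmax(logits_fn, x, k=16, sigma=0.25, gamma=4.0, rng=random):
    """Average the softmax (with inverse temperature gamma) of the logits
    over k i.i.d. Gaussian noise samples around the input x."""
    n_classes = len(logits_fn(x))
    avg = [0.0] * n_classes
    for _ in range(k):
        z = logits_fn(x + rng.gauss(0.0, sigma))
        m = max(z)  # subtract max for numerical stability
        exps = [math.exp(gamma * (l - m)) for l in z]
        s = sum(exps)
        avg = [a + e / (s * k) for a, e in zip(avg, exps)]
    return avg
```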

Probabilistic Certification via Randomized Smoothing

We use the practical Monte Carlo algorithm by Cohen et al. (2019) for randomized smoothing, using the same certification hyperparameters as them. We use N 0 = 100 Monte Carlo samples to identify the most probable class c A , N = 100, 000 Monte Carlo samples to estimate a lower bound on the probability p A , and set the failure probability to α = 0.001.
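The two-stage certification procedure can be sketched as follows. For simplicity, this sketch lower-bounds p A with a Hoeffding bound instead of the Clopper-Pearson interval used by Cohen et al. (2019), which makes it slightly more conservative than the original algorithm; the toy base classifier in the usage example is our assumption.

```python
import math
import random
from statistics import NormalDist

def certify(sample_class, n0=100, n=100_000, alpha=0.001, sigma=0.25):
    """Sketch of randomized-smoothing certification (Cohen et al., 2019).

    sample_class() draws one prediction of the base classifier under
    Gaussian noise. Returns (predicted class, certified L2 radius), or
    (None, 0.0) if certification fails and the smoothed classifier abstains.
    """
    # Stage 1: use n0 cheap samples to guess the most probable class c_A.
    counts = {}
    for _ in range(n0):
        c = sample_class()
        counts[c] = counts.get(c, 0) + 1
    c_a = max(counts, key=counts.get)
    # Stage 2: use n samples to lower-bound p_A = P[prediction == c_A],
    # here via a Hoeffding bound holding with probability 1 - alpha.
    hits = sum(sample_class() == c_a for _ in range(n))
    p_a_lower = hits / n - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_a_lower <= 0.5:
        return None, 0.0  # cannot certify: abstain
    # Certified radius sigma * Phi^{-1}(p_A_lower).
    return c_a, sigma * NormalDist().inv_cdf(p_a_lower)

# Usage with a toy base classifier that predicts class 0 under noise about
# 99% of the time (the noise sampling is folded into sample_class here).
_rng = random.Random(0)
c_a, radius = certify(lambda: 0 if _rng.random() < 0.99 else 1, n=10_000)
```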

Synthetic Dataset

In Figure 1, we illustrate the effect of our training on a synthetic three-class dataset, where each class follows a Gaussian distribution. We then use a simple four-layer neural network with 64 neurons per layer, and train it on N = 1000 synthetic samples, using L std , L TRADES (Zhang et al., 2019a), and L ERA (Equation 4). For each loss variant, we train for 20 epochs, use a fixed learning rate 1e-1, and batch size 10. For L TRADES and L ERA , we use 10-step PGD (Madry et al., 2018) to generate adversarial examples during training, and set β TRADES = 6.0.

Recall that the robust selection of a selector S is defined as

R sel rob (S) = E (x,y)∼D 1{∀x′ ∈ B p ε (x). S(x′) = 1}

Further, recall that when evaluating the robustness of an empirical robustness indicator selector S ERI (Equation 10), we in fact need to check robustness of the model F θ for double the perturbation region, x′ ∈ B p 2·ε (x), which can be seen from the following derivation:

R sel rob (S ERI ) = E (x,y)∼D 1{∀x′ ∈ B p ε (x). S ERI (x′) = 1}
= E (x,y)∼D 1{∀x′ ∈ B p ε (x). 1{∀x′′ ∈ B p ε (x′). F θ (x′′) = F θ (x′)}}
= E (x,y)∼D 1{∀x′ ∈ B p 2·ε (x). F θ (x′) = F θ (x)}
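The last step of the derivation can be checked numerically on a toy one-dimensional classifier: the selector S ERI is robust on B ε (x) exactly when the classifier is robust on B 2ε (x). The classifier, the grid check, and the test points below are our illustrative choices.

```python
def f(x):
    """Toy 1D classifier: predict by the sign of x (decision boundary at 0)."""
    return 1 if x >= 0 else -1

def robust(x, eps):
    """For this monotone 1D classifier, F is robust on [x-eps, x+eps]
    iff both interval endpoints receive the same class."""
    return f(x - eps) == f(x + eps)

def s_eri(x, eps):
    """Robustness-indicator selector: select x iff F is robust around x."""
    return robust(x, eps)

def selector_robust(x, eps, grid=1000):
    """Numerically check that S_ERI selects every point of B_eps(x)."""
    return all(s_eri(x - eps + 2 * eps * i / grid, eps) for i in range(grid + 1))
```

At points away from the degenerate boundary cases, selector robustness at radius ε coincides with classifier robustness at radius 2ε, matching the derivation.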

A.4 COMPARING APGD AND AUTOATTACK ROBUSTNESS

Recall from Section 7 that we use 40-step APGD CE (Croce & Hein, 2020b) (referred to as APGD) to evaluate the empirical robustness of classifiers F θ . APGD is one of the adversarial attacks that constitute AutoAttack (Croce & Hein, 2020b), an ensemble of adversarial attacks. Concretely, AutoAttack consists of APGD CE (Croce & Hein, 2020b), targeted APGD DLR (Croce & Hein, 2020b), targeted FAB (Croce & Hein, 2020a), and SquareAttack (Andriushchenko et al., 2020). In the following, we conduct an ablation study over 40-step APGD and AutoAttack by comparing the robustness of an L ERA trained model. Concretely, we consider the Gowal et al. (2020) WideResNet-28-10 model, which was fine-tuned for B ∞ 2/255 using our L ERA loss (with β = 1.0) on CIFAR-10 (cf. Section 7.1). We then evaluate its robust accuracy R acc rob and robust inaccuracy R ¬acc rob for the threat models ε ∞ ∈ {1/255, 2/255, 4/255, 8/255}, using both 40-step APGD and AutoAttack, and show the results in Table 3. Observe that for small perturbation regions ε ∞ ∈ {1/255, 2/255}, the robust accuracy and robust inaccuracy are equivalent for 40-step APGD and AutoAttack, whereas for larger perturbation regions ε ∞ ∈ {4/255, 8/255}, AutoAttack robust accuracy is marginally lower than 40-step APGD robust accuracy.

A.5 COMPARING ADVERSARIES FOR SOFTMAX RESPONSE (SR)

Recall from Section 7.2 that we evaluated the robustness of softmax response (SR) abstain models using APGDconf, a modified version of APGD (Croce & Hein, 2020b) using the alternative adversarial attack objective by Stutz et al. (2020). This modified objective optimizes for an adversarial example x′ that maximizes the confidence in any label c ≠ F θ (x), instead of minimizing the confidence in the predicted label:

x′ = arg max x̃∈B p ε (x) max c≠F θ (x) f θ (x̃) c

The resulting adversarial attack finds high-confidence adversarial examples, and thus represents an effective attack against a softmax response selector S SR .

Figure 6: Robust selection (R sel rob ) and robust accuracy (R acc rob ) for CIFAR-10 softmax response (SR) abstain models (F, S SR ), for varying threshold τ ∈ [0, 1) and using the WideResNet-28-10 classifier F by Carmon et al. (2019). Each SR abstain model is evaluated via APGD (Croce & Hein, 2020b) and APGDconf (Equation 17).

Table 4: Robust selection (R sel rob ) and robust accuracy (R acc rob ) of empirical robustness indicator abstain models (F, S ERI ), trained using L ERA (Equation 4) and L DGA (Equation 18).

In the following, we conduct an ablation study over APGD and APGDconf by evaluating the robust selection R sel rob and robust accuracy R acc rob of an SR abstain model (F θ , S SR ) using both attacks. We use the adversarially trained WideResNet-28-10 model by Carmon et al. (2019) (taken from RobustBench (Croce et al., 2020)), trained on CIFAR-10 for ε ∞ = 8/255 perturbations. We then evaluate the classifier as an SR abstain model (F θ , S SR ) with varying threshold τ ∈ [0, 1), and report the robust selection and robust accuracy for varying ℓ ∞ perturbations in Figure 6. Observe that for small perturbations such as ε ∞ = 1/255, APGD and APGDconf are mostly equivalent concerning robust selection and robust accuracy. However, for larger perturbations such as ε ∞ = 4/255, the SR abstain model is significantly less robust to APGDconf than to standard APGD, showing the importance of choosing a suitable adversarial attack. Since high-confidence adversarial examples are generally easier to find at larger perturbation radii, the gap between APGDconf and APGD widens as the perturbation grows.
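The APGDconf objective itself is simple to state in code: given the logits at a candidate point, it is the largest softmax probability assigned to any class other than the clean prediction. This is only the objective, not the full attack, which additionally runs APGD's adaptive step-size ascent on it.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apgdconf_objective(logits, predicted):
    """The quantity APGDconf ascends: the highest softmax confidence
    placed on any label other than the clean prediction F_theta(x)."""
    probs = softmax(logits)
    return max(p for c, p in enumerate(probs) if c != predicted)
```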

A.6 LOSS FUNCTION ABLATION STUDY

In addition to the L ERA loss from Equation 4, we consider an alternative loss formulation for training an empirical robustness indicator abstain model. The formulation is based on the Deep Gamblers loss (Liu et al., 2019), which considers an abstain model (F θ , S) with an explicit abstain class a as a selection mechanism. Since we consider robustness indicator selection, we replace the output probability of the abstain class f θ (x) a with the output probability of the most likely adversarial label. This corresponds to the probability of a sample being non-robust, and thus the probability of abstaining under a robustness indicator selector. As with L ERA , we also add the TRADES loss (Zhang et al., 2019a) to optimize robust accuracy. The resulting loss is then defined as:

L DGA (f θ , (x, y)) = β · L TRADES (f θ , (x, y)) - log( f θ (x) y + max c∈Y\{F θ (x)} f θ (x′) c )

We conduct an ablation study over the two loss functions, L ERA and L DGA , for CIFAR-10 and a ε ∞ = 8/255 TRADES (Zhang et al., 2019a) trained ResNet-50 model. We fine-tune the model for ℓ ∞ perturbations of radii 1/255 and 2/255, using both L ERA and L DGA , training for 50 epochs each and setting the regularization parameter β = 1.0. For each loss variant, we train the base model once without data augmentations and once using the AutoAugment (AA) policy (Cubuk et al., 2018).

D r∧a F θ = {(x, y) ∈ D : ∀x′ ∈ B p ε (x). F θ (x′) = F θ (x) ∧ F θ (x) = y}
D r∧¬a F θ = {(x, y) ∈ D : ∀x′ ∈ B p ε (x). F θ (x′) = F θ (x) ∧ F θ (x) ≠ y}
D ¬r∧a F θ = {(x, y) ∈ D : ∃x′ ∈ B p ε (x). F θ (x′) ≠ F θ (x) ∧ F θ (x) = y}
D ¬r∧¬a F θ = {(x, y) ∈ D : ∃x′ ∈ B p ε (x). F θ (x′) ≠ F θ (x) ∧ F θ (x) ≠ y}

We illustrate this dataset partitioning on the CIFAR-10 (Krizhevsky et al., 2009) dataset. We consider a TRADES (Zhang et al., 2019a) trained ResNet-50 and the WideResNet-28-10 models by Carmon et al. (2019); Gowal et al. (2020) (taken from RobustBench (Croce et al., 2020)), where each model is adversarially pretrained for ε ∞ = 8/255 and then fine-tuned via TRADES to the respective ℓ ∞ threat model illustrated in Table 7. Further, we also consider a standard trained ResNet-50. We then evaluate the robustness and accuracy of each model using 40-step APGD (Croce & Hein, 2020b). Considering Table 7, note that standard adversarial training methods do not necessarily eliminate the occurrence of robust inaccurate samples (x, y) ∈ D r∧¬a F θ , and that the robust inaccuracy generally increases for smaller perturbation regions. Further, we note that while standard trained models have low robust inaccuracy, they also have low overall robustness, resulting in low overall robust accuracy. We also illustrate the robustness-accuracy dataset partitioning on CIFAR-100 (Krizhevsky et al., 2009). We consider a standard trained WideResNet-28-10 and the adversarially trained WideResNet-28-10 by Rebuffi et al. (2021). Again, the model by Rebuffi et al. (2021) was pretrained for ε ∞ = 8/255 perturbations and then TRADES fine-tuned for the respective threat model indicated in Table 8. We again evaluate the robustness-accuracy dataset partitioning for varying ℓ ∞ perturbations using 40-step APGD (Croce & Hein, 2020b), and list the exact size of each data split in Table 8. Notably, we observe that on the model by Rebuffi et al. (2021), 15.24% of all test samples are robust but inaccurate for ε ∞ = 1/255 perturbations, a significantly larger fraction compared to similar models on CIFAR-10.

Table 7: CIFAR-10 robustness-accuracy dataset partitioning.
We consider a TRADES (Zhang et al., 2019a) trained ResNet-50, adversarially trained WideResNet-28-10 models (Carmon et al., 2019; Gowal et al., 2020) , and a standard trained ResNet-50. Adversarially trained models are trained for the respective perturbation region. Each model is evaluated for the indicated ℓ ∞ threat model, using 40-step APGD (Croce & Hein, 2020b) . 
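The four-way robustness-accuracy partition used here can be sketched as follows, given black-box robustness and prediction oracles. In practice, is_robust would be instantiated with an adversarial attack such as APGD; the toy oracles in the test are our illustrative choices.

```python
def partition(dataset, is_robust, predict):
    """Partition a labeled dataset into the four disjoint subsets
    D^{r&a}, D^{r&!a}, D^{!r&a}, D^{!r&!a}, based on whether the classifier
    is robust around each sample (r) and whether it is accurate on it (a)."""
    parts = {"r&a": [], "r&!a": [], "!r&a": [], "!r&!a": []}
    for x, y in dataset:
        key = ("r" if is_robust(x) else "!r") + "&" + \
              ("a" if predict(x) == y else "!a")
        parts[key].append((x, y))
    return parts
```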



Naturally, this assumes that the method used to check robustness can correctly detect the non-robustness, even if it is caused by a single example.

Note that, for a fair evaluation, we use a relatively weak 10-step PGD (Madry et al., 2018) attack during training and a strong 40-step APGD (Croce & Hein, 2020b) for evaluation.

Our method can also be used to train from scratch, in which case a scheduler for β should be introduced.

U [a,b] is the uniform distribution over the interval [a, b].



acc((F, S ERI ), D) = 98.4%, rob((F, S ERI ), D) = 100.0%

Figure 2: Robust accuracy (R acc rob ) and robust inaccuracy (R ¬acc rob ) of existing robust models ( , , , ), and models fine-tuned with our loss ( , ). Our approach consistently reduces robust inaccuracy across various datasets, existing models and different regularization levels β.

Figure 3: Comparison of different abstain approaches including existing robust classifiers TRADES RI ( , ), MART RI ( ), MMA RI ( ), classifiers fine-tuned with our proposed loss ERA RI ( , ), selection network ( , ), and softmax response ( , ) abstain models. The higher R sel rob and R acc rob , the better (top right corner is optimal). Observe that the Pareto front of our approach ( , ) generally dominates the results of all baselines, significantly improving robust selection and robust accuracy.

);Wang et al. (2019) with robustness indicator abstain: MART RI ( ), and MMA RI ( ).

For CIFAR-10, B ∞ 1/255 , and Gowal et al. (2020) model, we increase robust accuracy by +1.06% and robust selection by +1.61% (training without data augmentations).

Figure 4: 2-compositional natural (R nat ) and robust accuracy (R acc rob ) for ERA RI ( , ), TRADES RI ( , ), MART RI ( ), MMA RI ( ), ACE-COLT SN , ACE-IBP SN ( , ), and TRADES SR ( , ) models. The core models used in the compositional architectures are listed in Appendix A.10. We can see that the Pareto front of our method strictly improves over the prior work in the most important region, significantly improving model robustness while the model accuracy does not decrease.

(a) Base Mapillary Traffic Sign Dataset (MTSD). The ground truth bounding boxes are visualized in green.

(b) Preprocessed Mapillary Traffic Sign Dataset (MTSD).

Figure 5: Illustration of Mapillary Traffic Sign Dataset (MTSD) samples. The base dataset consists of street-level images that include annotated ground truth bounding boxes locating the traffic signs (a). We convert the dataset to a classification task by cropping to the ground truth bounding boxes (b).

to generate adversarial examples during training, and set β T RADES = 6.0 for the loss term L rob = L TRADES . MMA We use MMA (Ding et al., 2018) as an additional baseline to compare our models against. In our evaluations, we use the d max = 12 /255 trained WideResNet-28-10 published by Ding et al. (2018), and fine-tune it using MMA with d max = 4 /255 for 50 epochs. We decided to fine-tune with d max = 4 /255, since we typically evaluate smaller perturbation regions (ε ∞ ∈ { 1 /255, 2 /255, 4 /255}), and since Ding et al. (2018) claim that d max should usually be set larger than ε ∞ in standard adversarial training. We fine-tune for 50 epochs, using the same hyperparameters as Ding et al. (2018), and without using data augmentations.

Figure 9: Natural (R nat ) and robust accuracy (R acc rob ) for 2-compositional ERA RI models ( , ), and 2-compositional TRADES RI ( , ), MART RI ( ), and MMA RI ( ) models. Further, we also consider 2-compositional ACE-COLT SN , ACE-IBP SN ( , ), and 2-compositional TRADES SR ( , ) models. The core models used in the compositional architectures are listed in Appendix A.10.

Decision regions for models trained via standard training L std , adversarial training L TRADES , and our abstain training L ERA .

Comparison of existing robust models fine-tuned with L noise and L CRA (ours).

Robust accuracy (R acc rob ) and robust inaccuracy (R ¬acc rob ) of the B ∞ 2/255 L ERA (β = 1.0) fine-tuned Gowal et al. (2020) model, evaluated using both 40-step APGD (Croce & Hein, 2020b) and AutoAttack (Croce & Hein, 2020b).

Recall from Section 5 that, given an abstain model (F θ , S) and a threat model B p ε (x) := {x′ : ||x′ - x|| p ≤ ε}, (F θ , S) is robustly selecting an input x if the selector S selects all valid perturbations x′ ∈ B p ε (x).

Improvements of 2-compositional architectures using models F robust trained with our method over non-compositional models trained to optimize natural accuracy only (Appendix A.10).

Natural (R nat ) and adversarial accuracy (R acc rob ) of standard trained core models, used in 2-compositional architectures in Section 7.3 and Appendix A.9.

Consider a dataset D = {(x i , y i )} N i=1 on which we evaluate the classifier F θ : X → Y. Based on the robustness and accuracy of the classifier F θ , we can partition D into four disjoint subsets D = {D r∧a F θ , D ¬r∧a F θ , D r∧¬a F θ , D ¬r∧¬a F θ }.

CIFAR-100 robustness-accuracy dataset partitioning. We consider a standard trained WideResNet-28-10 and the adversarially trained WideResNet-28-10 byRebuffi et al. (2021), trained for the respective perturbation region considered in each evaluation. Each model is evaluated for the indicated ℓ ∞ threat model, using 40-step APGD(Croce & Hein, 2020b).


Robust accuracy (R acc rob ) and robust inaccuracy (R ¬acc rob ) of existing robust models ( , ) fine-tuned with our proposed loss ( , ). Further, we also show models fine-tuned via MART (Wang et al., 2019) ( ) and MMA (Ding et al., 2018) ( ). Our approach consistently reduces the number of robust inaccurate samples across various datasets, existing models and at different regularization levels β.

We show the robust accuracy and the robust selection of the resulting robustness indicator abstain models in Table 4. Observe that for all experiments, L ERA trained models achieve consistently higher robust accuracy and higher robust selection compared to L DGA trained models. For instance, when training for ε ∞ = 1/255 perturbations without data augmentations, L ERA achieves +1.71% higher robust accuracy and +1.33% higher robust selection compared to L DGA . Similarly, when training with AutoAugment, L ERA achieves +0.91% higher robust accuracy and +2.72% higher robust selection. Similar results hold for ε ∞ = 2/255 perturbations.

A.7 ADDITIONAL EXPERIMENTS ON REDUCING ROBUST INACCURACY

In this section, we present additional experiments on reducing robust inaccuracy for empirical robustness. Similar to the results in Figure 2, we show the robust accuracy (R acc rob ) and robust inaccuracy (R ¬acc rob ) of different existing models fine-tuned with ( ) and without ( ) data augmentations in Figure 7. At the same time, Figure 7 also shows the same models fine-tuned with our proposed loss, with ( ) and without ( ) data augmentations. We again observe that our approach achieves consistently lower robust inaccuracy compared to existing robust models. For example, on CIFAR-10 and for B ∞ 1/255 , the model from Carmon et al. (2019) achieves 91.7% robust accuracy but also 1.8% robust inaccuracy. Using our loss L ERA and varying the regularization parameter β, we can obtain a number of models that reduce robust inaccuracy to 0.14% while still achieving a robust accuracy of 75.8%.

A.8 ADDITIONAL EXPERIMENTS ON USING ROBUSTNESS TO ABSTAIN

In this section, we present additional experiments comparing different abstain approaches for empirical robustness. We compare robustness indicator abstain models (F, S RI ) using existing robust classifiers TRADES RI and classifiers fine-tuned with our proposed loss ERA RI . Further, we again consider softmax response and selection network abstain models, as described in Section 7.2. As in Section 7.2, we use the robust selection (R sel rob ) and the ratio of non-abstained samples that are robust and accurate (R acc rob ) as our evaluation metrics. We show the comparison of the different abstain models in Figure 8. Similar to the results in Section 7.2, we again find that, as designed, our approach consistently improves robust accuracy. For instance, consider the CIFAR-10 Zhang et al. (2019a) model at ε ∞ = 1/255, trained without data augmentations ( ). The ERA RI model with the highest robust selection R sel rob improves robust accuracy by +2.39% at the expense of a -3.44% decrease in robust selection. This tradeoff is close to optimal, since our approach increases robust accuracy by correctly abstaining on mispredicted samples, so an increase in robust accuracy results in a corresponding decrease in robust selection. Further, we again observe that by varying the regularization parameter β, we can obtain a Pareto front of optimal solutions. Considering the CIFAR-10 Zhang et al. (2019a) model at ε ∞ = 1/255, trained with data augmentations ( ), we can improve the robust accuracy up to 99.75%, an increase of +4.38% compared to the corresponding TRADES RI model ( ). However, this comes at the expense of a disproportionally large decrease of -42.27% in robust selection. We observe similar results for other models, datasets, and perturbation regions, shown in Figure 8. Further, we again note that our approach mostly improves both robust selection and robust accuracy when compared to softmax response and selection network abstain models.

A.9 ADDITIONAL EXPERIMENTS ON BOOSTING ROBUSTNESS WITHOUT ACCURACY LOSS

In this section, we present additional results on combining abstain models with state-of-the-art models trained to achieve high natural accuracy. As in Section 7.3, we put the abstain models trained so far in 2-composition (Section 6) with the standard trained core models discussed in Appendix A.10. We show the natural (R nat ) and adversarial accuracy (R acc rob ) of the resulting 2-compositional architectures in Figure 9. We again observe that 2-compositional architectures using models trained by our method ( , ) improve over existing methods that solely optimize for robust accuracy ( , ). Further, our method mostly improves both the natural and robust accuracy compared to 2-compositional architectures using softmax response ( , ) or a selection network ( , ) to abstain. For example, on SBB and the Zhang et al. (2019a) model at ε ∞ = 1/255, our approach ( ) improves natural accuracy by +0.68%, while decreasing the robust accuracy by only -1.54%. Further, we show that 2-compositional architectures using models trained by our method achieve significantly higher robustness and mostly equivalent overall accuracy compared to state-of-the-art non-compositional models trained for high natural accuracy. In Table 5, we show the natural (R nat ) and adversarial accuracy (R acc rob ) of our 2-compositional models and illustrate the accuracy improvement over the standard trained models discussed in Appendix A.10. For instance, consider CIFAR-10 at ε ∞ = 2/255 and the 2-compositional architecture using the Gowal et al. (2020) model as the robust model F robust . Our model improves the robust accuracy by +75.3% and the natural accuracy by +0.1%, compared to the standard trained model by Zhao et al. (2020). Similar results hold for other models, datasets, and perturbation regions.

A.10 CORE MODELS

Recall from Section 6 that an abstain model (F, S) can be enhanced by a core model F core , which makes a prediction on all abstained samples, resulting in 2-compositional architectures. In Section 7.3, we presented an evaluation of 2-compositional architectures, where we used state-of-the-art standard trained models as core models. In Table 6, we show the natural and adversarial accuracy of core models used in Section 7.3, for varying ℓ ∞ perturbation regions, where we use 40-step APGD (Croce & Hein, 2020b) to evaluate robustness.

