CERTIFY OR PREDICT: BOOSTING CERTIFIED ROBUSTNESS WITH COMPOSITIONAL ARCHITECTURES

Abstract

A core challenge with existing certified defense mechanisms is that, while they improve certified robustness, they also tend to drastically decrease natural accuracy, making these methods difficult to use in practice. In this work, we propose a new architecture which addresses this challenge and enables one to boost the certified robustness of any state-of-the-art deep network, while controlling the overall accuracy loss, without requiring retraining. The key idea is to combine this model with a (smaller) certified network, where at inference time an adaptive selection mechanism decides which network processes the input sample. The approach is compositional: one can combine any pair of state-of-the-art (e.g., EfficientNet or ResNet) and certified networks, without restriction. The resulting architecture enables much higher natural accuracy than previously possible with certified defenses alone, while substantially boosting the certified robustness of deep networks. We demonstrate the effectiveness of this adaptive approach on a variety of datasets and architectures. For instance, on CIFAR-10 with an ℓ∞ perturbation of 2/255, we are the first to obtain a high natural accuracy (90.1%) with non-trivial certified robustness (27.5%). Notably, prior state-of-the-art methods incur a substantial drop in accuracy at a similar level of certified robustness.

1. INTRODUCTION

Most recent defenses against adversarial examples have been broken by stronger and more adaptive attacks (Athalye et al., 2018; Tramer et al., 2020), highlighting the importance of investigating certified defenses with suitable robustness guarantees (Raghunathan et al., 2018; Wong & Kolter, 2018; Zhang et al., 2020; Balunović & Vechev, 2020). While there has been much progress in developing new certified defenses, a fundamental roadblock to their practical adoption is that they tend to produce networks with an unsatisfying natural accuracy. In this work we propose a novel architecture which brings certified defenses closer to practical use: it enables boosting the certified robustness of state-of-the-art deep neural networks without incurring significant accuracy loss and without requiring retraining.

Our proposed architecture is compositional and consists of three components: (i) a core-network with high natural accuracy, (ii) a certification-network with high certifiable robustness (which need not have high accuracy), and (iii) a selection mechanism that adaptively decides which of the two networks should process the input sample. The benefit of this architecture is that we can plug in any state-of-the-art deep neural network as the core-network and any certified defense for the certification-network, thus benefiting from any future advances in standard training and certified defenses.

A key challenge in certifying the robustness of a decision made by the composed architecture is obtaining a certifiable selection mechanism. To this end, we propose two different selection mechanisms, one based on an auxiliary selection-network and another based on entropy, and design effective ways to certify both. Experimentally, we demonstrate the promise of this architecture: we are able to train a model with much higher natural accuracy than models trained using prior certified defenses, while obtaining non-trivial certified robustness. For example, on the challenging CIFAR-10 dataset with an ℓ∞ perturbation of 2/255, we obtain 90.1% natural accuracy and a certified robustness of 27.5%. On the same task, prior approaches cannot obtain comparable natural accuracy at any non-trivial level of certified robustness.
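The selection-based composition described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy networks, the entropy-based routing rule, and the `threshold` parameter are hypothetical stand-ins, and certification of the selection decision itself (central to the actual method) is not shown.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of a probability vector; low entropy = confident prediction."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def ace_predict(x, core_net, cert_net, threshold):
    """Compositional inference: route x to the certification-network when the
    (entropy-based) selection mechanism deems it confident enough, otherwise
    fall back to the high-accuracy core-network."""
    cert_probs = cert_net(x)
    if entropy(cert_probs) <= threshold:  # selection mechanism picks cert-net
        return np.argmax(cert_probs), "certification-network"
    return np.argmax(core_net(x)), "core-network"

# toy stand-ins for the two networks (softmax outputs over 3 classes)
core_net = lambda x: np.array([0.2, 0.5, 0.3])
cert_net = lambda x: np.array([0.98, 0.01, 0.01])

label, used = ace_predict(np.zeros(4), core_net, cert_net, threshold=0.5)
```

Here the certification-network is confident (low entropy), so it handles the input; on inputs where it is uncertain, the core-network is used instead, preserving natural accuracy.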

Main contributions Our main contributions are:

• A new architecture, called ACE (short for Architecture for Certification), which boosts the certified robustness of networks with high natural accuracy (e.g., EfficientNet).
• Methods to train our newly proposed architecture and to certify the robustness of the entire composed network, including certification of the selection mechanism.
• An experimental evaluation on the CIFAR-10, TinyImageNet and ImageNet200 datasets, demonstrating the promise of ACE: at the same non-trivial certified robustness levels, we achieve significantly higher accuracies than prior work.

As mentioned earlier, a key challenge with certified defenses is that, in order to gain certified robustness, they tend to incur a drastic drop in natural accuracy. In parallel to certified defenses, there has also been interest in certifying already trained models (Katz et al., 2017; Tjeng et al., 2017; Gehr et al., 2018; Weng et al., 2018; Bunel et al., 2018; Wang et al., 2018a; Singh et al., 2019). While these methods were initially focused mostly on ℓp robustness, they (as well as ours) can be naturally extended to other notions of robustness, such as geometric (Balunović et al., 2019) or semantic (Mohapatra et al., 2020) perturbations. A line of work that weakens deterministic guarantees so as to scale to larger networks is randomized smoothing, which offers probabilistic guarantees (Lecuyer et al., 2018; Cohen et al., 2019; Salman et al., 2019a). While interesting, this technique incurs overhead at inference time due to additional sampling, and generalizing smoothing to richer transformations (e.g., geometric) is non-trivial (Fischer et al., 2020). In contrast, our work handles large networks while providing deterministic guarantees and, because of its compositional nature, directly benefits from any advances in certification and certified defenses with richer perturbations.
Our proposed architecture is partially inspired by prior work on designing custom architectures for dynamic routing in neural networks (Teerapittayanon et al., 2016; Bolukbasi et al., 2017; McGill & Perona, 2017; Wang et al., 2018b). While the main goal of those architectures is to speed up inference, our observation is that similar ideas are applicable to the problem of enhancing the certifiable robustness of existing neural networks.

3. BACKGROUND

We now present the necessary background needed to define our method.

Adversarial Robustness We define the adversarial robustness of a model h as the requirement that h classifies all inputs in an ℓp-norm ball B_p^{ε_p}(x) of radius ε_p around the sample x to the same class:

    argmax_j h(x)_j = argmax_j h(x')_j,   ∀ x' ∈ B_p^{ε_p}(x) := {x' = x + η | ‖η‖_p ≤ ε_p}   (1)

In this work we focus on an ℓ∞-based threat model and use the notation ε_p to indicate the upper bound on the ℓp-norm of admissible perturbations. The robust accuracy of a network is derived from this definition as the probability that an unperturbed sample from the test distribution is classified correctly and Equation 1 holds. As it is usually infeasible to compute exact robustness, we define certifiably robust accuracy (also certifiable accuracy or certifiable robustness) as a provable lower
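As a concrete (non-certified) illustration of this definition, the sketch below checks Equation 1 empirically by sampling perturbations from the ℓ∞ ball. The toy linear classifier is hypothetical, and note that random sampling can only falsify robustness; an actual certificate must prove the property for every point in the ball.

```python
import numpy as np

def empirically_robust(h, x, eps, n_samples=100, rng=None):
    """Sample perturbations eta with ||eta||_inf <= eps and check that the
    predicted class never changes. A passing check does NOT certify
    robustness; certification requires a proof over the whole ball."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = np.argmax(h(x))
    for _ in range(n_samples):
        eta = rng.uniform(-eps, eps, size=x.shape)  # ||eta||_inf <= eps
        if np.argmax(h(x + eta)) != base:
            return False  # found a concrete adversarial example
    return True

# toy linear classifier with a large logit margin at x
W = np.array([[3.0, 0.0],
              [0.0, 1.0]])
h = lambda x: W @ x
x = np.array([1.0, 0.2])
```

With a small radius such as ε = 2/255 the margin at x cannot be overcome, while a large radius quickly yields a class flip.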



We release our code as open source: https://github.com/eth-sri/ACE

2. RELATED WORK

There has been much recent work on certified defenses, that is, training neural networks with provable robustness guarantees. These include methods based on semidefinite relaxations (Raghunathan et al., 2018), linear relaxations and duality (Wong & Kolter, 2018; Wong et al., 2018; Xu et al., 2020), abstract interpretation (Mirman et al., 2018), and interval bound propagation (Gowal et al., 2018). The three most recent advances are COLT (Balunović & Vechev, 2020), based on convex layer-wise adversarial training; CROWN-IBP (Zhang et al., 2020), based on a combination of linear relaxations (Zhang et al., 2018) and interval propagation; and LiRPA (Xu et al., 2020), which scales to problems with many more classes by directly bounding the cross-entropy loss instead of logit margins.

