ENHANCING CERTIFIED ROBUSTNESS OF SMOOTHED CLASSIFIERS VIA WEIGHTED MODEL ENSEMBLING

Abstract

Randomized smoothing has achieved state-of-the-art certified robustness against l_2-norm adversarial attacks. However, it remains an open question how to find the optimal base classifier for randomized smoothing. In this work, we employ a Smoothed WEighted ENsembling (SWEEN) scheme to improve the performance of randomized smoothed classifiers. We show the ensembling generality of SWEEN: it can help achieve optimal certified robustness. Furthermore, theoretical analysis proves that the optimal SWEEN model can be obtained from training under mild assumptions. We also develop an adaptive prediction algorithm to reduce the prediction and certification cost of SWEEN models. Extensive experiments show that SWEEN models outperform the upper envelope of their corresponding candidate models by a large margin. Moreover, SWEEN models constructed from a few small models can achieve performance comparable to a single large model with a notable reduction in training time.

1. INTRODUCTION

Deep neural networks have achieved great success in image classification tasks. However, they are vulnerable to adversarial examples: small, imperceptible perturbations of the original inputs that cause misclassification (Biggio et al., 2013; Szegedy et al., 2014). To tackle this problem, researchers have proposed various defense methods to train classifiers robust to adversarial perturbations. These defenses can be roughly categorized into empirical defenses and certified defenses. One of the most successful empirical defenses is adversarial training (Kurakin et al., 2017; Madry et al., 2018), which optimizes the model by minimizing the loss over adversarial examples generated during training. Empirical defenses produce models robust to certain attacks without a theoretical guarantee. Most empirical defenses are heuristic and subsequently broken by more sophisticated adversaries (Carlini & Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018; Tramer et al., 2020). Certified defenses, either exact or conservative, are introduced to mitigate this deficiency of empirical defenses. In the context of l_p norm-bounded perturbations, exact methods report whether an adversarial example exists within an l_p ball of radius r centered at a given input x. Exact methods are usually based on Satisfiability Modulo Theories (Katz et al., 2017; Ehlers, 2017) or mixed-integer linear programming (Lomuscio & Maganti, 2017; Fischetti & Jo, 2017), which are computationally inefficient and not scalable (Tjeng et al., 2019).
Conservative methods are more computationally efficient, but might mistakenly flag a safe data point as vulnerable to adversarial examples (Raghunathan et al., 2018a; Wong & Kolter, 2018; Wong et al., 2018; Gehr et al., 2018; Mirman et al., 2018; Weng et al., 2018; Zhang et al., 2018; Raghunathan et al., 2018b; Dvijotham et al., 2018b; Singh et al., 2018; Wang et al., 2018b; Salman et al., 2019b; Croce et al., 2019; Gowal et al., 2018; Dvijotham et al., 2018a; Wang et al., 2018a). However, neither type of defense scales to the practical networks that perform well on modern machine learning problems (e.g., the ImageNet (Deng et al., 2009) classification task).

Recently, a new certified defense technique called randomized smoothing has been proposed (Lecuyer et al., 2019; Cohen et al., 2019). A (randomized) smoothed classifier is constructed from a base classifier, typically a deep neural network: it outputs the most probable class given by its base classifier under random noise perturbations of the input. Randomized smoothing is scalable because it is agnostic to the architecture of the base classifier, and it has achieved state-of-the-art certified l_2-robustness. In theory, randomized smoothing can be applied to any classifier. However, naively applying randomized smoothing to standard-trained classifiers leads to poor robustness results. It remains an open question how a base classifier should be trained so that the corresponding smoothed classifier has good robustness properties. Recently, Salman et al. (2019a) employ adversarial training to train base classifiers and substantially improve the performance of randomized smoothing, which indicates that techniques originally proposed for empirical defenses can be useful in finding good base classifiers for randomized smoothing. In this paper, we take a step towards finding suitable base models for randomized smoothing via model ensembling. The idea of model ensembling has been used in various empirical defenses against adversarial examples and shows promising results for robustness (Liu et al., 2018; Strauss et al., 2018; Pang et al., 2019; Wang et al., 2019; Meng et al., 2020; Sen et al., 2020). Moreover, an ensemble can combine the strengths of its candidate models¹ to achieve superior clean accuracy (Hansen & Salamon, 1990; Krogh & Vedelsby, 1994). Thus, we believe ensembling several smoothed models can help improve both robustness and accuracy. Specifically for randomized smoothing, the smoothing operator commutes with the ensembling operator: ensembling several smoothed models is equivalent to smoothing an ensembled base model.
This property makes the combination suitable and efficient. Therefore, we directly construct an ensembled base model by taking several pre-trained models as candidates and optimizing the ensemble weights for randomized smoothing. We refer to the final model as a Smoothed WEighted ENsembling (SWEEN) model. Moreover, SWEEN does not restrict how individual candidate classifiers are trained, and is thus compatible with most previously proposed training algorithms for randomized smoothing. Our contributions are summarized as follows:

1. We propose SWEEN to substantially improve the performance of smoothed models. Theoretical analysis shows the ensembling generality and an optimization guarantee: SWEEN can achieve optimal certified robustness w.r.t. the defined γ-robustness index, which extends previously proposed criteria of certified robustness (Lemma 1), and SWEEN can be easily trained to a near-optimal risk with a surrogate loss (Theorem 2).

2. We develop an adaptive prediction algorithm for the weighted ensembling, which effectively reduces the prediction and certification cost of the smoothed ensemble classifier.

3. We evaluate our proposed method through extensive experiments. On all tasks, SWEEN models consistently outperform the upper envelopes of their respective candidate models in terms of approximated certified accuracy by a large margin. In addition, SWEEN models can achieve performance comparable or superior to a large individual model using a few small candidates, with a notable reduction in total training time.
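As a sanity check, the commutativity of smoothing and weighted ensembling can be verified numerically. The sketch below uses toy linear-softmax classifiers with Monte Carlo smoothing over shared Gaussian noise samples; the matrices, input, noise level, and ensemble weights are all illustrative assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "base classifiers": softmax over random linear logits (illustrative).
d, k = 5, 3
W1, W2 = rng.normal(size=(k, d)), rng.normal(size=(k, d))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

f1 = lambda x: softmax(x @ W1.T)
f2 = lambda x: softmax(x @ W2.T)

x = rng.normal(size=d)
sigma, n = 0.5, 10000
noise = rng.normal(scale=sigma, size=(n, d))  # shared Gaussian noise samples
w = np.array([0.7, 0.3])                      # ensemble weights

# (a) Smooth each candidate model, then ensemble the smoothed outputs.
smoothed = [f(x + noise).mean(axis=0) for f in (f1, f2)]
ens_of_smoothed = w[0] * smoothed[0] + w[1] * smoothed[1]

# (b) Ensemble the base models first, then smooth the ensemble.
base_ens = lambda z: w[0] * f1(z) + w[1] * f2(z)
smoothed_ens = base_ens(x + noise).mean(axis=0)

# By linearity of expectation the two coincide on shared noise samples.
assert np.allclose(ens_of_smoothed, smoothed_ens)
```

The equality is exact (up to floating-point rounding) because the Monte Carlo average and the weighted sum are both linear operators, mirroring the commutativity argument in the text.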

2. RELATED WORK

In the past few years, numerous defenses have been proposed to build classifiers robust to adversarial examples. Our work mainly involves randomized smoothing and model ensembling.

Randomized smoothing Randomized smoothing constructs a smoothed classifier from a base classifier via convolution between the input distribution and a certain noise distribution. It was first proposed as a heuristic defense (Liu et al., 2018; Cao & Gong, 2017). Lecuyer et al. (2019) first prove robustness guarantees for randomized smoothing utilizing tools from differential privacy. Subsequently, a stronger robustness guarantee is given by Li et al. (2018). Cohen et al. (2019) provide a tight robustness bound for isotropic Gaussian noise in the l_2 robustness setting. The theoretical properties of randomized smoothing in various norm and noise distribution settings have been further discussed in the literature (Blum et al., 2020; Kumar et al., 2020; Yang et al., 2020; Lee et al., 2019; Teng et al., 2019; Zhang et al., 2020). Recently, a series of works (Salman et al., 2019a; Zhai et al., 2020) develop practical algorithms to train a base classifier for randomized smoothing. Our work improves the performance of smoothed classifiers via weighted ensembling of pre-trained base classifiers.

Model ensembling Model ensembling has been widely studied and applied in machine learning as a technique to improve the generalization performance of the model (Hansen & Salamon, 1990; Krogh & Vedelsby, 1994). Krogh & Vedelsby (1994) show that ensembles constructed from accurate

¹ In this paper, "candidate model" and "candidate" refer to an individual model used in an ensemble. The term "base model" refers to a model to which randomized smoothing applies.


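For concreteness, the tight l_2 certificate of Cohen et al. (2019) mentioned in the related work above can be sketched in a few lines of stdlib Python; the probability values in the usage example are illustrative, not taken from the paper:

```python
from statistics import NormalDist

def certified_radius(pA_low: float, pB_high: float, sigma: float) -> float:
    """Tight l2 certified radius of Cohen et al. (2019):
        R = sigma/2 * (Phi^{-1}(pA_low) - Phi^{-1}(pB_high)),
    where pA_low / pB_high are confidence bounds on the top-two class
    probabilities of the smoothed classifier under Gaussian noise with
    standard deviation sigma. Returns 0.0 when no certificate holds."""
    if pA_low <= pB_high:
        return 0.0
    ppf = NormalDist().inv_cdf  # standard Gaussian quantile function
    return sigma / 2 * (ppf(pA_low) - ppf(pB_high))

# With the common one-sided bound pB_high = 1 - pA_low, this reduces to
# R = sigma * Phi^{-1}(pA_low).
print(round(certified_radius(0.9, 0.1, sigma=0.5), 4))  # 0.6408
```

The radius grows with sigma and with the margin between the top-two smoothed class probabilities, which is why training the base classifier well (the focus of this paper) directly enlarges the certificate.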