ENHANCING CERTIFIED ROBUSTNESS OF SMOOTHED CLASSIFIERS VIA WEIGHTED MODEL ENSEMBLING

Abstract

Randomized smoothing has achieved state-of-the-art certified robustness against l2-norm adversarial attacks. However, how to find an optimal base classifier for randomized smoothing remains an open problem. In this work, we employ a Smoothed WEighted ENsembling (SWEEN) scheme to improve the performance of randomized smoothed classifiers. We show that, owing to the generality of ensembling, SWEEN can help achieve optimal certified robustness. Furthermore, our theoretical analysis proves that the optimal SWEEN model can be obtained through training under mild assumptions. We also develop an adaptive prediction algorithm to reduce the prediction and certification cost of SWEEN models. Extensive experiments show that SWEEN models outperform the upper envelope of their corresponding candidate models by a large margin. Moreover, SWEEN models constructed from a few small models can achieve performance comparable to a single large model with a notable reduction in training time.

1. INTRODUCTION

Deep neural networks have achieved great success in image classification tasks. However, they are vulnerable to adversarial examples, which are small imperceptible perturbations of the original inputs that cause misclassification (Biggio et al., 2013; Szegedy et al., 2014). To tackle this problem, researchers have proposed various defense methods to train classifiers robust to adversarial perturbations. These defenses can be roughly categorized into empirical defenses and certified defenses. One of the most successful empirical defenses is adversarial training (Kurakin et al., 2017; Madry et al., 2018), which optimizes the model by minimizing the loss over adversarial examples generated during training. Empirical defenses produce models that are robust to certain attacks but come without theoretical guarantees; most are heuristic and have subsequently been broken by more sophisticated adversaries (Carlini & Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018; Tramer et al., 2020). Certified defenses, either exact or conservative, were introduced to mitigate this deficiency of empirical defenses. In the context of lp norm-bounded perturbations, exact methods report whether an adversarial example exists within an lp ball of radius r centered at a given input x. Exact methods are usually based on Satisfiability Modulo Theories (Katz et al., 2017; Ehlers, 2017) or mixed-integer linear programming (Lomuscio & Maganti, 2017; Fischetti & Jo, 2017), which are computationally inefficient and not scalable (Tjeng et al., 2019).
Conservative methods are more computationally efficient, but might mistakenly flag a safe data point as vulnerable to adversarial examples (Raghunathan et al., 2018a; Wong & Kolter, 2018; Wong et al., 2018; Gehr et al., 2018; Mirman et al., 2018; Weng et al., 2018; Zhang et al., 2018; Raghunathan et al., 2018b; Dvijotham et al., 2018b; Singh et al., 2018; Wang et al., 2018b; Salman et al., 2019b; Croce et al., 2019; Gowal et al., 2018; Dvijotham et al., 2018a; Wang et al., 2018a). However, neither type of defense scales to practical networks that perform well on modern machine learning problems (e.g., the ImageNet (Deng et al., 2009) classification task).

Recently, a new certified defense technique called randomized smoothing has been proposed (Lecuyer et al., 2019; Cohen et al., 2019). A (randomized) smoothed classifier is constructed from a base classifier, typically a deep neural network. It outputs the most probable class given by its base classifier under random noise perturbations of the input. Randomized smoothing is scalable because it is agnostic to the base classifier's architecture, and it has achieved state-of-the-art certified l2-robustness. In theory, randomized smoothing can be applied to any classifier. However, naively applying randomized smoothing to standard-trained classifiers leads to poor robustness results. It is still not wholly
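To make the construction above concrete, the following is a minimal Monte Carlo sketch of a smoothed classifier's prediction rule, with a SWEEN-style weighted ensemble standing in as the base classifier. The toy two-class models, their weights, and all function names here are illustrative assumptions, not the paper's actual architecture or certification procedure (which additionally requires statistical confidence bounds, as in Cohen et al., 2019):

```python
import numpy as np

def weighted_ensemble(models, weights):
    """SWEEN-style base classifier (sketch): a convex combination of the
    candidate models' predicted class probabilities. `models` and
    `weights` are hypothetical stand-ins for the paper's candidates."""
    def base(batch):
        probs = sum(w * m(batch) for m, w in zip(models, weights))
        return probs.argmax(axis=1)  # hard label per noisy sample
    return base

def smoothed_predict(base, x, sigma=0.25, n=1000, rng=None):
    """Monte Carlo estimate of the smoothed classifier: the most frequent
    class of base(x + eps) over n draws of eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = x[None] + rng.normal(scale=sigma, size=(n,) + x.shape)
    classes, counts = np.unique(base(noisy), return_counts=True)
    return int(classes[counts.argmax()])

# Two toy binary "models": sigmoid of the mean pixel, at two temperatures.
def model_a(b):
    p = 1.0 / (1.0 + np.exp(-b.reshape(len(b), -1).mean(axis=1)))
    return np.stack([1 - p, p], axis=1)

def model_b(b):
    p = 1.0 / (1.0 + np.exp(-2.0 * b.reshape(len(b), -1).mean(axis=1)))
    return np.stack([1 - p, p], axis=1)

base = weighted_ensemble([model_a, model_b], [0.6, 0.4])
x = np.full((4, 4), 0.5)          # input with clearly positive mean
print(smoothed_predict(base, x))  # -> 1
```

The key point the sketch illustrates is that smoothing only queries the base classifier as a black box, which is why any weighted combination of candidate models can be dropped in without changing the smoothing procedure itself.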

