ON THE CERTIFIED ROBUSTNESS FOR ENSEMBLE MODELS AND BEYOND

Anonymous

Abstract

Recent studies show that deep neural networks (DNNs) are vulnerable to adversarial examples, which aim to mislead DNNs into making arbitrarily incorrect predictions. To defend against such attacks, both empirical and theoretical defense approaches have been proposed for single ML models. In this work, we aim to explore and characterize the robustness conditions for ensemble ML models. We prove that diversified gradients and large confidence margins are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We also show that, based on these conditions, an ensemble model can achieve higher certified robustness than a single base model. To the best of our knowledge, this is the first work providing tight conditions for ensemble robustness. Inspired by our analysis, we propose lightweight Diversity Regularized Training (DRT) for ensemble models. Following the sufficient and necessary conditions, we derive the certified robustness of DRT-based ensembles such as the standard Weighted Ensemble and the Max-Margin Ensemble. In addition, to efficiently calculate the model-smoothness, we leverage adapted randomized model smoothing to obtain the certified robustness for different ensembles in practice. The derived certified robustness of these ensembles, in turn, verifies the necessity of DRT. To compare different ensembles, we prove that when the adversarial transferability among base models is high, the Max-Margin Ensemble can achieve higher certified robustness than the Weighted Ensemble, and vice versa when the transferability is low. Extensive experiments show that ensemble models trained with DRT achieve state-of-the-art certified robustness under various settings. Our work sheds light on future analysis of robust ensemble models.

1. INTRODUCTION

Deep neural networks (DNNs) have been widely applied in various applications, such as image classification (Krizhevsky, 2012; He et al., 2016), face recognition (Taigman et al., 2014; Sun et al., 2014), and natural language processing (Vaswani et al., 2017; Devlin et al., 2019). However, it is well known that DNNs are vulnerable to adversarial examples (Szegedy et al., 2013; Carlini & Wagner, 2017; Xiao et al., 2018), which raises great concerns especially when they are deployed in safety-critical applications such as autonomous driving and facial recognition. To defend against such attacks, several empirical defenses have been proposed (Papernot et al., 2016b; Buckman et al., 2018; Madry et al., 2018); however, many of them have been broken again by strong adaptive attackers (Athalye et al., 2018; Tramer et al., 2020). On the other hand, certified defenses (Wong & Kolter, 2018; Cohen et al., 2019) have been proposed to provide certified robustness guarantees for given ML models, so that no attack can break the model under certain conditions. For instance, randomized smoothing has been proposed as an effective defense providing certified robustness (Lecuyer et al., 2019; Li et al., 2019; Cohen et al., 2019; Yang et al., 2020). Compared with other certified robustness approaches such as linear bound propagation (Weng et al., 2018; Mirman et al., 2018) and interval bound propagation (Gowal et al., 2019), randomized smoothing provides a way to smooth a given DNN efficiently and does not depend on the neural network architecture. However, existing defenses mainly focus on the robustness of a single ML model, and it is unclear whether an ensemble ML model could provide additional robustness. In this work, we aim to characterize the conditions for a robust ensemble and answer this question from both theoretical and empirical perspectives.
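Since randomized smoothing underpins the certification used later in the paper, a minimal sketch of its prediction-and-certification procedure (in the spirit of Cohen et al., 2019) may help; the function name and toy classifier below are illustrative only, and a real certification uses rigorous confidence bounds (e.g., Clopper-Pearson) rather than the plain empirical frequency used here.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict_and_radius(f, x, sigma=0.25, n=1000, seed=0):
    """Monte Carlo approximation of a randomized-smoothing prediction and
    its L2 certified radius (sketch only, no statistical confidence bound).

    f: base classifier mapping an input vector to an integer class id.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n):
        c = f(x + sigma * rng.standard_normal(x.shape))
        counts[c] = counts.get(c, 0) + 1
    # Top class and its empirical probability under Gaussian noise.
    top = max(counts, key=counts.get)
    p_a = counts[top] / n
    if p_a <= 0.5:
        return top, 0.0  # abstain: no positive radius can be certified
    # Certified L2 radius R = sigma * Phi^{-1}(p_a), using p_b <= 1 - p_a.
    return top, sigma * norm.ppf(p_a)

# Toy linear "classifier": class 1 iff the mean coordinate is positive.
f = lambda z: int(z.mean() > 0)
label, radius = smoothed_predict_and_radius(f, np.full(4, 0.1), sigma=0.25)
```

The certified radius grows with the smoothed model's confidence `p_a` in the top class, which is one way to see why large confidence margins matter for robustness.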
In particular, we analyze the standard Weighted Ensemble (WE) and Max-Margin Ensemble (MME) protocols, and prove the necessary and sufficient conditions for robust ensemble models under mild model-smoothness assumptions. Under these conditions, an ensemble model can be more robust than each single base model; an intuitive illustration of their certified robust radii is given in Fig. 1. Our analysis shows that diversified gradients and large confidence margins of base models lead to higher certified robustness for ensemble models. Inspired by our analysis, we propose Diversity-Regularized Training (DRT), a lightweight regularization-based ensemble training approach. We derive certified robustness for both WE and MME trained with DRT, and realize the model-smoothness assumption via randomized smoothing. We analyze different smoothing protocols and prove that Ensemble Before Smoothing provides higher certified robustness. We further prove that when the adversarial transferability among base models is high, MME is more robust than WE. We evaluate DRT on a wide range of datasets including MNIST, CIFAR-10, and ImageNet. Extensive experiments show that DRT achieves higher certified robustness than the state-of-the-art baselines, with training cost similar to that of a single model. Furthermore, when we combine DRT with existing robust models as the base models, DRT achieves the highest certified robustness to the best of our knowledge. We summarize our main contributions as follows: 1) We provide the necessary and sufficient conditions for robust ensemble models, including Weighted Ensemble (WE) and Max-Margin Ensemble (MME), under model-smoothness assumptions, and prove that an ensemble model is more robust than a single base model under these assumptions. Our analysis shows that diversified gradients and large confidence margins of base models are the keys to robust ensembles.
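As a rough illustration of the two protocols just named (their precise definitions are given in the body of the paper, so treat this as one plausible reading rather than the exact formulation): WE predicts from a weighted sum of the base models' confidence vectors, while MME follows the base model whose top-1 vs. runner-up confidence margin is largest.

```python
import numpy as np

def weighted_ensemble(confidences, weights):
    """Weighted Ensemble (WE): argmax of the weighted sum of base-model
    confidence vectors. `confidences` has shape (num_models, num_classes)."""
    return int(np.argmax(weights @ confidences))

def max_margin_ensemble(confidences):
    """Max-Margin Ensemble (MME): output the prediction of the base model
    whose confidence margin (top-1 score minus runner-up score) is largest."""
    sorted_conf = np.sort(confidences, axis=1)
    margins = sorted_conf[:, -1] - sorted_conf[:, -2]
    return int(np.argmax(confidences[np.argmax(margins)]))

conf = np.array([[0.70, 0.20, 0.10],    # model 1: votes class 0, margin 0.5
                 [0.05, 0.95, 0.00]])   # model 2: votes class 1, margin 0.9
we_pred = weighted_ensemble(conf, np.array([0.8, 0.2]))  # heavily weights model 1
mme_pred = max_margin_ensemble(conf)                     # follows model 2
```

With these (made-up) confidences the two protocols disagree, which is why the paper's comparison of their certified robustness under different transferability regimes is non-trivial.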
2) Based on our analysis, we propose DRT, a lightweight regularization-based training approach containing both a Gradient Diversity Loss and a Confidence Margin Loss. 3) We derive certified robustness for ensemble models trained with DRT. The analysis of certified robustness further reveals the importance of DRT. Under mild conditions, we further prove that when the adversarial transferability among base models is high, MME is more robust than WE. 4) We conduct extensive experiments to evaluate the effectiveness of DRT on various datasets, which show that DRT achieves the best certified robustness with training time similar to that of a single ML model.

Related work. DNNs are known to be vulnerable to adversarial examples (Szegedy et al., 2013). To defend against such attacks, several empirical defenses have been proposed (Papernot et al., 2016b; Madry et al., 2018). For ensemble models, existing work mainly focuses on empirical robustness (Pang et al., 2019; Li et al., 2020; Srisakaokul et al., 2018), where robustness is measured by accuracy under existing attacks, so no certified robustness guarantee can be provided or enhanced; other work certifies the robustness of a vanilla weighted ensemble (Zhang et al., 2019; Liu et al., 2020) using either LP-based verification (Zhang et al., 2018) or randomized smoothing, but without enforcing diversity. In this paper, we aim to prove that gradient diversity and base-model margins are two key factors for certified ensemble robustness, and based on these key factors we propose a training approach to enhance the certified robustness of model ensembles.

Randomized smoothing (Cohen et al., 2019) has been proposed to provide certified robustness for a single ML model, and it has achieved state-of-the-art certified robustness on the large-scale ImageNet and CIFAR-10 datasets under the L2 norm. Several approaches have further improved it by (1) choosing different smoothing distributions for different Lp norms (Dvijotham et al., 2019; Zhang et al., 2020; Yang et al., 2020), and (2) training more robust smoothed classifiers using data augmentation (Cohen et al., 2019), unlabeled data (Carmon et al., 2019), adversarial training (Salman et al., 2019), regularization (Li et al., 2019; Zhai et al., 2019), and denoising (Salman et al., 2020). However, to our knowledge, there is no work studying how to customize randomized smoothing for ensemble models. In this paper, we compare and select a good randomized smoothing strategy to improve the certified robustness of the ensemble, and we mainly focus on certified robustness under the L2 norm. Though randomized smoothing suffers from difficulties when it comes to the L∞ norm (Yang et al., 2020; Kumar et al.,
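To make the two DRT regularizers named in the introduction concrete, the following is a hypothetical sketch of how a gradient-diversity penalty and a confidence-margin penalty could be combined for a pair of base models; the exact loss formulations (and the weights `rho1`, `rho2`, which are our illustrative names) are defined in the paper itself and may differ from this simplification.

```python
import numpy as np

def drt_regularizers(grads, margins, rho1=1.0, rho2=1.0):
    """Hypothetical sketch of the two DRT regularizers for two base models.

    grads:   (2, d) input gradients of each base model (e.g., of its margin).
    margins: (2,) each model's confidence margin (top-1 minus runner-up).
    """
    g1, g2 = grads
    # Gradient Diversity loss: penalize aligned gradient directions.
    cos = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12)
    gd_loss = rho1 * max(cos, 0.0)
    # Confidence Margin loss: reward large margins on the base models.
    cm_loss = -rho2 * margins.min()
    return gd_loss + cm_loss

# Orthogonal gradients (diverse) vs. identical gradients (aligned).
orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned = np.array([[1.0, 0.0], [1.0, 0.0]])
margins = np.array([0.5, 0.5])
```

Under this sketch, diverse gradients and larger margins both lower the regularization loss, matching the sufficient and necessary conditions the paper derives.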



Figure 1: Illustration of a robust ensemble.

