ON THE CERTIFIED ROBUSTNESS FOR ENSEMBLE MODELS AND BEYOND

Anonymous

Abstract

Recent studies show that deep neural networks (DNNs) are vulnerable to adversarial examples, which aim to mislead DNNs into making arbitrarily incorrect predictions. To defend against such attacks, both empirical and theoretical defense approaches have been proposed for single ML models. In this work, we aim to explore and characterize the robustness conditions for ensemble ML models. We prove that diversified gradients and large confidence margins are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We also show that, based on these conditions, an ensemble model can achieve higher certified robustness than a single base model. To the best of our knowledge, this is the first work providing tight conditions for ensemble robustness. Inspired by our analysis, we propose lightweight Diversity-Regularized Training (DRT) for ensemble models. Following the sufficient and necessary conditions, we derive the certified robustness of DRT-based ensembles such as the standard Weighted Ensemble and the Max-Margin Ensemble. In addition, to efficiently compute the model smoothness, we leverage adapted randomized model smoothing to obtain the certified robustness of different ensembles in practice. The resulting certified robustness of ensembles, in turn, verifies the necessity of DRT. To compare different ensembles, we prove that when the adversarial transferability among base models is high, the Max-Margin Ensemble can achieve higher certified robustness than the Weighted Ensemble, and vice versa. Extensive experiments show that ensemble models trained with DRT achieve state-of-the-art certified robustness under various settings. Our work will shed light on future analysis of robust ensemble models.

1. INTRODUCTION

Deep neural networks (DNNs) have been widely applied in various applications, such as image classification (Krizhevsky, 2012; He et al., 2016), face recognition (Taigman et al., 2014; Sun et al., 2014), and natural language processing (Vaswani et al., 2017; Devlin et al., 2019). However, it is well known that DNNs are vulnerable to adversarial examples (Szegedy et al., 2013; Carlini & Wagner, 2017; Xiao et al., 2018), which has raised great concerns, especially when DNNs are deployed in safety-critical applications such as autonomous driving and facial recognition. To defend against such attacks, several empirical defenses have been proposed (Papernot et al., 2016b; Buckman et al., 2018; Madry et al., 2018); however, many of them have subsequently been broken by strong adaptive attacks (Athalye et al., 2018; Tramer et al., 2020). On the other hand, certified defenses (Wong & Kolter, 2018; Cohen et al., 2019) have been proposed to provide certified robustness guarantees for given ML models, so that no attack can break the model under certain conditions. For instance, randomized smoothing has been proposed as an effective defense providing certified robustness (Lecuyer et al., 2019; Li et al., 2019; Cohen et al., 2019; Yang et al., 2020). Compared with other certified robustness approaches such as linear bound propagation (Weng et al., 2018; Mirman et al., 2018) and interval bound propagation (Gowal et al., 2019), randomized smoothing smooths a given DNN efficiently and does not depend on the neural network architecture. However, existing defenses mainly focus on the robustness of a single ML model, and it is unclear whether an ensemble ML model could provide additional robustness. In this work, we aim to char-

