ON THE ROBUSTNESS OF RANDOMIZED ENSEMBLES TO ADVERSARIAL PERTURBATIONS

Anonymous authors
Paper under double-blind review

Abstract

Randomized ensemble classifiers (RECs), in which a single member classifier is randomly selected at inference time, have emerged as an attractive alternative to traditional ensembling methods for realizing adversarially robust classifiers with limited compute requirements. However, recent works have shown that existing methods for constructing RECs are more vulnerable than initially claimed, casting major doubts on their efficacy and prompting fundamental questions such as: "When are RECs useful?", "What are their limits?", and "How do we train them?". In this work, we first demystify RECs by deriving fundamental results regarding their theoretical limits and the necessary and sufficient conditions for them to be useful. Leveraging this new understanding, we propose a new boosting algorithm (BARRE) for training robust RECs, and empirically demonstrate its effectiveness at defending against strong ℓ∞ norm-bounded adversaries across various network architectures and datasets. Our code is submitted as part of the supplementary material and will be publicly released on GitHub.

1. INTRODUCTION

Defending deep networks against adversarial perturbations (Szegedy et al., 2013; Biggio et al., 2013; Goodfellow et al., 2014) remains a difficult task. Several proposed defenses (Papernot et al., 2016; Pang et al., 2019; Yang et al., 2019; Sen et al., 2019; Pinot et al., 2020) have subsequently been "broken" by stronger adversaries (Carlini & Wagner, 2017; Athalye et al., 2018; Tramèr et al., 2020; Dbouk & Shanbhag, 2022), whereas strong defenses (Cisse et al., 2017; Tramèr et al., 2018; Cohen et al., 2019), such as adversarial training (AT) (Goodfellow et al., 2014; Zhang et al., 2019; Madry et al., 2018), achieve unsatisfactory levels of robustness¹. A popular belief in the adversarial community is that single-model defenses, e.g., AT, lack the capacity to defend against all possible perturbations, and that constructing an ensemble of diverse, often smaller, models should be more cost-effective (Pang et al., 2019; Kariyappa & Qureshi, 2019; Pinot et al., 2020; Yang et al., 2020b; 2021; Abernethy et al., 2021; Zhang et al., 2022). Indeed, recent deterministic robust ensemble methods, such as MRBoost (Zhang et al., 2022), have been successful at achieving higher robustness compared to AT baselines using the same network architecture, at the expense of 4× more compute (see Fig. 1). In fact, Fig. 1 indicates that one can simply adversarially train larger deep nets that match the robustness and compute requirements of MRBoost models, rendering state-of-the-art boosting techniques obsolete for designing classifiers that are both robust and efficient. In contrast, randomized ensembles, where one classifier is randomly selected during inference, offer a unique way of ensembling that can operate with limited compute resources. However, the recent work of Dbouk & Shanbhag (2022) has cast major concerns regarding their efficacy, as they successfully compromised the state-of-the-art randomized defense of Pinot et al.
(2020) by large margins using their proposed ARC adversary. Furthermore, there is an apparent lack of proper theory on the robustness of randomized ensembles, as fundamental questions such as "When does randomization help?" and "How do we find the optimal sampling probability?" remain unanswered.

Contributions. In this work, we first provide a theoretical framework for analyzing the adversarial robustness of randomized ensemble classifiers (RECs). Our theoretical results enable us to better

¹ When compared to the high clean accuracy achieved in a non-adversarial setting.
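For concreteness, the REC inference rule (sample a single member classifier per input according to a probability vector, then return its prediction) can be sketched as follows. The stand-in classifiers and sampling probabilities below are illustrative placeholders, not the trained models or the optimal sampling distribution studied in this paper:

```python
import numpy as np

def rec_predict(x, classifiers, alpha, rng=None):
    """REC inference: sample one classifier with probabilities alpha,
    then return that classifier's prediction for input x."""
    rng = rng if rng is not None else np.random.default_rng()
    i = rng.choice(len(classifiers), p=alpha)  # randomized selection
    return classifiers[i](x)

# Toy usage: two stand-in "classifiers" on scalar inputs,
# sampled with probabilities (0.7, 0.3).
f1 = lambda x: int(x > 0)
f2 = lambda x: int(x > 1)
pred = rec_predict(0.5, [f1, f2], alpha=[0.7, 0.3])
```

Note that the adversary sees only the sampling distribution, not the realized choice, which is what distinguishes RECs from deterministic ensembles.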

