

Abstract

We propose an ensemble-based defense against adversarial examples using distance map layers (DMLs). Similar to fully connected layers, DMLs can be used to output logits for a multi-class classification model. We show in this paper how DMLs can be deployed to prevent transferability of attacks across ensemble members by adapting pairwise (almost) orthogonal covariance matrices. We also illustrate how DMLs provide an efficient way to regularize the Lipschitz constant of the ensemble's member models, which further boosts the resulting robustness. Through empirical evaluations across multiple datasets and attack models, we demonstrate that the ensembles based on DMLs can achieve high benign accuracy while exhibiting robustness against adversarial attacks using multiple white-box techniques along with AutoAttack.

1. Introduction

Ongoing research has provided defenses against adversarial examples, which are crafted from correctly classified inputs with imperceptible perturbations. Ensemble learning has long succeeded at reducing prediction errors and improving generalization by combining the predictions of multiple models performing the same task (Russakovsky et al., 2015; Sagi & Rokach). Despite this success, early research showed multiple ensemble-based defenses to be ineffective against adversarial examples, and even went further to suggest that ensembles are only as robust as their weakest components (He et al., 2017). If this claim is true, it defeats the purpose of using an ensemble, which is to build a strong model out of weaker ones. Nevertheless, research continued to investigate ensembles as a defense mechanism (Pang et al., 2019; Verma & Swami; Sen et al.). However, recent attempts have quickly been shown to be ineffective (Tramèr et al.; Croce & Hein, 2020). We believe that one of the primary reasons for the weakness of ensemble defenses is inter-model attack transferability: it has been shown that even fundamentally different models can exhibit high attack transferability rates (Papernot et al., 2017; Kurakin et al., 2018). This phenomenon has hindered the consideration of ensemble learning as a strong defense mechanism on its own, because if the member models are not robust and exhibit high attack transferability rates, attacks generated from one model can attack the rest, and hence the entire ensemble. In this work, we show that it is possible to circumvent this problem and instill diversity among ensemble members through the employment of specially initialized and optimized distance map layers (DMLs), thereby throttling inter-model attack transferability. Moreover, we demonstrate that DMLs provide spontaneous regularization of the Lipschitz constant, which further boosts robustness.
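To make the notion of a distance map layer concrete, the following is a minimal numpy sketch of the idea stated in the abstract: like a fully connected layer, a DML maps a feature vector to one logit per class, but each logit is the negative squared Mahalanobis distance to a learnable class centroid rather than a dot product. The class names, the identity initialization of the precision matrices, and the toy dimensions are illustrative assumptions only; in particular, the paper's pairwise (almost) orthogonal covariance scheme is not reproduced here.

```python
import numpy as np


class DistanceMapLayer:
    """Illustrative sketch of a distance map layer (DML).

    Each class logit is the negative squared Mahalanobis distance
    between the input feature vector and a learnable class centroid.
    Names and initialization are assumptions for illustration, not
    the paper's actual construction.
    """

    def __init__(self, num_classes, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable class centroids, one per class.
        self.centroids = rng.normal(size=(num_classes, feat_dim))
        # Per-class precision matrices (inverse covariances); identity
        # here for simplicity. The paper instead adapts pairwise
        # (almost) orthogonal covariance matrices across members.
        self.precisions = np.stack([np.eye(feat_dim)] * num_classes)

    def __call__(self, x):
        # diffs[c] = x - centroid_c, shape (num_classes, feat_dim).
        diffs = x[None, :] - self.centroids
        # Squared Mahalanobis distance per class:
        # d2[c] = diffs[c]^T @ precisions[c] @ diffs[c].
        d2 = np.einsum("cd,cde,ce->c", diffs, self.precisions, diffs)
        # Negative distance acts as the logit: nearer centroid,
        # larger logit.
        return -d2


dml = DistanceMapLayer(num_classes=3, feat_dim=4)
logits = dml(np.ones(4))
pred = int(np.argmax(logits))  # class whose centroid is nearest
```

With identity precisions this reduces to negative squared Euclidean distance; the covariance structure is what lets different ensemble members measure distance along different directions, which is the diversity mechanism the paper builds on.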
The rest of this paper is organized as follows: We first review relevant recent work on ensemble defenses against adversarial examples, along with background information. Then we introduce a distance map layer based on the Mahalanobis distance and explain the threat models considered in this work. We then describe the construction of an ensemble of DML-based individual models. Afterwards, we introduce a randomized version of the DML. Finally, we evaluate the robustness of our ensemble model on the MNIST, CIFAR-10, and RESISC-45 datasets.

