

Abstract

We propose an ensemble-based defense against adversarial examples using distance map layers (DMLs). Similar to fully connected layers, DMLs can be used to output logits for a multi-class classification model. We show in this paper how DMLs can be deployed to prevent transferability of attacks across ensemble members by adopting pairwise (almost) orthogonal covariance matrices. We also illustrate how DMLs provide an efficient way to regularize the Lipschitz constant of the ensemble's member models, which further boosts the resulting robustness. Through empirical evaluations across multiple datasets and attack models, we demonstrate that ensembles based on DMLs can achieve high benign accuracy while exhibiting robustness against adversarial attacks under multiple white-box techniques as well as AutoAttack.

1. Introduction

Ongoing research has provided defenses against adversarial examples, which are crafted from correctly classified inputs via imperceptible perturbations. Despite the success of ensemble learning as a mechanism for reducing prediction errors and improving generalization by combining the predictions of multiple models performing the same task (Russakovsky et al., 2015; Sagi & Rokach), early research showed the ineffectiveness of multiple ensemble-based defenses against adversarial examples, and even went further to suggest that ensembles are only as robust as their weak components (He et al., 2017). If this claim is true, it essentially defeats the purpose of using an ensemble, which is to build a strong model out of weaker ones. Nevertheless, research has continued to investigate the use of ensembles as a defense mechanism (Pang et al., 2019; Verma & Swami; Sen et al.). However, recent attempts have quickly been shown to be ineffective (Tramèr et al.; Croce & Hein, 2020). We believe that one of the primary reasons for the weakness of ensemble defenses is inter-model attack transferability: it has been shown that even fundamentally different models can exhibit high attack transferability rates (Papernot et al., 2017; Kurakin et al., 2018). This phenomenon has hindered the consideration of ensemble learning as a strong defense mechanism on its own, because if the member models are not robust and exhibit high attack transferability, attacks generated from one model can fool the rest and, hence, the entire ensemble. In this work, we show that it is possible to circumvent this problem and instill diversity among ensemble members by employing specially initialized and optimized distance map layers (DMLs), and hence throttle inter-model attack transferability. Moreover, we demonstrate that DMLs provide spontaneous regularization of the Lipschitz constant, which further boosts robustness.
The rest of this paper is organized as follows. We first give an overview of relevant recent work on ensemble defenses against adversarial examples, along with background information. We then introduce a distance map layer based on the Mahalanobis distance and explain the threat models considered in this work. Next, we describe the construction of an ensemble of DML-based individual models, and afterwards introduce a randomized version of the DML. Finally, we evaluate the robustness of our ensemble model on the MNIST, CIFAR-10, and RESISC-45 datasets.

2. Background

Not long ago, researchers revealed the vulnerability of machine learning models, particularly deep neural networks (DNNs), to adversarial examples (Szegedy et al., 2014). These models can produce incorrect predictions on examples that are slightly perturbed from correctly classified ones. The process of generating adversarial examples from natural ones is called an adversarial evasion attack (Biggio et al.). Evasion attacks can be categorized into black-box, gray-box, and white-box attacks. In the black-box setting, the attacker has access to neither the model's parameters nor any defense deployed in the model (Papernot et al., 2017; Brendel et al., 2018). In the gray-box setting, although the attacker does not have access to the model's parameters, it is aware of the defense applied in the model. In the white-box setting, the attacker has access to both the parameters and the defense applied in the model. Therefore, the attacker can apply gradient-based attack techniques (Goodfellow et al., 2014; Carlini & Wagner, 2017; Madry et al., 2018), or it can even design a custom-made adaptive attack based on its knowledge of the model and the defense (Tramèr et al.). From the defense perspective, white-box attacks are certainly the most challenging type of evasion attack. Our proposed defense targets white-box attacks. Specifically, we focus on restricted perturbations of images and, as is typical in many research works, we consider $\ell_\infty$-bounded perturbations. Since the discovery of adversarial attacks, significant research activity has been devoted to developing appropriate defenses (Madry et al., 2018; Zhang et al.). However, most of the proposed defenses have soon been defeated (Tramèr et al.; Croce & Hein, 2020). An intuitive mechanism to defend against adversarial examples is to combine multiple models into an ensemble in order to enhance the overall defense (Pang et al., 2019; Verma & Swami; Sen et al.).
However, the efficacy of existing ensemble approaches is often met with skepticism due to the transferability phenomenon (Papernot et al., 2017), which leads to the belief that an ensemble model is at most as robust as its strongest constituent model (He et al., 2017). In this paper, we promote diversity among individual networks from a new perspective, via distance map layers. Our construction is based on the premise that defending against transferred adversarial examples is effective even if the individual models remain vulnerable to direct attacks. Our approach is orthogonal to previous approaches and can be combined with other generic weak or strong defenses to further enhance the ensemble's resistance to adversaries. Although there is no universally agreed-upon formal definition of diversity, in this work we define it as the highest dissimilarity between the learned features of different classes across the ensemble members. Diversity is greater when the prediction errors of individual members are highly uncorrelated (Liu & Yao, 1999a; b; Dietterich, 2000; Liu et al., 2019). This property may cause an adversarial perturbation to fail to fool the majority of networks in the ensemble. Topologically, diversity can be depicted as variability in the shapes of the decision boundaries and in the inter-class neighborhood relationships in the embedding space. Changing the shape of the decision boundaries in the embedding space implies different loss landscapes. With diversity in the loss landscapes of different models, the gradient from one model is no longer a good (transferable) approximation for another model. Note that, due to the high capacity of DNNs, changing the topology of the embedding space can have no or negligible effect on the model's accuracy.
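The notion of diversity as decorrelated prediction errors can be made operational with a simple diagnostic. The sketch below, a minimal NumPy illustration rather than the metric used in our experiments, measures the mean pairwise correlation between the per-sample error patterns of ensemble members; lower values indicate more diverse ensembles in the sense defined above.

```python
import numpy as np

def mean_error_correlation(errors):
    """Given a (num_models, num_samples) binary matrix of per-sample
    prediction errors (1 = misclassified, 0 = correct), return the
    mean pairwise Pearson correlation between the members' error
    patterns. Rows must be non-constant for the correlation to be
    defined. Values near 1 mean members fail on the same inputs;
    values near 0 or below mean their errors are decorrelated."""
    m = errors.shape[0]
    corr = np.corrcoef(errors)
    # Average only the off-diagonal entries (pairwise correlations).
    off_diagonal = corr[~np.eye(m, dtype=bool)]
    return float(off_diagonal.mean())
```

For example, two members that misclassify exactly the same samples yield a value of 1.0, while members with perfectly complementary error patterns yield -1.0.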
Using this perspective on model diversity, we illustrate that DML-based ensembles can be designed to achieve diversity in prediction errors by injecting dissimilarities only between the DMLs of the ensemble members. Previous works instead increase diversity over the training data by directly promoting diversity in the prediction errors of ensemble members (Liu & Yao, 1999a; b; Liu et al., 2019).

3. The Distance Map Layer

In this section, we elaborate on our model with the distance map layer (DML). We start by recalling the definition of the Mahalanobis distance between two points $x_1, x_2 \in \mathbb{R}^n$:
$$d_M(x_1, x_2) = \sqrt{(x_1 - x_2)^\top M (x_1 - x_2)},$$
where $M$ is the inverse of the covariance matrix (referred to as the matrix $M$ for the rest of the paper), which is a positive semi-definite matrix.

Lemma 1. The Mahalanobis distance is $K$-Lipschitz continuous with $K = \sqrt{2}\,\|U\|_2$, where $M = U^\top U$ and $U$ is a triangular matrix.
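To make the construction concrete, the sketch below shows a minimal NumPy version of a distance map layer producing one logit per class. It is an illustrative sketch, not the paper's trained implementation: the class centroids and the triangular factor $U$ (with $M = U^\top U$, e.g. a Cholesky factor) are assumed to be given, whereas in a full model they would be learned parameters.

```python
import numpy as np

def mahalanobis(x1, x2, U):
    """Mahalanobis distance d_M(x1, x2) = ||U (x1 - x2)||_2, where
    M = U^T U is the positive semi-definite matrix of Lemma 1 and
    U is its triangular factor."""
    diff = U @ (x1 - x2)
    return float(np.sqrt(diff @ diff))

def dml_logits(x, centroids, U):
    """Distance map layer: like a fully connected head, it emits one
    logit per class, here the negative Mahalanobis distance from the
    embedding x to each class centroid, so that a softmax over the
    logits favors the nearest class."""
    return np.array([-mahalanobis(x, c, U) for c in centroids])
```

Note that, since $d_M(x_1, x_2) = \|U(x_1 - x_2)\|_2$, each logit inherits the Lipschitz bound of Lemma 1, which is what allows the DML to regularize the Lipschitz constant of the member model through $U$.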

