EFFECTIVE AND EFFICIENT VOTE ATTACK ON CAPSULE NETWORKS

Abstract

Standard Convolutional Neural Networks (CNNs) can be easily fooled by images with small, quasi-imperceptible artificial perturbations. As alternatives to CNNs, the recently proposed Capsule Networks (CapsNets) are shown to be more robust to white-box attacks than CNNs under popular attack protocols. Besides, the class-conditional reconstruction part of CapsNets is also used to detect adversarial examples. In this work, we investigate the adversarial robustness of CapsNets, especially how the inner workings of CapsNets change when the output capsules are attacked. The first observation is that adversarial examples mislead CapsNets by manipulating the votes from primary capsules. Another observation is the high computational cost of directly applying multi-step attack methods designed for CNNs to CapsNets, due to their computationally expensive routing mechanism. Motivated by these two observations, we propose a novel vote attack where we attack the votes of CapsNets directly. Our vote attack is not only effective but also efficient, since it circumvents the routing process. Furthermore, we integrate our vote attack into the detection-aware attack paradigm, which can successfully bypass the class-conditional reconstruction based detection method. Extensive experiments demonstrate the superior attack performance of our vote attack on CapsNets.

1. INTRODUCTION

A hardly perceptible artificial perturbation can cause Convolutional Neural Networks (CNNs) to misclassify an image. Such vulnerability of CNNs can pose potential threats to security-sensitive applications, e.g., face verification (Sharif et al., 2016) and autonomous driving (Eykholt et al., 2018). Besides, the existence of adversarial images demonstrates that the object recognition process in CNNs is dramatically different from that in human brains. Hence, adversarial examples have received increasing attention since they were introduced (Szegedy et al., 2014; Goodfellow et al., 2015). Many works show that network architectures play an important role in adversarial robustness (Madry et al., 2018; Su et al., 2018; Xie & Yuille, 2020; Guo et al., 2020). As alternatives to CNNs, Capsule Networks (CapsNets) have also been explored to resist adversarial images since they are more biologically inspired (Sabour et al., 2017). The CapsNet architectures are significantly different from those of CNNs. Under popular attack protocols, CapsNets are shown to be more robust to white-box attacks than counter-part CNNs (Hinton et al., 2018; Hahn et al., 2019). Furthermore, the reconstruction part of CapsNets is also applied to detect adversarial images (Qin et al., 2020). In image classification, CapsNets first extract primary capsules from the pixel intensities and transform them to make votes. The votes reach an agreement via an iterative routing process. It is not clear how these components change when CapsNets are attacked. In our experiments, when output capsules are attacked directly, the robust accuracy of CapsNets is 17.3%, while it is reduced to 0 for the counter-part CNNs in the same setting. Additionally, it is computationally expensive to apply multi-step attacks (e.g., PGD (Madry et al., 2018)) to CapsNets directly, due to the costly routing mechanism. These two observations motivate us to propose an effective and efficient vote attack on CapsNets.
The contributions of our work can be summarised as follows: 1). We investigate the inner working changes of CapsNets when output capsules are attacked; 2). Motivated by the findings, we propose an effective and efficient vote attack; 3). We integrate the vote attack in the detection-aware attack to bypass class-conditional reconstruction based adversarial detection. The next section introduces background knowledge and related work. Sec. 3 and 4 investigate capsule attack and introduce our vote attack, respectively. The last two sections show experiments and our conclusions.

2. BACKGROUND KNOWLEDGE AND RELATED WORK

Capsule Networks The overview of CapsNets is shown in Figure 1. CapsNets first extract primary capsules $u_i$ from the input image $x$ with pure convolutional layers (or CNN backbones). Each primary capsule $u_i$ is then transformed to make votes for high-level capsules. The voting process, also called the transformation process, is formulated as $\hat{u}_{j|i} = W_{ij} u_i$. Next, a dynamic routing process is applied to identify weights $c_{ij}$ for the votes $\hat{u}_{j|i}$, with $i \in \{1, 2, \dots, N\}$ corresponding to indices of primary capsules and $j \in \{1, 2, \dots, M\}$ to indices of high-level capsules. Specifically, the routing process iterates over the following three steps
$$s_j^{(t)} = \sum_{i=1}^{N} c_{ij}^{(t)} \hat{u}_{j|i}, \qquad v_j^{(t)} = g(s_j^{(t)}), \qquad c_{ij}^{(t+1)} = \frac{\exp\big(b_{ij} + \sum_{r=1}^{t} v_j^{(r)} \cdot \hat{u}_{j|i}\big)}{\sum_k \exp\big(b_{ik} + \sum_{r=1}^{t} v_k^{(r)} \cdot \hat{u}_{k|i}\big)},$$
where the superscript $t$ indicates the index of iterations starting from 1, and $g(\cdot)$ is a squashing function (Sabour et al., 2017) that maps the length of the vector $s_j$ into the range $[0, 1)$. The $b_{ik}$ is the log prior probability. Note that the routing process is the most expensive part of CapsNets. The final output capsules are computed as $v_j = g(\sum_{i=1}^{N} c_{ij} \hat{u}_{j|i})$, where $c_{ij}$ is the output of the last routing iteration. The output capsules are represented by vectors, the length of which indicates the confidence of an entity's existence. In the training phase, the class-conditional reconstruction net reconstructs the input image from the capsule corresponding to the ground-truth class $t$, i.e., $\hat{x} = r(v_t)$. The reconstruction error $d(x, \hat{x}) = \|x - \hat{x}\|_2$ works as a regularization term. All the above notations will be used across this manuscript. To improve CapsNets (Sabour et al., 2017), various routing mechanisms have been proposed, such as (Hinton et al., 2018; Zhang et al., 2018; Hahn et al., 2019; Tsai et al., 2020).
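The routing iteration above can be sketched in a few lines of NumPy. This is a minimal illustration with toy dimensions, not the paper's implementation; the function names, the zero initialization of the logits $b_{ij}$, and the default of 3 iterations are our own assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # g(s): rescales s so that its length lies in [0, 1)
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: votes of shape (N, M, D) -- N primary capsules,
    # M output capsules, D-dimensional vote vectors.
    N, M, D = u_hat.shape
    b = np.zeros((N, M))  # routing logits, start at the log priors b_ij (here 0)
    for _ in range(n_iters):
        e = np.exp(b)
        c = e / e.sum(axis=1, keepdims=True)                  # softmax over classes j
        s = (c[:, :, None] * u_hat).sum(axis=0)               # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                                         # v_j = g(s_j)
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)          # accumulate agreement v_j . u_hat_{j|i}
    return v, c  # output capsules and the coupling coefficients of the last iteration
```

The accumulation `b = b + ...` reproduces the $\sum_{r=1}^{t}$ term in the update of $c_{ij}^{(t+1)}$, and the loop itself is why routing dominates the cost of a forward pass.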
The advanced techniques for building CNNs or GNNs have also been integrated into CapsNets successfully. For example, multi-head attention-based graph pooling has been applied to replace the routing mechanism (Gu & Tresp, 2020b). CNN backbones have been applied to extract more accurate primary capsules (Rajasegaran et al., 2019; Phaye et al., 2018). To understand CapsNets, (Gu & Tresp, 2020a) investigates the contribution of dynamic routing to input affine transformation robustness. This work focuses on its contribution to adversarial robustness. (Hinton et al., 2018; Hahn et al., 2019) demonstrated the high adversarial robustness of CapsNets. However, it has been shown in (Michels et al., 2019) that the robustness does not hold for all attacks. In addition, many defense strategies proposed for CNNs have been circumvented by later defense-aware white-box attacks (Athalye et al., 2018). Given this line of research, we argue that it is necessary to explore CapsNet architecture-aware attacks before making any claim about the robustness of CapsNets. To the best of our knowledge, there is no attack specifically designed for CapsNets in the current literature. Adversarial Attacks Given the outputs $f(x)$ of an input to a CNN, attacks fool the model by creating perturbations that increase the loss $L(f(x + \delta), y)$, where $L(\cdot)$ is the standard cross-entropy loss and $\delta$ indicates an $\ell_p$-bounded perturbation. The one-step Fast Gradient Sign Method (FGSM (Goodfellow et al., 2015)) creates perturbations as
$$\delta = \epsilon \cdot \mathrm{sign}(\nabla_{\delta} L(f(x + \delta), y)). \quad (3)$$
The multi-step Projected Gradient Descent (PGD (Madry et al., 2018)) is defined as
$$\delta \leftarrow \mathrm{clip}_{\epsilon}(\delta + \alpha \cdot \mathrm{sign}(\nabla_{\delta} L(f(x + \delta), y))). \quad (4)$$
Other popular multi-step attacks include the Basic Iterative Method (BIM (Kurakin et al., 2017)) and the Momentum Iterative Method (MIM (Dong et al., 2018)).
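The PGD update of Equation (4) amounts to the following loop. This is a minimal sketch: it assumes the caller supplies a `grad_fn` that returns the gradient of the model's loss with respect to the input (in practice this comes from autodiff), and the default step sizes are illustrative.

```python
import numpy as np

def pgd_attack(x, y, grad_fn, eps=0.031, alpha=0.007, n_steps=10):
    # Projected Gradient Descent under an l_inf constraint (Equation (4)).
    # grad_fn(x_adv, y) must return dL/dx of the model's loss (assumed helper).
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad_fn(x + delta, y)
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)  # ascent step + l_inf projection
        delta = np.clip(x + delta, 0.0, 1.0) - x                # keep pixels in [0, 1]
    return x + delta
```

For CapsNets, each call to `grad_fn` triggers a full forward and backward pass, including the routing iterations, which is exactly the efficiency problem analyzed in Sec. 3.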
Besides, the C&W attack (Carlini & Wagner, 2017b) and DeepFool (Moosavi-Dezfooli et al., 2016) are popular strong attacks under the $\ell_2$-norm constraint. Adversarial Detection Besides adversarial attack and defense (Madry et al., 2018; Chen et al., 2020; Li et al., 2020), adversarial detection has also received much attention (Xu et al., 2017; Ma et al., 2020). Many CNN-based adversarial detection methods were easily bypassed by constructing new loss functions (Carlini & Wagner, 2017a); adversarial images are not easily detected. The most recent work (Qin et al., 2020) leverages the class-conditional reconstruction net of CapsNets to detect adversarial images. Given any input $x$, the predictions and the corresponding capsules are $f(x)$ and $v$, respectively. The input is flagged as an adversarial image if the reconstruction error is bigger than a pre-defined threshold, $\|r(v_p) - x\|_2 > \theta$, where $p = \arg\max f(x)$ is the predicted class. The reconstruction net $r(\cdot)$ reconstructs the input from the capsule $v_p$ of the predicted class. The choice of $\theta$ involves a trade-off between false positive and false negative detection rates. Instead of tuning this parameter, the work (Qin et al., 2020) simply sets it to the 95th percentile of benign validation distances. A strong detection-aware reconstructive attack is also proposed in (Qin et al., 2020) to verify the effectiveness of the proposed detection method. The reconstructive attack is a two-stage optimization method: it first creates a perturbation $\delta$ to fool the prediction as in Equation (5), and then updates the perturbation further to reduce the reconstruction error as in Equation (6),
$$\delta \leftarrow \mathrm{clip}_{\epsilon}(\delta + \alpha \cdot \beta \cdot \mathrm{sign}(\nabla_{\delta} L(f(x + \delta), y))), \quad (5)$$
$$\delta \leftarrow \mathrm{clip}_{\epsilon}(\delta - \alpha \cdot (1 - \beta) \cdot \mathrm{sign}(\nabla_{\delta} \|r(v_{f(x+\delta)}) - (x + \delta)\|_2)), \quad (6)$$
where $\alpha$ is the step size, and $\beta$ is a hyper-parameter to balance the losses in the two stages.
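The detection rule and the percentile-based choice of $\theta$ described above can be sketched as follows; the function names are our own illustration, not the authors' code.

```python
import numpy as np

def detection_threshold(benign_recon_errors, q=95.0):
    # theta: the 95th percentile of reconstruction errors on benign
    # validation data, i.e., a 5% false positive rate on benign inputs.
    return np.percentile(benign_recon_errors, q)

def flag_adversarial(x, x_recon, theta):
    # Flag the input as adversarial if ||r(v_p) - x||_2 exceeds theta.
    return np.linalg.norm(x_recon - x) > theta
```

This thresholding is also what the second stage of the reconstructive attack in Equation (6) tries to defeat, by pushing the reconstruction error of the adversarial input back below $\theta$.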

3. CAPSULE ATTACK ON CAPSULE NETWORKS

Attack Formulation. In CNNs, under certain constraints, the adversary finds an adversarial perturbation of an instance by maximizing the classification loss. In CapsNets, the lengths of the output capsules correspond to the output probabilities of the classes. Similarly, the adversarial perturbation can be obtained by first mapping the lengths of the output capsules to logits $Z(x)_j = \log(\|v_j\|_2)$ and then solving the maximization problem in Equation (7). In this formulation, output capsules are attacked directly, which is called Caps-Attack:
$$\delta^* = \arg\max_{\delta \in \mathcal{N}} H(Z(x + \delta), y) = L(\mathrm{softmax}(Z(x + \delta)), y), \quad (7)$$
where $\mathcal{N} = \{\delta : \|\delta\|_p \le \epsilon\}$ with $\epsilon > 0$ being the maximal perturbation. This optimization problem can be naturally solved using the algorithms designed for attacks against CNNs, such as FGSM (see Equation (3)) (Goodfellow et al., 2015) and PGD (see Equation (4)) (Madry et al., 2018). Analysis. In CapsNets, the primary capsule $u_i$ can make a positive or negative vote for the $j$-th class or abstain from voting, depending on the relationship between $v_j$ and $\hat{u}_{j|i}$. The vote from $u_i$ for the $j$-th class is positive if $\cos(v_j, \hat{u}_{j|i}) > 0$ and negative if $\cos(v_j, \hat{u}_{j|i}) < 0$. The similarity value $\cos(v_j, \hat{u}_{j|i}) = 0$ corresponds to abstention of the primary capsule.
How do the votes change when CapsNets are attacked by adversarial images? We investigate this question with experiments and visualize the results in Figure 2 (see its caption for details of the plotted statistics). We first train a CapsNet with Dynamic Routing (DR-CapsNet) (Sabour et al., 2017) on the CIFAR10 dataset (Krizhevsky et al., 2009). With the standard well-trained DR-CapsNet (92.8% test accuracy), we classify all clean images in the test dataset and extract all votes $\hat{u}_{j|i}$ and output capsules $v_j$ of the ground-truth (GT) classes. We compute $\cos(v_j, \hat{u}_{j|i})$ in all classifications and split the values into 100 equal-width bins in the range $[-1, 1]$. In each bin, we compute the averaged length of all $\hat{u}_{j|i}$ and the average of all coupling coefficients $c_{ij}$ therein. Note that the $c_{ij}$ identified by the routing process stand for the weights of the votes $\hat{u}_{j|i}$. The results are visualized in Figure 2a. The majority of primary capsules make positive votes (more votes with positive similarity values in the blue bins). To obtain adversarial images, we apply the PGD attack to the clean image classifications on the DR-CapsNet, where 17.3% robust accuracy is obtained. Similarly, we extract the corresponding information from the classifications of adversarial images on the ground-truth class and visualize the results in Figure 2b. The votes with $\cos(v_j, \hat{u}_{j|i}) \approx 0$ are invalid since they have only a tiny impact on the final prediction. The adversarial images make votes invalid by manipulating the votes and their weights. Concretely, the votes on adversarial images are $\hat{u}'_{j|i}$, and the voting weights identified by the routing process are $c'_{ij}$.
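The binning procedure behind Figure 2 can be sketched as follows. This is a simplified NumPy version for a single class; the function name and the per-class input layout are our own assumptions.

```python
import numpy as np

def vote_statistics(u_hat, v, c, n_bins=100):
    # u_hat: (N, D) votes for one class j; v: (D,) output capsule;
    # c: (N,) coupling coefficients for those votes.
    cos = (u_hat @ v) / (np.linalg.norm(u_hat, axis=1) * np.linalg.norm(v) + 1e-8)
    bins = np.linspace(-1.0, 1.0, n_bins + 1)          # 100 equal-width bins in [-1, 1]
    idx = np.clip(np.digitize(cos, bins) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    # per-bin averaged vote length (green bins) and averaged weight c_ij (red curve)
    length = np.bincount(idx, weights=np.linalg.norm(u_hat, axis=1), minlength=n_bins)
    weight = np.bincount(idx, weights=c, minlength=n_bins)
    nz = counts > 0
    length[nz] /= counts[nz]
    weight[nz] /= counts[nz]
    return counts / counts.sum(), length, weight       # blue, green, red statistics
```

Running this on clean versus adversarial classifications reproduces the qualitative picture of Figure 2: on attacked inputs the mass of the histogram shifts toward cosine similarity zero for the ground-truth class.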
Both are manipulated by the adversarial images so that the output capsule $v'_j = \sum_{i=1}^{N} c'_{ij} \hat{u}'_{j|i}$ is orthogonal to most votes $\hat{u}'_{j|i}$. Namely, the adversarial images make the majority of votes invalid for the ground-truth class (the concentration of votes around zero). To understand how votes change on non-ground-truth classes, we also visualize the corresponding information on the classes with the Largest output probabilities that are Not Ground-Truth classes (L-NGT classes) in Figures 2d and 2e. We mark the differences between the two plots with dashed gray boxes. We can observe that the votes for L-NGT classes become stronger, since both the coupling coefficients (the red line) and the strength of their positive votes (the green bins) become larger. Drawbacks. The above analysis explains why the attack method originally designed for CNNs still works on CapsNets. The first drawback of Caps-Attack is its limited effectiveness. As will be shown in later experiments, under the same attack method, CapsNets are much more robust than CNNs. Since the routing process is the main difference between CapsNets and CNNs, we attribute the higher robustness of CapsNets to the conjecture that the routing process obfuscates the gradients used to generate adversarial examples. One intuitive way to mitigate this is to approximate the routing process, e.g., with the Backward Pass Differentiable Approximation (BPDA) (Athalye et al., 2018). However, it is non-trivial to approximate a routing process with several routing iterations. The second drawback of Caps-Attack is its low efficiency. The widely used multi-step gradient-based attacks, e.g., PGD, require many forward and backward passes through the whole CapsNet to generate adversarial examples. Caps-Attack is thus computationally expensive due to the costly iterative routing mechanism of CapsNets.

4. VOTE ATTACK ON CAPSULE NETWORKS

The above two drawbacks of Caps-Attack suggest that it is necessary to develop adversarial attack methods specifically for CapsNets, rather than directly applying the attack methods designed for CNNs. In this work, we propose to directly attack the votes (see Equation (8)) rather than the final output capsules of CapsNets, dubbed Vote-Attack. The rationale behind this is that the vote $\hat{u}_{j|i}$ exactly corresponds to the output class $j$, even though it is an intermediate activation of CapsNets. Besides, when the votes from primary capsules are attacked, the corresponding weights (i.e., $c_{ij}$, see Equation (2)) identified by the routing process will also change. Thus, the attacked votes can mislead the corresponding outputs of CapsNets. Specifically, given an input-label pair $(x, y)$, the $N$ votes from primary capsules are $\hat{u}_{\cdot|i} = f_v^i(x)$, where $i \in \{1, 2, \dots, N\}$. The average of the $N$ votes is first computed and then squashed with the squashing function $g(\cdot)$. The vector lengths of the squashed averaged votes correspond to output probabilities. Formally, the Vote-Attack on $x$ is defined as
$$\delta^* = \arg\max_{\delta \in \mathcal{N}} H\Big(\log\Big(g\Big(\frac{1}{N} \sum_{i=1}^{N} f_v^i(x + \delta)\Big)\Big), y\Big). \quad (8)$$
In the formulation above, we first average the votes and then squash the averaged vote. There are two intuitive variants of the proposed Vote-Attack. One is to first squash the votes and then average the squashed votes. The other is to average the losses caused by all votes: instead of optimizing the loss computed on the squashed averaged vote, we can compute the loss of each individual vote separately and average them. More details about these two variants of our Vote-Attack can be found in Appendix A. The maximization problem of Equation (8) can be approximately solved with popular attack methods, e.g., the PGD attack. When PGD is taken as the underlying attack, the proposed Vote-Attack can reduce the robust accuracy of DR-CapsNets from 17.3% (with Caps-Attack) to 4.83%.
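The forward pass attacked in Equation (8) can be sketched as follows: routing is skipped entirely, and the logits come from the log-lengths of the squashed averaged votes. The function below is our own illustration of that computation.

```python
import numpy as np

def vote_logits(u_hat):
    # u_hat: (N, M, D) votes from N primary capsules for M classes.
    # Average over primary capsules, squash, then take log vector lengths.
    u_bar = u_hat.mean(axis=0)                           # (M, D) averaged votes
    sq = np.sum(u_bar ** 2, axis=-1, keepdims=True)      # squared lengths
    v = (sq / (1.0 + sq)) * u_bar / np.sqrt(sq + 1e-8)   # g(u_bar), length in [0, 1)
    return np.log(np.linalg.norm(v, axis=-1) + 1e-12)    # Z_j = log ||g(u_bar_j)||_2
```

These logits feed the standard cross-entropy loss of any underlying attack, e.g., PGD; no routing iteration appears in either the forward or the backward pass, which is the source of both the effectiveness and the efficiency gains.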
Our Vote-Attack can also be extended to a targeted attack by simply modifying the attack loss function of Equation (8) into $\delta^* = \arg\min_{\delta \in \mathcal{N}} H(\log(g(\frac{1}{N} \sum_{i=1}^{N} f_v^i(x + \delta))), t)$, where $t$ is the target class. Analysis. We also visualize the votes on the adversarial images created by our Vote-Attack. On the GT classes (see Figure 2c), our Vote-Attack increases the negative votes and decreases the positive votes, when compared to Caps-Attack in Figure 2b. On the L-NGT classes, the positive votes are strengthened further by our Vote-Attack, which leads to more misclassifications. See the differences in the dashed gray boxes, where both the length of the positive votes and their weights become larger (where the similarity values are about 1.0). Advantages. Interestingly, the proposed Vote-Attack alleviates both drawbacks of Caps-Attack. Firstly, since the routing process is excluded, Vote-Attack mitigates the gradient obfuscation when computing the gradients to generate adversarial samples. Hence, the attack performance of Vote-Attack is expected to be higher than that of Caps-Attack. Secondly, since the costly routing process is removed from the attack, Vote-Attack is more efficient than Caps-Attack.

5. EXPERIMENTS

In this section, we verify our proposal via empirical experiments. We first show the effectiveness of Vote-Attack on CapsNets in the regular training scheme and the adversarial training one. We also show the efficiency of Vote-Attack. Besides, we apply Vote-Attack to bypass the recently proposed CapsNet-based adversarial detection method. All the reported scores are averaged over 5 runs.

5.1. EFFECTIVENESS OF VOTE ATTACK ON CAPSNETS

Models: We take ResNet18 as a CNN baseline. In the counter-part CapsNets, we apply a ResNet18 backbone to extract primary capsules $u \in (64 \times 4 \times 4, 8)$, where the outputs of the backbone are feature maps of shape (512, 4, 4); 64 is the number of capsule groups, and 8 is the primary capsule size. The primary capsules are transformed to make $64 \times 4 \times 4$ votes $\hat{u} \in (64 \times 4 \times 4, 10, 16)$ with the learned transformation matrices $W \in (64 \times 4 \times 4, 8, 160)$. The size of an output capsule is 16, and 10 is the number of output classes. The votes $\hat{u}$ reach an agreement $v \in (10, 16)$ via the dynamic routing mechanism. The lengths of the 10 output capsules are the probabilities of the 10 output classes.
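The shapes above can be checked with a toy einsum; random tensors stand in for the trained backbone outputs and transformation matrices, and the $(1024, 10, 8, 16)$ layout of $W$ is our reshaped view of the $(1024, 8, 160)$ shape in the text.

```python
import numpy as np

N, M, D_in, D_out = 64 * 4 * 4, 10, 8, 16   # 1024 primary capsules, 10 classes
rng = np.random.default_rng(0)
u = rng.normal(size=(N, D_in))              # primary capsules u_i
# W of shape (1024, 8, 160) in the text equals (1024, 10, 8, 16) after reshaping,
# since 160 = 10 output classes x 16-dimensional output capsules.
W = rng.normal(size=(N, M, D_in, D_out))    # transformation matrices W_ij
u_hat = np.einsum('nd,nmde->nme', u, W)     # votes u_hat_{j|i}
print(u_hat.shape)                          # (1024, 10, 16)
```

Routing then reduces these $(1024, 10, 16)$ votes over the first axis to the $(10, 16)$ output capsules $v$.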

Datasets:

The popular datasets CIFAR10 (Krizhevsky et al., 2009) and SVHN (Netzer et al., 2011) are used in this experiment. The standard preprocessing is applied on CIFAR10 for training: 4 pixels are padded on an input of $32 \times 32$, and a $32 \times 32$ crop is randomly sampled from the padded image or its horizontal flip. For $\ell_\infty$-based attacks, the perturbation range is 0.031 (CIFAR10) and 0.047 (SVHN) for pixels ranging in [0, 1]. For $\ell_2$-based attacks, the $\ell_2$ norm of the allowed maximal perturbation is 1.0 for both datasets.

White-Box Attacks

We train CNNs and CapsNets with the same standard training scheme, where the models are trained with a batch size of 256 for 80 epochs using SGD with an initial learning rate of 0.1 and momentum 0.9. The learning rate is set to 0.01 from the 50th epoch. We apply popular $\ell_\infty$-based attacks (FGSM (Goodfellow et al., 2015), BIM (Kurakin et al., 2017), MIM (Dong et al., 2018), PGD (Madry et al., 2018)) and $\ell_2$-based attacks (C&W attack (Carlini & Wagner, 2017b), DeepFool (Moosavi-Dezfooli et al., 2016)) to attack the well-trained models. The hyper-parameters mainly follow the Foolbox tool (Rauber et al., 2017). In CapsNets, capsules and votes are taken as attack targets, respectively. CapsNets are shown to be robust to input affine transformations (Sabour et al., 2017; Gu & Tresp, 2020a). When inputs are affine-transformed, the votes in CapsNets also change correspondingly. We also verify the effectiveness of Vote-Attack in the case of affine-transformed inputs. We consider two cases: 1) The CapsNet built on standard convolutional layers (Sabour et al., 2017) is trained on the MNIST dataset and tested on the AffNIST dataset. 2) The CapsNet built on a backbone (i.e., ResNet18) is trained on the standard CIFAR10 training dataset and tested on affine-transformed CIFAR10 test images. In both cases, our Vote-Attack achieves higher attack success rates than Caps-Attack. More details about this experiment can be found in Appendix E. This experiment shows that our Vote-Attack is more effective than Caps-Attack when the inputs are affine-transformed. Under Vote-Attack, the robust accuracy of CapsNets is still higher than that of the counter-part CNNs. However, we do not claim that CapsNets are more robust, for two reasons: 1) CapsNets possess more network parameters due to the transformation matrices; 2) potential attacks could reduce the robust accuracy further.
This study demonstrates that the high adversarial robustness of CapsNets can be a false sense of security, and we should be careful when drawing any conclusion about the robustness of CapsNets.

5.2. EFFICIENCY OF VOTE ATTACK ON CAPSNETS

In the last subsection, we demonstrated the effectiveness of Vote-Attack from different perspectives. We now show its efficiency. In our Vote-Attack, no routing process is involved in either the forward inference or the gradient backpropagation. To show the efficiency of Vote-Attack empirically, we record the time required by each attack to create a single adversarial example and average it across the CIFAR10 test dataset. A single Nvidia V100 GPU is used. The required time is reported in Table 3. The time on the SVHN dataset is almost the same as on CIFAR10 since both input space dimensions are the same (i.e., $32 \times 32 \times 3$). The column corresponding to $A_{std}$ shows the time required to classify a single input image. Compared to the logit attack on CNNs, Caps-Attack on CapsNets requires more time to create adversarial examples since the dynamic routing is computationally expensive. Our Vote-Attack can create adversarial images without using the routing part and reduces the required time significantly. However, the required time is still more than that on CNNs. The reason is that current deep learning frameworks are highly optimized for convolutional operations, but less so for the voting process.

5.3. BYPASSING CLASS-CONDITIONAL CAPSULE RECONSTRUCTION BASED DETECTION

In this experiment, we demonstrate that class-conditional capsule reconstruction based detection can be bypassed by integrating our Vote-Attack into the detection-aware attack method. Following the work (Qin et al., 2020), we use the original CapsNet architecture (Sabour et al., 2017) for this experiment. The architecture details are as follows. In CapsNets, two standard convolutional layers, Conv1(C256, K9, S1) and Conv2(C256, K9, S2), are used to extract primary capsules of shape $(32 \times 6 \times 6, 8)$. The output capsules of shape (10, 16) are obtained after the dynamic routing process. The output capsules are taken as input for a reconstruction net (FC160-FC512-FC1024-FC28×28). In the reconstruction process, only one of the output capsules is activated; the others are masked with zeros. Since the input contains the class information, the reconstruction is class-conditional. The capsule corresponding to the ground-truth class is activated during training, while the winning capsule (the one with maximal length) is activated in the test phase. Two CNN baseline models are considered. CNN+CR uses the same architecture without routing and groups the 160 activations into 10 groups, where the sum of 16 activations is taken as a logit. The same class-conditional reconstruction mechanism is used. CNN+R does not group the 160 activations and reconstructs the input from the activations without a masking mechanism. More details of the baseline models can be found in (Qin et al., 2020). Given an input, it is flagged as an adversarial example if its reconstruction error is bigger than a given threshold, $d(x, \hat{x}) > \theta$. Following (Qin et al., 2020), we set $\theta$ to the 95th percentile of the reconstruction errors of benign validation images, namely, a 5% false positive rate. We report the Success Rate $S = \frac{1}{K} \sum_{i=1}^{K} \mathbb{1}(f(x + \delta) \ne y)$ and the Undetected Rate $R = \frac{1}{K} \sum_{i=1}^{K} \mathbb{1}\big((f(x + \delta) \ne y) \cap (d(x, \hat{x}) \le \theta)\big)$, where $K$ is the number of test images. Both detection-agnostic and detection-aware attacks introduced in Sec.
2 are considered. The results on the FMNIST dataset are reported in Table 4. In detection-agnostic attacks, we apply our Vote-Attack to attack CapsNets directly without considering the detection mechanism. The CapsNet used in this experiment is built on standard convolutional layers instead of the backbones used in previous experiments. Our Vote-Attack still achieves a higher success rate than Caps-Attack, which indicates that Vote-Attack is effective across different architectures. Furthermore, the undetected rate is also increased correspondingly. In detection-aware attacks, the integration of our Vote-Attack increases the attack success rate and the undetected rate significantly. More results on the MNIST and SVHN datasets are shown in Appendix F. Under the class-conditional capsule reconstruction based detection, some of the undetected examples are not imperceptible anymore, as shown in (Qin et al., 2020). Some images are flipped into the attack target classes when attacked, although a small threshold is applied. Some images are hard to flip, e.g., the ones with a big digit or thin strokes. We also visualize the adversarial examples created by Caps-Attack and our Vote-Attack in Figure 3. More figures and details are shown in Appendix G. We find that there is no obvious visual difference between the adversarial examples created by the two attacks. This finding echoes a previous experiment, where we compute the different norms (i.e., the $\ell_0$, $\ell_1$, $\ell_2$ norms) of the created perturbations. The idea of attacking the votes of CapsNets can also be applied to other versions of CapsNets. However, some adaptations are required since different CapsNet versions can have significantly different architectures. For instance, in EM-CapsNet (Hinton et al., 2018), a capsule corresponding to an entity is represented by a matrix, and the confidence of the entity's existence is represented by the activation of a single neuron.
A possible adaptation could be attacking the votes by flipping the neuron activations that represent the existence of entities. Recently, many capsule networks have been proposed, to name a few (Hinton et al., 2018; Zhang et al., 2018; Rawlinson et al., 2018; Hahn et al., 2019; Ahmed & Torresani, 2019; Gu & Tresp, 2020a; Tsai et al., 2020; Ribeiro et al., 2020). We leave the further exploration of different versions of CapsNets to future work. Even though CapsNets still seem to be more robust than counter-part CNNs under our stronger Vote-Attack, it is too early to draw such a conclusion. We conjecture that the robust accuracy of CapsNets can be reduced further. In future work, we will explore stronger attacks as well as certifications to compare the robustness of CNNs and CapsNets.
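The success rate $S$ and undetected rate $R$ reported in Sec. 5.3 can be computed as follows; this is a sketch, and the array names are ours.

```python
import numpy as np

def attack_rates(preds, labels, recon_errors, theta):
    # S: fraction of inputs misclassified after the attack.
    # R: fraction misclassified AND passing the reconstruction-based detection,
    #    i.e., with reconstruction error at most theta.
    fooled = preds != labels
    S = fooled.mean()
    R = (fooled & (recon_errors <= theta)).mean()
    return S, R
```

A strong detection-aware attack maximizes $R$, not just $S$: an adversarial example only counts as a full bypass if it both flips the prediction and keeps its reconstruction error below the detection threshold.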

E ADVERSARIAL ROBUSTNESS ON AFFINE-TRANSFORMED DATA

CapsNets learn equivariant visual representations. When inputs are affine-transformed, the votes also change correspondingly. In this experiment, we aim to verify the effectiveness of Vote-Attack when the inputs and their votes in CapsNets change. The model is trained the same as before. We translate the test images by 2 pixels randomly and rotate the images within a given pre-defined degree. The robust accuracy on affine-transformed images of the CIFAR10 dataset is shown in Table 9. Under different rotation degrees, our Vote-Attack is still effective: it consistently reduces the robust accuracy of CapsNets when compared to Caps-Attack. We also conduct experiments on the AffNIST dataset. In this experiment, the original CapsNet architecture and the original CNN baseline in (Sabour et al., 2017) are used. The models are trained on the standard MNIST dataset and tested on the AffNIST dataset. In the AffNIST dataset, the MNIST images are transformed, namely, rotated, translated, scaled, or sheared. More details about this dataset can be found in this resource¹. The perturbation threshold and the attack step size are set to 0.3 and 0.01, respectively. The other hyper-parameters are the defaults in the Foolbox tool (Rauber et al., 2017). The test accuracy on the untransformed test dataset ($A_{std}$), the accuracy on the transformed dataset ($A_{aff}$), and the robust accuracy under different attacks are reported in Table 10. Our Vote-Attack achieves higher attack success rates than Caps-Attack.



¹ https://www.cs.toronto.edu/~tijmen/affNIST/



Figure 1: The overview of Capsule Networks: the CapsNet architecture consists of four components, i.e., primary capsule extraction, voting, routing, and class-conditional reconstruction.

Figure 2: The left-to-right columns correspond to statistics of predictions on clean images, under Caps-Attack, and under Vote-Attack, respectively. The first row corresponds to the statistics on ground-truth classes, and the second row corresponds to the classes with the largest output probabilities that are not ground-truth (L-NGT) classes. In each subplot, the x-axis indicates the cosine similarity value between the vote $\hat{u}_{j|i}$ and the output capsule $v_j$. The blue histogram shows the percentage of votes falling in bins divided by the similarity values on the x-axis. The green histogram corresponds to the strength of the votes (the averaged length of the votes $\hat{u}_{j|i}$). The red curve presents the averaged weight (i.e., $c_{ij}$, see Equation (2)) of the votes in each bin. Please refer to the main text for a more in-depth analysis of this figure.

Figure 3: This figure shows the clean images and the corresponding adversarial images created by Caps-Attack and Vote-Attack in a targeted setting. The attack target class is set to the digit 0. The adversarial images created by the two attack methods are visually similar. The observation also echoes the previous findings in Appendix B, where we show that the perturbations created by Caps-Attack and Vote-Attack have similar norms.

Figure 4: The first subfigure shows clean test images from the SVHN dataset. The second subfigure shows the adversarial images created by Caps-Attack; different rows correspond to different weights on the reconstruction-error term in Equation (6) (i.e., the second step of the detection-aware attack). Some images are successfully flipped, and some hard ones are not. The third subfigure shows the adversarial images created by Vote-Attack. There is no obvious visual difference between the adversarial examples created by the two attacks. Note that the images are randomly selected (not cherry-picked).

The robust accuracy of ResNets and CapsNets under popular attacks on the CIFAR10 and SVHN datasets: Vote-Attack is much more effective than Caps-Attack and compatible with different underlying attacks. The standard test accuracy and the robust accuracy under different attacks are reported in Table 1. The CapsNets and the counter-part CNNs achieve similar performance on clean test data. The strong PGD attack misleads all classifications of the ResNet; however, it is less effective when attacking the output capsules. Our Vote-Attack reduces the robust accuracy of CapsNets significantly across different attack methods. The transferability of the created adversarial examples is investigated in Appendix D: the adversarial examples created by Vote-Attack are more transferable than the ones created by Caps-Attack.
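The attacks compared above are gradient-based, so a generic L∞ PGD loop can be sketched as below. Here `loss_grad(x)` is an assumed callback returning the gradient of the attack loss with respect to the input: for Caps-Attack the loss is computed on the routed output capsules, while for Vote-Attack it is computed on the votes directly, so the routing iterations never enter the backward pass. This is a hedged sketch, not the paper's implementation.

```python
import numpy as np

def pgd_attack(loss_grad, x, eps=8/255, alpha=2/255, steps=50):
    """Projected sign-gradient ascent within an L-infinity ball of radius
    eps around the clean input x (pixel values assumed to lie in [0, 1])."""
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv))
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)  # project into the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep a valid image
    return x_adv
```

Swapping the loss from output capsules to votes leaves this loop untouched, which is what makes Vote-Attack a drop-in replacement for different underlying attacks.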

In this experiment, we verify the effectiveness of Vote-Attack in the context of adversarial training. We train models with adversarial examples created by Caps-Attack, where PGD with 8 iterations is used. To train a more robust model, we also combine Vote-Attack and Caps-Attack to create adversarial examples, using a new loss that combines the two attacks.
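A minimal sketch of the combined training objective, under our own assumption that the two attack losses are simply averaged (the exact combination used in the paper may differ):

```python
def combined_adv_loss(loss_fn, model, x_caps_adv, x_vote_adv, y):
    """Average the training loss over adversarial examples crafted by
    Caps-Attack and by Vote-Attack for the same clean batch. The equal
    0.5/0.5 weighting is an illustrative assumption, not the paper's."""
    return 0.5 * (loss_fn(model(x_caps_adv), y) + loss_fn(model(x_vote_adv), y))
```

Training on both kinds of adversarial examples exposes the model to perturbations that target the routed capsules as well as the votes themselves.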

The robustness of CapsNets with different training schemes on the CIFAR10 and SVHN datasets: Vote-Attack is also effective against adversarially trained models, and it can be applied to improve adversarial training.

The average time required by each attack to create an adversarial example is reported on the CIFAR10 test dataset. Vote-Attack requires less time and is thus more efficient than Caps-Attack.

Different attacks are applied to circumvent the class-conditional reconstruction adversarial detection method on FMNIST dataset. The attack success rate and undetected rate (S/R) are reported for each attack. The integration of Vote-Attack in the detection-aware attack increases both the attack success rate and the undetected rate significantly.

In this work, we dive into the inner workings of CapsNets and show how they are affected by adversarial examples. Our investigation reveals that adversarial examples can mislead CapsNets by manipulating the votes. Based on this analysis, we propose an effective and efficient Vote-Attack on CapsNets. Vote-Attack is more effective and efficient than Caps-Attack in both standard training and adversarial training settings. Furthermore, Vote-Attack also demonstrates superiority in terms of the transferability of adversarial examples as well as adversarial robustness on affine-transformed data. Last but not least, we apply our Vote-Attack to significantly increase the undetected rate of the class-conditional capsule-reconstruction-based adversarial detection.

The ℓ0, ℓ1, and ℓ2 norms of perturbations created by different attacks on the CIFAR10 dataset. Overall, the perturbations created by our Vote-Attack have similar norms to the ones created by Caps-Attack.

The ℓ0, ℓ1, and ℓ2 norms of perturbations created by different attacks on the SVHN dataset. In ℓ∞-attack methods, the perturbations created by our Vote-Attack have similar norms to the ones created by Caps-Attack. In ℓ2-attack methods, our Vote-Attack finds smaller perturbations to fool the underlying classifier.

We also investigate the transferability of adversarial examples created by Caps-Attack and Vote-Attack on the CIFAR10 dataset. We consider three models: VGG19 (Simonyan & Zisserman, 2015), ResNet18, and CapsNets. PGD is used as the underlying attack, and we measure the transferability using the Transfer Success Rate (TSR). The TSR of different adversarial examples is reported in Table 8. The adversarial examples created on CNNs are more transferable; in particular, the ones created on ResNet18 transfer to CapsNets very well. The reason is that the CapsNets also use the ResNet18 backbone to extract primary capsules. By comparing the last two columns of Table 8, we observe that the adversarial examples created by Vote-Attack are more transferable than the ones created by Caps-Attack.
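The TSR can be computed as sketched below. We read it as the fraction of adversarial examples that fool the source model and also fool the target model; the paper's exact definition may differ slightly, so treat this as an assumption.

```python
import numpy as np

def transfer_success_rate(src_pred_adv, tgt_pred_adv, y):
    """Among adversarial examples that fool the source model (prediction
    differs from the label y), return the fraction that also fool the
    target model. Returns 0.0 if none fool the source model."""
    src_fooled = np.asarray(src_pred_adv) != np.asarray(y)
    if not np.any(src_fooled):
        return 0.0
    tgt_fooled = np.asarray(tgt_pred_adv) != np.asarray(y)
    return float(np.mean(tgt_fooled[src_fooled]))
```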

The targeted attack success rates (%) on the CIFAR10 and SVHN datasets. In the targeted attack setting, our Vote-Attack is significantly more effective than Caps-Attack when combined with popular attacks. On the SVHN dataset, the clean model accuracies are 94.46 (±0.14) for ResNet and 94.16 (±0.02) for CapsNet.

The transferability of adversarial examples created on CNNs and CapsNets on CIFAR10 dataset: the ones created on CNNs are more transferable than on CapsNets; the ones created with Vote-Attack are more transferable than the ones with Caps-Attack.

When inputs are affine-transformed on the CIFAR10 dataset, Vote-Attack is still more effective at creating adversarial examples than Caps-Attack. The (translation, rotation) settings are (±2, ±15°), (±2, ±30°), (±2, ±60°), and (±2, ±90°).

A TWO VARIANTS OF VOTE ATTACK

We have another two choices when attacking the votes of CapsNets directly. Choice 1: In Equation (8), we first average the votes and then squash the averaged vote. An alternative is to first squash the votes and then average the squashed votes, i.e., to optimize L((1/N) Σ_i g(f_i^v(x + δ)), y). Our experiments show that this option is similarly effective. Choice 2: Another choice is to average the losses caused by the individual votes. Instead of optimizing the loss computed on the squashed averaged vote, we compute the loss of each vote separately and average them, namely, (1/N) Σ_i L(g(f_i^v(x + δ)), y). The loss of each vote can differ significantly from the others, and a large part of the total loss can be caused by a small subset of votes. In other words, the gradients received by the input can be dominated by a few very strong votes. This choice is less effective than the one in Equation (8). We use the same experimental setting as in Sec. 5. Under the same PGD attack on the CIFAR10 dataset, the robust accuracy corresponding to Choice 1 is 4.06 (±1.12), so it is effective, similar to Equation (8). Choice 2, with a robust accuracy of 43.31 (±2.46), does not work well, since the gradients received by the inputs are dominated by only a small subset of votes.
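The three objectives can be sketched as follows. Here squash is the non-linearity from Sabour et al. (2017), votes is an (N, D) array of votes for one class, and loss maps a capsule vector to a scalar; the function names are ours, and the per-class simplification is for illustration only.

```python
import numpy as np

def squash(s, eps=1e-9):
    """g(s) = (|s|^2 / (1 + |s|^2)) * s / |s| (Sabour et al., 2017)."""
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def loss_eq8(votes, loss):
    # Equation (8): squash the averaged vote, then take the loss
    return loss(squash(votes.mean(axis=0)))

def loss_choice1(votes, loss):
    # Choice 1: squash each vote first, then average the squashed votes
    return loss(squash(votes).mean(axis=0))

def loss_choice2(votes, loss):
    # Choice 2: average the per-vote losses (less effective, see above)
    return float(np.mean([loss(squash(v)) for v in votes]))
```

In Choice 2, a few votes with large individual losses dominate the average, which matches the explanation above for why it underperforms.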

B NORMS OF PERTURBATIONS CREATED BY DIFFERENT ATTACKS

On the CIFAR10 and SVHN datasets, we compute different norms of the perturbations created by different attacks. On each dataset, we first select from the test set the examples that are successfully attacked by both Vote-Attack and Caps-Attack on CapsNets, as well as by the corresponding attack on ResNets. Then, we obtain the perturbations created by the corresponding attacks. The ℓ0, ℓ1, and ℓ2 norms of the perturbations are shown in Table 5 for the CIFAR10 dataset and Table 6 for the SVHN dataset. In most cases, Vote-Attack and Caps-Attack create perturbations with similar norms. Under the BIM attack, the ℓ1 and ℓ2 norms corresponding to Vote-Attack are higher than the ones corresponding to Caps-Attack, and both are smaller than the ones corresponding to other multi-step attacks (e.g., PGD). The reason is that the BIM attack does not converge, since only 10 iterations are used by default in the Foolbox tool (50 iterations in PGD). Given the same number of iterations before convergence, Vote-Attack accumulates relatively consistent gradients and converges faster than Caps-Attack, which explains our observation. In addition, the ℓ2 attacks find minimal perturbations to mislead the classifier, and the differences in perturbation norms are small. Our Vote-Attack finds smaller perturbations on the SVHN dataset and similar ones on the CIFAR10 dataset. This observation indicates that the performance of an attack method can also depend on the dataset.
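For reference, the reported per-example norms can be computed as below (a straightforward helper of our own, not the paper's code):

```python
import numpy as np

def perturbation_norms(x_clean, x_adv):
    """L0 (number of changed entries), L1, and L2 norms of the perturbation
    delta = x_adv - x_clean, i.e., the quantities reported in Tables 5 and 6."""
    delta = (np.asarray(x_adv) - np.asarray(x_clean)).ravel()
    return {
        "l0": int(np.count_nonzero(delta)),
        "l1": float(np.abs(delta).sum()),
        "l2": float(np.sqrt((delta ** 2).sum())),
    }
```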

C VOTE TARGETED ATTACK

We create adversarial examples in the targeted attack setting on the CIFAR10 and SVHN datasets. The models used are the same as in the untargeted setting. The target classes are selected uniformly at random from the non-ground-truth classes. An attack is successful if the created adversarial example is classified as the corresponding target class by the underlying classifier. The attack success rate (%) is reported in Table 7. In the targeted attack setting, our Vote-Attack achieves a significantly higher attack success rate than Caps-Attack. This experiment shows that our Vote-Attack remains effective when extended to the targeted attack setting.

Published as a conference paper at ICLR 2021

The integration of our Vote-Attack into the detection-aware attack is effective at bypassing the class-conditional reconstruction detection method. To verify this, we also conduct experiments on further datasets, such as MNIST and SVHN. The results are reported in Table 11. On all three datasets, both detection-aware and detection-agnostic attacks achieve a high attack success rate and undetected rate when combined with our Vote-Attack.

Table 11: Different attacks are applied to circumvent the class-conditional reconstruction adversarial detection method. The attack success rate and undetected rate (S/R) are reported for each attack. On all three popular datasets, the integration of Vote-Attack into the detection-aware attack increases both the attack success rate and the undetected rate significantly.

