IMPROVING HIERARCHICAL ADVERSARIAL ROBUSTNESS OF DEEP NEURAL NETWORKS

Abstract

Do all adversarial examples have the same consequences? An autonomous driving system misclassifying a pedestrian as a car may induce far more dangerous, and even potentially lethal, behavior than misclassifying, for instance, a car as a bus. To better tackle this important problem, we introduce the concept of hierarchical adversarial robustness. Given a dataset whose classes can be grouped into coarse-level labels, we define hierarchical adversarial examples as those leading to a misclassification at the coarse level. To improve the resistance of neural networks to hierarchical attacks, we introduce a hierarchical adversarially robust (HAR) network design that decomposes a single classification task into one coarse and multiple fine classification tasks, each trained with adversarial defence techniques. As an alternative to an end-to-end learning approach, we show that HAR significantly improves the robustness of the network against ℓ2- and ℓ∞-bounded hierarchical attacks on the CIFAR-100 dataset.

1. INTRODUCTION

Deep neural networks (DNNs) are highly vulnerable to attacks based on small modifications of the input at test time (Szegedy et al., 2013). These adversarial perturbations are carefully crafted so that they are imperceptible to human observers, yet when added to clean images they can severely degrade the accuracy of a neural network classifier. Since their discovery, a vast literature has proposed various attack and defence techniques for the adversarial setting (Szegedy et al., 2013; Goodfellow et al., 2014; Kurakin et al., 2016; Madry et al., 2017; Wong et al., 2020). These methods constitute important first steps in studying the adversarial robustness of neural networks. However, there is a fundamental flaw in the way we assess a defence or an attack mechanism: we overly generalize the mistakes caused by attacks. In particular, current approaches focus on the scenario where all mistakes caused by attacks are treated equally. We argue that some contexts do not allow mistakes to be considered equal. In CIFAR-100 (Krizhevsky et al., 2009), it is less problematic to misclassify a pine tree as an oak tree than a fish as a truck. We are thus motivated to propose the concept of hierarchical adversarial robustness to capture this notion. Given a dataset whose classes can be grouped into coarse labels, we define hierarchical adversarial examples as those leading to a misclassification at the coarse level, and we present a variant of the projected gradient descent (PGD) adversary (Madry et al., 2017) to find them. Finally, we introduce a simple and principled hierarchical adversarially robust (HAR) network which decomposes the end-to-end robust learning task into one coarse and multiple fine classification tasks, each trained with adversarial defence techniques.
Our contributions are:
• We introduce the concept of hierarchical adversarial examples: a special case of standard adversarial examples which cause mistakes at the coarse level (Section 2).
• We present a worst-case targeted PGD attack to find hierarchical adversarial examples. The attack iterates through all candidate fine labels until a successful misclassification into the desired target (Section 2.1).
• We propose a novel architectural approach, the HAR network, for improving the hierarchical adversarial robustness of deep neural networks (Section 3). We empirically show that HAR networks significantly improve hierarchical adversarial robustness against ℓ∞ attacks (ε = 8/255) (Section 4) and ℓ2 attacks (ε = 0.5) (Appendix A.4) on CIFAR-100.
• We benchmark using untargeted PGD20 attacks as well as the proposed iterative targeted PGD attack. In particular, we include an extensive empirical study of the improved hierarchical robustness of HAR by evaluating against attacks with varying PGD iterations and ε. We find that a vast majority of the misclassifications from the untargeted attack are within the same coarse label, resulting in a failed hierarchical attack. The proposed iterative targeted attack provides a better empirical representation of the hierarchical adversarial robustness of the model (Section 4.2).
• We show that iterative targeted attacks formulated on the coarse network alone yield weaker hierarchical adversarial examples than those generated using the entire HAR network (Section 4.3).

2. HIERARCHICAL ADVERSARIAL EXAMPLES

The advancement of DNN image classifiers is accompanied by the increasing complexity of network designs (Szegedy et al., 2016; He et al., 2016), and those intricate networks have provided state-of-the-art results on many benchmark tasks (Deng et al., 2009; Geiger et al., 2013; Cordts et al., 2016; Everingham et al., 2015). Unfortunately, the discovery of adversarial examples has revealed that neural networks are extremely vulnerable to maliciously perturbed inputs at test time (Szegedy et al., 2013). This makes it difficult to apply DNN-based techniques in mission-critical and safety-critical areas. Another important development alongside the advancement of DNNs is the growing complexity of datasets, both in size and in number of classes, e.g., from the 10-class MNIST dataset to the 1000-class ImageNet dataset. As the complexity of a dataset grows, it can often be divided into several coarse classes, where each coarse class consists of multiple fine classes. In this paper, we use the terms label and class interchangeably. The setting in which an input image is first classified into coarse labels and then into fine labels is referred to as hierarchical classification (Tousch et al., 2012). Intuitively, the visual separability between groups of fine labels can be highly uneven within a given dataset, so some coarse labels are more difficult to distinguish than others. This motivates the use of more dedicated classifiers for specific groups of classes, allowing the coarse labels to provide information on similarities between the fine labels at an intermediate stage. The class hierarchy can be formed in different ways, and it can be learned strategically for optimal performance of the downstream task (Deng et al., 2011). Note that it is also a valid strategy to create a customized class hierarchy in order to deal with sensitive misclassifications.
To illustrate our work, we use the predefined class hierarchy of the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009): fine labels are grouped into coarse labels by semantic similarity. All prior work on adversarial examples for neural networks, regardless of defences or attacks, focuses on the scenario where all misclassifications are considered equal (Szegedy et al., 2013; Goodfellow et al., 2014; Kurakin et al., 2016; Madry et al., 2017; Wong et al., 2020). In practice, however, this notion overly generalizes the damage caused by different types of attacks. For example, in an autonomous driving system, confusing a perturbed image of a traffic sign with a pedestrian should not be treated the same way as confusing a bus with a pickup truck. The former raises a major security threat for practical machine learning applications, whereas the latter has very little impact on the underlying task. Moreover, misclassification across different coarse labels poses potential ethical concerns when the dataset involves sensitive features such as ethnicity, gender, disability or age. Mistakes across coarse classes lead to much more severe consequences than mistakes within coarse classes. To capture this different notion of attacks, we propose the term hierarchical adversarial examples. They are a specific case of adversarial examples where the resulting misclassification occurs between fine labels that come from different coarse labels. Here, we provide a clear definition of hierarchical adversarial examples to differentiate them from standard adversarial examples. We begin with notation for the classifier. Consider a neural network F(x) : R^d → R^c with a softmax as its last layer (Hastie et al., 2009), where d and c denote the input dimension and the number of classes, respectively. The prediction is given by arg max_i F(x)_i.
In the hierarchical classification framework, the classes are categorized (e.g. by the user) into fine classes and coarse classes [1]. The dataset consists of n image and fine-label pairs {(x_i, y_i)}. In the following, we use the set-membership symbol ∈ to characterize the relationship between a fine and a coarse label: y ∈ z if the fine label y is part of the coarse class z. Note that this relation holds for both disjoint and overlapping coarse classes. Given an input x, suppose its true coarse and fine labels are z* and y* respectively. Under the setting defined above, a hierarchical adversarial example must satisfy all of the following properties:
• the unperturbed input x is correctly classified by the classifier: arg max_i F(x)_i = y*;
• the perturbed input x' = x + δ is perceptually indistinguishable from the original input x;
• the perturbed input x' is classified incorrectly: arg max_i F(x')_i = y with y ≠ y*;
• the misclassified label belongs to a different coarse class: y ∉ z*.
Notice that satisfying the first three properties is sufficient to define a standard adversarial example, so hierarchical adversarial examples are special cases of adversarial examples. It is worth mentioning that measuring perceptual distance can be difficult (Li et al., 2003); the second property is therefore often replaced by restricting the adversary to modify any input x to x + δ with δ ∈ ∆. Commonly used constraint sets are ε-balls w.r.t. ℓp-norms, though other constraint sets have been used too (Wong et al., 2019). In this work, we focus on ℓ∞- and ℓ2-norm attacks.
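The four properties above can be expressed as a simple membership test. The snippet below is a minimal illustration with a toy identity "network" and an assumed four-class, two-coarse-group hierarchy (not the CIFAR-100 one); the ℓ∞ budget stands in for the perceptual-indistinguishability property.

```python
import numpy as np

# Assumed toy fine->coarse mapping, for illustration only.
FINE_TO_COARSE = {0: "tree", 1: "tree", 2: "vehicle", 3: "vehicle"}

def is_hierarchical_adv(f, x, x_adv, y_true, eps):
    """Check the four defining properties; the l_inf budget eps is a
    proxy for perceptual indistinguishability (property 2)."""
    y_clean = int(np.argmax(f(x)))
    y_adv = int(np.argmax(f(x_adv)))
    return (y_clean == y_true                                      # 1: clean input correct
            and float(np.max(np.abs(x_adv - x))) <= eps            # 2: within budget
            and y_adv != y_true                                    # 3: misclassified
            and FINE_TO_COARSE[y_adv] != FINE_TO_COARSE[y_true])   # 4: coarse-level mistake
```

Dropping the fourth check recovers the definition of a standard adversarial example.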

2.1. GENERATING HIERARCHICAL ADVERSARIAL PERTURBATIONS

A common class of attack techniques are gradient-based attacks, such as FGSM (Goodfellow et al., 2014), BIM (Kurakin et al., 2016) and PGD (Madry et al., 2017), which use gradient (first-order) information of the network to compute perturbations. Such methods are motivated by linearizing the loss function and solving for the perturbation that optimizes the loss subject to an ℓp-norm constraint. Their popularity is largely due to their simplicity: the optimization objective can be solved in closed form at the cost of one back-propagation. The main idea of gradient-based attacks can be summarized as follows. Given the prediction F(x) and a label y, the loss of the model is denoted by ℓ(x, y) := ℓ(F(x), y), e.g., the cross-entropy loss. Here, we omit the network parameters w from the loss because they are fixed while generating adversarial perturbations. Note that the choice of y, and whether the loss is maximized or minimized, depends on whether the attack is targeted or untargeted. For a targeted ℓ∞ attack, gradient-based methods step in the opposite direction of the loss gradient, −sign(∇_x ℓ(x, y)), to find the perturbation that minimizes the loss with respect to a non-true target label (y ≠ y*). Despite their simplicity, gradient-based attacks are highly effective at finding ℓp-bounded perturbations that lead to misclassifications. In our work, we introduce a simple variant of the projected gradient descent (PGD) adversary to find hierarchical adversarial examples. Given an input image with true coarse and fine labels z* and y* respectively, let x_j denote the perturbed input at iteration j. We define

x_{j+1} = Π_{B∞(x,ε)} ( x_j − α sign(∇_x ℓ(x_j, ŷ)) )    (1)

where the target label ŷ comes from a different coarse class: ŷ ∉ z*. Algorithm 1 summarizes the procedure for generating an ℓ∞-constrained hierarchical adversarial example.
The projection operator Π after each iteration ensures that the perturbation stays in an ε-neighbourhood of the original image. We also adopt the random initialization of PGD attacks (Madry et al., 2017): x_0 = x + η, where η = (η_1, η_2, ..., η_d) and η_i ∼ U(−ε, ε). There are several approaches to choosing the target class (Carlini & Wagner, 2017). The target class can be chosen in an average-case approach, where the class is selected uniformly at random among all eligible labels. Alternatively, it can be chosen strategically, a best-case attack, to find the target class requiring the fewest PGD iterations for misclassification. In our work, we consider a worst-case attack by iterating through all candidate target labels, i.e., fine labels that do not belong to the same coarse class. This iterative targeted attack terminates under two conditions: (1) the perturbation results in a successful targeted misclassification; (2) all candidate fine labels have been used as targets.

Algorithm 1: A worst-case approach for generating an ℓ∞-bounded hierarchical adversarial example based on a targeted PGD attack.
Input: a pair of input data (x, y*), where fine label y* belongs to coarse label z*; a neural network F(·); loss function ℓ(·); ℓ∞ constraint ε; number of PGD iterations k; PGD step size α.
Define S = {y | y ∉ z*}, the collection of all fine labels that do not belong to the coarse label z*;
for ŷ ∈ S do
    x_0 ← x + η, where η ← (η_1, η_2, ..., η_d) and η_i ∼ U(−ε, ε);
    for j = 0, ..., k − 1 do
        x_{j+1} = Π_{B∞(x,ε)} ( x_j − α sign(∇_x ℓ(x_j, ŷ)) ), where Π is the projection operator;
    end
    if arg max_i F(x_k)_i = ŷ then
        Terminate (successful attack);
    else
        S ← S \ {ŷ};
        if S is empty then Terminate (failed attack);
    end
end
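Algorithm 1 can be sketched in a few lines of numpy. The model below is an assumed toy linear-softmax classifier with an analytic input gradient and a made-up four-class hierarchy; shapes, hyperparameter values and the model itself are illustrative only, not the networks or settings used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                 # toy linear "network": logits = W @ x
FINE_TO_COARSE = {0: 0, 1: 0, 2: 1, 3: 1}   # assumed toy fine->coarse hierarchy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_ce(x, target):
    # Analytic input-gradient of cross-entropy for the linear-softmax toy model.
    p = softmax(W @ x)
    p[target] -= 1.0
    return W.T @ p

def worst_case_hier_attack(x, y_true, eps=1.0, alpha=0.1, k=50):
    """Algorithm 1: try every fine label outside the true coarse class."""
    z_true = FINE_TO_COARSE[y_true]
    for t in [c for c in FINE_TO_COARSE if FINE_TO_COARSE[c] != z_true]:
        xj = x + rng.uniform(-eps, eps, size=x.shape)        # random start
        for _ in range(k):
            step = xj - alpha * np.sign(grad_ce(xj, t))      # targeted descent step
            xj = np.clip(step, x - eps, x + eps)             # project onto B_inf(x, eps)
        if int(np.argmax(W @ xj)) == t:
            return xj, t                                     # successful attack
    return None, None                                        # failed attack
```

On real image classifiers the gradient would come from automatic differentiation, and pixel values would additionally be clipped to the valid image range.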

2.2. RELATED WORK ON HIERARCHICAL CLASSIFICATION

In the image classification domain, there is a sizable body of work exploiting the class hierarchy of a dataset (Tousch et al., 2012). For classification with a large number of classes, it is a common technique to divide the end-to-end learning task into multiple classifications based on the semantic hierarchy of the labels (Marszałek & Schmid, 2008; Liu et al., 2013; Deng et al., 2012). These works are motivated by the intuition that some coarse labels are more difficult to distinguish than others, and that specific categories of classes require more dedicated classifiers. A popular hierarchical structure divides the fine labels into a label tree with root nodes and leaf nodes. Deng et al. (2011) propose an efficient technique to simultaneously determine the structure of the tree and learn the classifier for each node. Instead of learning the optimal tree structure, it is also common to use the predefined hierarchy of the dataset (Deng et al., 2012).

3. HIERARCHICAL ADVERSARIALLY ROBUST (HAR) NETWORK

To improve the hierarchical adversarial robustness of neural networks, we propose a simple and principled hierarchical adversarially robust (HAR) network which decomposes the end-to-end robust learning task into two parts. First, we initialize one neural network for the coarse classification task, along with multiple networks for the fine classification tasks. Next, all networks are trained using adversarial defence techniques to improve the robustness of their respective tasks. The final probability distribution over all fine classes is computed based on Bayes' theorem. For brevity, we use coarse neural network (CNN) and fine neural network (FNN) to denote the two types of networks. Intuitively, the HAR design benefits from a single robustified CNN with improved robustness between coarse classes, and multiple robustified FNNs with improved robustness between visually similar fine classes.

3.1. ARCHITECTURE DESIGN OF HAR

Instead of the traditional flat design of the neural network, HAR consists of one CNN for the coarse labels and several FNNs for the fine labels. Note that there is a one-to-one correspondence between a particular FNN and a specific group of fine labels. Such a modular design mimics the hierarchical structure of the dataset, where the fine classes are grouped into coarse classes. Recall that our definition of the neural network F(x) includes the softmax function as its last layer, so the output of the network can be interpreted as the probability distribution over classes: P(y | x). Conditioning on the coarse class, we can define the fine class probability as

P(y | x) = P(y | x, z) P(z | x).    (2)

Here, the probability of a fine class is the product of two terms. Given an input x, the first term P(y | x, z) is the probability of x having fine label y given that y belongs to coarse class z; this is essentially an FNN's prediction of the fine classes within a coarse category. The second term P(z | x) is the probability of x having coarse label z, i.e., the prediction of the CNN. With this decomposition of the original learning task, we can reconstruct the fine label distribution by probabilistically combining the predictions of the different networks. An important advantage of this flexible modular design is that it allows us to train the component networks with adversarial defence techniques to improve the robustness of their associated tasks. In particular, a robustified coarse network leads to improved hierarchical adversarial robustness between coarse labels. During training, each component of the HAR network is trained independently, allowing the components to be trained in parallel. We use the entire dataset with the coarse labels, {x, z}, to train the coarse network G(x), followed by training each fine network H(x) using only the corresponding portion of the dataset.
The inference procedure is as follows. Suppose the number of coarse classes is C, and coarse class i contains j_i fine classes. Similar to the definition of F(x), we use G(x) to denote the output of the CNN: G(x) = [g_1, ..., g_C]. We use H_i(x) to denote the output of the i-th FNN: H_i(x) = [h^i_1, ..., h^i_{j_i}], where j_i is a positive integer indicating the number of fine classes in coarse class i. In this setting, the output of the combined network is

F(x) = [g_1 H_1(x), ..., g_C H_C(x)].    (3)
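Equations (2) and (3) amount to scaling each FNN's softmax output by the corresponding coarse probability and concatenating the results. A minimal numpy sketch, with assumed toy logits and group sizes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def har_predict(coarse_logits, fine_logits_per_group):
    g = softmax(coarse_logits)                        # P(z | x), from the CNN
    h = [softmax(l) for l in fine_logits_per_group]   # P(y | x, z), one FNN per group
    return np.concatenate([gi * hi for gi, hi in zip(g, h)])  # Eq. (3)

# Assumed toy example: C = 2 coarse classes with 3 fine classes each.
f = har_predict(np.array([1.0, 0.2]),
                [np.array([0.5, 0.1, -0.3]), np.array([0.0, 2.0, 1.0])])
```

Because each H_i(x) sums to one and G(x) sums to one, the concatenated vector is itself a valid probability distribution over all fine labels.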

3.2. RELATED WORK ON ADVERSARIAL DEFENCE METHODS

A plethora of defence mechanisms have been proposed for the adversarial setting. Adversarial training (Szegedy et al., 2013) is one of the standard approaches for improving the robustness of deep neural networks against adversarial examples. It is a data augmentation method that replaces unperturbed training data with adversarial examples and updates the network on the replaced data points. Intuitively, this procedure encourages the DNN not to repeat the same mistakes against an adversary. By training on sufficiently many adversarial examples, the network gradually becomes robust to the attack it was trained on. Existing adversarial training methods (Szegedy et al., 2013; Goodfellow et al., 2014; Kurakin et al., 2016; Madry et al., 2017; Wong et al., 2020) differ in the adversaries used during training. Another related line of adversarial defence methods focuses on regularizing the loss function instead of augmenting the data. TRADES (Zhang et al., 2019) introduces a regularization term that penalizes the difference between the output of the model on a training point and on its corresponding adversarial example. The regularized loss consists of a standard cross-entropy loss on the unperturbed data and a KL-divergence term measuring the difference between the model's distributions on clean and adversarially perturbed training data.
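The TRADES objective described above can be written compactly as cross-entropy plus a weighted KL term. The following numpy sketch illustrates the loss computation only; it omits the inner maximization that produces the adversarial logits, and softmax outputs stand in for a real model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def trades_loss(logits_clean, logits_adv, y, beta):
    """Cross-entropy on the clean prediction plus beta times the KL
    divergence between clean and adversarial output distributions."""
    p, q = softmax(logits_clean), softmax(logits_adv)
    return float(-np.log(p[y]) + beta * np.sum(p * np.log(p / q)))
```

With identical clean and adversarial logits the KL term vanishes and the loss reduces to plain cross-entropy; β trades off natural accuracy against robustness.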

4. EXPERIMENTS

In this section, we evaluate the hierarchical adversarial robustness of the HAR network design, incorporating two popular adversarial defence methods: adversarial training with PGD10 adversaries (Madry et al., 2017) and TRADES (Zhang et al., 2019). We focus on evaluations based on ℓ∞-norm attacks and defer ℓ2 evaluations to Appendix A.4. Compared to the traditional flat network design, our experiments show that HAR leads to a significant improvement in hierarchical adversarial robustness under various targeted and untargeted ℓ∞ attacks.

4.1. EVALUATION SETUP

We use network architectures from the ResNet family (He et al., 2016) on the CIFAR-100 dataset. The hierarchical structure of classes within the two datasets is illustrated in Table 6. To establish a baseline, we train ResNet50 networks using the four methods detailed below: (1) Standard, (2) ADV, (3) ADV-T and (4) TRADES. For training HAR networks, we use ResNet10 for both the coarse network and the fine networks. We use models with a lower capacity to reduce the difference in the order of magnitude of parameters between a single ResNet50 model and multiple ResNet10 models. This eliminates the concern that the improved hierarchical adversarial robustness is merely due to increased network capacity. A comparison of the number of trainable parameters is included in Appendix A.1. Note that in the HAR network, all component networks (CNN and FNNs) are trained using the same adversarial defence approach. As a concrete example, a HAR network trained with TRADES on CIFAR-100 consists of one coarse classifier and twenty fine classifiers, all trained using TRADES. For all four methods (Standard, ADV, ADV-T and TRADES), networks are trained for a total of 200 epochs, with an initial learning rate of 0.1. The learning rate decays by an order of magnitude at epochs 100 and 150. We use a minibatch size of 128 for training and testing, and the SGD optimizer with momentum 0.9 and weight decay 2e-4. For TRADES, we performed a hyperparameter sweep on the strength of the regularization term β and selected the value yielding the highest accuracy against untargeted ℓ∞-bounded PGD20 attacks. The same optimization procedure is used for both the vanilla models and all component models of the HAR network.

4.2. HIERARCHICAL ROBUSTNESS UNDER UNTARGETED AND TARGETED ATTACKS

There are several threat models to consider while evaluating adversarial robustness, whether standard or hierarchical. The white-box threat model specifies that the model architecture and network parameters are fully transparent to the attacker (Goodfellow et al., 2014). Although many white-box attack methods exist, perturbations generated using iterations of PGD remain one of the most common benchmarks for evaluating adversarial robustness in the white-box setting. As such, we use PGD as the main method to generate both untargeted and targeted attacks. Specifically, we perform 20, 50, 100 and 200 iterations of PGD for the untargeted attacks in Table 1. Due to the large number of fine labels in CIFAR-100, we randomly selected 1000 test set inputs and performed the iterative, worst-case hierarchical adversarial perturbation introduced in Section 2.1; the evaluation on untargeted attacks uses the entire test set. The results of the worst-case targeted attack are included in Table 2. Note that, besides ε = 8/255, we also evaluated HAR against attacks with varying ε. Along with the two attacks, we also include results on unperturbed test set data (Clean). For clean and untargeted attacks, we report the percentage of correct fine class predictions as fine accuracy, and the percentage of fine class predictions belonging to the correct coarse class as coarse accuracy. For targeted attacks, accuracy refers to the percentage of test set data for which the targeted attack fails to alter the final prediction to the desired target, even after iterating through all eligible target labels. It is important to realize that a successful targeted attack implies misclassification for both coarse and fine classes. Table 1 summarizes the accuracy of the HAR model and the vanilla models on unperturbed data and against untargeted and targeted attacks.
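The fine- and coarse-accuracy metrics above can be made concrete with a short helper. Below is a minimal numpy sketch with an assumed toy fine-to-coarse lookup table: fine accuracy requires an exact fine-label match, while coarse accuracy only requires the predicted fine label to land in the correct coarse class.

```python
import numpy as np

FINE_TO_COARSE = np.array([0, 0, 1, 1])   # assumed toy fine->coarse lookup

def fine_and_coarse_accuracy(pred_fine, true_fine):
    """Fine accuracy: exact fine-label match. Coarse accuracy: the predicted
    fine label belongs to the same coarse class as the true label."""
    pred_fine = np.asarray(pred_fine)
    true_fine = np.asarray(true_fine)
    fine_acc = float(np.mean(pred_fine == true_fine))
    coarse_acc = float(np.mean(FINE_TO_COARSE[pred_fine]
                               == FINE_TO_COARSE[true_fine]))
    return fine_acc, coarse_acc
```

By construction coarse accuracy is never below fine accuracy, which is why an untargeted attack can drive fine accuracy to zero while coarse accuracy stays high.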

4.2.1. DISCUSSIONS

Before comparing the HAR model and the vanilla models, we make an interesting observation: untargeted attacks often result in misclassification within the same coarse class, as shown by the high coarse accuracy under Untargeted. In particular, vanilla networks trained on unperturbed data have 0% fine accuracy under Untargeted, while a vast majority of the misclassified inputs still fall within the correct coarse class. This shows that untargeted attacks do not provide a good empirical representation of hierarchical adversarial robustness. The iterative targeted attack, on the other hand, severely damages the hierarchical adversarial robustness of vanilla models trained with all three methods. On Standard-trained models, despite the high hierarchical robustness under untargeted attacks, nearly all of the CIFAR-10 and CIFAR-100 test set data can be perturbed into a desired target class from another coarse class. As such, we emphasize the use of the iterative worst-case targeted attack for a more accurate evaluation of the hierarchical adversarial robustness of a model. For vanilla models trained with ADV and TRADES, we notice that the improved adversarial robustness on fine classes also translates into improved hierarchical adversarial robustness. Vanilla models trained with ADV-T show improved robustness against untargeted PGD attacks compared to the original adversarial training method. However, ADV-T models suffer a significant decrease in robustness against hierarchical adversarial examples, leading to worse hierarchical robust accuracy than ADV. Finally, we observe that HAR networks trained with ADV and TRADES significantly improve the robustness against iterative targeted attacks compared to their vanilla counterparts. On CIFAR-100, the HAR network achieves a 1.2% improvement on PGD20 adversaries (ε = 8/255) when trained with ADV.

4.3. HIERARCHICAL ROBUSTNESS UNDER TARGETED ATTACKS BASED ON THE COARSE NETWORK

Under the white-box threat model, attackers with complete knowledge of the internal structure of HAR can also generate perturbations based on the coarse network alone. We investigate whether targeted PGD adversaries based on the coarse network are stronger hierarchical adversarial examples than those generated using the entire network. Such attacks can be understood as finding a more general perturbation that alters the probability distribution of the coarse class: P(z | x). Similar to the attack proposed in Section 2.1, we perform an iterative, worst-case targeted PGD20 attack based on the coarse network. Specifically, we replace ℓ(F(x), y) with ℓ(G(x), z) in Eq. 1 and iterate through all eligible coarse classes as target labels. For example, to attack a HAR network with ADV-trained component networks, the iterative targeted attack is performed on the ADV-trained coarse network of the original HAR network. Note that there is a distinction between this attack procedure and a transfer-based attack, where the perturbation is transferred from an independently trained source model (Papernot et al., 2017). Since the perturbation is generated using part of the HAR network, these attacks still belong to the white-box setting. Our results in Table 3 show that perturbations generated using the coarse network are weaker attacks than those generated using the entire network.
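Attacking the coarse head amounts to running Eq. (1) with ℓ(G(x), z) as the objective and coarse classes as targets. A minimal numpy sketch with an assumed toy linear coarse head (shapes, step sizes and the model are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
Wg = rng.normal(size=(3, 6))   # toy linear coarse head G: 3 coarse classes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_ce_coarse(x, target):
    # Analytic input-gradient of cross-entropy for the linear-softmax coarse head.
    p = softmax(Wg @ x)
    p[target] -= 1.0
    return Wg.T @ p

def coarse_targeted_pgd(x, z_target, eps=1.0, alpha=0.1, k=50):
    """Targeted PGD against the coarse network only: Eq. (1) with
    l(G(x), z) in place of l(F(x), y)."""
    xj = x + rng.uniform(-eps, eps, size=x.shape)    # random start
    for _ in range(k):
        step = xj - alpha * np.sign(grad_ce_coarse(xj, z_target))
        xj = np.clip(step, x - eps, x + eps)         # project onto B_inf(x, eps)
    return xj
```

The worst-case variant used in Section 4.3 would wrap this in a loop over all coarse classes other than the true one, mirroring Algorithm 1.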

5. CONCLUSION

In this work, we introduced a novel concept called hierarchical adversarial examples. For datasets whose classes can be further categorized into fine and coarse classes, we defined hierarchical adversarial examples as those leading to a misclassification at the coarse level. To improve the hierarchical adversarial robustness of neural networks, we proposed the HAR network design, a composite of a coarse network and several fine networks where each component network is trained independently with adversarial defence techniques. We empirically showed that HAR leads to a significant increase in hierarchical adversarial robustness under white-box untargeted and targeted attacks on the CIFAR-10 and CIFAR-100 datasets. The rapid adoption of machine learning applications has made improving the robustness and reliability of such techniques increasingly important. Mission-critical and safety-critical systems that rely on DNNs in their decision-making process should incorporate robustness, along with accuracy, in their development process. The introduction of hierarchical adversarial examples and of ways to defend against them is an important step towards safer and more trustworthy AI systems.

A APPENDIX

A.1 COMPARISON OF TRAINABLE MODEL PARAMETERS

In our evaluations, we use ResNet50 for the vanilla models and multiple ResNet10 networks for the HAR network. We use models with a lower capacity to reduce the difference in the order of magnitude of parameters between a single ResNet50 model and multiple ResNet10 models. This helps address the concern that the improved hierarchical adversarial robustness is merely due to increased network capacity.

A.2 HYPERPARAMETER SWEEP ON TRADES

The following results show the hyperparameter sweep on TRADES. We include the configuration with the highest PGD20 accuracy in Section 4.



[1] We could go beyond this 2-level hierarchy. Here we keep the presentation simple for didactic purposes.



Figure 1: Pipeline of the proposed HAR network design to improve hierarchical adversarial robustness of the neural network.

(1) Standard: training with unperturbed data; (2) ADV: training with 10-step untargeted PGD examples; (3) ADV-T: training with 10-step randomly targeted PGD examples; and (4) TRADES. ADV-T is a targeted version of PGD adversarial training: given an input pair (x, y) from the training set with y ∈ z*, the perturbation is computed with a targeted 10-step PGD attack whose target label is sampled uniformly at random from {y' | y' ∉ z*}. We refer to the flat models as vanilla models.

Table 1: Accuracy of different models on CIFAR-100 against ℓ∞-bounded white-box untargeted PGD attacks (a higher score indicates better performance).

Table 2: Accuracy of different models on CIFAR-100 against ℓ∞-bounded worst-case targeted PGD attacks generated based on Algorithm 1 (a higher score indicates better performance).

Table 3: Accuracy of the hierarchical classifier on CIFAR-100 against ℓ∞-bounded targeted attacks (ε = 8/255) generated using the coarse network (Coarse). As a comparison, the attack counterpart generated using the entire HAR network is also included (HAR). (A higher score indicates better performance.)

Number of trainable parameters in ResNet10 and ResNet34

Hyperparameter sweep of TRADES on ResNet10: evaluation based on performance on CIFAR-10 against ℓ∞-bounded adversarial perturbations (ε = 8/255).

A.3 HIERARCHICAL STRUCTURE OF CLASSES WITHIN THE CIFAR-10 DATASET

A.4 EVALUATIONS AGAINST ℓ2-BOUNDED ATTACKS

Overall, we observe a similar robustness improvement with the HAR network. Note that we were not able to achieve reasonable robustness results with TRADES against ℓ2 attacks. One possible reason is that the hyperparameter sweep was based on the ℓ∞ results and is thus not suitable for ℓ2 attacks. For this reason, we omit TRADES in this section.

