IMPROVING HIERARCHICAL ADVERSARIAL ROBUSTNESS OF DEEP NEURAL NETWORKS

Abstract

Do all adversarial examples have the same consequences? An autonomous driving system misclassifying a pedestrian as a car may induce far more dangerous, and even potentially lethal, behavior than misclassifying, for instance, a car as a bus. To better tackle this important problem, we introduce the concept of hierarchical adversarial robustness. Given a dataset whose classes can be grouped into coarse-level labels, we define hierarchical adversarial examples as those leading to a misclassification at the coarse level. To improve the resistance of neural networks to hierarchical attacks, we introduce a hierarchical adversarially robust (HAR) network design that decomposes a single classification task into one coarse and multiple fine classification tasks, each trained with adversarial defense techniques. Compared to an end-to-end learning approach, we show that HAR significantly improves the robustness of the network against ℓ2- and ℓ∞-bounded hierarchical attacks on the CIFAR-100 dataset.

1. INTRODUCTION

Deep neural networks (DNNs) are highly vulnerable to attacks based on small modifications of the input to the network at test time (Szegedy et al., 2013). These adversarial perturbations are carefully crafted so that they are imperceptible to human observers, yet when added to clean images they can severely degrade the accuracy of a neural network classifier. Since their discovery, a vast literature has proposed attack and defense techniques for the adversarial setting (Szegedy et al., 2013; Goodfellow et al., 2014; Kurakin et al., 2016; Madry et al., 2017; Wong et al., 2020). These methods constitute important first steps in studying the adversarial robustness of neural networks. However, there is a fundamental flaw in the way we assess a defense or an attack mechanism: we overly generalize the mistakes caused by attacks. In particular, current approaches treat all mistakes caused by attacks as equally severe. We argue that some contexts do not allow mistakes to be considered equal. In CIFAR-100 (Krizhevsky et al., 2009), it is less problematic to misclassify a pine tree as an oak tree than a fish as a truck. We are thus motivated to propose the concept of hierarchical adversarial robustness to capture this notion. Given a dataset whose classes can be grouped into coarse labels, we define hierarchical adversarial examples as those leading to a misclassification at the coarse level, and we present a variant of the projected gradient descent (PGD) adversary (Madry et al., 2017) to find hierarchical adversarial examples. Finally, we introduce a simple and principled hierarchical adversarially robust (HAR) network, which decomposes the end-to-end robust learning task into one coarse and multiple fine classification tasks, each trained with adversarial defense techniques.
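To make the attack model concrete, the standard ℓ∞-bounded PGD attack of Madry et al. (2017) iterates a signed gradient-ascent step on the loss followed by a projection back into the ε-ball around the clean input. The following is a minimal sketch: for simplicity it substitutes a linear softmax classifier with an analytic gradient for a DNN, and all variable names and hyperparameter values are illustrative rather than taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_linf(x0, y, W, eps, alpha, steps):
    """Untargeted l_inf-bounded PGD on a toy linear softmax classifier.

    For a real image classifier, the gradient would come from
    backpropagation and x would additionally be clipped to the valid
    pixel range; here the gradient is analytic.
    """
    x = x0.copy()
    for _ in range(steps):
        p = softmax(W @ x)
        p[y] -= 1.0                          # d(cross-entropy)/d(logits)
        grad = W.T @ p                       # d(loss)/d(input)
        x = x + alpha * np.sign(grad)        # ascent step on the loss
        x = np.clip(x, x0 - eps, x0 + eps)   # project into the eps-ball
    return x
```

With a sufficiently large ε, a few such steps suffice to flip the prediction of the toy model while keeping the perturbation ℓ∞-bounded.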
Our contributions are:
• We introduce the concept of hierarchical adversarial examples: a special case of standard adversarial examples that causes mistakes at the coarse level (Section 2).
• We present a worst-case targeted PGD attack to find hierarchical adversarial examples. The attack iterates through all candidate fine labels until the input is successfully misclassified as the desired target (Section 2.1).
• We propose a novel architectural approach, the HAR network, for improving the hierarchical adversarial robustness of deep neural networks (Section 3). We empirically show that HAR networks significantly improve hierarchical adversarial robustness against ℓ∞ attacks (ε = 8/255) (Section 4) and ℓ2 attacks (ε = 0.5) (Appendix A.4) on CIFAR-100.
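The worst-case targeted attack above can be sketched as a loop over every fine label belonging to a different coarse group, running a targeted PGD attack toward each candidate until the coarse prediction flips. The sketch again uses a toy linear classifier with analytic gradients; the `coarse_of` mapping, the model, and all hyperparameter values are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_pgd_linf(x0, target, W, eps, alpha, steps):
    """Targeted l_inf PGD toward `target` on a toy linear softmax model."""
    x = x0.copy()
    for _ in range(steps):
        p = softmax(W @ x)
        p[target] -= 1.0
        grad = W.T @ p
        x = x - alpha * np.sign(grad)        # descend: push toward target
        x = np.clip(x, x0 - eps, x0 + eps)   # project into the eps-ball
    return x

def hierarchical_attack(x0, y, W, coarse_of, eps, alpha, steps):
    """Worst-case hierarchical attack sketch: try a targeted attack toward
    every fine label whose coarse group differs from y's, and return the
    first perturbation that flips the coarse-level prediction."""
    for t in range(W.shape[0]):
        if coarse_of[t] == coarse_of[y]:
            continue                         # same coarse group: not hierarchical
        x_adv = targeted_pgd_linf(x0, t, W, eps, alpha, steps)
        if coarse_of[int(np.argmax(W @ x_adv))] != coarse_of[y]:
            return x_adv
    return None                              # no hierarchical example found
```

Note that success is judged at the coarse level: any fine label outside the original coarse group counts, which is exactly what distinguishes a hierarchical adversarial example from a standard one.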

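The coarse-to-fine decomposition behind the HAR design can be illustrated with a minimal inference sketch: one coarse classifier routes the input to a per-group fine classifier. This is only one plausible reading of the decomposition, shown with toy linear heads; the paper's actual architecture and training procedure are described in Section 3.

```python
import numpy as np

def har_predict(x, W_coarse, fine_heads):
    """HAR-style inference sketch: the coarse classifier selects a group,
    then that group's dedicated fine classifier selects the final label
    among the group's fine classes (toy linear heads, illustrative only)."""
    g = int(np.argmax(W_coarse @ x))             # coarse-level prediction
    local = int(np.argmax(fine_heads[g] @ x))    # fine prediction within group g
    return g, local
```

Under this decomposition, each component can be hardened separately with standard adversarial defenses, since the coarse and fine tasks are trained as distinct classification problems.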
