LEARN ROBUST FEATURES VIA ORTHOGONAL MULTI-PATH

Abstract

It is now widely known that clean images with visually imperceptible adversarial perturbations can fool deep neural networks. To defend against adversarial attacks, we design a block containing multiple paths to learn robust features, where the parameters of these paths are required to be orthogonal to each other. The so-called Orthogonal Multi-Path (OMP) block can be placed in any layer of a neural network. Via forward learning and backward correction, one OMP block makes the neural network learn features that are appropriate for all the paths and hence are expected to be robust. With careful design and thorough experiments on, e.g., the positions at which the orthogonality constraint is imposed and the trade-off between variety and accuracy, the robustness of the neural networks is significantly improved. For example, under a white-box PGD attack with l_∞ bound 8/255 (a fierce attack that can make the accuracy of many vanilla neural networks drop to nearly 10% on CIFAR10), VGG16 with the proposed OMP block keeps over 50% accuracy. Under black-box attacks, neural networks equipped with an OMP block attain accuracy over 80%. The performance under both white-box and black-box attacks is much better than that of existing state-of-the-art adversarial defenders.

1. INTRODUCTION

In recent years, Deep Neural Networks (DNNs) have been widely applied in many fields (Goodfellow et al., 2016). Despite this great progress, DNNs have also been found to be vulnerable. For example, in image classification, adding well-designed, visually imperceptible perturbations to clean images yields perturbed images, a.k.a. adversarial examples, that can successfully fool many well-trained DNNs (Szegedy et al., 2014; Goodfellow et al., 2015). The process of generating adversarial examples is called an adversarial attack. Since its proposal, many interesting adversarial attacks have been developed, which can be categorized into two types: black-box attacks (Papernot et al., 2017; Liu et al., 2017; Chen et al., 2017; Su et al., 2019) and white-box attacks (Goodfellow et al., 2015; Kurakin et al., 2017; Carlini & Wagner, 2017; Madry et al., 2018; Tang et al., 2019). As the names suggest, white-box attacks require complete information about the target model, while black-box attacks rely only on the outputs of the target model or on transferability across models. For a neural network f(x; θ) with input x and parameters θ, we denote the trained parameters by θ and the example to be attacked by x_0. An adversarial attack tries to find a small ∆x such that f(x_0 + ∆x; θ) ≠ f(x_0; θ). To defend against the attack, i.e., to keep both sides equal, adversarial training (Szegedy et al., 2014; Goodfellow et al., 2015; Madry et al., 2018) includes a group of adversarial perturbations in the training process to keep f(x_0 + ∆x; θ) = f(x_0; θ). Generally speaking, adversarial training has been the most effective defence strategy to date (Athalye et al., 2018; Tramer et al., 2020), but the attack needs to be known in advance. To adapt to all perturbations, researchers consider the response to perturbations ∆x in B_ε = {∆x : ‖∆x‖ ≤ ε}. If the maximum change is small, then certified robustness can be guaranteed (Raghunathan et al., 2018; Cohen et al., 2019).
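To make the attack formulation above concrete, the following sketch implements a single-step, FGSM-style signed-gradient attack (Goodfellow et al., 2015) on a toy linear classifier. The model, the data, and the function names (`predict`, `fgsm`) are our own illustration, not part of the paper's method; the point is that the step Δx = ε·sign(∂L/∂x) stays inside the l_∞ ball B_ε while increasing the loss at x_0.

```python
import numpy as np

# Toy "network" f(x; theta): class scores are W @ x, prediction is the argmax.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 8))           # theta: 3 classes, 8 input features
x0 = rng.standard_normal(8)               # the example to be attacked

def predict(x):
    return int(np.argmax(W @ x))

def fgsm(x, eps):
    """One signed-gradient step against the model's own prediction."""
    y = predict(x)
    scores = W @ x
    p = np.exp(scores - scores.max())
    p /= p.sum()                          # softmax probabilities
    onehot = np.eye(3)[y]
    grad_x = W.T @ (p - onehot)           # d cross-entropy / d x at label y
    return x + eps * np.sign(grad_x)      # Delta_x lies in B_eps (l_inf)

x_adv = fgsm(x0, eps=0.5)
# The attack succeeds whenever predict(x_adv) != predict(x0).
```

Iterating this step with a projection back onto B_ε gives the PGD attack used in the experiments of the paper.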
Motivated by this coupling of samples and parameters, one could also impose randomness on the parameters to enhance robustness. Consider a linear layer, which includes both convolutional and fully-connected layers. Imposing a perturbation on the samples is equivalent to adding randomness to the parameters: for every ∆x ∈ B_ε, there exists a ∆θ that satisfies ⟨x_0 + ∆x, θ⟩ = ⟨x_0, θ + ∆θ⟩. Pioneering and representative works in this direction can be found in (He et al., 2019; Liu et al., 2018b), which aim to learn a distribution over the parameters. The advantages include network diversity and low dependence on the attacks. However, this learning is not very effective: the learned distribution tends to shrink to a single optimal solution.

In this paper, we propose to embed multiple paths into a neural network. In a regular neural network, a block with input z_k and output z_{k+1} is denoted as z_{k+1} = g(z_k; θ_g). For the mapping g, we train multiple paths g_i(z_k; θ_{g_i}), which then produce multiple outputs, and the remaining layers are trained to adapt to all the paths. A key point is that we require the parameters of all paths to be orthogonal to each other, which guarantees diversity and coverage. Fig. 1(a) compares a regular network with a network embedded with L paths. The proposed Orthogonal Multi-Path (OMP) block can be placed in any layer of a neural network. It is not surprising that the follow-up layers are more robust, since they are capable of handling features from multiple paths. Let us consider a simple example. A VGG16 is trained on CIFAR10 and we put an OMP block in the first layer. The average feature change ‖∆z_{k+2}‖_∞ at the layer after the OMP block, caused by perturbations ‖∆x‖_∞ ≤ 8/255 on test images, is bounded by 1.39 (which is actually a certified robustness measure) in the vanilla VGG16 and is improved to 0.82 by the OMP block. Interestingly, OMP is also helpful for the front layers. In Fig. 1(b), we visualize the features learned after the first layer in a vanilla network and in the same network with an OMP block placed at the last layer. Although the OMP block is placed at the final layer, it corrects the learned features at the first layer, resulting in much more similar features for clean and perturbed images. This phenomenon is explained by the backward correction theory proposed by Allen-Zhu & Li (2020). In fact, this result provides strong evidence that training higher-level layers improves the features of lower-level ones. With forward learning and backward correction, the OMP block makes the feature extractor adaptive to multiple paths and thus enhances the robustness of the whole network. For example, under a white-box PGD attack with l_∞ bound 8/255, which can reduce the accuracy of many vanilla networks to nearly 10% on CIFAR10, VGG16 with the proposed OMP block keeps over 50% accuracy.

The contributions of this work are summarized as follows:
• A novel defence method is proposed, which introduces orthogonal multiple paths into a neural network to enhance robustness.
• Extensive empirical results against different white-box and black-box attacks indicate the superior robustness of networks with an OMP block, under both vanilla and adversarial training.
• A thorough empirical analysis of different positions for the OMP block is provided, illustrating their distinct properties. An ablation study also demonstrates the necessity and effectiveness of the OMP block.
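The multi-path construction can be sketched in a few lines. The following is a minimal illustration, assuming each path g_i is a linear map z ↦ W_i z; the class and method names (`OMPBlock`, `orth_penalty`) are ours, and the soft orthogonality penalty is one plausible way to enforce the constraint, not necessarily the exact formulation used in the paper. Sampling one path per forward pass forces the subsequent layers to learn features that work for every path.

```python
import numpy as np

class OMPBlock:
    """Sketch of an Orthogonal Multi-Path block with n_paths linear paths."""

    def __init__(self, n_paths, d_out, d_in, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix W_i per path g_i(z) = W_i @ z.
        self.W = rng.standard_normal((n_paths, d_out, d_in)) / np.sqrt(d_in)

    def forward(self, z, path=None):
        if path is None:                        # training: sample a path
            path = np.random.randint(len(self.W))
        return self.W[path] @ z

    def orth_penalty(self):
        # Flatten each path's parameters, normalize, and penalize pairwise
        # inner products; the penalty is zero exactly when all paths'
        # parameter vectors are mutually orthogonal.
        flat = self.W.reshape(len(self.W), -1)
        flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
        gram = flat @ flat.T
        off_diag = gram - np.eye(len(self.W))
        return float(np.sum(off_diag ** 2))
```

During training, the total objective would be the task loss plus λ · `orth_penalty()` for some weight λ, so that gradient descent drives the paths toward mutual orthogonality while the follow-up layers adapt to all of them.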



Figure 1: (a) The OMP block replaces a single path with multiple ones and can be placed anywhere in a neural network. (b) When clean examples (1st row) are adversarially perturbed (2nd row, generated by attacking a third neural network), the features learned by a vanilla VGG16 change significantly. After imposing an OMP block, even at the last layer, the features learned by the first layer become much more robust.

