LEARN ROBUST FEATURES VIA ORTHOGONAL MULTI-PATH

Abstract

It is now widely known that adversarial attacks can fool deep neural networks: clean images with visually imperceptible perturbations are misclassified. To defend against adversarial attacks, we design a block containing multiple paths to learn robust features, where the parameters of different paths are required to be orthogonal to each other. The proposed Orthogonal Multi-Path (OMP) block can be placed in any layer of a neural network. Via forward learning and backward correction, one OMP block makes the neural network learn features that suit all of the paths and are hence expected to be robust. With careful design and thorough experiments on, e.g., the positions at which the orthogonality constraint is imposed and the trade-off between diversity and accuracy, the robustness of the neural networks is significantly improved. For example, under a white-box PGD attack with l_∞ bound 8/255 (a strong attack that can drop the accuracy of many vanilla neural networks to nearly 10% on CIFAR10), VGG16 with the proposed OMP block keeps over 50% accuracy. Under black-box attacks, neural networks equipped with an OMP block attain accuracy over 80%. The performance under both white-box and black-box attacks is much better than that of existing state-of-the-art adversarial defenses.

1. INTRODUCTION

In recent years, Deep Neural Networks (DNNs) have been widely applied in many fields (Goodfellow et al., 2016). Despite the great progress, the vulnerability of DNNs has also been exposed. For example, in image classification, adding well-designed, visually imperceptible perturbations to clean images yields perturbed images, a.k.a. adversarial examples, that can successfully fool many well-trained DNNs (Szegedy et al., 2014; Goodfellow et al., 2015). This process of generating adversarial examples is called an adversarial attack. Since its proposal, many adversarial attacks have been developed, which can be categorized into two types: black-box attacks (Papernot et al., 2017; Liu et al., 2017; Chen et al., 2017; Su et al., 2019) and white-box attacks (Goodfellow et al., 2015; Kurakin et al., 2017; Carlini & Wagner, 2017; Madry et al., 2018; Tang et al., 2019). As the name suggests, white-box attacks require complete information about the target model, while black-box attacks rely only on the output of the target model or on transferability across models. For a neural network f(x; θ) with input x and parameters θ, denote the trained parameters by θ and the example to be attacked by x_0. An adversarial attack tries to find a small ∆x such that f(x_0 + ∆x; θ) ≠ f(x_0; θ). To defend against the attack, i.e., to keep both sides equal, adversarial training (Szegedy et al., 2014; Goodfellow et al., 2015; Madry et al., 2018) includes a group of adversarial perturbations in the training process to keep f(x_0 + ∆x; θ) = f(x_0; θ). Generally speaking, adversarial training is the most effective defense strategy to date (Athalye et al., 2018; Tramer et al., 2020), but the attack needs to be known in advance. To adapt to all perturbations, researchers consider the response to perturbations ∆x in B_ε = {∆x : ‖∆x‖ ≤ ε}. If the maximum change of the output over B_ε is small, then certified robustness can be guaranteed (Raghunathan et al., 2018; Cohen et al., 2019).
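To make the attack formulation concrete, the following is a minimal sketch of an untargeted l_∞ PGD attack (Madry et al., 2018) on a toy linear softmax classifier, with the cross-entropy gradient written out analytically. All names here are illustrative; real attacks target deep networks and obtain gradients via automatic differentiation.

```python
import numpy as np

def pgd_attack(x0, y, W, b, eps=8/255, alpha=2/255, steps=10):
    """Sketch of an l_inf PGD attack on f(x) = softmax(W x + b).
    Iteratively ascends the cross-entropy loss and projects the
    perturbation back into the ball B_eps = {dx : ||dx||_inf <= eps}."""
    x = x0.copy()
    onehot = np.eye(len(b))[y]
    for _ in range(steps):
        logits = W @ x + b
        p = np.exp(logits - logits.max())
        p /= p.sum()
        g = W.T @ (p - onehot)               # d(loss)/dx for cross-entropy
        x = x + alpha * np.sign(g)           # gradient-sign ascent step
        x = x0 + np.clip(x - x0, -eps, eps)  # project into B_eps
        x = np.clip(x, 0.0, 1.0)             # stay in valid pixel range
    return x
```

The returned x_0 + ∆x satisfies ‖∆x‖_∞ ≤ ε by construction, so the defender must cope with any perturbation in B_ε rather than a single known ∆x.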
Motivated by this coupling of samples and parameters, one could also impose randomness on the parameters to enhance robustness. Consider a linear layer, which covers both convolutional and fully-connected layers. Imposing a perturbation on the samples is equivalent to adding randomness to the parameters: for every ∆x ∈ B_ε, there exists a ∆θ that satisfies ⟨x_0 + ∆x, θ⟩ = ⟨x_0, θ + ∆θ⟩. Pioneering and

