ALFA: ADVERSARIAL FEATURE AUGMENTATION FOR ENHANCED IMAGE RECOGNITION

Abstract

Adversarial training is an effective method for defending neural networks against adversarial attacks. By applying an auxiliary batch normalization to adversarial examples, it has recently been shown to also hold great potential for improving the generalization of neural networks on image recognition. However, crafting pixel-level adversarial perturbations is computationally expensive. To address this issue, we propose AdversariaL Feature Augmentation (ALFA), which performs adversarial training on intermediate feature embeddings rather than on input pixels. ALFA jointly exploits clean and adversarially augmented features to enhance networks trained in the standard fashion. To eliminate laborious tuning of key hyperparameters such as the locations and strength of feature augmentations, we further design a learnable adversarial feature augmentation (L-ALFA) framework that automatically adjusts the perturbation magnitude of each perturbed feature. Extensive experiments demonstrate that the proposed ALFA and L-ALFA achieve significant and consistent generalization improvements over strong baselines on the CIFAR-10, CIFAR-100, and ImageNet benchmarks across different backbone networks for image recognition.

1. INTRODUCTION

Neural networks are vulnerable to adversarial examples injected with imperceptible perturbations, and suffer significant performance drops when facing such attacks (Szegedy et al., 2013; Goodfellow et al., 2015b). This susceptibility has motivated abundant studies on adversarial defense mechanisms for training robust neural networks (Schmidt et al., 2018; Sun et al., 2019; Nakkiran, 2019; Stutz et al., 2019; Raghunathan et al., 2019), among which adversarial training based methods (Madry et al., 2018b; Zhang et al., 2019a) have consistently achieved superior robustness. The general focus of adversarial training is to enhance robustness against gradient-based adversarial examples. A few recent studies (Zhu et al., 2020; Gan et al., 2020) instead investigate the generalization ability conferred by adversarial training on language models, but an in-depth exploration of this direction in the vision domain is still missing. Xie et al. (2020) propose to utilize adversarial examples with an auxiliary batch normalization to improve standard accuracy for image recognition, but their approach still suffers from the expensive computational cost of generating pixel-level perturbations. To address this issue, we propose AdversariaL Feature Augmentation (ALFA) as a natural extension of adversarial training, with a focus on leveraging adversarial perturbations in the feature space to improve image recognition on clean data. As illustrated in Figure 1, ALFA introduces adversarial perturbations to multiple intermediate layers. These perturbed feature embeddings act as a special form of feature augmentation and implicit regularization that enhances the generalization ability of deep neural networks. Consequently, two challenges arise: (i) how to efficiently find the best locations at which to introduce adversarial perturbations; and (ii) how to decide the strength of the crafted perturbations.
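To make the idea concrete, the following is a minimal sketch, not the paper's implementation, of adversarial feature augmentation on a toy two-layer linear network in NumPy. All names (`alfa_step`, `softmax_xent`, the weight matrices) are hypothetical; the key step is an FGSM-style sign perturbation applied to an intermediate feature embedding rather than to the input pixels, with the clean and perturbed losses combined:

```python
import numpy as np

def softmax_xent(logits, y):
    """Cross-entropy loss and its gradient w.r.t. the logits (single example)."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    grad = p.copy()
    grad[y] -= 1.0
    return -np.log(p[y]), grad

def alfa_step(x, y, W1, W2, eps=0.1):
    """One ALFA-style forward pass on a toy two-layer linear model.

    Perturbs the intermediate feature h = W1 @ x in the direction that
    increases the loss (a sign step of magnitude eps), then averages the
    clean and adversarially augmented losses.
    """
    h = W1 @ x                           # intermediate feature embedding
    loss_clean, g_logits = softmax_xent(W2 @ h, y)
    g_h = W2.T @ g_logits                # gradient of the loss w.r.t. the feature
    h_adv = h + eps * np.sign(g_h)       # adversarial feature augmentation
    loss_adv, _ = softmax_xent(W2 @ h_adv, y)
    return 0.5 * (loss_clean + loss_adv)
```

Because the perturbation is computed from gradients that are already available during backpropagation, no separate pixel-level attack loop is needed, which is the source of the claimed efficiency gain over input-space adversarial training.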
Although a few recent works (Zhu et al., 2020; Gan et al., 2020; Sankaranarayanan et al., 2017) have explored this direction, they either add perturbations to the input embeddings or to all intermediate features, and have not reached a coherent conclusion. To efficiently learn an optimal perturbation-injection strategy, we further propose a learnable adversarial feature augmentation (L-ALFA) framework, which automatically adjusts the position and strength of the introduced feature perturbations. The proposed approach not only
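One way the learnable magnitude could work, sketched here purely as an illustration and not as the paper's actual objective, is to parameterize each layer's perturbation strength as eps = exp(log_eps) and update it by gradient descent on the training loss; a magnitude driven toward zero then effectively switches off augmentation at that layer, so the positions of useful perturbations are discovered automatically. All names and the finite-difference update are hypothetical:

```python
import numpy as np

def l_alfa_eps_update(h, grad_h, loss_fn, log_eps, lr=0.05):
    """Hypothetical sketch of a learnable per-layer perturbation magnitude.

    h       : intermediate feature at one layer
    grad_h  : gradient of the task loss w.r.t. h
    loss_fn : task loss as a function of the feature
    log_eps : log of the current perturbation magnitude (learnable scalar)
    """
    def combined(le):
        eps = np.exp(le)
        h_adv = h + eps * np.sign(grad_h)     # adversarial feature augmentation
        return 0.5 * (loss_fn(h) + loss_fn(h_adv))

    # finite-difference gradient of the combined loss w.r.t. log_eps
    d = 1e-4
    g = (combined(log_eps + d) - combined(log_eps - d)) / (2 * d)
    return log_eps - lr * g                   # gradient step on the log-magnitude
```

In practice such a magnitude would be trained jointly with the network weights; the log-parameterization keeps eps positive without explicit constraints.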

