ALFA: ADVERSARIAL FEATURE AUGMENTATION FOR ENHANCED IMAGE RECOGNITION

Abstract

Adversarial training is an effective method for building neural networks that are robust to adversarial attacks. Recent work has shown that, when adversarial examples are routed through an auxiliary batch normalization, adversarial training can also improve the generalization ability of neural networks for image recognition. However, crafting pixel-level adversarial perturbations is computationally expensive. To address this issue, we propose AdversariaL Feature Augmentation (ALFA), which applies adversarial training to intermediate feature embeddings rather than raw image pixels. ALFA jointly exploits clean and adversarially augmented features to enhance standard-trained networks. To eliminate laborious tuning of key hyper-parameters, such as the locations and strength of feature augmentations, we further design a learnable adversarial feature augmentation (L-ALFA) framework that automatically adjusts the magnitude of each feature perturbation. Extensive experiments demonstrate that our proposed ALFA and L-ALFA methods achieve significant and consistent generalization improvements over strong baselines on the CIFAR-10, CIFAR-100, and ImageNet benchmarks across different backbone networks for image recognition.

1. INTRODUCTION

Neural networks are notoriously vulnerable to adversarial examples injected with imperceptible perturbations, and suffer significant performance drops when facing such attacks (Szegedy et al., 2013; Goodfellow et al., 2015b). This susceptibility has motivated abundant studies on adversarial defense mechanisms for training robust neural networks (Schmidt et al., 2018; Sun et al., 2019; Nakkiran, 2019; Stutz et al., 2019; Raghunathan et al., 2019), among which adversarial training based methods (Madry et al., 2018b; Zhang et al., 2019a) have achieved consistently superior robustness. Adversarial training has generally focused on robustness against gradient-based adversarial examples. A few recent studies (Zhu et al., 2020; Gan et al., 2020) have turned to the generalization ability of adversarial training on language models, but an in-depth exploration of extending this idea to the vision domain is still missing. Xie et al. (2020) propose to utilize adversarial examples with an auxiliary batch normalization to improve standard accuracy for image recognition, but their approach still incurs the high computational cost of generating pixel-level perturbations.

To address this issue, we propose AdversariaL Feature Augmentation (ALFA) as a natural extension of adversarial training, with a focus on leveraging adversarial perturbations in the feature space to improve image recognition on clean data. As illustrated in Figure 1, ALFA introduces adversarial perturbations to multiple intermediate layers. These perturbed feature embeddings act as a special feature augmentation and an implicit regularization that enhance the generalization ability of deep neural networks. Consequently, two challenges arise: (i) how to efficiently find the best locations to introduce adversarial perturbations; and (ii) how to decide the strength of the created perturbations. Although a few recent works (Zhu et al., 2020; Gan et al., 2020; Sankaranarayanan et al., 2017) explore this direction, they add perturbations either to the input embeddings or to all intermediate features, and have not reached a coherent conclusion.

To efficiently learn an optimal perturbation-injection strategy, we further propose a learnable adversarial feature augmentation (L-ALFA) framework, which automatically adjusts the position and strength of the introduced feature perturbations. The proposed approach not only circumvents laborious hyper-parameter tuning, but also fully unleashes the power of adversarial feature augmentation. Experiments show that this strategy gains a substantial performance margin over existing feature augmentation methods (Li et al., 2020). In addition, we find that learnable ALFA and exhaustively-tuned ALFA exhibit a consistent pattern: applying weak adversarial feature augmentations to the last layers of deep neural networks boosts generalization performance.

The main contributions are summarized as follows. (i) We introduce a new approach of adversarial feature augmentation (ALFA) to improve the generalization ability of neural networks, which applies adversarial perturbations to the feature space rather than to raw image pixels (a minimal sketch of the resulting training step is given below). (ii) To avoid laborious hyper-parameter tuning when generating adversarial features, we propose learnable adversarial feature augmentation (L-ALFA) to automatically tailor the target perturbations and their locations. (iii) Comprehensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets across multiple backbone networks demonstrate the superiority of the proposed methods.
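To make the training objective concrete, the following is a minimal PyTorch sketch of one ALFA training step at a single augmentation point. The split of the network into `backbone` and `head`, the single-step sign update, and the learnable scalar strength `eps` are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def alfa_step(backbone, head, x, y, eps):
    """One training step with adversarial feature augmentation (sketch).

    backbone/head: the network split at one chosen intermediate layer.
    eps: a learnable scalar (L-ALFA) controlling perturbation strength.
    """
    feat = backbone(x)                               # clean intermediate features
    loss_std = F.cross_entropy(head(feat), y)        # standard training loss

    # Single ascent step in feature space: one extra backward pass,
    # instead of a multi-step pixel-level attack.
    grad, = torch.autograd.grad(loss_std, feat, retain_graph=True)
    feat_adv = feat + eps * grad.sign()
    loss_at = F.cross_entropy(head(feat_adv), y)     # adversarial loss

    return loss_std + loss_at                        # joint objective
```

Note that the gradient of `loss_at` flows into `eps`, which is what would allow a learnable variant to shrink the strength at layers where augmentation hurts generalization.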

2. RELATED WORK

Adversarial Training. Deep neural networks are notoriously vulnerable to adversarial samples (Szegedy et al., 2013; Goodfellow et al., 2015b), which are crafted with malicious yet negligible perturbations (Goodfellow et al., 2015a; Kurakin et al., 2016; Madry et al., 2018a). To improve robustness against adversarial samples, various defense mechanisms have been proposed (Zhang et al., 2019a; Schmidt et al., 2018; Sun et al., 2019; Nakkiran, 2019; Stutz et al., 2019; Raghunathan et al., 2019). Among these works, adversarial-training-based methods (Madry et al., 2018b; Zhang et al., 2019a) have achieved consistently superior performance in defending against state-of-the-art adversarial attacks (Goodfellow et al., 2015a; Kurakin et al., 2016; Madry et al., 2018a). Although adversarial training substantially improves model robustness, it usually comes at the price of compromised standard accuracy (Tsipras et al., 2019), as has been demonstrated both empirically and theoretically (Zhang et al., 2019a; Schmidt et al., 2018; Sun et al., 2019; Nakkiran, 2019; Stutz et al., 2019; Raghunathan et al., 2019). Recently, researchers have started to investigate improving clean-set accuracy with adversarial training (Xie et al., 2020; Zhu et al., 2020; Wang et al., 2019a; Gan et al., 2020; Wei & Ma, 2019; Ishii & Sato, 2019). Xie et al. (2020) show that performance on clean data can be enhanced by training with adversarial samples generated via pixel-level perturbations. Zhu et al. (2020) and Wang et al. (2019a) apply adversarial training to natural language understanding and language modeling, both successfully achieving better standard accuracy. Gan et al. (2020) achieves similar success on many vision-and-language tasks. There also exist parallel studies that employ handcrafted or auto-
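To ground the cost comparison running through this section, a hedged PGD-style sketch of pixel-level perturbation generation follows; the epsilon, step size, and iteration count are illustrative defaults, not the settings used in the cited works.

```python
import torch
import torch.nn.functional as F

def pgd_example(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Sketch of multi-step pixel-level attack generation: it costs
    `steps` extra forward/backward passes per batch, which is the
    overhead that feature-space augmentation aims to avoid."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient ascent on the loss, projected back onto the L-inf ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()
```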

Figure 1: Overview of Adversarial Feature Augmentation for image recognition. From left to right, clean images are fed into network backbones to extract clean feature embeddings. Then, adversarial perturbations are generated to augment several intermediate features (along the purple paths). Finally, both the adversarially augmented and the clean feature embeddings are taken as inputs by the classifier, and are optimized with the adversarial (L_at) and standard training (L_std) objectives, respectively.
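One plausible way to realize the multi-layer augmentation depicted in Figure 1 is to keep one learnable strength per augmented layer. The module below is a hypothetical sketch; the softplus parameterization is an assumption chosen so that each strength stays non-negative and can decay smoothly toward zero at layers where augmentation does not help.

```python
import torch
import torch.nn.functional as F

class LALFAStrengths(torch.nn.Module):
    """Per-layer learnable perturbation strengths (hypothetical module)."""
    def __init__(self, num_layers, init=-3.0):
        super().__init__()
        # Unconstrained parameters, mapped through softplus to keep eps >= 0.
        self.raw = torch.nn.Parameter(torch.full((num_layers,), init))

    def perturb(self, i, feat, grad):
        eps = F.softplus(self.raw[i])     # strength for augmented layer i
        return feat + eps * grad.sign()   # single-step feature perturbation
```

Under such a parameterization, training can drive the strengths toward zero wherever the adversarial objective does not benefit, which would be consistent with the observation in the introduction that weak augmentation at the last layers works best.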

