FINE-GRAINED SYNTHESIS OF UNRESTRICTED ADVERSARIAL EXAMPLES

Anonymous

Abstract

We propose a novel approach for generating unrestricted adversarial examples by manipulating fine-grained aspects of image generation. Unlike existing unrestricted attacks, which typically hand-craft geometric transformations, we learn stylistic and stochastic modifications by leveraging state-of-the-art generative models. This allows us to manipulate an image in a controlled, fine-grained manner without being bounded by a norm threshold. Our approach can be used for targeted and non-targeted unrestricted attacks on classification, semantic segmentation and object detection models. Our attacks can bypass certified defenses, yet our adversarial images look indistinguishable from natural images, as verified by human evaluation. Moreover, we demonstrate that adversarial training with our examples improves the model's performance on clean images without requiring any modifications to the architecture. We perform experiments on LSUN, CelebA-HQ and COCO-Stuff as high-resolution datasets to validate the efficacy of our proposed approach.

1. INTRODUCTION

Adversarial examples, inputs resembling real samples but maliciously crafted to mislead machine learning models, have been studied extensively in the last few years. Most of the existing papers, however, focus on norm-constrained attacks and defenses, in which the adversarial input lies in an ε-neighborhood of a real sample under an L_p distance metric (commonly with p = 0, 2, ∞). For small ε, the adversarial input is quasi-indistinguishable from the natural sample. For an adversarial image to fool the human visual system, being norm-constrained is sufficient, but it is not necessary. Moreover, defenses tailored to norm-constrained attacks can fail against other subtle input modifications. This has led to a recent surge of interest in unrestricted adversarial attacks, in which the adversary is not bounded by a norm threshold. These methods typically hand-craft transformations to capture visual similarity. Spatial transformations [Engstrom et al. (2017); Xiao et al. (2018); Alaifari et al. (2018)], viewpoint or pose changes [Alcorn et al. (2018)], and inserting small patches [Brown et al. (2017)], among other methods, have been proposed for unrestricted adversarial attacks.

In this paper, we focus on fine-grained manipulation of images for unrestricted adversarial attacks. We build upon state-of-the-art generative models which disentangle factors of variation in images. We create fine- and coarse-grained adversarial changes by manipulating various latent variables at different resolutions. The loss of the target network is used to guide the generation process. The pre-trained generative model constrains the search space for our adversarial examples to realistic images, thereby revealing the target model's vulnerability in the natural image space. We verify that we do not deviate from the space of realistic images with a user study as well as a t-SNE plot comparing the distributions of real and adversarial images (see Fig. 7 in the appendix). As a result, we observe that including these examples in training the model enhances its accuracy on clean images.

Our contributions can be summarized as follows:

• We present the first method for fine-grained generation of high-resolution unrestricted adversarial examples in which the attacker controls which aspects of the image to manipulate, resulting in a diverse set of realistic, on-the-manifold adversarial examples.
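To make the distinction concrete, the following is a minimal NumPy sketch, using a toy linear "classifier" and a toy linear "generator" as stand-ins (these are illustrative assumptions, not the GAN-based pipeline used in this paper). A norm-constrained step is projected back into an L∞ ε-ball around the clean input, while an unrestricted step instead optimizes the generator's latent code, so the image-space change is not norm-bounded but stays on the generator's output manifold:

```python
import numpy as np

def project_linf(x_adv, x, eps):
    """Project a candidate back into the L-infinity eps-neighborhood
    of the clean sample x (the norm-constrained setting)."""
    return np.clip(x_adv, x - eps, x + eps)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))       # toy linear "classifier": logits = W @ x
x = rng.normal(size=8)            # clean input
target = 2                        # logit the attacker wants to increase

# Norm-constrained step: move along the gradient sign, then project.
eps = 0.03
grad_x = W[target]                # d(logit_target)/dx for a linear model
x_norm_adv = project_linf(x + 0.1 * np.sign(grad_x), x, eps)

# Unrestricted step: optimize the latent z of a toy linear "generator"
# g(z) = A @ z instead. The image-space change is not norm-bounded,
# but every candidate stays on the generator's output manifold.
A = rng.normal(size=(8, 4))       # toy generator weights
z = rng.normal(size=4)            # latent code of the clean image
grad_z = A.T @ W[target]          # chain rule through the generator
x_unres_adv = A @ (z + 0.1 * grad_z)
```

Both steps increase the target logit, but only the first is confined to the ε-ball; the second can move far beyond it in pixel space while remaining a valid generator output.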

• We demonstrate that adversarial training with our examples improves the model's performance on clean images. This is in contrast to training with norm-bounded perturbations, which degrades the model's accuracy. Unlike recent approaches such as Xie et al. (2020), which use a separate auxiliary batch norm for adversarial examples, our method does not require any modifications to the architecture.

• We propose the first method for generating unrestricted adversarial examples for semantic segmentation and object detection. Training with our examples improves segmentation results on clean images.

