ANALYZING AND IMPROVING GENERATIVE ADVERSARIAL TRAINING FOR GENERATIVE MODELING AND OUT-OF-DISTRIBUTION DETECTION

Anonymous

Abstract

Generative adversarial training (GAT) is a recently introduced adversarial defense method. Previous works have focused on empirical evaluations of its application to training robust predictive models. In this paper, we focus on a theoretical understanding of the GAT method and extend its application to generative modeling and out-of-distribution detection. We analyze the optimal solutions of the maximin formulation employed by the GAT objective and compare them with those of the minimax formulation employed by GANs. We use theoretical analysis and 2D simulations to understand the convergence properties of the training algorithm. Building on these results, we develop an unconstrained GAT algorithm and conduct comprehensive evaluations of its application to image generation and adversarial out-of-distribution detection. Our results suggest that generative adversarial training is a promising new direction for these applications.

1. INTRODUCTION

Generative adversarial training (GAT) (Yin et al., 2020) is a recently introduced defense mechanism that can be used for adversarial example detection and robust classification. The defense consists of a committee of detectors (binary discriminators), each trained to discriminate natural data of a particular class from adversarial examples perturbed from data of other classes. Like most other work in the area of robust machine learning, the defense is specifically designed to resist norm-constrained adversaries: adversaries that may perturb the data only up to a certain amount as measured by some norm. The defense's robustness is achieved by training each detector against adversarial examples produced by the norm-constrained PGD attack (Madry et al., 2017).

Existing work: training and evaluating robust predictive models. A detector trained with GAT has strong interpretability: an unbounded attack that maximizes the detector's output produces images that resemble the target class data, which suggests the detector has learned the target class data distribution. However, all previous works (Yin et al., 2020; Tramer et al., 2020) focus on empirical evaluations of GAT's application to training robust predictive models; a theoretical understanding of why this training method causes the detector to learn the data distribution is missing.

This work: theoretical understanding, improved training algorithm, and extended applications. To better understand the GAT method, we first analyze the optimal solutions of the training objective. We start with a maximin formulation (eq. (5)) of the objective and connect it with the minimax formulation (eq. (1)) employed by GANs (Goodfellow et al., 2014). We find that the differences between the solutions of these two formulations become immediately clear when we take a game-theoretic perspective.
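The norm-constrained PGD attack referenced above iteratively steps along the sign of the loss gradient and projects the perturbed input back onto the ε-ball around the original input. Below is a minimal one-dimensional sketch of this procedure, assuming an ℓ∞ constraint; the function name and the toy loss are illustrative choices, not from the paper:

```python
def pgd_attack(x0, grad_fn, eps, alpha, steps):
    """Sketch of an l-infinity PGD attack on a scalar input.

    x0:      original (clean) input
    grad_fn: gradient of the loss being maximized, evaluated at x
    eps:     perturbation budget (l-infinity radius around x0)
    alpha:   step size
    steps:   number of attack iterations
    """
    x = x0
    for _ in range(steps):
        g = grad_fn(x)
        # Ascend along the gradient sign, then project back onto [x0 - eps, x0 + eps].
        step = alpha if g > 0 else (-alpha if g < 0 else 0.0)
        x = min(max(x + step, x0 - eps), x0 + eps)
    return x

# Toy loss L(x) = -(x - 3)^2, whose gradient is -2(x - 3); maximizing L
# pushes x toward 3, but the projection keeps x within eps of x0.
adv = pgd_attack(x0=0.0, grad_fn=lambda x: -2.0 * (x - 3.0),
                 eps=0.5, alpha=0.1, steps=20)
print(adv)  # the attack saturates the budget and stops at the ball boundary: 0.5
```

In the GAT setting described above, the same iteration would be run on image tensors, with the detector's loss in place of the toy loss; the unbounded attack mentioned in the text corresponds to dropping the projection step (eps → ∞).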
We then use theoretical analysis and 2D simulations to understand the convergence properties of the GAT training algorithm. Building upon these theoretical and experimental insights, we develop an unconstrained GAT algorithm and apply it to the tasks of generative modeling and out-of-distribution detection. We find the maximin-based generative model to be more stable to train than its minimax counterpart (GANs) and at the same time more flexible: it does not have a fixed generator and can transform arbitrary inputs into target-distribution data, which may be particularly useful for certain applications (e.g., face manipulation). The model

