ANALYZING AND IMPROVING GENERATIVE ADVERSARIAL TRAINING FOR GENERATIVE MODELING AND OUT-OF-DISTRIBUTION DETECTION

Anonymous authors

Abstract

Generative adversarial training (GAT) is a recently introduced adversarial defense method. Previous works have focused on empirical evaluations of its application to training robust predictive models. In this paper, we focus on a theoretical understanding of the GAT method and on extending its application to generative modeling and out-of-distribution detection. We analyze the optimal solutions of the maximin formulation employed by the GAT objective and compare them with those of the minimax formulation employed by GANs. We use theoretical analysis and 2D simulations to understand the convergence properties of the training algorithm. Based on these results, we develop an unconstrained GAT algorithm and conduct comprehensive evaluations of its application to image generation and adversarial out-of-distribution detection. Our results suggest that generative adversarial training is a promising new direction for the above applications.

1. INTRODUCTION

Generative adversarial training (GAT) (Yin et al., 2020) is a recently introduced defense mechanism that can be used for adversarial example detection and robust classification. The defense consists of a committee of detectors (binary discriminators), each trained to discriminate natural data of a particular class from adversarial examples perturbed from data of other classes. Like most other work in the area of robust machine learning, the defense is designed to defend against norm-constrained adversaries: adversaries that may perturb the data only up to a certain amount as measured by some norm. The defense's robustness is achieved by training each detector model against adversarial examples produced by the norm-constrained PGD attack (Madry et al., 2017).

Existing work: training and evaluating robust predictive models. A detector trained with GAT has strong interpretability: an unbounded attack that maximizes the detector's output produces images that resemble the target class data, which suggests the detector has learned the target class data distribution. However, all previous works (Yin et al., 2020; Tramer et al., 2020) focus on empirical evaluations of GAT's application to training robust predictive models; a theoretical understanding of why this training method causes the detector to learn the data distribution is missing.

This work: theoretical understanding, improved training algorithm, and extended applications. To better understand the GAT method, we first analyze the optimal solutions of the training objective. We start with a maximin formulation (eq. (5)) of the objective and connect it with the minimax formulation (eq. (1)) employed by GANs (Goodfellow et al., 2014). We find that the differences between the solutions of these two formulations become immediately clear when we take a game-theoretic perspective.
We then use theoretical analysis and 2D simulations to understand the convergence properties of the GAT training algorithm. Building on these theoretical and experimental insights, we develop an unconstrained GAT algorithm and apply it to the tasks of generative modeling and out-of-distribution detection. We find the maximin-based generative model to be more stable to train than its minimax counterpart (GANs) and, at the same time, more flexible: it does not have a fixed generator and can transform arbitrary inputs into target-distribution data, which may be particularly useful for certain applications (e.g., face manipulation). The model trained with the unconstrained GAT algorithm also outperforms several state-of-the-art methods on the task of adversarial out-of-distribution detection. In summary, our key contributions are:

• We analyze the optimal solutions of the GAT objective and the convergence properties of the training algorithm. We discuss the implications of these results for improved training of robust predictive models, generative modeling, and out-of-distribution detection.

• We develop an unconstrained generative adversarial training algorithm and conduct a comprehensive evaluation of its application to image generation and adversarial out-of-distribution detection.

• Our comparative analysis of the maximin and minimax problems clarifies misconceptions and provides new insights into how they can be utilized to solve different problems.

2. RELATED WORK AND BACKGROUND

Generative adversarial networks (GANs). The GANs framework (Goodfellow et al., 2014) learns a generator function $G$ and a discriminator function $D$ by solving the following minimax problem:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]. \quad (1)$$

The generator $G$ implicitly defines a distribution $p_g$ by mapping a prior distribution $p_z$ from a low-dimensional latent space $Z \subseteq \mathbb{R}^z$ to the high-dimensional data space $X \subseteq \mathbb{R}^d$. $D : X \to [0, 1]$ is a function that discriminates the target data distribution $p_{\text{data}}$ from the generated distribution $p_g$. The minimax problem is solved by alternating between the optimization of $D$ and the optimization of $G$; under certain conditions, the alternating training procedure converges to a solution where $p_g$ matches $p_{\text{data}}$ (the Jensen-Shannon divergence is zero), and $D$ outputs $1/2$ on the support of $p_{\text{data}}$.
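As an illustration of the convergence claim above (a sketch we add here, not part of the original derivation): for a fixed $G$, the optimal discriminator is $D^*(x) = p_{\text{data}}(x)/(p_{\text{data}}(x) + p_g(x))$, which equals $1/2$ everywhere once $p_g = p_{\text{data}}$. A minimal numerical check with Gaussian densities:

```python
import numpy as np

def optimal_discriminator(p_data, p_g):
    """Optimal D for a fixed generator: D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return p_data / (p_data + p_g)

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-3.0, 3.0, 101)
p_data = gaussian_pdf(xs, 0.0, 1.0)

# Case 1: generator has not matched the data -- D* deviates from 1/2.
p_g_bad = gaussian_pdf(xs, 1.5, 1.0)
d_bad = optimal_discriminator(p_data, p_g_bad)

# Case 2: p_g == p_data -- D* is exactly 1/2 on the support.
d_matched = optimal_discriminator(p_data, p_data)

print(d_matched.min(), d_matched.max())  # 0.5 0.5
```

When the two densities differ (case 1), $D^*$ moves toward 1 where the data density dominates and toward 0 where the generated density dominates, which is what drives the alternating updates.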

Generative adversarial training (GAT)

The GAT method (Yin et al., 2020) is designed for training adversarial example detection and robust classification models. In a $K$-class classification problem, the robust detection/classification system consists of $K$ base detectors, each trained by minimizing the following objective:

$$L(D) = -\mathbb{E}_{x \sim p_k}[\log D(x)] - \mathbb{E}_{x \sim p_{-k}}[\log(1 - \max_{x' \in B(x, \epsilon)} D(x'))]. \quad (2)$$

In the above objective, $p_k$ is the $k$-th class's data distribution, $p_{-k}$ is the mixture distribution of all other classes, $p_{-k} = \frac{1}{K-1} \sum_{i=1,\dots,K,\, i \neq k} p_i$, and $B(x, \epsilon)$ is a neighborhood of $x$: $\{x' \in X : \|x' - x\|_2 \leq \epsilon\}$. The objective is characterized by an inner maximization problem and an outer minimization problem; when the inner maximization is perfectly solved and $D$ achieves a vanishing loss, $D$ becomes a perfectly robust model capable of separating data from $p_k$ from any $\epsilon$-constrained adversarial examples perturbed from data of $p_{-k}$. A committee of $K$ detectors then provides a complete solution for detecting any adversarial example perturbed from an arbitrary class.

Objective (2) is solved using an alternating gradient method (Algorithm 1): the first step crafts adversarial examples by solving the inner maximization, and the second step improves the $D$ model on these adversarial examples. Clearly, the detector's robustness depends on how well the inner maximization is solved. Although $D$ is a highly non-concave function when parameterized by a deep neural network, Madry et al. (2017) showed that the inner problem can be solved reasonably well using projected gradient descent (the PGD attack), a first-order method that employs the following iterative update rule (with initialization $x^0 \leftarrow x$; we consider the $L_2$-based attack):

$$x^{i+1} \leftarrow \text{Proj}\left(x^i + \gamma \frac{\nabla \log D(x^i)}{\|\nabla \log D(x^i)\|_2}\right),$$

where $\gamma$ is a step size and $\text{Proj}$ is the operation of projecting onto the feasible set $B(x, \epsilon)$.
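The $L_2$ PGD update above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the quadratic `grad` function is a stand-in for the gradient of $\log D$, pulling the iterate toward a hypothetical target-class point `t`:

```python
import numpy as np

def project_l2_ball(x_adv, x, eps):
    """Project x_adv onto the L2 ball B(x, eps) = {x' : ||x' - x||_2 <= eps}."""
    delta = x_adv - x
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta *= eps / norm
    return x + delta

def pgd_l2(x, grad_log_d, eps=1.0, gamma=0.3, steps=20):
    """Maximize log D over B(x, eps) via normalized steepest ascent + projection."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_log_d(x_adv)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:  # stationary point: no ascent direction
            break
        x_adv = project_l2_ball(x_adv + gamma * g / g_norm, x, eps)
    return x_adv

# Toy stand-in: ascend -0.5 * ||x - t||^2, whose gradient is t - x.
t = np.array([3.0, 0.0])
grad = lambda x: t - x

x0 = np.zeros(2)
x_adv = pgd_l2(x0, grad, eps=1.0, gamma=0.3, steps=20)
print(np.linalg.norm(x_adv - x0))  # 1.0: the iterate stops at the ball boundary
```

Because the true maximizer lies outside $B(x_0, 1)$, the projection clamps the iterate to the boundary of the feasible set, exactly as the $\text{Proj}$ operation in the update rule prescribes.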
The normalized steepest-ascent rule inside the Proj function was introduced to deal with the issue of vanishing gradients when optimizing the cross-entropy loss (Kolter & Madry, 2019). The PGD attack also employs random restarts to improve its effectiveness: for an input $x$, first generate a set of randomized inputs by uniformly sampling from $B(x, \epsilon)$, perform the PGD attack on each of them, and use the most effective one as the actual attack. A review of related work on out-of-distribution detection is provided in Appendix A.
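The random-restart procedure can be sketched as follows (hypothetical helper names we introduce for illustration; `run_pgd` stands for one PGD run centered on $x$, and `score` for the detector output $D$ being maximized):

```python
import numpy as np

def sample_l2_ball(x, eps, rng):
    """Draw a point uniformly from the L2 ball B(x, eps)."""
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    r = eps * rng.uniform() ** (1.0 / x.size)  # radius ~ eps * U^(1/dim) is uniform in the ball
    return x + r * d

def pgd_with_restarts(x, run_pgd, score, eps, n_restarts=8, seed=0):
    """Run PGD from x and from several random starts in B(x, eps); keep the highest-scoring attack."""
    rng = np.random.default_rng(seed)
    starts = [x] + [sample_l2_ball(x, eps, rng) for _ in range(n_restarts)]
    best_x, best_s = None, -np.inf
    for x0 in starts:
        x_adv = run_pgd(x0)   # each run must project onto B(x, eps), so all candidates are feasible
        s = score(x_adv)
        if s > best_s:
            best_x, best_s = x_adv, s
    return best_x

# Toy demo: identity "PGD" run, scored by closeness to a hypothetical target point.
t = np.array([2.0, 0.0])
score = lambda v: -np.linalg.norm(v - t)
run_pgd = lambda x0: x0  # stand-in; a real run would ascend log D from x0

x = np.zeros(2)
best = pgd_with_restarts(x, run_pgd, score, eps=1.0)
```

Since the unperturbed start is included among the candidates, the restart wrapper can only improve (never worsen) the attack relative to a single deterministic run.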

