ADVERSARIAL PROBLEMS FOR GENERATIVE NETWORKS

Abstract

We are interested in the design of generative networks. The training of these mathematical structures is mostly performed with the help of adversarial (min-max) optimization problems. We propose a simple methodology for constructing such problems that, at the same time, assures consistency of the corresponding solution. We give characteristic examples developed through our method, some of which are recognizable from other applications and some of which are introduced here for the first time. We compare the various possibilities by applying them to well-known datasets using neural networks of different configurations and sizes.

1. INTRODUCTION

The problem we are interested in can be summarized as follows: we are given two collections of training data, {z_j} and {x_i}. The samples of the first set follow the origin probability density h(z) and those of the second the target density f(x). The target density f(x) is considered unknown, while h(z) is either known, with the possibility of producing samples z_j whenever necessary, or unknown, in which case we have a second, fixed training set {z_j}. Our goal is to design a deterministic transformation G(z) so that the data {y_j} produced by applying y = G(z) to {z_j} follow the target density f(y). Of course, one may wonder whether the proposed problem enjoys any solution, namely, whether there indeed exists a transformation G(z) capable of transforming z into y, with the former following the origin density h(z) and the latter the target density f(y). The problem of transforming random vectors has been analyzed in Box & Cox (1964), where existence is shown under general conditions. Computing the actual transformation is, however, a completely different challenge, with one possible solution relying on adversarial approaches applied to neural networks. The best-known use of this result is the generation of synthetic data that follow the unknown target density f(x). In this case h(z) is selected to be simple (e.g., i.i.d. standard Gaussian or i.i.d. uniform) so that generating realizations from h(z) is straightforward. As mentioned, the adversarial approach can be applied even if the origin density h(z) is unknown, provided that we have a dataset {z_j} with data following the origin density. Our class of generative adversarial problems is in one-to-one correspondence with f-GANs under the ideal (non-data-driven) setup. We believe, however, that our approach enjoys certain significant advantages. First, the definition of the two functions φ(z), ψ(z) in equation 8 is straightforward, whereas Nowozin et al. (2016) must solve an additional optimization problem to derive each GAN loss. A second benefit of our approach is complete control over the result of the maximization problem that defines the discriminator; in other words, we can decide which function the discriminator must estimate. In Nowozin et al. (2016) such flexibility does not exist. This is important because we can select the approximated function properly so as to avoid imposing difficult constraints on the discriminator output (e.g., positivity), since such constraints tend to seriously affect the approximation quality of the corresponding neural network. Further, there is no need for the discriminator to be a Lipschitz function, as in WGAN or WGAN-GP, which require extra operations to enforce the Lipschitz property. Furthermore, we will show that the function the discriminator attempts to approximate is a transformation of the likelihood ratio r(x) = g(x)/f(x). There are important applications in Statistics where one is interested in estimating only such a transformation, the most common cases being the likelihood ratio itself, its logarithm (the log-likelihood ratio), or the ratio r(x)/(1 + r(x)), which plays the role of the posterior probability between two densities. In other words, there are applications where one is interested only in the "max" part of the min-max problem. Finally, because we know which transformation of the likelihood ratio the discriminator approximates, it is possible to compare the different GANs on how closely they reach the optimal value r(x) = 1, that is, f(x) = g(x). As in Nowozin et al. (2016), we will show that our method provides an abundance of adversarial problems capable of identifying the appropriate transformation G(z). We will also provide a simple recipe for successfully constructing such problems.
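The "max-only" use mentioned above can be made concrete with a small numerical sketch (our own illustration, not part of the paper's construction). With two known 1-D Gaussians standing in for f and g, a discriminator that attains the posterior-type value r(x)/(1 + r(x)) = g(x)/(f(x) + g(x)) can be inverted to recover the likelihood ratio itself:

```python
import numpy as np

# Two known 1-D densities so everything is closed-form:
# target f = N(0, 1), generated g = N(1, 1).
def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def g(x):
    return np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-3.0, 3.0, 13)
r = g(x) / f(x)          # likelihood ratio r(x) = g(x)/f(x); here exp(x - 1/2)
D = r / (1 + r)          # posterior-type transformation r/(1+r) = g/(f+g)

# Inverting the transformation recovers the likelihood ratio exactly, so a
# discriminator trained to estimate g/(f+g) directly yields an estimate of r.
r_recovered = D / (1 - D)
assert np.allclose(r_recovered, r)
assert np.allclose(r, np.exp(x - 0.5))   # closed-form check for these Gaussians
```

In practice D would be a neural network trained on samples rather than the exact posterior, but the same inversion applies to its output.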
It was Goodfellow et al. (2014) who first introduced the idea of adversarial (min-max) optimization and demonstrated that it results in the determination of the desired transformation G(z) (consistency). Alternative adversarial approaches were subsequently suggested by Martin Arjovsky & Bottou (2017); Bińkowski et al. (2018) and shown to also deliver the correct transformation G(z). We must mention the work of Nowozin et al. (2016), in which a class of min-max optimizations, the f-GANs, was defined to design generator/discriminator pairs. Liu et al. (2017) then defined the class of adversarial-divergence objective functions, which further unified f-GANs, MMD-GAN (Li et al., 2017), WGAN, WGAN-GP (Gulrajani et al., 2017) and entropic regularized optimal transport problems; they also investigated under what conditions the discriminator class has the effect of matching generalized moments. Next, the work of Song & Ermon (2019) connected f-GANs and Wasserstein GANs (WGANs) (Martin Arjovsky & Bottou, 2017), and later Birrell et al. (2020) generalized these results by introducing the (f, Γ)-divergences, which bridge f-divergences and integral probability metrics.

Arguing along the same lines as the existing min-max formulations, we would like to optimally specify a vector transformation G(z), the generator, and a scalar function D(x), the discriminator. To achieve this, for each combination {G(z), D(x)} we define the cost function

(1)  J(G, D) = E_{x∼f}[φ(D(x))] + E_{z∼h}[ψ(D(G(z)))],

where φ(z), ψ(z) are two scalar functions of the scalar z and E_{x∼f}[·], E_{z∼h}[·] denote expectation with respect to the densities f(x) and h(z) respectively. The optimum generator/discriminator combination is then identified by solving the min-max problem

(2)  min_{G(z)} max_{D(x)} { E_{x∼f}[φ(D(x))] + E_{z∼h}[ψ(D(G(z)))] }.

We must point out that our goal is not to solve equation 2, but rather to find a class of functions φ(z), ψ(z) such that the transformation G(z) obtained from the solution of equation 2 makes y = G(z) follow the target density f(y) whenever z follows the origin density h(z). If z is random, following h(z), then y = G(z) is also random, and we denote its probability density by g(y). Clearly, there is a correspondence between transformations G(z) and densities g(y) when the density h(z) of z is fixed. Since we can write E_{z∼h}[ψ(D(G(z)))] = E_{y∼g}[ψ(D(y))], the min-max problem in equation 2 is equivalent to

(3)  min_{g(y)} max_{D(x)} { E_{x∼f}[φ(D(x))] + E_{y∼g}[ψ(D(y))] }.

It is now possible to combine the two expectations by applying a change of measure and a change of variables, and equivalently write equation 3 as

(4)  min_{g(x)} max_{D(x)} { E_{x∼f}[φ(D(x))] + E_{x∼f}[r(x)ψ(D(x))] } = min_{g(x)} max_{D(x)} E_{x∼f}[φ(D(x)) + r(x)ψ(D(x))],

where r(x) = g(x)/f(x) denotes the corresponding likelihood ratio. Since f(x) is also fixed, there is again a correspondence between r(x) and g(x); hence the previous min-max problem becomes equivalent to

(5)  min_{r(x)∈L_f} max_{D(x)} E_{x∼f}[φ(D(x)) + r(x)ψ(D(x))].
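The change of measure behind equation 4 can be checked numerically. The sketch below is our own illustration, not part of the paper's method: it takes the classical GAN pair φ(z) = log z, ψ(z) = log(1 − z) purely as an example choice, uses two known Gaussians for f and g so that r(x) = g(x)/f(x) = exp(x − 1/2) is available in closed form, and verifies by Monte Carlo that E_{y∼g}[ψ(D(y))] ≈ E_{x∼f}[r(x)ψ(D(x))], and that the maximizing discriminator D = 1/(1 + r) attains a larger empirical cost than a constant one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example choice of the two scalar functions (classical GAN; one of many).
phi = np.log
def psi(z):
    return np.log(1.0 - z)

def J_empirical(D, x_f, y_g):
    """Empirical cost: mean of phi(D(x)) over target samples
    plus mean of psi(D(y)) over generated samples."""
    return phi(D(x_f)).mean() + psi(D(y_g)).mean()

# Known densities: f = N(0,1), g = N(1,1)  =>  r(x) = exp(x - 1/2).
x_f = rng.normal(0.0, 1.0, 200_000)   # samples from the target f
y_g = rng.normal(1.0, 1.0, 200_000)   # samples from the generated density g
r = lambda x: np.exp(x - 0.5)

D = lambda x: 1.0 / (1.0 + r(x))      # maximizer of J for this phi, psi: f/(f+g)

# Change of measure: E_{y~g}[psi(D(y))] ~= E_{x~f}[r(x) psi(D(x))].
lhs = psi(D(y_g)).mean()
rhs = (r(x_f) * psi(D(x_f))).mean()
assert abs(lhs - rhs) < 0.05

# The maximizing D beats a constant discriminator on the empirical cost.
half = lambda x: np.full_like(x, 0.5)
assert J_empirical(D, x_f, y_g) > J_empirical(half, x_f, y_g)
```

The same check works for any pair φ, ψ constructed by the methodology, since equation 4 is independent of the particular choice.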

