SINGLE-LEVEL ADVERSARIAL DATA SYNTHESIS BASED ON NEURAL TANGENT KERNELS

Anonymous

Abstract

Generative adversarial networks (GANs) have achieved impressive performance in data synthesis and have driven the development of many applications. However, GANs are known to be hard to train due to their bilevel objective, which leads to problems of non-convergence, mode collapse, and vanishing gradients. In this paper, we propose a new generative model called the generative adversarial NTK (GA-NTK) that has a single-level objective. GA-NTK keeps the spirit of adversarial learning (which helps generate plausible data) while avoiding the training difficulties of GANs. This is done by modeling the discriminator as a Gaussian process with a neural tangent kernel (NTK-GP), whose training dynamics can be described completely by a closed-form formula. We analyze the convergence behavior of GA-NTK trained by gradient descent and give sufficient conditions for convergence. We also conduct extensive experiments to study the advantages and limitations of GA-NTK and propose techniques that make GA-NTK more practical.

1. INTRODUCTION

Generative adversarial networks (GANs) (Goodfellow et al., 2014; Radford et al., 2016), a branch of deep generative models based on adversarial learning, have received much attention due to their novel problem formulation and impressive performance in data synthesis. Variants of GANs have also driven recent developments in many applications, such as super-resolution (Ledig et al., 2017), image inpainting (Xu et al., 2014), and video generation (Vondrick et al., 2016).

A GAN framework consists of a discriminator network D and a generator network G, parametrized by θ_D and θ_G, respectively. Given a d-dimensional data distribution P_data and a c-dimensional noise distribution P_noise, the generator G maps a random noise z ∈ R^c to a point G(z) ∈ R^d in the data space, while the discriminator D takes a point x ∈ R^d as input and tells whether x is real or fake, i.e., D(x) = 1 if x ∼ P_data and D(x) = 0 if x ∼ P_gen, where P_gen is the distribution of G(z) with z ∼ P_noise. The objective of GANs is typically formulated as a bilevel optimization problem:

    arg min_{θ_G} max_{θ_D} E_{x∼P_data}[log D(x)] + E_{z∼P_noise}[log(1 − D(G(z)))].    (1)

The discriminator D and generator G aim to break each other through the inner max and outer min objectives, respectively. The studies by Goodfellow et al. (2014) and Radford et al. (2016) show that this adversarial formulation can lead to a better generator that produces plausible data points/images. However, GANs are known to be hard to train due to the following issues (Goodfellow, 2016).

Failure to converge. In practice, Eq. (1) is usually only approximately solved by an alternating first-order method such as alternating stochastic gradient descent (SGD). The alternating updates for θ_D and θ_G may cancel each other's progress. During each alternating training step, it is also tricky to balance the number of SGD updates for θ_D against that for θ_G, as too small or too large a number for θ_D leads to low-quality gradients for θ_G.
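The convergence failure above can be illustrated on a toy problem. The following sketch (not part of the paper's method; all names are illustrative) runs simultaneous gradient descent-ascent on the bilinear min-max problem min_x max_y xy, whose unique equilibrium is (0, 0). The iterates spiral away from the equilibrium instead of converging to it, mirroring how first-order updates for θ_G and θ_D can cancel each other's progress on Eq. (1).

```python
import numpy as np

def gda(x0=1.0, y0=1.0, lr=0.1, steps=100):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y.

    x plays the minimizer (like theta_G), y the maximizer (like theta_D).
    """
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = y, x                     # grad of x*y w.r.t. x and y
        x, y = x - lr * gx, y + lr * gy   # descend in x, ascend in y
    return x, y

x, y = gda()
print(np.hypot(x, y))  # distance from the equilibrium (0, 0) has grown
```

Each step multiplies the distance to the equilibrium by sqrt(1 + lr^2), so the iterates rotate and diverge; no fixed step size makes this scheme converge on the bilinear game, which is one reason a single-level objective is attractive.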
Mode collapse. Alternating SGD is attracted to stationary points and is therefore not good at distinguishing between a min_{θ_G} max_{θ_D} problem and a max_{θ_D} min_{θ_G} problem. When the solution to the latter is returned, the generator tends to always produce points at the modes that best deceive the discriminator, making P_gen of low diversity.

1 Our code is available on GitHub at https://github.com/ga-ntk/ga-ntk.

