CCGAN: CONTINUOUS CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS FOR IMAGE GENERATION

Abstract

This work proposes the continuous conditional generative adversarial network (CcGAN), the first generative model for image generation conditional on continuous, scalar conditions (termed regression labels). Existing conditional GANs (cGANs) are mainly designed for categorical conditions (e.g., class labels); conditioning on regression labels is mathematically distinct and raises two fundamental problems: (P1) since there may be very few (even zero) real images for some regression labels, minimizing existing empirical versions of cGAN losses (a.k.a. empirical cGAN losses) often fails in practice; (P2) since regression labels are scalar and can take infinitely many values, conventional label input methods (e.g., combining a hidden map of the generator/discriminator with a one-hot encoded label) are not applicable. The proposed CcGAN solves these problems, respectively, by (S1) reformulating existing empirical cGAN losses to suit the continuous scenario; and (S2) proposing a novel method to incorporate regression labels into the generator and the discriminator. The reformulation in (S1) leads to two novel empirical discriminator losses, termed the hard vicinal discriminator loss (HVDL) and the soft vicinal discriminator loss (SVDL), and a novel empirical generator loss. The error bounds of a discriminator trained with HVDL and SVDL are derived under mild assumptions in this work. A new benchmark dataset, RC-49, is also proposed for generative image modeling conditional on regression labels. Our experiments on the Circular 2-D Gaussians, RC-49, and UTKFace datasets show that CcGAN can generate diverse, high-quality samples from the image distribution conditional on a given regression label. Moreover, in these experiments, CcGAN substantially outperforms cGAN both visually and quantitatively.

1. INTRODUCTION

Conditional generative adversarial networks (cGANs), first proposed in (Mirza & Osindero, 2014), aim to estimate the distribution of images conditional on some auxiliary information, especially class labels. Subsequent studies (Odena et al., 2017; Miyato & Koyama, 2018; Brock et al., 2019; Zhang et al., 2019) confirm the feasibility of generating diverse, high-quality (even photo-realistic), and class-label consistent fake images from class-conditional GANs. Unfortunately, these cGANs do not work well for image generation with continuous, scalar conditions, termed regression labels, due to two problems. (P1) cGANs are often trained to minimize the empirical versions of their losses (a.k.a. the empirical cGAN losses) on some training data, a principle also known as empirical risk minimization (ERM) (Vapnik, 2000). The success of ERM relies on a large sample size for each distinct condition. Unfortunately, we usually have only a few real images for some regression labels. Moreover, since regression labels are continuous, some values may not even appear in the training set. Consequently, a cGAN cannot accurately estimate the image distribution conditional on such missing labels. (P2) In class-conditional image generation, class labels are often encoded by one-hot vectors or label embedding and then fed into the generator and discriminator by hidden concatenation (Mirza & Osindero, 2014), an auxiliary classifier (Odena et al., 2017), or label projection (Miyato & Koyama, 2018). A precondition for such label encoding is that the number of distinct labels (e.g., the number of classes) is finite and known. Unfortunately, in the continuous scenario, we may have infinitely many distinct regression labels. A naive approach to solving (P1) and (P2) is to "bin" the regression labels into a series of disjoint intervals and still train a cGAN in the class-conditional manner, treating these intervals as independent classes (Olmschenk, 2019).
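For concreteness, the binning baseline amounts to discretizing the label axis and then one-hot encoding the resulting class index. The sketch below illustrates this (the label range [0, 90] and the number of bins are illustrative assumptions, not values from any cited work):

```python
import numpy as np

def bin_labels(labels, num_bins, lo=0.0, hi=90.0):
    """Map continuous regression labels in [lo, hi] to class indices in {0, ..., num_bins-1}."""
    edges = np.linspace(lo, hi, num_bins + 1)
    # np.digitize against the interior edges gives the bin index of each label
    return np.clip(np.digitize(labels, edges[1:-1]), 0, num_bins - 1)

def one_hot(idx, num_bins):
    """One-hot encode bin indices, as a class-conditional cGAN would consume them."""
    out = np.zeros((len(idx), num_bins))
    out[np.arange(len(idx)), idx] = 1.0
    return out

labels = np.array([3.2, 39.9, 40.1, 89.7])
idx = bin_labels(labels, num_bins=9)  # 10-unit-wide bins
print(idx)  # → [0 3 4 8]: 39.9 and 40.1 land in different "classes"
```

Note how two nearly identical labels (39.9 and 40.1) are forced into different classes, which foreshadows the label inconsistency and ignored inter-class correlation discussed next.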
However, this approach has four shortcomings: (1) our experiments in Section 4 show that it often makes cGANs collapse; (2) we can only estimate the image distribution conditional on membership in an interval, not on the target label itself; (3) a large interval width leads to high label inconsistency; and (4) inter-class correlation is ignored, even though images in successive intervals have similar distributions. In machine learning, vicinal risk minimization (VRM) (Vapnik, 2000; Chapelle et al., 2001) is an alternative rule to ERM. VRM assumes that a sample point shares the same label with other samples in its vicinity. Motivated by VRM, in generative modeling conditional on regression labels, where we estimate a conditional distribution p(x|y) (x is an image and y is a regression label), it is natural to assume that a small perturbation to y results in a negligible change to p(x|y). This assumption is consistent with our perception of the world. For example, the image distribution of facial features for a population of 15-year-old teenagers should be close to that of 16-year-olds. We therefore introduce the continuous conditional GAN (CcGAN) to tackle (P1) and (P2). To the best of our knowledge, this is the first generative model for image generation conditional on regression labels. Note that Rezagholizadeh et al. (2018) and Rezagholiradeh & Haidar (2018) train GANs in an unsupervised manner and synthesize unlabeled fake images for a subsequent image regression task, and Olmschenk et al. (2019) propose a semi-supervised GAN for dense crowd counting. CcGAN is fundamentally different from these works since they do not estimate the conditional image distribution.
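Under this vicinity assumption, real images whose labels lie near a target label can all contribute to estimating p(x|y) at that target. As a rough sketch of the idea (the precise HVDL/SVDL weight forms are defined in Section 2; the bandwidth parameters `kappa` and `nu` below are illustrative assumptions), a hard vicinity admits samples within a fixed label radius, while a soft vicinity down-weights them smoothly with label distance:

```python
import numpy as np

def hard_vicinal_weights(train_labels, target_label, kappa=0.02):
    """Weight 1 for samples whose label lies within kappa of the target, 0 otherwise."""
    return (np.abs(train_labels - target_label) <= kappa).astype(float)

def soft_vicinal_weights(train_labels, target_label, nu=1000.0):
    """Gaussian-shaped weights that decay smoothly as labels move away from the target."""
    return np.exp(-nu * (train_labels - target_label) ** 2)

labels = np.array([0.10, 0.11, 0.20, 0.50])
print(hard_vicinal_weights(labels, 0.10))  # → [1. 1. 0. 0.]: only nearby labels count
print(soft_vicinal_weights(labels, 0.10))  # all samples count, but distant ones barely
```

Both schemes realize the VRM intuition above: instead of requiring exact label matches (as the binning baseline implicitly does), each target label borrows statistical strength from its neighborhood.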
Our contributions can be summarized as follows:
• We propose in Section 2 the CcGAN to address (P1) and (P2). It consists of two novel empirical discriminator losses, termed the hard vicinal discriminator loss (HVDL) and the soft vicinal discriminator loss (SVDL), a novel empirical generator loss, and a novel label input method. We take the vanilla cGAN loss as an example to show how to derive HVDL, SVDL, and the novel empirical generator loss by reformulating existing empirical cGAN losses.
• We derive in Section 3 the error bounds of a discriminator trained with HVDL and SVDL.
• In Section 4, we propose a new benchmark dataset, RC-49, for generative image modeling conditional on regression labels, since very few benchmark datasets suit the studied continuous scenario. Our experiments on several datasets show that CcGAN not only generates diverse, high-quality, and label-consistent images, but also substantially outperforms cGAN both visually and quantitatively.

2. FROM CGAN TO CCGAN

In this section, we provide the solutions (S1) and (S2) to (P1) and (P2), respectively, by introducing the continuous conditional GAN (CcGAN). Please note that, theoretically, cGAN losses (e.g., the vanilla cGAN loss (Mirza & Osindero, 2014), the Wasserstein loss (Arjovsky et al., 2017), and the hinge loss (Miyato et al., 2018)) are suitable for both class labels and regression labels; however, their empirical versions fail in the continuous scenario (i.e., (P1)). Our first solution (S1) focuses on reformulating these empirical cGAN losses to fit the continuous scenario. Without loss of generality, we take only the vanilla cGAN loss as an example to show this reformulation (the empirical versions of the Wasserstein loss and the hinge loss can be reformulated similarly). The vanilla discriminator loss and generator loss (Mirza & Osindero, 2014) are defined as:

$$\mathcal{L}(D) = -\mathbb{E}_{y \sim p_r(y)} \mathbb{E}_{x \sim p_r(x|y)} \left[ \log D(x, y) \right] - \mathbb{E}_{y \sim p_g(y)} \mathbb{E}_{x \sim p_g(x|y)} \left[ \log \left( 1 - D(x, y) \right) \right]$$
$$= -\int\!\!\int \log(D(x, y)) \, p_r(x, y) \, dx \, dy - \int\!\!\int \log(1 - D(x, y)) \, p_g(x, y) \, dx \, dy,$$
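As a numerical sanity check, the empirical version of this discriminator loss can be evaluated on stand-in discriminator outputs (plain NumPy; `d_real` and `d_fake` here are hypothetical score arrays standing in for D(x, y) on real and fake (image, label) pairs, not the paper's network):

```python
import numpy as np

def vanilla_d_loss(d_real, d_fake, eps=1e-12):
    """Empirical vanilla cGAN discriminator loss.

    d_real: D(x, y) on real (image, label) pairs, values in (0, 1)
    d_fake: D(G(z, y), y) on fake pairs, values in (0, 1)
    eps guards against log(0) for saturated discriminator outputs.
    """
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

# A confident, correct discriminator (real -> 1, fake -> 0) drives the loss toward 0,
# while an undecided one (all outputs 0.5) yields 2*log(2).
print(vanilla_d_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2])))
print(vanilla_d_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # ≈ 1.386
```

Each expectation in the loss is replaced by a sample mean, which is exactly the ERM principle whose failure in the continuous scenario (P1) the vicinal reformulation is designed to fix.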

