DIFFERENTIAL-CRITIC GAN: GENERATING WHAT YOU WANT BY A CUE OF PREFERENCES

Abstract

This paper proposes the Differential-Critic Generative Adversarial Network (DiCGAN) to learn the distribution of user-desired data when only part, instead of the entirety, of the dataset possesses the desired properties. Existing approaches first select the desired samples and then train regular GANs on the selected samples to derive the user-desired data distribution. DiCGAN instead introduces a differential critic that learns the preference direction from pairwise preferences over the entire dataset. The resulting critic guides the generation of the desired data rather than of the whole data. Specifically, in addition to the Wasserstein GAN loss, a ranking loss over the pairwise preferences is defined on the critic. It endows the difference in critic values between each pair of samples with the pairwise preference relation: a higher critic value indicates that the sample is preferred by the user. Training the generative model for higher critic values therefore encourages the generation of user-preferred samples. Extensive experiments show that DiCGAN learns the user-desired data distribution.

1. INTRODUCTION

Learning a good generative model for high-dimensional natural signals, such as images (Zhu et al., 2017), video (Vondrick et al., 2016) and audio (Fedus et al., 2018), has long been one of the key milestones of machine learning. Powered by the learning capabilities of deep neural networks, generative adversarial networks (GANs) (Goodfellow et al., 2014) have brought the field closer to attaining this goal. Currently, GANs are applied in settings where the whole training dataset is of user interest. Regular GANs therefore no longer meet our requirements when only part, instead of the entirety, of the training dataset possesses the desired properties (Killoran et al., 2017). The problem is even more challenging when the dataset contains only a small number of desired data. A naive way to adapt the vanilla GAN to this setting is to first select the samples possessing the desired properties and then perform regular GAN training only on the selected samples to derive the desired distribution. However, the vanilla GAN fails when the desired samples are limited. FBGAN overcomes the limited-data problem by iteratively introducing desired samples from the generation into the training data. Specifically, FBGAN is pretrained on all training data using the vanilla GAN. In each training epoch, the generator first generates a certain number of samples. The generated samples possessing the desired properties are selected by an expert selector and used to replace the old training data. Then, a regular WGAN is trained with the updated training data. Since the ratio of desired samples in the training data gradually increases, all training data will eventually be replaced with desired samples, so FBGAN derives the desired distribution at convergence. However, bluntly eliminating undesired samples may lead to a biased representation of the real desired data distribution, because the undesired samples can also reveal useful clues about what is not desired.
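The iterative replacement procedure of FBGAN described above can be sketched with a toy simulation. This is a minimal sketch, not FBGAN's implementation: the generator and expert selector are stand-ins (`desired` is a hypothetical threshold selector; a real FBGAN trains a WGAN at each step), and it only illustrates how the ratio of desired samples in the training set grows as selected generations replace old data.

```python
import numpy as np

rng = np.random.default_rng(0)

def desired(x, threshold=0.7):
    # Hypothetical expert selector: keeps samples whose (toy) property
    # score exceeds a threshold; FBGAN uses a task-specific analyser.
    return x > threshold

# Toy 1-D "dataset": mostly undesired (low-scoring) samples.
train_data = rng.uniform(0.0, 1.0, size=200)

for epoch in range(50):
    # Stand-in for the generator: its samples drift toward high-scoring
    # regions as the training data shifts (FBGAN trains a WGAN here).
    fake = rng.uniform(train_data.mean(), 1.0, size=100)
    selected = fake[desired(fake)]
    n = len(selected)
    if n > 0:
        # Replace the oldest training samples with the selected ones.
        train_data = np.concatenate([train_data[n:], selected])

# Ratio of desired samples in the training set after the loop.
ratio = desired(train_data).mean()
```

Under this toy dynamic, the training set is progressively taken over by desired samples, mirroring why FBGAN converges to the desired distribution.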
Suppose we want to generate old face images, but the training data contains only a few old face images and many young face images. In this case, the young face images can serve as negative samples (Mikolov et al., 2013) for learning the subtle aging features (e.g., wrinkles, pigmented skin), which guides the generation of the desired old face images. The conditional variants of GAN, such as CGAN (Mirza and Osindero, 2014) and ACGAN (Odena et al., 2017), can also be applied in this setting by introducing condition variables to model the conditional desired data distribution. However, the generation performance of condition-based GANs relies on each condition having sufficient training observations. When the desired data is limited, the conditional modeling is dominated by the major class, i.e., the undesired data, and fails to capture the desired distribution. All the above methods require user-defined criteria to select the desired data in order to learn its distribution, and such criteria may not exist in real applications. Instead of soliciting a ready-to-use criterion, we consider a more general setting where GAN is guided towards the distribution of user-desired data by user preferences. In particular, pairwise preferences are the most popular form of user preference due to their simplicity and easy accessibility (Lu and Boutilier, 2011). Our target is therefore to incorporate pairwise preferences into the learning process of GAN, so as to guide the generation of the desired data. Relativistic GAN (RGAN) (Jolicoeur-Martineau, 2019) is a variant of the regular GAN, proposed to learn the whole data distribution. It considers the critic value as an indicator of sample quality and defines the discriminator using the difference in critic values. The critic value in RGAN is similar to a ranking score, but it is used to describe sample quality.
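RGAN's use of critic-value differences can be made concrete with a small sketch of the relativistic standard GAN discriminator loss, which scores the probability that real data is more realistic than fake data via the difference of critic values. The `critic` function here is a hypothetical stand-in for a scalar-output network:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def critic(x):
    # Stand-in critic: any scalar-valued network output would do here.
    return x.sum(axis=1)

def rsgan_d_loss(x_real, x_fake):
    # Relativistic standard GAN discriminator loss: the discriminator
    # acts on the *difference* of critic values C(x_real) - C(x_fake).
    diff = critic(x_real) - critic(x_fake)
    return -np.log(sigmoid(diff)).mean()

x_real = np.ones((4, 3))   # toy "real" batch, critic value 3 each
x_fake = np.zeros((4, 3))  # toy "fake" batch, critic value 0 each
loss = rsgan_d_loss(x_real, x_fake)
```

When the critic already separates real from fake (positive difference), the loss is small; at chance (zero difference) it equals log 2.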
Motivated by this, we consider taking the critic value as the ranking score and define the ranking loss for pairwise preferences directly on the critic value. In particular, the difference in critic values for each pair of samples reflects the user's preference over the samples. This is why we call our critic the differential critic, and we propose the Differential-Critic GAN (DiCGAN) for learning the user-desired data distribution. As shown in Fig. 1, the differential critic incorporates the user preference direction, which pushes the original critic direction towards the real desired data region instead of the entire real data region. The main contributions are summarized as follows:

• We propose DiCGAN to learn the distribution of the desired data from the entire data using pairwise preferences. To the best of our knowledge, this is the first work to promote the ratio of the desired data by incorporating user preferences directly into data generation.

• We introduce the differential critic by defining an additional pairwise ranking loss on the WGAN's critic. It endows the difference in critic values between each pair of samples with the user's preference.

• The empirical study shows that DiCGAN learns the distribution of user-desired data and that the differential critic can derive the preference direction even from a limited number of preferences.
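A pairwise ranking loss on critic values, as described above, can be sketched in a margin (hinge) form; this is one common surrogate and the paper's exact loss may differ. The `critic` function is a hypothetical scalar stand-in for the WGAN critic network:

```python
import numpy as np

def critic(x):
    # Stand-in scalar critic; in DiCGAN this is the WGAN critic network.
    return x.sum(axis=1)

def ranking_loss(x_preferred, x_other, margin=1.0):
    # Margin ranking loss on critic values: for each preference pair
    # (x_p preferred over x_o), push C(x_p) - C(x_o) above `margin`,
    # so higher critic values encode "more preferred by the user".
    diff = critic(x_preferred) - critic(x_other)
    return np.maximum(0.0, margin - diff).mean()

# Toy pairs: preferred samples have larger feature sums.
x_p = np.array([[2.0, 2.0], [3.0, 1.0]])
x_o = np.array([[0.5, 0.5], [1.0, 0.0]])
loss = ranking_loss(x_p, x_o)
```

When the critic already ranks the preferred sample above the other by at least the margin, the pair contributes zero loss; swapping the pair's order makes the loss large, which is what drives the critic toward the preference direction.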

2. GENERATIVE ADVERSARIAL NETWORKS

Generative Adversarial Network (GAN) (Goodfellow et al., 2014) performs generative modeling by learning a map from a low-dimensional input space Z to the data space X, i.e., G_θ : Z → X, given samples from the training data distribution, namely x ∼ p_r(x). The goal is to find θ that achieves p_θ(x) = p_r(x), where p_θ(x) is the distribution of the fake data x = G_θ(z). Let p(z) be the input noise distribution and let G denote G_θ. GAN defines a discriminator D that is trained to discriminate real data from fake data, so as to guide the learning of G. Wasserstein GAN (WGAN) (Arjovsky et al., 2017) proposes to use the Wasserstein metric as a critic, which measures the quality of the fake data in terms of the distance between the real and fake data distributions. The Wasserstein distance (W-distance) is approximated by the difference in the average critic values between the real data and the fake data. The empirical experiments show
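The W-distance approximation mentioned above, the difference in average critic values between real and fake batches, can be sketched as follows. This is a minimal illustration with a hypothetical stand-in `critic`; a real WGAN critic is a (1-Lipschitz) neural network and the estimate is maximised over its parameters:

```python
import numpy as np

def critic(x):
    # Stand-in for the 1-Lipschitz critic network f.
    return x.sum(axis=1)

def wasserstein_estimate(x_real, x_fake):
    # W-distance surrogate: E[f(real)] - E[f(fake)].
    # The critic is trained to maximise this quantity,
    # while the generator is trained to minimise it.
    return critic(x_real).mean() - critic(x_fake).mean()

x_real = np.array([[1.0, 2.0], [2.0, 2.0]])  # critic values 3, 4
x_fake = np.array([[0.0, 1.0], [1.0, 0.0]])  # critic values 1, 1
w = wasserstein_estimate(x_real, x_fake)
```

As the fake distribution approaches the real one, this difference shrinks toward zero, which is why it serves as a training signal for the generator.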



Figure 1: Illustration of why DiCGAN can learn the user-desired data distribution. (a) DiCGAN's critic pushes fake data towards the real desired data while WGAN's critic pushes fake data towards the entire real data. (b) The change of DiCGAN's critic direction is driven by the preference direction. Note that the preference direction is learned from all pairwise preferences.

