SEARCHING TOWARDS CLASS-AWARE GENERATORS FOR CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS

Abstract

Conditional Generative Adversarial Networks (cGANs) are designed to generate images based on provided conditions, e.g., class-level distributions. However, existing methods use the same generator architecture for all classes. This paper presents a novel idea: adopting neural architecture search (NAS) to find a distinct architecture for each class. The search space contains regular and class-modulated convolutions, where the latter is designed to introduce class-specific information while avoiding the reduction of training data available to each class-specific generator. The search algorithm follows a weight-sharing pipeline with mixed-architecture optimization, so that the search cost does not grow with the number of classes. To learn the sampling policy, a Markov decision process is embedded into the search algorithm, and a moving average is applied for better stability. We evaluate our approach on CIFAR10 and CIFAR100. Besides achieving better image generation quality in terms of FID scores, we discover several insights that are helpful in designing cGAN models.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have attracted considerable attention and achieved great success in image generation. Conditional GAN (cGAN) (Mirza & Osindero, 2014) is a variant of GAN that uses class information to guide the training of the discriminator and generator, and it usually obtains better generation quality. Most cGANs incorporate class information into the generator through Conditional Batch Normalization (CBN) (de Vries et al., 2017), or into the discriminator through a projection discriminator (Miyato & Koyama, 2018), a multi-hinge loss (Kavalerov et al., 2019), an auxiliary loss (Odena et al., 2017), etc.

In this paper, we investigate the possibility of designing class-aware generators for cGAN, i.e., using a distinct generator network architecture for each class. To design these architectures automatically, we propose a neural architecture search (NAS) algorithm built on reinforcement learning. However, as the number of classes increases, three main issues arise. First, the search space grows exponentially with the number of classes (i.e., combinatorial explosion). Second, training a separate generator for each class is prone to insufficient data (Karras et al., 2020). Third, searching and re-training each generator one by one may be impractical when the number of generators is large.

We propose solutions for these challenges. First, we present a carefully designed search space that is both flexible and safe. Flexibility refers to the ability to assign a distinct generator architecture to each class, which makes the search space exponentially large while keeping its size controllable. To guarantee safety (i.e., to enable the limited amount of training data to be shared among a large number of generators), we introduce a new operator named Class-Modulated convolution (CMconv).
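To make the idea concrete, the following is a minimal sketch of a class-modulated 1x1 convolution: shared weights are reused by every class, while a small per-class scale vector modulates them, so every class's training data updates the same shared weights. All names, the demodulation step, and the modulation rule are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

class CMConv1x1:
    """Sketch of a class-modulated 1x1 convolution.

    The convolutional weights `weight` are shared across all classes;
    a standalone per-class vector `class_scale` modulates them, so each
    class obtains a distinct effective filter while the shared weights
    are trained on data from every class. (Hypothetical names; the
    paper's modulation scheme may differ.)
    """

    def __init__(self, in_ch, out_ch, num_classes, rng=None):
        rng = rng or np.random.default_rng(0)
        self.weight = rng.standard_normal((out_ch, in_ch)) * 0.1  # shared
        self.class_scale = np.ones((num_classes, in_ch))          # per-class

    def __call__(self, x, class_id):
        # x: (batch, in_ch, H, W)
        s = self.class_scale[class_id]                 # (in_ch,)
        w = self.weight * s[None, :]                   # modulate shared weights
        # demodulate to keep output statistics stable (an assumption,
        # borrowed from StyleGAN2-style modulated convolutions)
        w = w / np.sqrt((w ** 2).sum(axis=1, keepdims=True) + 1e-8)
        # 1x1 convolution as a channel-mixing einsum
        return np.einsum("oi,bihw->bohw", w, x)
```

Because the per-class scales start identical, every class initially computes the same function; training would then specialize the scales while the shared weights continue to see all classes' data.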
CMconv shares the same set of convolutional weights as a regular convolution but is equipped with a standalone set of weights that modulate the convolutional weights, allowing the training data to be shared among different architectures and thus alleviating data inefficiency. Second, to make the search and re-training procedure as simple as possible, we develop mixed-architecture optimization, such that training multiple class-aware generators is as simple as training a single generator. Integrating these modules produces the proposed Multi-Net NAS (MN-NAS). To the best of our knowledge, this is the first method that can produce a number of generator architectures, one for each class, through a single search procedure.

Figure 1 shows the overall framework of MN-NAS. It applies a Markov decision process equipped with a moving average as the top-level logic for sampling and evaluating candidate architectures. After the search procedure, the optimal architecture for each class is determined, and the architectures are re-trained and calibrated for better image generation performance.

We perform experiments on popular benchmarks, including the CIFAR10 and CIFAR100 datasets, which have different numbers of classes. We achieve FID scores of 5.85 and 12.28 on CIFAR10 and CIFAR100, respectively, which are comparable to state-of-the-art results. Beyond good performance, our method yields some insights. For example, we find that coordination between the discriminator and generator is crucial: to derive distinct class-aware generators, the discriminator must also be class-aware. More interestingly, by analyzing the best models found by NAS, we find that the class-modulated convolution is more likely to appear in the early stages (close to the input noise) of the generator. We believe this phenomenon is related to the semantic hierarchy of GANs (Bau et al., 2018; Yang et al., 2020).
We apply this finding as an empirical rule to BigGAN (Brock et al., 2018) and also observe a performance gain. This implies that our algorithm delivers useful and generalizable insights for the design of cGAN models. We will release code and pre-trained models to facilitate future research.
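The reinforcement-learning sampling policy with a moving-average baseline, described above, can be sketched as follows. Each class keeps one categorical distribution per generator layer over the candidate operators (e.g., regular convolution vs. CMconv); sampled architectures are scored by a reward, and the per-class logits are updated REINFORCE-style against a moving-average baseline for stability. The class names, update rule, and hyperparameters below are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random

class ArchPolicy:
    """Sketch of a per-class architecture sampling policy (hypothetical)."""

    def __init__(self, num_classes, num_layers, ops=("conv", "cmconv"),
                 lr=0.1, momentum=0.9, seed=0):
        self.ops = ops
        # one logit row per (class, layer) over the candidate operators
        self.logits = [[[0.0] * len(ops) for _ in range(num_layers)]
                       for _ in range(num_classes)]
        self.lr, self.momentum = lr, momentum
        self.baseline = 0.0  # moving-average reward
        self.rng = random.Random(seed)

    def _probs(self, row):
        m = max(row)
        exps = [math.exp(l - m) for l in row]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self, class_id):
        """Sample one operator index per layer for the given class."""
        return [self.rng.choices(range(len(self.ops)),
                                 weights=self._probs(row))[0]
                for row in self.logits[class_id]]

    def update(self, class_id, arch, reward):
        """Policy-gradient step against the moving-average baseline."""
        self.baseline = self.momentum * self.baseline + (1 - self.momentum) * reward
        advantage = reward - self.baseline
        for row, choice in zip(self.logits[class_id], arch):
            p = self._probs(row)
            for i in range(len(row)):
                grad = (1.0 if i == choice else 0.0) - p[i]
                row[i] += self.lr * advantage * grad
```

Note that the baseline is shared across classes while the logits are class-specific, so the search cost of the policy itself does not grow with the number of classes.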

2. RELATED WORK

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have demonstrated impressive generation capabilities (Karras et al., 2017; Brock et al., 2018; Karras et al., 2019a). Nevertheless, they suffer from notorious issues such as vanishing gradients, training instability, and mode collapse. A number of improvements to the original GAN have been proposed, e.g., changing the objective function (Arjovsky et al., 2017; Gulrajani et al., 2017; Mao et al., 2016; Jolicoeur-Martineau, 2019; Qi, 2017), improving the network architecture (Radford et al., 2015; Brock et al., 2018; Karras et al., 2019a; Denton et al., 2015; Zhang et al., 2018; Karnewar & Wang, 2019), and using multiple generators or discriminators (Tolstikhin et al., 2017; Hoang et al., 2018; Arora et al., 2017; Durugkar et al., 2017; Ghosh et al., 2018; Nguyen et al., 2017). Recently, the surge in neural architecture search (NAS) has triggered a wave of interest in automatically designing GAN architectures (Wang & Huan, 2019; Gong et al., 2019; Tian et al., 2020b; Gao et al., 2019; Tian et al., 2020a; Li et al., 2020; Fu et al., 2020; Kobayashi & Nagao, 2020).



Figure 1: Illustration of single-net RL-based NAS and our multi-net RL-based NAS. Class-aware generators allow us to discover statistical regularities by analyzing multiple network architectures. This cannot be achieved with a class-agnostic generator, because only one generator is searched (i.e., one sample), which provides little information for architecture design.

Conditional GAN (cGAN) (Mirza & Osindero, 2014) is another type of GAN that incorporates class information into the original GAN, achieving promising results on class-conditional image generation tasks. Most early methods incorporated the class information simply by concatenation (Mirza & Osindero, 2014; Reed et al., 2016). AC-GAN (Odena et al., 2017) incorporated the label information into the objective function of the discriminator through an auxiliary classifier. Miyato & Koyama (2018) proposed the class-projection (cproj) discriminator, which injects class information into the discriminator in a projection-based way. Furthermore, conditional batch normalization (CBN) (de Vries et al., 2017) is an effective method to modulate convolutional feature maps with conditional information. Subsequently, cproj and CBN have been widely used together, forming powerful cGANs for class-conditional image generation (Zhang et al., 2018; Brock et al., 2018).

