COMMUTE-GAN: COMPETITIVE MULTIPLE EFFICIENT GENERATIVE ADVERSARIAL NETWORKS

Abstract

In complex creative scenarios, co-creativity by multiple agents offers great advantages. Each agent has a specific skill set and a set of abilities, which is generally not enough to perform a large and complex task single-handedly. These kinds of tasks benefit substantially from collaboration. In deep learning applications, data generation is an example of such a complex, potentially multi-modal task. Previous Generative Adversarial Networks (GANs) focused on using a single generator to generate multi-modal datasets, an approach known to face issues such as mode collapse and failure to converge. A single generator must also be very large to generalize over a complex dataset, so this approach easily runs into memory constraints. Current multi-generator works such as MGAN, MMGAN, MADGAN and AdaGAN either require training a classifier online, rely on complex mixture models, or add generators sequentially, which is computationally complex. In this work, we present a simple, novel approach of training competitive multiple efficient GANs (ComMutE-GANs), with multiple generators and a single critic/discriminator, without introducing external complexities such as a classifier model. We introduce a new component to the generator loss during GAN training, based on the Total Variation Distance (TVD). Our method offers a robust, stable, memory-efficient and easily parallelizable architecture. We present a proof-of-concept on the MNIST dataset, which has 10 modes of data. The individual generators learn to generate different digits from the distribution, and together learn to generate the whole distribution. We compare ComMutE-GANs with larger single-generator GANs and show its memory efficiency and increased accuracy.

1. INTRODUCTION

With respect to human beings, "Creators" refers to any and all who engage in creative thinking. When people learn about new topics, they create cognitive structures that allow them to understand the topics; they generate concepts that are new to them, although possibly already very well known to others. This is creativity at a strictly intra-personal level. When working in a social setting, such as a company or a classroom, one has to broaden this horizon to include "Co-creativity". "Creativity through collaboration" summarizes the definition of co-creativity as given by Lubart (2017). Often, collaborators have different or complementary skills that enable them to frequently produce shared creations that they could not or would not produce on their own (Lubart & Thornhill-Miller, 2020). AI-aided co-creation has also been shown to improve general well-being (Yu et al., 2021). Generative Adversarial Nets (GANs) are implicit generative models where one or more generators play a zero-sum game with a discriminator to recreate and potentially expand a chosen dataset. According to the definition established above, the generator models in modern-day GANs such as those described in the works of Karras et al. (2017) exhibit creativity on an intra-personal level. Accordingly, generative networks have been applied in many creative applications such as painting (Ganin et al., 2018; Mellor et al., 2019; Parikh & Zitnick, 2020), doodling (Ha & Eck, 2017; Cao et al., 2019) and extending fictional languages (Zacharias et al., 2022). Most noticeably, in all the applications listed above, a single, large, generative agent was applied to perform a complex task rather than breaking it down into smaller, more easily manageable sub-tasks. This approach, although effective, is upper-bounded by memory constraints. Drawing inspiration from co-creativity research, we aim to resolve these constraints.
Other implementations of GANs that use collaboration, such as MGAN (Hoang et al., 2017), MMGAN (Pandeva & Schubert, 2019), MADGAN (Ghosh et al., 2018) and AdaGAN (Tolstikhin et al., 2017), try to rectify the missing co-creativity functionality of GANs by, respectively, using a mixture of multiple generators, modeling the input latent space as a mixture model, requiring the discriminator to classify as well, and sequentially training and adding generators to the mixture. MGAN and MADGAN require the discriminator to have classification capabilities, and both implementations force the generators to generate different modes. The AdaGAN implementation poses a problem of computational complexity because of its sequential training nature. MMGAN, on the other hand, focuses on using mixtures at the input latent space level rather than on separating generators. Our work provides an easier approach to co-creativity than the ones presented above. It does not require the online training of a classifier; we do use a pre-trained MNIST classifier to augment the generator loss with the Total Variation Distance (TVD) and enforce mode separation during training, but mode separation may be enforced by other methods that do not require this pre-trained classifier at all. We also show that our approach scales easily and efficiently to more than two generators, which can all be trained in parallel without making complex transformations to the latent space. Analogous to human behaviour in a social setting, collaboration among two or more such generators allows each individual generator to focus on and specialize in a specific sub-task, making the training process more stable, more efficient (memory-wise), the generated images clearer, and the distribution of the generated images closer to the actual distribution of the chosen MNIST dataset.
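The TVD-based separation term can be illustrated concretely. The sketch below is our own minimal formulation, not the paper's exact loss: we assume the pre-trained classifier yields a class histogram over each generator's batch, and penalize a generator for overlapping with its competitors' histograms (the function names `tvd` and `separation_penalty` are hypothetical):

```python
import numpy as np

def tvd(p, q):
    """Total Variation Distance between two discrete distributions:
    TVD(p, q) = 0.5 * sum_i |p_i - q_i|, from 0 (identical) to 1 (disjoint)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def separation_penalty(histograms, k):
    """Extra loss term for generator k: negative mean TVD to the other
    generators' class histograms, so minimizing it pushes mode
    distributions apart. histograms[j] is the classifier's class
    histogram over generator j's generated batch."""
    others = [tvd(histograms[k], histograms[j])
              for j in range(len(histograms)) if j != k]
    return -float(np.mean(others))

# Two generators covering disjoint digit subsets are maximally separated:
h = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(separation_penalty(h, 0))  # → -1.0: fully separated, lowest penalty
```

Because the penalty depends only on class histograms, it extends to any number of generators without changing the critic.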

2.1. GANS, WGANS AND WGAN-GP

Generative Adversarial Nets (GANs) (Goodfellow et al., 2014) consist of two neural networks: a generator G(z) that takes in randomly sampled noise and generates fake images, and a discriminator D(x) that takes in batches of real and fake data points and outputs 0 for fake images and 1 for real ones. They optimize the well-known GAN objective

min_G max_D E_{x∼P_r}[log D(x)] + E_{x̃∼P_g}[log(1 − D(x̃))],

where x̃ = G(z), z ∼ P_z, P_r is the data distribution, P_g is the generator distribution, and G and D are the generator function and discriminator function, respectively. A variation of this method is the Deep Convolutional GAN (DCGAN) (Radford et al., 2015), which uses Convolutional Neural Networks (CNNs) to improve image data generation significantly. Wasserstein GAN (WGAN) (Arjovsky et al., 2017) improves on vanilla GANs by identifying and rectifying problems with the original GAN objective function. In WGANs, the Wasserstein distance between real data and generated data is minimized by optimizing over the objective

min_G max_{D∈𝒟} E_{x∼P_r}[D(x)] − E_{x̃∼P_g}[D(x̃)],

where 𝒟 is the set of 1-Lipschitz functions. The WGAN-GP method (Gulrajani et al., 2017) improves on this by using a novel gradient penalty loss, instead of the former weight clipping, to implicitly enforce the Lipschitz constraint. The objective function solved by WGAN-GP is

min_G max_{D∈𝒟} E_{x∼P_r}[D(x)] − E_{x̃∼P_g}[D(x̃)] + λ · E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)²],

where the first two terms are the same as in the WGAN objective and the third is the gradient penalty (GP) term. Here, x̂ = ϵx + (1 − ϵ)x̃ with ϵ ∼ U[0, 1], and P_x̂ is the probability distribution associated with x̂. Modern GANs such as Karras et al. (2018); Sauer et al. (2022); Karras et al. (2020), which provide state-of-the-art performance on large-scale image synthesis, improve these basic methods by making their models larger and training procedures more complex. Our method explores another direction, namely more instances of compact generators competing with each other. The idea takes inspiration from social dynamics and co-creativity research.
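The gradient penalty term can be checked numerically. The sketch below is ours, not the WGAN-GP authors' code: it uses a toy linear critic D(x) = w·x, whose gradient with respect to its input is simply w, so after interpolating x̂ = ϵx + (1 − ϵ)x̃ the penalty λ(‖∇_x̂ D(x̂)‖_2 − 1)² can be evaluated in closed form without an autograd framework:

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 10.0  # gradient-penalty weight, as used in WGAN-GP

w = rng.normal(size=5)  # toy linear critic: D(x) = w . x

def gradient_penalty(x_real, x_fake):
    """WGAN-GP penalty on random interpolates between real and fake batches.
    For the linear critic, grad_x D(x) = w for every x, so the penalty is
    LAMBDA * (||w||_2 - 1)^2 regardless of where x_hat lands."""
    eps = rng.uniform(size=(x_real.shape[0], 1))  # eps ~ U[0, 1], per sample
    x_hat = eps * x_real + (1.0 - eps) * x_fake   # interpolate on the segment
    grads = np.tile(w, (x_hat.shape[0], 1))       # analytic gradient of D
    norms = np.linalg.norm(grads, axis=1)         # ||grad_x_hat D(x_hat)||_2
    return LAMBDA * np.mean((norms - 1.0) ** 2)

x_real = rng.normal(size=(8, 5))
x_fake = rng.normal(size=(8, 5))
gp = gradient_penalty(x_real, x_fake)
# For this critic the penalty equals LAMBDA * (||w|| - 1)^2 exactly:
assert np.isclose(gp, LAMBDA * (np.linalg.norm(w) - 1.0) ** 2)
```

With a real (nonlinear) critic the per-sample gradients differ and must come from automatic differentiation, but the interpolation and penalty formula are unchanged.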

