DO-GAN: A DOUBLE ORACLE FRAMEWORK FOR GENERATIVE ADVERSARIAL NETWORKS

Abstract

In this paper, we propose a new approach to train Generative Adversarial Networks (GANs) where we deploy a double-oracle framework using the generator and discriminator oracles. GAN is essentially a two-player zero-sum game between the generator and the discriminator. Training GANs is challenging as a pure Nash equilibrium may not exist and even finding the mixed Nash equilibrium is difficult as GANs have a large-scale strategy space. In DO-GAN, we extend the double oracle framework to GANs. We first generalize the player strategies as the trained models of generator and discriminator from the best response oracles. We then compute the meta-strategies using a linear program. Next, we prune the weakly-dominated player strategies to keep the oracles from becoming intractable. We apply our framework to established architectures such as vanilla GAN, Deep Convolutional GAN, Spectral Normalization GAN and Stacked GAN. Finally, we conduct evaluations on MNIST, CIFAR-10 and CelebA datasets and show that DO-GAN variants achieve significant improvements in both subjective qualitative evaluation and quantitative metrics, compared with their respective GAN architectures.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have been applied in various domains such as image and video generation, image-to-image translation and text-to-image synthesis (Liu et al., 2017; Reed et al., 2016). Various architectures have been proposed to generate more realistic samples (Radford et al., 2015; Mirza & Osindero, 2014; Pu et al., 2016), as have regularization techniques (Arjovsky et al., 2017; Miyato et al., 2018b). From the game-theoretic perspective, GANs can be viewed as a two-player game where the generator samples the data and the discriminator classifies the data as real or generated. The two networks are alternately trained to maximize their respective utilities until convergence, corresponding to a pure Nash Equilibrium (NE). However, pure NE cannot be reliably reached by existing algorithms as pure NE may not exist (Farnia & Ozdaglar, 2020; Mescheder et al., 2017). This also leads to unstable training in GANs, depending on the data and the hyperparameters. Therefore, mixed NE is a more suitable solution concept (Hsieh et al., 2019). Several recent works propose mixture architectures with multiple generators and discriminators that consider mixed NE, such as MIX+GAN (Arora et al., 2017) and MGAN (Hoang et al., 2018). However, MIX+GAN and MGAN are not guaranteed to converge to a mixed NE. Mirror-GAN (Hsieh et al., 2019) finds the mixed NE by sampling over the infinite-dimensional strategy space and proposes provably convergent proximal methods. However, the sampling approach may not be efficient as the mixed NE may only have a few strategies in the support set. The Double Oracle (DO) algorithm (McMahan et al., 2003) is a powerful framework to compute mixed NE in large-scale games. The algorithm starts with a restricted game with a small set of actions and solves it to get the NE strategies of the restricted game.
The algorithm then computes players' best responses to the NE strategies using oracles and adds them into the restricted game for the next iteration. The DO framework has been applied in various disciplines (Jain et al., 2011; Bošanský et al., 2013), as well as Multi-agent Reinforcement Learning (MARL) settings (Lanctot et al., 2017). Inspired by the successful applications of the DO framework, we, for the first time, propose a Double Oracle Framework for Generative Adversarial Networks (DO-GAN). This paper presents four key contributions. First, we treat the generator and the discriminator as players, obtain the best responses from their oracles and add the utilities to a meta-matrix. Second, we propose a linear program to obtain the probability distributions over the players' pure strategies (meta-strategies) for the respective oracles. The linear program computes an exact mixed NE of the meta-matrix game in polynomial time. Third, we propose a pruning method for the support set of best response strategies to prevent the oracles from becoming intractable, as there is a risk of the meta-matrix growing very large with each iteration of oracle training. Finally, we provide a comprehensive evaluation of the performance of DO-GAN with different GAN architectures using both synthetic and real-world datasets. Experiment results show that DO-GAN variants achieve significant improvements in terms of both subjective qualitative evaluation and quantitative metrics.

2. RELATED WORKS

In this section, we briefly introduce existing GAN architectures, the double oracle algorithm and its applications, such as policy-space response oracles, that are related to our work. GAN Architectures. Various GAN architectures have been proposed to improve the performance of GANs. Deep Convolutional GAN (DCGAN) (Radford et al., 2015) replaces fully-connected layers in the generator and the discriminator with deconvolutional layers from Convolutional Neural Networks (CNNs). Weight normalization techniques such as Spectral Normalization GAN (SNGAN) (Miyato et al., 2018a) stabilize the training of the discriminator and reduce intensive hyperparameter tuning. There are also multi-model architectures such as Stacked Generative Adversarial Networks (SGAN) (Huang et al., 2017) that consist of a top-down stack of generators and a bottom-up discriminator network. Each generator is trained to generate lower-level representations, conditioned on higher-level representations, that can fool the corresponding representation discriminator. Training GANs is very hard and unstable as a pure NE for GANs might not exist and cannot be reliably reached by the existing approaches (Mescheder et al., 2017). Considering mixed NE, MIX+GAN (Arora et al., 2017) maintains a mixture of generators and discriminators with the same network architecture but with their own trainable parameters. However, training a mixture of networks without parameter sharing makes the algorithm computationally expensive. Mixture Generative Adversarial Nets (MGAN) (Hoang et al., 2018) proposes to capture diverse data modes by formulating GAN as a game between a classifier, a discriminator and multiple generators with parameter sharing. However, MIX+GAN and MGAN cannot converge to a mixed NE. Mirror-GAN (Hsieh et al., 2019) finds the mixed NE by sampling over the infinite-dimensional strategy space and proposes provably convergent proximal methods.
The sampling approach may be inefficient to compute a mixed NE, as the mixed NE may only have a few strategies with positive probabilities in the infinite strategy space. Double Oracle Algorithm. The Double Oracle (DO) algorithm starts with a small restricted game between two players and solves it to get the players' strategies at the NE of the restricted game. The algorithm then exploits the respective best response oracles for additional strategies of the players. The DO algorithm terminates when the best response utilities are not higher than the equilibrium utility of the current restricted game, hence finding the NE of the game without enumerating the entire strategy space. Moreover, in two-player zero-sum games, DO converges to a min-max equilibrium (McMahan et al., 2003). The DO framework is used to solve large-scale normal-form and extensive-form games such as security games (Tsai et al., 2012; Jain et al., 2011), poker games (Waugh et al., 2009) and search games (Bosansky et al., 2012). The DO framework is also used in MARL settings (Lanctot et al., 2017; Muller et al., 2020). Policy-Space Response Oracles (PSRO) generalize the double oracle algorithm to a multi-agent reinforcement learning setting (Lanctot et al., 2017). PSRO treats the players' policies as the best responses from the agents' oracles, builds the meta-matrix game and computes the mixed NE, but it uses Projected Replicator Dynamics, which updates the probability of each player's policy at each iteration. Since the dynamics must be simulated for several iterations, computing the meta-strategies takes longer and an exact NE of the meta-matrix game is not guaranteed. However, in DO-GAN, we can use a linear program to compute the players' meta-strategies in polynomial time since GAN is a two-player zero-sum game (Schrijver, 1998).

3. PRELIMINARIES

In this section, we review the preliminaries needed for our DO-GAN approach, including generative adversarial networks and game-theoretic concepts such as normal-form games and the double oracle algorithm.

3.1. GENERATIVE ADVERSARIAL NETWORKS

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have become one of the dominant methods for fitting generative models to complicated real-life data. GANs are deep neural net architectures comprised of two neural networks trained in an adversarial manner to generate data that resembles a distribution. The first neural network, a generator G, is given some random distribution p_z(z) on the input noise z and a real data distribution p_data(x) on training data x. The generator is supposed to generate samples as close as possible to p_data(x). The second neural network, a discriminator D, is to discriminate between two different classes of data (real, or fake from the generator). Let the generator's differentiable function be denoted as G(z, π_g) and similarly D(x, π_d) for the discriminator, where G and D are two neural networks with parameters π_g and π_d. Thus, D(x) represents the probability that x comes from the real data. The generator loss L_G and the discriminator loss L_D are defined as:

    L_D = E_{x∼p_data(x)}[−log D(x)] + E_{z∼p_z(z)}[−log(1 − D(G(z)))],   (1)
    L_G = E_{z∼p_z(z)}[log(1 − D(G(z)))].   (2)

GAN is then set up as a two-player zero-sum game between G and D as follows:

    min_G max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))].   (3)

During training, the parameters of G and D are updated alternately until we reach the global optimal solution D(G(z)) = 0.5. Next, we let Π_g and Π_d be the sets of parameters for G and D. Considering the probability distributions σ_g and σ_d over these sets, the mixed strategy formulation (Hsieh et al., 2019) is:

    min_{σ_g} max_{σ_d} E_{π_d∼σ_d} E_{x∼p_data(x)}[log D(x, π_d)] + E_{π_d∼σ_d} E_{π_g∼σ_g} E_{z∼p_z(z)}[log(1 − D(G(z, π_g), π_d))].   (4)

Similarly to GANs, DCGAN, SNGAN and SGAN can also be viewed as two-player zero-sum games with mixed strategies of the players. DCGAN modifies the vanilla GAN by replacing fully-connected layers with convolutional layers.
SGAN trains multiple generators and discriminators using a loss that is a linear combination of three loss terms: an adversarial loss, a conditional loss and an entropy loss.
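As a concrete illustration of the losses in Eqs. (1) and (2), here is a minimal NumPy sketch; the function names are ours, not from any GAN library, and the inputs are assumed to be discriminator output probabilities in (0, 1):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """L_D from Eq. (1): -E[log D(x)] - E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """L_G from Eq. (2): E[log(1 - D(G(z)))], minimized by the generator."""
    return np.mean(np.log(1.0 - d_fake))
```

At the global optimum D(·) = 0.5 everywhere, L_D = 2 log 2 and L_G = log 0.5, consistent with the fixed point mentioned above.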

3.2. NORMAL FORM GAME AND DOUBLE ORACLE ALGORITHM

A normal-form game is a tuple (Π, U, n) where n is the number of players, Π = (Π_1, . . . , Π_n) with Π_i the strategy set of player i ∈ N, where N = {1, . . . , n}, and U : Π → R^n is a payoff table of utilities for each joint strategy played by all players. Each player chooses a strategy from Π_i to maximize their own expected utility, or samples from a distribution over the set of strategies σ_i ∈ ∆(Π_i). We can use linear programming, fictitious play (Berger, 2007) or regret minimization (Roughgarden, 2010) to compute the probability distribution over players' strategies. In the Double Oracle (DO) algorithm (McMahan et al., 2003), there are two best response oracles, for the row and column player respectively. The algorithm creates restricted games from subsets of strategies at each iteration t for the row and column players, i.e., Π_r^t ⊂ Π_r and Π_c^t ⊂ Π_c, together with a meta-matrix U^t at the t-th iteration. We then solve the meta-matrix to get the probability distributions over Π_r^t and Π_c^t. Given a probability distribution σ_c over the column player's strategies, BR_r(σ_c) gives the row player's best response to σ_c. Similarly, given a probability distribution σ_r over the row player's strategies, BR_c(σ_r) is the column player's best response to σ_r. The best responses are added to the restricted game for the next iteration. The algorithm terminates when the best response utilities are not higher than the equilibrium utility of the current restricted game. Although in the worst case the entire strategy space may be added to the restricted game, DO is guaranteed to converge to a mixed NE in two-player zero-sum games. DO has also been extended to multi-agent reinforcement learning in PSRO (Lanctot et al., 2017) to approximate the best responses to the mixtures of agents' policies and compute the meta-strategies for policy selection.
4. DO-GAN

As discussed in previous sections, computing a mixed NE for GANs is challenging as there is an extremely large number of pure strategies, i.e., possible parameter settings of the generator and discriminator networks. Thus, we propose a double oracle framework for GANs (DO-GAN) to compute the mixed NE efficiently. DO-GAN builds a restricted meta-matrix game between the two players and computes the mixed NE of the meta-matrix game, then iteratively adds more generators and discriminators into the meta-matrix game until termination.

4.1. GENERAL FRAMEWORK OF DO-GAN

GAN can be translated as a two-player zero-sum game between the generator player g and the discriminator player d. To compute the mixed NE of GANs, at iteration t, DO-GAN creates a restricted meta-matrix game U^t with the trained generators and discriminators as the strategies of the two players, where the generators and discriminators are parameterized by π_g ∈ G and π_d ∈ D. We use U^t(π_g, π_d) to denote the generator player's payoff when playing π_g against π_d, which is defined as L_D. Since GAN is zero-sum, the discriminator player's payoff is −U^t(π_g, π_d). We define σ_g^t and σ_d^t as the mixed strategies of the generator player and discriminator player, respectively. With a slight abuse of notation, we define the generator player's expected utility of the mixed strategies σ_g^t, σ_d^t as

    U^t(σ_g^t, σ_d^t) = Σ_{π_g∈G} Σ_{π_d∈D} σ_g^t(π_g) · σ_d^t(π_d) · U^t(π_g, π_d).

We use σ_g^{t*}, σ_d^{t*} to denote the mixed NE of the restricted meta-matrix game U^t. We solve U^t to obtain the mixed NE, compute the best responses and add them into U^t for the next iteration. Figure 1 presents an illustration of DO-GAN and Algorithm 1 describes the overview of the framework.
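The expected utility of the mixed strategies defined above is a bilinear form over the meta-matrix; a minimal NumPy sketch, using the example payoffs from Figure 1 and illustrative mixed strategies of our own choosing:

```python
import numpy as np

# Meta-matrix payoffs U^t(pi_g, pi_d) for the generator player;
# values match the 2x2 example shown in Figure 1.
U = np.array([[-2.0, -1.0],
              [ 0.0, -3.0]])

sigma_g = np.array([0.75, 0.25])  # mixed strategy over stored generators
sigma_d = np.array([0.5, 0.5])    # mixed strategy over stored discriminators

# U^t(sigma_g, sigma_d) = sum_g sum_d sigma_g(g) * sigma_d(d) * U[g, d]
expected_utility = sigma_g @ U @ sigma_d
```

Here `sigma_g @ U @ sigma_d` computes exactly the double sum in the definition above; for these values it evaluates to −1.5.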


Figure 1: An illustration of DO-GAN, showing an example 2×2 meta-matrix with payoffs U(π_g^{(1)}, π_d^{(1)}) = −2, U(π_g^{(1)}, π_d^{(2)}) = −1, U(π_g^{(2)}, π_d^{(1)}) = 0 and U(π_g^{(2)}, π_d^{(2)}) = −3. Figure adapted from (Lanctot et al., 2017).

Our algorithm starts by initializing two empty arrays G and D to store multiple generators and discriminators (line 1). We train the first π_g and π_d with the canonical training procedure of GANs (line 2). We store the parameters of the trained models in the two arrays G and D (line 3), compute the adversarial loss L_D and add it to the meta-matrix U^0 (line 4). We initialize the meta-strategies σ_g^{0*} = [1] and σ_d^{0*} = [1] since there is only one pair of generator and discriminator available (line 5). For each epoch, we use generatorOracle() and discriminatorOracle() to obtain the best responses π′_g and π′_d to σ_d^{t*} and σ_g^{t*} via the Adam optimizer, respectively, then add them into G and D (lines 7-10). We then augment U^{t−1} by adding π′_g and π′_d and calculating U^t(π′_g, π′_d) to obtain U^t, and compute the missing entries (line 11). We compute the missing payoff entries U^t(π′_g, π_d), ∀π_d ∈ D and U^t(π_g, π′_d), ∀π_g ∈ G by sampling a few batches of training data. After that, we compute the mixed NE σ_g^{t*}, σ_d^{t*} of U^t with linear programming (line 12). The algorithm terminates if the criterion described in Algorithm 2 is satisfied (line 13). We also prune the support strategy set of the players as described in Algorithm 3 (line 14) to avoid G and D becoming intractable.
In generatorOracle(), we train π′_g to obtain the best response against σ_d^{t*}, i.e., U^t(π′_g, σ_d^{t*}) ≥ U^t(π_g, σ_d^{t*}), ∀π_g ∈ Π_g. The termination check and pruning subroutines are given below.

Algorithm 2: TerminationCheck(U^t, σ_g^{t*}, σ_d^{t*}, ε)
    // U^t is of size m × n, with |G| = m, |D| = n
    1  Compute U^t(σ_g^{t*}, σ_d^{t*});
    2  Compute U^t(σ_g^{t*}, D[n]);
    3  Compute U^t(G[m], σ_d^{t*});
    4  genInc = U^t(G[m], σ_d^{t*}) − U^t(σ_g^{t*}, σ_d^{t*});
    5  disInc = −U^t(σ_g^{t*}, D[n]) − (−U^t(σ_g^{t*}, σ_d^{t*}));
    6  if genInc < ε and disInc < ε then return True else return False;

Algorithm 3: PruneMetaMatrix(U^t, σ_g^{t*}, σ_d^{t*})
    1  // I_g, I_d store indices to be pruned from G and D;
       // K_g, K_d store the corresponding models to be pruned
    2  I_g = ∅; I_d = ∅;
    3  K_g = ∅; K_d = ∅;
    4  if |G| > s then
    5      for i ∈ {0, . . . , |G| − 1} do
    6          if σ_g^{t*}(G[i]) == min σ_g^{t*} then I_g ← I_g ∪ {i}; K_g ← K_g ∪ {G[i]};
    7  if |D| > s then
    8      for j ∈ {0, . . . , |D| − 1} do
    9          if σ_d^{t*}(D[j]) == min σ_d^{t*} then I_d ← I_d ∪ {j}; K_d ← K_d ∪ {D[j]};
    10 G ← G \ K_g; D ← D \ K_d;
    11 U^t ← J_{I_g,m} · U^t · J_{I_d,n}^T

4.2. LINEAR PROGRAM OF META-MATRIX GAME

Since the current restricted meta-matrix game U^t is a zero-sum game, we can use a linear program to compute the mixed NE in polynomial time (Schrijver, 1998). Given the generator player g's mixed strategy σ_g^t, the discriminator player d will play strategies that minimize the expected utility of g. Thus, the mixed NE strategy for the generator player σ_g^{t*} maximizes the worst-case expected utility, which is obtained by solving the following linear program:

    σ_g^{t*} = arg max_{σ_g^t} { v : σ_g^t ≥ 0, Σ_{i∈G} σ_g^t(i) = 1, U^t(σ_g^t, π_d) ≥ v, ∀π_d ∈ D }.

Similarly, we can obtain the mixed NE strategy for the discriminator σ_d^{t*} by solving a linear program that maximizes the worst-case expected utility of the discriminator player. Therefore, we obtain the mixed NE (σ_g^{t*}, σ_d^{t*}) of the restricted meta-matrix game U^t.
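A minimal sketch of this linear program using scipy.optimize.linprog; this is our own helper, not the paper's code. The decision variables are the strategy probabilities plus the game value v, and `linprog` minimizes, so we minimize −v:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(U):
    """Mixed NE strategy of the row (generator) player for payoff matrix U.

    Maximizes the worst-case expected utility v subject to
    sigma >= 0, sum(sigma) = 1, and sigma @ U[:, j] >= v for every column j.
    """
    m, n = U.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0  # minimize -v, i.e., maximize v
    # v - sigma @ U[:, j] <= 0 for each discriminator column j
    A_ub = np.hstack([-U.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # probabilities sum to one; v is unconstrained by the equality
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]
```

On the 2×2 example meta-matrix from Figure 1, [[−2, −1], [0, −3]], this yields σ_g = (0.75, 0.25) with game value v = −1.5.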

4.3. TERMINATION CHECK

DO terminates the training by checking whether the best response π′_g (or π′_d) is already in the support set G (or D) (Jain et al., 2011), but we cannot apply this approach to DO-GAN as GAN has an infinite-dimensional strategy space (Hsieh et al., 2019). Hence, we terminate the training if the best responses cannot bring a higher utility to the two players than the entries of the current support sets, as discussed in (Lanctot et al., 2017; Muller et al., 2020). Specifically, we first compute U^t(σ_g^{t*}, σ_d^{t*}) and the expected utilities of the new generator and discriminator, U^t(G[m], σ_d^{t*}) and U^t(σ_g^{t*}, D[n]) (lines 1-3).

4.4. PRUNING META-MATRIX

As the meta-matrix grows with every epoch of DO, there is a risk that the support strategy set becomes very large and G and D become intractable. To avoid this, we adapt the greedy pruning algorithm from (Cheng & Wellman, 2007), as depicted in Algorithm 3. When either |G| or |D| exceeds the limit s on the support set size, we prune at least one strategy with the least probability, i.e., the strategy that contributes the least to the player's winning. Specifically, we define a selection matrix J_{I,b}, where I is the set of row indices to be removed and b is the total number of rows of the matrix. For example, to remove the 2nd row of a matrix having 3 rows, we set I = {1}, b = 3 and

    J_{{1},3} = [ 1 0 0 ]
                [ 0 0 1 ].

If |G| > s, at least one strategy with minimum probability is pruned from G, and similarly for D (lines 3-9). Finally, we prune the meta-matrix using matrix multiplication (line 10).
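The selection-matrix pruning on line 10 can be sketched with NumPy as follows; `selection_matrix` is our own helper name:

```python
import numpy as np

def selection_matrix(I, b):
    """J_{I,b}: the b x b identity with the rows indexed by I removed.

    Left-multiplying by J_{I_g,m} deletes pruned generators' rows;
    right-multiplying by J_{I_d,n}.T deletes pruned discriminators' columns.
    """
    keep = [r for r in range(b) if r not in I]
    return np.eye(b)[keep]

# Example from the text: remove the 2nd row (index 1) of a 3-row matrix.
J = selection_matrix({1}, 3)  # [[1, 0, 0], [0, 0, 1]]

# Prune row 1 and column 2 of a 3x3 meta-matrix in one multiplication.
U = np.arange(9.0).reshape(3, 3)
pruned = selection_matrix({1}, 3) @ U @ selection_matrix({2}, 3).T
```

The result `pruned` keeps exactly the rows and columns of the surviving models, matching U^t ← J_{I_g,m} · U^t · J_{I_d,n}^T.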

5. EXPERIMENTS

We conduct our experiments on a machine with a Xeon(R) CPU E5-2683 v3 @ 2.00GHz and 4× Tesla V100-PCIE-16GB GPUs running the Ubuntu operating system. We evaluate the double oracle framework for established GAN architectures: vanilla GAN (Goodfellow et al., 2014), DCGAN (Radford et al., 2015), SNGAN (Miyato et al., 2018a) and SGAN (Huang et al., 2017). We adopt the parameter settings and criteria of the GAN architectures as published. We set s = 10 unless mentioned otherwise. We compute the mixed NE of the meta-matrix game with Nashpy. The evaluation details are shown in Appendix B.

5.1. EVALUATION ON SYNTHETIC DATASET

To illustrate the effectiveness of the architecture, we train a double oracle framework with the simple vanilla GAN architecture on a 2D mixture of 8 Gaussian components with cluster standard deviation 0.1, following the experiment by (Metz et al., 2017). Figure 2 shows the evolution of 512 samples generated by GAN and DO-GAN through 20000 epochs. The goal of GAN and DO-GAN is to correctly generate samples at the 8 modes shown in the target. The results show that GAN can only identify 6 out of 8 modes of the synthetic Gaussian data distribution, while DO-GAN obtains all 8 modes of the distribution. Furthermore, DO-GAN takes a shorter time (fewer than 5000 epochs) to identify all 8 modes of the data distribution. We present a more detailed evolution of data samples through the training process on 2D Gaussian mixtures in Appendix C. Ablations. We also varied the support set size s = 5, 10, 15 and recorded the computation time, as discussed in Appendix D. We found that the training cannot converge when s = 5 and takes a long time when s = 15. Thus, we chose s = 10 for the training process.
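The 8-mode synthetic benchmark above can be sampled as a ring of Gaussians; this sketch follows common practice for the Metz et al. (2017) setup, where the unit radius is our assumption and only the number of modes and the 0.1 standard deviation come from the text:

```python
import numpy as np

def sample_ring_gaussians(n, modes=8, radius=1.0, std=0.1, seed=0):
    """Sample n points from a 2D mixture of `modes` Gaussians placed
    evenly on a circle of the given radius, each with the given std."""
    rng = np.random.default_rng(seed)
    # Pick a mode uniformly for each sample, then place its center.
    angles = 2.0 * np.pi * rng.integers(0, modes, size=n) / modes
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return centers + std * rng.standard_normal((n, 2))
```

Drawing 512 samples matches the number of points visualized per snapshot in Figure 2.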

5.2. EVALUATION ON REAL-WORLD DATASETS

We evaluate the performance of the double oracle framework with several established GAN architectures as the backbone, as discussed in Appendix G: GAN (Goodfellow et al., 2014), DCGAN (Radford et al., 2015) and SGAN (Huang et al., 2017) with convolutional layers for the deep neural networks, as well as SNGAN (Miyato et al., 2018a), which uses normalization techniques. We run experiments on the MNIST (LeCun & Cortes, 2010), CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015) datasets. MNIST contains 60,000 samples of handwritten digits as 28 × 28 images. CIFAR-10 contains 50,000 training images of size 32 × 32 across 10 classes. CelebA is a large-scale face dataset with more than 200K images of size 128 × 128.

5.2.1. QUALITATIVE EVALUATION

We choose the CelebA dataset for the qualitative evaluation since the training images contain noticeable artifacts (aliasing, compression, blur) that make it difficult for the generator to produce perfect and faithful images. We compare the performance of DO-DCGAN, DO-SNGAN and DO-SGAN with their counterparts. SNGAN is trained for 40 epochs (with a termination threshold of 5 × 10^{-5} for DO-SNGAN), while the other architectures are trained for 25 epochs (with a termination threshold of 5 × 10^{-5} for their double oracle variants).

5.2.2. QUANTITATIVE EVALUATION

In this section, we evaluate the performance of the various architectures by quantitative metrics. Inception Score. We first leverage the Inception Score (IS) (Salimans et al., 2016), which evaluates the quality and diversity of the generated images. FID Score. The Fréchet Inception Distance (FID) measures the distance between the feature vectors of real and generated images using the Inception_v3 model (Heusel et al., 2017). Here, we let p and q be the distributions of the representations obtained by projecting real and generated samples to the last hidden layer of the Inception model. Assuming that p and q are multivariate Gaussian distributions, FID measures the 2-Wasserstein distance between the two distributions. Hence, the FID score can capture the similarity of generated images to real ones better than the Inception Score. Note: MIX+DCGAN and MGAN results are directly copied from (Arora et al., 2017; Hoang et al., 2018). Results. The results are shown in Table 2. On the CIFAR-10 dataset, DO-GAN, DO-DCGAN and DO-SNGAN obtain much better Inception Scores (7.2 ± 0.16, 7.86 ± 0.14 and 8.55 ± 0.08) than GAN, DCGAN and SNGAN (3.84 ± 0.09, 6.32 ± 0.05 and 7.58 ± 0.12). However, we do not see a significant improvement of DO-SGAN (8.69 ± 0.10) over SGAN (8.62 ± 0.12), since SGAN can already generate diverse images. We did not include IS for the CelebA dataset as IS cannot reflect real image quality for CelebA, as observed in (Heusel et al., 2017). On the CIFAR-10 dataset, DO-GAN, DO-DCGAN, DO-SNGAN and DO-SGAN obtain much lower FID scores (31.44, 22.25, 16.56 and 18.20, respectively). The trend continues on CelebA: 7.11 for DO-DCGAN versus 10.92 for DCGAN, 6.92 for DO-SNGAN versus 7.62 for SNGAN, and 6.32 for DO-SGAN versus 6.98 for SGAN. Although we see a significant improvement in the quality of DO-SGAN images, the FID score for DO-SGAN is affected by distortions. According to the results, the DO framework performs better than each of the original counterpart architectures. More details can be found in Appendix F.
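Under the Gaussian assumption, FID reduces to a closed form over the means and covariances of the two feature distributions; a minimal sketch (our own helper, not the evaluation code used in the paper):

```python
import numpy as np
from scipy import linalg

def fid(mu_p, cov_p, mu_q, cov_q):
    """Frechet Inception Distance between two Gaussians:
    ||mu_p - mu_q||^2 + Tr(C_p + C_q - 2 (C_p C_q)^{1/2}).

    In practice mu and cov are estimated from Inception_v3 activations
    of the real and generated image sets.
    """
    diff = mu_p - mu_q
    covmean = linalg.sqrtm(cov_p @ cov_q)
    covmean = np.real(covmean)  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(cov_p + cov_q - 2.0 * covmean))
```

Identical distributions give FID = 0, and shifting one mean by a unit vector raises it by exactly 1, which is a quick sanity check for any implementation.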

6. CONCLUSION

We propose a novel double oracle framework for GANs, which starts with a restricted game and incrementally adds the best responses of the generator and the discriminator to compute the mixed NE. We compute the players' meta-strategies using a linear program and prune the players' support strategy sets. We apply the DO-GAN approach to established GAN architectures such as vanilla GAN, DCGAN, SNGAN and SGAN. Extensive experiments with the 2D Gaussian synthetic dataset as well as real-world datasets such as MNIST, CIFAR-10 and CelebA show that DO-GAN variants achieve significant improvements over their respective GAN architectures, both in terms of subjective image quality and in terms of quantitative metrics.

We implement our proposed method with Python 3.7, PyTorch 1.4.0 and Torchvision 0.5.0. We set the hyperparameters as in the original implementations and present them in Table 3.

B IMPLEMENTATION DETAILS

We use Nashpy to compute the equilibria of the meta-matrix game.

C FULL TRAINING PROCESS OF 2D GAUSSIAN DATASET

The full training evolution is shown in Figure 6.

D ABLATION ON SUPPORT SET SIZE

The results are shown in Table 4 and Figure 9. We find that if the support size is too small, e.g., s = 5, best responses that are not yet optimal but have better utilities than the models in the support set are repeatedly added to and pruned from the meta-matrix, so the training cannot converge. However, s = 15 takes a significantly longer time, as the time to augment the meta-matrix grows rapidly with the support set size. Hence, we chose s = 10 as our experiment support set size, since we observed no significant trade-off and a shorter runtime.



Footnotes:
1. https://nashpy.readthedocs.io/en/stable/index.html
2. We do not evaluate the performance of vanilla GAN and its DO variant on the CelebA dataset since DCGAN and SGAN outperform vanilla GAN in image generation tasks (Radford et al., 2015).



Algorithm 1: Double Oracle Framework for GAN (DO-GAN)
    1  Initialize generator and discriminator arrays G = ∅ and D = ∅;
    2  Train the generator and discriminator to get the first π_g and π_d;
    3  G ← G ∪ {π_g}; D ← D ∪ {π_d};
    4  Compute the adversarial loss L_D and add it to meta-matrix U^0;
    5  Initialize σ_g^{0*} = [1] and σ_d^{0*} = [1];
    6  for epoch t ∈ {1, 2, . . .} do
    7      π′_g ← generatorOracle(σ_d^{t*}, D);
    8      G ← G ∪ {π′_g};
    9      π′_d ← discriminatorOracle(σ_g^{t*}, G);
    10     D ← D ∪ {π′_d};
    11     Augment U^{t−1} with π′_g and π′_d to obtain U^t and compute the missing entries;  // U^t is of size m × n, |G| = m, |D| = n
    12     Compute the mixed NE (σ_g^{t*}, σ_d^{t*}) of U^t with linear programming;
    13     if TerminationCheck(U^t, σ_g^{t*}, σ_d^{t*}) then break;
    14     PruneMetaMatrix(U^t, σ_g^{t*}, σ_d^{t*});

Figure 2: Comparison of GAN and DO-GAN on 2D synthetic dataset

The generated CelebA images of DCGAN and DO-DCGAN are shown in Figure 3, where we find that DCGAN suffers from mode collapse while DO-DCGAN does not. We also present the generated images of SNGAN vs. DO-SNGAN and SGAN vs. DO-SGAN using fixed noise at different training epochs in Figures 4 and 5. From the results, we can see that SNGAN, SGAN, DO-SNGAN and DO-SGAN are all able to generate various faces, i.e., no mode collapse. Judging from subjective visual quality, we find that DO-SNGAN and DO-SGAN are able to generate plausible images faster than SNGAN and SGAN during training, e.g., 17 epochs for DO-SGAN versus 20 epochs for SGAN. More experiment results on CIFAR-10 can be found in Appendix E.

Figure 3: Training images with fixed noise for DCGAN and DO-DCGAN until termination.

Figure 4: Training images with fixed noise for SNGAN and DO-SNGAN until termination.

Figure 5: Training images with fixed noise for SGAN and DO-SGAN until termination.

The Inception Score uses Inception_v3 (Szegedy et al., 2016) as the inception model. To compute the Inception Score, we first compute the Kullback-Leibler (KL) divergence for all generated images and use the equation IS = exp(E_x[KL(p(y|x) ‖ p(y))]), where p(y) is the marginal label distribution over the images in the split and p(y|x) is the conditional label distribution of image x estimated by the reference inception model. The Inception Score evaluates the quality and diversity of all generated images rather than their similarity to the real data from the test set.
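A minimal sketch of the Inception Score computation from a matrix of class posteriors p(y|x); the helper is our own, and in practice the posteriors would come from Inception_v3 softmax outputs:

```python
import numpy as np

def inception_score(p_yx):
    """IS = exp(E_x[KL(p(y|x) || p(y))]) from an (N x K) matrix of
    per-image class posteriors p(y|x)."""
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    # Per-image KL divergence KL(p(y|x) || p(y)), then average over images.
    kl = np.sum(p_yx * (np.log(p_yx) - np.log(p_y)), axis=1)
    return float(np.exp(kl.mean()))
```

Uniform posteriors give IS = 1 (no confidence, no diversity benefit), while confident predictions spread evenly over K classes give IS ≈ K, the maximum.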

Figure 6: Full comparison of GAN and DO-GAN on 2D Synthetic Gaussian Dataset

Figure 8: GAN and DO-GAN comparison with Gaussian Mixture 9 modes

Figure 9: Training evolution on 2D Gaussian Dataset with s = 5, 10, 15

Figure 13: Generated images of CelebA dataset for DO-SNGAN and SNGAN

Table 1: Comparison of terminologies between game theory and GAN.


Similarly, in discriminatorOracle(), we train π′_d to obtain the best response against σ_g^{t*}, i.e., −U^t(σ_g^{t*}, π′_d) ≥ −U^t(σ_g^{t*}, π_d), ∀π_d ∈ Π_d. Full details of the generator oracle and discriminator oracle can be found in Appendix A.

Then, we calculate the utility increments genInc and disInc (lines 4-5) and return True if both are smaller than a threshold ε, i.e., if neither the new generator G[m] nor the new discriminator D[n] improves its player's utility over the current equilibrium value U^t(σ_g^{t*}, σ_d^{t*}).

Table 2: Inception scores (higher is better) and FID scores (lower is better). The mean and standard deviation are computed from 10 splits of 10,000 generated images. The magenta values are the improvements of the DO-GAN variants compared with their counterparts.



A FULL ALGORITHM OF DO-GAN

generatorOracle(σ_d^{t*}, D):
    Initialize a generator G with a parameter setting π′_g;
    for iteration k ∈ {k_0, . . . , k_n} do
        for a minibatch do
            Sample noise z;
            Generate G(z) and add it to the mixture;
            Update the generator G's parameters π′_g via Adam optimizer;

discriminatorOracle(σ_g^{t*}, G):
    Initialize a discriminator D with random parameter setting π′_d;
    for iteration k ∈ {k_0, . . . , k_n} do
        Sample a minibatch of data x;
        Sample noise z;
        Update the discriminator D's parameters π′_d via Adam optimizer;

We train the oracles for a number of iterations, which we denote as k_0, k_1, k_2, . . .. For the experiments, we train each oracle for one epoch on the real-world datasets and for 50 iterations on the 2D synthetic Gaussian dataset. At each iteration t, we sample the generators from the support set G with the meta-strategy σ_g^{t*} to generate the images for evaluation. Similarly, we conduct the performance evaluation with generators sampled from G with the final σ_g^* at termination. SGAN consists of a top-down stack of GANs, e.g., for a stack of 2, Generator 1 is the first layer stacked on Generator 0, with each of them connected to Discriminator 1 and 0 respectively. Hence, in DO-SGAN, we store the meta-strategies for Generators 0 and 1 in σ_g^{t*} and for Discriminators 1 and 0 in σ_d^{t*}. In generatorOracle(), we first sample Discriminators 1 and 0 from the discriminator distribution σ_d^{t*}, train Generator 1 first, then calculate the loss with Discriminator 1 and train Generator 0 subsequently, and finally calculate the final loss with Discriminator 0 and train the whole model end to end. We perform the same process for discriminatorOracle().
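As a sketch of how the generator oracle's objective mixes over the stored discriminators, the expected loss against the mixture can be computed as follows; this is our own helper, and the actual oracle performs Adam updates on this expectation rather than merely evaluating it:

```python
import numpy as np

def generator_oracle_loss(d_outputs_fake, sigma_d):
    """Expected generator loss against the discriminator mixture:
    E_{pi_d ~ sigma_d*} E_z[log(1 - D(G(z), pi_d))].

    d_outputs_fake: (num_discriminators x batch) matrix of D_k(G(z)).
    sigma_d: meta-strategy probabilities over the stored discriminators.
    """
    per_disc = np.log(1.0 - d_outputs_fake).mean(axis=1)  # E_z per discriminator
    return float(sigma_d @ per_disc)  # weight by the meta-strategy
```

Note that the loss depends on the meta-strategy only through the weighted average, so discriminators with zero probability in σ_d^{t*} have no effect on the oracle's objective.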

E GENERATED IMAGES OF CELEBA AND CIFAR-10

In this section, we present the generated images for the CelebA and CIFAR-10 datasets.

G CHOICE OF GAN ARCHITECTURES FOR EXPERIMENTS

Figure 15: Taxonomy of GAN Architectures from (Wang et al., 2019). We carried out experiments with variants of GANs to evaluate the performance of our DO-GAN framework, referring to the taxonomy of GANs (Wang et al., 2019) when choosing representative architectures.

