DO-GAN: A DOUBLE ORACLE FRAMEWORK FOR GENERATIVE ADVERSARIAL NETWORKS

Abstract

In this paper, we propose a new approach to training Generative Adversarial Networks (GANs) that deploys a double-oracle framework with generator and discriminator oracles. A GAN is essentially a two-player zero-sum game between the generator and the discriminator. Training GANs is challenging because a pure Nash equilibrium may not exist, and even finding a mixed Nash equilibrium is difficult since GANs have a large-scale strategy space. In DO-GAN, we extend the double oracle framework to GANs. We first generalize the player strategies as the trained generator and discriminator models obtained from the best-response oracles. We then compute the meta-strategies using a linear program. Next, we prune the weakly-dominated player strategies to keep the oracles from becoming intractable. We apply our framework to established architectures such as vanilla GAN, Deep Convolutional GAN, Spectral Normalization GAN and Stacked GAN. Finally, we conduct evaluations on the MNIST, CIFAR-10 and CelebA datasets and show that DO-GAN variants achieve significant improvements in both subjective qualitative evaluation and quantitative metrics, compared with their respective GAN architectures.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have been applied in various domains such as image and video generation, image-to-image translation and text-to-image synthesis (Liu et al., 2017; Reed et al., 2016). Various architectures (Radford et al., 2015; Mirza & Osindero, 2014; Pu et al., 2016) and regularization techniques (Arjovsky et al., 2017; Miyato et al., 2018b) have been proposed to generate more realistic samples. From the game-theoretic perspective, a GAN can be viewed as a two-player game in which the generator samples the data and the discriminator classifies the data as real or generated. The two networks are alternately trained to maximize their respective utilities until convergence, which corresponds to a pure Nash Equilibrium (NE). However, a pure NE may not exist (Farnia & Ozdaglar, 2020; Mescheder et al., 2017) and cannot be reliably reached by existing algorithms, which also leads to unstable training depending on the data and the hyperparameters. Therefore, mixed NE is a more suitable solution concept (Hsieh et al., 2019). Several recent works propose mixture architectures with multiple generators and discriminators that consider mixed NE, such as MIX+GAN (Arora et al., 2017) and MGAN (Hoang et al., 2018); however, neither is guaranteed to converge to a mixed NE. Mirror-GAN (Hsieh et al., 2019) finds a mixed NE by sampling over the infinite-dimensional strategy space and proposes provably convergent proximal methods. However, the sampling approach may be inefficient, as a mixed NE may have only a few strategies in its support set. The Double Oracle (DO) algorithm (McMahan et al., 2003) is a powerful framework for computing mixed NE in large-scale games. The algorithm starts with a restricted game with a small set of actions and solves it to get the NE strategies of the restricted game.
The algorithm then computes each player's best response to the NE strategies using an oracle and adds it to the restricted game for the next iteration. The DO framework has been applied in various disciplines (Jain et al., 2011; Bošanský et al., 2013), as well as in Multi-agent Reinforcement Learning (MARL) settings (Lanctot et al., 2017). Inspired by these successful applications, we, for the first time, propose a Double Oracle Framework for Generative Adversarial Networks (DO-GAN). This paper presents four key contributions. First, we treat the generator and the discriminator as players, obtain the best responses from their oracles and add the corresponding utilities to a meta-matrix. Second, we propose a linear program to obtain the probability distributions over the players' pure strategies (meta-strategies) for the respective oracles; the linear program computes an exact mixed NE of the meta-matrix game in polynomial time. Third, we propose a pruning method for the support set of best-response strategies, since the meta-matrix risks growing very large with each iteration of oracle training, which would make the oracles intractable. Finally, we provide a comprehensive evaluation of DO-GAN with different GAN architectures on both synthetic and real-world datasets. Experimental results show that DO-GAN variants achieve significant improvements in terms of both subjective qualitative evaluation and quantitative metrics.
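To make the meta-strategy step concrete, the exact mixed NE of a finite zero-sum meta-matrix can be obtained with a standard linear program: maximize the game value v subject to the mixed strategy guaranteeing at least v against every opponent pure strategy. The sketch below is illustrative only and not the paper's implementation; the function name is our own, and we assume SciPy's `linprog` as the LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(meta_matrix):
    """Compute the row player's mixed NE strategy of a zero-sum
    meta-matrix game via linear programming.

    meta_matrix[i, j] is the row player's payoff when the row player
    plays pure strategy i and the column player plays j.
    Returns (mixed strategy x, game value v).
    """
    A = np.asarray(meta_matrix, dtype=float)
    m, n = A.shape
    # Variables: x_1..x_m (strategy probabilities) and v (game value).
    # Maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j:  sum_i A[i, j] * x_i >= v,
    # rewritten as         v - sum_i A[i, j] * x_i <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:m], res.x[-1]
```

For example, on the matching-pennies matrix [[1, -1], [-1, 1]] the program returns the uniform strategy (0.5, 0.5) with game value 0. The column player's strategy follows from the same routine applied to the negated transpose.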

2. RELATED WORKS

In this section, we briefly introduce existing GAN architectures, the double oracle algorithm and its applications, such as policy-space response oracles, that are related to our work.

GAN Architectures. Various GAN architectures have been proposed to improve the performance of GANs. Deep Convolutional GAN (DCGAN) (Radford et al., 2015) replaces fully-connected layers in the generator and the discriminator with deconvolution layers of Convolutional Neural Networks (CNNs). Weight normalization techniques such as Spectral Normalization GAN (SNGAN) (Miyato et al., 2018a) stabilize the training of the discriminator and reduce intensive hyperparameter tuning. There are also multi-model architectures such as Stacked Generative Adversarial Networks (SGAN) (Huang et al., 2017), which consist of a top-down stack of generators and a bottom-up discriminator network. Each generator is trained to generate lower-level representations, conditioned on higher-level representations, that can fool the corresponding representation discriminator. Training GANs is hard and unstable, as a pure NE for GANs might not exist and cannot be reliably reached by existing approaches (Mescheder et al., 2017). Considering mixed NE, MIX+GAN (Arora et al., 2017) maintains a mixture of generators and discriminators with the same network architecture but with their own trainable parameters. However, training a mixture of networks without parameter sharing makes the algorithm computationally expensive. Mixture Generative Adversarial Nets (MGAN) (Hoang et al., 2018) captures diverse data modes by formulating GAN as a game between a classifier, a discriminator and multiple generators with parameter sharing. However, neither MIX+GAN nor MGAN is guaranteed to converge to a mixed NE. Mirror-GAN (Hsieh et al., 2019) finds a mixed NE by sampling over the infinite-dimensional strategy space and proposes provably convergent proximal methods. The sampling approach may be inefficient for computing a mixed NE, as the mixed NE may have only a few strategies with positive probabilities in the infinite strategy space.

Double Oracle Algorithm. The Double Oracle (DO) algorithm starts with a small restricted game between two players and solves it to get the players' strategies at the NE of the restricted game. The algorithm then queries the respective best-response oracles for additional strategies of the players. The DO algorithm terminates when the best-response utilities are not higher than the equilibrium utility of the current restricted game, thus finding the NE of the game without enumerating the entire strategy space. Moreover, in two-player zero-sum games, DO converges to a min-max equilibrium (McMahan et al., 2003). The DO framework has been used to solve large-scale normal-form and extensive-form games such as security games (Tsai et al., 2012; Jain et al., 2011), poker games (Waugh et al., 2009) and search games (Bosansky et al., 2012), and has also been used in MARL settings (Lanctot et al., 2017; Muller et al., 2020). Policy-Space Response Oracles (PSRO) generalizes the double oracle algorithm to the multi-agent reinforcement learning setting (Lanctot et al., 2017). PSRO treats the players' policies as best responses from the agents' oracles, builds the meta-matrix game and computes the mixed NE, but it uses Projected Replicator Dynamics, which updates the probability of each player's policy at every iteration. Since the dynamics must be simulated for many iterations, computing the meta-strategies takes longer and is not guaranteed to yield an exact NE of the meta-matrix game. In DO-GAN, by contrast, since GAN is a two-player zero-sum game, we can use a linear program to compute the players' meta-strategies exactly in polynomial time (Schrijver, 1998).
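To make the double oracle loop described above concrete, the toy sketch below runs DO on a known finite zero-sum payoff matrix, where each best-response oracle reduces to an argmax over rows or columns. In DO-GAN, these argmax oracles are replaced by gradient-based training of a new generator or discriminator against the opponent's mixture, and the meta-matrix is additionally pruned to stay tractable (omitted here). All function names are ours; this is a schematic sketch, not the paper's code, and it assumes SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def solve_meta(A):
    """Mixed NE strategy and value of zero-sum matrix A (row maximizer) via LP."""
    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - A[:, j] . x <= 0
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:m], res.x[-1]

def double_oracle(payoff, eps=1e-8, max_iter=100):
    """Double oracle on a finite zero-sum game (row player maximizes)."""
    payoff = np.asarray(payoff, dtype=float)
    rows, cols = [0], [0]   # restricted game starts with one strategy each
    for _ in range(max_iter):
        sub = payoff[np.ix_(rows, cols)]
        x, v = solve_meta(sub)          # row player's meta-strategy
        y, _ = solve_meta(-sub.T)       # column player's meta-strategy
        # Best-response oracles against the opponent's current mixture.
        row_vals = payoff[:, cols] @ y  # payoff of every candidate row
        col_vals = x @ payoff[rows, :]  # loss of every candidate column
        br_row = int(np.argmax(row_vals))
        br_col = int(np.argmin(col_vals))
        improved = False
        if row_vals[br_row] > v + eps and br_row not in rows:
            rows.append(br_row); improved = True
        if col_vals[br_col] < v - eps and br_col not in cols:
            cols.append(br_col); improved = True
        if not improved:                # no oracle beats the equilibrium value
            break
    return x, y, v, rows, cols
```

On rock-paper-scissors, for instance, the loop grows the restricted game until all three strategies enter each support and terminates at the uniform mixed NE with game value 0.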

