ENSEMBLES OF GENERATIVE ADVERSARIAL NETWORKS FOR DISCONNECTED DATA

Anonymous

Abstract

Most computer vision datasets are composed of disconnected sets, such as images of different object classes. We prove that distributions over this type of data cannot be represented without error by a continuous generative network, regardless of the learning algorithm used. Disconnected datasets can be represented in two ways: with an ensemble of networks or with a single network using a truncated latent space. We show that ensembles are preferable to truncated distributions for several theoretical and computational reasons. We construct a regularized optimization problem that rigorously establishes the relationships between a single continuous GAN, an ensemble of GANs, conditional GANs, and Gaussian mixture GANs. The regularization can be computed efficiently, and we show empirically that our framework has a performance sweet spot that can be found via hyperparameter tuning. The ensemble framework achieves better performance than a single continuous GAN or a cGAN while using fewer total parameters.

1. INTRODUCTION

Generative networks, such as generative adversarial networks (GANs) (Goodfellow et al., 2014) and variational autoencoders (Kingma & Welling, 2013), have shown impressive performance in generating highly realistic images that were not observed in the training set (Karras et al., 2017; 2019a;b). However, even state-of-the-art generative networks such as BigGAN (Brock et al., 2018) generate poor-quality imagery when conditioned on certain classes of ILSVRC2012 (Russakovsky et al., 2015). We argue that this is due to the inherently disconnected structure of the data. In this paper, we theoretically analyze the effects of disconnected data on GAN performance. By disconnected, we mean that the data points are drawn from an underlying topological space that is disconnected (the rigorous definition is given in Section 3.1). As an intuitive example, consider the collection of all images of badgers together with all images of zebras. These two sets are disconnected: images of badgers do not resemble images of zebras, and the space connecting these sets does not correspond to real images of animals. We rigorously prove that no single continuous generative network can learn a data distribution perfectly under the disconnected data model. Because generative networks are continuous, they map a connected latent space (e.g., R^d) to a connected set; hence they cannot cover a disconnected image space without also generating data outside the true data space.

In related work, Khayatkhoei et al. (2018) empirically studied disconnected data but did not formally prove the results in this paper. Moreover, those authors use a completely unsupervised approach to discover the disconnected components as part of learning. In contrast, we use class labels and hence work in the supervised learning regime. Our suggested approach for dealing with disconnected data is to use ensembles of GANs.
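The topological argument can be illustrated with a minimal one-dimensional sketch (our own illustration, not a construction from the paper): take a data support consisting of two disjoint intervals, and note that any continuous "generator" whose range reaches both components must, by the intermediate value theorem, also produce points outside the support.

```python
import numpy as np

# Hypothetical 1-D illustration: the "true" data support is the
# disconnected set [-2, -1] ∪ [1, 2].
def in_support(x):
    return (-2.0 <= x <= -1.0) or (1.0 <= x <= 2.0)

# A continuous "generator" from the connected latent space R into the
# data space, whose range reaches both components of the support.
def g(z):
    return 2.0 * np.tanh(z)  # g(-1) ≈ -1.52 and g(1) ≈ 1.52 are in-support

# Sweep latent codes along a path joining the two components.
zs = np.linspace(-1.0, 1.0, 201)
xs = g(zs)

# Continuity forces intermediate outputs off-support (e.g., g(0) = 0),
# no matter how g was trained.
frac_off = float(np.mean([not in_support(x) for x in xs]))
print(f"fraction of swept samples outside the support: {frac_off:.2f}")
```

Any continuous map from a connected set has a connected image, so some positive fraction of the latent sweep necessarily lands in the "gap" between the two components.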
We study GANs in particular for concreteness and because of their widespread application; however, our methods extend to other generative networks with some modification. Ensembles of GANs are not new, e.g., see (Nguyen et al., 2017; Ghosh et al., 2018; Tolstikhin et al., 2017; Arora et al., 2017), but there has been limited theoretical study of their properties. We prove that ensembles can learn the data distribution under the disconnected data assumption, and we study their relationship to single GANs. Specifically, we develop a first-of-its-kind theoretical framework that relates single GANs, ensembles of GANs, conditional GANs, and Gaussian mixture GANs. The framework makes
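As a hypothetical sketch of the supervised ensemble approach advocated here (our own illustration; names and the toy generators are placeholders, not the paper's implementation): one generator is kept per class (i.e., per connected component), and sampling first draws a component index from the empirical class frequencies, then maps a latent draw through that component's generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-ins for K trained per-class generators; in practice
# each would be a network trained only on one class (connected component).
# Here component c simply shifts latent samples by 5*c to mimic
# well-separated modes.
generators = [lambda z, c=c: z + 5.0 * c for c in range(3)]
class_probs = np.array([0.5, 0.3, 0.2])  # empirical class frequencies

def sample_ensemble(n, latent_dim=2):
    """Draw n samples: pick a component, then push a latent draw through it."""
    ks = rng.choice(len(generators), size=n, p=class_probs)
    zs = rng.standard_normal((n, latent_dim))
    xs = np.stack([generators[k](z) for k, z in zip(ks, zs)])
    return xs, ks

xs, ks = sample_ensemble(1000)
print(xs.shape, np.bincount(ks) / len(ks))
```

Because each component generator only needs to cover one connected piece of the support, none of them is forced to interpolate across the gaps between classes.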

