TRAINING FEDERATED GANS WITH THEORETICAL GUARANTEES: A UNIVERSAL AGGREGATION APPROACH

Abstract

Recently, Generative Adversarial Networks (GANs) have demonstrated their potential in federated learning, i.e., learning a centralized model from data privately hosted by multiple sites. A federated GAN jointly trains a centralized generator and multiple private discriminators hosted at different sites. A major theoretical challenge for the federated GAN is the heterogeneity of the local data distributions. Traditional approaches cannot guarantee learning the target distribution, which is a mixture of highly different local distributions. This paper tackles this theoretical challenge and, for the first time, provides a provably correct framework for federated GANs. We propose a new approach called Universal Aggregation, which simulates a centralized discriminator by carefully aggregating the mixture of all private discriminators. We prove that a generator trained with this simulated centralized discriminator can learn the desired target distribution. Through synthetic and real datasets, we show that our method can learn a mixture of largely different distributions where existing federated GAN methods fail.

1. INTRODUCTION

Generative Adversarial Networks (GANs) have attracted much attention due to their ability to generate realistic-looking synthetic data (Goodfellow et al., 2014; Zhang et al., 2018; Liu et al., 2019b; Shaham et al., 2019; Dai et al., 2017; Kumar et al., 2017). In order to obtain a powerful GAN model, one needs to use data with a wide range of characteristics (Qi, 2019). However, such diverse data are often owned by different sources, and acquiring their data is often infeasible. For instance, most hospitals and research institutions are unable to share data with the research community, due to privacy concerns (Annas et al., 2003; Mercuri, 2004; lex, 2014; Gostin et al., 2009) and government regulations (Kerikmäe, 2017; Seddon & Currie, 2013). To circumvent the barrier of data sharing for GAN training, one may resort to Federated Learning (FL), a promising new decentralized learning paradigm (McMahan et al., 2017). In FL, one trains a centralized model but only exchanges model information with the different data sources. Since the central model has no direct access to the data at each source, privacy concerns are alleviated (Yang et al., 2019; Kairouz et al., 2019). This opens the opportunity for a federated GAN, i.e., a centralized generator with multiple local and privately hosted discriminators (Hardy et al., 2019). Each local discriminator is trained only on its local data and provides feedback to the generator w.r.t. synthesized data (e.g., gradients). A federated GAN empowers GAN training with much more diversified data without violating privacy constraints. Despite this promise, a convincing approach for training a federated GAN remains unknown. The major challenge comes from the non-identical local distributions of the multiple data sources/entities. The centralized generator is supposed to learn a mixture of these local distributions from different entities, whereas each discriminator is only trained on local data and learns one of the local distributions.
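The federated setup described above, a centralized generator plus privately hosted local discriminators that only send feedback back to the center, can be sketched schematically. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, and the local feedback here is a random placeholder standing in for real discriminator scores or gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_feedback(local_data, fake_batch):
    """Placeholder for one site's discriminator. In a real system it
    would be trained on `local_data` plus the synthetic batch and
    return feedback (e.g., gradients) for the central generator.
    Here it just returns per-sample scores in (0, 1)."""
    return rng.uniform(0.01, 0.99, size=len(fake_batch))

def federated_round(sites, sample_generator, batch_size):
    """One communication round: the central generator synthesizes a
    batch, every site evaluates it locally, and only the feedback
    (never the raw local data) travels back to the center."""
    fake_batch = sample_generator(batch_size)
    return [local_feedback(data, fake_batch) for data in sites]

# Toy usage: 3 sites with private data, a generator sampling noise.
sites = [rng.normal(size=(100, 2)) for _ in range(3)]
feedback = federated_round(sites, lambda n: rng.normal(size=(n, 2)),
                           batch_size=8)
```

The key property of the protocol, preserved in the sketch, is that raw site data never leaves `local_feedback`.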
The algorithm and theoretical guarantees of the traditional single-discriminator GAN (Goodfellow et al., 2014) do not easily generalize to this federated setting. A federated GAN should integrate feedback from local discriminators in an intelligent way, so that the generator can 'correctly' learn the mixture distribution. Directly averaging the feedback from local discriminators (Hardy et al., 2019) results in a strong bias toward common patterns. However, such a non-identical distribution setting is classical in federated learning (Zhao et al., 2018; Smith et al., 2017; Qu et al., 2020), and the distinct characteristics of the local data are precisely what improve the diversity of the data. In this paper, we propose the first theoretically guaranteed federated GAN that can correctly learn the mixture of local distributions. Our method, called Universal Aggregation GAN (UA-GAN), focuses on the odds values rather than the predictions of the local discriminators. We simulate an unbiased centralized discriminator whose odds value approximates that of the mixture of local discriminators. We prove that by aggregating gradients from the local discriminators based on the odds value of this central discriminator, we are guaranteed to learn the desired mixture of local distributions. A second theoretical contribution of this paper is an analysis of the quality of the federated GAN when the local discriminators cannot be learned perfectly from the local datasets. This is a real concern in a federated learning setting: the quantity and quality of local data can be highly variable given the limitations of real-world institutions/sites. The classical theoretical analysis of GANs (Goodfellow et al., 2014) assumes an optimal discriminator. To understand the consequences of suboptimal discriminators, we develop a novel analysis framework for the Jensen-Shannon divergence loss (Goodfellow et al., 2014; Lin, 1991) through the odds values of the local discriminators.
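The odds-value aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the mixture weights `weights` are assumptions, and the rule shown (aggregate the local odds values d/(1-d) with the mixture weights, then map the aggregated odds back to a probability) is one reading of the description above.

```python
import numpy as np

def odds(d):
    """Odds value of a discriminator output d in (0, 1)."""
    return d / (1.0 - d)

def universal_aggregation(local_outputs, weights):
    """Simulate a centralized discriminator output by aggregating the
    odds values of the local discriminators with mixture weights that
    sum to 1, then mapping the aggregated odds back into (0, 1)."""
    local_outputs = np.asarray(local_outputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    agg_odds = np.sum(weights * odds(local_outputs))
    return agg_odds / (1.0 + agg_odds)
```

One sanity check of the construction: if every local discriminator outputs the same value d, the simulated central discriminator also outputs d, since the weighted odds reduce to odds(d).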
We show that when the local discriminators behave suboptimally, the approximation error of the learned generator grows linearly with the error of the discriminators. It is worth noting that our theoretical result on suboptimality also applies to the classical GAN. To the best of our knowledge, this is the first suboptimality bound for either federated or classical GANs. In summary, the contributions of this paper are threefold.

• We propose UA-GAN, a novel federated GAN approach that aggregates feedback from local discriminators through their odds values rather than their posterior probabilities.
• We prove that UA-GAN correctly learns the mixture of local distributions when they are perfectly modeled by the local discriminators.
• We prove that when the discriminators are suboptimal in modeling their local distributions, the generator's approximation error grows linearly with their suboptimality. We also show that our bound is tight.

We show with various experiments that our method (UA-GAN) outperforms the state-of-the-art federated GAN approaches both qualitatively and quantitatively. Training on large-scale heterogeneous datasets makes it possible to unleash the power of GANs. Federated GANs show their promise in utilizing vast amounts of sensitive data without privacy and regulatory concerns. Our method, as the first theoretically guaranteed federated GAN, is one step toward building such a foundation. Fig. 1 shows the workflow of UA-GAN.
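For reference, the Jensen-Shannon divergence underlying this analysis can be computed for discrete distributions as below. This is a generic textbook sketch (base-2 logarithms, so the value lies in [0, 1]), not code from the paper.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits, over the
    support of p (terms with p_i = 0 contribute nothing)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence: the average KL of p and q to their
    midpoint mixture m = (p + q) / 2. Symmetric and bounded in [0, 1]
    with base-2 logs."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The two boundary cases are instructive: identical distributions give JSD 0, and distributions with disjoint supports give the maximum value 1.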

2. RELATED WORK

Generative Adversarial Networks (GANs) have enjoyed much success in various machine learning and computer vision tasks (Zhang et al., 2018; Liu et al., 2019b; Shaham et al., 2019; Dai et al., 2017; Kumar et al., 2017).



Figure 1: UA-GAN framework. The data from multiple entities may share a common distribution while each retains its own individual distribution. Here, we use the MNIST dataset, with the 10 digits assigned to different data centers, as the common pattern, and the Fashion-MNIST dataset, randomly split across the data centers, as the distinct features (see Sec. 4.2 for details). UA-GAN carefully aggregates the feedback from the different entities, manages to capture the universal distribution, and generates unbiased synthetic data.

Various GAN variants have been proposed, e.g., (Liu et al., 2018), WGAN-QC (Liu et al., 2019a), etc. A common approach in practice is the conditional GAN (cGAN) (Mirza & Osindero, 2014), which uses supervision from data (e.g., class labels) to improve GAN's performance. Multi-discriminator/-generator GANs have been proposed for various learning tasks. To train these GANs, one common strategy is to directly exchange generator/discriminator model parameters during training (Xin et al., 2020; Hardy et al., 2019). This is very expensive in communication; a simple ResNet18 (He et al., 2016a) has 11 million parameters (40MB). Closest to us is MD-GAN (Hardy

