

Abstract

We show that, with improved training, the standard approach for differentially private GANs (updating the discriminator with noisy gradients) achieves or competes with state-of-the-art results for private image synthesis. Existing instantiations of this approach neglect to consider how adding noise only to discriminator updates disrupts the careful balance between generator and discriminator necessary for successful GAN training. We show that a simple fix, taking more discriminator steps between generator steps, restores this balance and substantially improves training. Guided by the same goal of restoring parity between generator and discriminator, we experiment with further modifications to discriminator training and see additional gains in generation quality. For MNIST at ε = 10, our private GANs improve the record FID from 48.4 to 13.0, and the record downstream classifier accuracy from 83.2% to 95.0%.

1. INTRODUCTION

Differential privacy (DP) (Dwork et al., 2006b) has emerged as a compelling approach for training machine learning models on sensitive data. However, incorporating DP requires significant changes to the training process. Notably, it prevents the modeller from working directly with private data, complicating debugging and exploration. Furthermore, the modeller can no longer interact with a private dataset after exhausting their allocated privacy budget. One approach to alleviate these issues is to produce differentially private synthetic data, which can be plugged directly into existing machine learning pipelines without further concern for privacy.

A recent line of work studies leveraging deep generative models to produce DP synthetic data. Early efforts focused on privatizing generative adversarial networks (GANs) (Goodfellow et al., 2014) by using differentially private stochastic gradient descent (DPSGD) (Abadi et al., 2016) to update the GAN discriminator, an approach referred to as DPGAN (Xie et al., 2018; Beaulieu-Jones et al., 2019; Torkzadehmahani et al., 2019). However, follow-up work has significantly departed from this baseline DPGAN approach, either in terms of: (a) the privatization scheme, in favor of approaches based on subsample-and-aggregate which divide the data into ≥ 1000 disjoint partitions and train teacher discriminators separately on each one (Jordon et al., 2019; Long et al., 2021; Chen et al., 2020; Wang et al., 2021); or (b) the generative modelling framework altogether, opting instead to minimize notions of statistical distance between real and generated data, such as maximum mean discrepancy (Harder et al., 2021; Vinaroz et al., 2022) or Sinkhorn divergences (Cao et al., 2021). For labelled image synthesis, these custom generative models designed specifically for privacy fall short of GANs when evaluated at their non-private limits (ε → ∞), suggesting limited scalability to larger, higher-resolution datasets.¹ On the other hand, the literature corroborates that under modest privacy budgets, these departures from the baseline DPGAN lead to significant improvements in generation quality. Proposed explanations attribute these results to inherent limitations of the DPGAN framework, suggesting that either: (a) privatizing discriminator training is sufficient for privacy, but may be overkill when only the generator needs to be released (Long et al., 2021); or (b) adversarial objectives may be unsuited for training under privacy (Cao et al., 2021).

Our contributions. We demonstrate that the reported poor results of DPGANs should not be attributed to inherent limitations of the framework, but rather to training issues. Specifically, we propose that the asymmetric noise addition in DPGANs (adding noise to discriminator updates only) weakens the discriminator relative to the generator, disrupting the careful balance necessary for successful GAN training. We propose that taking more discriminator steps between generator updates addresses the imbalance introduced by noise. With this change, DPGANs improve significantly (see Figure 1), going from non-competitive to achieving or competing with state-of-the-art results in private image synthesis. Furthermore, we show this perspective on private GAN training ("restoring parity to a discriminator weakened by DP noise") can be applied to improve training further.
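Before detailing our changes, it helps to recall what the privatized update looks like. Following Abadi et al. (2016), a DPSGD step clips each per-example gradient and perturbs the summed result with Gaussian noise before averaging. In the notation used here (which is ours, not the paper's: φ for discriminator parameters, ℓ for its loss, C the clipping norm, σ the noise multiplier, B the batch size, η the learning rate), one noisy discriminator update takes roughly the form

\[
\tilde{g} = \frac{1}{B}\left(\sum_{i=1}^{B} \operatorname{clip}_C\!\big(\nabla_{\phi}\,\ell(\phi; x_i)\big) + \mathcal{N}\!\big(0,\ \sigma^2 C^2 I\big)\right),
\qquad
\phi \leftarrow \phi - \eta\, \tilde{g},
\qquad
\operatorname{clip}_C(v) = v \cdot \min\!\left(1, \frac{C}{\lVert v \rVert_2}\right).
\]

The generator, which never touches real data, is updated with ordinary gradients; this asymmetry is precisely what we argue weakens the discriminator.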
We make other modifications to discriminator training, namely large batch sizes and an adaptive discriminator step frequency, to further improve upon the aforementioned results. In summary, we make the following contributions:

1. We find that taking more discriminator steps between generator steps (see the training-loop sketch below) significantly improves DPGANs. Contrary to previous results in the literature, DPGANs do compete with state-of-the-art generative modelling approaches designed with privacy in mind.

2. We present empirical findings towards understanding why more frequent discriminator steps help. We propose an explanation based on asymmetric noise addition for why vanilla DPGANs do not perform well, and why taking more steps helps.

3. We put our explanation to the test. We employ it as a principle for designing better private GAN training recipes, and indeed are able to improve over the aforementioned results.
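The following is a minimal sketch, assuming PyTorch, of the training loop this recipe implies: n_D noisy discriminator steps per (non-private) generator step. All names and hyperparameters here (G, D, dp_disc_step, clip_norm, noise_mult, n_disc_steps) are illustrative stand-ins rather than the paper's code, and the explicit per-example gradient loop is written for clarity rather than speed; a real implementation would use a vectorized DPSGD library and a privacy accountant.

```python
# Illustrative DPGAN training-loop sketch (not the paper's code).
# Only the discriminator sees real data, so only its updates are privatized
# with per-example gradient clipping plus Gaussian noise (DPSGD-style);
# generator updates are ordinary, since they touch no real examples.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784            # e.g. flattened 28x28 MNIST
batch_size, n_disc_steps = 128, 50        # n_D = 50 discriminator steps per generator step
clip_norm, noise_mult, lr = 1.0, 1.0, 1e-4

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=lr)
opt_D = torch.optim.Adam(D.parameters(), lr=lr)
bce = nn.BCEWithLogitsLoss()

def dp_disc_step(real_batch):
    """One noisy discriminator update: clip each per-example gradient
    (one real sample paired with one fake sample), sum, add Gaussian
    noise with std noise_mult * clip_norm, then average and step."""
    fake_batch = G(torch.randn(real_batch.size(0), latent_dim)).detach()
    summed = [torch.zeros_like(p) for p in D.parameters()]
    for i in range(real_batch.size(0)):   # explicit per-example loop, for clarity
        D.zero_grad()
        loss = (bce(D(real_batch[i:i + 1]), torch.ones(1, 1))
                + bce(D(fake_batch[i:i + 1]), torch.zeros(1, 1)))
        loss.backward()
        grads = [p.grad.detach().clone() for p in D.parameters()]
        norm = float(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    for p, s in zip(D.parameters(), summed):
        noise = torch.randn_like(s) * noise_mult * clip_norm
        p.grad = (s + noise) / real_batch.size(0)
    opt_D.step()

def gen_step():
    """Ordinary (non-private) generator update through the discriminator."""
    opt_G.zero_grad()
    fake = G(torch.randn(batch_size, latent_dim))
    loss = bce(D(fake), torch.ones(batch_size, 1))
    loss.backward()
    opt_G.step()

def train(data_loader, generator_steps=1000):
    def batches():                        # cycle through the loader indefinitely
        while True:
            for real, _ in data_loader:   # assumes (images, labels) batches
                yield real.view(real.size(0), -1)
    stream = batches()
    for _ in range(generator_steps):
        for _ in range(n_disc_steps):     # the key change: many D steps per G step
            dp_disc_step(next(stream))
        gen_step()
```

Note that the inner loop consumes n_D fresh real batches per generator step, so for a fixed privacy budget, raising n_D trades generator updates for discriminator updates rather than adding privacy cost per generator step.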

¹ For example, the record FID for MNIST at ε = 10 is 48.4 (Cao et al., 2021). When evaluated at ε = ∞, their method achieves an FID of 43.4. Our non-private GANs obtain an FID of 3.2.

Figure 1: DPGAN results on MNIST synthesis at (10, 10⁻⁵)-DP. (a) We find that increasing n_D, the number of discriminator steps taken between generator steps, significantly improves image synthesis results. Using n_D = 50 instead of n_D = 1 improves FID from 205.9 → 19.4, which improves over the record FID of 48.4 from Cao et al. (2021). n_D = 50 also improves the record downstream classification accuracy to 92.9% (see Figure 2a), an improvement over the record accuracy of 83.2% from Cao et al. (2021). (b) Corresponding synthesized images. We observe that large n_D improves visual quality, and low n_D leads to mode collapse.

2. PRELIMINARIES

Our goal is to train a generative model on sensitive data that is safe to release, i.e., it does not leak the secrets of individuals in the training dataset. We do this by ensuring the training algorithm A, which takes as input the sensitive dataset D ∈ U and returns the parameters of a trained (generative) model θ ∈ Θ, satisfies differential privacy.

Definition 1 (Differential Privacy (Dwork et al., 2006b)). A randomized algorithm A : U → Θ is (ε, δ)-differentially private if for every pair of neighbouring datasets D, D′ ∈ U, we have P{A(D) ∈ S} ≤ exp(ε) · P{A(D′) ∈ S} + δ for all S ⊆ Θ.

In this work, we adopt the add/remove definition of DP, and say two datasets D and D′ are neighbouring if they differ in at most one entry, that is, D = D′ ∪ {x} or D′ = D ∪ {x}.
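To tie Definition 1 back to the noisy discriminator updates of Section 1: the (ε, δ) guarantee comes from adding Gaussian noise calibrated to the gradient clipping norm, with the final ε obtained by accounting for the composition of many such noisy steps. As a self-contained toy illustration, which is ours and not the accounting actually used for DPSGD training, the classical single-release Gaussian mechanism bound (valid for ε < 1) calibrates the noise standard deviation as follows:

```python
import math

def gaussian_mechanism_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Classical calibration for a single Gaussian-mechanism release
    (stated for epsilon < 1): sigma >= sqrt(2 ln(1.25/delta)) * sensitivity / epsilon.
    Real DPSGD training composes many noisy steps and uses a tighter accountant,
    so this only illustrates the epsilon/delta/noise trade-off."""
    assert 0 < epsilon < 1, "this classical bound is stated for epsilon in (0, 1)"
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

# Example: releasing one clipped quantity (sensitivity = clipping norm C = 1)
# with a (0.5, 1e-5)-DP guarantee requires noise std of roughly:
print(gaussian_mechanism_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0))  # ≈ 9.7
```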

