DIFFERENTIALLY PRIVATE GENERATIVE MODELS THROUGH OPTIMAL TRANSPORT

Abstract

Although machine learning models trained on massive data have led to breakthroughs in several areas, their deployment in privacy-sensitive domains remains limited due to restricted access to data. Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to the private data instead. We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. DP-Sinkhorn minimizes the Sinkhorn divergence, a computationally efficient approximation to the exact optimal transport distance, between the model and the data in a differentially private manner, and also uses a novel technique for conditional generation in the Sinkhorn framework. Unlike existing approaches for training differentially private generative models, which are mostly based on generative adversarial networks, we do not rely on adversarial objectives, which are notoriously difficult to optimize, especially in the presence of the noise imposed by privacy constraints. Hence, DP-Sinkhorn is easy to train and deploy. Experimentally, despite our method's simplicity, we improve upon the state of the art on multiple image modeling benchmarks. We also show differentially private synthesis of informative RGB images, which has not been demonstrated before by differentially private generative models without the use of auxiliary public data.

1. INTRODUCTION

As the full value of data comes to fruition through a growing number of data-centric applications (e.g., recommender systems (Gomez-Uribe & Hunt, 2016), personalized medicine (Ho et al., 2020), face recognition (Wang & Deng, 2020), speech synthesis (Oord et al., 2016), etc.), the importance of privacy protection has become apparent to both the public and academia. At the same time, recent Machine Learning (ML) algorithms and applications are increasingly data hungry, and the use of personal data will eventually become a necessity. Differential Privacy (DP) is a rigorous definition of privacy that quantifies the amount of information leaked by a user participating in any data release (Dwork et al., 2006; Dwork & Roth, 2014). DP was originally designed for answering queries to statistical databases. In a typical setting, a data analyst (the party wanting to use the data, such as a healthcare or marketing company) sends a query to a data curator (the party in charge of safekeeping the database, such as a hospital), who runs the query on the database and replies with a semi-random answer that preserves privacy. Differentially Private Stochastic Gradient Descent (DPSGD)¹ (Abadi et al., 2016) is the most popular method for training general machine learning models with DP guarantees. DPSGD involves large numbers of queries, in the form of gradient computations, that must be answered quickly by the curator. This requires technology transfer of model design from analyst to curator, as well as strong computational capacity at the curator. Furthermore, if the analyst wants to train on multiple tasks, the curator must subdivide the privacy budget to spend on each task. As few institutions have simultaneous access to private data, computational resources, and expertise in machine learning, these requirements significantly limit the adoption of DPSGD for learning with privacy guarantees.
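To make the mechanism concrete, the following is a minimal NumPy sketch of a single DPSGD update (per-example gradient clipping followed by calibrated Gaussian noise), in the spirit of Abadi et al. (2016). The function name and hyperparameter values are illustrative, not taken from any specific implementation:

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One simplified DPSGD update: clip each per-example gradient to
    clip_norm, sum, add Gaussian noise scaled to the clipping norm,
    average, and take a gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g / max(1.0, norm / clip_norm))
    batch_size = len(clipped)
    # Noise standard deviation is calibrated to the sensitivity clip_norm.
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        0.0, noise_mult * clip_norm, size=params.shape)
    return params - lr * noisy_sum / batch_size
```

Each such step is one "query" to the private data; the privacy cost of the full training run is obtained by composing the per-step guarantees over all iterations.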
To address this challenge, generative models, i.e., models with the capacity to synthesize new data, can be applied as a general medium for data sharing (Xie et al., 2018; Augenstein et al., 2020). The curator first encodes the private data into a generative model; then, the model is used by the analysts to synthesize similar yet different data that can train other ML applications. So long as the generative model is learned "privately", users can protect their privacy by controlling how specific the generative model is to their own data. Differentially private learning of generative models has been studied mostly within the Generative Adversarial Networks (GAN) framework (Xie et al., 2018; Torkzadehmahani et al., 2019; Frigerio et al., 2019; Yoon et al., 2019; Chen et al., 2020). While GANs in the non-private setting have demonstrated the ability to synthesize complex data such as high-definition images (Brock et al., 2019; Karras et al., 2020), their application in the private setting is more challenging. This is in part because GANs suffer from training instability problems (Arjovsky & Bottou, 2017; Mescheder et al., 2018), which can be exacerbated by adding noise to the network's gradients during training, a common technique to implement DP. Because of that, GANs typically require careful hyperparameter tuning and supervision during training to avoid model collapse. This goes against the principle of privacy, where repeated interactions with the data need to be avoided (Chaudhuri & Vinterbo, 2013). Optimal Transport (OT) is another approach to training generative models. In the OT setting, the problem of learning a generative model is framed as minimizing the optimal transport distance, a type of Wasserstein distance, between the generator-induced distribution and the real data distribution (Bousquet et al., 2017; Peyré & Cuturi, 2019). Unfortunately, exactly computing the OT distance is generally expensive.
Nevertheless, Wasserstein distance-based objectives are widely used to train GANs (Arjovsky et al., 2017; Gulrajani et al., 2017b). However, these approaches typically estimate the Wasserstein distance using an adversarially trained discriminator; hence, training instabilities remain (Mescheder et al., 2018). An alternative to adversarial OT estimation is provided by the Sinkhorn divergence (Genevay et al., 2016; Feydy et al., 2019; Genevay et al., 2018). The Sinkhorn divergence is an entropy-regularized version of the exact OT distance, for which the optimal transport plan can be computed efficiently via the Sinkhorn algorithm (Cuturi, 2013). In this paper, we propose DP-Sinkhorn, a novel method to train differentially private generative models using the Sinkhorn divergence as the objective. Since the Sinkhorn approach does not intrinsically rely on adversarial components, it avoids potential training instabilities and removes the need for early stopping. This makes our method easy to train and deploy in practice. As a side contribution, we also develop a simple yet effective way to perform conditional generation in the Sinkhorn framework, by forcing the optimal transport plan to couple same-label data more closely. To the best of our knowledge, DP-Sinkhorn is the first fully OT-based approach to differentially private generative modeling. Experimentally, despite its simplicity, DP-Sinkhorn achieves state-of-the-art results on image-based classification benchmarks that use data generated under differential privacy for training. We can also generate informative RGB images, which, to the best of our knowledge, has not been demonstrated by any generative model trained with differential privacy and without auxiliary public data. We make the following contributions: (i) We propose DP-Sinkhorn, a flexible and robust optimal transport-based framework for training differentially private generative models.
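To illustrate the Sinkhorn algorithm referenced above, here is a minimal NumPy sketch computing the entropy-regularized OT cost between two empirical point clouds with uniform weights and squared-Euclidean cost. This is an illustration of the generic algorithm only, not our (differentially private, debiased) training objective:

```python
import numpy as np

def sinkhorn(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between uniform empirical distributions
    on point sets x (n x d) and y (m x d), via Sinkhorn iterations."""
    # Pairwise squared-Euclidean ground cost.
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)                    # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))       # uniform source weights
    b = np.full(len(y), 1.0 / len(y))       # uniform target weights
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                # alternating marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]         # (approximate) transport plan
    return np.sum(P * C)                    # transport cost under P
```

Note that this returns the raw entropic transport cost; the Sinkhorn divergence additionally subtracts self-comparison terms to debias the estimate (Genevay et al., 2018), as detailed in Section 2.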
(ii) We introduce a simple technique to perform label-conditional synthesis in the Sinkhorn framework. (iii) We achieve state-of-the-art performance on widely used image modeling benchmarks. (iv) We present informative RGB images generated under strict differential privacy without the use of public data.

2. BACKGROUND

2.1. NOTATIONS AND SETTING

Let X denote a sample space, P(X) the set of all possible measures on X, and Z ⊆ R^d the latent space. We are interested in training a generative model g : Z → X such that its induced distribution µ = g ∘ ξ, with noise source ξ ∈ P(Z), is similar to an observed distribution ν, accessed through an independently sampled finite set of observations D = {y_i}_{i=1}^N. In our case, g is a trainable parametric function with parameters θ.
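As a toy illustration of this setup, the snippet below pushes latent noise ξ = N(0, I) through a fixed map g to obtain samples from the model distribution µ. The affine-plus-tanh map stands in for a trainable network with parameters θ; all names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "theta": a fixed linear map from latent dim 4 to data dim 2.
W = rng.normal(size=(2, 4))

def g(z):
    """Toy generator g: Z -> X."""
    return np.tanh(z @ W.T)

# mu = g∘xi: sample z ~ xi = N(0, I), then apply g.
z = rng.normal(size=(1000, 4))   # samples from the noise source xi
x_model = g(z)                   # samples from the model distribution mu
```

Training then amounts to adjusting θ (here, W) so that samples x_model become distributionally close to the observations in D.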

2.2. GENERATIVE LEARNING WITH OPTIMAL TRANSPORT

Optimal Transport-based generative learning considers minimizing variants of the Wasserstein distance between real and generated distributions (Bousquet et al., 2017; Peyré & Cuturi, 2019) .
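For completeness, the standard definitions can be written out explicitly; the formulation below follows Genevay et al. (2018) and Feydy et al. (2019), with the ground cost c, regularization weight ε, and coupling set Π(µ, ν) introduced here rather than in the text above:

```latex
% Exact OT distance between model distribution \mu and data distribution \nu,
% for a ground cost c(x, y) on \mathcal{X} \times \mathcal{X}:
W_c(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)}
  \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, \mathrm{d}\pi(x, y),

% where \Pi(\mu, \nu) is the set of couplings with marginals \mu and \nu.
% Entropy-regularized OT, solvable efficiently by the Sinkhorn algorithm:
OT_{\varepsilon}(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)}
  \int c(x, y) \, \mathrm{d}\pi(x, y)
  + \varepsilon \, \mathrm{KL}\!\left(\pi \,\|\, \mu \otimes \nu\right).

% The (debiased) Sinkhorn divergence removes the entropic bias:
S_{\varepsilon}(\mu, \nu) = OT_{\varepsilon}(\mu, \nu)
  - \tfrac{1}{2} OT_{\varepsilon}(\mu, \mu)
  - \tfrac{1}{2} OT_{\varepsilon}(\nu, \nu).
```

As ε → 0, OT_ε recovers the exact OT distance, while larger ε trades bias for faster, more stable computation.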



¹Including any variants that use gradient perturbation for ensuring privacy.

