SINGLE SMPC INVOCATION DPHELMET: DIFFERENTIALLY PRIVATE DISTRIBUTED LEARNING ON A LARGE SCALE

Abstract

We introduce a distributed differentially private machine learning training protocol that locally trains support vector machines (SVMs) and averages them using a single invocation of a secure summation protocol. With state-of-the-art secure summation protocols and a strong foundation model such as SimCLR, this approach scales to a large number of users and is applicable to non-trivial tasks, such as CIFAR-10. Our experimental results illustrate that for 1,000 users with 50 data points each, our scheme outperforms state-of-the-art scalable distributed learning methods (differentially private federated learning, short DP-FL) while requiring around 500 times less communication: for CIFAR-10, we achieve a classification accuracy of 79.7 % for ε = 0.59, while DP-FL achieves 57.6 %. More generally, we prove learnability properties for the average of such locally trained models: convergence and uniform stability. By only requiring strongly convex, smooth, and Lipschitz-continuous objective functions, locally trained via stochastic gradient descent (SGD), we achieve a strong utility-privacy trade-off.

1. INTRODUCTION

Scalable distributed privacy-preserving machine learning methods have a plethora of applications, ranging from medical institutions that want to learn from distributed patient data, over edge-AI health applications, to decentralized recommendation systems. Preserving each person's privacy during distributed learning raises two challenges: (1) during the distributed learning process, the inputs of all parties have to be protected, and (2) the resulting model itself should not leak information about the contribution of any person to the training data. To tackle (1), secure multi-party computation (SMPC) protocols can protect data during distributed computation. To tackle (2), differentially private (DP) mechanisms provide guarantees for using or releasing the model in a privacy-preserving manner. The literature contains a rich body of work on this kind of privacy-preserving distributed machine learning (PPDML), which is frequently evaluated with respect to three metrics: scalability with the number of users who participate in the distributed learning; expressivity of the learning method, with the goal of encompassing complex learning tasks; and a good utility-privacy trade-off, i.e., no significant loss in accuracy for protecting each person's data, optimally matching the utility-privacy trade-off of centralized training while adding only little communication overhead. Concerning expressivity, Abadi et al. (2016); Tramèr & Boneh (2021); De et al. (2022) have shown that pre-trained models can improve the performance of a differentially private machine learning method (DP-SGD) for non-trivial tasks (e.g., CIFAR-10). While such models require sufficient public data, they exist and provide simplifying representations for various domains: SimCLR for pictures, Facenet for portrait pictures, UNet for medical segmentation imagery, or GPT-3 for natural language. Yet, prior work does not excel at all three metrics simultaneously: scalability, expressivity, and utility-privacy trade-off.
This places current distributed training processes at an inherent disadvantage compared to a centralized training process.

Contributions. Our Secure Distributed DP-Helmet work extends prior work (Jayaraman et al., 2018) such that it is scalable, expressive, and has a good utility-privacy trade-off. Table 1 compares our approach with the approaches of Jayaraman et al. (2018) and DP-FL. In summary, we make two tangible contributions:

1. For SGD-based strongly convex ERM, we prove a tighter utility bound, which essentially states that we only need the average of locally trained models, e.g., support vector machines (SVMs) or logistic regression (LR), to converge to the optimal centrally trained model with rate O(1/M) for M iterations (cf. Thm. 21). We also show train-test generalization by proving uniform stability, which states that averaging our models linearly improves the stability bound (cf. Thm. 19).

2. In Cor. 10, we show how, with enough data, guarantees as in local DP can be achieved, even without assumptions on the training algorithm beyond a norm-bounded parameter space: we protect the entire input of a user while achieving strong utility bounds (> 80 % test accuracy for CIFAR-10).
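The noised-model averaging at the heart of these contributions can be sketched in a few lines. The following toy simulation is illustrative only: the trainer, data, and noise scale sigma are assumptions for the sketch, and the single secure-summation invocation is replaced by a plain sum over the users' noised parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_local_model(X, y, epochs=50, lr=0.1, reg=0.01):
    """Hypothetical local trainer: SGD on an L2-regularized hinge loss
    (a linear SVM), i.e., a strongly convex, Lipschitz objective."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (xi @ w)
            grad = reg * w - (yi * xi if margin < 1 else 0.0)
            w -= lr * grad
    # project onto the unit norm ball so the summation has bounded sensitivity
    return w / max(1.0, np.linalg.norm(w))

n_users, m, d = 20, 50, 5
sigma = 0.1  # illustrative noise scale; the paper calibrates it to (eps, delta)

# toy per-user data drawn from a shared linear classification task
w_true = rng.normal(size=d)
noisy_models = []
for _ in range(n_users):
    X = rng.normal(size=(m, d))
    y = np.sign(X @ w_true)
    w = train_local_model(X, y)
    # each user adds spherical Gaussian noise before contributing the model
    noisy_models.append(w + rng.normal(scale=sigma, size=d))

# one secure-summation invocation, simulated here by a plain sum;
# dividing by the number of users yields the averaged DP model
w_avg = np.sum(noisy_models, axis=0) / n_users
```

Because the per-model noise averages out across the n users, the averaged model stays well aligned with the underlying task even though every individual contribution is perturbed.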

Systems overview.

Secure Distributed DP-Helmet achieves scalable, distributed privacy-preserving training on sensitive data with strong classification performance. A schematic overview is depicted in Figure 1.



Jayaraman et al. (2018) introduced a theoretical result where the model optimum is noised (output perturbation). Here, each of the n users locally trains a convex empirical risk minimization (ERM) model on m data points and contributes the parameters of this model, carefully noised, to a single invoked SMPC step, resulting in an averaged differentially private model. This approach achieves DP (Chaudhuri et al., 2011), requires as little noise as the centralized setting (O(1/nm)), and incurs little communication overhead, with one SMPC invocation. However, it relies on loose utility bounds (Pathak et al., 2010) that scale with the number of local data points (O(1/m)) rather than with the combined number of data points across all users (O(1/nm)).

Jayaraman et al. (2018) prove strong utility bounds with another scheme, gradient perturbation: each user contributes the gradients of each local training iteration, carefully noised, to a single invoked SMPC step which results in an averaged differentially private gradient step. This construction adds as little noise as centralized training (O(1/nm)) and achieves strong utility bounds which scale with the number of data points across all users (O(1/nm)). However, it has considerable communication overhead since it requires one SMPC invocation per training iteration.

Federated learning (McMahan et al., 2017) with a DP-SGD approximation (Abadi et al., 2016), short DP-FL, constitutes another line of research with moderate utility bounds and moderate communication overhead. In DP-FL, an untrusted aggregator combines the gradient updates from each user, while each user satisfies DP. DP-FL does not require SMPC for similar security guarantees but needs O(#training_steps) communication rounds. Its utility bounds are comparatively loose since the noise scales with O(m √n). Appx. C discusses related work in more detail.
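To make the noise-scaling comparison concrete, here is a back-of-the-envelope sketch using the classical Gaussian-mechanism calibration. The per-model sensitivity expression follows the output-perturbation analysis of Chaudhuri et al. (2011); the concrete constants (L, lam, m, n, delta) are illustrative assumptions, not the paper's exact accounting.

```python
import math

def gaussian_sigma(sensitivity, eps, delta):
    # classical Gaussian mechanism (valid for eps < 1):
    # sigma = sensitivity * sqrt(2 ln(1.25/delta)) / eps
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps

# Output-perturbation sensitivity of a lam-strongly-convex, L-Lipschitz
# ERM minimizer trained on m points is at most 2L/(lam * m)
# (Chaudhuri et al., 2011); the constants here are illustrative.
L, lam, m, n = 1.0, 0.01, 50, 1000
local_sens = 2 * L / (lam * m)

# If one data point of one user changes, only that user's local model moves,
# so the sensitivity of the n-model average shrinks by a factor of n:
# the O(1/(nm)) noise scaling discussed above.
avg_sens = local_sens / n

eps, delta = 0.59, 1e-5
sigma_local = gaussian_sigma(local_sens, eps, delta)  # noise for one local model
sigma_avg = gaussian_sigma(avg_sens, eps, delta)      # noise needed for the average
```

The single summation thus enjoys noise on the order of 1/(nm), whereas per-round perturbation schemes pay a noise cost at every one of their many communication rounds.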

Figure 1: Schematic overview of Secure Distributed DP-Helmet. Each user locally extracts a simplified data representation via a pre-trained feature extractor (SimCLR), then trains a model, e.g. an SVM, via a learning algorithm T, and finally contributes a model which is carefully noised with a spherical Σ-parameterized Gaussian to a single invoked secure summation step which results in an averaged and (ε, δ)-DP model. ξ denotes some hyperparameters and K a set of classes.
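The per-user pipeline in Figure 1 (feature extraction, then one classifier per class in K, then noising) can be sketched end to end. Everything below is a stand-in: the random projection replaces a real SimCLR encoder, a least-squares fit replaces the paper's SVM trainer T, and the noise scale is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_features(images, proj):
    """Stand-in for a frozen pre-trained encoder such as SimCLR:
    here just a fixed random projection followed by a nonlinearity."""
    return np.tanh(images @ proj)

def train_one_vs_rest(feats, labels, K, sigma, rng):
    """One noised linear classifier per class k in K, as in Figure 1
    (illustrative least-squares fit instead of the paper's trainer T)."""
    d = feats.shape[1]
    W = np.zeros((K, d))
    for k in range(K):
        target = np.where(labels == k, 1.0, -1.0)
        w, *_ = np.linalg.lstsq(feats, target, rcond=None)
        w /= max(1.0, np.linalg.norm(w))            # bound the parameter norm
        W[k] = w + rng.normal(scale=sigma, size=d)  # spherical Gaussian noise
    return W

# toy data: 3 classes, "raw" inputs of dimension 20, features of dimension 8
K, n_pts, raw_d, feat_d = 3, 300, 20, 8
proj = rng.normal(size=(raw_d, feat_d))
centers = rng.normal(scale=3.0, size=(K, raw_d))
labels = rng.integers(0, K, size=n_pts)
images = centers[labels] + rng.normal(size=(n_pts, raw_d))

feats = extract_features(images, proj)
W = train_one_vs_rest(feats, labels, K, sigma=0.05, rng=rng)
pred = np.argmax(feats @ W.T, axis=1)  # class with the largest score wins
```

In the actual protocol, each user's matrix W would be contributed to the single secure summation rather than used directly, and only the averaged model is released.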

