SINGLE SMPC INVOCATION DPHELMET: DIFFERENTIALLY PRIVATE DISTRIBUTED LEARNING ON A LARGE SCALE

Abstract

We introduce a distributed differentially private machine learning training protocol that locally trains support vector machines (SVMs) and averages them using a single invocation of a secure summation protocol. With state-of-the-art secure summation protocols and a strong foundation model such as SimCLR, this approach scales to a large number of users and applies to non-trivial tasks such as CIFAR-10. Our experimental results show that for 1,000 users with 50 data points each, our scheme outperforms state-of-the-art scalable distributed learning methods (differentially private federated learning, DP-FL for short) while requiring around 500 times lower communication costs: for CIFAR-10, we achieve a classification accuracy of 79.7 % at ε = 0.59, while DP-FL achieves 57.6 %. More generally, we prove learnability properties for the average of such locally trained models: convergence and uniform stability. By requiring only strongly convex, smooth, and Lipschitz-continuous objective functions, locally trained via stochastic gradient descent (SGD), we achieve a strong utility-privacy trade-off.

1. INTRODUCTION

Scalable distributed privacy-preserving machine learning methods have a plethora of applications, ranging from medical institutions that want to learn from distributed patient data, via edge-AI health applications, to decentralized recommendation systems. Preserving each person's privacy during distributed learning raises two challenges: (1) the inputs of all parties have to be protected during the distributed learning process, and (2) the resulting model itself should not leak information about any person's contribution to the training data. To tackle (1), secure multi-party computation (SMPC) protocols can protect data during distributed computation. To tackle (2), differentially private (DP) mechanisms provide guarantees for using or releasing the model in a privacy-preserving manner. The literature contains a rich body of work on this kind of privacy-preserving distributed machine learning (PPDML), which is frequently evaluated with respect to three criteria: scalability in the number of users participating in the distributed learning; expressivity of the learning method, with the goal of encompassing complex learning tasks; and a good utility-privacy trade-off, i.e., no significant loss in accuracy for protecting each person's data, ideally matching the utility-privacy trade-off of centralized training while adding only little communication overhead.

Jayaraman et al. (2018) introduced a theoretical result in which the model optimum is noised (output perturbation). Here, each of the n users locally trains a convex empirical risk minimization (ERM) model on m data points and contributes the parameters of this model, carefully noised, to a single SMPC invocation, resulting in an averaged differentially private model. This approach achieves DP (Chaudhuri et al., 2011), requires as little noise as the centralized setting (O(1/nm)), and incurs little communication overhead with its single SMPC invocation. However, it relies on loose utility bounds (Pathak et al., 2010) that scale with the number of local data points (O(1/m)) rather than with the combined number of data points across all users (O(1/nm)). Jayaraman et al. (2018) prove strong utility bounds with another scheme, gradient perturbation: each user contributes the carefully noised gradients of each local training iteration to an SMPC invocation, which results in an averaged differentially private gradient step. This construction adds as little noise as centralized training (O(1/nm)) and achieves strong utility bounds which scale with
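To make the output-perturbation scheme concrete, the following is a minimal, insecure NumPy simulation of the single-invocation pipeline: each user trains a linear SVM (a strongly convex ERM objective) locally via SGD, adds Gaussian noise to its parameters, and the secure summation protocol is stood in for by plain averaging. The function names (local_train, dp_average) and the fixed noise_scale are our illustrative choices, not from the paper; a real deployment would calibrate the noise to the ERM sensitivity and the target (ε, δ) and run the summation inside SMPC.

```python
import numpy as np

def local_train(X, y, lam=0.1, lr=0.01, epochs=50, rng=None):
    # SGD on an L2-regularized hinge loss (linear SVM): a strongly convex
    # objective with Lipschitz-bounded gradients on bounded data, matching
    # the class of objectives the utility analysis assumes.
    if rng is None:
        rng = np.random.default_rng()
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            margin = y[i] * (X[i] @ w)
            grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            w -= lr * grad
    return w

def dp_average(local_models, noise_scale, rng):
    # Each user noises its own parameters before contributing them; the
    # aggregator only ever sees the (here: simulated) noisy sum / average.
    # In the real protocol this averaging is one secure-summation invocation.
    noisy = [w + rng.normal(0.0, noise_scale, size=w.shape) for w in local_models]
    return np.mean(noisy, axis=0)
```

Because the noise terms of the n users average out, the effective noise on the released model shrinks with n, which is the intuition behind the O(1/nm) noise claim above.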

