A GENERAL DIFFERENTIALLY PRIVATE LEARNING FRAMEWORK FOR DECENTRALIZED DATA

Anonymous authors
Paper under double-blind review

Abstract

Decentralized consensus learning has been hugely successful in minimizing a finite sum of expected objectives over a network of agents. However, local communication between neighbouring agents in the network may leak private information. To address this challenge, we propose a general differentially private (DP) learning framework that is applicable to both direct and indirect communication networks without a central coordinator. We show that the proposed algorithm retains performance guarantees in terms of generalization and finite-sample performance. We investigate the impact of local privacy-preserving computation on the global DP guarantee. Further, we extend the discussion by adopting a new class of noise-adding DP mechanisms based on generalized Gaussian distributions to improve the utility-privacy trade-off. Our numerical results demonstrate the effectiveness of our algorithm and its superior performance over state-of-the-art baseline methods in various decentralized settings.

1. INTRODUCTION

Decentralized learning is the process of learning a consensus model from datasets distributed across different agents, such as machines, hospitals, and mobile devices (Shi et al., 2014; Han et al., 2017; Gong et al., 2016; Beyan et al., 2020). During the process, each local agent (1) keeps its own private data locally; (2) requires no exchange of raw data; and (3) communicates only with its connected agents to train its local model and update the global parameters directly, without a central coordinator. In particular, as medical data are inherently decentralized, i.e., owned by or distributed across different institutions, direct sharing or central aggregation of such data is increasingly restricted by ownership and other regulatory constraints. Consequently, advances in decentralized learning offer innovative solutions to transform the healthcare sector (Warnat-Herresthal et al., 2021). Although decentralized learning requires only parallel computation at each local agent and the sharing of estimates, or perhaps other intermediate parameters (auxiliary variables), with connected neighbouring agents, past experience has demonstrated the possibility of privacy leakage in this process: an attacker can still recover sensitive information from local communications (e.g., Fredrikson et al. 2015; Shokri et al. 2017). One defence is to adopt a private variant of the learning algorithm that secures the training process with Differential Privacy (DP). Very few DP algorithms focus on decentralized learning systems, with the exception of recent works by Xu et al. (2022); Yu et al. (2021a); Huang & Gong (2020). However, when introducing perturbation into the iterative learning process, these earlier methods focus only on achieving an (ϵ, δ)-DP guarantee for each agent individually. Owing to the communications with neighbouring agents during the iterative process, the overall privacy guarantee of the algorithm is no longer (ϵ, δ)-DP.
Importantly, it is unclear how one can split the privacy budget among all the agents to achieve a global (ϵ, δ)-DP guarantee for the algorithm when using these earlier methods. Finally, these existing methods consider only the standard Gaussian noise-adding mechanism. The added unbounded noise can lead to unstable results, which may severely affect learning efficiency and degrade the performance of the trained global model (Farokhi, 2022). This paper aims to provide a unified solution to these issues and to establish the theoretical guarantees of the proposed algorithm.
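To make the contrast between the two mechanisms concrete, the following is a minimal NumPy sketch of noise-adding perturbation of a local parameter vector before it is shared with neighbours. The function names and the choice of parameters are ours, for illustration only; calibrating the scale parameters (σ, or α and β) to a target (ϵ, δ) budget is mechanism-specific and is not reproduced here. The generalized Gaussian sampler uses the standard gamma-based construction for the exponential-power density proportional to exp(-(|x|/α)^β), which recovers the Gaussian at β = 2 and has lighter tails for β > 2.

```python
import numpy as np


def gaussian_noise(rng, size, scale):
    # Standard Gaussian mechanism: noise ~ N(0, scale^2).
    return rng.normal(0.0, scale, size)


def generalized_gaussian_noise(rng, size, alpha, beta):
    # Generalized Gaussian (exponential power) noise with density
    # proportional to exp(-(|x|/alpha)^beta).  If G ~ Gamma(1/beta, 1),
    # then sign * alpha * G^(1/beta) has exactly this distribution.
    # beta = 2 recovers the Gaussian; larger beta gives lighter tails,
    # reducing the chance of extreme perturbations.
    g = rng.gamma(shape=1.0 / beta, scale=1.0, size=size)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * alpha * g ** (1.0 / beta)


def perturb(params, noise_fn, rng, **kw):
    # Add i.i.d. noise to a local parameter vector before sharing it
    # with connected neighbouring agents.
    return params + noise_fn(rng, size=params.shape, **kw)
```

For example, `perturb(theta, generalized_gaussian_noise, rng, alpha=0.1, beta=4.0)` perturbs a local iterate with lighter-tailed noise than the Gaussian mechanism at a comparable scale, which is the kind of trade-off the generalized Gaussian family is meant to expose.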

