A GENERAL DIFFERENTIALLY PRIVATE LEARNING FRAMEWORK FOR DECENTRALIZED DATA

Anonymous authors
Paper under double-blind review

Abstract

Decentralized consensus learning, which minimizes a finite sum of expected objectives over a network of agents, has been highly successful. However, the local communication across neighbouring agents in the network may leak private information. To address this challenge, we propose a general differentially private (DP) learning framework that is applicable to direct and indirect communication networks without a central coordinator. We show that the proposed algorithm retains performance guarantees in terms of generalization and finite-sample performance. We investigate the impact of local privacy-preserving computation on the global DP guarantee. Further, we extend the discussion by adopting a new class of noise-adding DP mechanisms based on generalized Gaussian distributions to improve the utility-privacy trade-off. Our numerical results demonstrate the effectiveness of our algorithm and its better performance over state-of-the-art baseline methods in various decentralized settings.

1. INTRODUCTION

Decentralized learning is the process of learning a consensus model from datasets that are distributed across different agents, such as machines, hospitals, and mobile devices (Shi et al., 2014; Han et al., 2017; Gong et al., 2016; Beyan et al., 2020). During the process, each local agent (1) keeps its own private data locally; (2) requires no exchange of raw data; and (3) communicates only with its connected agents to train its local model and updates the global parameters directly without a central coordinator. In particular, medical data are inherently decentralized, i.e., owned by or distributed across different institutions, and direct sharing or central aggregation of such data is increasingly restricted by ownership or other regulatory constraints. As a consequence, advances in decentralized learning can offer innovative solutions to transform healthcare sectors (Warnat-Herresthal et al., 2021). Although decentralized learning only requires parallel computation at each local agent and sharing of the estimates, or perhaps other intermediate parameters (auxiliary variables), with connected neighbouring agents, past experience has demonstrated that privacy can still leak in the process: an attacker can recover sensitive information from local communications (e.g., Fredrikson et al., 2015; Shokri et al., 2017). One defence is to adopt a private variant of the learning algorithm that uses Differential Privacy (DP) to secure the training process. Very few DP algorithms focus on decentralized learning systems, with the exception of the recent works of Xu et al. (2022); Yu et al. (2021a); Huang & Gong (2020). However, when introducing perturbation into the iterative learning process, these earlier methods only focus on achieving an (ϵ, δ)-DP guarantee for each agent. Due to the communications with neighbouring agents during the iterative process, the overall privacy guarantee of the algorithm is no longer (ϵ, δ)-DP. Importantly, it is unclear how one can split the privacy budget among all the agents in order to achieve a global (ϵ, δ)-DP guarantee for the algorithm when using these earlier methods. Finally, these existing methods consider a standard Gaussian noise-adding mechanism. The added unbounded noise can lead to unstable results, which can severely affect the learning efficiency and degrade the performance of the trained global model (Farokhi, 2022). This paper aims to provide a unified solution to these issues and to discuss the theoretical guarantees of the proposed algorithm.

1.1. RELATED WORKS

In the setting of centralized learning for distributed data, a handful of papers have studied how to integrate privacy-preserving techniques, such as DP, into the training process (Jayaraman et al., 2018; Li et al., 2022a; Guo et al., 2020; Cai et al., 2018; Li et al., 2022b; Huang et al., 2020; Cao et al., 2021). For example, Jayaraman et al. (2018) proposed DP algorithms for convex problems that ensure the information obtained from each local model satisfies DP guarantees before being aggregated by a central coordinator; Li et al. (2022b) proposed a unified centralized learning framework that ensures DP guarantees for each local agent for a general class of non-convex problems. However, these algorithms cannot be directly adapted to our problem, because they focus on the setting where a central coordinator is responsible for aggregating the information obtained from each local agent. There have been very few recent developments in decentralized learning algorithms with DP guarantees. We are aware of only three recent works in the literature. Among them, Huang & Gong (2020) is the first; they proposed a DP Alternating Direction Method of Multipliers (ADMM) algorithm for a wide range of convex learning problems, in which the objective function is perturbed before solving the minimization associated with the local dataset at each local agent. More recently, Yu et al. (2021a) proposed a DP decentralized stochastic gradient descent (SGD)-based algorithm that perturbs the intermediate parameter updates at each local agent before communicating them to its connected neighbouring agents; Xu et al. (2022) proposed a blockchain-enabled decentralized DP learning algorithm based on gradient perturbation. However, these gradient-based methods impose restrictions on the objectives, such as smoothness, and therefore have limited application in broader settings. In contrast, we do not restrict ourselves to using gradient descent to find the minima of the target objective functions. Instead, by using operator theory, we solve the optimization problem by defining a suitable operator or mapping whose fixed points are the solutions to the original problem; in other words, we consider the broader generic problem of finding a fixed point of the averaged iteration of a nonexpansive mapping. Under such an operator-theoretic framework, the SGD and ADMM algorithms previously considered in Huang & Gong (2020); Yu et al. (2021a); Xu et al. (2022) can be regarded as special cases of our proposed generic algorithm.

1.2. OUR CONTRIBUTIONS

In this paper, we propose a general framework for decentralized learning with a DP guarantee, which we refer to as the Differentially Private decentralized Krasnosel'skiǐ-Mann iteration (DP-dKM). Our contributions are summarized as follows. 1. Built on the Krasnosel'skiǐ-Mann (KM) iteration (Krasnosel'skii, 1955; Mann, 1953)

2. PROBLEM STATEMENT AND PRELIMINARIES

In this section, we first state the problem. We then present preliminaries on decentralized learning schemes and on differential privacy. Throughout this paper, we denote the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms of $x \in \mathbb{R}^p$ by $\|x\|_1$, $\|x\|$, and $\|x\|_\infty$. For a function $l(x, \xi) : \mathbb{R}^p \times \Xi \to \mathbb{R}$, $\|l\|_\infty := \sup_{x,\xi} |l(x, \xi)|$. For a matrix $A$, we write $A^\top$, $\|A\|_{op}$, and $\|A\|_F$ for its transpose, spectral norm, and Frobenius norm, respectively. Given another matrix $B$, $A \succ B$ means that $A - B$ is positive definite, and $A \succeq B$ means that $A - B$ is positive semidefinite.

Problem formulation: Consider a network with $M$ agents, each of which holds a dataset $\Xi_m = \{\xi_{i(m)}\}_{i=1}^{N_m}$, for $m = 1, \cdots, M$, where $N_m$ is the number of training samples in the dataset $\Xi_m$ and $\xi_{i(m)} \in \mathbb{R}^p$ is the $i$-th sample stored at the $m$-th agent. For ease of presentation, we assume that data are evenly collected and each agent has an equal sample size of $N$. Our primary focus is on solving a stochastic decentralized optimization problem approximated by its corresponding empirical loss,

$$\hat{L}(\hat{x}) := \min_{x \in \mathbb{R}^p} \frac{1}{MN} \sum_{m=1}^{M} \sum_{i=1}^{N} l_m\left(x, \xi_{i(m)}\right), \qquad \hat{x} := \arg\min_{x \in \mathbb{R}^p} \frac{1}{MN} \sum_{m=1}^{M} \sum_{i=1}^{N} l_m\left(x, \xi_{i(m)}\right),$$

where $\hat{x}$ is the target parameter, and the $l_m(\cdot)$ are the objectives that measure the performance of the local models. Throughout, we assume the objectives are convex, closed, and proper (c.c.p.) but not necessarily differentiable. The goal is to learn a globally optimal solution, referred to as the consensus parameter (Cao et al., 2021; Shi et al., 2014),

$$\bar{x} = \frac{1}{M} \sum_{m=1}^{M} x(m),$$

on $M$ agents across a network, where $x(m)$ is the solution of the local parameter on the $m$-th agent. The agents exchange their estimates at each iteration peer-to-peer, without a central coordinator, and their connections are typically modeled as a graph, e.g., Figure 4 in the Appendix. We stress that each agent operates independently and the average is only taken in the last iteration.

Communication graph:

We now formally define the graph that characterizes the communication among the agents. We define the connected network by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with vertex set $\mathcal{V} = \{1, \ldots, M\}$ and edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. We denote by $\mathcal{N}(m)$ the neighbour set of agent $m$. An edge $(m, l) \in \mathcal{E}$ represents the interconnection between agent $m$ and its neighbour $l \in \mathcal{N}(m)$. The decentralized optimization is associated with a given network topology that can be formulated mathematically by a mixing matrix (Alghunaim et al., 2019; Ying et al., 2021), whose properties are summarized as follows.

Definition 1 (Mixing Matrix) For any given graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the mixing matrix $W = [w_{ml}] \in \mathbb{R}^{M \times M}$ is defined on the edge set $\mathcal{E}$ and satisfies: (1) if $m \neq l$ and $(m, l) \notin \mathcal{E}$, then $w_{ml} = 0$; otherwise, $w_{ml} > 0$; (2) $W = W^\top$; (3) $\mathrm{null}\{I - W\} = \mathrm{span}\{\mathbf{1}\}$; (4) $I \succeq W \succ -I$.

We remark that $W$ is a doubly stochastic gossip matrix that characterizes the communication among the agents, and that the matrix is not unique for a given graph (Ying et al., 2021; Sun et al., 2021). Let $\lambda := \max\{|\lambda_2|, |\lambda_M|\}$, where $\lambda_i$ denotes the $i$-th largest eigenvalue of $W \in \mathbb{R}^{M \times M}$. The spectral gap $1 - \lambda$ measures the connectivity of gossip communications among these agents (Zhu et al., 2022). Definition 1 implies that $0 \leq \lambda < 1$. A larger value of $\lambda$ indicates less communication among local agents.

The KM iteration is simple to implement and converges quickly in practice. It has a long history and underlies many methods in modern computing, including operator-splitting and alternating-direction methods (Wotao, 2019; Wang et al., 2022). It is the basic, and one of the most popular, iterative schemes for finding a fixed point of a nonexpansive operator. Specifically, the KM algorithm offers several advantages over traditional optimization methods, e.g., Newton-type algorithms and interior-point methods (Davis & Yin, 2016; Ryu & Yin, 2022; Liang, 2016): it can easily handle nonsmooth terms and abstract linear operators, requires only simple arithmetic operations, and scales well with the dimension of the problem. The KM iteration additionally applies a decomposition procedure in which the original problem is broken into subproblems that can easily be solved (Ryu & Boyd, 2016). The KM iteration is widely applied to centralized learning (Chraibi & Takác, 2019; Saber Malekmohammadi, 2021; Malinovsky et al., 2020). However, how to perform the KM iteration in a decentralized learning setting is still an open question. We next fill this gap by proposing the decentralized KM iteration and presenting detailed schemes for the local agents.

Definition 2 (Stochastic Decentralized Krasnosel'skiǐ-Mann (dKM)) Suppose the training sample set $\Xi := \bigcup_{m=1}^{M} \Xi_m$ is stored in a distributed manner across $M$ agents with total sample size $NM$, where $\Xi_m$ is the training dataset located at the $m$-th agent for $m = 1, \cdots, M$. We assume that $\xi_{i(m)} \sim P$ with $\xi_{i(m)} \in \Xi_m$ for any $m, i$. For each agent, given a nonexpansive operator $T$, the iterative formula of the stochastic dKM algorithm, $\mathcal{A}$, is defined as

$$x^{k+1}(m) = \mathcal{A}\left(x^k(m); \Xi\right) = \sum_{l \in \mathcal{N}(m)} w_{ml}\, x^k(l) + \alpha_k \left( T\left(x^k(m); \xi_{i_k(m)}\right) - x^k(m) \right), \qquad (1)$$

where $w_{ml}$ is an element of a given matrix $W$ satisfying Definition 1, $\alpha_k \in (0, 1]$, and $i_k$ is an i.i.d. variable drawn from the uniform distribution over $\{1, \cdots, N\}$ at the $k$-th iteration. Further, let $X = [x(1), \cdots, x(M)]^\top \in \mathbb{R}^{M \times p}$ store all local parameters across the network, and let $\mathbf{T}(X; \Xi) = [T(x(1); \Xi_1), \cdots, T(x(M); \Xi_M)]^\top \in \mathbb{R}^{M \times p}$ stack all local updates with respect to the first argument. Iteration (1) has the matrix form

$$X^{k+1} = W X^k + \alpha_k \left( \mathbf{T}\left(X^k; \Xi\right) - X^k \right).$$
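To make Definitions 1 and 2 concrete, the following is a minimal sketch of the matrix form of iteration (1) for a least-squares loss, taking T to be a forward (gradient) operator. It is an illustration under stated assumptions rather than the authors' implementation: the helper names (`ring_mixing_matrix`, `forward_operator`, `dkm`), the ring weights of 1/3, the step size, the noiseless synthetic data, and the choice to draw one sample index per agent are all hypothetical.

```python
import numpy as np

def ring_mixing_matrix(M):
    """Mixing matrix for a ring of M >= 3 agents (hypothetical helper).

    Every agent puts weight 1/3 on itself and on each of its two neighbours,
    so W is symmetric, its rows sum to one, and its eigenvalues lie in [-1/3, 1].
    """
    W = np.zeros((M, M))
    for m in range(M):
        for l in (m - 1, m, m + 1):
            W[m, l % M] = 1.0 / 3.0
    return W

def forward_operator(x, a, b, step):
    """Forward (gradient) operator for the sample loss 0.5 * (a @ x - b) ** 2."""
    return x - step * (a @ x - b) * a

def dkm(A, b, W, alphas, step, rng):
    """Run stochastic dKM from X^0 = 0; A has shape (M, N, p), b has shape (M, N)."""
    M, N, p = A.shape
    X = np.zeros((M, p))
    for alpha in alphas:
        idx = rng.integers(N, size=M)  # assumption: one uniform index per agent
        TX = np.stack([
            forward_operator(X[m], A[m, idx[m]], b[m, idx[m]], step)
            for m in range(M)
        ])
        X = W @ X + alpha * (TX - X)  # matrix form of iteration (1)
    return X

rng = np.random.default_rng(0)
M, N, p = 5, 20, 3
x_true = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(M, N, p))
b = A @ x_true  # noiseless labels, so every local loss is minimized at x_true
W = ring_mixing_matrix(M)
X_K = dkm(A, b, W, alphas=[0.5] * 4000, step=0.05, rng=rng)
x_bar = X_K.mean(axis=0)  # consensus average, taken only at the end
```

Because the synthetic labels are noiseless, every local loss shares the minimizer `x_true`, so the stacked matrix with all rows equal to `x_true` is a fixed point of the iteration; this keeps the example easy to check and is not meant to reflect the general setting.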
Because we consider a general decentralized learning framework that imposes only mild conditions (c.c.p.) on the loss, it is natural to seek general computational procedures that are flexible enough to handle numerically inconvenient losses. Specifically, the form of $T$ in Definition 2 depends on the specific algorithm we adopt. For example, dKM reduces to gradient descent, proximal gradient descent, and ADMM in a decentralized setting when $T$ is chosen as a forward operator, a forward-backward operator, and a Douglas-Rachford operator, respectively. Please refer to Table 1 in the appendix for some forms of $T$. Additionally, the stochastic dKM algorithm provides a guideline for designing new decentralized learning algorithms by specifying the form of $T$.

Privacy Concern: Although each agent communicates with its neighbours by sending parameters instead of directly exchanging raw data, the risk of leaking information still exists: an attacker can recover sensitive information about the data from the shared parameters, as discussed in Shokri et al. (2017) and Fredrikson et al. (2015). This motivates us to consider a privacy-preserving iteration procedure with efficient communication that retains a performance guarantee. Differential Privacy (DP), introduced by Dwork et al. (2006), is a widely adopted definition due to its important advantages over other privacy techniques. It quantifies to what extent individual privacy in a dataset is preserved while releasing aggregated information.

Definition 3 ((ε, δ)-Differential Privacy, Dwork et al. (2006)) A stochastic algorithm $\mathcal{A}$ is called (ε, δ)-differentially private if, for any subset $R_0 \subset \mathbb{R}^p$ and any pair of neighbouring sample sets $\Xi$ and $\Xi'$ that differ by only one sample, we have

$$\log \frac{P_{\mathcal{A}(\Xi)}\left(\mathcal{A}(\Xi) \in R_0\right) - \delta}{P_{\mathcal{A}(\Xi')}\left(\mathcal{A}(\Xi') \in R_0\right)} \leq \varepsilon.$$

The common interpretation of (ε, δ)-differential privacy is that it is ε-differential privacy except with probability δ (Mironov, 2017). The parameters ε and δ are privacy budgets indicating the strength of the privacy protection offered by the algorithm. The classic notion, ε-differential privacy, corresponds to δ = 0 and imposes an upper bound $e^{\varepsilon}$ on the multiplicative distance between the probability distributions of randomized query outputs for any two neighbouring data sets (Dong et al., 2019).

3. SENSITIVITY OF THE STOCHASTIC DKM ITERATION

In this section, we estimate the $\ell_2$-norm sensitivity of the stochastic dKM, laying the foundation for noise addition in the truncated generalized Gaussian mechanisms in Section 5. Before formalizing the result, we present the assumptions used throughout and introduce the definition of the sensitivity of algorithms in a decentralized learning setting.

Assumption (1) The loss function is c.c.p. and sub-differentiable with respect to $x$, and the fixed-point iteration is bounded by a finite constant $B$, i.e., $\max_{x,\xi} \|T(x; \xi) - x\| \leq B$; (2) The loss function $l(x, \xi)$ is nonnegative and $\|l\|_\infty \leq R$ for some constant $R > 0$.

The quantity $\|T(x; \xi) - x\|$ in Assumption (1) is known in the literature as the fixed-point residual, which typically relates to the gradient of an objective function (Davis & Yin, 2016). We note that Assumption (1) is weaker than those in Yu et al. (2021a); Xu et al. (2022); Huang & Gong (2020); Sun et al. (2021); Zhu et al. (2022), and that Assumption (2) is a common assumption in Sun et al. (2021); Zhu et al. (2022). The sensitivity based on two datasets that differ at only one point is commonly used in Yu et al. (2021a); Xu et al. (2022); Huang & Gong (2020). Although the single differing point is stored at one local agent, communication among agents without a coordinator propagates its effect through the full network, which prompts us to quantify its impact on the learning algorithm globally, as introduced in Definition 4.

Definition 4 (Sensitivity) For a specific algorithm $\mathcal{A}$ acting on training samples, let $\Xi', \Xi''$ be two adjacent datasets that differ by one data point. Up to iteration $K$, define the $\Delta_K$-sensitivity of algorithm $\mathcal{A}$ as

$$\Delta_K := \sup_{\Xi', \Xi''} \left\| \mathcal{A}(\Xi') - \mathcal{A}(\Xi'') \right\|.$$

We now establish the $\Delta_K$-sensitivity of the dKM algorithm. That is, Theorem 1 bounds $\Delta_K$ when two adjacent datasets differ at only one point.

Theorem 1 ($\Delta_K$-Sensitivity) Let $\bar{x}^K = \frac{1}{M} \sum_{m=1}^{M} x^K(m)$ and $\bar{y}^K = \frac{1}{M} \sum_{m=1}^{M} y^K(m)$ denote the outputs of the dKM algorithm applied to two sets $\Xi', \Xi''$ of size $NM$ which differ at only one point. Assume the initial value $X^0 = 0$. With Assumption (1) satisfied and given relaxation parameters $\{\alpha_k\}_{k=0}^{K} \subset (0, 1]$, the $\Delta_K$-sensitivity of the dKM algorithm has the upper bound

$$\mathbb{E}\Delta_K \leq \frac{2B \sum_{k=0}^{K-1} \alpha_k}{NM} + 4B \sum_{k=0}^{K-1} (1 + 2\alpha_k) \sum_{j=0}^{k-1} \alpha_j \lambda^{k-1-j}.$$

Note that the derivation of the sensitivity of our proposed dKM algorithm does not require smoothness or strong convexity of the objective functions.
Theorem 1 quantifies the accumulated deviation between two trajectories of iterates based on two datasets that differ at only one point, where the differing point may reside at any local agent. Whereas Huang & Gong (2020) study the local sensitivity, Theorem 1 establishes the global sensitivity, because the local communication of the network lets this differing point, e.g., one stored at agent 1, affect the final output. The expectation in Theorem 1 comes from the randomness of picking the differing point to update the iterate in Definition 2. Specifically, we pick the only differing point of the two adjacent datasets with probability 1/N to update the iterate and use identical points with probability 1 - 1/N. Theorem 1 shows that, for a fixed iteration number K, ∆ K gets smaller as the data sizes M, N increase and λ decreases, for both diminishing and constant learning rates. However, the bound does not control the sensitivity as K increases, which also suggests a privacy risk: with more iterations, it becomes easier to identify a specific sample. Moreover, the sensitivity decreases as λ decreases, indicating the effect of different topologies on ∆ K . Table 2 in the appendix summarizes these effects for clarity. This theorem also provides a rule for calibrating the noise-adding mechanisms that guarantee DP in Section 5.
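As a rough numerical aid, the bound of Theorem 1 can be evaluated directly for a given schedule of relaxation parameters. The sketch below transcribes the right-hand side as it reads in the theorem statement; the function name `sensitivity_bound` and the example values are illustrative assumptions, not part of the original work.

```python
def sensitivity_bound(alphas, lam, B, N, M):
    """Evaluate the right-hand side of the Theorem 1 bound as stated in the text.

    alphas : relaxation parameters alpha_0, ..., alpha_{K-1}
    lam    : lambda = max(|lambda_2|, |lambda_M|) of the mixing matrix
    B      : bound on the fixed-point residual (Assumption (1))
    N, M   : samples per agent and number of agents
    """
    alphas = list(alphas)
    K = len(alphas)
    first = 2.0 * B * sum(alphas) / (N * M)
    second = 0.0
    for k in range(K):
        # Inner sum over j = 0, ..., k-1 of alpha_j * lambda^(k-1-j).
        inner = sum(alphas[j] * lam ** (k - 1 - j) for j in range(k))
        second += (1.0 + 2.0 * alphas[k]) * inner
    return first + 4.0 * B * second

# Hypothetical comparison of a well-connected and a poorly connected topology.
bound_small_lambda = sensitivity_bound([0.5] * 5, lam=0.2, B=1.0, N=10, M=2)
bound_large_lambda = sensitivity_bound([0.5] * 5, lam=0.8, B=1.0, N=10, M=2)
```

With the same schedule, the smaller value of λ yields the smaller bound, which matches the dependence on topology discussed above.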

4. PERFORMANCE AND GENERALIZATION OF DECENTRALIZED LEARNING ALGORITHMS WITH DIFFERENTIAL PRIVACY

Existing DP schemes in decentralized learning typically rely on the perturbation of objective functions and gradients, but are limited to iterates (Yu et al., 2021a; Xu et al., 2022; Huang & Gong, 2020). Such methods introduce extra noise to preserve privacy. It is still hard to examine the trade-off between privacy and performance in the generalization of DP algorithms (He et al., 2021). In this section, we establish a generalization error bound and a finite sample guarantee for decentralized learning algorithms that satisfy differential privacy. These results illustrate the effectiveness of using dKM with any differentially private mechanism (Definition 3) in applications. We then quantify the bound under iterate-independent noise-addition mechanisms and compute the end-to-end differential privacy guarantee across $M$ agents over a network system. Let $L(x) = \mathbb{E}_{\xi \sim P}[l(x, \xi)]$ and let $x^\star$ be its optimal solution. Consider a specific stochastic algorithm $\mathcal{B} := (\mathcal{A}_1, \cdots, \mathcal{A}_M)$ on $\Xi$ with sample size $NM$ and output $\mathcal{B}(\Xi)$, where $\mathcal{A}_1, \ldots, \mathcal{A}_M$ run on the local agents and are allowed to differ. The excess generalization error of $\mathcal{B}$, defined as $\mathbb{E}_{\Xi, \mathcal{B}}[L(\mathcal{B}(\Xi)) - L(x^\star)]$, can be decomposed into three terms (Bottou & Bousquet, 2007),

$$\underbrace{\mathbb{E}_{\Xi, \mathcal{B}}\left[L(\mathcal{B}(\Xi)) - \hat{L}(\mathcal{B}(\Xi))\right]}_{\text{generalization error}} + \underbrace{\mathbb{E}_{\Xi, \mathcal{B}}\left[\hat{L}(\mathcal{B}(\Xi)) - \hat{L}(\hat{x})\right]}_{\text{optimization error}} + \underbrace{\mathbb{E}_{\Xi, \mathcal{B}}\left[\hat{L}(\hat{x}) - L(x^\star)\right]}_{\text{test error}}. \qquad (2)$$

We establish a bound on the generalization error in Theorem 2, which reflects the joint effects of the data $\Xi$ and the algorithm.

Theorem 2 (Generalization Bound) Assume that the decentralized learning algorithm $\mathcal{B} : \Xi \to \mathbb{R}^p \times \{1, \cdots, M\}$ is (ε, δ)-differentially private. Under Assumption (2), we have

$$\left| \mathbb{E}_{\Xi \sim P^{MN}, \mathcal{B}(\Xi)}\left[ L(\mathcal{B}(\Xi)) - \hat{L}(\mathcal{B}(\Xi)) \right] \right| \leq \left(1 - e^{-\varepsilon}\right) R + e^{-\varepsilon} M \delta.$$

Theorem 3 (Finite Sample Guarantee) Under the assumptions of Theorem 2, we have

$$P\left( L(\mathcal{B}(\Xi)) \leq \hat{L}(\mathcal{B}(\Xi)) + \epsilon \right) \geq \frac{\epsilon - \left(1 - e^{-\varepsilon}\right) R - e^{-\varepsilon} M \delta}{\epsilon + R},$$

for any $\epsilon > 0$.
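To give a feel for the magnitudes involved, the right-hand sides of Theorems 2 and 3 can be evaluated for given privacy budgets. This is only a numerical reading of the two stated bounds; the function names and the argument `tol` (standing for the tolerance ϵ of Theorem 3, as opposed to the privacy parameter ε) are hypothetical.

```python
import math

def generalization_bound(eps, delta, R, M):
    """Right-hand side of Theorem 2: (1 - exp(-eps)) * R + exp(-eps) * M * delta."""
    return (1.0 - math.exp(-eps)) * R + math.exp(-eps) * M * delta

def finite_sample_lower_bound(tol, eps, delta, R, M):
    """Right-hand side of Theorem 3 for a tolerance tol > 0.

    The value can be negative, in which case the bound is vacuous.
    """
    return (tol - generalization_bound(eps, delta, R, M)) / (tol + R)

# Hypothetical budgets: a stricter privacy level gives a smaller bound.
tight = generalization_bound(eps=0.1, delta=1e-5, R=1.0, M=10)
loose = generalization_bound(eps=1.0, delta=1e-5, R=1.0, M=10)
```

Under these stated formulas, the generalization bound shrinks to zero as ε and δ go to zero, and the probability lower bound of Theorem 3 then approaches tol / (tol + R).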
These two theorems characterize the gap between the empirical loss based on finite samples and its expectation. They demonstrate the impact of differential privacy on out-of-sample performance by bounding $L(\mathcal{B}(\Xi))$ in probability and in expectation. Although ensuring data privacy sacrifices some generalization, these results show that a good privacy-preserving mechanism still retains a certain level of generalization as well as a finite sample guarantee. In the existing work on DP decentralized learning, Xu et al. (2022) provided convergence and regret analysis based on gradient aggregation and the Gaussian mechanism in the presence of Byzantine nodes; Yu et al. (2021a) explored the convergence rate of the DP-SGD algorithm with the Gaussian mechanism; Huang & Gong (2020) theoretically analyzed the utility of the DP-ADMM algorithm, measured by the expected empirical risk with feasibility violation. Note that our theoretical results in Theorems 2 and 3 apply to all DP mechanisms. As far as we know, we are the first to establish a generalization bound and a finite sample guarantee in the DP decentralized setting. We stress that Theorems 2 and 3 require the algorithm $\mathcal{B}$ to be globally (ε, δ)-DP. In practice, however, each agent acts independently, and the agents are unlikely to agree on a common (ε, δ)-DP level (Bellet et al., 2018). We therefore investigate how the local computation affects global differential privacy through a composition theorem, which also justifies the DP assumption in Theorems 2 and 3. In detail, Theorem 4 gives the overall privacy cost in terms of the privacy costs of the local agents. Similar results are discussed in Huang & Gong (2020); Yu et al. (2021a).

Theorem 4 (Composition Theorem) Let $\{x^k\}_{k=1}^{K}$ denote the iterates generated by a specific stochastic algorithm with $K$ steps, similar to the output in Definition 2. For the $m$-th agent, denote $\tilde{\mathcal{A}}_m : \Xi \to \{\tilde{x}^k(m)\}_{k=1}^{K}$, where $\tilde{x}^k(m)$ is the iterate corrupted by noise. For any fixed $m$, if $\tilde{\mathcal{A}}_m$ is $(\varepsilon_m, \delta_m)$-differentially private, then $\tilde{X}^k = (\tilde{x}^k(1), \cdots, \tilde{x}^k(M))^\top$ is $(\varepsilon', \delta')$-differentially private, where

$$\varepsilon' = \min\{\varepsilon_1, \varepsilon_2, \varepsilon_3\}, \qquad \delta' = 1 - \prod_{m=1}^{M} \left(1 - e^{a_m} \frac{\delta_m}{1 + e^{\varepsilon_m}}\right) + 1 - \prod_{m=1}^{M} \left(1 - \frac{\delta_m}{1 + e^{\varepsilon_m}}\right),$$

with

$$\varepsilon_1 = \sum_{m=1}^{M} \varepsilon_m, \qquad \varepsilon_2 = \sum_{m=1}^{M} \frac{(e^{\varepsilon_m} - 1)\varepsilon_m}{e^{\varepsilon_m} + 1} + \sqrt{\sum_{m=1}^{M} 2\varepsilon_m^2 \log\left(e + \frac{\sqrt{\sum_{m=1}^{M} \varepsilon_m^2}}{\delta}\right)}, \qquad \varepsilon_3 = \sum_{m=1}^{M} C_{KL}(m) + \sqrt{2 \log\left(\frac{1}{\delta'}\right) \sum_{m=1}^{M} \varepsilon_m^2},$$

where $C_{KL}(m) := \min\{\min\{2, e^{\varepsilon_m} - 1\}\varepsilon_m, \varepsilon_m\}$, for some $0 < a_m \leq \varepsilon_m$ with $\sum_{m=1}^{M} a_m = \varepsilon'$, and a real constant $\delta$.

For completeness, Algorithm 1 shows the detailed iterative steps of dKM with noise addition to preserve DP, and we further examine the optimization error bound in formula (2) that is caused by adding noise to the query output. Specifically, we consider iterate-independent noise-addition mechanisms (Definition 5) to preserve DP for dKM in practice: random noise is added to the iterate to reduce information leakage.

Definition 5 (Noise-adding Mechanisms for dKM) Given a data set $\Xi$, a query-output-independent noise-adding mechanism $\tilde{\mathcal{A}}$ releases the query output $\tilde{x}^k = \tilde{\mathcal{A}}(x^k; \Xi)$ corrupted by an additive random noise $d$, $\tilde{x}^k = x^k + d$.
Algorithm 1 Differentially Private Decentralized Krasnosel'skiǐ-Mann Iteration (DP-dKM)
Initialize: $\tilde{X}^0$, mixing matrix $W$, $\alpha_k \in (0, 1]$, number of iterations $K$
while $k \leq K$ do
    for $m \in \mathcal{V}$ ($m \in [1, M]$) do
        $x^k(m) = W \tilde{X}^{k-1}(m) + \alpha_{k-1} \left( T(x^{k-1}(m)) - x^{k-1}(m) \right)$ (Local computation)
    end for
    for $m \in \mathcal{V}$ do
        Generate random noise $\varepsilon^k_m$, $\tilde{x}^k(m) = x^k(m) + \varepsilon^k_m$ (Differential Privacy)
        Broadcast $\tilde{x}^k(m)$ to all neighbours $j \in \mathcal{N}(m)$
    end for
end while
Output: $X^K = (x^K(1), \cdots, x^K(M))$ and $\bar{x}^K = \frac{1}{M} \sum_{m=1}^{M} x^K(m)$

Let $X^\star = [x^\star, \cdots, x^\star]^\top \in \mathbb{R}^{M \times p}$ be the true parameter and $\tilde{X} = [\tilde{x}(1), \cdots, \tilde{x}(M)]^\top$ be the released iterates corrupted by an additive random noise for each agent. The DP-dKM can then be written as

$$X^{k+1} = W \tilde{X}^k + \alpha_k \left( \mathbf{T}\left(X^k; \Xi\right) - X^k \right).$$

According to Assumptions (1) and (2), the error bound is controlled by

$$\left\| W \tilde{X}^k + \alpha_k \left( \mathbf{T}\left(X^k; \Xi\right) - X^k \right) - X^\star \right\| \leq \left\| X^{k+1} - X^\star \right\| + \left\| W \tilde{X}^k - X^k \right\|,$$

where the first term is the same as in the non-private setting and depends on the convergence properties of the given algorithm (Wotao, 2019). The second term captures the deviation caused by the privacy mechanism. The following theorem gives an upper bound on the second term.

Theorem 5 (Boundedness of local iterates based on Gaussian additive noise) Given a data set $\Xi$, assume an iterate-independent noise-adding mechanism $\tilde{\mathcal{A}}$ releases the output $\tilde{\mathcal{A}}(x^k; \Xi) := x^k + d$ corrupted by an additive random noise $d$, where $d$ follows a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The error bound caused by the additive noise is

$$\mathbb{E} \left\| W \tilde{X}^k - X^k \right\|_F^2 \leq p \left( \sigma^2 + \mu^2 \right) \left( (M - 1)\lambda^2 + 1 \right).$$
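The steps of Algorithm 1 can be sketched in a few lines. The version below is a simplified illustration under stated assumptions, not the authors' implementation: it uses zero-mean Gaussian noise with a user-chosen scale `sigma` (not calibrated to any privacy budget), initializes the local iterates to the released initial value, and uses a toy operator `T` with a known fixed point. The function name `dp_dkm` and all numerical values are hypothetical.

```python
import numpy as np

def dp_dkm(T, W, X0, alphas, sigma, rng):
    """Sketch of Algorithm 1 with zero-mean Gaussian noise of scale sigma.

    T maps an (M, p) array of local iterates to their operator images row-wise.
    Returns the last local iterates X^K and the last released (noisy) iterates.
    Assumption: the local iterates start from the released initial value X0.
    """
    X = np.array(X0, dtype=float)
    X_tilde = X.copy()
    for alpha in alphas:
        # Local computation: mix the released neighbour iterates, then relax.
        X = W @ X_tilde + alpha * (T(X) - X)
        # Differential privacy step: perturb the iterate before broadcasting.
        X_tilde = X + rng.normal(0.0, sigma, size=X.shape)
    return X, X_tilde

M, p = 5, 3
W = np.full((M, M), 1.0 / M)  # fully connected graph with uniform weights
target = np.array([1.0, -1.0, 2.0])

def T(X):
    """Toy nonexpansive operator whose unique fixed point is `target`."""
    return 0.5 * (X + target)

rng = np.random.default_rng(1)
X_clean, _ = dp_dkm(T, W, np.zeros((M, p)), [1.0] * 200, sigma=0.0, rng=rng)
X_priv, X_released = dp_dkm(T, W, np.zeros((M, p)), [1.0] * 200, sigma=0.01, rng=rng)
x_bar = X_priv.mean(axis=0)  # averaged output, as in the last line of Algorithm 1
```

With `sigma=0.0` the sketch reduces to the noise-free iteration and converges to the fixed point; with a small positive `sigma` the iterates hover around it, which is the deviation that the second term of the error bound above is meant to capture.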

5. DIFFERENTIAL PRIVACY VIA TRUNCATED GENERALIZED GAUSSIAN MECHANISMS

While commonly adopted Gaussian noise-adding mechanisms for a single iterate can guarantee DP (Croft et al., 2022; Cormode et al., 2019; Yu et al., 2021a; Xu et al., 2022; Huang & Gong, 2020), such mechanisms do not take into consideration the valid range of the iterates or the utility of the learning algorithm (Geng et al., 2018; Bun et al., 2018): extremely large noise will severely affect the learning process and degrade the performance of the trained model under a differential privacy guarantee. For example, Yu et al. (2021a; b) gave a lower bound on the variance of the Gaussian noise needed to guarantee DP. Ganesh & Zhao (2020) considered (ε, δ)-differential privacy with generalized Gaussian mechanisms to answer $k$ counting queries about a database. Different from Ganesh & Zhao (2020), our proposed generalized Gaussian mechanisms are novel in that the noise is bounded. Specifically, we truncate the probability density function used to generate the noise, with a careful choice of the bounding parameter, and propose the truncated Generalized Gaussian (GG) distribution $P_d := GG(0, \sigma, b)$ with location parameter 0, scale parameter $\sigma > 0$, and shape parameter $b > 0$. Its probability density function is

$$p(z \mid 0, \sigma, b) = C_{gg} \exp\left( -\left( \frac{|z|}{\sigma} \right)^{b} \right), \quad z \in [-A, A], \qquad (3)$$

where $C_{gg}$ is a constant that guarantees $\int_{-A}^{A} p(z \mid 0, \sigma, b)\,dz = 1$. In the experiment, we use truncated GG noise with $b = 1, 2$, which corresponds to the truncated Laplace distribution (Definition 6) and the truncated normal distribution (Theorem 6), to preserve DP.

Definition 6 (Truncated Laplacian Distribution, Geng et al. (2018)) Given the privacy parameters $0 < \delta < \frac{1}{2}$, $\varepsilon > 0$, and iterate sensitivity $\Delta > 0$, the truncated Laplacian distribution with $b = 1$ in formula (3) preserves (ε, δ)-differential privacy when taking

$$\lambda := \frac{\Delta}{\varepsilon}, \qquad C_{Lap} := \left[ 2\lambda \left( 1 - e^{-A/\lambda} \right) \right]^{-1}, \qquad A := \frac{\Delta}{\varepsilon} \log\left( 1 + \frac{e^{\varepsilon} - 1}{2\delta} \right).$$

Theorem 6 (Truncated Gaussian Distribution) The truncated Gaussian distribution $p_{nor}(z)$ with $b = 2$ in formula (3) preserves (ε, δ)-differential privacy, where $\sigma^2 \geq \varepsilon^{-1} \Delta^2$ and the constants $C_{nor}$ and $A$ are determined by $\int_{-A}^{A} p_{nor}(z)\,dz = 1$, satisfying the equations

$$C_{nor} \cdot \sum_{l=0}^{\infty} \frac{(-1)^l A^{2l+1}}{\sigma^{2l}\, l!\, (2l+1)} = \frac{1}{2}, \qquad C_{nor} \cdot \sum_{l=0}^{\infty} \frac{(-1)^l \left( A^{2l+1} - (A - \Delta)^{2l+1} \right)}{\sigma^{2l}\, l!\, (2l+1)} = \delta.$$
Note that the truncated Gaussian mechanism is also considered in Cesar & Rogers (2021), which focused on privacy loss composition bounds for special classes of differentially private algorithms, whereas we aim to reduce the amount of added noise at the same level of privacy. An important property of the truncated GG mechanism is that the added noise is bounded within [-A, A] while DP still holds. More importantly, the truncated GG mechanism simultaneously improves the utility and guarantees privacy. Its performance relative to state-of-the-art methods is illustrated in the numerical experiments.
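For the case b = 1, noise from the truncated Laplacian distribution of Definition 6 can be drawn by inverse-CDF sampling. The sketch below computes λ, A, and C_Lap from the formulas stated in Definition 6 and samples a magnitude on [0, A] together with a random sign; the function names and the example privacy budget are assumptions made for illustration, and the sampling routine is one possible choice rather than the authors' procedure.

```python
import math
import numpy as np

def truncated_laplace_params(eps, delta, sens):
    """Return (lam, A, C_lap) from the formulas in Definition 6."""
    lam = sens / eps
    A = lam * math.log(1.0 + (math.exp(eps) - 1.0) / (2.0 * delta))
    C_lap = 1.0 / (2.0 * lam * (1.0 - math.exp(-A / lam)))
    return lam, A, C_lap

def sample_truncated_laplace(eps, delta, sens, size, rng):
    """Draw noise with density C_lap * exp(-|z| / lam) on [-A, A]."""
    lam, A, _ = truncated_laplace_params(eps, delta, sens)
    u = rng.random(size)
    # Inverse CDF of the magnitude, whose density is proportional to
    # exp(-t / lam) on the interval [0, A].
    magnitude = -lam * np.log1p(-u * (1.0 - math.exp(-A / lam)))
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

# Hypothetical budget: eps = 1, delta = 0.05, sensitivity 1.
lam, A_bound, C_lap = truncated_laplace_params(1.0, 0.05, 1.0)
noise = sample_truncated_laplace(1.0, 0.05, 1.0, 10000, np.random.default_rng(0))
```

Every sample lies in [-A, A] by construction, which is the bounded-noise property emphasized above; an ordinary Laplace or Gaussian sampler offers no such bound.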

6. NUMERICAL EXPERIMENT

We compare the proposed DP-dKM algorithm with baseline algorithms under various decentralized settings (ring, star, and fully connected graphs): (a) a non-private decentralized approach; (b) a private decentralized approach with Laplace noise; (c) a private decentralized approach with Gaussian noise (Yu et al., 2021a; Huang & Gong, 2020). The average root mean squared error (RMSE) is used to quantify their performance. To start with, we analyze the sensitivity to the privacy parameters (ε, δ) by solving least squares using the SGD algorithm on a fully connected graph. The results are shown in Figure 1. In Figure 1, the first and second columns compare the performance of the Laplace and truncated Laplace mechanisms for different ε and δ. Similarly, the comparisons between the Gaussian and truncated Gaussian mechanisms are shown in the third and fourth columns. Figure 1 indicates that the proposed mechanism has the smallest RMSE compared with the Laplace and Gaussian mechanisms and enjoys better convergence properties. In addition, the results demonstrate the privacy-utility trade-offs of the proposed approach: the RMSE increases as ε increases with fixed δ. When privacy leakage increases, the truncated Laplace and truncated Gaussian approaches achieve better utility. We next consider $\ell_1$-regularized least squares regression and $\ell_1$-regularized logistic regression, employing the differentially private SPGD and ADMM algorithms with truncated GG noise with $b = 1, 2$ to evaluate the performance of Algorithm 1,

$$\frac{1}{MN} \sum_{m=1}^{M} \sum_{i=1}^{N} \left( A_{mi} x - b_{mi} \right)^2 + \lambda \|x\|_1, \qquad \frac{1}{MN} \sum_{m=1}^{M} \sum_{i=1}^{N} \left\{ \log\left( 1 + e^{A_{mi} x} \right) - b_{mi} A_{mi} x \right\} + \lambda \|x\|_1.$$

The elements of $A_{mi}$ and $x$ are drawn independently from the normal distribution. $\lambda > 0$ is a regularization parameter that controls the impact of the regularizer and is chosen by grid search. We fix

7. CONCLUSION

In this paper, we have proposed a general privacy-preserving framework, DP-dKM, that is applicable to all communication network topologies and covers many existing decentralized learning and optimization problems. We have shown that the proposed algorithm retains performance guarantees on generalization and finite sample performance. We also established the effect of local privacy-preserving computation on global differential privacy. To avoid extremely large noise being added to the shared information, which would severely degrade the performance of the learning process, we have introduced a truncated generalized Gaussian mechanism, for which we demonstrate privacy and utility trade-offs under a differential privacy guarantee. Experiments have demonstrated that our algorithm is effective in decentralized settings and performs better than state-of-the-art baseline algorithms.



techniques. It quantifies to what extent individual privacy in a dataset is preserved while releasing aggregated information. Definition 3 ((ε, δ)-Differential Privacy Dwork et al. (2006)) A stochastic algorithm A is called (ε, δ)-differential privacy if for any subset R 0 ⊂ R p and any neighbouring sample set pair Ξ and Ξ ′

) is weaker than Yu et al. (2021a); Xu et al. (2022); Huang & Gong (2020); Sun et al. (2021); Zhu et al. (2022) as well as a common Assumption (2) in Sun et al. (2021); Zhu et al. (2022). The sensitivity based on two datasets that differ at only one point is commonly used in Yu et al. (2021a); Xu et al. (2022); Huang & Gong (2020). Although the only different point stored at any local agent, the local communication among agents without a coordinator, affecting the full networks, promotes us to quantify its impact on a learning algorithm globally as introduced in Definition 4.

-A p(z | 0, σ, b)dz = 1. In the experiment, we use truncated GG noise with b = 1, 2, which represents the truncated Laplace distribution (Definition 6) and truncated normal distribution (Theorem 6) to preserve DP. Definition 6 (Truncated Laplacian Distribution Geng et al. (

Figure 1: Sensitivity analysis of the privacy parameters (ε, δ) for SGD on a fully connected graph.

Figure 2: Estimation error on ring, star, and fully connected graphs. The first and second columns are ℓ 1 penalized least squares; the third and fourth columns are ℓ 1 penalized logistic regression.

Figure 3: The estimation error as the number of agents changes.

