ANALYTICAL COMPOSITION OF DIFFERENTIAL PRIVACY VIA THE EDGEWORTH ACCOUNTANT

Abstract

Many modern machine learning algorithms are compositions of simple private algorithms; thus, an increasingly important problem is to efficiently compute the overall privacy loss under composition. In this paper, we introduce the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees of private algorithms. The Edgeworth Accountant starts by losslessly tracking the privacy loss under composition using the f-differential privacy framework (Dong et al., 2022), which allows us to express the privacy guarantees using privacy-loss log-likelihood ratios (PLLRs). As the name suggests, this accountant next uses the Edgeworth expansion (Hall, 2013) to upper and lower bound the probability distribution of the sum of the PLLRs. Moreover, by relying on a technique for approximating complex distributions by simple ones, we demonstrate that the Edgeworth Accountant can be applied to the composition of any noise-addition mechanism. Owing to certain appealing features of the Edgeworth expansion, the (ε, δ)-differential privacy bounds offered by this accountant are non-asymptotic, with essentially no extra computational cost, as opposed to the prior approaches in Koskela et al. (2020) and Gopi et al. (2021), whose running times increase with the number of compositions. Finally, we show that our upper and lower (ε, δ)-differential privacy bounds are tight in certain regimes of training private deep learning models and federated analytics.

1. INTRODUCTION

Differential privacy (DP) provides a mathematically rigorous framework for analyzing and developing private algorithms working on datasets containing sensitive information about individuals (Dwork et al., 2006). This framework, however, often faces challenges when it comes to analyzing the privacy loss of complex algorithms such as privacy-preserving deep learning and federated analytics (Ramage & Mazzocchi, 2020; Wang et al., 2021), which are composed of simple private building blocks. Therefore, a central question in this active area is to understand how the overall privacy guarantees degrade under the repetition of simple algorithms applied to the same dataset. Continued efforts to address this question have led to the development of relaxations of differential privacy and privacy analysis techniques (Dwork et al., 2010; Dwork & Rothblum, 2016; Bun et al., 2018; Bun & Steinke, 2016). A recent flurry of activity in this line of research was triggered by Abadi et al. (2016), which proposed a technique called the moments accountant for providing upper bounds on the overall privacy loss of training private deep learning models over iterations. A shortcoming of the moments accountant is that its privacy bounds, albeit computationally efficient, are generally not tight. This is because the technique is enabled by Rényi DP (Mironov, 2017) and its follow-up works (Balle et al., 2018; Wang et al., 2019), whose privacy loss profile can be lossy for many mechanisms. Alternatively, another line of work directly composes (ε, δ)-DP guarantees via numerical methods such as the fast Fourier transform (Koskela et al., 2020; Gopi et al., 2021). This approach can be computationally expensive when the number of algorithms under composition is huge, which unfortunately is often the case for training deep neural networks.
Instead, this paper aims to develop computationally efficient lower and upper privacy bounds for the composition of private algorithms with finite-sample guarantees by relying on a new privacy definition called f-differential privacy (f-DP; Dong et al., 2022). f-DP offers a complete characterization of DP guarantees via a hypothesis testing interpretation, which was first introduced in Kairouz et al. (2015), and enables a precise tracking of the privacy loss under composition using a certain operation between the functional privacy parameters. Moreover, Dong et al. (2022) developed an approximation tool for evaluating the overall privacy guarantees using a central limit theorem (CLT), which can lead to approximate (ε, δ)-DP guarantees using the duality between (ε, δ)-DP and Gaussian Differential Privacy (GDP, a special type of f-DP) (Dong et al., 2022). While these (ε, δ)-DP guarantees are asymptotically accurate, a usable finite-sample guarantee is lacking in the f-DP framework. In this paper, we introduce the Edgeworth Accountant as an analytically efficient approach to obtaining finite-sample (ε, δ)-DP guarantees by leveraging the f-DP framework. In short, the Edgeworth Accountant makes use of the Edgeworth approximation (Hall, 2013), which is a refinement of the CLT with a better convergence rate, to approximate the distribution of the sum of certain random variables that we refer to as privacy-loss log-likelihood ratios (PLLRs).

Figure 1: The comparison between the GDP approximation in Dong et al. (2022) and our Edgeworth Accountant. Both methods start from the exact composition using f-DP. Upper: Dong et al. (2022) uses a CLT-type approximation to get a GDP approximation to the f-DP guarantee, then converts it to (ε, δ)-DP via duality (Fact 1). Lower: We losslessly convert the f-DP guarantee to an exact (ε, δ(ε))-DP guarantee, with δ(ε) defined with PLLRs in (3.1), and then take the Edgeworth approximation to numerically compute the (ε, δ)-DP.
By leveraging a Berry-Esseen type bound derived for the Edgeworth approximation, we obtain non-asymptotic upper and lower privacy bounds that are applicable to privacy-preserving deep learning and federated analytics. On a high level, we compare the approach of our Edgeworth Accountant to the Gaussian Differential Privacy approximation in Figure 1. Additionally, we note that while the rate of the Edgeworth approximation is well understood, the explicit finite-sample error bounds are highly non-trivial. To the best of our knowledge, this is the first time such a bound has been established in the statistics and differential privacy communities, and it is also of interest on its own.

We have made available two versions of our Edgeworth Accountant to better fulfill practical needs: the approximate Edgeworth Accountant (AEA), and the exact Edgeworth Accountant interval (EEAI). The AEA gives an estimate with an asymptotically accurate bound for any number of compositions m. By using a higher-order Edgeworth expansion, such an estimate can be made arbitrarily accurate, provided that the Edgeworth series converges, and therefore it is useful in practice for quickly estimating privacy parameters. The EEAI, in turn, provides a rigorous finite-sample bound on the privacy parameters for any m, and does so efficiently.

The main practical value of our proposal is that it is an efficiently computable DP accountant. For the composition of m identical mechanisms, our algorithm runs in O(1) time to compute the privacy loss, and for the general case when we need to compose m heterogeneous algorithms, the runtime becomes O(m), which is information-theoretically optimal. In contrast, fast Fourier transform (FFT)-based algorithms (Gopi et al., 2021) provide accurate finite-sample bounds, but can only achieve polynomial runtime for general composition of private algorithms.
The suboptimal time complexity of FFT-based methods leads to a large resource requirement when m is large, and a large m is quite common in practice. For example, in deep learning and federated learning, m is the number of iterations (rounds) and can potentially be very large. To make things worse, in real-world applications, the same dataset is often adaptively used or shared among different tasks. To faithfully account for the privacy loss, the DP accountant system has to track the cost of each iteration across different tasks, further increasing the number of compositions. Our EEAI serves as the first DP accountant method that simultaneously provides finite-sample guarantees, runs with optimal time complexity, and is very accurate (when m is large), which can be a good supplement to the current toolbox.

The paper is organized as follows. We briefly summarize related work in privacy accounting of differentially private algorithms as well as our contributions in Section 1. In Section 2 we introduce the concept of f-DP and its important properties. We then introduce the notion of privacy-loss log-likelihood ratios in Section 3 and establish how to use them for privacy accounting based on distribution function approximation. In Section 4 we provide a new method, the Edgeworth Accountant, that can efficiently and almost accurately evaluate the privacy guarantees, while providing finite-sample error bounds. Simulation results and conclusions can be found in Sections 5 and 6. Proofs and technical details are deferred to the appendices.

1.1. MOTIVATING APPLICATIONS

We now discuss two motivating applications: NoisySGD (Song et al., 2013; Chaudhuri et al., 2011; Abadi et al., 2016; Bu et al., 2020) as well as Federated Analytics and Federated Learning (Ramage & Mazzocchi, 2020; Wang et al., 2021). The analysis of the DP guarantees of those applications is important yet especially challenging due to the large number of compositions involved. Our goal is primarily to devise a general tool to analyze the DP guarantees for these applications.

NoisySGD. NoisySGD is one of the most popular algorithms for training differentially private neural networks. In contrast to standard SGD, NoisySGD has two additional steps in each iteration: clipping (to bound the sensitivity of the gradients) and noise addition (to guarantee the privacy of models). The details of the NoisySGD algorithm are described in Algorithm 1 in Appendix B.

Federated Analytics. Federated analytics is a distributed analytical model, which performs statistical tasks through the interaction between a central server and local devices. To complete a global analytical task, in each iteration, the central server randomly selects a subset of devices to carry out local analytics and then aggregates the results for the statistical analysis. The total number of iterations is usually very large in federated analytics, requiring a tight analysis of its DP guarantee.

1.2. RELATED WORK

Table 1 compares several existing DP accountants. Specifically, we focus on their theoretical guarantees and the runtime complexity when the number of compositions is m. We present a detailed survey of those DP accountants in Appendix A.

| Method | Finite-sample guarantee | Tightness of guarantee | Complexity (m identical) | Complexity (m general) |
| GDP/GDP-E | No | N/A | O(1) | O(m) |
| MA | Only upper bound | Loose conversion to (ε, δ)-DP | O(1) | O(m) |
| FFT | Yes | Yes | O(√m) | O(m^2.5) |
| EA | Yes | Yes* | O(1) | O(m) |

Table 1: Comparison among different DP accountants. The computational complexity is reported in two columns: the runtime for the composition of m identical algorithms, and the runtime for the composition of m general algorithms. GDP: the Gaussian differential privacy accountant (Dong et al., 2022); GDP-E: the Edgeworth refinement to the GDP accountant (Zheng et al., 2020); MA: the moments accountant using Rényi-DP (Abadi et al., 2016); FFT: the fast Fourier transform accountant for privacy random variables (Gopi et al., 2021); EA: the Edgeworth Accountant we propose, including both the AEA (Definition 4.1) and the EEAI (Definition 4.2). *The guarantee of EA is tight when the order k of the underlying Edgeworth expansion is high, or when m is large for k = 1.

1.3. OUR CONTRIBUTIONS

We now briefly summarize our three main contributions.

Improved time-complexity and estimation accuracy. We propose a new DP accountant method, termed the Edgeworth Accountant, which gives finite-sample error bounds in constant/linear time for the composition of identical/general mechanisms. In practice, our method outperforms GDP and the moments accountant, with almost the same runtime.

A unified framework for efficient and computable evaluation of f-DP guarantees. Though the evaluation of an f-DP guarantee is #P-hard, we provide a general framework to efficiently approximate it. Leveraging this framework, any approximation scheme for the CDFs of the sum of privacy-loss log-likelihood ratios (PLLRs) directly translates into a new DP accountant.

Exact finite-sample Edgeworth bound analysis. We are, to the best of our knowledge, the first to use the Edgeworth expansion with finite-sample bounds in the statistics and machine learning communities. The analysis of the finite-sample bound of the Edgeworth expansion is of independent interest, and has many potential applications. We further derive an explicit adaptive exponential decaying bound for the Edgeworth expansion of the PLLRs, which is the first such result for the Edgeworth expansion.

2. PRELIMINARIES AND PROBLEM SETUP

In this section, we first define the notion of differential privacy and f-DP mathematically. We then set up the problem by revisiting our motivating applications. A differentially private algorithm promises that an adversary with perfect information about the entire private dataset in use, except for a single individual, would find it hard to distinguish between that individual's presence or absence based on the output of the algorithm (Dwork et al., 2006). Formally, for ε > 0 and 0 ≤ δ < 1, we consider a (randomized) algorithm M that takes as input a dataset.

Definition 2.1. A randomized algorithm M is (ε, δ)-DP if for any neighboring datasets $S, S'$ differing by an arbitrary sample, and for any event E, $\mathbb{P}[M(S) \in E] \le e^{\varepsilon} \cdot \mathbb{P}[M(S') \in E] + \delta$.

In Dong et al. (2022), the authors propose to use the trade-off between the type-I error and the type-II error in place of a few privacy parameters in (ε, δ)-DP. To formally define this new privacy notion, we denote by P and Q the distributions of $M(S)$ and $M(S')$, and let φ be a (possibly randomized) rejection rule for the hypothesis test $H_0: P$ vs. $H_1: Q$. The trade-off function f between P and Q is then defined as the mapping from type-I error to type-II error, that is, $f = T(P, Q): \alpha \mapsto \inf_{\phi} \{1 - \mathbb{E}_Q[\phi] : \mathbb{E}_P[\phi] \le \alpha\}$. This motivates the following definition.

Definition 2.2. A (randomized) algorithm M is f-differentially private if $T(M(S), M(S')) \ge f$ for all neighboring datasets $S$ and $S'$.

The following facts about f-DP have been established in Bu et al. (2020) and Dong et al. (2022).

Fact 1 (Duality to (ε, δ)-DP). A mechanism is f-DP if and only if it is (ε, δ(ε))-DP for all ε > 0, with $\delta(\varepsilon) = 1 + f^*(-e^{\varepsilon})$. Here $g^*(y) = \sup_{-\infty < x < \infty} \{yx - g(x)\}$ is the convex conjugate of g.

Fact 2 (Composition). Letting $M_1$ and $M_2$ be two mechanisms, we define their composition algorithm M as $M(S) = (M_1(S), M_2(S, M_1(S)))$. In general, the composition of more than two algorithms follows recursively.
Given trade-off functions $f = T(P, Q)$ and $g = T(P', Q')$, let $f \otimes g = T(P \times P', Q \times Q')$. Assume $M_t$ is $f_t$-DP for $t = 1, \ldots, m$. The composition theorem states that their m-fold composition algorithm is $f_1 \otimes \cdots \otimes f_m$-DP, which is tight in general.

Fact 3 (Subsampling). Consider the following two most common subsampling schemes: (1) (Poisson subsampling) each individual in the dataset S has its datum included in the subsample independently with probability p; (2) (Uniform subsampling) a subsample of S is drawn uniformly at random among all subsets of S of size $s = |S|p$. Denote $\mathrm{Id}(\alpha) = 1 - \alpha$, and suppose an algorithm M is f-DP. The subsampling theorem for f-DP states that the Poisson subsampled and uniform subsampled algorithms are both $\min\{f_p, f_p^{-1}\}^{**}$-DP, where $f_p = pf + (1 - p)\mathrm{Id}$.

Fact 4 (Gaussian Differential Privacy (GDP)). To deal with the composition of f-DP guarantees, Dong et al. (2022) introduce the concept of µ-GDP, which is a special case of f-DP with $f = G_\mu = T(N(0, 1), N(\mu, 1))$. They prove that when all the f-DP guarantees are close to the identity, their composition is asymptotically µ-GDP with some computable µ, which can then be converted to (ε, δ)-DP via duality. However, this result comes without a finite-sample bound.

With these facts, we can characterize the f-DP guarantees for the motivating applications in Section 1.1.

NoisySGD. A NoisySGD with m iterations, subsampling ratio p, and noise multiplier σ is $\min\{f, f^{-1}\}^{**}$-DP (Bu et al., 2020; Dong et al., 2022), with $f = (pG_{1/\sigma} + (1 - p)\mathrm{Id})^{\otimes m}$.

Federated Analytics. Suppose there are m tasks, and each task is $f_i$-DP with $f_i = T(P_i, Q_i)$. Then the overall DP guarantee is characterized by $\bigotimes_{i=1}^m f_i$-DP. It is easy to see that the f-DP guarantee of NoisySGD is a special case of the f-DP guarantee of federated analytics with each trade-off function being $f_i = \min\{f_p, f_p^{-1}\}^{**}$, that is, with identical composition of subsampled Gaussian mechanisms.
Therefore, our goal is to efficiently and accurately evaluate the privacy guarantee of the general $\bigotimes_{i=1}^m f_i$-DP with an explicit finite-sample error bound.
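As a concrete instance of the duality in Fact 1, the following minimal sketch evaluates δ(ε) for a µ-GDP mechanism using the closed form δ(ε) = Φ(−ε/µ + µ/2) − e^ε Φ(−ε/µ − µ/2) from Dong et al. (2022). The function names `Phi` and `gdp_delta` are hypothetical, chosen only for this illustration.

```python
import math

def Phi(x: float) -> float:
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def gdp_delta(eps: float, mu: float) -> float:
    """delta(eps) implied by mu-GDP through the (eps, delta) duality."""
    return Phi(-eps / mu + mu / 2.0) - math.exp(eps) * Phi(-eps / mu - mu / 2.0)

# delta decreases as eps grows, tracing out the whole privacy profile.
profile = [(eps, gdp_delta(eps, 1.0)) for eps in (0.0, 0.5, 1.0, 2.0)]
```

This is the conversion step on the GDP route; it inherits whatever error the CLT approximation of µ carries, which is the gap the finite-sample analysis below is meant to close.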

3. PRIVACY-LOSS LOG-LIKELIHOOD RATIOS (PLLRS)

We aim to compute the explicit DP guarantees for general composition of trade-off functions of the form $f = \bigotimes_{i=1}^m f_i$. For the i-th composition, the trade-off function $f_i = T(P_i, Q_i)$ is realized by the two hypotheses $H_{0,i}: w_i \sim P_i$ vs. $H_{1,i}: w_i \sim Q_i$, where $P_i, Q_i$ are two distributions. To evaluate the trade-off function $f = \bigotimes_{i=1}^m f_i$, we are essentially distinguishing between the two composite hypotheses $H_0: w \sim P_1 \times P_2 \times \cdots \times P_m$ vs. $H_1: w \sim Q_1 \times Q_2 \times \cdots \times Q_m$, where $w = (w_1, \ldots, w_m)$ is the concatenation of all $w_i$'s. Motivated by the optimal test asserted by the Neyman-Pearson Lemma, we give the following definition.

Definition 3.1. The associated pair of privacy-loss log-likelihood ratios (PLLRs) is defined to be the logarithm of the Radon-Nikodym derivatives of two hypotheses under the null and alternative hypothesis, respectively. Specifically, we can express PLLRs with respect to $H_{0,i}$ and $H_{1,i}$ as $X_i \equiv \log \frac{dQ_i(\xi_i)}{dP_i(\xi_i)}$, $Y_i \equiv \log \frac{dQ_i(\zeta_i)}{dP_i(\zeta_i)}$, where $\xi_i \sim P_i$, $\zeta_i \sim Q_i$.

Note that the definition of PLLRs only depends on the two hypotheses. It allows us to convert the f-DP guarantee to a collection of (ε, δ)-DP guarantees losslessly. The following proposition characterizes the relationship between ε and δ in terms of the distribution functions of PLLRs.

Proposition 3.2. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_m$ be the PLLRs defined above. Let $F_{X,m}, F_{Y,m}$ be the CDFs of $X_1 + \cdots + X_m$ and $Y_1 + \cdots + Y_m$, respectively. Then, the composed mechanism is f-DP (with $f = \bigotimes_{i=1}^m f_i$) if and only if it is (ε, δ)-DP for all ε > 0 with δ defined by

$\delta = 1 - F_{Y,m}(\varepsilon) - e^{\varepsilon}(1 - F_{X,m}(\varepsilon))$. (3.1)

The key contribution of our Proposition 3.2 is that we express the (ε, δ)-DP characterization of the #P-complete f-DP in terms of (3.1), which can be approximated directly. Of note, the above relationship is general in the sense that we make no assumption on the private mechanisms.
Definition 3.1 can be applied directly when $\frac{dQ_i(\xi_i)}{dP_i(\xi_i)}$ is easy to compute, which corresponds to the case without subsampling. To deal with the case with subsampling, one must take into account that the subsampled DP guarantee is the double conjugate of the minimum of two asymmetric trade-off functions (for example, recall that the trade-off function of a single subsampled Gaussian mechanism is $\min\{f_p, f_p^{-1}\}^{**}$, where $f_p = pG_{1/\sigma} + (1 - p)\mathrm{Id}$). In general, the composition of multiple subsampled mechanisms satisfies f-DP for $f = \min\{\bigotimes_{i=1}^m f_{i,p_i}, \bigotimes_{i=1}^m f_{i,p_i}^{-1}\}^{**}$. This general form makes the direct computation of the PLLRs through composite hypotheses infeasible, as it is hard to write f as a trade-off function for some explicit pair of hypotheses $H_0$ and $H_1$. Therefore, instead of using one single sequence of PLLRs directly corresponding to f, we shall use a family of sequences of PLLRs. In general, suppose we have a mechanism characterized by some f-DP guarantee, where $f = \inf_{\alpha \in I} \{f^{(\alpha)}\}^{**}$ for some index set I. That is, f is the tightest possible trade-off function satisfying all the $f^{(\alpha)}$-DP guarantees. Suppose further that for each α, we can find a sequence of computable PLLRs corresponding to $f^{(\alpha)}$, which allows us to obtain a collection of $(\varepsilon, \delta^{(\alpha)}(\varepsilon))$-DP guarantees.

Lemma 3.3. Suppose that for each α, functions $f^{(\alpha)}$ and $\delta^{(\alpha)}$ satisfy that a mechanism is $f^{(\alpha)}$-DP if and only if it is $(\varepsilon, \delta^{(\alpha)}(\varepsilon))$-DP for all ε > 0. Then a mechanism is $f = \inf_{\alpha \in I} \{f^{(\alpha)}\}^{**}$-DP if and only if it is $(\varepsilon, \sup_{\alpha} \{\delta^{(\alpha)}(\varepsilon)\})$-DP for all ε > 0.

We defer the proof of this lemma to the appendices. The intuition is that both $\inf_{\alpha \in I} \{f^{(\alpha)}\}^{**}$-DP and $(\varepsilon, \sup_{\alpha} \{\delta^{(\alpha)}(\varepsilon)\})$-DP correspond to the tightest possible DP guarantee for the entire collection. Lemma 3.3 allows us to characterize the subsampled Gaussian mechanism using two sequences of PLLRs.
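As mentioned above, it is f-DP with $f = \min\{\bigotimes_{i=1}^m f_{i,p}, \bigotimes_{i=1}^m f_{i,p}^{-1}\}^{**}$, where each $f_{i,p} = pG_{1/\sigma} + (1 - p)\mathrm{Id}$. Throughout, we write $\mu = 1/\sigma$. For the first part, the PLLRs corresponding to $\bigotimes_{i=1}^m f_{i,p} = (pG_{1/\sigma} + (1 - p)\mathrm{Id})^{\otimes m}$ are given by

$X^{(1)}_i = \log(1 - p + pe^{\mu \xi_i - \frac{1}{2}\mu^2})$, and $Y^{(1)}_i = \log(1 - p + pe^{\mu \zeta_i - \frac{1}{2}\mu^2})$,

for $1 \le i \le m$, with $\xi_i \sim N(0, 1)$ and $\zeta_i \sim (1 - p)N(0, 1) + pN(\mu, 1)$; the mixture weights are those for which the likelihood ratio of $\zeta_i$ against $\xi_i$ equals $1 - p + pe^{\mu x - \frac{1}{2}\mu^2}$. For the second part, the PLLRs corresponding to $\bigotimes_{i=1}^m f_{i,p}^{-1} = ((pG_{1/\sigma} + (1 - p)\mathrm{Id})^{-1})^{\otimes m}$ are given by

$X^{(2)}_i = -\log(1 - p + pe^{\mu \zeta_i - \frac{1}{2}\mu^2})$, and $Y^{(2)}_i = -\log(1 - p + pe^{\mu \xi_i - \frac{1}{2}\mu^2})$,

for $1 \le i \le m$, with $\xi_i$ and $\zeta_i$ distributed as above. Now substituting $F_{X^{(1)},m}$ and $F_{Y^{(1)},m}$ by any approximation (for example, using the CLT or Edgeworth), we get a computable relationship in terms of the $(\varepsilon, \delta^{(1)}(\varepsilon))$-DP; and similarly, we can get a relationship in terms of the $(\varepsilon, \delta^{(2)}(\varepsilon))$-DP. We conclude that the subsampled Gaussian mechanism is $(\varepsilon, \max\{\delta^{(1)}(\varepsilon), \delta^{(2)}(\varepsilon)\})$-DP.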
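To make the PLLR construction tangible, here is a minimal Monte Carlo sketch, not the accountant itself: it samples the first sequence of PLLRs for the subsampled Gaussian mechanism and plugs empirical CDFs into (3.1) to estimate δ^(1)(ε). The function names are hypothetical, and the sketch assumes ζ is drawn from the mixture (1 − p)N(0, 1) + pN(µ, 1), which is the distribution whose likelihood ratio against N(0, 1) equals 1 − p + pe^{µx − µ²/2}.

```python
import math
import random

def sample_pllr_sums(m, p, mu, n_samples, rng):
    """Draw n_samples realizations of sum_i X_i^(1) and sum_i Y_i^(1)."""
    def llr(x):
        # log dQ/dP at x, with P = N(0,1) and Q = (1-p) N(0,1) + p N(mu,1)
        return math.log(1.0 - p + p * math.exp(mu * x - 0.5 * mu * mu))
    xs, ys = [], []
    for _ in range(n_samples):
        sx = sy = 0.0
        for _ in range(m):
            xi = rng.gauss(0.0, 1.0)                  # xi ~ P
            shift = mu if rng.random() < p else 0.0   # pick the mixture component
            zeta = rng.gauss(shift, 1.0)              # zeta ~ Q
            sx += llr(xi)
            sy += llr(zeta)
        xs.append(sx)
        ys.append(sy)
    return xs, ys

def delta_from_samples(eps, xs, ys):
    """Empirical version of delta = 1 - F_Y(eps) - e^eps (1 - F_X(eps))."""
    tail_y = sum(y > eps for y in ys) / len(ys)
    tail_x = sum(x > eps for x in xs) / len(xs)
    return tail_y - math.exp(eps) * tail_x

rng = random.Random(0)
xs, ys = sample_pllr_sums(m=4, p=1.0, mu=0.5, n_samples=20000, rng=rng)
delta_hat = delta_from_samples(0.5, xs, ys)
```

With p = 1 the mechanism is an unsubsampled Gaussian, so the estimate can be compared with the closed-form δ of a µ√m-GDP mechanism. Sampling cost grows with m and the estimate carries no rigorous error bound, which is what motivates replacing the empirical CDFs by analytical approximations.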

3.1. TRANSFERRED ERROR BOUND BASED ON CDF APPROXIMATIONS

As discussed above, Lemma 3.3 allows us to characterize the double conjugate of the infimum of a collection of $f^{(\alpha)}$-DP guarantees by analyzing each sequence of PLLRs separately. As a result, our focus is to compute the bounds of $\delta^{(\alpha)}$ for each single trade-off function $f^{(\alpha)}$. To this end, we seek an efficient algorithm for approximating the distribution functions of the sum of PLLRs, namely, $F_{X^{(\alpha)},m}$ and $F_{Y^{(\alpha)},m}$. This perspective provides a general framework that naturally encompasses many existing methods, including the fast Fourier transform (Gopi et al., 2021) and the characteristic function method (Zhu et al., 2021). They can be viewed as different methods for finding upper and lower bounds of $F_{X^{(\alpha)},m}$ and $F_{Y^{(\alpha)},m}$. Specifically, we denote the upper and lower bounds of $F_{X^{(\alpha)},m}$ by $F^{+}_{X^{(\alpha)},m}$ and $F^{-}_{X^{(\alpha)},m}$, and similarly for $F_{Y^{(\alpha)},m}$. These bounds can be easily converted to error bounds on privacy parameters of the form $g^{(\alpha)-}(\varepsilon) \le \delta^{(\alpha)}(\varepsilon) \le g^{(\alpha)+}(\varepsilon)$, for all ε > 0, where

$g^{(\alpha)+}(\varepsilon) = 1 - F^{-}_{Y^{(\alpha)},m}(\varepsilon) - e^{\varepsilon}(1 - F^{+}_{X^{(\alpha)},m}(\varepsilon))$,
$g^{(\alpha)-}(\varepsilon) = 1 - F^{+}_{Y^{(\alpha)},m}(\varepsilon) - e^{\varepsilon}(1 - F^{-}_{X^{(\alpha)},m}(\varepsilon))$. (3.2)

Thus, the DP guarantee of $\inf_{\alpha} \{f^{(\alpha)}\}^{**}$-DP in the form of (ε, δ(ε)) satisfies $\sup_{\alpha} \{g^{(\alpha)-}(\varepsilon)\} \le \delta(\varepsilon) \le \sup_{\alpha} \{g^{(\alpha)+}(\varepsilon)\}$, for all ε > 0. To convert the guarantee of the form (ε, δ(ε)) for all ε > 0 to the guarantee of the form (ε(δ), δ) for all δ ∈ [0, 1), we can invert the above bounds on δ(ε) and obtain bounds of the form $\varepsilon^{-}(\delta) \le \varepsilon(\delta) \le \varepsilon^{+}(\delta)$. Here $\varepsilon^{+}(\delta)$ is the largest root of the equation $\delta = \sup_{\alpha} \{g^{(\alpha)+}(\cdot)\}$, and $\varepsilon^{-}(\delta)$ is the smallest non-negative root of the equation $\delta = \sup_{\alpha} \{g^{(\alpha)-}(\cdot)\}$.

Remark 3.4. In practice, we often need to solve for those roots numerically, and we need to specify a finite range in which to find all the roots. We exemplify how to find such a range for NoisySGD in Remark B.1 in Appendix B.
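The conversion in (3.2) and the subsequent inversion can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the names `delta_bounds` and `solve_eps` are hypothetical, the CDF bounds are produced by widening the exact CDFs of a 1-GDP mechanism by a made-up slack `c`, and the bisection assumes the bound is non-increasing on the bracket so that the root is unique there.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def delta_bounds(eps, F_X_lo, F_X_hi, F_Y_lo, F_Y_hi):
    """Equation (3.2): map CDF bounds at eps to (g_minus, g_plus)."""
    g_plus = 1.0 - F_Y_lo(eps) - math.exp(eps) * (1.0 - F_X_hi(eps))
    g_minus = 1.0 - F_Y_hi(eps) - math.exp(eps) * (1.0 - F_X_lo(eps))
    return g_minus, g_plus

def solve_eps(g, delta, lo=0.0, hi=10.0, iters=100):
    """Bisection for g(eps) = delta, assuming g is non-increasing on [lo, hi]."""
    if g(lo) <= delta:
        return lo
    if g(hi) > delta:
        raise ValueError("bracket [lo, hi] does not contain a root")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# Toy input: exact PLLR CDFs of a 1-GDP mechanism, widened by a slack c.
mu, c = 1.0, 1e-3
F_X = lambda e: Phi((e + 0.5 * mu * mu) / mu)   # X ~ N(-mu^2/2, mu^2)
F_Y = lambda e: Phi((e - 0.5 * mu * mu) / mu)   # Y ~ N(+mu^2/2, mu^2)
lower = lambda F: (lambda e: max(F(e) - c, 0.0))
upper = lambda F: (lambda e: min(F(e) + c, 1.0))

g_exact = lambda e: delta_bounds(e, F_X, F_X, F_Y, F_Y)[0]
g_minus = lambda e: delta_bounds(e, lower(F_X), upper(F_X), lower(F_Y), upper(F_Y))[0]
g_plus = lambda e: delta_bounds(e, lower(F_X), upper(F_X), lower(F_Y), upper(F_Y))[1]

eps_exact = solve_eps(g_exact, 0.1)
eps_lo, eps_hi = solve_eps(g_minus, 0.1), solve_eps(g_plus, 0.1)
```

The resulting interval [eps_lo, eps_hi] brackets the exact ε(δ) in this toy case. For bounds that are not monotone, a root search over the finite range discussed in Remark 3.4 would be needed instead of plain bisection.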

4. EDGEWORTH ACCOUNTANT WITH FINITE-SAMPLE GUARANTEE

In what follows, we present a new approach, the Edgeworth Accountant, based on the Edgeworth expansion to approximate the distribution functions of the sum of PLLRs. For simplicity, we demonstrate how to obtain the Edgeworth Accountant for any trade-off function $f^{(\alpha)}$ based on a single sequence of PLLRs $\{X^{(\alpha)}_i\}_{i=1}^m$, $\{Y^{(\alpha)}_i\}_{i=1}^m$. Henceforth, we drop the superscript α when it is clear from context. Specifically, we derive an approximate Edgeworth Accountant (AEA) and the associated exact Edgeworth Accountant interval (EEAI) for f with PLLRs $\{X_i\}_{i=1}^m$, $\{Y_i\}_{i=1}^m$. We define the AEA and EEAI for general trade-off functions of the form $\inf_{\alpha} \{f^{(\alpha)}\}^{**}$ in Appendix C.

4.1. EDGEWORTH ACCOUNTANT

To approximate the CDF of a random variable $X = \sum_{i=1}^m X_i$, we introduce the Edgeworth expansion in its most general form, where the $X_i$'s are independent but not necessarily identical. Such generality allows us to account for the composition of heterogeneous DP algorithms. Suppose $\mathbb{E}[X_i] = \mu_i$ and $\gamma_{p,i} := \mathbb{E}[(X_i - \mu_i)^p] < +\infty$ for some $p \ge 4$. We define $B_m := \sqrt{\sum_{i=1}^m \mathbb{E}[(X_i - \mu_i)^2]}$ and $M_m := \sum_{i=1}^m \mu_i$, so that the standardized sum can be written as $S_m := (X - M_m)/B_m$. We denote by $E_{m,k,X}(x)$ the k-th order Edgeworth approximation of $S_m$. Note that the central limit theorem (CLT) can be viewed as the 0-th order Edgeworth approximation. The first-order Edgeworth approximation is given by adding one extra term of order $O(1/\sqrt{m})$ to the CLT, that is,

$E_{m,1,X}(x) = \Phi(x) - \frac{\lambda_{3,m}}{6\sqrt{m}} (x^2 - 1)\phi(x)$.

Here, Φ and φ are the CDF and PDF of a standard normal distribution, and $\lambda_{3,m}$ is a constant to be defined in Lemma 4.3. It is known (see, for example, Hall (2013)) that the Edgeworth approximation of order p has an error rate of $O(m^{-(p+1)/2})$. This desirable property motivates us to use the rescaled Edgeworth approximations $G_{m,k,X}(x) = E_{m,k,X}((x - M_m)/B_m)$ and $G_{m,k,Y}(x) = E_{m,k,Y}((x - M_m)/B_m)$ to approximate $F_{X,m}(x)$ and $F_{Y,m}(x)$, respectively, in (3.1). This is what we term the approximate Edgeworth Accountant (AEA).

Definition 4.1 (AEA). The k-th order AEA that defines δ(ε) for ε > 0 is given by $\delta(\varepsilon) = 1 - G_{m,k,Y}(\varepsilon) - e^{\varepsilon}(1 - G_{m,k,X}(\varepsilon))$, for all ε > 0.

Asymptotically, the AEA is an exact accountant, due to the rate of convergence the Edgeworth approximation admits. In practice, however, the finite-sample guarantee is still missing since the exact constant of such a rate is unknown. To obtain a computable (ε, δ(ε))-DP bound via (3.1), we require finite-sample bounds on the approximation error of the CDF for any finite number of iterations m.
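A first-order AEA can be sketched in a few lines for the i.i.d. subsampled Gaussian case. This is a hedged illustration, not the reference implementation from the appendices: the function names are hypothetical, only the first PLLR sequence is handled, the moments are obtained by a plain trapezoidal rule on [−12, 12], and λ_{3,m} is taken to be the skewness of a single summand, which is the standard one-term Edgeworth correction for i.i.d. sums and is an assumption here since the general definition is given in Lemma 4.3.

```python
import math

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pllr_moments(p, mu, under_q, lim=12.0, n=4800):
    """Mean, variance, third central moment of h = log(1 - p + p e^{mu x - mu^2/2})."""
    step = 2.0 * lim / n
    grid = [-lim + i * step for i in range(n + 1)]

    def h(x):
        return math.log(1.0 - p + p * math.exp(mu * x - 0.5 * mu * mu))

    def dens(x):
        # P = N(0,1); Q = (1-p) N(0,1) + p N(mu,1)
        return (1.0 - p) * phi(x) + p * phi(x - mu) if under_q else phi(x)

    def integrate(f):
        vals = [f(x) * dens(x) for x in grid]
        return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule

    mean = integrate(h)
    var = integrate(lambda x: (h(x) - mean) ** 2)
    third = integrate(lambda x: (h(x) - mean) ** 3)
    return mean, var, third

def rescaled_edgeworth(x, m, mean, var, third):
    """G_{m,1}(x) for a sum of m i.i.d. summands with the given moments."""
    s = (x - m * mean) / math.sqrt(m * var)   # (x - M_m) / B_m
    skew = third / var ** 1.5                 # assumed lambda_{3,m} (i.i.d. case)
    return Phi(s) - skew / (6.0 * math.sqrt(m)) * (s * s - 1.0) * phi(s)

def aea_delta(eps, m, p, mu):
    """First-order AEA estimate of delta^(1)(eps) from Definition 4.1."""
    g_x = rescaled_edgeworth(eps, m, *pllr_moments(p, mu, under_q=False))
    g_y = rescaled_edgeworth(eps, m, *pllr_moments(p, mu, under_q=True))
    return 1.0 - g_y - math.exp(eps) * (1.0 - g_x)
```

For instance, `aea_delta(1.0, 100, 0.1, 1.0)` estimates δ at ε = 1 after 100 compositions with p = 0.1 and σ = 1. The cost does not depend on m, since m only enters through the rescaling, which illustrates the O(1) runtime claimed for identical mechanisms. A full accountant would repeat this for the second PLLR sequence and take the maximum.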
Suppose that we can provide a finite-sample bound using the Edgeworth approximation of the form $|F_{X,m}(x) - G_{m,k,X}(x)| \le \Delta_{m,k,X}(x)$, where $\Delta_{m,k,X}(x)$ is computable. Then we have

$F^{+}_{X,m}(x) = G_{m,k,X}(x) + \Delta_{m,k,X}(x)$ and $F^{-}_{X,m}(x) = G_{m,k,X}(x) - \Delta_{m,k,X}(x)$, (4.1)

and similarly for $F_{Y,m}$. We now define the exact Edgeworth Accountant interval (EEAI).

Definition 4.2 (EEAI). The k-th order EEAI associated with privacy parameter δ(ε) for ε > 0 is given by $[\delta^{-}, \delta^{+}]$, where for all ε > 0,

$\delta^{-}(\varepsilon) \equiv 1 - G_{m,k,Y}(\varepsilon) - \Delta_{m,k,Y}(\varepsilon) - e^{\varepsilon}(1 - G_{m,k,X}(\varepsilon) + \Delta_{m,k,X}(\varepsilon))$,
$\delta^{+}(\varepsilon) \equiv 1 - G_{m,k,Y}(\varepsilon) + \Delta_{m,k,Y}(\varepsilon) - e^{\varepsilon}(1 - G_{m,k,X}(\varepsilon) - \Delta_{m,k,X}(\varepsilon))$.

To bound the EEAI, it suffices to have a finite-sample bound on $\Delta_{m,k,X}(\varepsilon)$ and $\Delta_{m,k,Y}(\varepsilon)$.
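Definition 4.2 amounts to shifting the AEA value by a slack that combines the two CDF errors. A minimal sketch, with the hypothetical function name `eeai` and with the values of G and Δ supplied by the caller:

```python
import math

def eeai(eps, G_X, G_Y, err_X, err_Y):
    """Return (delta_minus, delta_plus) from Definition 4.2.

    G_X, G_Y are the rescaled Edgeworth values at eps; err_X, err_Y are the
    corresponding error bounds Delta_{m,k,X}(eps) and Delta_{m,k,Y}(eps).
    """
    center = 1.0 - G_Y - math.exp(eps) * (1.0 - G_X)   # the AEA value
    slack = err_Y + math.exp(eps) * err_X              # error on X is scaled by e^eps
    return center - slack, center + slack

lo, hi = eeai(eps=0.0, G_X=0.5, G_Y=0.5, err_X=0.01, err_Y=0.02)
```

Writing the interval this way makes the width explicit: it equals 2(Δ_Y + e^ε Δ_X), so the error on the X side is amplified exponentially in ε.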

4.2. UNIFORM BOUND ON PLLRS

We now deal with the bound of the Edgeworth approximation on PLLRs in (4.1). Our starting point is a uniform bound of the form $\Delta_{m,k,X}(x) \le c_{m,k,X}$, for all x. The bound for $\Delta_{m,k,Y}(x)$ follows identically. To achieve this goal, we follow the analysis of the finite-sample bound in Derumigny et al. (2021). We state the bound for the first-order Edgeworth expansion.

Lemma 4.3. [...] where $\kappa_{r,j}$ is the r-th centralized cumulant of the j-th sample. With bounded moments of order four, that is, $\gamma_{4,i} < +\infty$ for $1 \le i \le m$, we have the (uniform) bound on the Edgeworth expansion

$\Delta_{m,1,X} \le \frac{0.1995\, \tilde{K}_{3,m}}{\sqrt{m}} + \frac{0.031\, \tilde{K}_{3,m}^2 + 0.195\, K_{4,m} + 0.054\, |\lambda_{3,m}|\, \tilde{K}_{3,m} + 0.038\, \lambda_{3,m}^2}{m} + r_{1,m}$,

where $K_{p,m} := m^{-1} \sum_{i=1}^m \mathbb{E}[|X_i - \mu_i|^p] / \bar{B}_m^p$ is the average standardized p-th absolute moment, and $\tilde{K}_{3,m} := K_{3,m} + \frac{1}{m} \sum_{i=1}^m \mathbb{E}|X_i - \mu_i|\, \gamma_{2,i} / \bar{B}_m^3$. Here $r_{1,m}$ is a remainder term of order $O(1/m^{5/4})$ that depends only on $\tilde{K}_{3,m}$, $K_{4,m}$ and $\lambda_{3,m}$, and is defined in Equation (H.1).

Note that this lemma deals with the first-order Edgeworth approximation, and it can be generalized to higher-order Edgeworth approximations. We present the analysis of the second and third orders in the appendices. The expression of $r_{1,m}$ only involves real integration with known constants, which can be numerically evaluated in constant time.

Remark 4.4. The precision of the EEAI depends heavily on the rate of the finite-sample bound of the Edgeworth expansion. Any better bound for higher-order Edgeworth expansions can be directly applied to our EEAI by substituting $\Delta_{m,k,X}(\varepsilon)$; here we simply demonstrate the case k = 1, leveraging the first-order expansion. Observe that Lemma 4.3 gives a bound of order $O(1/\sqrt{m})$ because we want to deal with general independent but not necessarily identical random variables. We demonstrate how one can obtain an $O(1/m)$ rate in the i.i.d. case in Appendix H.
Our current first-order bound is primarily useful when m is large enough, but a bound for higher-order Edgeworth expansions can further improve the precision for all m.

4.3. ADAPTIVE EXPONENTIAL DECAYING BOUND FOR NOISYSGD

One specific concern about the bound derived in the previous section is that it is uniform in ε. Note that in (3.1), the error in $F_{X,m}$ is amplified by the factor $e^{\varepsilon}$ in front of it. Therefore, as long as ε grows in m with order at least ε ∼ Ω(log m), the error term in (3.1) scales with order $e^{\Omega(\log m)}/O(m) = \Omega(1)$. In this section, we study the compositions of the subsampled Gaussian mechanism (including NoisySGD and many federated learning algorithms), where we are able to improve the previous bound when ε is large. Informally, omitting the dependence on m, we want to have a bound of the form $|F_{X,m}(\varepsilon) - G_{m,k,X}(\varepsilon)| = O(e^{-\varepsilon^2})$ to offset the effect of $e^{\varepsilon}$ in front of $F_{X,m}$. To this end, we first prove that the tail bound of $F_{X,m}(\varepsilon)$ is of order $O(e^{-\varepsilon^2})$, with an exact constant. Combining this with the tail behavior of the Edgeworth expansion, we conclude that the difference has the desired convergence rate.

Following the discussion in Section 3, we need to calculate the bounds for two sequences of PLLRs separately. Here we focus on the sequence of PLLRs corresponding to $(pG_{1/\sigma} + (1 - p)\mathrm{Id})^{\otimes m}$. These PLLRs are given by $X_i = \log(1 - p + pe^{\mu \xi_i - \frac{1}{2}\mu^2})$, where $\xi_i \sim N(0, 1)$. The following theorem characterizes the tail behavior of $F_{X,m}$. The tail bound of the sum of the other sequence of PLLRs, corresponding to $((pG_{1/\sigma} + (1 - p)\mathrm{Id})^{-1})^{\otimes m}$, has the same rate and can be proved similarly.

Theorem 1. There exist some positive constant a, and some associated constant η(a) > 0, such that the tail of $F_{X,m}$ can be bounded as

$1 - F_{X,m}(\varepsilon) = \mathbb{P}\left(\sum_{i=1}^m X_i \ge \varepsilon\right) \le 2 \exp\left(-\frac{(\varepsilon + m\eta)^2}{8m\tau^2}\right)$,

where $\tau^2 = \max\left\{ \frac{(\log(1 - p + pe^{\mu a - \frac{1}{2}\mu^2}) + \mu(a_+ - a) - \log(1 - p))^2}{4},\ \mu^2,\ \frac{(a_+ - a)^2 \mu^2}{2 \log(\Phi(a_+) - \Phi(a))} \right\}$ and $a_+ = \frac{\phi(a)}{1 - \Phi(a)}$.

The proof of Theorem 1 is deferred to Appendix G along with its dependent technical lemmas. From the above theorem, we know that the tail of $F_{X,m}(\varepsilon)$ is $O(e^{-\max\{\varepsilon^2/m,\, m\}}) = o(e^{-\varepsilon})$, as long as ε = o(m).
Note that in this case, the tail of the rescaled Edgeworth expansion is of the same order $O(e^{-\max\{\varepsilon^2/m,\, m\}}) = o(e^{-\varepsilon})$. Therefore, we can give a finite-sample bound of the same rate for the difference between $F_{X,m}(\varepsilon)$ and its approximation $G_{m,k,X}$ at large ε. Note that this finite-sample bound scales better than the uniform bound in Lemma 4.3 when m and ε are large.

4.4. EXTENSION TO OTHER MECHANISMS

Note that our analysis framework is applicable to a wide range of common noise-adding mechanisms. Specifically, Lemma 4.3 only requires the distribution of PLLRs to have bounded fourth moments. For many common mechanisms, a counterpart of Theorem 1 can be proved similarly. We now demonstrate how to generalize our analysis to the Laplace Mechanism.

The Laplace Mechanism. It is straightforward to verify that the trade-off function for subsampled Laplace Mechanisms is given by $\min\{(pL_\mu + (1 - p)\mathrm{Id})^{\otimes m}, ((pL_\mu + (1 - p)\mathrm{Id})^{-1})^{\otimes m}\}^{**}$, where $L_\mu = T(\mathrm{Lap}(0, 1), \mathrm{Lap}(\mu, 1))$. The two associated sequences of PLLRs $X_i$ and $Y_i$ can be expressed as

$X^{(1)}_i \equiv \log(1 - p + pe^{|\xi| - |\xi - \mu|})$, $Y^{(1)}_i \equiv \log(1 - p + pe^{|\zeta| - |\zeta - \mu|})$,

and

$X^{(2)}_i \equiv -\log(1 - p + pe^{|\zeta| - |\zeta - \mu|})$, $Y^{(2)}_i \equiv -\log(1 - p + pe^{|\xi| - |\xi - \mu|})$,

where $\xi \sim \mathrm{Lap}(0, 1)$ and $\zeta \sim p\,\mathrm{Lap}(\mu, 1) + (1 - p)\mathrm{Lap}(0, 1)$. Note that all the PLLRs are bounded and thus sub-Gaussian. This implies that we can apply Lemma 4.3 directly and also bound the tail similarly to Theorem 1.

Proposition 4.5. Denote $\eta = -\max\{\mathbb{E}(X^{(1)}_i), \mathbb{E}(X^{(2)}_i)\} > 0$. The tails of the sums of both sequences of PLLRs under the Laplace Mechanism have the following inverse exponential behavior:

$\max\left\{ \mathbb{P}\left(\sum_{i=1}^m X^{(1)}_i \ge \varepsilon\right), \mathbb{P}\left(\sum_{i=1}^m X^{(2)}_i \ge \varepsilon\right) \right\} \le \exp\left(-\frac{2(\varepsilon + m\eta)^2}{m\tau^2}\right)$,

where $\tau^2 = (\log(1 - p + pe^{\mu}) - \log(1 - p + pe^{-\mu}))^2$.
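The tail bound in Proposition 4.5 is a Hoeffding-type bound for bounded summands and is cheap to evaluate numerically. The following sketch computes η and the bound for the subsampled Laplace mechanism; the helper names are hypothetical, µ > 0 is assumed, and the expectations are computed with a composite Simpson rule on a truncated range split at the kinks 0 and µ, which is a choice made only for this illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def lap_pdf(x, loc=0.0):
    """Density of Lap(loc, 1)."""
    return 0.5 * math.exp(-abs(x - loc))

def laplace_tail_bound(eps, m, p, mu, lim=40.0):
    """Return (bound, eta) for Proposition 4.5, assuming mu > 0."""
    def h(x):
        # log-likelihood ratio of the subsampled Laplace pair at x
        return math.log(1.0 - p + p * math.exp(abs(x) - abs(x - mu)))

    def expect(g, dens):
        # integrate piecewise so that the kinks at 0 and mu sit on panel edges
        cuts = [-lim, 0.0, mu, mu + lim]
        return sum(simpson(lambda x: g(x) * dens(x), a, b)
                   for a, b in zip(cuts, cuts[1:]))

    dens_p = lap_pdf                                                 # xi ~ Lap(0,1)
    dens_q = lambda x: (1.0 - p) * lap_pdf(x) + p * lap_pdf(x, mu)   # zeta ~ mixture
    mean_x1 = expect(h, dens_p)                  # E[X^(1)]
    mean_x2 = expect(lambda x: -h(x), dens_q)    # E[X^(2)]
    eta = -max(mean_x1, mean_x2)
    tau_sq = (math.log(1.0 - p + p * math.exp(mu))
              - math.log(1.0 - p + p * math.exp(-mu))) ** 2
    bound = math.exp(-2.0 * (eps + m * eta) ** 2 / (m * tau_sq))
    return bound, eta
```

As a sanity check, with p = 1 and µ = 1 both expectations reduce to minus the KL divergence between Lap(0, 1) and Lap(1, 1), so η should be close to e^{-1}.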

5. NUMERICAL EXPERIMENTS

In this section, we illustrate the advantages of the Edgeworth Accountant by presenting plots of DP accountant curves under different settings. Specifically, we plot the privacy curve of ε against the number of compositions and compare our methods (AEA and EEAI) with existing DP accountants. We provide the implementation of our Edgeworth Accountant in Appendix D.

The AEA. We first demonstrate that our proposed approximate Edgeworth Accountant (AEA) is indeed very accurate, outperforming the existing Rényi DP and CLT approximations in experiments. The first experiment has the same setting as Figure 1(b) in Gopi et al. (2021), where the authors report that both RDP and GDP are inaccurate, whereas the second setting corresponds to a real federated learning task. The results are shown in Figure 2, where we describe the specific settings in the caption. For each sub-figure, the dotted lines "FFT_LOW" and "FFT_UPP" denote the lower and upper bounds computed by FFT (Gopi et al., 2021), which are used as the underlying ground truth. The "GDP" curve is computed by the CLT approximation (Bu et al., 2020), the "RDP" curve is computed by the moments accountant using Rényi DP with subsampling amplification (Wang et al., 2019), and the "EW_EST" curve is computed by our (second-order) AEA. As is evident from the figures, our AEA outperforms both GDP and RDP.

The EEAI. We now present the empirical performance of the EEAI obtained in Section 4.1. We still experiment with NoisySGD. Details of the experiments are in the caption of Figure 3. The two error bounds of the EEAI are represented by "EW_UPP" and "EW_LOW", and all other curves are defined as in the previous setting. In addition to their optimal time complexity, our analytical finite-sample bounds also achieve better numerical stability for large m in many cases. See Appendix for details.
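For readers who wish to reproduce a baseline of the kind labeled "GDP" above, the following Python sketch converts a $\mu$-GDP guarantee to an $(\varepsilon, \delta)$ curve using the standard duality $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\Phi(-\varepsilon/\mu - \mu/2)$ from Dong et al. (2022). The CLT parameter $\mu = p\sqrt{m(e^{1/\sigma^2} - 1)}$ is written as we recall it from Bu et al. (2020) and should be treated as an assumption to verify against that reference. The function names and the bisection routine are hypothetical, and this is not the implementation of Appendix D.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gdp_delta(eps: float, mu: float) -> float:
    """delta(eps) for mu-GDP: Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2)."""
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))


def gdp_epsilon(delta: float, mu: float, hi: float = 100.0) -> float:
    """Smallest eps in [0, hi] with gdp_delta(eps, mu) <= delta, found by bisection."""
    if gdp_delta(0.0, mu) <= delta:
        return 0.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def clt_mu(p: float, sigma: float, m: int) -> float:
    """Assumed CLT parameter for m compositions: p * sqrt(m * (exp(1 / sigma^2) - 1))."""
    return p * math.sqrt(m * (math.exp(1.0 / sigma ** 2) - 1.0))


if __name__ == "__main__":
    # Setting of the left panel of Figure 2: p = 0.01, sigma = 0.8, delta = 0.015.
    for m in (500, 1000, 2000, 4000):
        mu = clt_mu(0.01, 0.8, m)
        print(f"m={m:5d}  mu={mu:.3f}  eps={gdp_epsilon(0.015, mu):.3f}")
```

Bisection is used because $\delta(\varepsilon)$ is decreasing in $\varepsilon$ for fixed $\mu$. The default upper bracket `hi = 100.0` is an arbitrary choice and would need to be enlarged for very large $\mu$.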

6. CONCLUSION

In this paper, we provide a novel way to efficiently evaluate the composition of f -DP, which serves as a general framework for constructing DP accountants based on approximations to PLLRs. Specifically, we introduce the Edgeworth Accountant, an efficient approach to composing DP algorithms via Edgeworth approximation. In contrast, existing privacy accountant algorithms either fail to provide a finite-sample bound, or only achieve polynomial runtime for general compositions. Importantly, our approach complements the existing literature when the number of compositions is large, which is typical in applications such as large-scale deep learning and federated analytics.



Here, "sample" refers to the number of compositions of DP algorithms. From now on we use the term "finite-sample" to mean that the bound is non-asymptotic in the number of compositions.

The number of iterations can be small for a single analytical task. However, in most practical cases, many statistical tasks are performed on the same base of users which leads to a large number of total iterations.

For completeness, we explicitly require that all $\xi_i$ and $\zeta_i$ be independent.



Figure 1: The comparison between the GDP approximation in Dong et al. (2022), and our Edgeworth Accountant. Both methods start from the exact composition using f -DP. Upper: Dong et al. (2022) uses a CLT type approximation to get a GDP approximation to the f -DP guarantee, then converts it to (ε, δ)-DP via duality (Fact 1). Lower: We losslessly convert the f -DP guarantee to an exact (ε, δ(ε))-DP guarantee, with δ(ε) defined with PLLRs in (3.1), and then take the Edgeworth approximation to numerically compute the (ε, δ)-DP.

a relationship between f -DP and a collection of (ε, δ(ε))-DP, which reflects the primal-dual relationship between them. Note that some special forms of this general proposition have been proved previously in terms of privacy loss random variables. See Balle & Wang (2018); Zhu et al. (2021); Gopi et al. (2021) for details.

Lemma 4.3. Define the average individual standard deviation Bm := B m / √ m and the average standardized r-th cumulant as λ k,m := 1 m m j=1 k r,j / B3

Figure 2: The privacy curve computed via several different accountants. Left: The setting in Figure 1(b) in Gopi et al. (2021), where $p = 0.01$, $\sigma = 0.8$, and $\delta = 0.015$. Middle and Right: The setting of a real application task in federated learning for 10 epochs, with $p = 0.05$, $\sigma = 1$, and $\delta = 10^{-5}$. Here, "EW_EST" is the estimate given by our approximate Edgeworth accountant. We omit the RDP curve in the middle subfigure for better comparisons with others.

Figure 3: We demonstrate the comparisons between our Edgeworth accountant (both AEA and EEAI), the RDP accountant, and the FFT accountant (whose precision of ε is set to be 0.1). The three settings are chosen so that the privacy guarantees do not change dramatically as m increases. Specifically, in all three settings, we set δ = 0.1, σ = 0.8, and p = 0.4/√m (left), p = 1/√m log m (middle), and p = 0.1 log m/m (right). We omit the GDP curve here, because its performance is fairly close to the AEA ("EW_EST" curve) when m is large.

