PRIVATE DATA STREAM ANALYSIS FOR UNIVERSAL SYMMETRIC NORM ESTIMATION

Abstract

We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise permutations on an input data stream and include $L_p$ norms, $k$-support norms, top-$k$ norms, and the box norm as special cases. Although it may be possible to design and analyze a separate mechanism for each symmetric norm, we propose a general parametrizable framework that differentially privately releases a number of sufficient statistics from which the approximation of all symmetric norms can be simultaneously computed. Our framework partitions the coordinates of the underlying frequency vector into different levels based on their magnitude; it releases approximate frequencies for the "heavy" coordinates in important levels and approximate level sizes for the "light" coordinates in important levels. Surprisingly, our mechanism allows for the release of an arbitrary number of symmetric norm approximations without any overhead or additional loss in privacy. Moreover, our mechanism permits a $(1+\alpha)$-approximation to each of the symmetric norms and can be implemented using sublinear space in the streaming model for many regimes of the accuracy and privacy parameters.

1. INTRODUCTION

The family of $L_p$ norms represents important statistics on an underlying dataset, where the $L_p$ norm of an $n$-dimensional frequency vector $x$ is defined as the number of nonzero coordinates of $x$ for $p = 0$ and $L_p(x) = (x_1^p + \ldots + x_n^p)^{1/p}$ for $p > 0$. Thus, the $L_0$ norm counts the number of distinct elements in the dataset and is used, e.g., to detect denial of service or port scan attacks in network monitoring (Akella et al., 2003; Estan et al., 2003), to understand the magnitude of quantities such as search engine queries or internet graph connectivity in data mining (Palmer et al., 2001), to manage workload in database design (Finkelstein et al., 1988), and to select a minimum-cost query plan in query optimization (Selinger et al., 1979). The $L_1$ norm computes the total number of elements in the dataset and is used, e.g., for data mining (Cormode et al., 2005) and hypothesis testing (Indyk & McGregor, 2008), while the $L_2$ norm is used, e.g., for training random forests in machine learning (Breiman, 2001), computing the Gini index in statistics (Lorenz, 1905; Gini, 1912), and network anomaly detection in traffic monitoring (Krishnamurthy et al., 2003; Thorup & Zhang, 2004). Consequently, $L_p$ estimation has been extensively studied in the data stream model (Alon et al., 1999; Indyk & Woodruff, 2005; Indyk, 2006; Li, 2008; Kane et al., 2011; Andoni, 2017; Braverman et al., 2018b; Ganguly & Woodruff, 2018; Woodruff & Zhou, 2020; 2021). The simplest streaming model is perhaps the insertion-only model, in which a sequence of $m$ updates increments coordinates of an $n$-dimensional frequency vector $x$ and the goal is to compute or approximate some statistic of $x$ in space that is sublinear in both $m$ and $n$. In many cases, the underlying dataset contains sensitive information that should not be leaked.
Hence, an active line of work has focused on estimating $L_p$ norms for various values of $p$ while preserving differential privacy (Mir et al., 2011; Blocki et al., 2012; Smith et al., 2020; Bu et al., 2021; Wang et al., 2021).

Definition 1.1 (Differential privacy) (Dwork et al., 2006). Given $\varepsilon > 0$ and $\delta \in (0, 1)$, a randomized algorithm $\mathcal{A} : U^* \to Y$ is $(\varepsilon, \delta)$-differentially private if, for every pair of neighboring streams $S$ and $S'$ and for all $E \subseteq Y$, $\Pr[\mathcal{A}(S) \in E] \le e^{\varepsilon} \cdot \Pr[\mathcal{A}(S') \in E] + \delta$, where streams $S$ and $S'$ are neighboring if there exists a single update $i \in [m]$ such that $u_i \neq u'_i$, where $u_1, \ldots, u_m$ are the updates of $S$ and $u'_1, \ldots, u'_m$ are the updates of $S'$.

For example, Blocki et al. (2012) showed that the Johnson-Lindenstrauss transformation preserves differential privacy (DP), thereby showing that one of the main techniques in the streaming model for $L_2$ estimation already guarantees DP. Similarly, Smith et al. (2020) showed that the Flajolet-Martin sketch, which is one of the main approaches for $L_0$ estimation in the streaming model, also preserves DP. Notably, algorithmic designs for $L_p$ estimation in the streaming model differ greatly and require individual analysis to ensure DP, which can be quite difficult due to the complexity of the various techniques. This is especially pronounced in the work of Wang et al. (2021), who studied the $p$-stable sketch that estimates the $L_p$ norm for $p \in (0, 2]$ (Indyk, 2006)$^1$. Wang et al. (2021) showed that for $p \in (0, 1]$, the $p$-stable sketch preserves DP, but were unable to show DP for $p \in (1, 2]$, even though the general algorithmic approach remains the same. Thus the natural question is whether differential privacy can be guaranteed for an approach that simultaneously estimates the $L_p$ norm in the streaming model for all $p$. More generally, the $L_p$ norms are all symmetric norms, which are invariant under sign-flips and coordinate-wise permutations on an input data stream.
Symmetric norms thus also include other important families of norms such as the $k$-support norms and the top-$k$ norms. In this paper, we show that not only does there exist a differentially private algorithm for the estimation of symmetric norms in the streaming model, but also that there exists an algorithm that privately releases a set of statistics from which estimates of all (properly parametrized) symmetric norms can be simultaneously computed. To illustrate the difference, suppose we wanted to release approximations of the $L_p$ norm of the stream for $k$ different values of $p$. To guarantee $(\varepsilon, \delta)$-DP for the set of $k$ statistics, we would need, by advanced composition, to demand $\left(O\left(\frac{\varepsilon}{\sqrt{k}}\right), O\left(\frac{\delta}{k}\right)\right)$-DP from each of $k$ instances of a single differentially private $L_p$-estimation algorithm, corresponding to the $k$ different values of $p$. Due to accuracy-privacy tradeoffs, the quality of the estimation will degrade severely as $k$ increases. By comparison, our algorithm releases a single set $C$ of private statistics. By post-processing, we can then estimate the $L_p$ norms for $k$ different values of $p$ while only requiring $(\varepsilon, \delta)$-DP from $C$. Hence, our algorithm can simultaneously handle an arbitrarily large number of symmetric norm estimations without compromising the quality of approximation.

Theorem 1.2. There exists an $(\varepsilon, \delta)$-differentially private algorithm that outputs a set $C$, from which a $(1+\alpha)$-approximation to any norm with maximum modulus of concentration at most $M$ can be computed, with probability at least $1 - \delta$. The algorithm uses $M^2 \cdot \mathrm{poly}\left(\frac{1}{\alpha}, \frac{1}{\varepsilon}, \log n, \log\frac{1}{\delta}\right)$ bits of space.

The maximum modulus of concentration of a norm measures the worst-case ratio of the maximum value to the median value of the norm on the $L_2$-unit sphere over any restriction of the coordinates, and intuitively quantifies the complexity of computing the norm. For example, the $L_1$ norm is generally "easy" to compute and has maximum modulus of concentration $O(\log n)$.
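The composition comparison in this section can be made concrete with a small calculation. The sketch below uses the standard advanced composition bound (a $k$-fold composition of $(\varepsilon_0, \delta_0)$-DP mechanisms is $(\varepsilon', k\delta_0 + \delta')$-DP with $\varepsilon' = \varepsilon_0\sqrt{2k\ln(1/\delta')} + k\varepsilon_0(e^{\varepsilon_0} - 1)$). The function names and the particular per-query choice of $\varepsilon_0$ are illustrative assumptions, not part of the paper's algorithm.

```python
import math

def advanced_composition_epsilon(eps0, k, delta_prime):
    """Total epsilon after composing k mechanisms that are each eps0-DP,
    using the standard advanced composition bound with slack delta_prime."""
    return (eps0 * math.sqrt(2 * k * math.log(1 / delta_prime))
            + k * eps0 * (math.exp(eps0) - 1))

def per_query_epsilon(eps_total, k, delta_prime):
    """One illustrative per-query budget of order eps_total / sqrt(k).
    (Assumption: the constant 2 leaves room for the lower-order term.)"""
    return eps_total / (2 * math.sqrt(2 * k * math.log(1 / delta_prime)))

if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        eps0 = per_query_epsilon(1.0, k, 1e-6)
        total = advanced_composition_epsilon(eps0, k, 1e-6)
        print(f"k={k:5d}  per-query eps={eps0:.5f}  composed eps={total:.3f}")
```

The per-query budget shrinks like $1/\sqrt{k}$, so each of the $k$ separate estimators must add more noise as $k$ grows; releasing one private set $C$ and post-processing avoids this shrinkage.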
We emphasize that prior to our work, there was no algorithm that could handle private symmetric norm estimation, much less simultaneously for all parametrized symmetric norms. Although there is specific analysis for various norm estimation algorithms (see the discussion of related work in the supplementary material), these algorithms require a specific predetermined norm as their input. Thus a separate private algorithm must be run for each estimation, which increases the overall space. Moreover, for a large number of queries, the privacy parameter of each algorithm must be much smaller due to the composition of privacy, and thus the utility of each algorithm is provably poor. Our algorithm sidesteps both the space and accuracy problems and is the first and only work to do so.

Applications. We briefly describe a number of specific symmetric norms that are handled by Theorem 1.2 and commonly used across various applications in machine learning. We first note the following parameterization of the previously discussed $L_p$ norms.

Lemma 1.3 (Milman & Schechtman, 2009; Klartag & Vershynin, 2007). For $L_p$ norms, we have that $\mathrm{mmc}(L) = O(\log n)$ for $p \in [1, 2]$ and $\mathrm{mmc}(L) = O(n^{1/2 - 1/p})$ for $p > 2$.

Thus our algorithm immediately yields a differentially private mechanism for the approximation of $L_p$ norms that, unlike previous work, e.g., (Blocki et al., 2012; Sheffet, 2019; Choi et al., 2020; Smith et al., 2020; Bu et al., 2021; Wang et al., 2021), does not need separate analysis for specific values of $p$. Moreover, for constant-factor approximation, the space complexity matches, up to polylogarithmic factors, that of the optimal $L_p$-approximation algorithms that do not consider privacy (Kane et al., 2010; Li & Woodruff, 2013; Ganguly, 2015; Woodruff & Zhou, 2021).

Definition 1.4 ($Q$-norm and $Q'$-norm). We call a norm $L$ a $Q$-norm if there exists a symmetric norm $L'$ such that $L(x) = L'(x^2)^{1/2}$ for all $x \in \mathbb{R}^n$. Here, we use $x^2$ to denote the coordinate-wise square of $x$. We also call a norm $L'$ a $Q'$-norm if its dual norm is a $Q$-norm.
The family of $Q'$-norms includes the $L_p$ norms for $1 \le p \le 2$, the $k$-support norm, and the box norm (Bhatia, 2013), and thus $Q'$-norms have been proposed to regularize sparse recovery problems in machine learning. For instance, Argyriou et al. (2012) showed that $Q'$-norms have tighter relaxations than elastic nets and can thus be more effective for sparse prediction. Similarly, McDonald et al. (2014) used $Q'$-norms to optimize sparse prediction algorithms for multitask clustering.

Lemma 1.5 (Blasiok et al., 2017). $\mathrm{mmc}(L) = O(\log n)$ for every $Q'$-norm $L$.

Theorem 1.2 and Lemma 1.5 thus present a differentially private algorithm for $Q'$-norm approximation that uses polylogarithmic space.

Definition 1.6 (Top-$k$ norm). The top-$k$ norm of a vector $x \in \mathbb{R}^n$ is the sum of the largest $k$ coordinates of $|x|$, where we use $|x|$ to denote the coordinate-wise absolute value of $x$.

The top-$k$ norm is frequently used to understand the more general Ky Fan $k$-norm (Wu et al., 2014), which is used to regularize optimization problems in numerical linear algebra. The Ky Fan $k$-norm is defined as the sum of the $k$ largest singular values of a matrix, so the top-$k$ norm is equivalent to the Ky Fan $k$-norm when the input vector $x$ is the vector of singular values of the matrix. In particular, the top-$k$ norm of a vector of singular values when $k = n$ is equivalent to the Schatten-1 norm of a matrix, which is a common metric for matrix fitting problems such as low-rank approximation (Li & Woodruff, 2020).
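As a small illustration of the norms discussed in this section, the following sketch evaluates $L_p$ norms and the top-$k$ norm of Definition 1.6 on an explicit vector and empirically checks invariance under sign flips and permutations. It is a plain, non-private, non-streaming computation; the helper names are hypothetical and chosen only for this example.

```python
import random

def lp_norm(x, p):
    """L_p norm for p > 0; for p = 0, the number of nonzero coordinates."""
    if p == 0:
        return sum(1 for v in x if v != 0)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def top_k_norm(x, k):
    """Sum of the k largest coordinates of |x| (Definition 1.6)."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

def looks_symmetric(norm, x, trials=20, seed=0):
    """Empirically check invariance under random sign flips and permutations."""
    rng = random.Random(seed)
    base = norm(x)
    for _ in range(trials):
        y = [v * rng.choice((-1, 1)) for v in x]
        rng.shuffle(y)
        if abs(norm(y) - base) > 1e-9 * max(1.0, abs(base)):
            return False
    return True

if __name__ == "__main__":
    x = [3, -1, 0, 4, -2]
    print(lp_norm(x, 0), lp_norm(x, 1), lp_norm(x, 2), top_k_norm(x, 2))
    print(looks_symmetric(lambda v: top_k_norm(v, 3), x))
```

A function that depends on coordinate positions, such as the first coordinate alone, fails the same check, which is what separates symmetric norms from general ones.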

2. PRELIMINARIES

In this section, we introduce definitions and simple or well-known results from differential privacy, sketching algorithms, and symmetric norms. For notation, we use $[n]$ for an integer $n > 0$ to denote the set $\{1, \ldots, n\}$. We also use the notation $\mathrm{poly}(n)$ to represent a constant-degree polynomial in $n$, and we say an event occurs with high probability if the event holds with probability $1 - \frac{1}{\mathrm{poly}(n)}$. Similarly, we use $\mathrm{polylog}(n)$ to denote $\mathrm{poly}(\log n)$.

Sketching algorithms. We define the $L_2$ norm of a vector $x \in \mathbb{R}^n$ by $L_2(x) = \sqrt{x_1^2 + \ldots + x_n^2}$. Given a frequency vector $x \in \mathbb{R}^n$ on a data stream, the AMS algorithm for $L_2$-estimation first generates a sign vector $\sigma \in \{-1, +1\}^n$ and sets $S_1 = (\langle \sigma, x \rangle)^2$. The AMS algorithm then repeats this process $b = \frac{6}{\alpha^2}$ independent times to obtain squared dot products $S_1, \ldots, S_b$, sets $Z^2$ to be the arithmetic mean of $S_1, \ldots, S_b$, and reports $Z$.

Definition 2.1 ($\nu$-approximate $\eta$ $L_2$-heavy hitters). Given an accuracy parameter $\nu \in (0, 1)$, a threshold parameter $\eta$, and a frequency vector $x \in \mathbb{R}^n$, compute a set $H$ and a set of approximations $\hat{x}_k$ for all $k \in H$ such that:
(1) If $x_k \ge \eta L_2(x)$ for any $k \in [n]$, then $k \in H$, so that $H$ contains all $\eta$ $L_2$-heavy hitters of $x$.
(2) There exists a universal constant $C > 0$ so that if $x_k \le \frac{C\eta}{2} L_2(x)$ for any $k \in [n]$, then $k \notin H$, so that $H$ does not contain any index that is not a $\frac{C\eta}{2}$ $L_2$-heavy hitter of $x$.
(3) If $k \in H$ for any $k \in [n]$, then compute a $(1 \pm \nu)$-approximation to the frequency $x_k$, i.e., a value $\hat{x}_k$ such that $(1 - \nu) x_k \le \hat{x}_k \le (1 + \nu) x_k$.

We introduce and use a private variant PRIVCOUNTSKETCH of the well-known COUNTSKETCH algorithm (Charikar et al., 2004) by adding noise to each coordinate and then using a standard private threshold routine to ensure differential privacy. Specifically, PRIVCOUNTSKETCH first uses the COUNTSKETCH data structure to obtain an estimated frequency for each coordinate.
It then adds Laplacian noise with scale parameter $O\left(\frac{1}{\eta^2\nu^2}\right)$ to each estimated frequency, acquires a threshold $T$ from the $L_2$ norm estimation algorithm AMS, and releases all coordinates (and estimated frequencies) whose estimated frequencies are at least $\frac{\nu\eta T}{2} + X$, where $X$ is Laplacian noise with scale parameter $O\left(\frac{1}{\eta^2\nu^2}\right)$. Then PRIVCOUNTSKETCH gives the following guarantees:

Lemma 2.2. There exists a one-pass streaming algorithm PRIVCOUNTSKETCH that takes an accuracy parameter $\nu \in (0, 1)$ and a threshold parameter $\eta^2$ and outputs a list $H$ that contains all indices $k \in [n]$ of an underlying frequency vector $x$ with $x_k \ge \eta L_2(x)$ and no index $k \in [n]$ with $x_k \le \eta(1 - \nu) L_2(x)$. For each $k \in H$, PRIVCOUNTSKETCH also reports an estimated frequency $\hat{x}_k$ such that $(1 - \nu) x_k - O\left(\frac{\log m}{\eta\nu}\right) \le \hat{x}_k \le (1 + \nu) x_k + O\left(\frac{\log m}{\eta\nu}\right)$. The algorithm uses $O\left(\frac{1}{\eta^2\nu^2} \log^2 n\right)$ bits of space and succeeds with probability $1 - \frac{1}{\mathrm{poly}(m)}$.

Symmetric norms. We now introduce preliminaries for symmetric norms.

Definition 2.3 (Symmetric norm). A function $L : \mathbb{R}^n \to \mathbb{R}$ is a symmetric norm if $L$ is a norm and for all $x \in \mathbb{R}^n$ and any vector $y \in \mathbb{R}^n$ that is a permutation of the coordinates of $x$, we have $L(x) = L(y)$. Moreover, we have $L(x) = L(|x|)$, where $|x|$ is the coordinate-wise absolute value of $x$.

Definition 2.4 (Modulus of concentration). Let $x \in \mathbb{R}^n$ be a random variable drawn from the uniform distribution on the $L_2$-unit sphere $S^{n-1}$ and let $b_L$ denote the maximum value of $L(x)$ over $S^{n-1}$. The median of a symmetric norm $L$ is the unique value $M_L$ such that $\Pr[L(x) \ge M_L] \ge \frac{1}{2}$ and $\Pr[L(x) \le M_L] \ge \frac{1}{2}$. Then the ratio $\mathrm{mc}(L) := \frac{b_L}{M_L}$ is the modulus of concentration of the norm $L$.

Although the modulus of concentration quantifies the "average" behavior of the norm $L$ on $\mathbb{R}^n$, norms with challenging behavior can still be embedded in lower-dimensional subspaces.
For instance, the $L_1$ norm satisfies $\mathrm{mc}(L) = O(1)$, but when $x \in \mathbb{R}^n$ has fewer than $\sqrt{n}$ nonzero coordinates, the norm $\max(L_\infty(x), L_1(x)/\sqrt{n})$ on the unit ball becomes identically $L_\infty(x)$ (Blasiok et al., 2017), which requires $\Omega(\sqrt{n})$ space (Alon et al., 1999) to estimate. Hence, we further quantify the behavior of a norm $L$ by examining its behavior on all lower dimensions.

Definition 2.5 (Maximum modulus of concentration). For a norm $L : \mathbb{R}^n \to \mathbb{R}$ and every $k \le n$, define the norm $L^{(k)} : \mathbb{R}^k \to \mathbb{R}$ by $L^{(k)}((x_1, \ldots, x_k)) := L((x_1, \ldots, x_k, 0, \ldots, 0))$. Then the maximum modulus of concentration of the norm $L$ is $\mathrm{mmc}(L) := \max_{k \le n} \mathrm{mc}(L^{(k)}) = \max_{k \le n} \frac{b_{L^{(k)}}}{M_{L^{(k)}}}$.

Definition 2.6 (Important levels). For $x \in \mathbb{R}^n$ and $\xi > 1$, we define level $i$ as the set $B_i = \{k \in [n] : \xi^{i-1} \le |x_k| \le \xi^i\}$. We define $b_i := |B_i|$ as the size of level $i$. For $\beta \in (0, 1]$, we say level $i$ is $\beta$-important if $b_i > \beta \sum_{j > i} b_j$ and $b_i \xi^{2i} \ge \beta \sum_{j \le i} b_j \xi^{2j}$.

Informally, level $i$ is $\beta$-important if (1) its size is at least a $\beta$-fraction of the total size of the higher levels and (2) its contribution is roughly a $\beta$-fraction of the total contribution of all the lower levels. We would like to show that to approximate a symmetric norm $L(x)$, it suffices to identify the $\beta$-important levels and their sizes for a fixed base $\xi > 1$.

Lemma 2.7 (Blasiok et al., 2017). Let $s = O(\log n)$. If a level $i$ is $\beta$-important, then either $\xi^{2i} \ge \frac{\alpha^2\beta\varepsilon^2}{\log^2 m} F_2(x)$ or there exists $j \in [s]$ such that $b_i \ge \frac{2^j \log^2 m}{\alpha^2\varepsilon^2}$ and $\xi^{2i} \in \left[\frac{\alpha^2\beta\varepsilon^2}{\log^2 m} \cdot \frac{F_2(x)}{2^j}, \frac{\alpha^2\beta\varepsilon^2}{\log^2 m} \cdot \frac{F_2(x)}{2^{j-1}}\right]$.

Lemma 2.7 implies that if level $i$ is $\beta$-important, then either (1) it will be identified by using PRIVCOUNTSKETCH with threshold $\frac{\alpha^2\beta}{\log^2 m}$ on the stream or (2) its contribution can be well-approximated by using PRIVCOUNTSKETCH with threshold $\frac{\alpha^2\beta\varepsilon^2}{\log^2 m}$ on a substream formed by sampling coordinates of the universe with probability $\frac{1}{2^j}$.
We thus split our algorithm and analysis to handle these cases. In particular, we call a frequency level $i$ "high" if $\xi^{2i} \ge \frac{\alpha^2\beta\varepsilon^2}{\log^2 m} F_2(x)$. We call a frequency level $i$ "medium" if $\xi^{2i} \ge \frac{\alpha^2\beta'\varepsilon^2}{2^j} F_2(x) > T$ and $b_i \ge O\left(\frac{2^j \log^2 m}{\alpha^2\varepsilon^2}\right)$ for a certain $\beta' > 0$ and a threshold $T$. We call a frequency level $i$ "low" if $\xi^{2i} \ge \frac{\alpha^2\beta'\varepsilon^2}{2^j} F_2(x)$ and $b_i \ge O\left(\frac{2^j \log^2 m}{\alpha^2\varepsilon^2}\right)$, but $T \ge \frac{\alpha^2\beta'\varepsilon^2}{2^j} F_2(x)$.
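To make Definition 2.6 concrete, the following sketch partitions an explicit frequency vector into levels and tests the two $\beta$-importance conditions directly. It assumes positive integer frequencies and uses half-open levels $[\xi^{i-1}, \xi^i)$ so that no coordinate is counted twice; both choices, and all function names, are assumptions made for this illustration rather than details taken from the paper.

```python
from collections import Counter

def level_of(a, xi):
    """Index i with xi**(i-1) <= a < xi**i, for a >= 1 (half-open by assumption)."""
    i = 1
    while xi ** i <= a:
        i += 1
    return i

def level_sizes(x, xi):
    """b_i = number of nonzero coordinates of x that fall in level i."""
    return Counter(level_of(abs(v), xi) for v in x if v != 0)

def important_levels(sizes, xi, beta):
    """Levels i with b_i > beta * sum_{j>i} b_j and
    b_i * xi^(2i) >= beta * sum_{j<=i} b_j * xi^(2j)."""
    out = []
    for i, b_i in sizes.items():
        higher = sum(b for j, b in sizes.items() if j > i)
        lower = sum(b * xi ** (2 * j) for j, b in sizes.items() if j <= i)
        if b_i > beta * higher and b_i * xi ** (2 * i) >= beta * lower:
            out.append(i)
    return sorted(out)

if __name__ == "__main__":
    x = [1] * 100 + [8]          # many light coordinates and one heavy one
    sizes = level_sizes(x, xi=2)
    print(dict(sizes))
    print(important_levels(sizes, xi=2, beta=0.5))
    print(important_levels(sizes, xi=2, beta=0.3))
```

In this toy vector the single heavy coordinate forms an important level only for the smaller value of $\beta$, while the large level of light coordinates is important in both cases.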

3. ALGORITHMIC INTUITION AND OVERVIEW

In this section, we give a brief technical overview of both our algorithmic intuition and how our approach differs from previous (non-private) work. We defer the full proofs and additional discussion of related work to the supplementary material. Our starting point is the $L_p$ estimation algorithm of Indyk & Woodruff (2005), which was parametrized by Blasiok et al. (2017) to handle symmetric norms. For a $(1+\alpha)$-approximation, the algorithm partitions the $n$ coordinates of the frequency vector $x$ into levels by powers of $\xi$ based on their magnitudes, where $\xi > 1$ is a fixed function of $\alpha$. Each partition forms a level set, but (Indyk & Woodruff, 2005; Blasiok et al., 2017) showed that it suffices to accurately count the size of each important level set and zero out the other level sets, where a level set is considered important if its size is large enough to contribute an $\frac{\alpha^2}{\log m}$ fraction of the symmetric norm.

Private symmetric norm estimation in the centralized setting. To preserve $(\varepsilon, \delta)$-differential privacy, one initial approach would be to treat the statistics as a histogram and add Laplacian noise with scale $O\left(\frac{1}{\varepsilon}\right)$ to the frequency of each element. However, the level sets consisting of elements with frequencies in $[\xi^i, \xi^{i+1})$ for small $i$, say $i = 0$, could be largely perturbed by such Laplacian noise. Fortunately, if $i$ is small, the corresponding level set must contain a large number of elements if it is important, so it seems possible to privately release the size $\Gamma_i$ of the level set. Indeed, we can show that the $L_1$ sensitivity of the vector of level set sizes is small, and so we can add Laplacian noise with scale $O\left(\frac{1}{\varepsilon}\right)$ to each level set size. Hence if the level set has size $\Gamma_i$ roughly $\Omega\left(\frac{1}{\alpha\varepsilon}\right)$, then the Laplacian noise will change $\Gamma_i$ by only a $(1+\alpha)$-factor. Unfortunately, there can be level sets that are both important and small in size.
For example, if there is a single element with frequency $m$, then the size of the corresponding level set is just one. Then adding Laplacian noise with scale $O\left(\frac{1}{\varepsilon}\right)$ will severely affect the size of the level set and thus the estimation of the symmetric norm. On the other hand, for $m > \frac{1}{\alpha\varepsilon}$, the frequency of the coordinate is quite large, so again it seems that we can just add Laplacian noise with scale $O\left(\frac{1}{\varepsilon}\right)$ and output the noisy frequency of the coordinate.

New approach: classifying and separately handling high, medium, and low frequency levels. The main takeaway from these challenges is that we should handle different level sets separately. We partition the levels into three groups after defining thresholds $T_1$ and $T_2$, with $T_1 > T_2$. We define the "high frequency levels" as the levels whose coordinates exceed $T_1$ in frequency. The intuition is that because the high frequency levels have such large magnitude, their frequencies can be well-approximated by running an $L_2$-heavy hitters algorithm on the stream $S$. We define the "medium frequency levels" as the levels whose coordinates are between $T_1$ and $T_2$ in frequency. These coordinates are not large enough to be detected by running an $L_2$-heavy hitters algorithm on the stream $S$. However, the sizes of these level sets must be large if the level set is important. Thus there exists a substream $S_j$ in which a large number of these coordinates are subsampled, and their frequencies can be well-approximated by running an $L_2$-heavy hitters algorithm on the substream $S_j$. Finally, we define the "low frequency levels" as the levels whose coordinates are less than $T_2$ in frequency. These coordinates are small enough that we cannot add Laplacian noise to their frequencies without affecting the level sets they are mapped to. Instead, we show that the $L_1$ sensitivity of the level set estimations is particularly small for the low frequency levels.
Thus, for these frequency levels, we report the sizes of the frequency levels rather than the identities of the heavy hitters. We remark that if our goal were just to approximate the symmetric norms without preserving differential privacy, then it would suffice to consider only the high and medium frequency levels, since the low frequency levels are particularly problematic only when Laplacian noise is added to the frequency vector. We also remark that we use the thresholds $T_1$ and $T_2$ only for the purposes of describing our algorithm; in the actual implementation of the algorithm, the thresholds $T_1$ and $T_2$ will be implicitly defined by each of the substreams. We summarize our new approach in Figure 1.

Private symmetric norm estimation in the streaming model. Although the previously discussed intuition builds towards a working algorithm, the main caveat is that so far we have mainly discussed the centralized model, where space is not restricted and so each coordinate, and thus each level set size, can be counted exactly. In the streaming model, we cannot explicitly track the frequency vector, or even the frequencies of a constant fraction of coordinates. Instead, to estimate the sizes of each level set, (Indyk & Woodruff, 2005; Blasiok et al., 2017) take the stream $S$ and form $s = O(\log n)$ substreams $S_1, \ldots, S_s$, where the $j$-th substream is created by sampling the universe of size $n$ at a rate of $\frac{1}{2^{j-1}}$. Then $S_j$ will only consist of the stream updates to the particular coordinates of $x$ that are sampled. Thus in expectation, the frequency vector induced by $S_j$ will have sparsity $\frac{\|x\|_0}{2^{j-1}}$. Similarly, if a level set $i$ has size $\Gamma_i$, then $\frac{\Gamma_i}{2^{j-1}}$ of its members will be sampled in $S_j$ in expectation.
It can then be shown through a variance argument that if level set $i$ is important, then there exists an explicit substream $j$ from which $\Gamma_i$ can be well-approximated using the $L_2$-heavy hitter algorithm COUNTSKETCH and, as a result, the symmetric norm of $x$ can be well-approximated. The main point of the subsampling approach is that if there exists a level set with large size consisting of small coordinates, then the coordinates will not be detected by COUNTSKETCH on $S$, but because $S_j$ has significantly smaller $L_2$ norm, the coordinates will be detected by COUNTSKETCH on $S_j$. However, adapting the subsampling and heavy-hitter approach introduces additional challenges for privacy. For instance, we can analyze the $L_2$-heavy hitter algorithm COUNTSKETCH and show that although the $L_1$ sensitivity of the estimated frequency for a single coordinate is small, the $L_1$ sensitivity of the estimated frequencies for all the coordinates is large. Instead, we use the view that COUNTSKETCH is a composition function that first only estimates frequencies for the top $\mathrm{poly}\left(\frac{1}{\alpha}, \frac{1}{\varepsilon}, \log n\right)$ coordinates and then outputs only those estimates that are above a certain threshold. Similarly, the Laplacian noise added to privately use COUNTSKETCH can alter the sizes of a significant number of level sets for small coordinates. Thus for the small coordinates (corresponding to the substreams $S_j$ with large $j$), we invoke COUNTSKETCH with much higher accuracy, so that with high probability, it will return exactly the frequencies of the small coordinates. For example, note that if the frequency $f_k$ of a coordinate $k \in [n]$ is at most $\frac{1}{2\alpha^2\varepsilon}$, then any $(1 + \alpha^2\varepsilon)$-approximation to $f_k$ has additive error at most $\alpha^2\varepsilon \cdot f_k \le \frac{1}{2}$ and can therefore be rounded to exactly recover $f_k$. This decreases the $L_1$ sensitivity of the vector of estimated level set sizes, therefore allowing us to add Laplacian noise without greatly affecting the quality of approximation.
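The rounding observation at the end of this section is easy to check numerically. The sketch below is a minimal illustration under the assumption of positive integer frequencies: it verifies that a multiplicative estimate with relative error strictly below $\alpha^2\varepsilon$ rounds back to the true frequency whenever the frequency is at most $\frac{1}{2\alpha^2\varepsilon}$, and shows that rounding can fail above that bound. The function names and parameter values are hypothetical.

```python
import math

def max_exactly_recoverable(alpha, eps):
    """Largest frequency for which a (1 + alpha^2 * eps)-approximation
    has additive error at most 1/2."""
    return 1.0 / (2 * alpha ** 2 * eps)

def round_estimate(estimate):
    """Round to the nearest integer (ties rounded up)."""
    return math.floor(estimate + 0.5)

if __name__ == "__main__":
    alpha, eps = 0.25, 0.5
    rel = 0.99 * alpha ** 2 * eps          # relative error just below alpha^2 * eps
    bound = int(max_exactly_recoverable(alpha, eps))
    ok = all(round_estimate(f * (1 + s * rel)) == f
             for f in range(1, bound + 1) for s in (-1, 1))
    print("bound:", bound, "all small frequencies recovered:", ok)
    print("f = 64 rounds to", round_estimate(64 * (1 + rel)))
```

The same relative accuracy that recovers every frequency up to the bound exactly misreports a frequency of 64, which is why only the small coordinates are handled this way.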

4. PRIVATE SYMMETRIC NORM ESTIMATION ALGORITHM

In this section, we give our algorithm that releases a set of private statistics from which an arbitrary number of symmetric norms can be well-approximated. In particular, recall that it suffices to approximate the sizes of the important levels and identify the non-important levels, so that their contributions can be set to zero.

4.1. RECOVERY OF HIGH FREQUENCY LEVELS

As a warm-up, we describe our algorithm for recovering the high frequency levels, whose coordinates have sufficiently large magnitude that their frequencies can be well-approximated by running an $L_2$-heavy hitters algorithm on the stream $S$. Moreover, with high probability, adding Laplacian noise will not affect the level sets because the frequencies are so large. Thus it suffices to return the noisy estimated frequencies of each of the elements in the high frequency levels. This algorithm is the simplest of our cases and we give the algorithm in full in Algorithm 1.

Algorithm 1 Algorithm to privately estimate the high levels
Input: Privacy parameter $\varepsilon > 0$, accuracy parameter $\alpha \in (0, 1)$
Output: Private estimation of the frequencies of the coordinates of the high frequency levels
1: $\beta \leftarrow O\left(\frac{\alpha^5}{\mathrm{mmc}(L)^2 \log^5 m}\right)$, $\beta' \leftarrow O\left(\frac{\alpha^2\beta\varepsilon^2}{\log^2 m}\right)$
2: Run PRIVCOUNTSKETCH on the stream $S$ with threshold $\alpha^2\beta'$ and failure probability [...]
3: for each heavy-hitter $k \in [n]$ reported by PRIVCOUNTSKETCH do
4:   Let $\hat{x}_k$ be the frequency estimated by PRIVCOUNTSKETCH
5:   $\tilde{x}_k \leftarrow \hat{x}_k + \mathrm{Lap}\left(\frac{8}{\beta'\varepsilon}\right)$
6: return $\tilde{x}_k$

We first show that coordinates in high frequency levels are identified and their frequencies are accurately estimated, and similarly that if a coordinate does not have high frequency, it will not be output by Algorithm 1.

Lemma 4.1. Suppose $x_k^2 \ge \frac{\alpha^2\beta\varepsilon^2}{\log^2 m} F_2(x)$ and $m = \frac{\Omega(\log^5 m)}{\alpha^5\beta^2\varepsilon^5}$. Then with high probability, Algorithm 1 outputs $\tilde{x}_k$ such that $\left(1 - \frac{\alpha}{2}\right) x_k \le \tilde{x}_k \le x_k$. On the other hand, if $x_k^2 < \frac{\alpha^2\beta\varepsilon^2}{2\log^2 m} F_2(x)$, then with high probability, Algorithm 1 outputs $\tilde{x}_k$ such that $\tilde{x}_k < \frac{3\alpha^2\beta\varepsilon^2}{4\log^2 m} F_2(x)$.

We now justify the privacy and space complexity of Algorithm 1.

Lemma 4.2. Algorithm 1 is $\left(\frac{\varepsilon}{4}, \frac{\delta}{4}\right)$-differentially private for $\delta = \frac{1}{\mathrm{poly}(m)}$ and uses space $\mathrm{mmc}(L)^2 \cdot \mathrm{poly}\left(\frac{1}{\alpha}, \frac{1}{\varepsilon}, \log m\right)$.
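The intuition behind this subsection, that Laplacian noise barely moves very large frequencies in relative terms, can be illustrated with a few lines of code. The sketch below is not an implementation of Algorithm 1: it skips PRIVCOUNTSKETCH entirely, takes a hypothetical dictionary of already-estimated heavy-hitter frequencies, and adds Laplace noise sampled as the difference of two exponential variables. The scale value is an arbitrary stand-in for $\frac{8}{\beta'\varepsilon}$.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_frequencies(estimates, scale, seed=0):
    """Add independent Laplace noise to each estimated heavy-hitter frequency."""
    rng = random.Random(seed)
    return {k: v + laplace(scale, rng) for k, v in estimates.items()}

if __name__ == "__main__":
    # Hypothetical heavy hitters: coordinate id -> estimated frequency.
    estimates = {17: 1_000_000.0, 42: 2_500_000.0}
    noisy = noisy_frequencies(estimates, scale=10.0)
    for k in estimates:
        rel = abs(noisy[k] - estimates[k]) / estimates[k]
        print(k, round(noisy[k], 2), f"relative change {rel:.2e}")
```

With frequencies in the millions and noise of scale 10, the relative perturbation is far below any reasonable $\alpha$, so the noisy value almost always stays in the same level; the same noise applied to a frequency of 3 would routinely move it across levels.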

4.2. RECOVERY OF MEDIUM FREQUENCY LEVELS

In this section, we describe our algorithm for recovering the medium frequency levels, whose coordinates do not have sufficiently large magnitude to be detected by running an $L_2$-heavy hitters algorithm on the stream $S$, but have sufficiently large size, so that there exists some $j \in [s]$ across the $s$ subsampling levels such that the coordinates can be detected by running an $L_2$-heavy hitters algorithm on the stream $S_j$. On the other hand, their magnitudes are sufficiently large that, with high probability, adding Laplacian noise will not affect the level sets. We give the algorithm in full in Algorithm 2. We first upper bound the second frequency moment (and hence the $L_2$ norm) of each substream. This is necessary because we want to detect the coordinates of the medium frequency levels as $L_2$-heavy hitters for each substream, but if the substream has overwhelmingly large $L_2$ norm, then we will not be able to find coordinates of the medium frequency levels. However, it may not be true that $F_2(S_j)$ is significantly smaller than $F_2(S)$ with high probability. For example, if there were a single large element, then the probability it is sampled at level $s$ is $\frac{1}{2^s}$, which is roughly $\frac{1}{n} > \frac{1}{\mathrm{poly}(m)}$. Instead, we note that PRIVCOUNTSKETCH benefits from the stronger tail guarantee, which states that not only does PRIVCOUNTSKETCH with threshold $\eta < 1$ detect the elements $k$ such that $(f_k)^2 \ge \eta F_2(S)$, but it also detects the elements $k$ such that $(f_k)^2 \ge \eta F_2(S_{\mathrm{tail}(1/\eta)})$, where $S_{\mathrm{tail}(1/\eta)}$ is the frequency vector $f$ induced by $S$ with the largest $\frac{1}{\eta}$ entries instead set to zero (Braverman et al., 2017; 2018a).

Lemma 4.3. With high probability, $F_2\left((S_j)_{\mathrm{tail}(1/(\alpha^2\beta'\varepsilon^2))}\right) \le \frac{200 \log m}{2^j} F_2(x)$ for all $j \in [s]$.
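The tail quantity $F_2(S_{\mathrm{tail}(1/\eta)})$ used above is simple to compute on an explicit frequency vector, which helps show why the tail guarantee matters when one very large element dominates. The following sketch is a direct, non-streaming computation with hypothetical names; it only evaluates the definition and does not implement PRIVCOUNTSKETCH.

```python
def f2(f):
    """Second frequency moment: sum of squared frequencies."""
    return sum(v * v for v in f)

def tail_f2(f, eta):
    """F_2 of f after zeroing out its floor(1/eta) largest-magnitude entries."""
    drop = int(1 / eta)
    squares = sorted((v * v for v in f), reverse=True)
    return sum(squares[drop:])

if __name__ == "__main__":
    f = [100, 3, 4, 0, 1]       # one dominant coordinate
    eta = 0.5
    print("F2 =", f2(f), " tail F2 =", tail_f2(f, eta))
    # The coordinate with frequency 4 is not eta-heavy against F2,
    # but it is eta-heavy against the tail.
    print(4 * 4 >= eta * f2(f), 4 * 4 >= eta * tail_f2(f, eta))
```

Measured against the full second moment, the dominant coordinate hides everything else; measured against the tail, the moderately sized coordinates become detectable again.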
Algorithm 2 Algorithm to privately estimate the medium levels
Input: Privacy parameter $\varepsilon > 0$, accuracy parameter $\alpha \in (0, 1)$
Output: Private estimations of the sizes of the medium frequency levels
1: $\beta \leftarrow O\left(\frac{\alpha^5}{\mathrm{mmc}(L)^2 \log^5 m}\right)$, $\beta' \leftarrow O\left(\frac{\alpha^3\beta\varepsilon^2}{\log^2 m}\right)$, $\xi \leftarrow (1 + O(\varepsilon))$
2: $\gamma \leftarrow (1/2, 1)$ uniformly at random, $\ell \leftarrow \log_\xi(2m)$, $s \leftarrow O(\log n)$
3: for $j \in [s]$ with $2^j > \frac{\log n}{\beta'\alpha\varepsilon}$ do
4:   Form stream $S_j$ by sampling elements of $[n]$ with probability $\frac{1}{2^j}$
5:   Run PRIVCOUNTSKETCH$_j$ on stream $S_j$ with threshold $\alpha^2\beta'\varepsilon^2$ and failure probability [...]
6-8: [...]
9:   $\tilde{x}_k \leftarrow \hat{x}_k + \mathrm{Lap}\left(\frac{8}{\beta'\varepsilon}\right)$
10:  for $i \in [\ell]$ with $m^2 \ge 2^{j+1} > \gamma\xi^{2i} \ge 2^j > O\left(\frac{\log n}{\beta'\alpha^2\varepsilon}\right)$ do
11:    Let $\hat{b}_i$ be the number of indices $k \in [n]$ such that $\gamma\xi^{2i} \le \tilde{x}_k < \gamma\xi^{2i+2}$
12:    $\tilde{b}_i \leftarrow \frac{2^j}{1 + O(\alpha)} \hat{b}_i$
13: return $\tilde{b}_i$

We now show that, conditioned on the event that the $L_2$ norms of the subsampled streams are not too large, we can well-approximate the frequency of any coordinate of the medium frequency levels, provided that it is sampled in the substream.

Lemma 4.4. Suppose $i$ is a $\beta$-important level and $k \in [n]$ is in level $i$, so that $x_k \in [\xi^i, \xi^{i+1})$. If $F_2\left((S_j)_{\mathrm{tail}(1/(\alpha^2\beta'\varepsilon^2))}\right) \le \frac{200 \log m}{2^j} F_2(x)$ for all $j \in [s]$ and $k$ is sampled in stream $S_j$ with $2^j > \frac{\log n}{\beta'\alpha\varepsilon}$, then with high probability, Algorithm 2 outputs $\tilde{x}_k$ such that $\left(1 - \frac{\alpha}{2}\right) x_k \le \tilde{x}_k \le x_k$.

Unfortunately, Lemma 4.4 only provides guarantees for the coordinates of the medium frequency levels that are sampled. Thus, we still need to use Lemma 4.4 to show that a good estimator of the sizes of the medium frequency levels can be obtained from the estimates of the sampled coordinates. In particular, we show that rescaling the empirical sizes of the medium frequency levels forms a good estimator of their actual sizes.

Lemma 4.5. Consider a $\beta$-important level $i$ with $\xi^{2i} \in \left[\frac{\beta\alpha^2\varepsilon^2}{\log^2 m} \cdot \frac{F_2(x)}{2^j}, \frac{\beta\alpha^2\varepsilon^2}{\log^2 m} \cdot \frac{F_2(x)}{2^{j-1}}\right]$ for some integer $j > 0$ and $\xi^i > \frac{\log n}{\beta'\alpha\varepsilon}$.
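If $F_2\left((S_j)_{\mathrm{tail}(1/(\alpha^2\beta'\varepsilon^2))}\right) \le \frac{200 \log m}{2^j} F_2(x)$ for all $j \in [s]$ and $k$ is sampled in stream $S_j$ with $2^j >$ [...], then with high probability, Algorithm 2 outputs $\tilde{b}_i$ such that $(1 - O(\alpha)) b_i \le \tilde{b}_i \le b_i$, where $b_i$ is the size of level $i$.

We now analyze the privacy and the space complexity of Algorithm 2.

Lemma 4.6. Algorithm 2 is $\left(\frac{\varepsilon}{4}, \frac{\delta}{4}\right)$-differentially private for $\delta = \frac{1}{\mathrm{poly}(m)}$ and uses space $\mathrm{mmc}(L)^2 \cdot \mathrm{poly}\left(\frac{1}{\alpha}, \frac{1}{\varepsilon}, \log m\right)$.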
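The rescaling idea behind the level-size estimator of this subsection can be seen in a small non-private simulation. The sketch below samples the universe at rate $2^{-j}$ using a hash of the coordinate index, counts the sampled coordinates whose exact frequency lies in a given range, and multiplies the count by $2^j$. The hash-based sampler, the exact frequencies, and all names are assumptions made for illustration; the paper's algorithm instead works from noisy PRIVCOUNTSKETCH estimates.

```python
import hashlib

def sampled(k, j, seed="demo"):
    """Keep coordinate k at subsampling level j with probability 2^-j,
    decided by a hash so that the choice is consistent across updates."""
    h = hashlib.sha256(f"{seed}:{k}".encode()).digest()
    return int.from_bytes(h[:8], "big") % (2 ** j) == 0

def estimate_level_size(x, lo, hi, j):
    """Count sampled coordinates with lo <= |x_k| < hi, then rescale by 2^j."""
    count = sum(1 for k, v in enumerate(x) if lo <= abs(v) < hi and sampled(k, j))
    return count * 2 ** j

if __name__ == "__main__":
    # 8000 coordinates of frequency 3 and five coordinates of frequency 100.
    x = [3] * 8000 + [100] * 5
    for j in (0, 3, 5):
        print(j, estimate_level_size(x, 2, 4, j))
```

For a large level the rescaled count concentrates around the true size, whereas a level with only a handful of members is estimated poorly at high subsampling rates, which matches the requirement that medium levels be large.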

4.3. RECOVERY OF LOW FREQUENCY LEVELS

In this section, we describe our algorithm for recovering the low frequency levels, whose coordinates have magnitude small enough that we cannot add Laplacian noise to their frequencies without affecting the corresponding level set sizes. We instead report the sizes of the level sets for the low frequency levels rather than the identities and approximate frequencies of the heavy hitters. Thus we must add Laplacian noise to the sizes of the level sets; we show that the $L_1$ sensitivity of the level set estimations is particularly small for the low frequency levels and thus the Laplacian noise does not greatly affect the estimates of the level set sizes. We note that this approach does not work for the high frequency levels because the high frequency levels may have small level set sizes, so that adding Laplacian noise to the sizes can significantly affect the resulting estimates. Similarly, it is more challenging to argue low $L_1$ sensitivity of the level set estimations for the medium frequency levels. Hence, both the algorithm and analysis are specifically tailored to the low frequency levels. We give the algorithm in full in Algorithm 3.

Algorithm 3 Algorithm to privately estimate the low levels

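Input: Privacy parameter $\varepsilon > 0$, accuracy parameter $\alpha \in (0, 1)$
Output: Private estimations of the sizes of the low frequency levels
1: $\beta \leftarrow O\left(\frac{\alpha^5}{\mathrm{mmc}(L)^2 \log^5 m}\right)$, $\beta' \leftarrow O\left(\frac{\alpha^2\beta\varepsilon}{\log n}\right)$, $\xi \leftarrow (1 + O(\varepsilon))$
2: $\gamma \leftarrow (1/2, 1)$ uniformly at random, $\ell \leftarrow \log_\xi(2m)$, $s \leftarrow O(\log n)$
3: for $j \in [s]$ with $2^j \le \frac{\log n}{\beta'\alpha\varepsilon}$ do
4:   Form stream $S_j$ by sampling elements of $[n]$ with probability $\frac{1}{2^j}$
5:   Run PRIVCOUNTSKETCH$_j$ on stream $S_j$ with threshold $\beta'' := O\left(\frac{\beta'\alpha^2\varepsilon^3}{\log^2 n}\right)$
6:   for each heavy-hitter $k \in [n]$ reported by PRIVCOUNTSKETCH$_j$ do
7:     Let $\hat{x}_k$ be the frequency estimated by PRIVCOUNTSKETCH$_j$
8:   for $i \in [\ell]$ with $O\left(\frac{\log n}{\beta'\alpha^2\varepsilon}\right) \ge 2^{j+1} > \gamma\xi^{2i} \ge 2^j$ do
9:     Let $\hat{b}_i$ be the number of indices $k \in [n]$ such that $\gamma\xi^{2i} \le \hat{x}_k < \gamma\xi^{2i+2}$
10:    $\tilde{b}_i \leftarrow \frac{2^j}{1 + O(\alpha)} \hat{b}_i + \mathrm{Lap}(8 [...])$

We first show that the estimates of the level set sizes for the low frequency levels are accurate.

Lemma 4.7. Consider a $\beta$-important level $i$ with $\xi^{2i} \in \left[\frac{\beta\alpha^2\varepsilon^2}{\log^2 m} \cdot \frac{F_2(x)}{2^j}, \frac{\beta\alpha^2\varepsilon^2}{\log^2 m} \cdot \frac{F_2(x)}{2^{j-1}}\right]$ for some integer $j > 0$ and $\xi^i \le \frac{\log n}{\beta'\alpha\varepsilon}$. If $F_2\left((S_j)_{\mathrm{tail}(1/(\alpha^2\beta'\varepsilon^2))}\right) \le \frac{200 \log m}{2^j} F_2(x)$ for all $j \in [s]$ and $k$ is sampled in stream $S_j$ with $2^j >$ [...], then with high probability, Algorithm 3 outputs $\tilde{b}_i$ such that $(1 - O(\alpha)) b_i \le \tilde{b}_i \le b_i$, where $b_i$ is the size of level set $i$.

We then argue the privacy and space complexity of Algorithm 3.

Lemma 4.8. Algorithm 3 is $\left(\frac{\varepsilon}{4}, \frac{\delta}{4}\right)$-differentially private for $\delta = \frac{1}{\mathrm{poly}(m)}$ and uses space $\mathrm{mmc}(L)^2 \cdot \mathrm{poly}\left(\frac{1}{\alpha}, \frac{1}{\varepsilon}, \log m\right)$.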
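The reason level-set sizes tolerate Laplacian noise is that a single changed update moves very little mass in the vector of level sizes. The sketch below illustrates this in the simplest setting, with exact frequencies and no sketching, which is an assumption of the example and not the setting analyzed in the paper: replacing one update changes two frequencies by one each, so the vector of exact level sizes changes by at most 4 in $L_1$ distance. The names and the base $\xi = 2$ are hypothetical.

```python
from collections import Counter

def level_of(a, xi=2):
    """Index i with xi**(i-1) <= a < xi**i, for integer a >= 1."""
    i = 1
    while xi ** i <= a:
        i += 1
    return i

def level_histogram(stream, xi=2):
    """Map each level index to the number of coordinates whose exact frequency is in it."""
    freq = Counter(stream)
    return Counter(level_of(f, xi) for f in freq.values())

def l1_distance(h1, h2):
    """L1 distance between two level-size vectors (missing levels count as 0)."""
    return sum(abs(h1[i] - h2[i]) for i in set(h1) | set(h2))

if __name__ == "__main__":
    S = [0, 0, 0, 0, 1, 1, 2]
    S_neighbor = [2, 0, 0, 0, 1, 1, 2]    # first update changed from 0 to 2
    print(dict(level_histogram(S)), dict(level_histogram(S_neighbor)))
    print("L1 distance:", l1_distance(level_histogram(S), level_histogram(S_neighbor)))
```

Because the distance is bounded by a small constant regardless of the stream length, Laplace noise of scale $O\left(\frac{1}{\varepsilon}\right)$ on each level size suffices in this idealized setting, and it is negligible relative to a level containing many coordinates.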

4.4. PUTTING THINGS TOGETHER

We would like to combine the subroutines from the previous sections to output a private dataset for symmetric norm estimation. Thus it remains to describe how to privately partition the coordinates into the high, medium, and low frequency levels. To that end, we remark that although PRIVCOUNTSKETCH actually provides an estimated frequency for each coordinate, for our purposes we only need estimated frequencies for the $L_2$-heavy hitters, and there are at most $K := O\left(\frac{1}{\eta^2}\right)$ possible $L_2$-heavy hitters for whichever threshold $\eta$ we choose, e.g., $\eta = \alpha^2\beta'$ in Algorithm 1. Thus it suffices to observe that we can privately partition the coordinates into the high, medium, and low frequency levels by first privately outputting the top $K$ estimated frequencies and then partitioning the coordinates according to their noisy estimated frequencies, which can be viewed as post-processing. In particular, Qiao et al. (2021) observe that it suffices to add Laplacian noise with scale $\frac{8}{\eta\varepsilon}$ to each of the frequencies and then output the top $K$ noisy estimated frequencies to achieve $\frac{\varepsilon}{4}$-differential privacy. We now put together the results from the previous sections to show the following result. In particular, correctness follows from applying Lemma 2.7 to Lemma 4.1, Lemma 4.5, and Lemma 4.7, while privacy and the space complexity follow from Lemma 4.2, Lemma 4.6, and Lemma 4.8.

Theorem 4.9. There exists an $(\varepsilon, \delta)$-differentially private algorithm that outputs a set $C$, for $\delta = \frac{1}{\mathrm{poly}(m)}$. From $C$, a $(1+\alpha)$-approximation to any norm with maximum modulus of concentration at most $M$ can be computed, with probability at least $1 - \delta$. The algorithm uses $M^2 \cdot \mathrm{poly}\left(\frac{1}{\alpha}, \frac{1}{\varepsilon}, \log m\right)$ bits of space.
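To illustrate how a single released set can serve many norm queries by post-processing, the following sketch builds a surrogate vector from a dictionary of (level index, noisy level size) pairs and evaluates several symmetric norms on it. The released format, the choice of $\xi^i$ as the representative magnitude of level $i$, the clipping of negative noisy sizes to zero, and all names are assumptions for this example; the paper's set $C$ also contains noisy frequencies for heavy coordinates, which are omitted here.

```python
def lp_norm(x, p):
    """L_p norm for p > 0."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def top_k_norm(x, k):
    """Sum of the k largest coordinates of |x|."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

def surrogate_vector(released, xi):
    """Expand {level i: noisy size} into a vector with round(size) copies of xi**i."""
    v = []
    for i, size in sorted(released.items()):
        v.extend([float(xi) ** i] * max(0, int(round(size))))
    return v

if __name__ == "__main__":
    # Hypothetical released statistics: level index -> noisy level size.
    released = {0: 100.3, 3: 0.8}
    v = surrogate_vector(released, xi=2)
    # Every estimate below is computed from the same release, so no
    # additional privacy budget is spent, however many norms are queried.
    print("L1   :", lp_norm(v, 1))
    print("L2   :", lp_norm(v, 2))
    print("top-3:", top_k_norm(v, 3))
```

Because all the estimates are functions of the released statistics alone, adding further norms to the list changes only the computation, not the privacy guarantee.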



$^1$ $L_p$ for $p \in (0, 1)$ does not satisfy the triangle inequality and therefore is not a norm, but is still well-defined/well-motivated and can be computed



Lemma 1.3 (Milman & Schechtman, 2009; Klartag & Vershynin, 2007). For $L_p$ norms, we have that $\mathrm{mmc}(L) = O(\log n)$ for $p \in [1, 2]$ and $\mathrm{mmc}(L) = O(n^{1/2 - 1/p})$ for $p > 2$.

Lemma 1.7 (Blasiok et al., 2017). $\mathrm{mmc}(L) = \tilde{O}\left(\sqrt{\frac{n}{k}}\right)$ for the top-$k$ norm $L$.

Algorithm 1, line 3: for each heavy-hitter $k \in [n]$ reported by PRIVCOUNTSKETCH do

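Algorithm 3, lines 7-10:
7: Let $\hat{x}_k$ be the frequency estimated by PRIVCOUNTSKETCH$_j$
8: for $i \in [\ell]$ with $O\left(\frac{\log n}{\beta'\alpha^2\varepsilon}\right) \ge 2^{j+1} > \gamma\xi^{2i} \ge 2^j$ do
9:   Let $\hat{b}_i$ be the number of indices $k \in [n]$ such that $\gamma\xi^{2i} \le \hat{x}_k < \gamma\xi^{2i+2}$
10:  $\tilde{b}_i \leftarrow \frac{2^j}{1 + O(\alpha)} \hat{b}_i + \mathrm{Lap}(8 [...])$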

Figure 1. Illustration of separate handling of the high, medium, and low level sets.

