DIFFERENTIALLY PRIVATE L_2-HEAVY HITTERS IN THE SLIDING WINDOW MODEL

Abstract

The data management of large companies often prioritizes more recent data, as a source of higher-accuracy predictions than outdated data. For example, the Facebook data policy retains user search histories for 6 months, while the Google data retention policy states that browser information may be stored for up to 9 months. These policies are captured by the sliding window model, in which only the most recent W updates form the underlying dataset. In this paper, we consider the problem of privately releasing the L_2-heavy hitters in the sliding window model, which include the L_p-heavy hitters for p ≤ 2 and in some sense are the strongest guarantees that can be achieved using polylogarithmic space, but cannot be handled by existing techniques due to the sub-additivity of the L_2 norm. Moreover, existing non-private sliding window algorithms use the smooth histogram framework, which has high sensitivity. To overcome these barriers, we introduce the first differentially private algorithm for L_2-heavy hitters in the sliding window model by initiating a number of L_2-heavy hitter algorithms across the stream with significantly lower thresholds. Similarly, we augment the algorithms with an approximate frequency tracking algorithm with significantly higher accuracy. We then use smooth sensitivity and statistical distance arguments to show that we can add noise proportional to an estimate of the L_2 norm. To the best of our knowledge, our techniques are the first to privately release statistics related to a sub-additive function in the sliding window model, and they may be of independent interest for future differentially private algorithm design in the sliding window model.

1. INTRODUCTION

Differential privacy (Dwork, 2006; Dwork et al., 2016) has emerged as the standard notion of privacy in both the research and industrial communities. For example, Google Chrome uses RAPPOR (Erlingsson et al., 2014) to collect user statistics such as the default homepage or the default search engine of the browser, Samsung proposed a similar mechanism to collect numerical answers such as usage time and battery volume (Nguyên et al., 2016), and Apple uses a differentially private method (Greenberg, 2016) to generate spelling predictions. The age of collected data can significantly impact its relevance to predicting future patterns, as the behavior of groups or individuals may change substantially over time due to cyclical, temporary, or permanent shifts. Indeed, recent data is often a more accurate predictor than older data across many sources of big data, such as stock markets or Census data, and this is often reflected in the data management practices of large companies. For example, the Facebook data policy (Facebook) retains user search histories for 6 months, Apple's differential privacy deployment (Upadhyay, 2019) retains collected data for 3 months, and the Google data retention policy states that browser information may be stored for up to 9 months (Google); more generally, large data collection agencies often perform analysis and release statistics on time-bounded data. However, since such agencies often manage highly sensitive data, these statistics must be released in a way that does not compromise privacy. Thus, in this paper, we study the (event-level) differentially private release of statistics of time-bounded data using space sublinear in the size of the data.

Definition 1.1 (Differential privacy (Dwork et al., 2016)).
Given ε > 0 and δ ∈ (0, 1), a randomized algorithm A operating on data streams is (ε, δ)-differentially private if, for every pair of neighboring datasets S and S′ and for all sets E of possible outputs, Pr[A(S) ∈ E] ≤ e^ε · Pr[A(S′) ∈ E] + δ.

In the popular streaming model of computation, elements of an underlying dataset arrive one by one, but the entire dataset is considered too large to store; thus algorithms are restricted to using space sublinear in the size of the data. Although the streaming model provides a theoretical means to handle big data and has been studied thoroughly for applications in privacy-preserving data analysis, e.g., (Mir et al., 2011; Blocki et al., 2012; Joseph et al., 2020; Huang et al., 2022; Dinur et al., 2023), and adaptive data analysis, e.g., (Avdiukhin et al., 2019; Ben-Eliezer et al., 2022b; Hassidim et al., 2020; Braverman et al., 2021a; Chakrabarti et al., 2022; Ajtai et al., 2022; Beimel et al., 2022; Ben-Eliezer et al., 2022a; Attias et al., 2023), it does not properly capture the ability to prioritize more recent data, which is a desirable quality for data summarization. The time decay model (Cohen & Strauss, 2006; Kopelowitz & Porat, 2008; Su et al., 2018; Braverman et al., 2019) emphasizes more recent data by assigning a polynomially or exponentially decaying weight to "older" data points, but these functions cannot capture the zero-one property in which data older than a certain age is completely deleted.

The sliding window model. By contrast, the sliding window model takes a large data stream as input and focuses only on the updates past a certain point in time, implicitly defining the underlying dataset through the most recent W updates of the stream, where W > 0 is the window parameter. Specifically, given a stream u_1, …, u_m such that u_i ∈ [n] for all i ∈ [m] and a parameter W > 0 that we assume satisfies W ≤ m without loss of generality, the underlying dataset is the frequency vector f ∈ R^n induced by the last W updates u_{m−W+1}, …, u_m of the stream, so that f_k = |{i > m − W : u_i = k}| for all k ∈ [n]. The goal is then to output a private approximation to the frequency f_k of each heavy hitter, i.e., each index k ∈ [n] for which f_k ≥ α · L_p(f), where L_p(f) denotes the L_p norm of f for a parameter p ≥ 1, L_p(f) = ∥f∥_p = (Σ_{i=1}^n f_i^p)^{1/p}. In this setting, we say that streams S and S′ are neighboring if there exists a single update i ∈ [m] such that u_i ≠ u′_i, where u_1, …, u_m are the updates of S and u′_1, …, u′_m are the updates of S′. Note that if k is an L_1-heavy hitter, i.e., a heavy hitter with respect to L_1(f), then f_k ≥ α · L_1(f) = α · Σ_{i=1}^n f_i ≥ α · (Σ_{i=1}^n f_i^2)^{1/2} = α · L_2(f), so k is also an L_2-heavy hitter. Thus, any L_2-heavy hitter algorithm also reports the L_1-heavy hitters, but the converse is not always true. Indeed, for the Yahoo! password frequency corpus (Blocki et al., 2016) (n ≈ 70 million) with heavy-hitter threshold α = 1/500, there were 3,972 L_2-heavy hitters but only one L_1-heavy hitter. On the other hand, finding L_p-heavy hitters for p > 2 requires Ω(n^{1−2/p}) space (Chakrabarti et al., 2003; Bar-Yossef et al., 2004), so in some sense, the L_2-heavy hitters are the best we can hope to find using polylogarithmic space. Although there is a large and active line of work in the sliding window model (Datar et al., 2002; Braverman & Ostrovsky, 2007; Braverman et al., 2014; 2016; 2018; 2020; Borassi et al., 2020; Woodruff & Zhou, 2021; Braverman et al., 2021b; Jayaram et al., 2022), there is surprisingly little work in the sliding window model that considers differential privacy (Upadhyay, 2019; Upadhyay & Upadhyay, 2021).
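As a concrete reference point for these definitions, the following non-private, linear-space sketch computes the window frequency vector exactly and compares the L_1- and L_2-heavy hitters; the function name and the example stream are illustrative and not part of the paper's algorithm. It demonstrates that the L_1-heavy hitters are always a subset of the L_2-heavy hitters, and that the containment can be strict.

```python
from collections import Counter
import math

def sliding_window_heavy_hitters(stream, W, alpha):
    """Exact (non-private, linear-space) reference: heavy hitters of the
    frequency vector f induced by the last W updates of the stream."""
    window = stream[-W:]                            # last W updates define the dataset
    f = Counter(window)                             # frequency vector f
    l1 = sum(f.values())                            # L1(f)
    l2 = math.sqrt(sum(v * v for v in f.values())) # L2(f)
    l1_hh = {k for k, v in f.items() if v >= alpha * l1}
    l2_hh = {k for k, v in f.items() if v >= alpha * l2}
    return f, l1_hh, l2_hh

# A window where many items are moderately frequent: an L2-heavy hitter
# exists even though no single item holds an alpha fraction of the L1 mass.
stream = [i for i in range(50) for _ in range(4)] + [0] * 20
f, l1_hh, l2_hh = sliding_window_heavy_hitters(stream, W=220, alpha=0.25)
# Every L1-heavy hitter is an L2-heavy hitter, since L2(f) <= L1(f).
assert l1_hh <= l2_hh
```

In this example item 0 appears 24 times in the window, which clears the L_2 threshold (roughly 9.2) but not the L_1 threshold (55), so it is an L_2-heavy hitter without being an L_1-heavy hitter.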

1.1. OUR CONTRIBUTIONS

In this paper, we consider the problem of privately releasing approximate frequencies for the heavy hitters of a dataset defined in the sliding window model. We give the first differentially private algorithm for approximating the frequencies of the L_2-heavy hitters in the sliding window model.

Theorem 1.2. For any α ∈ (0, 1), c > 0, window parameter W on a stream of length m that induces a frequency vector f ∈ R^n in the sliding window model, and privacy parameter ε > (1000 log m)/(α^3 √W), there exists an algorithm such that:
(1) (Privacy) The algorithm is (ε, δ)-differentially private for δ = 1/m^c.
(2) (Heavy hitters) With probability at least 1 − 1/m^c, the algorithm outputs a list L such that k ∈ L for each k ∈ [n] with f_k ≥ α · L_2(f) and j ∉ L for each j ∈ [n] with f_j ≤ (α/2) · L_2(f).
(3) (Accuracy) With probability at least 1 − 1/m^c, we simultaneously have |f̂_k − f_k| ≤ (α/4) · L_2(f) for all k ∈ L, where f̂_k denotes the noisy approximation of f_k output by the algorithm.
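For intuition on why the error guarantees are stated relative to L_2(f), the following is a minimal sketch of CountSketch (Charikar, Chen, and Farach-Colton), the standard non-private building block for L_2-heavy hitters, whose per-item frequency estimates have error proportional to L_2(f). This is not the paper's private sliding-window algorithm; parameter choices and the hashing scheme below are illustrative assumptions.

```python
import random

class CountSketch:
    """Minimal CountSketch: non-private L2-heavy-hitter primitive.
    Each of `rows` independent rows hashes items into `cols` buckets
    with a random sign; the frequency estimate is the median over rows."""

    def __init__(self, rows=7, cols=512, seed=1):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]
        # Independent per-row salts for the bucket hash and the sign hash.
        self.salts = [(rng.getrandbits(32), rng.getrandbits(32))
                      for _ in range(rows)]

    def _bucket(self, r, x):
        return hash((self.salts[r][0], x)) % self.cols

    def _sign(self, r, x):
        return 1 if hash((self.salts[r][1], x)) % 2 else -1

    def update(self, x, delta=1):
        for r in range(self.rows):
            self.table[r][self._bucket(r, x)] += self._sign(r, x) * delta

    def estimate(self, x):
        ests = sorted(self.table[r][self._bucket(r, x)] * self._sign(r, x)
                      for r in range(self.rows))
        return ests[len(ests) // 2]  # median over rows

# Feed the sketch one window's worth of updates: one frequent item (100 copies
# of item 7) amid 100 distinct singleton items.
sk = CountSketch()
for _ in range(100):
    sk.update(7)
for x in range(1000, 1100):
    sk.update(x)
est = sk.estimate(7)
```

The estimate for item 7 is close to its true count 100, with additive error on the order of L_2(f)/√cols per row, sharpened by the median. Maintaining such a sketch over a sliding window, and releasing its estimates privately, is precisely the difficulty the theorem above addresses.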

