REGRESSION WITH LABEL DIFFERENTIAL PRIVACY *

Abstract

We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.

1. INTRODUCTION

In recent years, differential privacy (DP; Dwork et al., 2006a;b) has emerged as a popular notion of user privacy in machine learning (ML). At a high level, it guarantees that the output model weights remain statistically indistinguishable when any single training example is arbitrarily modified. Numerous DP training algorithms have been proposed, with open-source libraries tightly integrated into popular ML frameworks such as TensorFlow Privacy (Radebaugh & Erlingsson, 2019) and PyTorch Opacus (Yousefpour et al., 2021).

In the context of supervised ML, a training example consists of input features and a target label. While many existing research works focus on protecting both features and labels (e.g., Abadi et al. (2016)), there are also important scenarios where the input features are already known to the adversary, and thus protecting the privacy of the features is not needed. A canonical example arises in computational advertising, where the features are known to one website (a publisher), whereas the conversion events, i.e., the labels, are known to another website (the advertiser).¹ Thus, from the first website's perspective, only the labels can be treated as unknown and private. This motivates the study of label DP algorithms, where the statistical indistinguishability is required only when the label of a single example is modified.² The study of this model goes back at least to the work of Chaudhuri & Hsu (2011). Recently, several works including (Ghazi et al., 2021a; Malek Esmaeili et al., 2021) studied label DP deep learning algorithms for classification objectives.

Our Contributions. In this work, we study label DP for regression tasks. We provide a new algorithm that, given a global prior distribution (which, if unknown, can be estimated privately), derives a label DP mechanism that is optimal under a given objective loss function.
* Authors in alphabetical order. Email: {badih.ghazi, ravi.k53}@gmail.com, {pritishk, ethanleeman, pasin, avaradar, chiyuan}@google.com

We provide an explicit characterization of the optimal mechanism for a broad family of objective functions, including the most commonly used regression losses such as the Poisson log loss, the mean squared error, and the mean absolute error. More specifically, we show that the optimal mechanism belongs to a class of randomized response on bins (Algorithm 1). We show this by writing the optimization problem as a linear program (LP) and characterizing its optimum. With this characterization in mind, it suffices for us to compute the optimal mechanism among the class of randomized response on bins. We then provide an efficient algorithm for this task, based on dynamic programming (Algorithm 2).

In practice, a prior distribution on the labels is not always available. This leads to our two-step algorithm (Algorithm 3), where we first use a portion of the privacy budget to build an approximate histogram of the labels, and then feed this approximate histogram as a prior into the optimization algorithm in the second step, which uses the remaining privacy budget. We show that as the number of samples grows, this two-step algorithm yields an expected loss (between the privatized label and the raw label) that is arbitrarily close to the expected loss of the optimal local DP mechanism. (We give a quantitative bound on the convergence rate.) Our two-step algorithm can be naturally deployed in the two-party learning setting where each example is vertically partitioned, with one party holding the features and the other party holding the (sensitive) labels.
The algorithm is in fact one-way, requiring a single message to be communicated from the labels party to the features party, and we require that this one-way communication satisfies (label) DP. We refer to this setting, which is depicted in Figure 1, as feature-oblivious label DP.

We evaluate our algorithm on three datasets: the 1940 US Census IPUMS dataset, the Criteo Sponsored Search Conversion dataset, and a proprietary app install ads dataset from a commercial mobile app store. We compare our algorithm to several baselines and demonstrate that it achieves higher utility across all tested privacy budgets, with significantly lower test errors in the high-privacy regime. For example, at privacy budget ε = 0.5, compared to the best baseline methods, the test MSE of our algorithm is ∼1.5× smaller on the Criteo and US Census datasets, and the relative test error is ∼5× smaller on the app ads dataset.

Organization. In Section 2, we recall some basics of DP and learning theory, and define the feature-oblivious label DP setting in which our algorithm can be implemented. Our label DP algorithm for regression objectives is presented in Section 3. Our experimental evaluation and results are described in Section 4. A brief overview of related work appears in Section 5. We conclude with some interesting future directions in Section 6. Most proofs are deferred to the Appendix (along with additional experimental details and background material).
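To make the "randomized response on bins" template concrete, the following minimal Python sketch is illustrative only (it is not the paper's Algorithm 1: the function names, the uniform baseline weights, and the rule of boosting the loss-minimizing bin are our assumptions). It outputs one of a fixed set of bin values, with the bin value closest to the true label under the loss reported with weight e^ε and all others with weight 1:

```python
import math
import random

def rr_on_bins_probs(y, bin_values, epsilon, loss=lambda b, t: (b - t) ** 2):
    """Output distribution over bin values for true label y: the bin value
    minimizing the loss gets weight e^epsilon, all others weight 1."""
    k = len(bin_values)
    best = min(range(k), key=lambda j: loss(bin_values[j], y))
    weights = [math.exp(epsilon) if j == best else 1.0 for j in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def rr_on_bins(y, bin_values, epsilon):
    """Sample a privatized label from the distribution above."""
    probs = rr_on_bins_probs(y, bin_values, epsilon)
    return random.choices(bin_values, weights=probs, k=1)[0]
```

Note that every label induces only two possible output probabilities (e^ε/Z and 1/Z for the same normalizer Z), so changing a single label changes the probability of any output by at most a factor of e^ε, which is exactly the ε-label-DP condition.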

2. PRELIMINARIES

We consider the standard setting of supervised learning, where we have a set of examples of the form (x, y) ∈ X × Y drawn from some unknown distribution D, and we wish to learn a predictor f_θ (parameterized by θ) to minimize L(f_θ) := E_{(x,y)∼D} ℓ(f_θ(x), y), for some loss function ℓ : R × Y → R_{≥0}; we will consider the case where Y ⊆ R. Some common loss functions include the zero-one loss ℓ_{0-1}(ỹ, y) := 1[ỹ ≠ y] and the logistic loss ℓ_{log}(ỹ, y) := log(1 + e^{-ỹy}) for binary classification, and the squared loss ℓ_{sq}(ỹ, y) := (1/2)(ỹ - y)², the absolute-value loss ℓ_{abs}(ỹ, y) := |ỹ - y|, and the Poisson log loss ℓ_{Poi}(ỹ, y) := ỹ - y · log(ỹ) for regression. This paper focuses on the regression setting.

However, we wish to perform this learning with differential privacy (DP). We start by recalling the definition of DP, which can be applied to any notion of adjacent pairs of datasets. For an overview of DP, we refer the reader to the book of Dwork & Roth (2014).

Definition 1 (DP; Dwork et al. (2006b)). Let ε be a positive real number. A randomized algorithm A taking as input a dataset is said to be ε-differentially private (denoted ε-DP) if for any two adjacent datasets X and X′, and any subset S of outputs of A, we have Pr[A(X) ∈ S] ≤ e^ε · Pr[A(X′) ∈ S].

In supervised learning applications, the input to a DP algorithm is the training dataset (i.e., a set of labeled examples) and the output is the description of a predictor (e.g., the weights of the trained
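The loss functions above translate directly into code; the following Python sketch (the helper names are ours) simply restates the definitions:

```python
import math

def loss_zero_one(y_pred, y):
    # 1 if the prediction differs from the label, 0 otherwise
    return float(y_pred != y)

def loss_logistic(y_pred, y):
    # log(1 + e^{-y_pred * y}), for labels y in {-1, +1}
    return math.log(1.0 + math.exp(-y_pred * y))

def loss_squared(y_pred, y):
    # (1/2) * (y_pred - y)^2
    return 0.5 * (y_pred - y) ** 2

def loss_absolute(y_pred, y):
    # |y_pred - y|
    return abs(y_pred - y)

def loss_poisson(y_pred, y):
    # y_pred - y * log(y_pred), defined for y_pred > 0
    return y_pred - y * math.log(y_pred)
```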



¹ A similar use case is in mobile advertising, where websites are replaced by apps.

² We note that this label DP setting is particularly timely and relevant for ad attribution and conversion measurement given the deprecation of third-party cookies by several browsers and platforms (Wilander, 2020; Wood, 2019; Schuh, 2020).



Figure 1: Learning with feature-oblivious label DP.
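Step 1 of the pipeline in Figure 1, building the approximate label histogram used as a prior, can be sketched as follows. This is a hypothetical illustration rather than the paper's Algorithm 3: the bin edges, the clip-and-renormalize post-processing, and the function name are our assumptions. Changing a single label moves one unit of mass between two histogram bins (L1 sensitivity 2), so Laplace noise of scale 2/ε on each count yields an ε-label-DP histogram:

```python
import numpy as np

def private_label_prior(labels, bin_edges, epsilon, rng=None):
    """Approximate prior over label bins under epsilon-label-DP.

    Laplace(2/epsilon) noise on each bin count covers the L1 sensitivity
    of 2; clipping at zero and renormalizing are DP-safe post-processing.
    """
    if rng is None:
        rng = np.random.default_rng()
    counts, _ = np.histogram(labels, bins=bin_edges)
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0:  # all mass noised away: fall back to a uniform prior
        return np.full(len(counts), 1.0 / len(counts))
    return noisy / total
```

In the two-step approach described above, this noisy prior would then drive the bin-optimization step, with the remaining privacy budget spent on the randomized response itself.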

