FAIR DIFFERENTIAL PRIVACY CAN MITIGATE THE DISPARATE IMPACT ON MODEL ACCURACY

Anonymous authors
Paper under double-blind review

Abstract

Techniques based on the theory of differential privacy (DP) have become a standard building block in the machine learning community. DP training mechanisms offer strong guarantees that an adversary, by analyzing the released model, cannot determine with high confidence whether a particular instance was present in the training data, let alone recover any details of the instances. However, DP may disproportionately affect underrepresented and relatively complicated classes; that is, the reduction in utility is unequal across classes. This paper proposes a fair differential privacy algorithm (FairDP) to mitigate the disparate impact on each class's model accuracy. We cast the learning procedure as a bilevel programming problem that integrates differential privacy with fairness. FairDP establishes a self-adaptive DP mechanism and dynamically adjusts instance influence in each class depending on a theoretical bias-variance bound. Our experimental evaluation shows the effectiveness of FairDP in mitigating the disparate impact on model accuracy among classes on several benchmark datasets and in scenarios ranging from text to vision.

1. INTRODUCTION

Protecting data privacy is a significant concern in many data-driven decision-making applications (Zhu et al., 2017), such as social networking services, recommender systems, and location-based services. For example, the United States Census Bureau will, for the first time, apply differential privacy to the 2020 census data (Bureau, 2020). Differential privacy (DP) guarantees that the released model cannot be exploited by attackers to derive whether one particular instance is present or absent in the training dataset (Dwork et al., 2006). However, DP intentionally restricts each instance's influence and introduces noise into the learning procedure. When we enforce DP on a model, DP may amplify the discriminative effect toward underrepresented and relatively complicated classes (Bagdasaryan et al., 2019; Du et al., 2020; Jaiswal & Provost, 2020). That is, the reduction in accuracy from non-private learning to private learning may be uneven across classes. There are several empirical studies on this utility reduction: Bagdasaryan et al. (2019) and Du et al. (2020) show that model accuracy under private learning tends to decrease more on classes that already have lower accuracy under non-private learning, while Jaiswal & Provost (2020) report a different observation, namely that the inequality in accuracy is not consistent across classes over multiple setups and datasets. Although private learning improves the security of individual participants, care must be taken that the model's performance does not harm one class more than others.

A machine learning model, specifically in supervised learning tasks, outputs a hypothesis $f(x; \theta)$ parameterized by $\theta$, which predicts the label $y$ given the unprotected attributes $x$. Each instance's label $y$ belongs to a class $k$. The model aims to minimize the objective (loss) function $\mathcal{L}(\theta; x, y)$, i.e.,

$$\theta^* := \arg\min_{\theta} \mathbb{E}\left[\mathcal{L}(\theta; x, y)\right]. \tag{1}$$
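The objective in (1) can be minimized with plain (non-private) SGD, which serves as the baseline against which utility loss is measured. A minimal logistic-regression sketch (illustrative only; the function name and hyperparameters are ours, not from the paper):

```python
import numpy as np

def sgd_step(theta, X_batch, y_batch, lr=0.1):
    """One non-private SGD step on the logistic loss L(theta; x, y)."""
    z = X_batch @ theta
    p = 1.0 / (1.0 + np.exp(-z))                      # sigmoid predictions
    grad = X_batch.T @ (p - y_batch) / len(y_batch)   # mean gradient over the batch
    return theta - lr * grad
```

Iterating this step on mini-batches drives $\theta$ toward the empirical minimizer of (1); the private variant described next modifies only how the gradient is aggregated.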
Our work builds on a recent advance in training machine learning models with a differentially private mechanism, i.e., DPSGD (Abadi et al., 2016), for releasing the model. The key idea can be extended to other DP mechanisms with a specialized noise form (generally a Laplacian or Gaussian distribution). The iterative update scheme of DPSGD at the $(t+1)$-th iteration is of the form

$$\theta_{t+1} = \theta_t - \mu_t \left( \frac{1}{n} \sum_{i \in S_t} \frac{g_t(x_i)}{\max\left(1, \frac{\|g_t(x_i)\|_2}{C}\right)} + \xi \mathbf{1} \right), \tag{2}$$

where $n$ and $\mu_t$ denote the batch size and step size (learning rate), respectively; $S_t$ denotes the randomly chosen instance set; the vector $\mathbf{1}$ denotes the vector filled with the scalar value one; and $g_t(x_i)$ denotes the gradient of the loss function in (1) at iteration $t$, i.e., $\nabla \mathcal{L}(y_i; \theta_t, x_i)$. The two key operations of DPSGD are: i) clipping each gradient $g_t(x_i)$ in $\ell_2$-norm based on the threshold parameter $C$; ii) adding noise $\xi$ drawn from a Gaussian distribution $\mathcal{N}(0, \sigma^2 C^2)$ with noise scale $\sigma$ and clipping threshold $C$. These operations enable training machine learning models with non-convex objectives at a manageable privacy cost. Based on the analysis of traditional SGD, we theoretically derive the sufficient-decrease-type bound of DPSGD, i.e.,

$$\mathbb{E}\left[f(\theta_{t+1})\right] \le \mathbb{E}\left[f(\theta_t)\right] + \mathbb{E}\left[\langle \nabla f(\theta_t), \theta_{t+1} - \theta_t \rangle\right] + \frac{L}{2}\mathbb{E}\left[\|\theta_{t+1} - \theta_t\|^2\right] + \tau(C, \sigma; \theta_t), \tag{3}$$

where the last term $\tau(C, \sigma; \theta_t)$ denotes the gap in the loss expectation compared with ideal SGD at the $(t+1)$-th iteration and depends on the parameters $C$ and $\sigma$. The term $\tau(C, \sigma; \theta)$, which can be called the bias-variance term, can be calculated as

$$\tau(C, \sigma; \theta) = \underbrace{2\left(1 + \frac{1}{\mu_t L}\right)\left(\|\nabla f(\theta)\| \cdot \eta + \eta^2\right)}_{\text{clipping bias}} + \underbrace{\frac{1}{n^2}\, \sigma^2 C^2 \,|\mathbf{1}|}_{\text{noise variance}}, \tag{4}$$

where $L$ denotes the Lipschitz constant of $f$; $|\mathbf{1}|$ denotes the vector dimension; and

$$\eta := \frac{1}{n}\, I_{\|g_t(x_i)\| > C} \left(\|g_t(x_i)\| - C\right),$$

where $I_{\|g_t(x_i)\| > C}$ denotes the number of instances satisfying $\|g_t(x_i)\| > C$. The detailed proofs of (3) and (4) can be found in Appendix A.
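The update rule (2) and the bias-variance term (4) can be sketched in NumPy. This is an illustrative sketch, not the paper's implementation: function names are ours; the noise is added to the gradient sum before averaging, consistent with the $\frac{1}{n^2}$ factor in the noise-variance term of (4); and $\eta$ is computed under the reading that it averages the clipping excess over instances whose gradient norm exceeds $C$.

```python
import numpy as np

def dpsgd_step(theta, per_example_grads, C, sigma, lr, rng):
    """One DPSGD update: clip each per-example gradient to l2-norm at most C,
    sum, add per-coordinate Gaussian noise N(0, sigma^2 C^2), then average."""
    n = len(per_example_grads)
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_example_grads]
    noise = rng.normal(0.0, sigma * C, size=theta.shape)
    return theta - lr * (np.sum(clipped, axis=0) + noise) / n

def bias_variance_term(per_example_grads, grad_norm, C, sigma, lr, L, dim):
    """tau(C, sigma; theta) = clipping bias + noise variance, following Eq. (4)."""
    n = len(per_example_grads)
    norms = np.array([np.linalg.norm(g) for g in per_example_grads])
    eta = np.sum(np.maximum(norms - C, 0.0)) / n    # mean clipping excess
    clipping_bias = 2.0 * (1.0 + 1.0 / (lr * L)) * (grad_norm * eta + eta**2)
    noise_variance = sigma**2 * C**2 * dim / n**2
    return clipping_bias + noise_variance
```

Setting `sigma=0` recovers plain clipped SGD, which makes the clipping bias visible in isolation; increasing `C` shrinks `eta` (less bias) but inflates the noise-variance term, mirroring the trade-off the bound captures.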
τ(C, σ) consists of a clipping-bias term and a noise-variance term: the former is the amount by which the private gradient differs from the non-private gradient due to influence truncation, and the latter depends on the scale of the injected noise. As a result, we call τ(C, σ) the bias-variance term. Because instances from underrepresented classes, or complicated instances, manifest differently from common instances, a uniform threshold parameter C may incur a significant accuracy disparity across classes.

In Figure 1(a), we employ DPSGD (Abadi et al., 2016) on the unbalanced MNIST dataset (Bagdasaryan et al., 2019) to numerically study the inequality of utility loss (i.e., the prediction accuracy gap between the private model and the non-private model) caused by differential privacy. On the unbalanced MNIST dataset, the underrepresented class (Class 8) suffers a significantly larger utility loss than the other classes (e.g., Class 2) under the private model. DPSGD results in a 6.74% decrease in accuracy on the well-represented classes, but accuracy on the underrepresented class drops by 74.16%. Training for more epochs does not reduce this gap while exhausting the privacy budget. DPSGD thus introduces negative discrimination against the underrepresented class (which already has lower accuracy under the non-private SGD model).

Figure 1: Effect of clipping and noise in the differentially private mechanism and of τ(C, σ; θ_t) on the MNIST dataset. Panel (b) plots τ(C, σ; θ_t) vs. accuracy.

Further, Figure 1(b) shows the classification accuracy of different sub-classes as τ(C, σ; θ) varies on the unbalanced MNIST dataset. A larger bias-variance term τ(C, σ; θ) (determined by C and σ) results in a more serious accuracy bias across classes; similar results are reported in (Bagdasaryan et al., 2019; Du et al., 2020; Jaiswal & Provost, 2020).
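The per-class utility loss discussed above (the accuracy gap between the non-private and private models, per class) can be computed with a short helper; a minimal sketch, with the function name ours:

```python
import numpy as np

def per_class_utility_loss(y_true, y_pred_nonprivate, y_pred_private):
    """For each class k, return (non-private accuracy) - (private accuracy),
    the per-class utility loss used to quantify disparate impact."""
    losses = {}
    for k in np.unique(y_true):
        mask = y_true == k
        acc_nonprivate = np.mean(y_pred_nonprivate[mask] == y_true[mask])
        acc_private = np.mean(y_pred_private[mask] == y_true[mask])
        losses[int(k)] = acc_nonprivate - acc_private
    return losses
```

Under this measure, the experiment above would report roughly 0.067 for the well-represented classes but 0.742 for the underrepresented Class 8; a large spread in these values is exactly the disparity FairDP aims to reduce.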

