ON DYNAMIC NOISE INFLUENCE IN DIFFERENTIALLY PRIVATE LEARNING

Abstract

Protecting privacy in learning while maintaining model performance has become increasingly critical in many applications that involve sensitive data. Private Gradient Descent (PGD) is a commonly used private learning framework, which perturbs gradients with noise according to the differential privacy protocol. Recent studies show that dynamic privacy schedules with decreasing noise magnitudes can improve the loss at the final iteration, yet theoretical understanding of the effectiveness of such schedules and their connections to optimization algorithms remains limited. In this paper, we provide a comprehensive analysis of noise influence in dynamic privacy schedules to answer these critical questions. We first present a dynamic noise schedule that minimizes the utility upper bound of PGD, and show how the noise influence from each optimization step collectively impacts the utility of the final model. Our study also reveals how the impacts of dynamic noise influence change when momentum is used. We empirically show that the connection holds for general non-convex losses, and that the influence is greatly affected by the loss curvature.

1. INTRODUCTION

In the era of big data, privacy protection in machine learning systems is becoming a crucial topic, as increasing amounts of personal data are involved in training models (Dwork et al., 2020) and malicious attackers are present (Shokri et al., 2017; Fredrikson et al., 2015). In response to the growing demand, differentially private (DP) machine learning (Dwork et al., 2006) provides a computational framework for privacy protection and has been widely studied in various settings, including both convex and non-convex optimization (Wang et al., 2017; 2019; Jain et al., 2019). One widely used procedure for privacy-preserving learning is (Differentially) Private Gradient Descent (PGD) (Bassily et al., 2014; Abadi et al., 2016). A typical gradient descent procedure updates its model using the gradients of losses evaluated on the training data. When the data is sensitive, the gradients should be privatized to prevent excessive privacy leakage. PGD privatizes a gradient by adding controlled noise. As such, models from PGD are expected to have lower utility compared to those from unprotected algorithms. When strict privacy control is exercised, or equivalently, under a tight privacy budget, the accumulated effects of highly-noised gradients may lead to unacceptable model performance. It is thus critical to design effective privatization procedures for PGD that maintain a good balance between utility and privacy.

Recent years have witnessed a promising privatization direction that studies how to dynamically adjust the privacy-protecting noise during the learning process, i.e., dynamic privacy schedules, to boost utility under a given privacy budget. One example is (Lee & Kifer, 2018), which reduces the noise magnitude when the loss does not decrease, based on the observation that gradients become very small when approaching convergence, so a static noise scale would overwhelm these gradients.
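As a concrete illustration, a single privatized update of the kind PGD performs can be sketched as follows. This is a minimal sketch, not the paper's exact algorithm: the function name, the clipping rule, and the parameters `clip_norm` and `sigma` are illustrative assumptions.

```python
import numpy as np

def private_gradient_step(w, grad, lr=0.1, clip_norm=1.0, sigma=1.0, rng=None):
    """One illustrative step of private gradient descent (Gaussian mechanism).

    Hypothetical helper: clip the gradient to bound its sensitivity,
    then add Gaussian noise scaled by sigma * clip_norm before the update.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip to bound the gradient's contribution (sensitivity <= clip_norm).
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    # Privatize: add isotropic Gaussian noise calibrated to the clip norm.
    noisy = clipped + rng.normal(0.0, sigma * clip_norm, size=grad.shape)
    return w - lr * noisy
```

With `sigma = 0` this reduces to ordinary clipped gradient descent, which makes the utility cost of the added noise easy to isolate in experiments.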
Another example is (Yu et al., 2019), which periodically decreases the magnitude following a predefined strategy, e.g., exponential decay or step decay. Both approaches confirmed the empirical advantages of decreasing noise magnitudes. Intuitively, the dynamic mechanism may coordinate with certain properties of the learning task, e.g., the training data and the loss surface. Yet no theoretical analysis is available, and two important questions remain unanswered: 1) What is the form of utility-preferred noise schedules? 2) When and to what extent do such schedules improve utility? To answer these questions, in this paper we develop a principled approach to construct dynamic schedules and quantify their utility bounds under different learning algorithms. Our contributions 1
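The two predefined strategies mentioned above can be written down directly. The sketch below assumes a particular parameterization (initial scale `sigma0`, decay factor `k`, and for step decay a `period`); the exact form used by Yu et al. (2019) may differ.

```python
def exponential_decay(sigma0, k, t):
    """Noise scale sigma0 * k**t at step t, with 0 < k < 1.

    Illustrative parameterization of an exponentially decaying schedule.
    """
    return sigma0 * (k ** t)

def step_decay(sigma0, k, period, t):
    """Drop the noise scale by a factor k once every `period` steps.

    Illustrative parameterization of a step-decay schedule.
    """
    return sigma0 * (k ** (t // period))
```

In both cases the per-step noise scale is monotonically non-increasing, consistent with the observation that later gradients are smaller and tolerate less noise.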

