FINDING PRIVATE BUGS: DEBUGGING IMPLEMENTATIONS OF DIFFERENTIALLY PRIVATE STOCHASTIC GRADIENT DESCENT

Abstract

It is important to train with privacy-preserving algorithms when the training data contains sensitive information. Differential privacy (DP) bounds the worst-case privacy leakage of a training algorithm. However, the analytic nature of these algorithmic guarantees makes it difficult to verify that an implementation of a differentially private learner is correct. Research in the field focuses on empirically approximating the analytic bound, which only assesses whether an implementation meets the guarantee claimed for a particular dataset, and is typically costly. In this paper, we take a first step towards a simple and lightweight methodology that lets practitioners identify common implementation mistakes without imposing any changes to their training scripts. Our approach stems from measuring distances between the models output by the training algorithm. We demonstrate that our method successfully identifies specific mistakes made in implementations of DP-SGD, the de facto algorithm for differentially private deep learning: improper gradient computations and miscalibrated noise. Both mistakes invalidate assumptions that are essential to obtaining a rigorous privacy guarantee.

1. INTRODUCTION

Machine learning (ML) models trained without taking privacy into consideration may inadvertently expose sensitive information contained in their training data (Shokri et al., 2017; Rahman et al., 2018; Song & Shmatikov, 2019; Fredrikson et al., 2015). Training with differential privacy (DP) (Dwork et al., 2014) has emerged as an established practice to bound and decrease such leakage. Because differential privacy guarantees are algorithmic, they require modifications to the training algorithm to obtain such a bound; this bound is also known as the privacy budget ε of the algorithm. Making the necessary modifications can be challenging because practitioners often lack the DP expertise required to ensure that an implementation is sound and correct, and incorrect implementations usually do not "fail loudly" (i.e., they neither block training nor lead to obvious differences in the performance of the trained models). In this paper, we approach this problem through testing practices. We focus on the canonical DP learning algorithm: differentially private stochastic gradient descent (DP-SGD) (Chaudhuri et al., 2011; Abadi et al., 2016). Established research in the field has considered testing this algorithm, but only from the auditing perspective of an external party, e.g., a regulator. Their approach is to interact with the implementation of DP-SGD in a black-box fashion to empirically verify that the privacy budget ε achieved by the algorithm is the one claimed by its developer (Jagielski et al., 2020; Nasr et al., 2021; Tramer et al., 2022). Importantly, any discrepancy is not attributed to the specific mistake(s) made in the implementation; it simply tells us whether an implementation is correct or not. In contrast, as we introduce our framework for testing implementations of DP-SGD to identify common failures, we adopt the perspective of the developer themselves.
This is orthogonal and, in fact, complementary to prior work on auditing the privacy budget of DP-SGD implementations: once prior work has identified an incorrect implementation, our framework can be used to help identify the source of the discrepancy. We see two key use cases where this would benefit developers: (1) when they integrate an existing implementation of DP-SGD into their ML pipeline; and (2) when they write their own implementation of DP-SGD from scratch. While the usefulness of our method is apparent in the latter scenario, even using an off-the-shelf DP-SGD implementation is not as trivial as it may initially sound, because open-source implementations of DP-SGD are still nascent and do not always support all modern architectural features of ML pipelines. Since the outputs of DP-SGD are updates to model parameters, incorrect implementations should manifest themselves through differences in the parameter space of the trained ML models. However, the parameter space is typically high-dimensional, and the nature of non-convex optimization makes it difficult to evaluate whether the solution yielded by a particular DP-SGD implementation is correct. Therefore, we propose to identify and approximate such differences by evaluating functions defined in the parameter space of the models, or by comparing the models' losses. Note that the DP-SGD algorithm makes three key modifications to vanilla SGD to obtain differential privacy, each of which represents a potential failure point in its implementation. First, gradients are computed on a per-example basis; this enables the algorithm to isolate the contribution of each training example. Second, the norms of these per-example gradients are clipped; this bounds the sensitivity of the algorithm to each individual training example. Third, gradients are noised before they are applied to update the model.
This provides the indistinguishability across updates needed to obtain differential privacy. We introduce a methodology to identify common implementation mistakes for each of these modifications: (1) we observe that correct gradient clipping restricts the impact of some training steps on the model, and thus the change in loss values caused by such steps; this may not hold if gradient clipping is performed incorrectly or not at all, so we leverage changes in loss values to test the correctness of gradient clipping; and (2) improperly calibrating the noise to the sensitivity of the training algorithm often yields insufficient indistinguishability in the parameters to obtain differential privacy, so we detect improper calibration of the noise added to gradients by measuring the parameter-space distance between the models obtained. Both of these tests can be performed on checkpoints output by the ML pipeline without modifying the training scripts, which makes them easy to deploy universally. We validate our approach on standard image classification tasks (i.e., ResNet-20 and Wide ResNet50-2 architectures trained on CIFAR-10 and ImageNet, respectively) and an NLP task (i.e., a BERT model trained on the Stanford Sentiment Treebank v2). Our proposed method detects the three common implementation mistakes highlighted earlier, namely when: (1) the gradient is evaluated on an aggregate of the mini-batch of data points before clipping; (2) the updates are computed with no clipping; and (3) the additive noise is not calibrated. In summary, our tests help identify implementation mistakes that developers may not be aware of, and that would otherwise invalidate the claimed privacy guarantees.
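To make the three modifications concrete, the following is a minimal NumPy sketch of one DP-SGD update step. This is our own illustrative code, not the paper's implementation: the function name `dp_sgd_step`, the shapes, and the hyperparameter values are all assumptions chosen for clarity.

```python
# Minimal sketch of one DP-SGD update step (NumPy only; names and
# values are illustrative assumptions, not from the paper's codebase).
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """Apply one DP-SGD update.

    per_example_grads has shape (batch_size, dim): one gradient per
    training example, as required for per-example clipping.
    """
    batch_size = per_example_grads.shape[0]
    # 1) Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # 2) Sum the clipped gradients and add Gaussian noise whose scale is
    #    calibrated to the sensitivity clip_norm.
    noised_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)
    # 3) Average over the mini-batch and take a gradient step.
    return params - lr * noised_sum / batch_size

rng = np.random.default_rng(0)
params = np.zeros(4)
grads = rng.normal(0.0, 5.0, size=(8, 4))  # synthetic per-example gradients
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.1, lr=0.1, rng=rng)
```

The mistakes studied in this paper correspond to skipping or misplacing step 1 (clipping after aggregating the mini-batch, or not clipping at all) and to scaling the noise in step 2 incorrectly.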
Our conceptually simple, computationally efficient, dataset-agnostic and model-agnostic tests detect common mistakes in the implementation of gradient clipping and additive noise:

• We characterize the effect of per-example gradient clipping, mini-batch gradient clipping, and no gradient clipping on gradient updates computed by DP-SGD. Based on our analysis, we design a test that detects incorrect gradient clipping by varying either the mini-batch size or the gradient norm bound that clipping is configured to enforce.

• We theoretically demonstrate that the parameter-space distance between models trained with DP-SGD is a function of the scale of the noise added to gradients. From this, we obtain a test that detects incorrect noise calibration by varying the value of the gradient norm bound.

• We demonstrate, through extensive experimentation, that a developer can run our tests on their implementations to detect incorrect gradient clipping and incorrect noise calibration in both image and text domains without changing their scripts.
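As a rough illustration of the two checkpoint-based quantities these tests rely on, the sketch below computes a parameter-space distance and a maximum per-step loss change, using plain NumPy arrays in place of real model checkpoints. All names and values here are ours, chosen for illustration; they are not the paper's code.

```python
# Illustrative sketch of the two checkpoint-based quantities described
# above (our own code; names and toy values are assumptions).
import numpy as np

def parameter_distance(ckpt_a, ckpt_b):
    """L2 distance between two flattened parameter checkpoints.

    Under correctly calibrated noise, this distance should scale with
    the noise added to gradients."""
    return np.linalg.norm(ckpt_a - ckpt_b)

def max_loss_change(losses):
    """Largest per-step change in the training loss.

    Correct per-example clipping bounds the influence of any single
    training step, so unusually large jumps hint at missing or
    mini-batch-level clipping."""
    losses = np.asarray(losses)
    return np.max(np.abs(np.diff(losses)))

# Toy usage on synthetic checkpoints and a synthetic loss curve.
rng = np.random.default_rng(1)
ckpt_a, ckpt_b = rng.normal(size=100), rng.normal(size=100)
distance = parameter_distance(ckpt_a, ckpt_b)
largest_jump = max_loss_change([2.3, 2.1, 2.0, 1.95])
```

Both quantities are computed from artifacts the pipeline already produces (checkpoints and logged losses), which is why the tests require no changes to the training scripts.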

2. PROBLEM DESCRIPTION AND RELATED WORK

Differential privacy (DP) (Dwork et al., 2006; 2014) is the gold standard for reasoning about the privacy guarantees of learning algorithms. A randomized algorithm A is (ε, δ)-differentially private if its outputs satisfy, for any S ⊆ Range(A) and any two neighboring datasets D and D′ that differ only by one

