FINDING PRIVATE BUGS: DEBUGGING IMPLEMENTATIONS OF DIFFERENTIALLY PRIVATE STOCHASTIC GRADIENT DESCENT

Abstract

It is important to train with privacy-preserving algorithms when training data contains sensitive information. Differential privacy (DP) bounds the worst-case privacy leakage of a training algorithm. However, the analytic nature of these algorithmic guarantees makes it difficult to verify that an implementation of a differentially private learner is correct. Prior research focuses on empirically approximating the analytic bound; this only assesses whether an implementation provides the claimed guarantee on a particular dataset, and it is typically costly. In this paper, we take a first step towards a simple and lightweight methodology that lets practitioners identify common implementation mistakes without imposing any changes to their training scripts. Our approach stems from measuring distances between models output by the training algorithm. We demonstrate that our method successfully identifies specific mistakes made in implementations of DP-SGD, the de facto algorithm for differentially private deep learning: improper gradient computations and noise miscalibration. Both mistakes invalidate assumptions that are essential to obtaining a rigorous privacy guarantee.

1. INTRODUCTION

Machine learning (ML) models trained without taking privacy into consideration may inadvertently expose sensitive information contained in their training data (Shokri et al., 2017; Rahman et al., 2018; Song & Shmatikov, 2019; Fredrikson et al., 2015). Training with differential privacy (DP) (Dwork et al., 2014) has emerged as an established practice to bound and decrease such leakage. Because differential privacy guarantees are algorithmic, they require modifications to the training algorithm; the resulting bound is known as the privacy budget ε of the algorithm. Making the necessary modifications can be challenging because practitioners often lack the DP expertise required to ensure that the implementation is sound and correct, and incorrect implementations usually do not "fail loudly" (i.e., they neither block training nor lead to obvious differences in the performance of the trained models).

In this paper, we approach this problem through testing practices. We focus on the canonical DP learning algorithm, differentially private stochastic gradient descent (DP-SGD) (Chaudhuri et al., 2011; Abadi et al., 2016). Established research in the field has considered testing this algorithm, but only from an auditing perspective with an external party, e.g., a regulator. That line of work interacts with an implementation of DP-SGD in a black-box fashion to empirically verify that the privacy budget achieved by the algorithm, ε, is the one claimed by its developer (Jagielski et al., 2020; Nasr et al., 2021; Tramer et al., 2022). It is important to note that any discrepancy is not attributed to the specific mistake(s) made in the implementation; it simply tells us whether an implementation is correct or not. In contrast, as we introduce our framework for testing implementations of DP-SGD to identify common failures, we adopt the perspective of the developer themselves.
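To make the two failure modes concrete, the following is a minimal NumPy sketch of the gradient aggregation step of DP-SGD as described by Abadi et al. (2016): each example's gradient is clipped to a maximum L2 norm before summation, and Gaussian noise calibrated to that clipping norm is added. The function name, argument names, and the use of NumPy are illustrative choices, not drawn from any particular implementation discussed in this paper. Clipping the averaged batch gradient instead of each per-example gradient, or scaling the noise to something other than the clipping norm, are exactly the kinds of mistakes that invalidate the privacy analysis.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD gradient aggregation step (illustrative sketch).

    per_example_grads: array of shape (batch_size, dim) holding one
    gradient per training example. Clipping must be applied per
    example, not to the averaged batch gradient.
    """
    # 1. Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # 2. Sum the clipped gradients and add Gaussian noise whose standard
    #    deviation is calibrated to the clipping norm (the L2 sensitivity
    #    of the sum to any single example is clip_norm).
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    noisy_sum = clipped.sum(axis=0) + noise

    # 3. Average over the batch to obtain the update direction.
    return noisy_sum / per_example_grads.shape[0]
```

With `noise_multiplier = 0` and all gradients already within the clipping norm, this reduces to the ordinary batch-mean gradient, which is one sanity check a developer can run.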
This is orthogonal to, and in fact complementary with, prior work on auditing the privacy budget of DP-SGD implementations. Once prior work has identified an incorrect implementation, our framework can be used to help pinpoint the source of the discrepancy. We see two key use cases where this would benefit developers: (1) when

