MULTI-EPOCH MATRIX FACTORIZATION MECHANISMS FOR PRIVATE MACHINE LEARNING

Abstract

We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) training involving multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. Our key contribution is an extension of the online matrix factorization DP mechanism to multiple participations, substantially generalizing the approach of Denisov et al. (2022). We first give conditions under which the problem with per-iteration vector contributions can be reduced to the simpler one of scalar contributions. Using this reduction, we formulate the construction of matrix mechanisms for SGD variants that are optimal in total squared error at each iterate as a convex program, and we propose an efficient optimization algorithm via a closed-form solution to the dual function. While tractable, both solving the convex problem offline and computing the necessary noise masks during training can become prohibitively expensive when many training steps are necessary. To address this, we design a Fourier-transform-based mechanism with significantly less computation and only a minor utility decrease. Extensive empirical evaluation on two tasks, example-level DP for image classification and user-level DP for language modeling, demonstrates substantial improvements over the previous state of the art. Though our primary application is ML, our main DP results apply to arbitrary linear queries and hence may have much broader applicability.

1. INTRODUCTION

Differentially private stochastic gradient descent (DP-SGD) is the de facto standard algorithm for DP machine learning (ML) (Song et al., 2013; Abadi et al., 2016a). However, obtaining state-of-the-art privacy-utility tradeoffs critically requires privacy amplification techniques like shuffling (Erlingsson et al., 2019; Feldman et al., 2022) or (Poisson) subsampling (Bassily et al., 2014; Zhu & Wang, 2019; Wang et al., 2019). These in turn require strong assumptions on the manner in which data is processed that are rarely valid in applications of DP-SGD, as implementing these procedures is often impractical (Kairouz et al., 2021). Kairouz et al. (2021) recently proposed the DP-FTRL framework, which avoids reliance on amplification by sampling by instead using DP streaming of prefix sums (Dwork et al., 2010; Chan et al., 2011; Honaker, 2015). DP-FTRL can often match (or outperform) DP-SGD in privacy-utility tradeoffs. Indeed, this algorithm enabled McMahan & Thakurta (2022) to train the first known provably DP ML model on user data in a production setting.
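To make the prefix-sum primitive concrete, the following is a minimal sketch of binary-tree aggregation in the style of Dwork et al. (2010) and Chan et al. (2011): every node of a binary tree over the stream holds a noisy partial sum, and any prefix sum is assembled from O(log n) node values, so each stream element influences only O(log n) noisy quantities. The function name and Gaussian-noise parameterization here are illustrative choices, not an implementation from any of the cited papers.

```python
import numpy as np

def tree_prefix_sums(x, sigma):
    """Noisy prefix sums of stream x via binary-tree aggregation (a sketch)."""
    n = len(x)
    # Level l holds noisy sums of aligned blocks of 2**l consecutive elements.
    levels = []
    vals = np.asarray(x, dtype=float)
    while len(vals) > 0:
        levels.append(vals + sigma * np.random.randn(len(vals)))
        if len(vals) == 1:
            break
        m = len(vals) // 2
        # Pair up adjacent blocks to form the next (coarser) level.
        vals = vals[:2 * m].reshape(m, 2).sum(axis=1)
    # Each prefix [0, t) is greedily covered by O(log n) dyadic blocks.
    out = np.zeros(n)
    for t in range(1, n + 1):
        total, start, rem = 0.0, 0, t
        while rem > 0:
            l = rem.bit_length() - 1      # largest dyadic block that fits
            total += levels[l][start >> l]
            start += 1 << l
            rem -= 1 << l
        out[t - 1] = total
    return out
```

With sigma = 0 this reduces to exact cumulative sums; with sigma > 0, each prefix estimate aggregates noise from only O(log n) tree nodes rather than t leaves.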

Several works have since focused on this primitive as an instantiation of the streaming matrix mechanism; in particular, Denisov et al. (2022) showed that leveraging optimal matrix mechanisms led to significant empirical improvements, though their work was restricted to the single-epoch setting. As shown in Figs. 1 and 3, we achieve substantially improved privacy-utility tradeoffs with comparable computation. Our methods outperform all prior work, including DP-SGD with amplification, down to ε ≈ 2. To accomplish this, we propose a formalism for measuring multi-participation sensitivity, given in Section 2, a significant extension of the single-participation sensitivity used in Denisov et al. (2022). We show in Section 3 how one may compute matrix mechanisms optimized for this multi-participation setting. This generalization enables application of optimized streaming matrix mechanisms to settings where each example (or user) may contribute to multiple elements of the data matrix (the matrix formed by stacking unnoised batch gradients in ML).
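As a toy illustration of the objects involved, the sketch below instantiates a (non-streaming) matrix mechanism for the prefix-sum workload A = BC, together with a brute-force multi-participation sensitivity for an encoder C when a single example contributes a scalar in [-1, 1] at each step of one participation pattern. The trivial factorization B = A, C = I and all helper names are illustrative assumptions, not the paper's optimized mechanism.

```python
import itertools
import numpy as np

def prefix_sum_workload(n):
    # Lower-triangular workload A with (A x)_t = x_1 + ... + x_t.
    return np.tril(np.ones((n, n)))

def multi_participation_sensitivity(C, patterns):
    # Brute-force sensitivity of encoder C when one example contributes a
    # scalar in [-1, 1] at every step of a single participation pattern:
    # maximize, over patterns and worst-case signs, the norm of the signed
    # sum of the touched columns. Exponential in pattern size; sketch only.
    best = 0.0
    for pattern in patterns:
        for signs in itertools.product([-1.0, 1.0], repeat=len(pattern)):
            v = sum(s * C[:, i] for s, i in zip(signs, pattern))
            best = max(best, float(np.linalg.norm(v)))
    return best

def matrix_mechanism(B, C, x, sigma, rng):
    # Release an estimate of B C x = A x: encode with C, add Gaussian noise
    # (calibrated to sens(C) outside this sketch), then decode with B.
    z = sigma * rng.standard_normal(C.shape[0])
    return B @ (C @ x + z)

n = 4
A = prefix_sum_workload(n)
B, C = A, np.eye(n)  # trivial factorization A = B C ("input perturbation")
# One example participating at steps 0 and 2, another at steps 1 and 3
# (e.g., one contribution per epoch under a fixed participation schedule):
sens = multi_participation_sensitivity(C, [(0, 2), (1, 3)])
```

Here `sens` equals √2 rather than 1, reflecting exactly the multi-epoch effect the paper addresses: a participant touching k steps can inflate the encoder's sensitivity, so the factorization B, C must be optimized with these patterns in mind.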

