SHARE YOUR REPRESENTATION ONLY: GUARANTEED IMPROVEMENT OF THE PRIVACY-UTILITY TRADEOFF IN FEDERATED LEARNING

Abstract

Repeated parameter sharing in federated learning causes significant information leakage about private data, thus defeating its main purpose: data privacy. Mitigating this leakage with state-of-the-art differentially private algorithms does not come for free either: randomized mechanisms can prevent models from learning even the useful representation functions, especially when local models disagree strongly on the classification functions (due to data heterogeneity). In this paper, we consider a representation federated learning objective that encourages various parties to collaboratively refine the consensus part of the model with differential privacy guarantees, while separately allowing each party sufficient freedom for local personalization (without releasing it). We prove that in the linear representation setting, although the objective is non-convex, our proposed new algorithm CENTAUR converges at a linear rate to a ball centered around the global optimal solution, whose radius is proportional to the reciprocal of the privacy budget. With this novel utility analysis, we improve the SOTA utility-privacy trade-off for this problem by a factor of √d, where d is the input dimension. We empirically evaluate our method on image classification with CIFAR10, CIFAR100, and EMNIST, and observe a significant performance improvement over prior work under the same small privacy budget. The code can be found in this link.

1. INTRODUCTION

In federated learning (FL), multiple parties cooperate to learn a model under the orchestration of a central server while keeping their data local. However, this paradigm alone is insufficient to provide rigorous privacy guarantees, even when local parties only share partial information (e.g., gradients) about their data. An adversary (e.g., one of the parties) can infer whether a particular record is in the training set of another party (Nasr et al., 2019), or even precisely reconstruct its training data (Zhu et al., 2019). To formally mitigate these privacy risks, we need to guarantee that any information shared between the parties during training leaks only a bounded amount of information about the local data. This can be achieved by performing FL under differential privacy (DP) guarantees. FL and DP are each relatively well studied in isolation. However, their challenges multiply when conducting FL under a DP constraint in real-world settings where data distributions can vary substantially across clients (Li et al., 2020b; Acar et al., 2020; Shen et al., 2022). A direct consequence of such data heterogeneity is that the optimal local models may vary significantly across clients and differ drastically from the global solution, which results in large local gradients (Jiang et al., 2019). These large signals leak information about the local training data and cannot be communicated as such when we need to guarantee DP. To bound the sensitivity of the gradient function with respect to changes in the training data, gradients must be clipped (usually by a small threshold (De et al., 2022)) before being sent to the server (Abadi et al., 2016). Since the local per-sample gradients (due to data heterogeneity) tend to be large even at the global optimum, clipping each per-example gradient by a small threshold and then randomizing it results in a high error in the overall gradient computation, which degrades the accuracy of the model learned via FL.
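The clip-then-randomize step described above can be illustrated with a minimal DP-SGD-style sketch in the spirit of Abadi et al. (2016); the function and parameter names here are ours, for illustration only, and do not reflect the paper's implementation:

```python
import numpy as np

def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each per-example gradient to norm clip_norm, average, and add
    Gaussian noise calibrated to the clipped sensitivity (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping threshold;
        # leave smaller gradients untouched.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    # Each example contributes at most clip_norm to the sum, so the mean's
    # sensitivity is clip_norm / batch_size; noise std scales accordingly.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

Note that when per-example gradients are much larger than `clip_norm` (as under strong data heterogeneity), the clipped average can point far from the true average, which is exactly the error mode discussed above.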
Contributions. In this work, we identify an important bottleneck for achieving high utility in FL under a tight privacy budget: clipping gradients to bound their sensitivity (which is required for achieving DP) magnifies the conflict between learning the representation function and the classification head. This conflict causes slow convergence of the representation function and disproportionate scaling of the local gradients, and consequently leads to an inevitable utility drop in DP FL. To address this issue, we observe that in many FL classification scenarios, participants have minimal disagreement on data representations (Bengio et al., 2013; Chen et al., 2020; Collins et al., 2021), but may have very different classifier heads (e.g., the last layer of the neural network). Therefore, instead of solving the standard classification problem, we borrow ideas from the model personalization literature and view the neural network as a composition of a representation extractor and a small classifier head, optimizing these two components differently. In the proposed scheme, CENTAUR, we train a single differentially private global representation extractor while allowing each participant a different, personalized classifier head. Such a decomposition has been considered in prior work, e.g., by Collins et al. (2021) and Singhal et al. (2021), but only in a non-DP setting, and by Jain et al. (2021), but only for a linear embedding. Due to the low heterogeneity in data representations (compared to the whole model), the DP-learned representation in our new scheme outperforms prior schemes that perform DP optimization over the entire model. In the setting where both the representation function and the classifier heads are linear w.r.t.
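The decomposition described above can be sketched as follows for the linear case: the shared representation is updated with a clipped, noised aggregate, while each personal head is refined locally and never released. This is a minimal illustration under our own assumptions (squared loss, client-level clipping, NumPy shapes of our choosing), not the authors' implementation:

```python
import numpy as np

def centaur_round(W, heads, client_data, clip_norm, sigma, lr=0.1, rng=None):
    """One round of a CENTAUR-style update (sketch).

    W: shared linear representation (k x d), updated with DP noise.
    heads: list of per-client head vectors (k,), updated locally, never shared.
    client_data: list of (X, y) pairs with X of shape (n, d), y of shape (n,).
    """
    rng = rng or np.random.default_rng(0)
    grads = []
    for i, (X, y) in enumerate(client_data):
        # Local step on the personal head: stays on-device, so no clipping
        # or noise is needed for it.
        feats = X @ W.T                      # (n, k) representations
        g_head = feats.T @ (feats @ heads[i] - y) / len(y)
        heads[i] = heads[i] - lr * g_head
        # Gradient of the squared loss w.r.t. the shared representation W,
        # clipped per client before release.
        resid = X @ W.T @ heads[i] - y
        g_W = np.outer(heads[i], resid @ X) / len(y)   # (k, d)
        norm = np.linalg.norm(g_W)
        grads.append(g_W * min(1.0, clip_norm / max(norm, 1e-12)))
    # Server: only the noised average of shared-part gradients is aggregated.
    noisy = np.mean(grads, axis=0) + rng.normal(
        0.0, sigma * clip_norm / len(grads), size=W.shape)
    return W - lr * noisy, heads
```

The key design point this sketch reflects is that only the consensus part (`W`) ever leaves the client, so the privacy mechanism pays for a much smaller, less heterogeneous signal than the full model.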
their parameters, we prove a novel utility-privacy trade-off for an instance of CENTAUR, yielding a significant O(√d) improvement over previous art, where d is the input dimension (Corollary 5.1). A major algorithmic novelty of our proposed approach is a cross-validation scheme for boosting the success probability of the classic noisy power method for privacy-preserving spectral analysis. We present strong empirical evidence for the superior performance of CENTAUR over the prior work, under the small DP budget of (1, 10⁻⁵), in a variety of data-heterogeneity settings on the benchmark datasets CIFAR10, CIFAR100, and EMNIST. Our method outperforms the prior work in all settings. Moreover, we showcase that CENTAUR uniformly enjoys a better utility-privacy trade-off than its competitors on the CIFAR10 dataset across different privacy budgets ϵ (Figure 1). Importantly, CENTAUR outperforms local stand-alone training even with ϵ = 0.5, justifying the benefit of collaborative learning over stand-alone training for a larger range of privacy budgets.

1.1 RELATED WORK

Federated learning with differential privacy has been extensively studied since its emergence (Shokri & Shmatikov, 2015; McMahan et al., 2017a). Without any trusted central party, the local DP model requires each client to randomize its messages before sending them to other (potentially malicious) parties. Consequently, the trade-off between local DP and accuracy is significantly worse than in the centralized setting, and huge amounts of data are required to learn even simple statistics (Duchi et al., 2014; Erlingsson et al., 2014; Ding et al., 2017). By using secure aggregation protocols, recent works (McMahan et al., 2017b; Agarwal et al., 2018; Levy et al., 2021; Kairouz et al., 2021) study user-level DP under the billboard model to enable better utility. We also focus on this user-level DP setting.
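The classic noisy power method mentioned among the contributions above can be sketched as follows, in the style of the standard private spectral analysis literature (e.g., Hardt & Price); the noise calibration and parameter names here are illustrative assumptions, and this sketch omits the paper's cross-validation boosting scheme:

```python
import numpy as np

def noisy_power_method(A, k, iters, sigma, rng=None):
    """Noisy power method sketch: recover an approximate top-k eigenspace of
    a symmetric matrix A (d x d), perturbing each iterate with Gaussian noise
    so that the released subspace has bounded dependence on A."""
    rng = rng or np.random.default_rng(0)
    d = A.shape[0]
    # Random orthonormal start.
    X, _ = np.linalg.qr(rng.normal(size=(d, k)))
    for _ in range(iters):
        # Each matrix-vector product is noised; sigma trades accuracy
        # against the privacy budget.
        Y = A @ X + rng.normal(0.0, sigma, size=(d, k))
        X, _ = np.linalg.qr(Y)  # re-orthonormalize the iterate
    return X  # (d, k) matrix with orthonormal columns
```

Because each run succeeds only with some probability once noise is injected, running the method several times and selecting the best candidate on held-out data, as the cross-validation scheme above proposes, is a natural way to boost the success probability.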
Model personalization approaches (Smith et al., 2017; Fallah et al., 2020; Li et al., 2020b; Arivazhagan et al., 2019; Collins et al., 2021; Pillutla et al., 2022) enable each client to learn a different (yet related) model, thus alleviating the model-drift issue caused by data heterogeneity. Recent works further investigate whether model personalization enables an improved privacy-accuracy trade-off for federated learning. Hu et al. (2021) propose a private federated multi-task learning algorithm by adding task-specific regularization to each client's optimization objective. However,



Figure 1: Privacy-utility trade-off for models trained under CENTAUR and other algorithms on CIFAR10 (500 clients, 5 shards per user). Error bars denote the standard deviation across 3 runs.

