FEDPD: DEFYING DATA HETEROGENEITY THROUGH PRIVACY DISTILLATION

Abstract

Model performance in federated learning (FL) typically suffers from data heterogeneity, i.e., data distributions vary across clients. Prior works have shown the great potential of sharing client information to mitigate data heterogeneity. Yet, the literature also reveals a dilemma between preserving strong privacy and promoting model performance. Revisiting the purpose of sharing information motivates us to raise the fundamental questions: Which part of the data is more critical for model generalization? Which part of the data is more privacy-sensitive? Can we resolve this dilemma by sharing features that are useful for generalization while keeping the more sensitive data local? Our work sheds light on data-dominated sharing and training, in which we decouple the original training data into sensitive features and generalizable features. Specifically, we propose a Federated Privacy Distillation framework, named FedPD, to alleviate the privacy-performance dilemma. FedPD keeps the distilled sensitive features locally and constructs a global dataset from the shared generalizable features in a differentially private manner. Accordingly, clients can perform local training on both the local and the securely shared data, thereby acquiring high model performance while avoiding leakage of the undistilled sensitive features. Theoretically, we demonstrate the superiority of sharing only generalizable features over sharing raw data. Empirically, we show the efficacy of FedPD in promoting performance through comprehensive experiments.

1. INTRODUCTION

Federated learning (FL), as an emerging privacy-protection paradigm, has received increasing attention recently (Kairouz et al., 2021; Li et al., 2021b; Yang et al., 2019), since it preserves data privacy without transmitting raw data. In general, distributed clients collaboratively train a global model by aggregating gradients (or model parameters). However, distributed data can cause heterogeneity issues (McMahan et al., 2017; Li et al., 2022; 2020; Zhao et al., 2018), due to diverse computing capabilities and non-IID data distributions across federated clients. This results in unstable convergence and degraded performance. To address the challenge of heterogeneity, the seminal work on federated averaging (FedAvg) (McMahan et al., 2017) proposes weighted averaging of selected local parameters shared in each communication round to cope with non-IID data distributions. Despite addressing the diversity of computing and communication, FedAvg still struggles with the client drift issue (Karimireddy et al., 2020). Therefore, recent works try to resolve this issue by devising new learning objectives (Li et al., 2020), designing new aggregation strategies (Yurochkin et al., 2019), and constructing information for sharing (Zhao et al., 2018; Yoon et al., 2021). Among these explorations, sharing relevant information across clients provides a straightforward and promising approach to mitigating data heterogeneity. However, recent works point out a dilemma between preserving strong privacy and promoting model performance. Specifically, Zhao et al. (2018) show that sharing even a limited amount of data can significantly improve training performance. Unfortunately, sharing raw data, synthesized data, logits, or statistical information (Luo et al., 2021; Goetz & Tewari, 2020; Hao et al., 2021; Karimireddy et al., 2020) can incur high privacy risks. To protect clients' privacy, differential privacy (DP) provides the de facto standard for quantifiable, provable privacy guarantees.
The primary concern in applying DP is performance degradation (Tramer & Boneh, 2020). Thus, resolving the above dilemma can promote model performance while preserving strong privacy.
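To make the DP-protected sharing concrete, the following is a minimal sketch of releasing per-example feature vectors under the standard Gaussian mechanism: clip each vector to a bounded L2 norm (bounding sensitivity), then add calibrated Gaussian noise. This is an illustration of generic DP release, not the exact protocol of FedPD; the function name and parameters are ours.

```python
import numpy as np

def dp_release_features(features, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Release a batch of feature vectors with (epsilon, delta)-DP via the
    Gaussian mechanism. `features` has shape (n_examples, feature_dim)."""
    rng = np.random.default_rng() if rng is None else rng
    # 1) Clip each example's L2 norm to `clip_norm`, bounding the
    #    sensitivity of releasing any single example.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    clipped = features * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2) Calibrate the noise scale to the sensitivity. The classical
    #    Gaussian-mechanism bound uses sigma = sqrt(2 ln(1.25/delta)) * S / epsilon
    #    with sensitivity S = clip_norm.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * clip_norm / epsilon
    # 3) Add independent Gaussian noise to every coordinate.
    return clipped + rng.normal(0.0, sigma, size=clipped.shape)
```

In a FedPD-like pipeline, each client would apply such a release only to its generalizable features before contributing them to the global dataset, while the distilled sensitive features never leave the client.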

