DATA-FREE ONE-SHOT FEDERATED LEARNING UNDER VERY HIGH STATISTICAL HETEROGENEITY

Abstract

Federated learning (FL) is an emerging distributed learning framework that collaboratively trains a shared model without transferring the local clients' data to a centralized server. Motivated by concerns stemming from extended communication and potential attacks, one-shot FL limits communication to a single round while attempting to retain performance. However, one-shot FL methods often degrade under high statistical heterogeneity, fail to promote pipeline security, or require an auxiliary public dataset. To address these limitations, we propose two novel data-free one-shot FL methods: FEDCVAE-ENS and its extension FEDCVAE-KD. Both approaches reframe the local learning task using a conditional variational autoencoder (CVAE) to address high statistical heterogeneity. Furthermore, FEDCVAE-KD leverages knowledge distillation to compress the ensemble of client decoders into a single decoder. We further propose shifting the center of the CVAE prior distribution, experimentally demonstrate that this promotes pipeline security, and show how either method can incorporate heterogeneous local model architectures. We confirm the efficacy of the proposed methods over baselines under high statistical heterogeneity using multiple benchmark datasets. In particular, at the highest levels of statistical heterogeneity, both FEDCVAE-ENS and FEDCVAE-KD typically more than double the accuracy of the baselines.

1. INTRODUCTION

Traditional federated learning (FL) achieves privacy protection by sharing learned model parameters with a central server, circumventing the need for a centralized dataset and thus allowing potentially sensitive data to remain local to client devices (McMahan et al., 2017). FL has shown promise in several practical application domains with privacy concerns, such as health care, mobile phones, and industrial engineering (Li et al., 2020a). However, most existing FL methods depend on substantial iterative communication (Guha et al., 2019; Li et al., 2020b), introducing a vulnerability to eavesdropping attacks, among other privacy and security concerns (Mothukuri et al., 2021). One-shot FL has emerged to address issues associated with communication and security in standard FL (Guha et al., 2019). One-shot FL limits communication to a single round, which is more practical in scenarios like model markets, where models trained to convergence are sold with no possibility for iterative communication during local client training (Li et al., 2021b). In high-impact settings like health care, data can be highly heterogeneous and computation capabilities varied; for example, health care institutions may have different prevalence rates of particular diseases, or no data on a disease at all, and substantially different computing abilities depending on funding (Li et al., 2020a). Furthermore, fewer communication rounds mean fewer opportunities for eavesdropping attacks. While results in one-shot FL are promising, existing methods struggle under high statistical heterogeneity, i.e., non-independently-and-identically-distributed (non-IID) data (e.g., Zhou et al. (2020), Zhang et al. (2021)), or do not fully consider statistical heterogeneity (e.g., Guha et al. (2019), Shin et al. (2020), Li et al. (2021b)). Additionally, most do not consider pipeline security (e.g., Shin et al. (2020), Li et al. (2021b), Zhang et al. (2021)). Furthermore, an auxiliary public dataset is often required to achieve satisfactory performance in one-shot FL (e.g., Guha et al. (2019), Li et al. (2021b)), which may be difficult to obtain in practice (Zhu et al., 2021).

Figure 1: Motivating our proposed methods, FEDCVAE-ENS and FEDCVAE-KD, using the MNIST dataset as an example. In cases of very high statistical heterogeneity, each client observes only one or two of the ten available classes, as seen on the left, where the size of each dot is proportional to the number of samples. For example, client 2 only observes 4's and 7's, resulting in a client decoder that can expertly generate these digits (note that the columns are shown in order of conditioning class, digits 0-9). Similarly, client 4 is an expert in 3's and 6's. In FEDCVAE-KD, our lightweight knowledge distillation training procedure compacts local learning into a single server decoder, as evidenced by the high-quality samples from all available classes (digits 0-9). This server decoder can then be used for any downstream task, e.g., classification.

To address these issues, we jointly propose FEDCVAE-ENS and FEDCVAE-KD, two novel data-free one-shot FL methods that reframe the local learning task using conditional variational autoencoders (CVAEs). Because CVAEs can easily learn a simplified data distribution, both methods train CVAEs locally to capture the narrow conditional data distributions that arise in the high statistical heterogeneity setting. Figure 1 shows how client decoders become experts in the few classes that they observed. These decoders are then either ensembled (FEDCVAE-ENS) or compactly aggregated (FEDCVAE-KD). More specifically, FEDCVAE-KD aggregates the client decoders using a lightweight knowledge distillation procedure: client decoders are teachers, and the server decoder is the student. Figure 1 shows images generated by the resulting server decoder. Thorough experiments on multiple benchmark datasets (MNIST, FashionMNIST, SVHN) demonstrate the superiority of FEDCVAE-ENS and FEDCVAE-KD over other relevant one-shot FL methods in the high statistical heterogeneity setting.
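The teacher-to-student aggregation described above can be sketched in PyTorch as follows. This is a minimal illustration rather than our exact procedure: the `Decoder` architecture, the MSE matching loss, and the per-label weighting of teachers by client sample counts are simplifying assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a latent vector and a one-hot label to a flattened image."""
    def __init__(self, latent_dim=10, num_classes=10, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

def distill(teachers, label_counts, student, latent_dim=10, num_classes=10,
            steps=100, batch_size=64, lr=1e-3):
    """Compress client (teacher) decoders into one server (student) decoder
    by matching generated images on randomly sampled (z, y) pairs."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        z = torch.randn(batch_size, latent_dim)
        y = torch.randint(0, num_classes, (batch_size,))
        y1h = nn.functional.one_hot(y, num_classes).float()
        with torch.no_grad():
            # Weight each teacher by how much data its client saw for the
            # sampled labels, so the per-class "experts" dominate the target.
            w = torch.stack([c[y] for c in label_counts])   # (T, B)
            w = w / w.sum(dim=0, keepdim=True).clamp_min(1e-8)
            target = sum(w_t.unsqueeze(1) * t(z, y1h)
                         for w_t, t in zip(w, teachers))
        loss = nn.functional.mse_loss(student(z, y1h), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```

Because the distillation inputs are random latent samples, the server needs no real data, only the client decoders and their (hypothetical, in this sketch) per-class label counts.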
In particular, FEDCVAE-ENS and FEDCVAE-KD obtain more than 1.75× the accuracy of the best baseline method for MNIST, more than 2× for FashionMNIST, and more than 2.75× for SVHN under extreme statistical heterogeneity (i.e., clients only observe one or two classes). Furthermore, to protect the decoders uploaded to the server, we propose a method that shifts the center of the CVAE prior distribution. We show that without knowing the center of the prior, an eavesdropping attacker cannot train a performant classifier, thus promoting pipeline security. In sum, our contributions are two one-shot FL methods targeted to the high statistical heterogeneity setting that: (1) perform substantially better than baseline methods in this setting, (2) demonstrate invariance to the number of clients, (3) are data-free and can be applied to any downstream task requiring a labeled dataset, (4) allow for heterogeneous local model architectures, and (5) extend to promote pipeline security. To the best of our knowledge, we are the first to thoroughly address very high statistical heterogeneity in one-shot FL.
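The shifted-prior idea can be illustrated with a short sketch. The center value and dimensionality below are illustrative, not values from our experiments: the point is only that latents drawn from the default prior N(0, I) land far from a secret shifted prior N(c, I) in expectation, so an eavesdropper who does not know c cannot sample latents the decoders were trained to expect.

```python
import torch

def sample_latents(n, latent_dim, center):
    """Sample from the shifted prior N(center, I); only parties that know
    `center` can draw latents matching the decoder's training distribution."""
    return center + torch.randn(n, latent_dim)

# Illustrative values: a secret center far from the origin.
latent_dim = 50
secret_center = 10.0 * torch.ones(latent_dim)  # shared only with clients

z_legit = sample_latents(1000, latent_dim, secret_center)
z_attacker = torch.randn(1000, latent_dim)     # eavesdropper assumes N(0, I)

# Distance from the true prior center: roughly sqrt(latent_dim) for
# legitimate samples, but on the order of ||center|| for the attacker.
d_legit = (z_legit - secret_center).norm(dim=1).mean()
d_attacker = (z_attacker - secret_center).norm(dim=1).mean()
```

In this toy setup the attacker's samples sit roughly ten times farther from the true prior center than legitimate ones, so images generated from them fall well outside the decoder's learned input distribution.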

2. PRELIMINARIES

Conditional Variational Autoencoders. A variational autoencoder (VAE) is a probabilistic generative model that attempts to learn the distribution of data samples (Kingma & Welling, 2014).
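To make the conditional variant concrete, the sketch below shows a minimal CVAE in PyTorch: both the encoder and the decoder are conditioned on a one-hot class label, and training minimizes the negative evidence lower bound (ELBO). The layer sizes are illustrative assumptions, not the architectures used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Minimal conditional VAE: encoder and decoder both condition on a
    one-hot class label."""
    def __init__(self, img_dim=784, num_classes=10, latent_dim=10, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim + num_classes, hidden),
                                 nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, img_dim), nn.Sigmoid())

    def forward(self, x, y1h):
        h = self.enc(torch.cat([x, y1h], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(torch.cat([z, y1h], dim=1)), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    """Negative ELBO: reconstruction term + KL(q(z|x,y) || N(0, I))."""
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / x.size(0)
```

After training, the decoder alone suffices to generate class-conditioned samples by drawing z from the prior and concatenating a chosen label, which is exactly the component clients upload in our methods.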




