META KNOWLEDGE CONDENSATION FOR FEDERATED LEARNING

Abstract

Existing federated learning paradigms usually exchange distributed models extensively through a central server to obtain a more powerful global model. However, this incurs a severe communication burden between the server and multiple clients, especially when data distributions are heterogeneous. As a result, current federated learning methods often require a large number of communication rounds during training. Unlike existing paradigms, we introduce an alternative perspective that significantly decreases the communication cost of federated learning. In this work, we first introduce a meta knowledge representation method that extracts meta knowledge from distributed clients. The extracted meta knowledge encodes the essential information needed to improve the current model. Because the contributions of training samples to a federated model vary as training progresses, we introduce a dynamic weight assignment mechanism that enables each sample to contribute adaptively to the current model update. Informative meta knowledge from all active clients is then sent to the server for model updates. Training a model on the combined meta knowledge, without exposing the original data of different clients, significantly mitigates heterogeneity issues. Moreover, to further alleviate data heterogeneity, we also exchange meta knowledge among clients as a conditional initialization for local meta knowledge extraction. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method. Remarkably, our method outperforms the state-of-the-art by a large margin (from 74.07% to 92.95%) on MNIST under a restricted communication budget (i.e., 10 rounds).

1. INTRODUCTION

Most deep learning-based models are trained in a data-centralized manner. In many cases, however, data are distributed among different clients and cannot be shared. To address this issue, Federated Learning (FL) (Yang et al., 2019b;a; Kairouz et al., 2021) has been proposed to learn a powerful model without sharing private original data among clients. Most prior FL works require frequent exchanges of models between local clients and a global server, resulting in a heavy communication burden (Wu & Wang, 2021; Chencheng et al., 2022). It is therefore highly desirable to obtain a powerful federated model within only a few communication rounds. In this work, we propose a new meta knowledge-driven federated learning approach that achieves an effective yet communication-efficient model, significantly reducing communication costs. Unlike prior works, we formulate federated learning from a new perspective, in which representative information is distilled from the original data and sent to the server for model training. On the client side, we extract the representative information of the original data and condense it into a tiny set of highly compressed synthetic data, namely meta knowledge. We further develop two mechanisms in the condensation process, i.e., dynamic weight assignment and meta knowledge sharing, to mitigate the data heterogeneity issue that widely exists in decentralized data. On the server side, we train our global model with the meta knowledge uploaded from clients rather than simply averaging client models. Specifically, we first distill task-specific knowledge from the private data on local clients and condense it as meta knowledge.
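The meta knowledge sharing mechanism above (exchanging meta knowledge among clients as a conditional initialization for local condensation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the perturbation noise, and the toy dimensions are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def init_meta_knowledge(peer_meta, n_local, dim, rng, noise=0.05):
    """Conditional initialization for local condensation: start from meta
    knowledge shared by peer clients (if any) rather than from random
    noise, with a light perturbation to diversify the local set."""
    if peer_meta is None or len(peer_meta) == 0:
        # Cold start (first round): no peer knowledge yet.
        return rng.normal(size=(n_local, dim))
    idx = rng.integers(0, len(peer_meta), size=n_local)
    return peer_meta[idx] + noise * rng.normal(size=(n_local, dim))

# Round 1: no shared meta knowledge yet -> random initialization.
x0 = init_meta_knowledge(None, 4, 3, rng)

# Later rounds: initialize from meta knowledge received from other clients.
peer = rng.normal(2.0, 0.1, size=(6, 3))   # hypothetical peer meta knowledge
x1 = init_meta_knowledge(peer, 4, 3, rng)
```

Starting local condensation from peer meta knowledge, rather than from scratch, is what lets each client's extracted set reflect more than its own (possibly skewed) local distribution.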
The meta knowledge condensation process is modeled as a bi-level optimization procedure under the federated learning setting: the inner loop minimizes the training loss on the meta knowledge to update a model, and the outer loop minimizes the training loss on the original data to update the meta knowledge based on the updated model. In this optimization, we assign a dynamic weight to each sample based on its training loss; by adjusting these weights during training, we empower each sample to contribute adaptively to the current model. Moreover, to further mitigate heterogeneous data distributions among clients, we design a meta knowledge sharing mechanism. In contrast to previous methods that average local models on the server, our model is trained with the meta knowledge of various clients, which better describes the overall data distribution. To further improve the stability of central model training, we incorporate a learnable conditional generator. The generator models the statistical distribution of the uploaded meta knowledge and produces synthetic samples, which provide historical information for the model update. Notably, meta knowledge contains the essential information of the original data together with the corresponding class information, and can therefore be used as normal training data. As a result, our global model is trained with both the uploaded and the generated meta knowledge on the server side, effectively reducing the impact of data heterogeneity and the number of communication rounds. We have conducted extensive experiments on several benchmark datasets, including MNIST (LeCun et al., 2010), SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky & Hinton, 2009), and CIFAR100 (Krizhevsky & Hinton, 2009). The results demonstrate the efficacy and efficiency of our proposed approach.
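The bi-level procedure with loss-based dynamic weights can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: a linear regression model stands in for the neural network, the outer gradient with respect to the meta knowledge is computed by finite differences instead of automatic differentiation, and a softmax over per-sample losses serves as one possible dynamic weighting scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "private" client data (hypothetical stand-in for real local data).
X_real = rng.normal(size=(64, 3))
w_true = np.array([1.5, -2.0, 0.5])
y_real = X_real @ w_true + 0.1 * rng.normal(size=64)

# Meta knowledge: a tiny learnable synthetic set (4 samples, labels fixed).
X_syn = rng.normal(size=(4, 3))
y_syn = rng.normal(size=4)

def inner_update(w, X_s, y_s, lr=0.1, steps=10):
    """Inner loop: train the model on the meta knowledge."""
    for _ in range(steps):
        w = w - lr * (2.0 / len(X_s)) * X_s.T @ (X_s @ w - y_s)
    return w

def weighted_real_loss(w):
    """Outer objective: real-data loss with dynamic per-sample weights.
    Harder samples (larger loss) receive larger weights via a softmax."""
    per_sample = (X_real @ w - y_real) ** 2
    p = np.exp(per_sample - per_sample.max())
    p /= p.sum()
    return float(p @ per_sample)

def outer_loss(x_flat):
    """Loss on real data after training a fresh model on the meta knowledge."""
    w = inner_update(np.zeros(3), x_flat.reshape(X_syn.shape), y_syn)
    return weighted_real_loss(w)

# Outer loop: update the meta knowledge by (numerical) gradient descent
# on the real-data loss, with backtracking to keep steps stable.
eps, lr_syn = 1e-4, 0.5
x = X_syn.ravel().copy()
loss_before = outer_loss(x)
for _ in range(40):
    base = outer_loss(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (outer_loss(x + e) - outer_loss(x - e)) / (2 * eps)
    step = lr_syn
    while step > 1e-6 and outer_loss(x - step * g) >= base:
        step *= 0.5
    if step > 1e-6:
        x = x - step * g
loss_after = outer_loss(x)
```

After the outer loop, a model trained only on the four synthetic samples incurs a lower weighted loss on the real data than one trained on the initial random synthetic set, which is exactly the effect the condensation objective targets.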
In particular, our method demonstrates a significant improvement over competing works, particularly under limited communication budgets (i.e., 10 communication rounds). Overall, our key contributions are summarized as follows:
• We propose a new meta knowledge-driven federated learning approach built on a novel federated meta knowledge extraction method. Our method effectively encodes local data for global model training. Specifically, we formulate a dynamic weight assignment mechanism to enhance the informative content of the extracted meta knowledge, and design a knowledge sharing strategy to facilitate the exchange of meta knowledge among clients without exchanging the original data.
• We introduce a server-side conditional generator that models the statistical distribution of the uploaded meta knowledge to stabilize the training process. Benefiting from the extracted



Figure 1: Illustration of our pipeline, in which only three active clients are shown. The local clients condense meta knowledge from their local private data, and the server trains a global model with the uploaded meta knowledge. Local meta knowledge condensation and central model training are conducted iteratively. For meta knowledge extraction on clients, we design two mechanisms, i.e., meta knowledge sharing and dynamic weight assignment. For server-side central model training, we introduce a learnable conditional generator.
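The server-side conditional generator can likewise be sketched in simplified form. The paper describes a learnable generator; here a per-class diagonal Gaussian fitted to the uploaded meta knowledge stands in as a minimal illustrative model, with hypothetical toy features and labels.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical meta knowledge uploaded by clients: (features, label) pairs
# for two classes with clearly separated feature statistics.
meta_x = np.concatenate([rng.normal(0.0, 1.0, size=(20, 5)),
                         rng.normal(3.0, 1.0, size=(20, 5))])
meta_y = np.array([0] * 20 + [1] * 20)

class GaussianConditionalGenerator:
    """Models the class-conditional distribution of uploaded meta knowledge
    with one diagonal Gaussian per class, then samples synthetic data that
    carries historical information into later server-side model updates."""
    def fit(self, x, y):
        self.stats = {c: (x[y == c].mean(0), x[y == c].std(0) + 1e-6)
                      for c in np.unique(y)}
        return self
    def sample(self, c, n, rng):
        mu, sigma = self.stats[c]
        return rng.normal(mu, sigma, size=(n, len(mu)))

gen = GaussianConditionalGenerator().fit(meta_x, meta_y)
synthetic = gen.sample(1, 8, rng)   # 8 synthetic samples conditioned on class 1
```

The generated samples follow the per-class statistics of the uploaded meta knowledge, so the server can mix them with the current round's uploads when updating the global model, which is what stabilizes training across rounds.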

