MODEL-AGNOSTIC ROUND-OPTIMAL FEDERATED LEARNING VIA KNOWLEDGE TRANSFER

Abstract

Federated learning enables multiple parties to collaboratively learn a model without exchanging their local data. Currently, federated averaging (FedAvg) is the most widely used federated learning algorithm. However, FedAvg and its variants have clear shortcomings: they can only be used to learn differentiable models and need many communication rounds to converge. In this paper, we propose FedKT, a novel federated learning algorithm that needs only a single communication round (i.e., it is round-optimal). By applying a knowledge transfer approach, FedKT can be applied to any classification model. Moreover, we develop differentially private versions of FedKT and theoretically analyze the privacy loss. Experiments show that our method achieves accuracy close to or better than other state-of-the-art federated learning algorithms.

1. INTRODUCTION

While the size of the training data strongly influences the quality of a machine learning model, in practice the data are often dispersed across different parties. Due to regulations on data privacy, the data cannot be centralized at a single party for training. To address these issues, federated learning (Kairouz et al., 2019; Li et al., 2019a;b; Yang et al., 2019) enables multiple parties to collaboratively learn a model without exchanging their local data. It has become a hot research topic and has shown promising results in the real world (Bonawitz et al., 2019; Hard et al., 2018; Li et al., 2020a; Peng et al., 2020). Currently, federated averaging (FedAvg) (McMahan et al., 2016) is a widely used federated learning algorithm. Its training is an iterative process with four steps in each iteration. First, the server sends the global model to the selected parties. Second, each selected party updates the model with its local data. Third, the updated models are sent back to the server. Last, the server averages all the received models to update the global model. There are also many variants of FedAvg (Li et al., 2020c; Karimireddy et al., 2020). For example, to handle the heterogeneous data setting, FedProx (Li et al., 2020c) introduces an additional proximal term to limit the local updates, while SCAFFOLD (Karimireddy et al., 2020) introduces control variates to correct the local updates. The overall frameworks of these studies remain similar to FedAvg.

FedAvg and its variants have the following limitations. First, they rely on gradient descent for optimization, so they cannot be applied to train non-differentiable models such as decision trees in the federated setting. Second, the algorithm usually needs many communication rounds to finally achieve a good model, which causes massive communication traffic and imposes fault-tolerance requirements across rounds.
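The four-step FedAvg iteration described above can be sketched in a few lines. This is a minimal illustration with hypothetical helper names (`local_update` stands in for the parties' local training); it is not the implementation used in this paper.

```python
import numpy as np

def fed_avg(global_model, parties, rounds, local_update):
    """Illustrative FedAvg loop (hypothetical helpers, not the paper's code).

    global_model: flat parameter vector (np.ndarray)
    parties: list of local datasets, one per selected party
    local_update: function(model, data) -> locally updated model
    """
    for _ in range(rounds):
        # Step 1: the server sends the global model to the selected parties.
        local_models = []
        for data in parties:
            # Step 2: each party updates the model with its local data.
            local_models.append(local_update(global_model.copy(), data))
        # Step 3: the updated models are sent back to the server.
        # Step 4: the server averages them to update the global model.
        global_model = np.mean(local_models, axis=0)
    return global_model
```

Note that the loop runs for many rounds; the number of rounds needed for convergence is exactly the communication cost FedKT aims to remove.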
Last, FedAvg is originally designed for the cross-device setting (Kairouz et al., 2019), where the parties are mobile devices and the number of parties is large. In the cross-silo setting, where the parties are organizations or data centers and the number of parties is relatively small, it is possible to take better advantage of the parties' relatively high computation power. To address the above limitations, we propose a novel federated learning algorithm called FedKT (Federated learning via Knowledge Transfer), focusing on the cross-silo setting. With the round-optimal design goal, FedKT extends the idea of ensemble learning to the federated setting in a novel two-tier design. Inspired by the success of using unlabelled public data, which is often available in domains such as text and images, in many studies (Papernot et al., 2017; 2018; Jordon et al., 2019; Chang et al., 2019), we adopt the knowledge transfer method to reduce the inference and storage costs of ensemble learning. As such, FedKT is able to learn any classification model including differentiable
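The core knowledge-transfer step, compressing an ensemble of local models into a single student via an unlabelled public set, can be sketched as follows. This is a generic distillation sketch in the style of the cited PATE-like approaches, with hypothetical helper names; it is not FedKT's actual two-tier algorithm, which is specified later in the paper.

```python
import numpy as np

def transfer_knowledge(teachers, public_x, train_student):
    """Illustrative knowledge transfer (hypothetical helpers, not FedKT itself).

    teachers: list of functions mapping inputs -> integer class predictions
    public_x: unlabelled public data available to all parties
    train_student: function(x, pseudo_labels) -> student model
    """
    # Each teacher labels the public data; a majority vote over the
    # ensemble turns its collective knowledge into pseudo-labels.
    votes = np.stack([t(public_x) for t in teachers])  # shape: (n_teachers, n_samples)
    pseudo_labels = np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), 0, votes)
    # A single student trained on the pseudo-labelled public data replaces
    # the whole ensemble, reducing inference and storage costs.
    return train_student(public_x, pseudo_labels)
```

Because only predictions on public data cross the ensemble boundary, the teachers can be arbitrary classifiers, which is what makes the approach model-agnostic.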

