Exploit Unlabeled Data on the Server! Federated Learning via Uncertainty-Aware Ensemble Distillation and Self-Supervision

Anonymous

Abstract

Federated Learning (FL) is a distributed machine learning paradigm that involves the cooperation of multiple clients to train a server model. In practice, it is hard to assume that each client possesses large-scale data or that many clients are always available to participate in the same round of FL, which may lead to data deficiency. This deficiency degrades the entire learning process. To resolve this challenge, we propose Federated learning with entropy-weighted ensemble Distillation and Self-supervised learning (FedDS). FedDS reliably handles situations where not only the amount of data per client but also the number of clients is scarce. This advantage is achieved by leveraging the unlabeled data that is prevalent on the server. We demonstrate the effectiveness of FedDS on classification tasks for CIFAR-10/100 and PathMNIST. On CIFAR-10, our method improves over FedAVG by 12.54% in the data-deficient regime, and by 17.16% and 23.56% in the more challenging noisy-label and Byzantine-client scenarios, respectively.

1. INTRODUCTION

Federated Learning (FL) is a distributed machine learning paradigm that involves the cooperation of multiple clients to train a server model (McMahan et al., 2017). In FL, the server model is trained as follows: 1) the server distributes the current server model to the clients; 2) each client independently trains the downloaded model on its available local data and sends the resulting model back to the server; 3) the server updates the server model with the collected locally trained models; and 4) the steps are repeated. By collecting updated model parameters at the server instead of raw client data, FL can mitigate personal information leakage.

In some FL scenarios, such as developing a medical diagnosis algorithm, it is often the case that both the number of clients (participating hospitals) and the size of the labeled dataset at each client (the number of relevant patients and their labels at each hospital) are deficient. Such a lack of participating clients and labeled client data degrades the performance of standard FL methods, e.g., FedAVG (McMahan et al., 2017) (refer to Fig. 1). The deficiency may also destabilize the learning process, which increases the label-noise sensitivity of FL methods. Even in such scenarios, unlabeled data is often abundant or easy to collect in practice, which may help mitigate the data deficiency and label-noise vulnerability of FL algorithms. In this paper, we propose a robust FL algorithm that utilizes additional unlabeled data on the server.
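The four-step training loop above can be illustrated with a minimal sketch of one FedAVG round. The linear model, `local_update` step, and client datasets here are placeholder assumptions for illustration, not part of the proposed method; the essential structure is the broadcast, local training, and size-weighted averaging of the returned models.

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    # Step 2: a client's local training, stood in for here by one
    # gradient step on a linear least-squares loss.
    X, y = data
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg_round(server_weights, client_datasets):
    # Step 1: broadcast the current server model to every client.
    client_weights = [local_update(server_weights.copy(), d)
                      for d in client_datasets]
    # Step 3: aggregate the returned models, weighted by local data size.
    sizes = np.array([len(d[1]) for d in client_datasets], dtype=float)
    sizes /= sizes.sum()
    return sum(w * s for w, s in zip(client_weights, sizes))

# Toy setup: three clients with deliberately small local datasets.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for n in [20, 40, 10]:
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):  # Step 4: repeat the rounds.
    w = fedavg_round(w, clients)
print(np.round(w, 2))  # converges toward true_w
```

Note that only model parameters cross the client-server boundary in `fedavg_round`; the raw arrays `X` and `y` never leave the (simulated) clients, which is the privacy property the protocol relies on.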
The key idea of our method is to leverage unlabeled data to mitigate the lack of data as well as to reliably



Figure 1: The accuracy of the standard FL method, FedAVG (McMahan et al., 2017), rapidly drops as the amount of available data at each client decreases. Our FedDS mitigates the effect of data deficiency by exploiting unlabeled data on the server. The accuracy is measured at the 50th communication round on the CIFAR-10 classification task. This result follows the same setting as the main experiment in Fig. 4 except for the data amounts for each client.

