CAN FAIR FEDERATED LEARNING REDUCE THE NEED FOR PERSONALIZATION?

Abstract

Federated Learning (FL) allows edge devices to collaboratively train machine learning models without sharing data. Since the data distribution varies across clients, the performance of the federated model on local data also varies. To address this, fair FL approaches attempt to reduce the accuracy disparity between local partitions by focusing on clients with larger losses, while local adaptation personalizes the federated model by re-training it on local data, providing a participation incentive for devices whose federated model underperforms relative to one trained locally. This paper evaluates two Fair Federated Learning (FFL) algorithms in this relative domain and determines whether they provide a better starting point for personalization or supplant it altogether. Contrary to expectation, FFL does not reduce the number of underperforming clients in a language task, and it doubles them in an image recognition task. Furthermore, fairness levels that maintain average performance provide no benefit to relative accuracy in either federated or adapted models. We postulate that FFL is unsuitable for our goal because clients with highly accurate local models require the federated model to achieve a disproportionately high local accuracy before they see any benefit. Instead, we propose Personalization-aware Federated Learning (PaFL) as a paradigm that uses personalization objectives during FL training and allows them to vary across rounds. Our preliminary results show a 50% reduction in underperforming clients for the language task when using knowledge distillation. For the image task, PaFL with elastic weight consolidation or knowledge distillation avoids doubling the number of underperformers. Thus, we argue that PaFL represents a more promising means of reducing the need for personalization.

1. INTRODUCTION

Edge devices represent a pool of computational power and data for ML tasks. To use such devices while minimizing communication costs, McMahan et al. (2017) introduced Federated Learning (FL), which trains models directly on client devices without sharing data. As the data distribution differs across clients, FL must balance average performance against performance on specific clients. In some cases, a federated model may perform worse than a fully local one, thus lowering the incentive to participate in FL. While the sets of potential use cases for fairness and personalization are not identical (e.g., personalization would be inappropriate for very low-data clients), FFL could potentially construct a fairer relative accuracy distribution without hurting average performance. For FFL to reduce the need for personalization, it would have to lower the number of underperforming clients or improve their average relative accuracy.

The existing body of work on balancing global and local performance focuses on two primary means of improving the client accuracy distribution. Li et al. (2019a) and Li et al. (2020a) propose two Fair FL techniques, q-Fair Federated Learning (q-FFL) and Tilted Empirical Risk Minimization (TERM), which raise the accuracy of the worst performers by focusing on clients with large losses during global FL training. Alternatively, Yu et al. (2020) and Mansour et al. (2020) recommend local adaptation (personalization) methods such as Freezebase (FB), Multi-task Learning (MTL) with Elastic Weight Consolidation (EWC), and Knowledge Distillation (KD) to construct effective local models from the global one. Since personalization is local, the natural baseline of comparison is a local model trained only on the client's data. In this work, relative accuracy refers to the difference in accuracy between a federated and a local model on a client's test set.
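The reweighting intuition behind both fairness objectives can be illustrated with a minimal sketch. Under q-FFL, the gradient contribution of client k scales with its loss raised to the power q, while TERM's tilted objective yields softmax-style weights over client losses; the function names, the example losses, and the assumption of uniform client sizes below are ours, not the original papers' code:

```python
import math

def qffl_weights(losses, q):
    """Per-client aggregation weights in a q-FFL-style objective.

    The gradient of F_k^{q+1}/(q+1) is F_k^q * grad(F_k), so with uniform
    client sizes each client's effective weight is proportional to loss^q.
    q = 0 recovers uniform (standard FedAvg-style) weighting.
    """
    raw = [loss ** q for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]

def term_weights(losses, t):
    """TERM-style tilted weights: a softmax of t * loss over clients.

    Larger t concentrates weight on high-loss clients; t -> 0 tends
    toward uniform weighting.
    """
    m = max(t * loss for loss in losses)          # subtract the max for numerical stability
    exps = [math.exp(t * loss - m) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-client losses: the third client is the worst performer.
losses = [0.2, 0.5, 1.5]
print(qffl_weights(losses, q=0))   # uniform weights
print(qffl_weights(losses, q=2))   # weight shifts toward the high-loss client
print(term_weights(losses, t=5))
```

In both schemes, increasing the fairness parameter (q or t) trades average performance for a tighter accuracy distribution, which is the knob the evaluation in this paper varies.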

