LOTTERY AWARE SPARSITY HUNTING: ENABLING FEDERATED LEARNING ON RESOURCE-LIMITED EDGE

Abstract

Limited computation and communication capabilities of clients pose significant challenges in federated learning (FL) over resource-limited edge nodes. A potential solution is to deploy off-the-shelf sparse learning algorithms that train a binary sparse mask on each client, with the expectation that a consistent sparse server mask emerges, yielding sparse weight tensors. However, as we investigate in this paper, such naive deployments cause a significant accuracy drop compared to FL with dense models, especially for clients with limited resource budgets. In particular, our investigations reveal a serious lack of consensus among the sparsity masks trained on different clients, which prevents the server mask from converging and can lead to a substantial drop in model performance. Based on these key observations, we propose federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework that makes the server win a lottery: it yields a sparse sub-model that maintains classification performance under highly resource-limited client settings. Moreover, to support FL on devices requiring different parameter densities, we leverage our findings to present hetero-FLASH, where clients can have different target sparsity budgets based on their device resource limits. Experimental evaluations with multiple models on various datasets (both IID and non-IID) show the superiority of our approach in closing the gap with the unpruned baseline, yielding up to ∼10.1% improved accuracy with ∼10.26× lower communication cost compared to existing alternatives at similar hyperparameter settings. Code is released as Supplementary.

1. INTRODUCTION

Federated learning (FL) McMahan et al. (2017) is a popular form of distributed training that has gained significant traction due to its ability to let multiple clients learn a shared global model without transferring their private data. However, clients' heterogeneity and resource limitations pose significant challenges for FL deployment over edge nodes, including mobile phones and IoT devices. To resolve these issues, various methods have been proposed over the past few years, including efficient learning for heterogeneous collaborative training Lin et al. (2020); Zhu et al. (2021), distillation He et al. (2020), federated dropout techniques Horvath et al. (2021); Caldas et al. (2018b), and efficient aggregation for faster convergence and reduced communication Reddi et al. (2020); Li et al. (2020b). However, these methods do not necessarily address the growing concerns of highly computation- and communication-limited edge devices. Meanwhile, reducing the memory, compute, and latency costs of deep neural networks (DNNs) in centralized training for efficient edge deployment has also become an active area of research. In particular, recently proposed sparse learning (SL) strategies Evci et al. (2020); Kundu et al. (2021b); Mocanu et al. (2018); Dettmers & Zettlemoyer (2019); Raihan & Aamodt (2020) jointly train weights and associated binary sparse masks so that only a fraction of model parameters are updated during training, potentially enabling lucrative reductions in both training time and compute cost Qiu et al. (2021); Raihan & Aamodt (2020), while producing a model that meets a target parameter density d and yields accuracy close to that of the unpruned baseline. However, the challenges and opportunities of sparse learning in FL are yet to be fully unveiled. Only very recently have a few works Bibikar et al. (2021); Huang et al. (2022) tried to leverage sparse learning in FL, primarily to show its efficacy in non-IID settings. Nevertheless, these works primarily used sparsity for non-aggressive model compression,
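To make the sparse learning setup above concrete, the following is a minimal sketch of the core mechanism: a binary mask keeping a target parameter density d of the weights, with gradient updates applied only to the unmasked entries. This assumes magnitude-based top-k mask selection, one common choice in the SL literature; the function names (`make_topk_mask`, `masked_sgd_step`) are illustrative, not from the paper.

```python
import numpy as np

def make_topk_mask(weights, density):
    # Binary mask that keeps the `density` fraction of weights
    # with the largest magnitudes (magnitude-based selection).
    k = max(1, int(density * weights.size))
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return (np.abs(weights) >= threshold).astype(weights.dtype)

def masked_sgd_step(weights, grad, mask, lr=0.1):
    # Update only the active (mask == 1) parameters;
    # masked-out entries remain exactly zero.
    return (weights - lr * grad) * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = make_topk_mask(w, density=0.25)  # keep 25% of 16 weights -> 4 active
w = w * mask                            # sparsify the weight tensor
g = rng.normal(size=(4, 4))             # stand-in for a local gradient
w = masked_sgd_step(w, g, mask)
print(int(mask.sum()))                  # number of active parameters
```

In a federated setting, each client would run such masked updates locally; as the paper argues, the key question is whether the per-client masks agree well enough for the server to aggregate them into one consistent sparse model.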

