SUPERFED: WEIGHT-SHARED FEDERATED LEARNING

Abstract

Federated Learning (FL) is a well-established technique for privacy-preserving distributed training. Much attention has been given to various aspects of FL training. A growing number of applications that consume FL-trained models, however, increasingly operate under dynamically and unpredictably variable conditions, rendering a single model insufficient. We argue for cost-efficiently training a global "family of models" in a federated fashion. Training the members independently for different tradeoff points, however, incurs ≈ O(k) cost for any k architectures of interest. Straightforward application of FL techniques to recent weight-shared training approaches is either infeasible or prohibitively expensive. We propose SuperFed, an architectural framework that incurs O(1) cost to co-train a large family of models in a federated fashion by leveraging weight-shared learning. We achieve an order-of-magnitude cost savings on both communication and computation by proposing two novel training mechanisms: (a) distribution of weight-shared models to federated clients, and (b) central aggregation of arbitrarily overlapping weight-shared model parameters. The combination of these mechanisms is shown to reach an order-of-magnitude (9.43x) reduction in computation and communication cost for training a 5 × 10^8-sized family of models, compared to independently training as few as k = 9 DNNs, without any accuracy loss.
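To make the second mechanism concrete, the following is a minimal, hypothetical sketch (not SuperFed's actual aggregation rule) of how a server might average client updates when each client trains a different sub-network carved out of one shared weight tensor: each parameter element is averaged only over the clients whose sub-model actually contains it. The function name and the slice-based sub-model representation are illustrative assumptions.

```python
import numpy as np

def aggregate_overlapping(global_shape, client_updates):
    """FedAvg-style averaging over arbitrarily overlapping sub-models.

    client_updates: list of (slices, update) pairs, where `slices` is a
    tuple of slice objects selecting the client's sub-tensor of the
    shared weight, and `update` is that client's trained sub-tensor.
    Each element of the global tensor is averaged over the clients
    that cover it; uncovered elements remain zero.
    """
    total = np.zeros(global_shape)
    counts = np.zeros(global_shape)
    for slices, update in client_updates:
        total[slices] += update     # accumulate only the covered region
        counts[slices] += 1         # track how many clients cover it
    agg = np.zeros(global_shape)
    covered = counts > 0
    agg[covered] = total[covered] / counts[covered]
    return agg, covered

# Two clients train overlapping slices of a shared 4x4 weight matrix:
# client 1 holds a smaller 3x3 sub-model, client 2 the full model.
u1 = ((np.s_[0:3], np.s_[0:3]), np.ones((3, 3)))
u2 = ((np.s_[0:4], np.s_[0:4]), 3 * np.ones((4, 4)))
agg, covered = aggregate_overlapping((4, 4), [u1, u2])
# Overlap region averages to (1 + 3) / 2 = 2.0; the outer row/column,
# seen only by client 2, keeps its value 3.0.
```

The key property is that small and large family members can be trained on different clients in the same round yet still contribute to a single shared parameter set, which is what amortizes the training cost across the whole family.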

1. INTRODUCTION

With the increase in the computational power of smartphones, the use of on-device inference in mobile applications is on the rise, ranging from image recognition (google vision; azure vision), virtual assistants (Alexa), and voice recognition (google ASR) to recommendation systems (Bin et al., 2019). Indeed, on-device inference is pervasive, especially with recent advances in software (Chen et al., 2018; torch mobile), accelerators (samsung exynos; apple neural engine), and neural architecture optimizations (Howard et al., 2019; Sun et al., 2020; Wu et al., 2019a). The surge in its use cases (Cai et al., 2017; Han et al., 2019; Kang et al., 2017; Lane et al., 2016; Reddi et al., 2021; Wu et al., 2019b) has led to a growing interest in providing support not only for on-device inference, but also for on-device training of these models (Dhar et al., 2021). Federated Learning (FL) is an emerging distributed training technique that allows smartphones with different data sources to collaboratively train an ML model (McMahan et al., 2017; Chen & Chao, 2020; Wang et al., 2020; Karimireddy et al., 2021; Konečnỳ et al., 2016). FL enjoys three key properties: it a) has a smaller communication cost, b) is massively parallel, and c) involves no data sharing. As a result, numerous applications such as GBoard (Hard et al., 2018), Apple's Siri (sir, 2019), pharmaceutical discovery (CORDIS., 2019), medical imaging (Silva et al., 2019), health record mining (Huang & Liu, 2019), and recommendation systems (Ammad-ud-din et al., 2019) are readily adopting federated learning. However, adoption of FL in smartphone applications is non-trivial.
As a result, recent works pay attention to the emerging challenges that occur in training, such as data heterogeneity (Karimireddy et al., 2021; Li et al., 2020; Acar et al., 2021), heterogeneous resources (Alistarh et al., 2017; Ivkin et al., 2019; Li et al., 2020; Konečnỳ et al., 2016), and privacy (Truex et al., 2019; Mo et al., 2021; Gong et al., 2021). These efforts have helped FL adoption, particularly under challenging training conditions. However, the success of FL adoption depends not only on tackling challenges that occur in training but also post-training (deployment). Indeed, deploying ML models for on-device inference is exceedingly challenging (Wu et al., 2019b; Reddi et al., 2021). Yet, most of the existing training techniques in FL do not take these deployment challenges into consideration. In this paper, we focus on developing FL

