MODERATED ASYNCHRONOUS FEDERATED LEARNING ON HETEROGENEOUS MOBILE DEVICES WITH NON-IID DATA

Abstract

Federated learning allows multiple clients to jointly learn an ML model while keeping their data private. Synchronous federated learning (Sync-FL) requires devices to share local gradients in lockstep, which provides stronger guarantees but suffers from stragglers that slow the entire training process. Conventional techniques drop the stragglers' updates entirely and lose the opportunity to learn from the data the stragglers hold, which is especially costly in a non-IID setting. Asynchronous federated learning (Async-FL) offers a potential solution by allowing clients to proceed at their own pace, which typically achieves faster convergence. We target video action recognition on edge devices as an exemplar heavyweight task to perform on a realistic edge setup using Async-FL. Our FL system, KUIPER, leverages Async-FL to learn a heavy model for video action recognition on a heterogeneous edge testbed with non-IID data. KUIPER introduces a novel aggregation scheme that solves the straggler problem while accounting for the differing client data in a non-IID setting. Although the proposed aggregation technique is designed primarily for video action recognition, it is task-independent and scalable, which we demonstrate through experiments on other vision and NLP tasks. KUIPER shows 11% faster convergence compared to Oort [OSDI-21], up to 12% and 9% improvements in test accuracy compared to FedBuff and Oort [OSDI-21] on HMDB51, and 10% and 9% on UCF101.

1. INTRODUCTION

Federated learning McMahan et al. (2017) has gained great popularity in recent times as it allows heterogeneous clients to collaborate and benefit from peer data while keeping their own data private. As a result, the clients learn a better model through collaboration than they would individually. The training process is orchestrated by a central server that broadcasts the global model to the clients; the clients run local training on their own data and share only the gradient updates with the server. This has made it possible for clients with limited computational resources to participate in the learning process. However, heterogeneous clients with varying computational capabilities (we use "computational capabilities" as shorthand for heterogeneity in both the compute on a node and the communication link connecting it to the federation server), if forced to synchronize, constrain the process to progress at the speed of the slowest client Li et al. (2020a). For example, in our experimental setup of embedded nodes with mobile GPUs, a Jetson Nano is 5× slower than a Jetson AGX Xavier; variation in network speeds adds to this heterogeneity. Incorporating even slow clients becomes crucial when the data distribution among clients is non-IID, as every client then has distinctive elements to contribute to the learned model. In this paper, we target a heavyweight learning task, namely video action recognition, that to date has been considered out of the reach of embedded devices, i.e., mobile GPUs. The straggler problem is particularly serious for heavyweight learning tasks on heterogeneous edge devices, since the devices are resource constrained relative to the demands of the task and the variance in device capabilities (processing power, memory, storage) is large (5× in our representative setup). An obvious approach to dealing with stragglers is therefore to drop their updates entirely, as conventional synchronous schemes do.
However, this prevents the global model from learning features specific to the local data of the stragglers, leading to a model that underfits. This problem becomes more acute as the degree of non-IIDness increases;
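To make the server-orchestrated training loop described above concrete, the following is a minimal, self-contained sketch of synchronous FedAvg-style aggregation on a toy scalar model. All names here (`local_update`, `sync_round`, the toy client data) are illustrative assumptions, not KUIPER's actual implementation; the point is only that the server cannot aggregate until every selected client responds, so a single straggler gates each round.

```python
# Toy sketch of synchronous FedAvg (illustrative only, not KUIPER's scheme).
# Model is a single scalar; each client's "data" is a list of numbers, and
# local training nudges the model toward the client's data mean.

def local_update(weights, client_data, lr=0.1):
    """One step of toy local training on a client's private data."""
    grad = weights - sum(client_data) / len(client_data)
    return weights - lr * grad

def sync_round(global_w, clients):
    """Server broadcasts global_w; it must collect ALL client updates
    (including the slowest straggler) before it can aggregate."""
    updates = [local_update(global_w, data) for data in clients]
    # FedAvg aggregation: (here unweighted) average of client models.
    return sum(updates) / len(updates)

# Non-IID toy setup: client data means differ widely (1.5, 11.0, 5.0).
clients = [[1.0, 2.0], [10.0, 12.0], [5.0]]
w = 0.0
for _ in range(100):
    w = sync_round(w, clients)
# w converges toward the average of the client means, (1.5 + 11 + 5) / 3.
```

Dropping a slow client from `clients` would speed up each round but permanently remove that client's data mean from the fixed point the model converges to, which is exactly the underfitting effect noted above.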

