CLIENT SELECTION IN FEDERATED LEARNING: CONVERGENCE ANALYSIS AND POWER-OF-CHOICE SELECTION STRATEGIES

Anonymous authors
Paper under double-blind review

Abstract

Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without sharing their data. Several works have analyzed the convergence of federated learning by accounting for data heterogeneity, communication and computation limitations, and partial client participation. However, they assume unbiased client participation, where clients are selected uniformly at random or in proportion to their data sizes. In this paper, we present the first convergence analysis of federated optimization for biased client selection strategies, and quantify how the selection skew affects convergence speed. We show that biasing client selection towards clients with higher local loss achieves faster error convergence. Using this insight, we propose POWER-OF-CHOICE, a communication- and computation-efficient client selection framework that can flexibly span the trade-off between convergence speed and solution bias. We also propose an extension of POWER-OF-CHOICE that maintains the convergence speed improvement while diminishing the selection skew. Our experiments demonstrate that POWER-OF-CHOICE strategies can converge up to 3× faster and give 10% higher test accuracy than the baseline random selection.

1. INTRODUCTION

Until recently, machine learning models were largely trained in the data center setting (Dean et al., 2012) using powerful computing nodes, fast inter-node communication links, and large centrally available training datasets. The future of machine learning lies in moving both data collection and model training to the edge. The emerging paradigm of federated learning (McMahan et al., 2017; Kairouz et al., 2019; Bonawitz et al., 2019) considers a large number of resource-constrained mobile devices that collect training data from their environment. Due to limited communication capabilities and privacy concerns, these data cannot be sent directly to the cloud. Instead, the nodes locally perform a few iterations of training using local-update stochastic gradient descent (SGD) (Yu et al., 2018; Stich, 2018; Wang & Joshi, 2018; 2019), and only periodically send model updates to the aggregating cloud server.

Besides communication limitations, the key scalability challenge faced by the federated learning framework is that the client nodes can have highly heterogeneous local datasets and computation speeds. The effect of data heterogeneity on the convergence of local-update SGD is analyzed in several recent works (Reddi et al., 2020; Haddadpour & Mahdavi, 2019; Khaled et al., 2020; Stich & Karimireddy, 2019; Woodworth et al., 2020; Koloskova et al., 2020; Huo et al., 2020; Zhang et al., 2020; Pathak & Wainwright, 2020; Malinovsky et al., 2020; Sahu et al., 2019), and methods to overcome the adverse effects of data and computational heterogeneity are proposed in Sahu et al. (2019), Wang et al. (2020), and Karimireddy et al. (2019), among others.

Partial Client Participation. Most of the recent works described above assume full client participation, that is, all nodes participate in every training round. In practice, only a small fraction of client nodes participate in each training round, which can exacerbate the adverse effects of data heterogeneity.
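To make the setting concrete, the following is a minimal sketch of local-update SGD with partial, unbiased client participation. It is an illustration under stated assumptions, not the method analyzed in this paper: the objective is a toy scalar quadratic, clients are sampled with replacement in proportion to their dataset sizes, and all function names (`local_sgd`, `sample_clients`, `federated_round`, `global_loss`) and hyperparameter values are hypothetical.

```python
import random

def local_sgd(w, data, steps, lr, rng):
    """Run a few SGD steps on the local loss, the average of 0.5 * (w - x)^2 over data."""
    for _ in range(steps):
        x = rng.choice(data)   # one stochastic sample from the local dataset
        w -= lr * (w - x)      # gradient of 0.5 * (w - x)^2 with respect to w
    return w

def sample_clients(sizes, m, rng):
    """Unbiased selection (assumed here): draw m clients, with replacement,
    with probability proportional to dataset size."""
    return rng.choices(range(len(sizes)), weights=sizes, k=m)

def federated_round(w_global, clients, m, steps, lr, rng):
    """One round: only m sampled clients train locally; the server averages their models."""
    sizes = [len(d) for d in clients]
    chosen = sample_clients(sizes, m, rng)
    local_models = [local_sgd(w_global, clients[k], steps, lr, rng) for k in chosen]
    return sum(local_models) / len(local_models)

def global_loss(w, clients):
    """Global objective: local losses weighted by each client's share of the data."""
    total = sum(len(d) for d in clients)
    return sum(0.5 * (w - x) ** 2 for d in clients for x in d) / total

rng = random.Random(0)
# Heterogeneous clients: different local optima (means) and dataset sizes.
clients = [[rng.gauss(mu, 0.1) for _ in range(n)]
           for mu, n in [(-2.0, 20), (0.0, 50), (3.0, 30), (1.0, 100)]]

w = 10.0
initial_loss = global_loss(w, clients)
for _ in range(50):
    w = federated_round(w, clients, m=2, steps=5, lr=0.1, rng=rng)
final_loss = global_loss(w, clients)
print(f"global loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Because only two of the four clients participate in each round and their local optima differ, the global model keeps fluctuating between rounds instead of settling at the global minimizer, which is one simple way to see how partial participation interacts with data heterogeneity.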
While some existing convergence guarantees for full client participation and methods to tackle heterogeneity can be generalized to partial client participation (Li et al., 2020), these generalizations are limited to unbiased client participation, where each client's contribution to the expected global objective optimized in each round is proportional to its dataset size. In Ruan et al. (2020), the authors analyze the convergence with flexible device participation, where devices can freely join or leave the

