FEDCL: CRITICAL LEARNING PERIODS-AWARE ADAPTIVE CLIENT SELECTION IN FEDERATED LEARNING

Abstract

Federated learning (FL) is a distributed optimization paradigm that learns from data samples distributed across a number of clients. Adaptive client selection that is cognizant of the training progress of clients has become a major trend for improving FL efficiency, but it remains poorly understood. Most existing FL methods, such as FedAvg and its state-of-the-art variants, implicitly assume that all learning phases during the FL training process are equally important. Unfortunately, recent findings on critical learning (CL) periods reveal this assumption to be invalid: during these periods, small gradient errors may lead to an irrecoverable deficiency in final test accuracy. In this paper, we develop FedCL, a CL periods-aware FL framework, and show that when client selection is guided by the discovered CL periods, augmenting existing FL methods with CL-period awareness significantly improves their performance. Experiments on various machine learning models and datasets validate that the proposed FedCL framework consistently improves model accuracy while maintaining communication efficiency comparable to or even better than state-of-the-art methods, demonstrating a promising and easily adoptable approach to tackling the heterogeneity of FL training.

1. INTRODUCTION

Federated learning (FL) (McMahan et al., 2017) has emerged as an attractive distributed learning paradigm that leverages a large number of clients to collaboratively learn a joint model from decentralized training data under the coordination of a centralized server. In contrast with centralized learning, the FL architecture preserves clients' privacy and reduces the communication burden of transmitting data to the server. While there is a rich literature on distributed optimization in the context of machine learning, FL distinguishes itself from traditional distributed optimization through two key challenges: high degrees of system and statistical heterogeneity (Kairouz et al., 2019). In an attempt to address this heterogeneity and improve the efficiency of FL, various optimization methods have been developed. In particular, the federated averaging algorithm (FedAvg) (McMahan et al., 2017) is the current state-of-the-art method for FL. In each communication round, FedAvg leverages local computation at each client and employs a centralized server to aggregate and update the global model parameters. While FedAvg has demonstrated empirical success in heterogeneous settings, it fails to fully address the underlying challenges associated with heterogeneity. For example, FedAvg randomly selects a subset of clients in each round regardless of their statistical heterogeneity, and has been shown to diverge empirically in settings where the data samples of each client are not independent and identically distributed (non-IID). A recent trend in improving FL efficiency focuses on adaptive client selection during the FL training process, such as (Ruan et al., 2021; Karimireddy et al., 2020; Li et al., 2020a; Wang et al., 2020c;b; Cho et al., 2020; Wang et al., 2020a; Rothchild et al., 2020; Lai et al., 2021). However, these studies implicitly assume that all learning phases during the FL training process are equally important.
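To make the baseline concrete, the single FedAvg communication round described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; all function and parameter names (e.g., `local_update`, `num_selected`) are our own, and models are represented as plain lists of weights for simplicity.

```python
import random

def fedavg_round(global_weights, clients, num_selected, local_update):
    """One FedAvg communication round: sample clients uniformly at random,
    run local training on each starting from the global model, then average
    the returned models weighted by local dataset size."""
    selected = random.sample(clients, num_selected)
    total = sum(c["num_samples"] for c in selected)
    # Each selected client trains locally from the current global model.
    local_models = [(local_update(global_weights, c), c["num_samples"])
                    for c in selected]
    # Server aggregation: per-coordinate weighted average of local models.
    return [sum(w[i] * n for w, n in local_models) / total
            for i in range(len(global_weights))]
```

Note that the client sample is drawn without regard to statistical heterogeneity, which is exactly the behavior the paper argues against.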
Unfortunately, this assumption has recently been shown to be invalid due to the existence of critical learning (CL) periods: the final quality of a deep neural network (DNN) model is determined by the first few training epochs, during which deficits such as low quality or quantity of training data cause irreversible model degradation. Notably, this phenomenon was revealed in a recent series of works (Achille et al., 2019; Jastrzebski et al., 2019; Golatkar et al., 2019; Jastrzebski et al., 2021) for centralized learning, and in (Yan et al., 2022) for FL settings. Despite these insightful findings, there remains a major gap between the observation of CL periods in FL and the goal of more efficient training and improved model accuracy: existing client selection methods in state-of-the-art FL algorithms are unaware of CL periods, which have so far only been identified using a computationally expensive metric that emerges after the full training process. In this paper, we close this gap by demonstrating the importance of CL-period awareness for client selection in state-of-the-art FL algorithms. Through a range of carefully designed experiments on different machine learning models and datasets, we observe consistently improved model accuracy, without sacrificing communication efficiency, when augmenting state-of-the-art FL algorithms with CL periods. We build upon recent work by Yan et al. (2022), who showed that if the training dataset of each client is not restored to the full training dataset early enough in the training process, the test accuracy of FL is permanently impaired. We extend this notion to client selection in FL and show that a larger number of clients is required only during these CL periods. As a result, designing an adaptive and efficient client selection scheme is akin to finding the CL periods in the FL training process.
These CL periods can be detected in an online manner using a new metric called the Federated Gradient Norm (FGN). To the best of our knowledge, this is the first step towards exploiting CL periods for adaptive client selection in FL to mitigate heterogeneity. Our main contributions in this paper are summarized as follows:

1. We propose a practical, easy-to-compute Federated Gradient Norm (FGN) metric to identify CL periods in an online manner, resolving a major obstacle to connecting CL periods with client selection for efficient FL training.

2. We propose a simple but powerful CL periods-aware FL framework, dubbed FedCL, that is generic across and orthogonal to different FL methods. In particular, we use FedAvg as our building block since it is the first and most widely used FL method. FedCL inspects changes in the FGN to detect CL periods during the FL training process, and adaptively determines the number of clients that participate in each training round. Through extensive empirical evaluation on different machine learning models and datasets, we show that FedCL consistently achieves up to 11% accuracy improvement while maintaining communication efficiency comparable to or even better than FedAvg.

3. We show that CL-period awareness can easily be combined with state-of-the-art FL methods such as FedProx (Li et al., 2020a), VRL-SGD (Liang et al., 2019) and FedNova (Wang et al., 2020c). When augmented by FedCL via its client selection, these methods achieve up to 11%, 13% and 10% accuracy improvement, respectively, compared to training without the awareness of CL periods.
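The mechanism described above, monitoring changes in an FGN-style metric and enlarging the client budget while a CL period appears to be in progress, can be sketched as follows. This is a toy illustration under our own assumptions: the paper does not define the FGN in this excerpt, so the size-weighted gradient-average norm below is a hypothetical stand-in, and the threshold rule and all names (`base_k`, `boost_k`, `rel_change`) are illustrative.

```python
import math

def federated_gradient_norm(client_gradients, client_sizes):
    """Hypothetical FGN stand-in: the l2 norm of the dataset-size-weighted
    average of client gradients (each gradient a flat list of floats)."""
    total = sum(client_sizes)
    dim = len(client_gradients[0])
    avg = [sum(g[i] * n for g, n in zip(client_gradients, client_sizes)) / total
           for i in range(dim)]
    return math.sqrt(sum(x * x for x in avg))

def clients_for_next_round(fgn_history, base_k, boost_k, rel_change=0.1):
    """Toy selection rule: while the FGN is still changing rapidly between
    rounds (a proxy for being inside a critical learning period), request
    the larger client budget; otherwise fall back to the base budget."""
    if len(fgn_history) < 2:
        return boost_k  # too early to tell; stay conservative
    prev, curr = fgn_history[-2], fgn_history[-1]
    in_critical_period = abs(curr - prev) > rel_change * max(prev, 1e-12)
    return boost_k if in_critical_period else base_k
```

The design intent mirrors the paper's claim that more clients are needed only during CL periods: the server pays the higher participation cost early, then shrinks the per-round cohort once the metric stabilizes.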

2. RELATED WORK

Critical Learning (CL) Periods. The presence of CL periods in centralized neural network training was first highlighted in (Achille et al., 2019; Jastrzebski et al., 2019). Other works (Golatkar et al., 2019; Jastrzebski et al., 2021; Frankle et al., 2020; Jastrzebski et al., 2020) have also highlighted the importance of the early training phase in centralized learning. The existence of CL periods in FL was recently discovered in (Yan et al., 2022). However, these studies of CL phenomena hinged on costly metrics (e.g., eigenvalues of the Hessian) that emerge only after the full training process, limiting their practical benefit. We differ from existing works by developing an easy-to-compute metric that identifies CL periods during the training process in an online manner.

Federated Learning and Client Selection. The state-of-the-art method for FL is FedAvg, which was first proposed in (McMahan et al., 2017) and has sparked many follow-ups (Stich, 2018; Wang & Joshi, 2021; Yu et al., 2019) with full client participation. In practice, only a small fraction of clients participate in each training round, which exacerbates the effect of data heterogeneity. As a result, solutions addressing partial client participation and data heterogeneity have been developed and analyzed (Katharopoulos & Fleuret, 2018; Ruan et al., 2021; Karimireddy et al., 2020; Li et al., 2020a; Wang et al., 2020c; Li et al., 2020b; Wang et al., 2020b; Ribero & Vikalo, 2020; Cho et al., 2020; Wang et al., 2020a; Yang et al., 2021; Cho et al., 2022; Reddi et al., 2021; Haddadpour & Mahdavi, 2019; Khaled et al., 2020; Stich & Karimireddy, 2020; Woodworth et al., 2020; Horváth & Richtarik, 2021; Nishio & Yonetani, 2019; Malinovskiy et al., 2020; Pathak & Wainwright, 2020; Goetz et al., 2019; Tang et al., 2022). For a comprehensive introduction to FL and other algorithmic variants

