ENHANCE LOCAL CONSISTENCY FOR FREE: A MULTI-STEP INERTIAL MOMENTUM APPROACH

Abstract

Federated learning (FL), a collaborative distributed training paradigm that coordinates many edge computing devices through a centralized server, is plagued by inconsistent local stationary points caused by the heterogeneity of the partially participating clients. This inconsistency precipitates the client-drift problem and leads to unstable and slow convergence, especially on severely heterogeneous datasets. To address these issues, we propose a novel federated learning algorithm, named FedMIM, which adopts multi-step inertial momentum on the edge devices and enhances local consistency for free during training, thereby improving robustness to heterogeneity. Specifically, we incorporate weighted global gradient estimations as inertial correction terms to guide both the local iterates and the stochastic gradient estimation, which naturally accounts for the global objective while optimizing on each edge's heterogeneous dataset and maintains the desired consistency of the local iterations. Theoretically, we show that FedMIM achieves an O(1/√(SKT)) convergence rate with a linear speedup in the number of selected clients S, given a proper local interval K in each communication round, under the nonconvex setting. Empirically, we conduct comprehensive experiments on various real-world datasets and demonstrate the efficacy of the proposed FedMIM against several state-of-the-art baselines.

1. INTRODUCTION

Federated Learning (FL) is an increasingly important distributed learning framework in which data distributed over a large number of clients, such as mobile phones, wearable devices, or network sensors, is utilized for training (Kairouz et al., 2021). In contrast to traditional machine learning paradigms, FL places a centralized server to coordinate the participating clients in training a model without collecting the client data, thereby achieving a basic level of data privacy and security (Li et al., 2020a). The common pipeline to achieve this goal includes three steps (Bonawitz et al., 2019): i) the server broadcasts the current model to clients at the beginning of each communication round; ii) the clients synchronize with the global model and update their local models based on their own data; iii) the server averages the latest local models and repeats these procedures until convergence. Despite the empirical success of past work, some key challenges remain for FL: expensive communication, privacy concerns, and statistical diversity. The first two problems are well addressed in past work (Konečnỳ et al., 2016; Sattler et al., 2019; Hamer et al., 2020; Truex et al., 2019; Xu et al., 2019), while the last one remains the main challenge that needs to be dealt with. Due to statistical diversity among clients within an FL system, client drift (Karimireddy et al., 2020a) leads to slow and unstable convergence during model training. In the case of heterogeneous data, each client's optimum is not well aligned with the global optimum. The conventional FL algorithm does not consider this data heterogeneity problem and simply applies stochastic gradient descent to the local updates. As a consequence, the final converged solution may differ from the stationary point of the global objective function, since the average of the client updates moves toward the average of the clients' optima rather than the true optimum.
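The three-step pipeline and the client-drift effect described above can be sketched as follows. This is a minimal toy illustration, not the paper's method: the quadratic local losses, the client optima, and all hyperparameters are hypothetical choices made only to show how averaging locally updated models pulls the global model toward the mean of the clients' optima.

```python
import numpy as np

def local_update(w, optimum, lr=0.1, steps=5):
    # ii) Each client runs SGD on its own loss. Here the (hypothetical)
    # local loss is 0.5 * ||w - optimum||^2, so the gradient is (w - optimum).
    for _ in range(steps):
        w = w - lr * (w - optimum)
    return w

def fl_round(global_w, client_optima):
    # i) The server broadcasts global_w to all clients;
    # ii) each client updates a local copy on its own data;
    # iii) the server averages the returned local models.
    local_models = [local_update(global_w.copy(), opt) for opt in client_optima]
    return np.mean(local_models, axis=0)

# Two heterogeneous clients whose local optima disagree.
client_optima = [np.array([1.0, 0.0]), np.array([-1.0, 2.0])]
w = np.zeros(2)
for _ in range(20):
    w = fl_round(w, client_optima)
print(w)  # converges near the average of the client optima, [0, 1]
```

With symmetric quadratic losses the averaged iterate lands exactly at the mean of the client optima, which coincides with the global optimum here; with asymmetric heterogeneous losses the same averaging drifts away from the true global optimum, which is the failure mode the passage describes.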
As distribution drift exists across the clients' datasets, the model may overfit the local training data when applying empirical risk minimization, and it has been reported that the generalization performance on clients' local data may degrade when clients have different distributions between their training and testing datasets (Liang et al., 2020). In order to overcome these problems, several solutions have been put forward in recent years. Generally,

