FEDSPEED: LARGER LOCAL INTERVAL, LESS COMMUNICATION ROUND, AND HIGHER GENERALIZATION ACCURACY

Abstract

Federated learning is an emerging distributed machine learning framework that jointly trains a global model across a large number of local devices while protecting data privacy. Its performance suffers from the non-vanishing biases introduced by inconsistent local optima and from the rugged client-drifts caused by local over-fitting. In this paper, we propose a novel and practical method, FedSpeed, to alleviate the negative impacts of these problems. Concretely, FedSpeed applies a prox-correction term to the current local updates to efficiently reduce the bias introduced by the prox-term, a necessary regularizer for maintaining strong local consistency. Furthermore, FedSpeed merges the vanilla stochastic gradient with a perturbation computed from an extra gradient ascent step in the neighborhood, thereby alleviating local over-fitting. Our theoretical analysis indicates that the convergence rate depends on both the number of communication rounds T and the local interval K, with an upper bound of O(1/T) when the local interval is set properly. Moreover, we conduct extensive experiments on real-world datasets to demonstrate the efficiency of FedSpeed, which converges significantly faster than several baselines, including FedAvg, FedProx, FedCM, FedAdam, SCAFFOLD, FedDyn, and FedADMM, and achieves state-of-the-art (SOTA) performance under general FL experimental settings.
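To make the two mechanisms above concrete, the following display is a minimal sketch of one local step, written in our own notation rather than the paper's: $x^{t}$ is the global model at round $t$, $x_{i,k}^{t}$ the $k$-th local iterate on client $i$, $\eta$ the local learning rate, $\rho$ the ascent-step radius, $\alpha$ the merging weight, $\lambda$ the prox coefficient, and $\hat{g}_i^{t}$ the prox-correction term; the precise update rules are given in the method section.

$$
\begin{aligned}
\tilde{x}_{i,k}^{t} &= x_{i,k}^{t} + \rho\,\frac{\nabla F_i(x_{i,k}^{t})}{\big\|\nabla F_i(x_{i,k}^{t})\big\|}
&&\text{(extra gradient ascent step in the neighborhood)}\\
d_{i,k}^{t} &= (1-\alpha)\,\nabla F_i(x_{i,k}^{t}) + \alpha\,\nabla F_i(\tilde{x}_{i,k}^{t})
&&\text{(merge the vanilla gradient with the perturbation)}\\
x_{i,k+1}^{t} &= x_{i,k}^{t} - \eta\Big(d_{i,k}^{t} + \tfrac{1}{\lambda}\big(x_{i,k}^{t} - x^{t}\big) - \hat{g}_i^{t}\Big)
&&\text{(prox-term regularizer with prox-correction)}
\end{aligned}
$$

Intuitively, the prox-correction term $\hat{g}_i^{t}$ offsets the bias that the prox-term alone would accumulate, while the perturbed gradient discourages convergence to sharp, over-fitted local minima.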

1. INTRODUCTION

Since McMahan et al. (2017) proposed federated learning (FL), it has gradually evolved into an efficient paradigm for large-scale distributed training. Different from traditional deep learning methods, FL allows multiple local clients to jointly train a single global model without sharing data. However, FL is far from mature, as it still suffers from considerable performance degradation on heterogeneously distributed data, a very common setting in practical FL applications. We recognize the main culprits of this performance degradation as local inconsistency and local heterogeneous over-fitting. Specifically, for canonical local-SGD-based FL methods, e.g., FedAvg, the non-vanishing biases introduced by the local updates may eventually lead to inconsistent local solutions, as sketched below. Then, the rugged client-drifts resulting from local over-fitting to these inconsistent local solutions may degrade the obtained global model into a mere average of the clients' local parameters. The non-vanishing biases have been studied by several previous works Charles & Konečnỳ (2021);
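To formalize the source of this bias in standard FL notation (not taken verbatim from this paper): with $m$ clients, FL minimizes the average objective, while FedAvg runs $K$ local SGD steps on each client before averaging,

$$
\min_{x}\; f(x) = \frac{1}{m}\sum_{i=1}^{m} F_i(x),
\qquad
x_{i,k+1}^{t} = x_{i,k}^{t} - \eta\,\nabla F_i\big(x_{i,k}^{t};\,\xi_{i,k}^{t}\big),\quad k = 0,\dots,K-1 .
$$

When the data are heterogeneous, the local minimizers $x_i^{\star} = \arg\min_x F_i(x)$ differ across clients, so each local trajectory drifts toward its own $x_i^{\star}$; the averaged model $x^{t+1} = \frac{1}{m}\sum_{i} x_{i,K}^{t}$ is then biased away from the minimizer of $f$, and this bias does not vanish simply by increasing $K$ or $T$.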

