ON CONVERGENCE OF FEDERATED AVERAGING LANGEVIN DYNAMICS

Anonymous authors
Paper under double-blind review

Abstract

We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data, and study how the injected noise, the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. This analysis sheds light on the optimal choice of local updates to minimize the communication cost. Importantly, the communication efficiency of our approach does not deteriorate with the noise injected by the Langevin algorithm. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients, and observe a trade-off among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available; in this case, we discover an additional bias that does not decay to zero.

1. INTRODUCTION

Federated learning (FL) allows multiple parties to jointly train a consensus model without sharing user data. Compared with the classical centralized learning regime, federated learning keeps training data on local clients, such as mobile devices or hospitals, where data privacy, security, and access rights are matters of vital interest. Aggregating such diverse data resources while heeding privacy concerns shows promising potential in the Internet of Things (Chen et al., 2020), healthcare (Li et al., 2020d; 2019b), text data (Huang et al., 2020), and fraud detection (Zheng et al., 2020). A standard formulation of federated learning is a distributed optimization framework that tackles communication costs, client robustness, and data heterogeneity across different clients (Li et al., 2020a). Central to this formulation is communication efficiency, which directly motivates the communication-efficient federated averaging algorithm (FedAvg) (McMahan et al., 2017). FedAvg introduces a global model that synchronously aggregates multi-step local updates from the available clients and yields distinctive communication properties. However, FedAvg often stagnates empirically at inferior local modes due to the data heterogeneity across clients (Charles & Konečný, 2020; Woodworth et al., 2020). To tackle this issue, Karimireddy et al. (2020) and Pathak & Wainwright (2020) proposed stateful clients to avoid unstable convergence; such clients, however, do not scale with the number of clients in applications involving mobile devices (Al-Shedivat et al., 2021). In addition, the optimization framework often fails to accurately quantify the uncertainty of the parameters of interest, which is crucial for building estimators, hypothesis tests, and credible intervals. This problem leads to unreliable statistical inference and casts doubt on the credibility of prediction tasks or diagnoses in medical applications.
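As a schematic illustration, the FedAvg scheme just described (each client runs several local gradient steps, after which the server averages the local models) can be sketched as follows. This is a minimal sketch, not the paper's implementation: the function names, the quadratic-loss clients in the usage example, and the uniform client weighting are all illustrative assumptions.

```python
import numpy as np

def fedavg(client_grad_fns, init, n_rounds, local_steps, lr, weights=None):
    """Sketch of FedAvg: each client takes `local_steps` local gradient
    steps from the current global model; the server then averages the
    resulting local models (uniformly, unless `weights` is given)."""
    n_clients = len(client_grad_fns)
    if weights is None:
        weights = np.full(n_clients, 1.0 / n_clients)
    theta = np.asarray(init, dtype=float)
    for _ in range(n_rounds):
        local_models = []
        for grad in client_grad_fns:
            x = theta.copy()
            for _ in range(local_steps):
                x = x - lr * grad(x)  # multi-step local update
            local_models.append(x)
        # synchronous server-side aggregation of the local models
        theta = sum(w * x for w, x in zip(weights, local_models))
    return theta

# Usage: two clients with (hypothetical) quadratic losses 0.5*(x - a)**2,
# a = 1 and a = 3; the consensus model converges to the average, x = 2.
grads = [lambda x: x - 1.0, lambda x: x - 3.0]
theta = fedavg(grads, init=np.zeros(1), n_rounds=50, local_steps=5, lr=0.1)
```

With heterogeneous (non-i.i.d.) clients, the fixed point of this averaging loop generally depends on the number of local steps, which is one source of the stagnation at inferior modes discussed above.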
To unify optimization and uncertainty quantification in federated learning, we resort to a Bayesian treatment by sampling from a global posterior distribution, which is aggregated through infrequent communication from local posterior distributions. We adopt a popular approach for inferring posterior distributions on large datasets, the stochastic gradient Markov chain Monte Carlo (SG-MCMC) method (Welling & Teh, 2011; Vollmer et al., 2016; Teh et al., 2016; Chen et al., 2014; Ma et al., 2015), which enjoys theoretical guarantees beyond convex scenarios (Raginsky et al., 2017; Zhang et al., 2017; Mangoubi & Vishnoi, 2018; Ma et al., 2019). In particular, we examine

