IMPROVING MODEL CONSISTENCY OF DECENTRALIZED FEDERATED LEARNING VIA SHARPNESS AWARE MINIMIZATION AND MULTIPLE GOSSIP APPROACHES

Abstract

To mitigate privacy leakage and reduce the communication burden of Federated Learning (FL), decentralized FL (DFL) discards the central server, and each client communicates only with its neighbors in a decentralized communication network. However, existing DFL algorithms tend to feature high inconsistency among local models, which results in severe distribution shifts across clients and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or with sparse connectivity of the communication topology. To alleviate this challenge, we propose two DFL algorithms, named DFedSAM and DFedSAM-MGS, to improve performance. Specifically, DFedSAM leverages gradient perturbation to generate locally flat models via Sharpness Aware Minimization (SAM), which searches for model parameters with uniformly low loss values. In addition, DFedSAM-MGS further boosts DFedSAM by adopting the technique of Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of locally flat models and better balances communication complexity against learning performance. From a theoretical perspective, we present improved convergence rates in the stochastic non-convex setting for DFedSAM and DFedSAM-MGS, respectively, where 1-λ is the spectral gap of the gossip matrix W and Q is the number of gossip steps in MGS. Meanwhile, we empirically confirm that our methods achieve competitive performance compared with CFL baselines and outperform existing DFL baselines.
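The SAM-based local update used in DFedSAM can be sketched as follows. This is a minimal illustration of the generic SAM two-step rule (ascent perturbation along the normalized gradient, then descent using the gradient at the perturbed point); the function names, the toy quadratic loss, and the values of `rho` and `lr` are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One Sharpness Aware Minimization (SAM) update (minimal sketch).

    1) Perturb the weights along the normalized gradient direction (ascent).
    2) Compute the gradient at the perturbed point.
    3) Apply that gradient to the ORIGINAL weights (descent).
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation of radius rho
    g_perturbed = grad_fn(w + eps)               # gradient at the perturbed point
    return w - lr * g_perturbed

# Toy example: quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda v: v)
```

In DFedSAM, each client runs such SAM steps locally before exchanging its model with neighbors, which is intended to yield flatter local minima that aggregate more consistently.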



Figure 1: Illustration of the communication frameworks of CFL (a) and DFL (b). For the decentralized setting, various communication network topologies are illustrated in Appendix A.

Federated learning (FL) (Mcmahan et al., 2017; Li et al., 2020b) allows distributed clients to collaboratively train a shared model under the orchestration of the cloud without transmitting local data. However, almost all FL paradigms employ a central server to communicate with clients, which faces several critical challenges, such as limited computational resources, high communication bandwidth cost, and privacy leakage (Kairouz et al., 2021). Compared with the centralized FL (CFL) framework, decentralized FL (DFL, see Figure 1), in which clients communicate only with their neighbors without a central server, offers a communication advantage and further preserves data privacy (Kairouz et al., 2021; Wang et al., 2021). However, DFL suffers from bottlenecks such as severe inconsistency among local models, caused by heterogeneous data, and the locality of model aggregation, caused by the limited connectivity of the communication topology. This inconsistency results in severe over-fitting of local models and degraded model performance. Therefore, the global/consensus model may deliver inferior performance compared with CFL, especially on heterogeneous data or in the face of sparse connectivity of the communication network.
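The neighbor-only communication pattern of DFL, and the Multiple Gossip Steps idea used to improve consistency, can be illustrated with a minimal sketch. The ring topology, the specific gossip matrix `W`, and the choice of `Q` below are illustrative assumptions; any doubly stochastic mixing matrix matching the communication graph would play the same role:

```python
import numpy as np

def gossip(models, W, Q):
    """Apply Q gossip (mixing) steps: each step replaces every client's
    model with a weighted average of its neighbors' models, models <- W @ models."""
    for _ in range(Q):
        models = W @ models
    return models

# 4 clients on a ring; each client averages with itself and its two neighbors.
# W is doubly stochastic, so repeated mixing preserves the global average.
W = np.array([
    [1/3, 1/3, 0.0, 1/3],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 1/3, 1/3, 1/3],
    [1/3, 0.0, 1/3, 1/3],
])
models = np.array([[0.0], [1.0], [2.0], [3.0]])  # one scalar "model" per client

mixed = gossip(models, W, Q=10)
# With more gossip steps Q, local models approach the global average (1.5 here),
# i.e. model consistency improves at the cost of extra communication.
```

The rate at which the clients' models contract toward the average is governed by the spectral gap 1-λ of W, which is why both the topology and Q appear in the convergence analysis.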

