EXP-α: BEYOND PROPORTIONAL AGGREGATION IN FEDERATED LEARNING

Abstract

Federated Learning (FL) is a distributed learning paradigm in which clients compute gradients of a model locally and a server aggregates the updates to construct a new model collectively. Typically, the updates from local clients are aggregated with weights proportional to the sizes of the clients' local datasets. In practice, clients' local datasets suffer from data heterogeneity, such as imbalance. Although proportional aggregation still theoretically converges to the global optimum, it is provably slower when non-IID data is present (under convexity assumptions), an effect that is exacerbated in practice. We posit that this analysis ignores convergence rate, which is especially important in such settings in the more realistic non-convex real world. To account for this, we analyze a generic, time-varying aggregation strategy and reveal a surprising trade-off between convergence rate and convergence error under convexity assumptions. Inspired by this theory, we propose a new aggregation strategy, Exp-α, which weights clients differently according to the severity of their data heterogeneity. It achieves stronger convergence rates at the theoretical cost of a non-vanishing convergence error. Through a series of controlled experiments, we empirically demonstrate the superior convergence behavior (both in terms of rate and, in practice, even error) of the proposed aggregation on three types of data heterogeneity (imbalance, label-flipping, and domain shift) when combined with existing FL algorithms. For example, on our imbalance benchmark, Exp-α combined with FedAvg achieves a relative 12% increase in convergence rate and a relative 3% reduction in error across four FL communication settings.

1. INTRODUCTION

Federated Learning (FL) (McMahan et al., 2017) is a decentralized approach for learning a model on distributed data while preserving data privacy: data reside on clients and are never transmitted to a central server. However, data on local clients are often correlated with client demographics and preferences. This makes the training data highly non-IID, or heterogeneous (Wang et al., 2021; Zhang et al., 2021; Kairouz et al., 2021), exhibiting label imbalance, noisy labels (e.g., label-flipping), or domain shift, which can significantly degrade a model's performance and, in particular, its convergence rate (Zhao et al., 2018; Li et al., 2019). To tackle data heterogeneity, the majority of federated learning research has focused on improving the local (Zhao et al., 2018; Shoham et al., 2019; Karimireddy et al., 2020; Zhang et al., 2020; Acar et al., 2021) and global (Hsu et al., 2019; Reddi et al., 2020) optimization objectives in the federated learning pipeline. Few papers have paid attention to other aspects of federated learning, such as client selection (Cho et al., 2020) and model aggregation (Chen et al., 2020; Wang et al., 2020). Most existing methods use proportional aggregation (McMahan et al., 2017), whose aggregation weights are proportional to the sizes of the local datasets. Although proportional aggregation still theoretically converges in the presence of non-IID data under convexity assumptions, we posit that this analysis ignores convergence rate, which is especially important in such settings in the real world, because proportional aggregation assumes all samples are equally important. Non-IID data calls this equal-importance assumption into question: imbalanced data can bias predictions towards majority classes, and noise or domain shift can slow down convergence.
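For concreteness, proportional aggregation as in FedAvg can be sketched as follows. This is an illustrative sketch, not code from the paper: the function name, the NumPy representation of updates as flat parameter vectors, and the toy two-parameter example are all our own choices.

```python
import numpy as np

def proportional_aggregate(client_updates, client_sizes):
    """FedAvg-style proportional aggregation.

    Each client's update is weighted by p_k = n_k / sum_j n_j,
    the fraction of all training samples held by client k.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()          # proportional weights p_k
    stacked = np.stack(client_updates)     # shape: (num_clients, num_params)
    # Weighted sum over the client axis yields the global update.
    return np.einsum("k,kp->p", weights, stacked)

# Toy example: three clients with unequal local dataset sizes.
updates = [np.array([1.0, 0.0]),
           np.array([0.0, 1.0]),
           np.array([1.0, 1.0])]
sizes = [10, 30, 60]
global_update = proportional_aggregate(updates, sizes)  # -> [0.7, 0.9]
```

The large client (60 of 100 samples) dominates the aggregate, illustrating the equal-importance-of-samples assumption discussed above: if that client's data are imbalanced or noisy, its influence on the global model is amplified in direct proportion to its size.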

