LEARNING TO AGGREGATE: A PARAMETERIZED AGGREGATOR TO DEBIAS MODEL AGGREGATION FOR CROSS-DEVICE FEDERATED LEARNING

Anonymous

Abstract

Federated learning (FL) collaboratively trains deep models on decentralized clients under privacy constraints. The aggregation of client parameters within a communication round suffers from "client drift" due to the heterogeneity of client data distributions, resulting in unstable and slow convergence. Recent works typically impose regularization on the client parameters to reduce local aggregation heterogeneity during optimization. However, we argue that they generally neglect the inter-communication heterogeneity of data distributions ("period drift"), which causes intra-communication optimization to deviate from the global objective. In this work, we aim to calibrate local aggregation under "client drift" while simultaneously approaching the global objective under "period drift". To this end, we propose a learning-based aggregation strategy, named FEDPA, that employs a Parameterized Aggregator rather than non-adaptive techniques (e.g., federated averaging). We frame FEDPA within a meta-learning setting, where the aggregator serves as the meta-learner and the meta-task is to aggregate the client parameters so as to generalize well on a proxy dataset. Intuitively, the meta-learner is task-specific and can thereby acquire the meta-knowledge to calibrate the parameter aggregation from a global view and approach the global optimum for generalization.
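To make the meta-learning framing concrete, one schematic way to write it is as a bi-level update; the notation below (aggregator $A_{\phi}$ with meta-parameters $\phi$, sampled client set $S_t$, proxy dataset $\mathcal{D}_{\mathrm{proxy}}$) is illustrative shorthand introduced here, not the paper's formal definition:

$$\theta^{t} = A_{\phi}\left(\{\theta^{t}_{k}\}_{k \in S_t}\right), \qquad \phi \leftarrow \phi - \eta \, \nabla_{\phi}\, \mathcal{L}\left(\theta^{t};\, \mathcal{D}_{\mathrm{proxy}}\right),$$

i.e., the aggregator (meta-learner) maps the uploaded client parameters to a global model, and its meta-parameters are trained so that the aggregated model generalizes well on the proxy dataset (the meta-task), in contrast to a fixed averaging rule.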

1. INTRODUCTION

Federated Learning (FL) McMahan et al. (2017) has emerged as a privacy-preserving machine learning paradigm that collaboratively trains a shared model in a decentralized manner without sharing private data. In FL, clients independently train the shared model over their private data, and the server aggregates the uploaded model parameters periodically until convergence. In FL Kairouz et al. (2021), a key challenge hindering effective model aggregation lies in the heterogeneous data of clients Zhao et al. (2018), especially in cross-device (as opposed to cross-silo) FL with a large number of clients (e.g., mobile devices). In this setting, vanilla FL algorithms such as federated averaging (FEDAVG) McMahan et al. (2017), which average the parameters of the candidate clients, suffer from poor convergence and performance degradation. Existing works Hsu et al. (2019); Li et al. (2020); Karimireddy et al. (2021) characterize this non-iid trap as weight divergence Zhao et al. (2018) or client drift Karimireddy et al. (2021). To cope with it, they typically impose regularization on local optimization at each communication round so that the intra-round heterogeneity is reduced. However, we argue that existing methods generally neglect the heterogeneity among different communication rounds, so the round-specific regularization inevitably falls into a local optimum. Specifically, in cross-device FL, the clients sampled for aggregation may follow different data distributions in different communication rounds. As such, the optimization direction estimated in a single round might deviate from that estimated with all clients, eventually amplifying the aggregation bias¹ and resulting in poor convergence or even oscillation. For brevity, we term this challenge "period drift" and provide empirical evidence on real-world datasets (cf. Figure 1).

¹ In ecological studies, aggregation bias is the expected difference between effects for the group and effects for the individual.
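For reference, FEDAVG McMahan et al. (2017) forms the round-$(t+1)$ global model as a fixed, data-size-weighted average of the sampled clients' locally updated parameters:

$$\theta^{t+1} = \sum_{k \in S_t} \frac{n_k}{\sum_{j \in S_t} n_j}\, \theta^{t+1}_{k},$$

where $S_t$ is the set of clients sampled in round $t$ and $n_k$ is the number of local examples on client $k$. Because the weights depend only on dataset sizes and not on the composition of the round-$t$ sample, the average inherits whatever bias that sample carries, which is precisely the entry point for both "client drift" within a round and "period drift" across rounds.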

