AQUILA: COMMUNICATION EFFICIENT FEDERATED LEARNING WITH ADAPTIVE QUANTIZATION OF LAZILY-AGGREGATED GRADIENTS

Abstract

The development and deployment of federated learning (FL) have been bottlenecked by the heavy communication overhead of exchanging high-dimensional models between the distributed device nodes and the central server. To achieve better error-communication trade-offs, recent efforts have been made either to adaptively reduce the communication frequency by skipping unimportant updates, e.g., lazy aggregation, or to adjust the quantization bits for each communication. In this paper, we propose a unified, communication-efficient framework for FL based on adaptive quantization of lazily-aggregated gradients (AQUILA), which adaptively balances two mutually dependent factors: the communication frequency and the quantization level. Specifically, we start with a careful investigation of the classical lazy aggregation scheme and formulate AQUILA as an optimization problem in which the optimal quantization level is selected by minimizing the model deviation caused by update skipping. Furthermore, we devise a new lazy aggregation strategy that better fits the novel quantization criterion and keeps the communication frequency at an appropriate level. The effectiveness and convergence of the proposed AQUILA framework are theoretically verified. The experimental results demonstrate that AQUILA reduces the overall transmitted bits by around 60% compared to existing methods while achieving identical model performance in a number of non-homogeneous FL scenarios, including non-IID data and heterogeneous model architectures.

1. INTRODUCTION

With the deployment of ubiquitous sensing and computing devices, the Internet of Things (IoT), along with many other distributed systems, has gradually grown from concept to reality, bringing dramatic convenience to people's daily lives (Du et al., 2020; Liu et al., 2020; Hard et al., 2018). To fully utilize such distributed computing resources, distributed learning provides a promising framework that can achieve performance comparable to the traditional centralized learning scheme. However, the privacy and security of sensitive data during the updating and transmission processes in distributed learning have been a growing concern. In this context, federated learning (FL) (McMahan et al., 2017) has been developed, allowing distributed devices to collaboratively learn a global model without privacy leakage by keeping private data isolated and masking transmitted information with secure approaches. On account of its privacy-preserving property and great potential in distributed but privacy-sensitive fields such as finance and healthcare, FL has attracted tremendous attention from both academia and industry in recent years.

Unfortunately, in many FL applications, such as image classification and object recognition, the trained model tends to be high-dimensional, resulting in significant communication costs. Hence, communication efficiency has become one of the key bottlenecks of FL. To this end, Sun et al. (2020) propose the lazily-aggregated quantization (LAQ) method, which skips unnecessary parameter uploads by estimating the gradient innovation, i.e., the difference between the current unquantized gradient and the previously quantized gradient. Moreover, Mao et al. (2021) devise an adaptive quantized gradient (AQG) strategy based on LAQ that dynamically selects the quantization level from a set of artificially given values during the training process. Nevertheless, AQG is still not sufficiently adaptive because the pre-determined quantization levels are difficult to choose in complicated FL
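To make the lazy aggregation idea concrete, the following minimal Python sketch illustrates how a client might use the gradient innovation to decide whether to upload in a given round. The function names, the simplified uniform quantizer, and the norm-based skipping rule are illustrative assumptions for exposition, not the exact LAQ or AQUILA criteria.

```python
import numpy as np

def quantize(grad, num_bits):
    """Uniform quantization of a gradient vector to 2**num_bits levels (illustrative)."""
    levels = 2 ** num_bits - 1
    g_min, g_max = grad.min(), grad.max()
    scale = max((g_max - g_min) / levels, 1e-12)
    return np.round((grad - g_min) / scale) * scale + g_min

def lazy_upload_decision(current_grad, prev_quantized_grad, threshold, num_bits=4):
    """Decide whether a client uploads this round.

    The gradient innovation is the difference between the current unquantized
    gradient and the previously quantized gradient. If the innovation is small,
    the upload is skipped and the server reuses the stale quantized gradient.
    """
    innovation = current_grad - prev_quantized_grad
    if np.linalg.norm(innovation) ** 2 <= threshold:
        # Skip communication: the server keeps aggregating the stale gradient.
        return False, prev_quantized_grad
    # Otherwise, transmit a freshly quantized gradient.
    return True, quantize(current_grad, num_bits)
```

Under this simplified rule, the communication frequency is controlled by the skipping threshold, while the per-upload cost is controlled by the number of quantization bits; AQUILA's goal, as described above, is to adapt both jointly rather than fixing either in advance.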

