FEDDEBIAS: REDUCING THE LOCAL LEARNING BIAS IMPROVES FEDERATED LEARNING ON HETEROGENEOUS DATA

Abstract

Federated Learning (FL) is a machine learning paradigm that learns from data kept locally in order to safeguard the privacy of clients, where local SGD is typically employed on the clients' devices to improve communication efficiency. However, such a scheme is currently constrained by slow and unstable convergence induced by the heterogeneity of clients' data. In this work, we identify three under-explored phenomena of biased local learning that may explain these challenges caused by local updates in supervised FL. As a remedy, we propose FedDebias, a novel unified algorithm that reduces the local learning bias on features and classifiers to tackle these challenges. FedDebias consists of two components: the first component alleviates the bias in the local classifiers by balancing the output distribution of models; the second component learns client-invariant features that are close to global features but considerably distinct from those learned from other input distributions. In a series of experiments, we show that FedDebias consistently outperforms other state-of-the-art (SOTA) FL and domain generalization (DG) baselines, and that both components contribute individual performance gains.

1. INTRODUCTION

Federated Learning (FL) is an emerging privacy-preserving distributed machine learning paradigm. The server transmits the model to the clients, and once the clients have completed local training, the parameter updates are sent back to the server for aggregation. Clients are not required to share local raw data during this procedure, preserving their privacy. As the workhorse algorithm in FL, FedAvg (McMahan et al., 2016) proposes local SGD to improve communication efficiency. However, the considerable heterogeneity between local client datasets leads to inconsistent local updates and hinders convergence. Several studies propose variance reduction methods (Karimireddy et al., 2019; Das et al., 2020), or suggest regularizing local updates towards global models (Li et al., 2018b; 2021) to tackle this issue. Almost all these existing works directly regularize models by utilizing the global model collected from previous rounds, either to reduce variance or to minimize the distance between global and local models (Li et al., 2018b; 2021). However, it is hard to balance the trade-off between optimization and regularization, and data heterogeneity remains an open question in the community, as evidenced by the limited performance gains, e.g., in our Table 1.

To this end, we begin by revisiting and reinterpreting the issues caused by data heterogeneity and local updates. We identify three pitfalls of FL, termed local learning bias, from the perspective of representation learning: 1) biased local classifiers are unable to effectively classify unseen data (Figure 1(a)), due to shifted decision boundaries dominated by local class distributions; 2) local features (extracted by a local model) differ significantly from global features (similarly extracted by a centralized global model), even for the same input data (c.f. Figure 1(b)); and 3) local features, even for data from different classes, are close to each other and cannot be accurately distinguished (c.f. Figure 1(b)).

As a remedy, we propose FedDebias, a unified method that leverages globally shared pseudo-data and two key algorithmic components to simultaneously address the three difficulties outlined above.

Contributions

• We propose FedDebias, a unified algorithm that leverages pseudo-data to reduce the learning bias in local features and classifiers. We design two orthogonal key components of FedDebias that complement each other to improve the learning quality of clients with heterogeneous data.
• FedDebias considerably outperforms other FL and domain generalization (DG) baselines, as justified by extensive numerical evaluation.



Please refer to Section 3 for further justification of these observations.




Figure 1: Observations of learning bias: three pitfalls of FL on heterogeneous data with local updates. There are two clients in the figure (denoted by two colors), and each has two classes of data (red and blue points). Figure 1(a): Client 1's decision boundary cannot accurately classify data samples from client 2. Figure 1(b): The difference between features extracted by client 1's local feature extractor and the global feature extractor is substantially large; however, client 2's local features are close to client 1's, even for input data from different data distributions/clients.

The first component of FedDebias alleviates the first difficulty by forcing the output distribution of the pseudo-data to be close to the global prior distribution. The second component of FedDebias is designed for the second and third difficulties. To tackle these two difficulties simultaneously, we develop a min-max contrastive learning method to learn client-invariant local features. More precisely, instead of directly minimizing the distance between global and local features, we design a two-stage algorithm. The first stage learns a projection space, an operation that maximizes the difference between global and local features while minimizing the difference between local features of different inputs, so as to distinguish the two types of features. The second stage then debiases the features by leveraging the trained projection space, enforcing learned features that are farther from the biased local features and closer to the global features. We examine the performance of FedDebias and compare it with other FL and domain generalization baselines on RotatedMNIST, CIFAR10, and CIFAR100. Numerical results show that FedDebias consistently outperforms the other algorithms by a large margin in mean accuracy and convergence speed. Furthermore, each component yields individual performance gains, and the combined approach FedDebias achieves the best results.
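The two loss terms described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function names, the uniform global prior, and the single-negative InfoNCE-style formulation are simplifying assumptions made for illustration; in the min-max scheme, the projection `proj` would be trained in the first stage to separate global from local features, and the losses below would then be minimized in the second stage.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def debias_classifier_loss(logits_pseudo, prior=None):
    """Component 1 (sketch): push the model's average output distribution on
    the shared pseudo-data toward the global prior via a KL divergence.
    A uniform prior is assumed here for illustration."""
    p = softmax(logits_pseudo).mean(axis=0)  # average predicted class distribution
    if prior is None:
        prior = np.full_like(p, 1.0 / p.size)
    return float(np.sum(prior * np.log(prior / p)))  # KL(prior || p)

def contrastive_debias_loss(local_feat, global_feat, other_feat, proj, tau=0.5):
    """Component 2 (sketch): in the learned projection space, pull the current
    feature toward the global-model feature (positive pair) and away from the
    feature of a different input (negative pair), InfoNCE-style."""
    def norm(x):
        return x / (np.linalg.norm(x) + 1e-8)
    z = norm(proj @ local_feat)
    zg = norm(proj @ global_feat)   # positive: global feature of the same input
    zo = norm(proj @ other_feat)    # negative: feature of a different input
    pos = np.exp(z @ zg / tau)
    neg = np.exp(z @ zo / tau)
    return float(-np.log(pos / (pos + neg)))
```

For instance, `debias_classifier_loss` vanishes when the model already predicts the uniform prior on the pseudo-data, and `contrastive_debias_loss` is smaller when the local feature aligns with the global feature than when it collapses onto the negative.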

Federated Learning (FL). As the de facto FL algorithm, FedAvg (McMahan et al., 2016; Lin et al., 2020b) uses local SGD steps to alleviate the communication bottleneck. However, the objective inconsistency caused by local data heterogeneity considerably hinders the convergence of FL algorithms (Li et al., 2018b; Wang et al., 2020; Karimireddy et al., 2019; 2020; Guo et al., 2021). To address the issue of heterogeneity in FL, a series of approaches has been proposed. FedProx (Li et al., 2018b) incorporates a proximal term into the local objective functions to reduce the gap between the local and global models. SCAFFOLD (Karimireddy et al., 2019) adopts a variance reduction method for local updates, and Mime (Karimireddy et al., 2020) increases convergence speed by applying global momentum to local updates. Another line of work adopts knowledge distillation (Lin et al., 2020a; Li & Wang, 2019) to transfer knowledge from local models (teachers) to global models (students). Considering the impracticality of sharing global datasets in FL settings, some recent research uses proxy datasets with augmentation techniques. Astraea (Duan et al., 2019) uses local augmentation to create a globally balanced distribution. XorMixFL (Shin et al., 2020) encodes pairs of local data samples and decodes them on the server using the XOR operator. FedMix (Yoon et al., 2021b) creates privacy-protected augmentation data by averaging local data.
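The FedProx proximal term mentioned above has a simple closed form: the local objective becomes the task loss plus (mu/2) * ||w - w_global||^2. The numpy sketch below is illustrative only; the function name and the parameter-list interface are our own assumptions, not FedProx's reference implementation.

```python
import numpy as np

def fedprox_local_loss(task_loss, local_params, global_params, mu=0.01):
    """FedProx-style local objective (sketch): the client's task loss plus a
    proximal penalty (mu/2) * ||w - w_global||^2 that keeps the local model
    close to the global model received at the start of the round."""
    prox = 0.5 * mu * sum(float(np.sum((w - wg) ** 2))
                          for w, wg in zip(local_params, global_params))
    return task_loss + prox
```

When the local and global parameters coincide, the penalty vanishes and the objective reduces to the plain task loss; larger `mu` pulls local updates more strongly toward the global model.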

