LABEL-DISTRIBUTION-AGNOSTIC ENSEMBLE LEARNING ON FEDERATED LONG-TAILED DATA

Abstract

Federated Learning (FL) is a distributed machine learning paradigm that enables devices to collaboratively train a shared model. However, real-world data often follows a long-tailed distribution, which degrades the performance of the global model and is difficult to address due to data heterogeneity, e.g., local clients may exhibit diverse imbalanced class distributions. Moreover, existing re-balance strategies generally utilize the label distribution as the class prior, which may conflict with the privacy requirements of FL. To this end, we propose a Label-Distribution-Agnostic Ensemble (LDAE) learning framework that integrates heterogeneous data distributions using multiple experts and aims to optimize a balanced global objective under privacy protection. In particular, we derive a privacy-preserving proxy from the model updates of clients to guide the grouping and updating of the multiple experts. Knowledge from clients is aggregated via implicit interactions among the different expert groups. We theoretically and experimentally demonstrate that (1) there is a gap between the objective functions of the global and local re-balance strategies¹, and (2) while protecting data privacy, the proxy can serve as an alternative to the label distribution in existing class-prior-based re-balance strategies. Extensive experiments on long-tailed decentralized datasets demonstrate the effectiveness of our method, which shows superior performance to state-of-the-art methods.

1. INTRODUCTION

Federated Learning (FL) aims to collaboratively learn from data distributed across a number of remote clients and produce a highly accurate global model on the server with aggregated knowledge. The most important issues in practical FL applications involve data heterogeneity and privacy protection during the collaboration of disparate data sources. These issues are even more pronounced in the long-tailed data distributions arising in real-world scenarios (Cui et al., 2019; Liu et al., 2019), such as medical applications (Li et al., 2019; Malekzadeh et al., 2021) and autonomous vehicles (Samarakoon et al., 2019; Pokhrel & Choi, 2020). Under a long-tailed global data distribution, it is extremely challenging to learn an effective global model by leveraging knowledge from local clients. From the local perspective, there can be a large divergence among the imbalanced label distributions of different clients, resulting in the heterogeneous imbalance shown in Figure 1(a), i.e., local datasets on different clients may have different imbalance ratios or minority classes. From the global perspective, the imbalance issue must be handled with privacy preservation (Li et al., 2021a), i.e., the server should not require clients to upload their label distributions for re-balance strategies. Several techniques have been proposed to tackle the class imbalance problem in FL, such as loss re-weighting (Wang et al., 2021; Shen et al., 2021), client clustering (Duan et al., 2020), and client selection schemes (Yang et al., 2021). Most of them focus on datasets with only a few classes (e.g., ten or twenty) and suffer significant performance drops on large-scale imbalanced datasets with more classes (Liu et al., 2019; Zhang et al., 2021b).
Meanwhile, existing solutions generally assume that some sensitive information is accessible to the global server, e.g., the global label distribution. As shown in Figure 1(b), combined with these algorithms, the global re-balance strategy yields higher recognition accuracy than the local re-balance strategy in the FL setting. We theoretically demonstrate that the main reason lies in the gap between the objective functions of the global and local re-balance strategies in FL, where only the former matches the objective of centralized training. However, obtaining the global label distribution requires clients to upload their own label distributions to the server, which may violate the privacy protection principle of FL (Wang et al., 2021; McMahan et al., 2017). It is therefore critical to exploit privacy-preserving priors so that the global re-balance strategy can maintain a balanced global objective function.

To overcome the above problem, we propose a Label-Distribution-Agnostic Ensemble learning framework (LDAE) to deal with data heterogeneity and privacy in the long-tailed FL setting. Specifically, we use proxy information, rather than the label distribution, as the class prior for global re-balance strategies. The proxy is derived from the model updates uploaded by local clients and is agnostic to the local label distributions, thereby protecting privacy. To alleviate the heterogeneity issue, we propose a multi-expert model architecture that aggregates knowledge from different client groups, where clients in the same group have similar local data distributions and train a corresponding expert. Heterogeneity is mitigated through information interaction among experts trained on different local data distributions.

In conclusion, the key contributions of this work are:

(1) We experimentally and theoretically examine the effectiveness of existing class-prior-based re-balance algorithms in FL. We demonstrate a mismatch between the objectives of the local and global re-balance strategies, which indicates that the global re-balance performs better than the local one on imbalanced decentralized data.

(2) To address the imbalance issue under privacy protection, we propose a novel FL framework, LDAE, which uses uploaded model updates to cluster clients into groups and employs a multi-expert architecture to aggregate knowledge from groups with heterogeneous data distributions. Our method is agnostic to the label distributions of the clients.

(3) Experimental results on multiple benchmark datasets demonstrate that LDAE significantly outperforms previous state-of-the-art (SOTA) methods under heterogeneous data distributions while protecting the data privacy of the clients.
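The grouping step in contribution (2) can be illustrated with a small sketch. The paper's exact procedure is not reproduced here; this is our own minimal version, assuming clients are clustered by the cosine similarity of their flattened model updates using a plain k-means loop (the function name and all details are illustrative):

```python
import numpy as np

def group_clients_by_updates(updates, num_groups, num_iters=10, seed=0):
    """Cluster clients into expert groups from their model updates alone.

    `updates` has shape (num_clients, d): each row is a client's flattened
    model update (local weights minus global weights). Clients whose updates
    point in similar directions are assumed to hold similar local data
    distributions, so we run k-means on L2-normalized updates, which is
    equivalent to clustering by cosine similarity.
    """
    rng = np.random.default_rng(seed)
    X = updates / (np.linalg.norm(updates, axis=1, keepdims=True) + 1e-12)
    centers = X[rng.choice(len(X), size=num_groups, replace=False)]
    for _ in range(num_iters):
        labels = np.argmax(X @ centers.T, axis=1)  # nearest center by cosine
        for k in range(num_groups):
            members = X[labels == k]
            if len(members):
                c = members.mean(axis=0)
                centers[k] = c / (np.linalg.norm(c) + 1e-12)
    return labels
```

Because only model updates are used, which clients already upload in standard FL, no label distribution ever leaves a client.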



¹The local re-balance strategy means that each client applies re-balance methods based on its local label distribution, while the global re-balance strategy applies re-balance methods using the global label distribution as the class-wise prior.
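To make this distinction concrete: the two strategies differ only in which label distribution supplies the class prior. Below is a minimal sketch using logit adjustment, one common class-prior-based re-balance method; the NumPy formulation and function name are ours, not the paper's:

```python
import numpy as np

def logit_adjusted_ce(logits, label, class_prior):
    """Cross-entropy with logit adjustment: logits are shifted by the log of
    the class prior, so frequent classes must win by a larger margin and tail
    classes receive larger losses (hence larger gradients).

    `class_prior` is a probability vector over classes: the local label
    distribution under the local re-balance strategy, or the global label
    distribution under the global strategy.
    """
    adjusted = logits + np.log(class_prior + 1e-12)
    adjusted = adjusted - adjusted.max()  # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum())
    return -log_probs[label]
```

Under the local strategy each client passes its own label histogram as `class_prior`; under the global strategy all clients share the global distribution, and it is exactly this shared prior that LDAE replaces with a privacy-preserving proxy.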



Figure 1: Illustration of the long-tailed FL problem. (a) Different local datasets exhibit diverse imbalanced label distributions, which even differ from the global dataset. (b) For existing class prior based re-balance algorithms, the global re-balance strategy outperforms the local one, motivating us to explore global prior information for the long-tailed FL problem.

