HANDLING COVARIATE SHIFTS IN FEDERATED LEARNING WITH GENERALIZATION GUARANTEES

Abstract

Covariate shift across clients is a major challenge for federated learning (FL). This work studies the generalization properties of FL under intra-client and inter-client covariate shifts. To this end, we propose Federated Importance-weighteD Empirical risk Minimization (FIDEM) to optimize a global FL model, along with new variants of density ratio matching methods, aiming to handle covariate shifts. These methods trade off some level of privacy for improving the overall generalization performance. We theoretically show that FIDEM achieves smaller generalization error than classical empirical risk minimization in certain settings. Experimental results demonstrate the superiority of FIDEM over federated averaging (McMahan et al., 2017) and other baselines, opening the door to a more systematic study of FL under distribution shifts.

1. INTRODUCTION

Federated learning (FL) (Li et al., 2020; Kairouz et al., 2021; Wang et al., 2021) is an efficient and powerful paradigm to collaboratively train a shared machine learning model among multiple clients, such as hospitals and cellphones, without sharing local data. Existing FL literature mainly focuses on training a model under the classical empirical risk minimization (ERM) paradigm in learning theory, implicitly assuming that the training and test data distributions of each client are the same. However, this stylized setup overlooks the specific requirements of each client. Statistical heterogeneity is a major challenge for FL, which has been mainly studied in terms of non-identical data distributions across clients, i.e., inter-client distribution shifts (Li et al., 2020; Kairouz et al., 2021; Wang et al., 2021). Even for a single client, the distribution shift between training and test data, i.e., intra-client distribution shift, has been a major challenge for decades (Wang & Deng, 2018; Kouw & Loog, 2019, and references therein). For instance, in a local hospital, scarce disease data used for training can differ from the data encountered at test time. To adequately address the statistical heterogeneity challenge in FL, we need to handle both intra-client and inter-client distribution shifts under stringent requirements in terms of privacy and communication costs.

We focus on the overall generalization performance on multiple clients by considering both intra-client and inter-client distribution shifts. There exist three major challenges in tackling this problem: 1) how to modify the classical ERM to obtain an unbiased estimate of an overall true risk minimizer under intra-client and inter-client distribution shifts; 2) how to develop an efficient density ratio estimation method under the stringent privacy requirements of FL; 3) whether there are theoretical guarantees for the modified ERM under the improved density ratio method in FL. We aim to address the above challenges in our new paradigm for FL.
For simplicity of description, in our problem setting we focus on covariate shift, which is the most commonly studied distribution shift in theory and practice (Sugiyama et al., 2007; Kanamori et al., 2009; Kato & Teshima, 2021; Uehara et al., 2020; Tripuraneni et al., 2021; Zhou & Levine, 2021).[1] To be specific, for any client k, covariate shift assumes the conditional distribution remains the same, p_k^{tr}(y|x) = p_k^{te}(y|x) := p(y|x), while the marginal distributions p_k^{tr}(x) and p_k^{te}(x) can be arbitrarily different, which gives rise to intra-client and inter-client covariate shifts. Handling covariate shift is a challenging issue, especially in federated settings (Kairouz et al., 2021). To this end, motivated by Sugiyama et al. (2007) under the classical covariate shift setting, we propose Federated Importance-weighteD Empirical risk Minimization (FIDEM), which accounts for covariate shifts across multiple clients in FL. We show that the global model learned under intra/inter-client covariate shifts is still unbiased in terms of minimizing the overall true risk, i.e., FIDEM is consistent in FL. To handle covariate shifts accurately, we propose a histogram-based density ratio matching method (DRM) under both intra/inter-client distribution shifts. Our method unifies well-known DRMs in FL and is of independent interest to the distribution shift community for ratio estimation (Zadrozny, 2004; Huang et al., 2006; Sugiyama et al., 2007; Kanamori et al., 2009; Sugiyama et al., 2012; Zhang et al., 2020; Kato & Teshima, 2021). To fully eliminate any privacy risks, we introduce another variant of FIDEM, termed Federated Independent Importance-weighteD Empirical risk Minimization (FIIDEM). It does not require any form of data sharing among clients and preserves the same level of privacy and the same communication costs as the federated averaging (FedAvg) baseline (McMahan et al., 2017). An overview of FIDEM is shown in Fig. 1.
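To make the underlying idea concrete, the sketch below illustrates, in a hypothetical single-client setting, how a histogram-based density ratio estimate can be plugged into importance-weighted ERM under covariate shift. This is only an illustrative simplification, not the paper's FIDEM algorithm or its DRM: the model, bin count, and data-generating process are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_density_ratio(x_tr, x_te, bins=10):
    """Estimate w(x) = p_te(x) / p_tr(x) with a shared histogram.

    A simplified, hypothetical sketch of histogram-based density
    ratio matching; the paper's DRM is more general than this.
    """
    edges = np.histogram_bin_edges(np.concatenate([x_tr, x_te]), bins=bins)
    p_tr, _ = np.histogram(x_tr, bins=edges, density=True)
    p_te, _ = np.histogram(x_te, bins=edges, density=True)
    # Ratio per bin; zero out bins the training data never visits.
    ratio = np.where(p_tr > 0, p_te / np.maximum(p_tr, 1e-12), 0.0)
    idx = np.clip(np.digitize(x_tr, edges) - 1, 0, len(ratio) - 1)
    return ratio[idx]

# Covariate shift: shared p(y|x), shifted marginals p_tr(x) != p_te(x).
f = lambda x: 2.0 * x + 1.0                    # true regression function
x_tr = rng.normal(0.0, 1.0, 500)               # training marginal p_tr(x)
x_te = rng.normal(1.0, 1.0, 500)               # test marginal p_te(x)
y_tr = f(x_tr) + 0.1 * rng.standard_normal(500)

w = histogram_density_ratio(x_tr, x_te)

# Importance-weighted least squares:
# minimize sum_i w_i * (y_i - a*x_i - b)^2  via the normal equations.
X = np.column_stack([x_tr, np.ones_like(x_tr)])
Xw = X * w[:, None]
a, b = np.linalg.solve(Xw.T @ X, Xw.T @ y_tr)
```

Reweighting each training loss by the estimated ratio w(x) makes the empirical objective an unbiased estimate of the test-distribution risk, which is the same consistency property FIDEM establishes in the federated, multi-client setting.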

1.1. TECHNICAL CHALLENGES AND CONTRIBUTIONS

Learning on multiple clients in FL under covariate shifts via importance-weighted ERM is challenging due to multiple data owners with their own learning objectives, multiple potential but unpredictable train/test shift scenarios, privacy constraints, and communication costs (Kairouz et al., 2021). To be specific: 1) It is non-trivial to control privacy leakage to other clients while estimating ratios and to relax the requirement of perfect estimates of the supremum over true ratios, which is a key step for non-negative BD (nnBD) DRM. Our work is a first step towards handling intra/inter-client distribution shifts in FL. 2) It is challenging to obtain per-client generalization bounds for a general nnBD DRM with multiple clients and imperfect estimates of the supremum, due to intra/inter-client couplings in the ratios. Note that, even with access to perfect estimates of density ratios, it is still unclear whether importance-weighted ERM results in smaller excess risk compared to classical ERM. Our work gives an initial attempt by providing an affirmative answer for ridge regression. 3) While well-established benchmarks for multi-client FL exist, they are usually designed so that each client's test samples are drawn uniformly from a set of classes. We believe this is often not the case in real-world applications and therefore design more realistic experimental settings in our work. To address these technical challenges, we

• Algorithmically propose an intuitive framework to minimize the average test error in FL, design efficient mechanisms to control privacy leakage while estimating ratios (FIDEM) along with a privacy-preserving and communication-efficient variant (FIIDEM), and improve nnBD DRM under FL without requiring perfect knowledge of the supremum over true ratios.

• Theoretically establish generalization guarantees for general nnBD DRM with multiple clients under imperfect estimates of the supremum, which unifies a number of DRMs, and show the benefits of importance weighting in terms of excess risk, decoupled from density ratio estimation, through a bias-variance decomposition.

• Experimentally demonstrate more than 16% overall test accuracy improvement over existing FL baselines when training ResNet-18 (He et al., 2016) on CIFAR10 (Krizhevsky) in challenging imbalanced federated settings in terms of data distribution shifts across clients.

In conclusion, we expand the concept and application scope of FL to a general setting under intra/inter-client covariate shifts, provide an in-depth theoretical understanding of learning with FIDEM via a general DRM, and experimentally validate the utility of the proposed framework. We hope that our work opens the door to a new FL paradigm.



[1] Our results can be extended to other typical distribution shifts, e.g., target shift (Azizzadenesheli, 2022). We provide experimental results on target shift in Section 5.



Figure 1: An overview of FIDEM. Marginal train and test distributions of clients can be arbitrarily different, leading to intra-client and inter-client covariate shifts. To control privacy leakage, the server randomly shuffles unlabelled test samples and broadcasts them to the clients.

