GENERALIZATION BOUNDS FOR FEDERATED LEARNING: FAST RATES, UNPARTICIPATING CLIENTS AND UNBOUNDED LOSSES

Abstract

In federated learning, the underlying data distributions may differ across clients. This paper provides a theoretical analysis of the generalization error of federated learning, which captures both the heterogeneity and the relatedness of the distributions. In particular, we assume that the heterogeneous distributions are sampled from a meta-distribution. In this two-level distribution framework, we characterize the generalization error not only for clients participating in the training but also for unparticipating clients. We first show that the generalization error for unparticipating clients can be bounded by the participating generalization error plus a participation gap caused by client sampling. We further establish fast learning bounds of order O(1/(mn) + 1/m) for unparticipating clients, where m is the number of clients and n is the sample size at each client. To our knowledge, the obtained fast bounds are state-of-the-art in the two-level distribution framework. Moreover, previous theoretical results mostly require the loss function to be bounded. We derive convergence bounds of order O(1/√(mn) + 1/√m) for unbounded losses, including sub-exponential and sub-Weibull losses.

1. INTRODUCTION

In federated learning, a common model is trained through the collaboration of participating clients holding local data samples (McMahan et al., 2017). Typically, the underlying distributions vary across clients, since the data-generating processes are affected by the local environment. Federated learning is called heterogeneous in the scenario where local distributions differ (Wang et al., 2021). Most existing experimental and theoretical results focus on the convergence of optimization on training datasets (Li et al., 2020b; Karimireddy et al., 2020; Mitra et al., 2021; Mishchenko et al., 2022; Yun et al., 2022). The generalization error, which is more natural and important in machine learning, seems not to have been carefully examined in heterogeneous federated learning. As a key performance indicator of a machine learning model, the generalization error measures the performance of a trained model via its population risk under the corresponding distribution. However, existing generalization results are generally derived for clients participating in the training, which only captures the performance of the learned model on distributions seen during training (Mohri et al., 2019; Chen et al., 2021; Masiha et al., 2021). In practice, the probability that a client participates in the federated training is affected by many factors, such as the reliability of network connections or the availability of the client. The realistic participation ratio may be low, and many clients never have a chance to participate during the training process (Kairouz et al., 2021; Li et al., 2020a; Yuan et al., 2021). Though the training process operates only on participating clients, the trained model will be used by both unparticipating and participating clients. Since the data distributions of unparticipating clients differ from those of participating clients, it is natural and pressing to ask the following question:

Would the unparticipating clients benefit from the model trained by participating clients?

To answer this question theoretically, we take the participation gap into account in the analysis of the generalization error; this gap is generally ignored by existing works. Beyond the ignored participation gap, existing theoretical results on the generalization error of heterogeneous federated learning have, to our knowledge, two more limitations. First, all previous learning rates in probability form are of order O(1/√(mn)), where m is the number of clients and n is the sample size at each client (Mohri et al., 2019). We note that faster rates of order O(1/(mn)) are derived in (Chen et al., 2021); however, their learning rates are in expectation form. Faster learning rates in probability form have not been derived even for participating clients alone. Guarantees in expectation form reflect the average performance of the model trained on randomly sampled datasets. Bounds in probability form, which we focus on in this paper, reflect the performance for a single draw of the datasets (Klochkov & Zhivotovskiy, 2021; Kanade et al., 2022; Sefidgaran et al., 2022a). Second, most previous generalization bounds are derived by assuming that the loss function is bounded. However, a variety of learning problems do not satisfy this assumption, including regression problems where unbounded noise is added to the labels (Kuchibhotla & Patra, 2022; Kuchibhotla & Chakrabortty, 2018; Zhang & Zhou, 2018), clustering tasks with heavy-tailed distributions (Paul et al., 2021; Vellal et al., 2022), domain adaptation, and so on. Notable exceptions in this direction include (Barnes et al., 2022) and (Sefidgaran et al., 2022b); however, their results are established under the assumption that local clients are homogeneous, which is highly restrictive in the general federated scenario. In this paper, we assume that the data distributions of participating and unparticipating clients are drawn from a meta-distribution P.
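To make the two-level assumption concrete, the sampling process can be sketched in a few lines of code. This is an illustrative toy only: the Gaussian meta-distribution, the parameters tau and sigma, and the function names are our own choices for exposition, not constructions from the paper.

```python
import random
import statistics

# Illustrative two-level sampling (toy model, not the paper's construction).
# Level 1: each client's distribution D_i is parameterized by a mean mu_i
#          drawn from a meta-distribution P, here Normal(0, tau).
# Level 2: client i then draws n i.i.d. samples from D_i = Normal(mu_i, sigma).

def sample_clients(m, n, tau=1.0, sigma=0.5, seed=0):
    """Draw m participating clients and their local datasets S_1, ..., S_m."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(m):
        mu_i = rng.gauss(0.0, tau)                        # D_i ~ P (level 1)
        s_i = [rng.gauss(mu_i, sigma) for _ in range(n)]  # S_i ~ D_i^n (level 2)
        datasets.append(s_i)
    return datasets

datasets = sample_clients(m=5, n=100)
# Heterogeneity shows up as spread among the per-client empirical means,
# while relatedness comes from all mu_i being drawn from the same P.
client_means = [statistics.fmean(s) for s in datasets]
pooled_mean = statistics.fmean(x for s in datasets for x in s)
```

With equal sample sizes n across clients, the pooled empirical mean coincides with the average of the per-client means; unparticipating clients would correspond to fresh draws of mu_i from P that never contribute a dataset.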
We argue that this assumption is reasonable in practice. For instance, in cross-device federated learning, the number of total clients is generally large, and it is natural to assume that there exists a meta-distribution (Reisizadeh et al., 2020; Wang et al., 2021). In this learning scenario, we assume that the total number of clients is M. Among all these M clients, only m clients have a chance to participate in the training phase, which means that the training process only involves the m distributions {D_i}_{i=1}^m. Note that the total number M and the number of unparticipating clients/distributions are generally larger than m (Hu et al., 2022; Xu & Wang, 2020; Yang et al., 2020). In practice, the model is trained based on the datasets {S_i}_{i=1}^m, where S_i is the dataset located at client i and is sampled from D_i. This two-level framework not only captures the heterogeneity of clients' distributions but also reflects the relatedness of the distributions, which allows us to characterize the generalization performance on both participating and unparticipating distributions. A similar framework has been used in recent literature (Yuan et al., 2021; Reisizadeh et al., 2020; Wang et al., 2021); however, these works mainly focus on optimization performance or only provide experimental results on generalization. The objective of this work is to provide theoretical results on the generalization error in this framework. Our contributions are summarized as follows.

• We provide a systematic analysis of the generalization error of federated learning in the two-level framework, which captures the participation gap missed by existing works. This two-level framework captures both the heterogeneity and the relatedness of clients' distributions. Moreover, all learning bounds presented in this paper are in probability form instead of expectation form.

• We derive fast learning rates in the empirical risk minimization setting.
The unparticipating error is bounded by two terms: one is the participating error, and the other is the participation gap resulting from the clients missing from the training. Our participating bounds and unparticipating bounds are of order O(1/(mn)) and O(1/(mn) + 1/m), respectively.

• We study learning bounds for unbounded loss functions, including sub-Gaussian, sub-exponential, and heavy-tailed losses. Small-ball methods and concentration inequalities for unbounded random variables are used in the unbounded setting. Our bounds are comparable with existing results derived under boundedness assumptions.

The rest of the paper is organized as follows. In Section 2, we describe the two-level distribution framework and provide basic theoretical results in this framework. In Section 3, we derive fast generalization bounds. In Section 4, we go beyond the boundedness assumption and provide generalization bounds for unbounded losses such as sub-exponential and sub-Weibull losses. In Section 5, we discuss related work on the generalization analysis of heterogeneous federated learning. Finally, we conclude the paper in Section 6. All proofs are postponed to the appendix.
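The two-term decomposition behind this contribution can be written schematically as follows. The notation here (L_D for population risk and L̂_S for pooled empirical risk) is ours for illustration; the paper's formal statements may differ.

```latex
% Schematic decomposition (illustrative notation). For a model h trained on
% the m participating clients with pooled empirical risk \widehat{L}_S(h),
% the unparticipating excess risk telescopes into two terms:
\mathbb{E}_{D \sim P}\, L_{D}(h) - \widehat{L}_{S}(h)
  = \underbrace{\frac{1}{m}\sum_{i=1}^{m} L_{D_i}(h) - \widehat{L}_{S}(h)}_{\text{participating error: } O\!\left(\frac{1}{mn}\right)}
  + \underbrace{\mathbb{E}_{D \sim P}\, L_{D}(h) - \frac{1}{m}\sum_{i=1}^{m} L_{D_i}(h)}_{\text{participation gap: } O\!\left(\frac{1}{m}\right)}
```

The first term is controlled by the n samples at each of the m clients, while the second depends only on how many distributions were drawn from P, which is why it decays with m but not with n.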

