GENERALIZATION BOUNDS FOR FEDERATED LEARNING: FAST RATES, UNPARTICIPATING CLIENTS AND UNBOUNDED LOSSES

Abstract

In federated learning, the underlying data distributions may be different across clients. This paper provides a theoretical analysis of the generalization error of federated learning that captures both the heterogeneity and the relatedness of the distributions. In particular, we assume that the heterogeneous distributions are sampled from a meta-distribution. In this two-level distribution framework, we characterize the generalization error not only for clients participating in the training but also for unparticipating clients. We first show that the generalization error for unparticipating clients can be bounded by the participating generalization error plus a participation gap caused by client sampling. We further establish fast learning bounds of order O(1/(mn) + 1/m) for unparticipating clients, where m is the number of clients and n is the sample size at each client. To our knowledge, the obtained fast bounds are state-of-the-art in the two-level distribution framework. Moreover, previous theoretical results mostly require the loss function to be bounded. We derive convergence bounds of order O(1/√(mn) + 1/√m) under unbounded loss assumptions, including sub-exponential and sub-Weibull losses.
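The two-level distribution framework above can be illustrated with a small simulation. The sketch below is not from the paper; it uses a toy Gaussian setup of our own choosing (client means drawn from a meta-distribution N(0, 1), local samples drawn from N(theta_i, 1), squared-error loss) to show how the gap between the risk on an unparticipating client and the average risk on participating clients shrinks as the number of participating clients m grows, consistent with a participation gap of order O(1/m).

```python
import random
import statistics

def participation_gap(m, n, trials=1000, seed=0):
    """Estimate E[unparticipating risk - participating risk] in a toy
    two-level Gaussian model (illustration only, not the paper's setting)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Level 1: client means sampled from the meta-distribution N(0, 1).
        thetas = [rng.gauss(0, 1) for _ in range(m)]
        # Level 2: n local samples per participating client from N(theta_i, 1).
        data = [[rng.gauss(t, 1) for _ in range(n)] for t in thetas]
        # "Trained model": the global empirical mean over all mn samples.
        model = statistics.mean(x for client in data for x in client)
        # Population risk of the model on a client with mean theta is
        # E[(x - model)^2] = 1 + (theta - model)^2 under squared-error loss.
        part_risk = statistics.mean(1 + (t - model) ** 2 for t in thetas)
        # Unparticipating client: a fresh draw from the meta-distribution.
        theta_new = rng.gauss(0, 1)
        unpart_risk = 1 + (theta_new - model) ** 2
        gaps.append(unpart_risk - part_risk)
    return statistics.mean(gaps)

if __name__ == "__main__":
    print(participation_gap(m=5, n=10))    # larger gap with few clients
    print(participation_gap(m=100, n=10))  # gap shrinks as m grows
```

In this toy model the expected gap works out to roughly 2/m, so increasing m (recruiting more participating clients) closes the gap between seen and unseen distributions, while increasing n alone does not.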

1. INTRODUCTION

In federated learning, a common model is trained through the collaboration of participating clients holding local data samples (McMahan et al., 2017). Typically, the underlying distributions vary across clients, since the data-generating processes are affected by the local environment. Federated learning is heterogeneous in the scenario where local distributions differ (Wang et al., 2021). Most existing experimental and theoretical results focus on the convergence of optimization on training datasets (Li et al., 2020b; Karimireddy et al., 2020; Mitra et al., 2021; Mishchenko et al., 2022; Yun et al., 2022). The generalization error, which is more natural and important in machine learning, seems not to have been carefully examined in heterogeneous federated learning. As a key performance indicator of a machine learning model, the generalization error measures the performance of a trained model by its population risk under the corresponding distribution. However, existing generalization results are generally derived for clients participating in the training, which only captures the performance of the learned model on distributions seen during training (Mohri et al., 2019; Chen et al., 2021; Masiha et al., 2021). In practice, the probability that a client participates in the federated training is affected by many factors, such as the reliability of network connections or the availability of the client. The realistic participation ratio may be low, and a variety of clients never have a chance to participate during the training process (Kairouz et al., 2021; Li et al., 2020a; Yuan et al., 2021). Though the training process operates only on participating clients, the trained model will be used by both unparticipating and participating clients.
Since the data distributions of unparticipating clients differ from those of participating clients, it is natural and pressing to ask the following question: Would the unparticipating clients benefit from the model trained by participating clients?

