CONVERGENCE ANALYSIS OF SPLIT LEARNING ON NON-IID DATA
Anonymous authors
Paper under double-blind review

Abstract

Split Learning (SL) is a promising variant of Federated Learning (FL), in which the model is split and trained collaboratively at the clients and the server. By offloading the computation-intensive portions to the server, SL enables efficient model training on resource-constrained clients. Despite its growing adoption, SL still lacks a rigorous convergence analysis on non-IID data, which is critical for hyperparameter selection. In this paper, we first prove that SL exhibits an O(1/√T) convergence rate for non-convex objectives on non-IID data, where T is the total number of iterations. The derived convergence results facilitate understanding the effect of crucial factors in SL (e.g., data heterogeneity and local update steps). Compared with the convergence result of FL, we show that the guarantee of SL is worse than that of FL in terms of training rounds on non-IID data. The experimental results verify our theory. Some generalized conclusions on the comparison between FL and SL in cross-device settings are also reported.

1. INTRODUCTION

Federated Learning (FL) is a popular distributed learning paradigm where multiple clients collaborate to train a global model under the orchestration of one central server. There are two settings in FL (McMahan et al., 2017): (i) cross-silo, where clients are organizations and the number of clients is typically less than 100, and (ii) cross-device, where clients are IoT devices and the number of clients can be up to 10^10 (Kairouz et al., 2021). To alleviate the computation bottleneck at resource-constrained IoT devices in the cross-device scenario, Split Learning (SL) (Gupta & Raskar, 2018; Vepakomma et al., 2018) splits the model so that it is trained at the clients and the server separately. The computation-intensive portions are typically offloaded to the server, which is critical for model training at resource-constrained devices. SL is regarded as one of the enabling technologies for edge intelligence in future networks (Zhou et al., 2019).
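To make the split-training mechanism concrete, the following is a minimal sketch of one split-learning step for a two-layer linear model with a squared-error loss. The cut-layer placement, layer sizes, and all names (`client_forward`, `server_forward`, `train_step`) are illustrative assumptions, not the paper's actual setup: the client computes activations up to the cut layer and sends them to the server; the server completes the forward pass and returns the cut-layer gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Client holds the first (cut) layer; server holds the rest (illustrative sizes).
W_client = rng.normal(size=(4, 8)) * 0.1
W_server = rng.normal(size=(8, 1)) * 0.1

def client_forward(x):
    # Client computes activations up to the cut layer ("smashed data").
    return x @ W_client

def server_forward(h):
    # Server completes the forward pass on the received activations.
    return h @ W_server

def train_step(x, y, lr=0.1):
    global W_client, W_server
    h = client_forward(x)            # client -> server: send activations
    pred = server_forward(h)
    err = pred - y                   # gradient of 0.5 * (pred - y)^2 w.r.t. pred
    grad_W_server = h.T @ err / len(x)
    grad_h = err @ W_server.T        # server -> client: send cut-layer gradient
    grad_W_client = x.T @ grad_h / len(x)
    W_server -= lr * grad_W_server   # server updates its portion
    W_client -= lr * grad_W_client   # client updates its portion
    return float(np.mean(err ** 2))

x = rng.normal(size=(32, 4))
y = x @ rng.normal(size=(4, 1))
losses = [train_step(x, y) for _ in range(50)]
```

Only activations and cut-layer gradients cross the network, which is why the heavy server-side portion never needs to reside on the device.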



Figure 1: Illustration of the model updates of FL and SL for 2 clients and 2 local update steps during one round.

The comparison of FL and SL is of practical interest for the design and deployment of intelligent networks. Existing studies focus on various aspects of this comparison (Thapa et al., 2020; Gao et al., 2020; 2021), e.g., learning performance (Gupta & Raskar, 2018), computation efficiency (Vepakomma et al., 2018), communication overhead (Singh et al., 2019), and privacy (Thapa et al., 2021). For example, with an emphasis on learning performance, Gao et al. (2020; 2021) find that SL exhibits (i) faster convergence than FL under IID data in terms of communication rounds; (ii) better learning performance under imbalanced data; and (iii) worse learning performance under (extreme) non-IID data. The difference arises from the distinct model-update processes of FL and SL. In particular, FL takes the average of the local model parameters at the end of each round, whereas SL trains the clients in sequence and does not average the client updates. Figure 1 plots the client drift (Karimireddy et al., 2020; Wang et al., 2020; Li et al., 2022) of FL and SL under both IID and non-IID data to visualize the update process. Under the IID setting, SL approaches the global optimum x* faster than FL given its sequential training mechanism. In contrast, under the non-IID setting, SL may deviate from the global optimum for the same reason.
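The two update rules above can be sketched on a toy problem. The snippet below contrasts one round of FL-style averaging against SL-style sequential training for two clients running K local SGD steps on scalar quadratics f_i(x) = 0.5·(x − c_i)², where the client minimizers c_i disagree to mimic non-IID data. All names, step sizes, and the choice of objective are illustrative assumptions for intuition only, not the paper's experimental setup.

```python
import numpy as np

def local_sgd(x, c, K, lr):
    # K local gradient steps on f(x) = 0.5 * (x - c)^2; gradient is (x - c).
    for _ in range(K):
        x = x - lr * (x - c)
    return x

def fl_round(x, optima, K, lr):
    # FL: every client starts from the same global model; server averages.
    return float(np.mean([local_sgd(x, c, K, lr) for c in optima]))

def sl_round(x, optima, K, lr):
    # SL: clients train in sequence on the running model; no averaging.
    for c in optima:
        x = local_sgd(x, c, K, lr)
    return x

# Non-IID toy: client optima disagree (c = 1 and c = -1, global optimum 0).
optima = [1.0, -1.0]
x_fl = fl_round(5.0, optima, K=2, lr=0.5)
x_sl = sl_round(5.0, optima, K=2, lr=0.5)
```

In this toy round, the SL iterate is pulled toward the last client's minimizer c = -1, illustrating how sequential training couples the final model to the client ordering under non-IID data, while the FL average stays between the clients' updates.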

