FEDLITE: IMPROVING COMMUNICATION EFFICIENCY IN FEDERATED SPLIT LEARNING

Abstract

In classical federated learning, clients contribute to the overall training by communicating local updates of the shared model, computed on their private data, to a coordinating server. However, updating and communicating the entire model becomes prohibitively expensive when resource-constrained clients collectively aim to train a large machine learning model. Split learning provides a natural solution in this setting: only a (small) part of the model is stored and trained on the clients, while the remaining (large) part resides only at the server. Unfortunately, the model partitioning employed in split learning significantly increases the communication cost compared to classical federated learning algorithms. This paper addresses this issue by compressing the additional communication associated with split learning via a novel clustering algorithm and a gradient correction technique. An extensive empirical evaluation on standard image and text benchmarks shows that the proposed method can achieve up to 490× communication cost reduction with minimal drop in accuracy, and enables a desirable performance vs. communication trade-off.

1. INTRODUCTION

Federated learning (FL) is an emerging field that collaboratively trains machine learning models on decentralized data (Li et al., 2019; Kairouz et al., 2019; Wang et al., 2021). One major advantage of FL is that it does not require clients to upload their data, which may contain sensitive personal information. Instead, clients separately train local models on their private datasets, and the resulting locally trained model parameters are infrequently synchronized with the help of a coordinating server (McMahan et al., 2017). While the FL framework helps alleviate data-privacy concerns for distributed training, most existing FL algorithms critically assume that the clients have enough compute and storage resources to perform local updates on the entire machine learning model. However, this assumption does not necessarily hold in many modern applications. For example, classification problems with an extremely large number of classes (often in the millions or billions) commonly arise in the context of recommender systems (Covington et al., 2016), information retrieval (Agrawal et al., 2013), and language modeling (Levy & Goldberg, 2014). Here, the classification layer of a neural network is by itself large enough that a typical FL client, e.g., a mobile or IoT device, cannot even store and locally update this single layer, let alone the entire neural network.

Split learning (SL) is a recently proposed technique (Vepakomma et al., 2018; Thapa et al., 2022) that naturally addresses the above issue of FL. It splits the underlying model between the clients and the server such that the first few layers are shared across the clients and the server, while the remaining layers are stored only at the server. The reduction in resource requirements at the clients is particularly pronounced when the last few dense layers constitute a large portion of the entire model.
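To illustrate the split training protocol described above, the following is a minimal numpy sketch (with assumed toy dimensions, not from the paper): the client computes activations up to the cut layer and uploads them instead of its raw data, the server finishes the forward and backward passes, and the cut-layer gradient is sent back so the client can update its local layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model split: the client holds one small layer, while the
# large classification layer (everything past the "cut layer") stays on
# the server. Dimensions are illustrative.
W_client = rng.normal(0, 0.1, (32, 16))    # stored on the client
W_server = rng.normal(0, 0.1, (16, 1000))  # stored on the server

def split_training_step(x, y, lr=0.1):
    global W_client, W_server
    # Client: forward pass up to the cut layer; only these activations
    # (batch_size x 16 values) are uploaded, never the raw data x.
    h = np.maximum(x @ W_client, 0)                    # ReLU cut-layer output

    # Server: finish the forward pass and compute softmax cross-entropy.
    logits = h @ W_server
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()

    # Server backward: gradient w.r.t. logits and the cut-layer
    # activations; the latter is downloaded by the client.
    g_logits = p.copy()
    g_logits[np.arange(len(y)), y] -= 1
    g_logits /= len(y)
    g_h = g_logits @ W_server.T            # sent back before the update
    W_server -= lr * h.T @ g_logits

    # Client: finish backpropagation through its local layer.
    W_client -= lr * x.T @ (g_h * (h > 0))
    return loss

x = rng.normal(size=(8, 32))
y = rng.integers(0, 1000, size=8)
print(split_training_step(x, y))  # roughly log(1000) ~= 6.9 at initialization
```

Note that one activation upload and one gradient download occur on every training step, which is exactly the extra traffic the paper targets.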
For instance, in a common convolutional neural network (Krizhevsky, 2014), the last two fully connected layers account for 95% of the parameters of the entire model. In this case, if we allocate these two layers to the server, then the client-side memory usage is reduced by 20×, since only the remaining 5% of the parameters must be stored locally. Nonetheless, one major limitation of SL is that the underlying model partitioning increases the communication cost of the resulting framework. Specifically, to train the split neural network, the activations and gradients at the layer where the model is split (referred to as the cut layer) must be communicated between the server and the clients at every iteration. The size of these additional messages is proportional to both the mini-batch size and the activation size. As a result, the communication cost of model training can become prohibitive.
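To make this overhead concrete, here is a back-of-the-envelope estimate of the per-iteration cut-layer traffic, using assumed illustrative numbers (batch size, cut-layer width, and step counts are not from the paper):

```python
# Illustrative (assumed) numbers, not taken from the paper.
batch_size = 128        # examples per mini-batch
activation_dim = 4096   # width of the cut layer
bytes_per_float = 4     # float32 entries

# Uploaded activations plus downloaded gradients, per training iteration.
per_iter_bytes = 2 * batch_size * activation_dim * bytes_per_float
print(per_iter_bytes / 2**20, "MiB per iteration")  # 4.0 MiB

# Unlike FedAvg, which pays one model exchange per round, this cost is
# incurred on every local step, so it accumulates over a round.
local_steps = 500
print(local_steps * per_iter_bytes / 2**30, "GiB per round")
```

Under these assumptions a single round of 500 local steps moves roughly 2 GiB through the cut layer, which motivates compressing the activation/gradient messages rather than the model itself.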

