HYBRID FEDERATED LEARNING FOR FEATURE & SAMPLE HETEROGENEITY: ALGORITHMS AND IMPLEMENTATION

Abstract

Federated learning (FL) is a popular distributed machine learning paradigm for training models on decentralized, private data sets. Based on the data partition pattern, FL is often categorized into horizontal, vertical, and hybrid settings. All three settings have many applications, but hybrid FL remains relatively underexplored because it deals with the challenging situation where both the feature space and the data samples are heterogeneous across clients. This work designs a novel mathematical model that effectively allows the clients to aggregate distributed data with heterogeneous, and possibly overlapping, features and samples. Our main idea is to partition each client's model into a feature extractor part and a classifier part: the former processes the input data, while the latter performs the learning from the extracted features. The heterogeneous feature aggregation is done by building a server model, which assimilates local classifiers and feature extractors through a carefully designed matching mechanism. A communication-efficient algorithm is then designed to train both the client and server models. Finally, we conduct numerical experiments on multiple image classification data sets to validate the performance of the proposed algorithm. To our knowledge, this is the first formulation and algorithm developed for hybrid FL.

1. INTRODUCTION

Federated Learning (FL) is an emerging distributed machine learning (ML) framework that enables heterogeneous clients, such as organizations or mobile devices, to collaboratively train ML models (Konečný et al., 2016; Yang et al., 2019). The development of FL aims to address practical challenges in distributed learning, such as feature and data heterogeneity, high communication cost, and data privacy requirements.

The challenge posed by heterogeneous data is particularly evident in FL. The most well-known form is sample heterogeneity (SH), where the distributions of training samples differ across clients (Kairouz et al., 2021; Bonawitz et al., 2019). Severe SH can cause common FL algorithms such as FedAvg to diverge (Khaled et al., 2019; Karimireddy et al., 2020b). Recently, algorithms and system architectures for distributed ML (including FL) that perform better under SH have been proposed, including Karimireddy et al. (2020b); Li et al. (2018); Wang et al. (2020); Fallah et al. (2020); Vahidian et al. (2021).
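To make SH concrete, the following sketch simulates it by handing each client a label-skewed shard of a centralized dataset via a Dirichlet split, a common way to emulate non-identical sample distributions in FL experiments. The function name, parameters, and the synthetic label array are illustrative choices, not part of the paper's method.

```python
import numpy as np

def label_skew_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices among clients with a Dirichlet label skew.

    Smaller `alpha` means more skew, so each client effectively
    sees samples from only a few classes (severe SH).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Fraction of class-c samples assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, shard in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(shard.tolist())
    return client_indices

# Synthetic example: 10 classes, 100 samples per class, 5 clients.
labels = np.repeat(np.arange(10), 100)
parts = label_skew_partition(labels, num_clients=5, alpha=0.1)
assert sum(len(p) for p in parts) == len(labels)
```

With a small `alpha` such as 0.1, most clients receive the bulk of their samples from only one or two classes, which is the regime in which FedAvg-style averaging is known to degrade.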

Figure 1: The heterogeneous data distribution in a medical diagnosis example.

Besides SH, another form of heterogeneity is feature heterogeneity (FH). Traditionally, samples are said to exhibit FH if they can be partitioned into subsets that bear distinct features. In the FL setting, when the sample subsets of different clients have different, but not necessarily distinct, features, we call it FH. That is, under FH, different clients have unique and possibly also common features. FH and SH arise in ML tasks such as collaborative medical diagnosis (Ng et al., 2021), recommendation systems (Yang et al., 2020), and graph learning (Zhang et al., 2021), where the data collected by different clients have different, and possibly overlapping, features and sample IDs. Next, we provide a few examples.
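One way to picture how a model can accommodate FH, echoing the split described in the abstract, is to give each client its own feature extractor over its local feature space while all clients map into a common embedding space consumed by a shared classifier. The dimensions, linear maps, and client names below are invented for illustration; the paper's actual architecture and matching mechanism may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, num_classes = 8, 3

# Two clients with heterogeneous feature dimensions (FH): each observes a
# different number of raw features for its samples.
client_dims = {"client_a": 5, "client_b": 12}

# Client-specific extractors project local features into a shared
# embedding space; a single classifier head operates on that space.
extractors = {name: rng.standard_normal((d, embed_dim))
              for name, d in client_dims.items()}
classifier = rng.standard_normal((embed_dim, num_classes))

for name, d in client_dims.items():
    x = rng.standard_normal((4, d))   # a batch of 4 local samples
    z = x @ extractors[name]          # client-specific feature extraction
    logits = z @ classifier           # shared classification
    assert logits.shape == (4, num_classes)
```

The key point is that only the embedding dimension is agreed upon globally, so clients with different (and possibly overlapping) feature sets can still contribute to, and benefit from, a common classifier.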

