HYBRID FEDERATED LEARNING FOR FEATURE & SAMPLE HETEROGENEITY: ALGORITHMS AND IMPLEMENTATION

Abstract

Federated learning (FL) is a popular distributed machine learning paradigm for learning from distributed and private data sets. Based on the data partition pattern, FL is often categorized into horizontal, vertical, and hybrid settings. All three settings have many applications, but hybrid FL remains relatively unexplored, because it deals with the challenging situation where both the feature space and the data samples are heterogeneous. This work designs a novel mathematical model that effectively allows the clients to aggregate distributed data with heterogeneous, and possibly overlapping, features and samples. Our main idea is to partition each client's model into a feature extractor part and a classifier part, where the former processes the input data while the latter performs the learning from the extracted features. The heterogeneous feature aggregation is done by building a server model, which assimilates local classifiers and feature extractors through a carefully designed matching mechanism. A communication-efficient algorithm is then designed to train both the client and server models. Finally, we conduct numerical experiments on multiple image classification data sets to validate the performance of the proposed algorithm. To our knowledge, this is the first formulation and algorithm developed for hybrid FL.
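The client-model partition described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's architecture: the class names, the use of plain linear maps, and all dimensions are assumptions made for exposition.

```python
# Minimal sketch: each client model is split into a feature extractor, which
# maps the client's local (possibly client-specific) feature subset into a
# shared embedding space, and a classifier, which predicts from that embedding.

def make_linear(n_in, n_out, scale=0.01):
    """A toy linear layer represented as a weight matrix (list of rows)."""
    return [[scale * (i + j + 1) for j in range(n_in)] for i in range(n_out)]

def apply_linear(weights, x):
    """Multiply the weight matrix by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

class ClientModel:
    def __init__(self, n_local_features, embed_dim, n_classes):
        # Feature extractor: local feature space -> shared embedding space.
        self.extractor = make_linear(n_local_features, embed_dim)
        # Classifier: shared embedding space -> class scores.
        self.classifier = make_linear(embed_dim, n_classes)

    def forward(self, x):
        z = apply_linear(self.extractor, x)      # extracted features
        return apply_linear(self.classifier, z)  # local prediction

# Two clients with different numbers of local features can still share the
# same embedding dimension, so their classifiers live in a comparable space.
client_a = ClientModel(n_local_features=4, embed_dim=3, n_classes=2)
client_b = ClientModel(n_local_features=6, embed_dim=3, n_classes=2)
scores = client_a.forward([1.0, 0.5, -0.2, 0.3])
```

The point of the split is that only the extractor depends on a client's local feature set, while the classifier operates on a shared representation that a server model could aggregate.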

1. INTRODUCTION

Federated Learning (FL) is an emerging distributed machine learning (ML) framework that enables heterogeneous clients, such as organizations or mobile devices, to collaboratively train ML models (Konečnỳ et al., 2016; Yang et al., 2019). The development of FL aims to address practical challenges in distributed learning, such as feature and data heterogeneity, high communication cost, and data privacy requirements. The challenge posed by heterogeneous data is particularly evident in FL. The best-known form of heterogeneous data is sample heterogeneity (SH), where the distributions of the training samples differ across clients (Kairouz et al., 2021; Bonawitz et al., 2019). Severe SH can cause common FL algorithms such as FedAvg to diverge (Khaled et al., 2019; Karimireddy et al., 2020b). Recent, better-performing algorithms and system architectures for distributed ML (including FL) under SH include Karimireddy et al. (2020b); Li et al. (2018); Wang et al. (2020); Fallah et al. (2020); Vahidian et al. (2021).

Figure 1: The heterogeneous data distribution in a medical diagnosis example.

Besides SH, another form of heterogeneity is feature heterogeneity (FH). Traditionally, samples are said to exhibit FH if they can be partitioned into subsets that bear distinct features. In the FL setting, we call it FH when the sample subsets of different clients have different, but not necessarily distinct, features. That is, under FH, different clients hold unique and possibly also common features. FH and SH arise together in ML tasks such as collaborative medical diagnosis (Ng et al., 2021), recommendation systems (Yang et al., 2020), and graph learning (Zhang et al., 2021), where the data collected by different clients have different, and possibly overlapping, features and sample IDs. Next, we provide a few examples.

Figure 2: The data distribution patterns of a) heterogeneous client data; b) HFL and c) VFL.

Medical diagnosis application (see Figure 1). The clients are clinics, which collect data samples from patients. Each clinic may have a different set of diagnostic devices; e.g., clinic A has MRI and ultrasound, while clinic B has MRI and electrocardiographs (ECG). FH arises because the feature set of a sample collected by clinic A may only partially overlap with that of a sample collected by clinic B. Besides FH, SH also arises, as multiple clinics may never treat the same patient, and each patient usually visits only a subset of the clinics.

Recommendation system application (Yang et al., 2020; Zhan et al., 2010). In this case, the clients are large retailers that collect samples (such as shopping records) from their customers. The retailers share a subset of common products and a subset of common customers.

A third example pertains to learning over multiple social networks (Zhang et al., 2021; Guo & Wang, 2020). Here the clients are social network providers (e.g., Twitter, Facebook), and the samples are the sets of participating users, their activities, and their relations. We summarize these three examples in Table 1.

In the previous three applications, client data can be heterogeneous in both features and samples. Surprisingly, none of the existing FL algorithms can fully handle such data. Rather, Horizontal FL (HFL) and Vertical FL (VFL) methods can each handle data with only one type of heterogeneity: the former with SH and the latter with FH. By keeping only the common features (and ignoring the other features), we can avoid FH and apply an HFL method. By keeping only the common samples (and discarding the remaining samples), we can avoid SH and apply a VFL method. Clearly, both approaches waste data.

Consider the HFL algorithms (Konečnỳ et al., 2016; Karimireddy et al., 2020b;a; Dinh et al., 2021). The clients perform multiple local model updates, and the server averages those updates and broadcasts the new model to the clients. This scheme works when the clients share the same model and their data share an identical set of features (see Figure 2b for an illustration); otherwise, the server cannot average their models. Consider the VFL algorithms (Liu et al., 2019; Chen et al., 2020). They split the model into blocks. Each client processes a subset of the blocks, while the server aggregates the processed features to compute training losses and gradients. They require all the clients to hold the same set of samples (see Figure 2c); otherwise, they cannot compute the loss and its gradient.

According to Yang et al. (2019); Rahman et al. (2021), the FL setting with heterogeneous features and samples is referred to as hybrid FL. To develop a hybrid FL method, we must address the following challenges:

1. Global and local inference requires global and local models. Hybrid FL makes it possible for a client to make its local inference and also for all the clients (or the server) to make a global inference. The former requires only the features local to a client; the latter requires all the features and training a global model at the server.

2. Limited data sharing. In typical HFL, the clients do not share their local data or labels during training. In VFL, the labels are either made available by the clients to the server (Chen et al., 2020) or stored at a designated client (Liu et al., 2019). A hybrid FL system may be subject to a "no

Table 1: Examples of applications that generate heterogeneous data.
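The data wasted by falling back to HFL (keeping only common features) or VFL (keeping only common samples) can be made concrete with a toy computation. The clinic names, feature sets, and sample IDs below are hypothetical and only echo the medical example; they are not real data from the paper.

```python
# Toy illustration: an HFL fallback keeps only the intersection of the
# clients' feature sets, and a VFL fallback keeps only the intersection of
# their sample-ID sets; everything outside the intersections is discarded.

clients = {
    "clinic_A": {"features": {"MRI", "ultrasound"}, "samples": {1, 2, 3, 4}},
    "clinic_B": {"features": {"MRI", "ECG"}, "samples": {3, 4, 5, 6}},
}

common_features = set.intersection(*(c["features"] for c in clients.values()))
common_samples = set.intersection(*(c["samples"] for c in clients.values()))

all_features = set.union(*(c["features"] for c in clients.values()))
all_samples = set.union(*(c["samples"] for c in clients.values()))

# Features ignored by the HFL fallback and samples ignored by the VFL fallback.
wasted_features = all_features - common_features
wasted_samples = all_samples - common_samples
```

In this toy setting, HFL would train on the single shared feature (MRI) and VFL on the two shared patients, discarding the rest; a hybrid method aims to use all of it.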


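The HFL averaging step discussed in the introduction, and the reason it breaks under feature heterogeneity, can be sketched as follows. This is a generic FedAvg-style illustration under simplifying assumptions (flat weight vectors, sample-size weighting), not the paper's algorithm.

```python
# Minimal FedAvg-style averaging sketch: the server can average client weight
# vectors only when all clients use the same model shape, which implicitly
# requires an identical feature set across clients.

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of identically shaped weight vectors."""
    if len({len(w) for w in client_weights}) != 1:
        raise ValueError("FedAvg needs identically shaped client models")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n * w[i] for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Same feature set (same model dimension): averaging is well defined.
avg = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])  # -> [2.5, 3.5]

# Heterogeneous features (different model dimensions): averaging is undefined.
try:
    fedavg([[1.0, 2.0], [3.0, 4.0, 5.0]], client_sizes=[1, 1])
except ValueError as err:
    mismatch = str(err)
```

This dimension mismatch is exactly why hybrid FL needs a matching mechanism between local feature extractors and a server model, rather than naive parameter averaging.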