FEDMT: FEDERATED LEARNING WITH MIXED-TYPE LABELS

Abstract

In federated learning (FL), classifiers (e.g., deep networks) are trained on datasets from multiple centers without exchanging data across them, thus improving sample efficiency. In the classical FL setting, the same labeling criterion is usually employed across all centers involved in training. This constraint greatly limits the applicability of FL. For example, standards used for disease diagnosis often differ across clinical centers, which mismatches the classical FL setting. In this paper, we consider an important yet under-explored FL setting, namely FL with mixed-type labels, where different labeling criteria can be employed by different centers, leading to inter-center label space differences that challenge existing FL methods designed for the classical setting. To train models with mixed-type labels effectively and efficiently, we propose a theory-guided and model-agnostic approach that exploits the underlying correspondence between label spaces and can be easily combined with various FL methods such as FedAvg. We present a convergence analysis based on over-parameterized ReLU networks, show that the proposed method achieves linear convergence in label projection, and demonstrate how the parameters of the new setting affect the convergence rate. The proposed method is evaluated, and the theoretical findings are validated, on benchmark and medical datasets.

1. INTRODUCTION

Federated learning (FL) enables centers to jointly learn a model while keeping data at each center. It avoids the centralization of data, which is restricted by regulations such as CCPA (Legislature, 2018), HIPAA (Act, 1996), and GDPR (Voigt et al., 2018), and has gained popularity in various applications. Widely used FL methods such as FedAvg (McMahan et al., 2017), FedAdam (Reddi et al., 2020), and others use iterative optimization algorithms to achieve joint model training across centers. At each round, each center performs several steps of stochastic gradient descent (SGD) locally, and then the centers communicate their current model weights to a central server for aggregation. When training a classifier in the classical FL setting, the datasets across all centers are annotated with the same labeling criterion. However, in real applications such as healthcare, standards for disease diagnosis may differ across clinical centers due to varying levels of expertise or technology available at different sites. For example, when diagnosing ADHD with brain imaging, the labels are usually acquired over a long period of behavior studies. Different centers may follow different diagnostic and statistical manuals (McKeown et al., 2015), and it is difficult to ask centers to relabel data using a unified criterion, as some behavior studies cannot be repeated. This leads to different label spaces across centers. In addition, the center with the most complex labeling criterion, whose label space is desired for future prediction, typically has only limited labeled samples due to labeling difficulty or cost. In this paper, we aim to answer the following important question: with limited samples from the desired label space, how can we leverage the commonly used FL pipeline (e.g., FedAvg) and data from other centers with different label spaces to jointly learn an FL model in the desired label space, without additional feature exchange or data relabeling?
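The round structure described above (local SGD at each center, followed by server-side aggregation) can be sketched as follows. This is a minimal illustration of one FedAvg-style communication round for a linear least-squares model, not the paper's method; the function names, the squared-loss objective, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=5):
    """A few local gradient steps on squared loss for a linear model (illustrative)."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def fedavg_round(w_global, centers, lr=0.1, steps=5):
    """One communication round: local training at each center, then
    aggregation of the local weights, weighted by local sample counts."""
    sizes = np.array([len(y) for _, y in centers], dtype=float)
    local_ws = [local_sgd(w_global, X, y, lr, steps) for X, y in centers]
    props = sizes / sizes.sum()
    return sum(wk * pk for wk, pk in zip(local_ws, props))
```

Note that only model weights cross center boundaries; the raw `(X, y)` pairs never leave their center, which is the property that keeps the data decentralized.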
Problem Setting: We study an FL problem for a given classification task. Each center has one labeling criterion, and the criteria across centers can be different. Samples do not overlap across centers. As shown in Fig. 1, first, label spaces are not necessarily nested. One class from the desired

