FEDERATED SEMI-SUPERVISED LEARNING WITH INTER-CLIENT CONSISTENCY & DISJOINT LEARNING

Abstract

While existing federated learning approaches mostly require that clients have fully labeled data to train on, in realistic settings the data obtained at the client side often comes without any accompanying labels. Such deficiency of labels may result from either high labeling cost or the difficulty of annotation that requires expert knowledge. Thus the private data at each client may be either partly labeled, or completely unlabeled with labeled data available only at the server, which leads us to a new practical federated learning problem, namely Federated Semi-Supervised Learning (FSSL). In this work, we study two essential scenarios of FSSL based on the location of the labeled data. The first scenario considers the conventional case where clients have both labeled and unlabeled data (labels-at-client), and the second considers a more challenging case where the labeled data is available only at the server (labels-at-server). We then propose a novel method to tackle these problems, which we refer to as Federated Matching (FedMatch). FedMatch improves upon naive combinations of federated learning and semi-supervised learning approaches with a new inter-client consistency loss and a decomposition of the parameters for disjoint learning on labeled and unlabeled data. Through extensive experimental validation in the two scenarios, we show that our method outperforms both local semi-supervised learning and baselines that naively combine federated learning with semi-supervised learning.

1. INTRODUCTION

Federated Learning (FL) (McMahan et al., 2017; Zhao et al., 2018; Li et al., 2018; Chen et al., 2019a;b), in which multiple clients collaboratively learn a global model via coordinated communication, has been an active topic of research over the past few years. The most distinctive difference of federated learning from distributed learning is that the data is only privately accessible at each local client, without inter-client data sharing. Such decentralized learning brings numerous advantages in addressing real-world issues such as data privacy, security, and access rights. For example, in on-device learning for mobile devices, the service provider may not directly access local data since it may contain privacy-sensitive information. In healthcare domains, hospitals may want to improve their clinical diagnosis systems without sharing patient records. Existing federated learning approaches (McMahan et al., 2017; Wang et al., 2020; Li et al., 2018) handle these problems by aggregating the locally learned model parameters. A common limitation is that they only consider supervised learning settings, where the local private data is fully labeled. Yet the assumption that all data examples come with sophisticated annotations is not realistic for real-world applications. Suppose that we perform on-device federated learning: the users may not want to spend their time and effort annotating the data, and the participation rate may differ largely across users. Even enthusiastic users may not be able to fully label all the data on their devices, which will leave the majority of the data unlabeled (see Figure 1 (a)). Moreover, in some scenarios, the users may not have sufficient expertise to correctly label the data. For instance, suppose that we have a workout app that automatically evaluates and corrects one's body posture.
In this case, the end users may not be able to evaluate their own body posture at all (see Figure 1 (b)). Thus, in many realistic scenarios for federated learning, local data will be mostly unlabeled. This leads us to a practical problem of federated learning with a deficiency of labels, namely Federated Semi-Supervised Learning (FSSL). A naive solution to these scenarios is to simply perform Semi-Supervised Learning (SSL) with any off-the-shelf method (e.g., FixMatch (Sohn et al., 2020), UDA (Xie et al., 2019)), while using federated learning algorithms to aggregate the learned weights. Yet this does not fully exploit the knowledge of the multiple models trained on heterogeneous data distributions. To address this problem, we present a novel framework, Federated Matching (FedMatch), which enforces consistency between the predictions made across multiple models. Further, conventional semi-supervised learning approaches are not applicable to scenarios where labeled data is available only at the server (Figure 1 (b)), which is a unique SSL setting for federated learning. Also, even when labeled data is available at the client (Figure 1 (a)), learning from the unlabeled data may lead to forgetting of what the model learned from the labeled data. To tackle these issues, we decompose the model parameters into two sets: a dense parameter for supervised learning and a sparse parameter for unsupervised learning. This sparse additive parameter decomposition ensures that training on labeled and unlabeled data is effectively separated, minimizing interference between the two tasks. It further reduces communication costs, since only the difference of the decomposed parameters needs to be sent across communication rounds. We validate FedMatch in both scenarios (Figure 1 (a) and (b)) and show that our models significantly outperform baselines, including a naive combination of federated learning with semi-supervised learning, on both non-i.i.d. and i.i.d. training data. The main contributions of this work are as follows:

• We introduce a practical problem of federated learning with deficiency of supervision, namely Federated Semi-Supervised Learning (FSSL), and study two different scenarios, where the local data is partly labeled (Labels-at-Client) or completely unlabeled (Labels-at-Server).
• We propose a novel method, Federated Matching (FedMatch), which learns inter-client consistency between multiple clients, and decomposes model parameters to reduce both interference between supervised and unsupervised tasks and communication cost.
• We show that FedMatch significantly outperforms both local SSL and naive combinations of FL with SSL algorithms under the conventional labels-at-client scenario and the novel labels-at-server scenario, across multiple clients with both non-i.i.d. and i.i.d. data.

Figure 1: Illustrations of Two Practical Scenarios in Federated Semi-Supervised Learning. (a) Labels-at-Client scenario: both labeled and unlabeled data are available at local clients. (b) Labels-at-Server scenario: labeled instances are available only at the server, while unlabeled data are available at local clients.

2. PROBLEM DEFINITION

We begin with formal definitions of Federated Learning (FL) and Semi-Supervised Learning (SSL). We then define Federated Semi-Supervised Learning (FSSL) and introduce its two essential scenarios.
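To make the sparse additive parameter decomposition described in the introduction concrete before formalizing the setting, the following is a minimal numpy sketch: the effective weights are $\theta = \sigma + \psi$, where $\sigma$ is updated only on labeled data and $\psi$ only on unlabeled data. This is our illustration under simplifying assumptions, not the authors' implementation; the function names, learning rate, and magnitude-based sparsification rule are all hypothetical.

```python
import numpy as np

# Sketch of sparse additive parameter decomposition: effective weights are
# theta = sigma + psi. Sigma is updated only on labeled-data gradients,
# psi only on unlabeled-data gradients, so the two tasks do not overwrite
# each other's parameters. (Illustrative assumption, not the paper's code.)

def effective_weights(sigma, psi):
    return sigma + psi

def supervised_step(sigma, psi, grad_labeled, lr=0.1):
    # Gradient step on sigma; psi is frozen during supervised learning.
    return sigma - lr * grad_labeled, psi

def unsupervised_step(sigma, psi, grad_unlabeled, lr=0.1, sparsity=0.5):
    # Gradient step on psi; sigma is frozen during unsupervised learning.
    psi = psi - lr * grad_unlabeled
    # Keep psi sparse by zeroing its smallest-magnitude entries, so only a
    # sparse parameter difference needs to be communicated each round.
    k = int(sparsity * psi.size)
    if k > 0:
        idx = np.argsort(np.abs(psi))[:k]
        psi[idx] = 0.0
    return sigma, psi

sigma, psi = np.zeros(4), np.zeros(4)
sigma, psi = supervised_step(sigma, psi, grad_labeled=np.array([1.0, -1.0, 0.5, 0.0]))
sigma, psi = unsupervised_step(sigma, psi, grad_unlabeled=np.array([0.2, 0.0, 0.0, -2.0]))
```

Because each update touches only one of the two parameter sets, training on unlabeled data cannot directly overwrite what was learned from labels, which is the interference-reduction argument made above.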

Federated Learning  Federated Learning (FL) aims to collaboratively learn a global model via coordinated communication with multiple clients. Let $G$ be a global model and $L = \{l_k\}_{k=1}^{K}$ be a set of local models for $K$ clients. Let $D = \{x_i, y_i\}_{i=1}^{N}$ be a given dataset, where $x_i$ is an arbitrary training instance with a corresponding one-hot label $y_i \in \{1, \dots, C\}$ for the $C$-way multi-class classification problem, and $N$ is the number of instances. $D$ is composed of $K$ sub-datasets $D_{l_k} = \{x_i^{l_k}, y_i^{l_k}\}_{i=1}^{N_{l_k}}$, privately collected at each client or local model $l_k$. At each communication round $r$, $G$ first randomly selects $A$ local models that are available for training, $L^r \subset L$ with $|L^r| = A$. The global model $G$ then initializes $L^r$ with the global weights $\theta_G$, and the active local models $l_a \in L^r$ perform supervised


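The round structure formalized above (select $A$ active clients, broadcast $\theta_G$, train locally, aggregate) can be sketched as a generic FedAvg-style loop. This is a hedged illustration under our own assumptions: the helper names (`fedavg_round`, `local_update`, `toy_update`) are ours, uniform averaging is a simplification of size-weighted FedAvg, and the paper's actual local objective is richer than this placeholder.

```python
import numpy as np

# One generic FedAvg-style communication round, following the notation of
# Section 2: sample A active clients, initialize them with theta_G, run
# local training, and aggregate. Names are illustrative, not the paper's.

def fedavg_round(global_weights, client_datasets, local_update,
                 fraction=0.5, rng=None):
    """Run one round and return the new global weights."""
    if rng is None:
        rng = np.random.default_rng(0)
    K = len(client_datasets)
    A = max(1, int(fraction * K))                 # |L^r| = A active clients
    active = rng.choice(K, size=A, replace=False)
    # Each active client l_a starts from theta_G and trains on its private
    # sub-dataset D_{l_a}.
    local_weights = [local_update(np.copy(global_weights), client_datasets[k])
                     for k in active]
    # Server aggregates by simple averaging (uniform weights for clarity;
    # FedAvg weights by local dataset size in general).
    return np.mean(local_weights, axis=0)

# Toy usage: "training" just nudges the weights toward the client data mean.
def toy_update(w, data):
    return w + 0.1 * (np.mean(data) - w)

clients = [np.full(4, c, dtype=float) for c in range(4)]  # 4 clients' data
theta = fedavg_round(np.zeros(3), clients, toy_update, fraction=0.5)
```

The key design point mirrored here is that only model weights, never the private datasets, ever leave a client.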