FEDERATED SELF-SUPERVISED LEARNING FOR HETEROGENEOUS CLIENTS

Abstract

Federated Learning has become an important learning paradigm due to its privacy and computational benefits. As the field advances, two key challenges remain to be addressed: (1) system heterogeneity, i.e., variability in the compute and/or data resources present on each client, and (2) the lack of labeled data in certain federated settings. Several recent developments have tried to overcome these challenges independently. In this work, we propose a unified and systematic framework, Heterogeneous Self-supervised Federated Learning (Hetero-SSFL), for enabling self-supervised learning with federation on heterogeneous clients. The proposed framework allows collaborative representation learning across all clients without imposing architectural constraints or requiring the presence of labeled data. The key idea in Hetero-SSFL is to let each client train its own unique self-supervised model and to enable joint learning across clients by aligning the lower-dimensional representations on a common dataset. The entire training procedure can be viewed as self- and peer-supervised, since neither the local training nor the alignment procedure requires any labeled data. As in conventional self-supervised learning, the obtained client models are task-independent and can be used for varied end tasks. We provide a convergence guarantee for the proposed framework with non-convex objectives in heterogeneous settings and also empirically demonstrate that our proposed approach outperforms state-of-the-art methods by a significant margin.

1. INTRODUCTION

Federated learning has become an important learning paradigm for training algorithms in a privacy-preserving way and has gained a lot of interest in the recent past. While traditional federated learning is capable of learning high-performing models (Li et al., 2021b), two practical challenges remain understudied: system heterogeneity and the lack of labeled data. In several real-world scenarios, the clients involved in the training process are highly heterogeneous in terms of their data and compute resources. Requiring each client to train an identical model, as in traditional FL, may thus be severely limiting. Similarly, assuming each client's local data resources to be fully labeled may not be pragmatic, as annotating data is time consuming and may require expertise. Our focus in this work is to jointly address these two challenges in a systematic way so as to substantially increase the scope of FL approaches. Prior works have studied these two issues separately, and the approaches taken do not offer a natural way to be combined into an effective heterogeneous self-supervised FL framework. For instance, to alleviate the scarcity of labeled data on local clients, both semi-supervised learning (Zhang et al., 2021b; Jeong et al., 2021; Lu et al., 2022; Lubana et al., 2022) and self-supervised learning (Zhuang et al., 2021; 2022; He et al., 2022; van Berlo et al., 2020) methods have been proposed, but they all assume identical model architectures on each client and in fact do not extend to heterogeneous settings. Some aspects of system heterogeneity have been independently addressed for supervised learning by building personalised models on clients (Tan et al., 2022; Jiang et al., 2020; Fallah et al., 2020), but still assuming identical architectures. Existing recent work on federated learning with independent architectures across clients considers standard supervised learning scenarios (Makhija et al., 2022) rather than self-supervised learning.
The need for self-supervised FL arises in multiple applications. For example, consider cross-silo analytics in healthcare systems, where different hospitals may possess varying amounts of private medical images. Here, the data can neither be centralised nor undergo extensive annotation. Moreover, expecting each client (e.g., a hospital) to train local models of identical capacity can be highly inhibiting. In such cases, smaller-capacity clients that are incapable of training a large model of the common architecture cannot be accommodated in federated learning; on the other hand, if the common model architecture is reduced in size, some clients will not be fully engaged. Differences in compute resources and learning capacity thus demand support for system heterogeneity. To the best of our knowledge, ours is the first work that proposes a general framework for federated self-supervised learning in heterogeneous settings, where non-identical clients can implement distinct, independent model architectures and obtain personalised solutions. The proposed framework is novel and flexible, and allows clients to train self-supervised models with unique (locally tuned) structures while still using the learnings from other clients in the network, in an architecture-agnostic way. To achieve this, we add a proximal term to each client's loss function that helps align the learnt lower-dimensional representations (aka embeddings) across clients. This way of transferring and acquiring global knowledge allows variability in client model architectures while keeping the entire learning process independent of labeled training data. Furthermore, we use a kernel-based distance metric for the proximity calculation, which gives clients much more flexibility in defining their own lower-dimensional spaces without any constraints.

Our contributions are summarized as follows:

1. Our main contribution is the new framework, Hetero-SSFL, for training heterogeneous models in a federated setting in an unsupervised way. Hetero-SSFL allows each client to train its own customized model, using local data and computing resources, while also utilising unlabeled supervision from peers.
2. We perform a thorough experimental evaluation of the proposed approach in both image-based and text-based self-supervised learning settings. We observe that the proposed flexible approach substantially improves the accuracy of unsupervised and self-supervised federated learning in heterogeneous settings.
3. We also provide a theoretical analysis and convergence guarantee of the algorithm for non-convex loss functions.

Organization. The rest of the paper is organised as follows. Section 2 provides a brief background on Federated Learning, Self-supervised Learning and related developments. In Section 3, we go over the preliminaries and then propose our framework. We study the convergence guarantee of the algorithm in Section 4 and include the related proofs in the Appendix. A thorough experimental evaluation of our method on different types of datasets is presented in Section 5, and we conclude the paper in Section 6.
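To make the alignment idea concrete, the sketch below instantiates the proximal term as one minus linear CKA (centered kernel alignment) between a client's embeddings and each peer's embeddings on the shared common dataset. The specific choice of linear CKA is purely illustrative of a kernel-based representation distance (the metric actually used by the framework is defined later in the paper), and the function names `linear_cka` and `alignment_loss` are our own; note that such a metric only requires clients to embed the same common examples, so embedding dimensions may differ across clients.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two embedding matrices of the
    same n examples; X is (n, d1), Y is (n, d2), d1 and d2 may differ."""
    # Center each feature dimension over the common examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style cross-covariance norm, normalised so the value is in [0, 1].
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def alignment_loss(local_emb, peer_embs):
    """Illustrative proximal term: average kernel distance between the
    local embeddings and each peer's embeddings on the common dataset."""
    return float(np.mean([1.0 - linear_cka(local_emb, P) for P in peer_embs]))
```

In a local training round, this term would be added (with a weighting coefficient) to the client's own self-supervised objective. Because CKA is invariant to isotropic scaling and orthogonal transformations of either representation, the penalty constrains only the relational structure of the embeddings, not their dimensionality, which is what makes the alignment architecture-agnostic.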

2. RELATED WORK

This section provides an overview of the most relevant prior work in the fields of federated learning, self-supervised learning, and federated self-supervised learning.

2.1. FEDERATED LEARNING (FL)

The problem of training machine learning models in distributed environments with restrictions on data/model sharing was studied by several researchers in the data mining community in the early 2000s, under titles such as (privacy-preserving) distributed data mining (Kargupta & Park, 2000; Gan et al., 2017; Aggarwal & Yu, 2008). Much of this work targeted specific procedures such as distributed clustering (Merugu & Ghosh, Nov, 2003), including heterogeneous settings with different feature spaces (Merugu & Ghosh, 2005), and distributed PCA (Kargupta et al., 2001), or specific models such as SVMs (Yu et al., 2006). Subsequently, with the resurgence in popularity of neural networks and the proliferation of powerful deep learning approaches, the term "Federated Learning" was coined and popularized largely by an influential paper (McMahan et al., 2017) that introduced FedAvg. Indeed, FedAvg is now considered the standard distributed training method for Federated Learning, which has since become a very active area of research. One of the key challenges in this setting is the presence of non-iid datasets across clients. Several modifications of the original FedAvg algorithm have been proposed to address this challenge. Some of these approaches focus on finding better solutions to the optimization problem to prevent divergence of the global solution (Li et al., 2020; Karimireddy et al., 2020; Zhang et al., 2021a; Pathak & Wainwright, 2020; Acar et al., 2021; Karimireddy et al., 2021), whereas others modify the training procedure to incorporate appropriate aggregation of the local models (Chen & Chao, 2021; Wang et al., 2020a;

