LEARNING SHAREABLE BASES FOR PERSONALIZED FEDERATED IMAGE CLASSIFICATION

Abstract

Personalized federated learning (PFL) aims to leverage the collective wisdom of clients' data while constructing customized models tailored to individual clients' data distributions. Existing work on PFL mostly aims to personalize for participating clients. In this paper, we focus on a less studied but practically important scenario: generating a personalized model for a new client efficiently. Unlike most previous approaches that learn a whole or partial network for each client, we explicitly model the clients' overall meta distribution and embed each client into a low-dimensional space. We propose FEDBASIS, a novel PFL algorithm that learns a small set of shareable basis models, upon which each client only needs to learn the coefficients for combining them into a personalized network. FEDBASIS is more parameter-efficient, robust, and accurate than competitive PFL baselines, especially in a low-data regime, without increasing the inference cost. To demonstrate its applicability, we further present a PFL evaluation protocol for image classification, featuring larger data discrepancies across clients in both the image and label spaces as well as more faithful training and test splits.

1. INTRODUCTION

Recent years have witnessed a gradual shift in computer vision and machine learning from simply building a stronger model (e.g., an image classifier) to taking more of users' aspects into account. For instance, more attention has been paid to data privacy and ownership in collecting data for model training (Jordan & Mitchell, 2015; Papernot et al., 2016). Building models tailored to users' data, preferences, and characteristics has been shown to greatly improve user experience (Rudovic et al., 2018). Personalized federated learning (PFL) is a relatively new machine learning paradigm that can potentially fulfill the demands of both worlds (Kulkarni et al., 2020). On the one hand, it follows the setup of federated learning (FL): training models with decentralized data held by users (i.e., clients) (Kairouz et al., 2019). On the other hand, it aims to construct customized models for individual clients that perform well on their respective data distributions. While appealing, existing work on PFL has mainly focused on how to train the personalized models, e.g., via federated multi-task learning (Li et al., 2020a; Smith et al., 2017), model interpolation (Mansour et al., 2020), fine-tuning (Chen & Chao, 2022; Yu et al., 2020), etc. Specifically, existing algorithms mostly require saving a whole or partial model (e.g., a ConvNet classifier or feature extractor) for each client. This implies a linear parameter complexity with respect to the number of clients, which is parameter-inefficient and unfavorable for personalized cloud services: the overall system needs a linear amount of storage, not to mention the effort for profiling, versioning, and provenance for every client. Less attention has been paid to how to deploy and maintain the personalized system. A practical challenge for previous work is how to fulfill queries from new clients who were not involved in the training phase.
Beyond training personalized models for the participating clients only, we focus on preparing to serve new clients with fast, data-efficient personalization. A promising solution is Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017), which aims to learn a good initialization that can be adapted to a new task quickly, e.g., in a few SGD steps. This meta-learning idea has been adopted in PFL as well, by learning a model ready to be fine-tuned on each client's local data (Fallah et al., 2020). However, it still learns the parameters of a whole or partial model for each client. Several recent studies (Pillutla et al., 2022; Wu et al., 2022; Fallah et al., 2020) show that when individual clients' data are scarce, fine-tuning may suffer from overfitting and be sensitive to hyperparameters such as learning rates and the number of steps, eventually hurting some clients' test performance even though the average personalized performance could improve. To address this dilemma, we propose to improve the robustness of a personalization system by reducing the overall parameter complexity. Specifically, we aim to decouple the required total number of personalized parameters from the number of clients. We hypothesize that the clients' local distributions are not disjoint and could share some latent variables (e.g., domains, superclasses, etc.). Learning a separate personalized model for each client could be redundant and unfavorable for generalization to new data/clients. Specifically, we are interested in learning a meta-model that can generate a personalized model for every client, such that the overall parameter complexity is bounded by the size of the meta-model while retaining the flexibility to adapt the whole network. We propose a novel model architecture and learning algorithm for PFL.
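As a point of contrast, the fine-tuning baseline discussed above can be sketched as follows. This is a hedged toy illustration, not the paper's implementation: the data, the linear model, and the squared-error loss are stand-ins, and `lr` and `steps` are exactly the hyperparameters the cited studies find fine-tuning to be sensitive to.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: a globally trained weight vector and one client's local data.
w_global = rng.normal(size=(16,))
x, y = rng.normal(size=(8, 16)), rng.normal(size=(8,))

# Personalization by fine-tuning: copy the global weights, then take a few
# local gradient steps on the client's own data.
w = w_global.copy()
lr, steps = 0.01, 5  # sensitive hyperparameters, per the studies cited above
for _ in range(steps):
    grad = 2 * x.T @ (x @ w - y) / len(y)  # gradient of mean squared error
    w = w - lr * grad

# `w` is this client's personalized model: a FULL copy of the weights, so
# storage grows linearly with the number of clients -- the cost the paper
# aims to remove.
```

Note that every client ends up holding its own complete parameter vector, which is precisely the linear parameter complexity the proposed method avoids.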
Our idea is to learn a few shareable basis models of the same architecture, which can be combined layer by layer, with learnable combination coefficients, to form a personalized model, inspired by (Changpinyo et al., 2016; Evgeniou & Pontil, 2007). The inference memory footprint and computation cost of the combined personalized model do not scale with the number of bases. An illustration is in Figure 1. Our approach can be viewed as performing Principal Component Analysis (PCA) on a collection of high-dimensional neural networks, essentially learning shareable bases across clients. Learning the basis models in a federated setting, however, is nontrivial. As will be discussed in section 4, naively training them via the FEDAVG procedure (McMahan et al., 2017) -- i.e., iterating between local model training for multiple epochs and global aggregation -- would simply result in non-specialized bases that are unable to construct personalized models. We therefore present an improved coordinate-descent-style federated algorithm to overcome this problem. We name this architecture and algorithm FEDBASIS. FEDBASIS enjoys several desired properties. It offers built-in parameter efficiency while maintaining high personalized classification accuracy. After the basis models are trained, a new client only needs to learn very few parameters, i.e., the coefficients for combining them, to accommodate the distribution discrepancy, which is more robust to the choice of learning rate and the training set size. Last but not least, FEDBASIS is a stateless algorithm and does not increase inference-time cost, making it suitable for cross-device deployment. To demonstrate the applicability and generalizability of FEDBASIS, we further present PFLBED, a set of benchmark datasets for cross-domain PFL.
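To make the layer-by-layer combination concrete, here is a minimal sketch under stated assumptions: each basis "model" is represented as a dict of layer weights, the names and shapes are illustrative, and the softmax parameterization of the coefficients is one plausible choice rather than the paper's exact design. The key point is that the combination happens in the parameter space, yielding a single network whose cost does not grow with the number of bases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer "models": each basis is a dict of layer weights.
def make_basis():
    return {"w1": rng.normal(size=(32, 64)), "w2": rng.normal(size=(64, 10))}

num_bases = 3
bases = [make_basis() for _ in range(num_bases)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def combine(bases, logits):
    """Personalized parameters = coefficient-weighted sum of the basis
    parameters, computed layer by layer in the PARAMETER space (a single
    merged network, not an ensemble of model outputs)."""
    c = softmax(logits)
    return {name: sum(ci * b[name] for ci, b in zip(c, bases))
            for name in bases[0]}

# A client only stores/learns `logits` (length = num_bases), not full weights;
# here, uniform logits give an equal-weight combination of the bases.
personalized = combine(bases, np.zeros(num_bases))
```

In practice a new client would optimize `logits` on its local data (e.g., by gradient descent), but since only `num_bases` scalars are learned per client, the total parameter count is dominated by the shared bases rather than the number of clients.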
We point out that some existing PFL evaluations either pose huge distribution mismatches between training and testing (and are thus misleading) (Caldas et al., 2018; Li et al., 2020a) or cover only cases where either the labels or the input domains are non-IID across clients (and are thus less comprehensive) (Chen & Chao, 2022; Sun et al., 2021). PFLBED is carefully designed to resolve both problems. Concretely, by splitting the datasets into personalized portions according to domains, leveraging either domain-annotated datasets (Li et al., 2017; Venkateswara et al., 2017) or natural attributes like users, PFLBED is able to capture more diverse and realistic PFL scenarios that reflect real-world challenges.

2. RELATED WORK

Many approaches have been developed to improve different dimensions of PFL. We focus on a less studied route by learning a meta-model to summarize all the client models. Our FEDBASIS



Figure 1: In conventional PFL, each client learns a high-dimensional model, so the overall parameter count scales with the number of clients. In our FEDBASIS, we learn a few shareable basis models of the same network architecture. After the basis models are trained, a new client only needs to learn a short vector of combination coefficients to merge them, in the parameter space, into a personalized network, which is more data-efficient and robust.

