A STATISTICAL FRAMEWORK FOR PERSONALIZED FEDERATED LEARNING AND ESTIMATION: THEORY, ALGORITHMS, AND PRIVACY

Abstract

A distinguishing characteristic of federated learning is that (local) client data may exhibit statistical heterogeneity. This heterogeneity has motivated the design of personalized learning, where individual (personalized) models are trained through collaboration. Various personalization methods have been proposed in the literature, with seemingly very different forms and techniques, ranging from the use of a single global model for local regularization and model interpolation, to the use of multiple global models for personalized clustering. In this work, we begin with a statistical framework that unifies several different algorithms and also suggests new ones. We apply our framework to personalized estimation and connect it to the classical empirical Bayes methodology. We develop novel private personalized estimation under this framework. We then use our statistical framework to propose new personalized learning algorithms, including AdaPeD, based on information-geometric regularization, which numerically outperforms several known algorithms. We equip our personalized learning methods with user-level privacy guarantees that compose across training iterations. Finally, we numerically evaluate the performance and privacy of our methods for both the estimation and learning problems, demonstrating the advantages of the proposed approaches.

1. INTRODUCTION

The federated learning (FL) paradigm has seen huge recent success in both industry and academia (McMahan et al., 2017; Kairouz et al., 2021), as it enables leveraging data available on dispersed devices for learning while maintaining data privacy. Yet, it was recently realized that for some applications, due to the statistical heterogeneity of local data, a single global learning model may perform poorly for individual clients. This motivated the need for personalized learning achieved through collaboration, and a plethora of personalized models have been proposed in the literature (Fallah et al., 2020; Dinh et al., 2020; Deng et al., 2020; Mansour et al., 2020; Acar et al., 2021; Li et al., 2021; Ozkara et al., 2021; Zhang et al., 2021; Hu et al., 2020). However, the proposed approaches use very different forms and methods, and there is a lack of understanding of an underlying fundamental statistical framework. Such a framework could help develop theoretical performance bounds, suggest new algorithms, and give grounding to known methods. Our work addresses this gap. In particular, we consider the fundamental question of how collaboration can help personalized learning and estimation for users who have limited data that they want to keep private. Our proposed framework is founded on the requirement not only of personalization but also of privacy, as maintaining local data privacy is what makes the federated learning framework attractive; thus any algorithm that aims to be impactful needs to also give formal privacy guarantees. The goal of this paper is to develop a statistical framework that leads to new algorithms with provable privacy guarantees and performance bounds.
Our main contributions are: (i) development of a statistical framework for federated personalized estimation and learning; (ii) theoretical bounds and novel algorithms for private personalized estimation; (iii) design and privacy analysis of new private personalized learning algorithms; as elaborated below. Omitted proofs/details are in the appendices.
• Statistical framework: We connect this problem to the classical empirical Bayes method, pioneered by Stein (1956); James & Stein (1961); Robbins (1956), which proposed a hierarchical statistical model (Gelman et al., 2013). This is modeled by an unknown population distribution P from which local parameters {θ_i} are generated, which in turn generate the local data through the distribution Q(θ_i). Despite the large literature on this topic, especially in the context of statistical estimation, creating a framework for FL poses new challenges. In contrast to classical empirical Bayes estimation, we introduce a distributed setting and develop a framework that allows information (communication and privacy) constraints¹. This framework enables us to develop statistical performance bounds and suggests (private) personalized federated estimation algorithms. Moreover, we develop our framework beyond estimation, for (supervised) distributed learning, where clients want to build local predictive models with limited local (labeled) samples; we develop this framework in Section 3, which leads to new (private) personalized learning algorithms.
• Private personalized estimation: Our goal is to estimate individual (local) parameters when each user has very limited (heterogeneous) data. Such a scenario motivates federated estimation of individual parameters, privately.
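The hierarchical model above (population distribution P generating local parameters θ_i, which generate local data via Q(θ_i)) can be illustrated with a minimal empirical Bayes sketch. All numbers and variable names here are our own illustrative choices (a Gaussian instantiation with m clients, n samples each), not the paper's exact construction: each client's local mean is shrunk toward a population mean estimated from all clients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hierarchical model: P = N(mu_P, sigma_P^2) generates local
# parameters theta_i; client i then observes n samples from Q(theta_i) =
# N(theta_i, sigma^2). Parameter values are arbitrary choices for the demo.
mu_P, sigma_P, sigma, m, n = 1.0, 2.0, 4.0, 500, 5

theta = rng.normal(mu_P, sigma_P, size=m)              # local parameters
data = rng.normal(theta[:, None], sigma, size=(m, n))  # local datasets

local_est = data.mean(axis=1)                          # per-client sample mean

# Empirical Bayes: estimate the population prior from all clients, then
# shrink each local estimate toward the estimated prior mean.
mu_hat = local_est.mean()
var_obs = sigma**2 / n                                 # local-estimate noise
tau2_hat = max(local_est.var() - var_obs, 1e-12)       # estimated prior variance
w = tau2_hat / (tau2_hat + var_obs)                    # shrinkage weight
eb_est = w * local_est + (1 - w) * mu_hat

mse_local = np.mean((local_est - theta) ** 2)
mse_eb = np.mean((eb_est - theta) ** 2)
```

With scarce local data (small n), `mse_eb` falls well below `mse_local`: collaboration recovers the prior that no single client could estimate alone, which is exactly the regime the paper targets.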
More precisely, the users observe data generated by an unknown distribution parametrized by their individual (unknown) local parameters θ_i, and want to estimate these local parameters leveraging very limited local data; see Section 2 for more details. For the hierarchical statistical model, classical results have shown that one can enhance the estimates of individual parameters based on the observations of a population of samples, despite the parameters being independently generated from an unknown population distribution. However, this has not been studied for the distributed case with privacy and communication constraints, which we do (see Theorem 2 for the Gaussian case, Theorem 4 for the Bernoulli case, and Appendix D for mixture population models). We estimate the (parametrized) population distribution under these privacy and communication constraints and use it as an empirical prior for local estimation. The effective amplification of local samples through collaboration, quantified in Section 2, gives theoretical insight into when collaboration is most useful under privacy and/or communication constraints. Our results suggest how to optimally balance estimates from the local and population models. We also numerically evaluate these methods, including an application to polling data (see Section 4 and the appendices), to show the advantages of such collaborative estimation over purely local methods.
• Private personalized learning: The goal here is to obtain individual learning models capable of predicting labels with limited local data in a supervised learning setting. This is the canonical use case of federated learning with privacy guarantees. It is intimately related to the estimation problem, with distinctions including (i) the need to design good label predictors rather than merely estimate local parameters, and (ii) the focus on iterative optimization methods, requiring strong compositional privacy guarantees.
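The balance between local and population estimates under a privacy constraint can be sketched as follows. This is our own simplified single-round illustration, not the paper's Theorem 2 construction: each client perturbs its contribution with Gaussian noise before the server aggregates, and the noise calibration here is purely illustrative rather than a calibrated (ε, δ) accounting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same Gaussian hierarchical model as before; values are illustrative.
m, n = 1000, 5
sigma_P, sigma = 2.0, 4.0
clip, noise_mult = 6.0, 1.0            # clipping bound and noise multiplier

theta = rng.normal(0.0, sigma_P, size=m)
local_est = rng.normal(theta, sigma / np.sqrt(n))  # each client's raw mean

# Each client privatizes its contribution (clip, then add Gaussian noise)
# before the server builds the population prior from the noisy values.
clipped = np.clip(local_est, -clip, clip)
private = clipped + rng.normal(0.0, noise_mult * clip, size=m)

mu_hat = private.mean()                # privately estimated population mean
# The prior-variance estimate must discount both sampling and DP noise.
tau2_hat = max(private.var() - sigma**2 / n - (noise_mult * clip) ** 2, 1e-12)

w = tau2_hat / (tau2_hat + sigma**2 / n)   # local-vs-population balance
personalized = w * local_est + (1 - w) * mu_hat

mse_local = np.mean((local_est - theta) ** 2)
mse_priv = np.mean((personalized - theta) ** 2)
```

Even though the prior is estimated from privatized data, with many clients the noisy population mean is still accurate enough that `mse_priv < mse_local`, matching the qualitative message that collaboration helps most when local data is scarce and the population is large.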
Therefore, the statistical formulation for learning has a similar flavor to that of estimation, with a population model for the local (parametrized) statistics of labeled data; see Section 3 for more details. We develop several algorithms inspired by the statistical framework, including AdaPeD (Section 3.2), AdaMix (Section 3.1), and DP-AdaPeD (Section 3.3). AdaPeD uses information-divergence constraints along with adaptive weighting of local and population models. Because it operates in probability (rather than Euclidean) space through information-geometric divergences, AdaPeD can accommodate different local model sizes and architectures, giving it greater flexibility than existing methods. We integrate it with user-level privacy to develop DP-AdaPeD, with strong compositional privacy guarantees (Theorem 5). AdaMix is inspired by mixture population distributions; it adaptively weighs multiple global models and combines them with local data for personalization. We numerically evaluate these algorithms on synthetic and real data in Section 4.
Related Work. Our work sits at the intersection of personalized learning, estimation, and privacy. Below we give a brief description of related work; a more detailed comparison connecting our framework to other personalized algorithms is given in Appendix J. Personalized FL: Recent work adopted different approaches for learning personalized models, which can be explained by our statistical framework for suitable choices of population distributions, as explained in Appendix J. These include meta-learning based methods (Fallah et al., 2020; Acar et al., 2021; Khodak et al., 2019); regularization (Deng et al., 2020; Mansour et al., 2020; Hanzely
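The divergence-based local objective that lets AdaPeD mix heterogeneous architectures can be sketched as follows. This is our own simplified version (the paper's exact objective and its adaptive weighting rule for λ are in Section 3.2): the key point is that the penalty compares model *outputs* (predictive distributions) via a KL divergence, so the local and population models only need to share a label space, not an architecture or parameter count.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def divergence_regularized_loss(scores_local, scores_pop, y, lam):
    """Cross-entropy on local labels plus lam * KL(p_local || p_pop).

    scores_local / scores_pop are logits from the personalized and
    population models; they may come from entirely different networks,
    since only their predictive distributions are compared.
    """
    p_local = softmax(scores_local)
    p_pop = softmax(scores_pop)
    ce = -np.log(p_local[np.arange(len(y)), y] + 1e-12).mean()
    kl = (p_local * (np.log(p_local + 1e-12)
                     - np.log(p_pop + 1e-12))).sum(axis=1).mean()
    return ce + lam * kl

# Tiny demo on random logits (shapes and values are arbitrary).
rng = np.random.default_rng(0)
scores_l = rng.normal(size=(8, 3))   # personalized-model logits on a batch
scores_p = rng.normal(size=(8, 3))   # population-model logits, any architecture
y = rng.integers(0, 3, size=8)
loss_unreg = divergence_regularized_loss(scores_l, scores_p, y, lam=0.0)
loss_reg = divergence_regularized_loss(scores_l, scores_p, y, lam=1.0)
```

Since the KL term is nonnegative, `loss_reg >= loss_unreg`; during training, a larger λ pulls local predictions toward the population model, while a smaller λ lets the client rely more on its own data, which is the trade-off AdaPeD adapts automatically.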



¹The homogeneous case for distributed estimation is well-studied; see (Zhang, 2016) and references therein.

