PERSONALIZED FEDERATED COMPOSITE LEARNING WITH FORWARD-BACKWARD ENVELOPES

Abstract

Federated composite optimization (FCO) is an optimization problem in federated learning whose loss function contains a nonsmooth regularizer. It arises naturally in applications of federated learning (FL) that involve requirements such as sparsity, low-rankness, and monotonicity. In this study, we propose a personalization method, called pFedFBE, for FCO that uses the forward-backward envelope (FBE) as the clients' loss functions. With the FBE, we not only decouple the personalized models from the global model, but also make the personalized models smooth and easy to optimize. Despite the nonsmoothness of FCO, pFedFBE attains the same convergence complexity as FedAvg for FL with unconstrained smooth objectives. Numerical experiments demonstrate the effectiveness of our proposed method.

1. INTRODUCTION

Federated learning (FL) was originally proposed by McMahan et al. (2016) to solve learning tasks with decentralized data arising in various applications. For example, data are generated by medical institutions that cannot share their data with each other due to confidentiality or legal constraints. Instead of accessing all the datasets, the institutions (clients) work under the coordination of a central server, which aggregates local information to train a global model. Similar methodologies have been investigated in the decentralized optimization literature (Colorni et al., 1991; Boyd et al., 2011; Yang et al., 2019). For further introduction and open problems in federated optimization, we refer to the review articles (Kairouz et al., 2021; Wang et al., 2021). The local loss functions of FL can be nonsmooth. In particular, each is a sum of a smooth function and a nonsmooth regularizer, where the regularizer promotes certain structure of the optimal parameters such as sparsity, low-rankness, total variation, and additional constraints on the parameters. This has motivated the recent study of the federated setting of composite optimization (Yuan et al., 2021). The mathematical formulation of FCO is

$$\min_{w \in \mathbb{R}^d} \; f(w) := \frac{1}{N} \sum_{i=1}^{N} \big( f_i(w) + h(w) \big), \tag{1}$$

where $f_i(w) = \mathbb{E}_{\xi_i}[\tilde f_i(w, \xi_i)]$ or its empirical version $f_i(w) = \frac{1}{|D_i|} \sum_{\xi_i \in D_i} \tilde f_i(w, \xi_i)$ is a smooth function with local dataset $D_i$, and $h: \mathbb{R}^d \to \mathbb{R}$ is a nonsmooth but convex regularizer. Besides, we assume that the proximal operator of $h$,

$$\operatorname{prox}_h(w) := \arg\min_{u \in \mathbb{R}^d} \; h(u) + \frac{1}{2} \|u - w\|^2,$$

has a closed-form expression and is easy to compute. The difference from the centralized setting is that $D_i$ is the local data of client $i$ and the data distributions of the clients may differ.
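For instance, when $h$ is the $\ell_1$ norm, the proximal operator in (1) is the well-known soft-thresholding map. A minimal sketch (the function name is ours, not the paper's):

```python
import numpy as np

def prox_l1(w, lam=1.0):
    """Proximal operator of h(u) = lam * ||u||_1:
    argmin_u  lam * ||u||_1 + 0.5 * ||u - w||^2,
    which has the closed-form soft-thresholding solution."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Entries with magnitude below lam are zeroed; the rest shrink toward zero.
w = np.array([2.0, -0.5, 0.3, -3.0])
print(prox_l1(w, lam=1.0))
```

The closed form is what makes composite problems with $\ell_1$ regularizers cheap to handle locally, which is the assumption placed on $\operatorname{prox}_h$ above.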
Optimizing FL with unconstrained smooth objectives, i.e., problem (1) with $h \equiv 0$, has been extensively studied in the literature, e.g., FedAvg (McMahan et al., 2016), FedProx (Li et al., 2020), SCAFFOLD (Karimireddy et al., 2020b), and MIME (Karimireddy et al., 2020a), to name a few. When $h \not\equiv 0$, FedDual (Yuan et al., 2021), FedDR (Tran Dinh et al., 2021), and FedADMM (Wang et al., 2022) have been developed. One of the challenges for these algorithms is the heterogeneity of the local datasets $D_i$, whose distributions are non-identical. The model parameter $w$ learned by minimizing $f(w)$ may perform poorly for individual clients. Conversely, if each client learns its parameters from its own data alone, the local models may generalize poorly due to insufficient data. For the case $h \equiv 0$, the concept of personalized FL has been studied to learn the global and local parameters jointly, e.g., (Smith et al., 2017; Hanzely and Richtárik, 2020; Hanzely et al., 2020; Fallah et al., 2020b; Mansour et al., 2020; Chen and Chao, 2021; T Dinh et al., 2020). To the best of our knowledge, no existing work directly investigates personalization techniques for FCO, although the above personalized methods may be generalized.

Our Contributions. In this paper, we construct a personalized model for the FCO problem (1), while the existing methods mentioned above all perform personalization for FL with unconstrained smooth objectives, i.e., $h \equiv 0$. Although their approaches may be generalized to (1) by replacing gradients with subgradients, the computational results could be worse due to the slow convergence of subgradient methods. The main contributions are summarized as follows.

• We present a personalized model by utilizing the forward-backward envelope (FBE) for FCO (1); FedAvg can then be used to solve the resulting model.
By applying FedAvg, the optimization process of our proposed personalized model can be regarded as several local variable-metric proximal gradient updates followed by a global aggregation step. The variable-metric proximal gradient steps protect the local information, and the aggregation steps guarantee that the total loss is minimized at the aggregated parameter. A proper choice of the FBE parameter allows the local parameters to move towards their own models without straying far from the global parameter. An algorithm, called pFedFBE, is proposed, and we show its convergence for nonconvex $f_i$ under mild assumptions. The complexity result matches the standard results of FedAvg for unconstrained smooth FL.

• Based on the properties of the FBE, the convergence rate of pFedFBE matches the standard analysis of FedAvg under standard assumptions on $f$, $h$, and the stochasticity. Numerical experiments on various applications demonstrate the effectiveness of our proposed personalized model.

Notations. For a vector $w \in \mathbb{R}^d$ or a matrix $H \in \mathbb{R}^{d \times d}$, we use $\|w\|$ and $\|H\|$ to denote its $\ell_2$ norm and Frobenius norm, respectively. For a smooth function $f: \mathbb{R}^d \to \mathbb{R}$, $\nabla f(x)$ and $\nabla^2 f(x)$ denote its gradient and Hessian at $x$, respectively. For a nonsmooth convex function $h$, we denote by $\partial h(x)$ its subdifferential at $x$. We use $|D|$ to denote the cardinality of a set $D$.
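The local-update-then-aggregate pattern described above can be sketched in a few lines. This is a toy FedAvg-style loop with proximal gradient local steps on a made-up composite local loss $f_i(w) = \frac{1}{2}\|w - c_i\|^2 + 0.05\|w\|_1$; all names, the quadratic loss, and the step sizes are illustrative assumptions, not the paper's pFedFBE algorithm (which uses FBE losses and variable-metric steps):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def local_update(w_global, c_i, gamma=0.1, reg=0.05, steps=5):
    """A few proximal gradient steps on the toy local composite loss
    0.5 * ||w - c_i||^2 + reg * ||w||_1, starting from the global model."""
    w = w_global.copy()
    for _ in range(steps):
        grad = w - c_i                                      # forward (gradient) step input
        w = soft_threshold(w - gamma * grad, gamma * reg)   # backward (prox) step
    return w

def fedavg_round(w_global, client_centers):
    """One communication round: every client updates locally, server averages."""
    local_models = [local_update(w_global, c_i) for c_i in client_centers]
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(0)
clients = [rng.normal(size=4) for _ in range(8)]   # heterogeneous local data, one center each
w = np.zeros(4)
for _ in range(30):
    w = fedavg_round(w, clients)
```

Only the aggregated parameter `w` crosses the network in each round; the raw centers `c_i` (standing in for local data) never leave the clients.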

2. PERSONALIZED FEDERATED LEARNING WITH FORWARD-BACKWARD ENVELOPE (PFEDFBE)

The personalized FedAvg (Per-FedAvg) (Fallah et al., 2020b) and personalized FL with Moreau envelopes (pFedMe) (T Dinh et al., 2020) were proposed to deal with data heterogeneity in the smooth setting, i.e., $h \equiv 0$. For pFedMe, the local model is constructed based on the Moreau envelope of $f_i$, namely,

$$F_i(w) = \min_{\theta_i \in \mathbb{R}^d} \; f_i(\theta_i) + \frac{\lambda}{2} \|\theta_i - w\|^2. \tag{2}$$

The resulting personalized model is then the bi-level problem

$$\min_{w \in \mathbb{R}^d} \; \frac{1}{N} \sum_{i=1}^{N} F_i(w). \tag{3}$$

Solving (3) gives both the global parameter $w$ and the local personalized parameters

$$\theta_i(w) := \operatorname{prox}_{f_i/\lambda}(w) := \arg\min_{\theta_i \in \mathbb{R}^d} \; f_i(\theta_i) + \frac{\lambda}{2} \|\theta_i - w\|^2.$$

A crucial benefit of optimizing with the Moreau envelopes $F_i$ lies in the flexible choice of $\lambda$. When $\lambda = \infty$, we have $F_i(w) = f_i(w)$ and $\theta_i(w) = w$, which means no personalization is introduced. If $\lambda = 0$, $F_i(w)$ is a constant function taking the value $f_i(\theta_i(w))$ with $\theta_i(w) \equiv \arg\min_{\theta_i \in \mathbb{R}^d} f_i(\theta_i)$. In this case, there is only personalization
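For a concrete feel of (2)–(3), take the toy local loss $f_i(\theta) = \frac{1}{2}\|\theta - c_i\|^2$; then $\theta_i(w) = \operatorname{prox}_{f_i/\lambda}(w)$ has the closed form $(c_i + \lambda w)/(1 + \lambda)$, which interpolates between the global parameter (large $\lambda$) and the purely local minimizer $c_i$ ($\lambda \to 0$). A minimal numerical check under this toy assumption (names are ours):

```python
import numpy as np

def theta_i(w, c_i, lam):
    """Personalized parameter for the toy local loss f_i(t) = 0.5 * ||t - c_i||^2:
    argmin_t  f_i(t) + (lam / 2) * ||t - w||^2  =  (c_i + lam * w) / (1 + lam)."""
    return (c_i + lam * w) / (1.0 + lam)

w = np.array([0.0, 0.0])    # global parameter
c = np.array([1.0, -1.0])   # local minimizer of f_i

print(theta_i(w, c, lam=1e6))   # large lambda: stays near w, no personalization
print(theta_i(w, c, lam=1e-6))  # small lambda: moves to c, purely local model
```

The same interpolation behavior is what the flexible choice of $\lambda$ buys in pFedMe, and, analogously, what the FBE parameter buys in our setting.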



Our method, called pFedFBE, is built on the FBE instead. As a generalization of the Moreau envelope under the Bregman distance (Liu and Pong, 2017), the FBE is smooth and has an explicit form of its gradient, a crucial benefit compared to the Moreau envelope. Analogous to the personalized method based on the Moreau envelope (T Dinh et al., 2020), our proposed method obtains both global parameters for generalization and local parameters for personalization. To the best of our knowledge, this is the first work to investigate personalization for FCO. Based on the FBE, the local loss functions and the global loss function are smooth, and hence FedAvg can be applied to solve the resulting model.
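The FBE is not defined in this excerpt; following the standard construction in the forward-backward envelope literature, for a composite objective $f + h$ it reads $\varphi_\gamma(x) = \min_z \{ f(x) + \langle \nabla f(x), z - x \rangle + h(z) + \frac{1}{2\gamma}\|z - x\|^2 \}$, whose minimizer is the proximal gradient point $\operatorname{prox}_{\gamma h}(x - \gamma \nabla f(x))$. A numerical sketch for $f(x) = \frac{1}{2}\|Ax - b\|^2$ and $h = \|\cdot\|_1$ (a toy instance and names of our choosing, not the paper's federated setup):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fbe(x, A, b, gamma=0.1):
    """Forward-backward envelope of f(x) = 0.5 * ||Ax - b||^2 + ||x||_1 at x."""
    f = 0.5 * np.sum((A @ x - b) ** 2)
    g = A.T @ (A @ x - b)                     # gradient of the smooth part f
    z = soft_threshold(x - gamma * g, gamma)  # forward-backward (prox-gradient) point
    return f + g @ (z - x) + np.sum(np.abs(z)) + np.sum((z - x) ** 2) / (2 * gamma)

rng = np.random.default_rng(1)
A, b = rng.normal(size=(5, 3)), rng.normal(size=5)
x = rng.normal(size=3)

# Since z = x is feasible in the inner minimization, the envelope
# lower-bounds the composite objective at every point.
assert fbe(x, A, b) <= 0.5 * np.sum((A @ x - b) ** 2) + np.sum(np.abs(x)) + 1e-9
```

Unlike the raw composite objective, $\varphi_\gamma$ is real-valued and smooth (for twice-differentiable $f$), which is the property that lets smooth FL machinery such as FedAvg be applied.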

