FEDERATED LEARNING ON ADAPTIVELY WEIGHTED NODES BY BILEVEL OPTIMIZATION

Anonymous

Abstract

We propose a federated learning method with weighted nodes in which the weights can be adjusted to optimize the model's performance on a separate validation set. The problem is formulated as a bilevel optimization problem: the inner problem is a federated learning problem with weighted nodes, and the outer problem optimizes the weights based on the validation performance of the model returned by the inner problem. We design a communication-efficient federated optimization algorithm to solve this bilevel optimization problem. We analyze the generalization performance of the output model and identify the scenarios in which our method is theoretically superior to training a model locally and to federated learning with static, evenly distributed weights.

1. INTRODUCTION

Federated learning (FL) is an emerging technique for training a model using data distributed over a network of nodes without sharing data between nodes (Konečnỳ et al., 2016; McMahan et al., 2017). In this paper, we focus on the case where data distributions across nodes are heterogeneous and each node aims at a model with optimal local generalization performance. In the classical setting of FL, a globally shared model is learned by minimizing a weighted average loss across all nodes. However, given the heterogeneity of data distributions, a global model is likely to be sub-optimal for some nodes (Fallah et al., 2020). Alternatively, each node can train a model using only its local data, but such a local model may not generalize well either when the volume of local data is small.

To achieve good local generalization performance, each node can still exploit global training data through FL while, at the same time, identifying and collaborating only with the nodes whose data distributions are similar or identical to its local distribution. One way to implement this strategy is to let each node solve its own weighted average loss minimization problem, with weights chosen based on the performance on a separate set of local (validation) data. Ideally, each node can learn a better model by allocating more weight to the peers whose data distributions are similar to its own. In this paper, we formulate the choice of the weights as a bilevel optimization (BO) problem (Colson et al., 2005; Vicente & Calamai, 1994), which can be solved by a federated bilevel optimization algorithm, and we analyze the generalization performance of the resulting model.

We consider a standard learning problem where the goal is to learn a vector of model parameters θ from a set Θ that minimizes a generalization loss.
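The bilevel scheme described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's actual algorithm: the inner problem takes a gradient step on the weighted average of per-node losses, and the outer problem updates the node weights using a one-step-unrolled hypergradient of the validation loss (a common approximation in bilevel optimization). All names and the linear-regression setup are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 3

# Synthetic heterogeneous nodes: nodes 0 and 1 share the target
# distribution p_0; node 2 is shifted away from it.
theta_true = rng.normal(size=d)

def make_node(shift, n=200):
    X = rng.normal(size=(n, d))
    y = X @ (theta_true + shift) + 0.1 * rng.normal(size=n)
    return X, y

nodes = [make_node(s) for s in (0.0, 0.0, 2.0)]
X_val, y_val = make_node(0.0, n=100)   # validation set drawn from p_0

def grad(theta, X, y):
    # Gradient of the mean-squared loss on one node's data
    return 2 * X.T @ (X @ theta - y) / len(y)

def val_loss(theta):
    return np.mean((X_val @ theta - y_val) ** 2)

theta = np.zeros(d)
w = np.ones(K) / K            # node weights, kept on the simplex
eta, alpha = 0.05, 0.02       # inner and outer step sizes

for _ in range(500):
    g = np.stack([grad(theta, X, y) for X, y in nodes])  # per-node gradients
    theta_next = theta - eta * (w @ g)                   # inner (weighted FL) step
    # Outer step: hypergradient of the validation loss w.r.t. the weights
    # from one-step unrolling: dL_val/dw_k = -eta * g_k . grad L_val(theta_next)
    gv = 2 * X_val.T @ (X_val @ theta_next - y_val) / len(y_val)
    w = w - alpha * (-eta * (g @ gv))
    w = np.clip(w, 0.0, None)
    w = w / w.sum()                                      # renormalize onto the simplex
    theta = theta_next
```

In this toy run, the outer updates should shift weight away from the distributionally dissimilar node 2 and toward the two nodes that match the validation distribution, which is the behavior the weighting scheme is designed to produce.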
This problem can be formulated as

θ* ∈ argmin_{θ∈Θ} { L_0(θ) := E_{z∼p_0}[l(θ; z)] },   (P)

where l(θ; z) is the loss of θ on a data point z from a space Z, and E_{z∼p_0} denotes the expectation over z when z follows an unknown ground-truth distribution p_0. Directly solving (P) is challenging because p_0 is unknown; typically, training data sampled from p_0 is needed to learn an approximation of θ*. In this paper, we consider the scenario where the amount of data sampled directly from p_0 may not be sufficient to learn a good approximation of θ*, but there exist external data distributed over K nodes that can potentially help the learning of θ*. In particular, we denote the set of nodes by K := {1, . . . , K} and assume a training set D_k^train is stored on node k. We also define D^train := {D_k^train}_{k=1}^K and assume |D_k^train| = n_k and D_k^train = {z_k^(i)}_{i=1}^{n_k}, where z_k^(i) ∈ Z is an i.i.d. sample from an unknown distribution p_k for k ∈ K.
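To make the notation concrete, the sketch below (hypothetical names, and a simple squared-distance loss standing in for l(θ; z)) builds per-node training sets D_k^train of sizes n_k and evaluates a weighted empirical loss of the form Σ_k w_k · (1/n_k) Σ_i l(θ; z_k^(i)), the standard empirical objective of a weighted-node inner FL problem.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 4, 3

# Per-node training sets D_k^train = {z_k^(i)}_{i=1}^{n_k}, drawn i.i.d.
# from node-specific distributions p_k (here: Gaussians with shifted means).
n = [50, 80, 30, 60]                     # n_k = |D_k^train|
D_train = {k: rng.normal(loc=k, size=(n[k], d)) for k in range(K)}

def node_loss(theta, Z):
    # Empirical loss of theta on one node: mean squared distance to its samples
    return np.mean(np.sum((Z - theta) ** 2, axis=1))

def weighted_loss(theta, w):
    # Weighted average empirical loss: sum_k w_k * L_k(theta)
    return sum(w[k] * node_loss(theta, D_train[k]) for k in range(K))
```

For this squared-distance loss the weighted objective is minimized at the w-weighted average of the node sample means, which makes the effect of the weights w directly visible: moving weight toward a node pulls the minimizer toward that node's distribution.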

