PRECISION COLLABORATION FOR FEDERATED LEARNING

Abstract

The inherent heterogeneity of local data distributions, which causes inefficient model learning and significant degradation of model performance, has been a key challenge in Federated Learning (FL). So far, many efforts have addressed data heterogeneity by relying on a hypothetical clustering structure or a consistent information-sharing mechanism. However, given the diversity of real-world local data, these assumptions may be largely violated. In this work, we argue that information sharing in a federated network is mostly fragmented in reality: distribution overlaps are not consistent but scattered among local clients. We propose the concept of "Precision Collaboration", which refers to learning precisely from the informative overlaps while avoiding the potential negative transfer induced by the others. In particular, we propose to infer the local data manifolds and to estimate the exact local data densities simultaneously. The learned manifold aims to precisely identify overlaps with other clients, and the estimated likelihood allows generating samples from the manifold at an optimal sampling density. Experiments show that our proposed PCFL significantly outperforms baselines on benchmarks and in a real-world clinical scenario.

1. INTRODUCTION

Federated learning (FL) has drawn considerable interest from a variety of disciplines in recent years. FL enables collaborative model learning without the need to access raw data across different clients, which facilitates real-world scenarios where privacy preservation is crucial, such as finance (Yang et al., 2019), healthcare (Xu et al., 2021), and criminal justice (Berk, 2012). Since the data samples in local clients are commonly non-i.i.d., existing research reveals that data heterogeneity can lead to non-guaranteed convergence, inconsistent performance, and catastrophic forgetting across different clients (Qu et al., 2022). Despite the promise of FL, a growing concern is how to handle data heterogeneity effectively before FL can be applied to real-world data scenarios.

In view of this challenge, an important direction is personalization, and a variety of efforts have explored it. For example, Ghosh et al. (2020) proposed to cluster clients according to their sample distributions and to build a customized model for each cluster; however, this hypothesis excludes the possibility of knowledge transfer across clusters. Li et al. (2021b) enhanced personalized model learning by introducing a global regularization term, which assumes that the shared knowledge is consistent across all clients.

Considering the diversity of local data, in this paper we study a more flexible and general scenario in which the distribution overlaps can be fragmented, as shown in Figure 1(a). Since informative and ambiguous data shards can coexist within another client, collaborating with all of its data may harm model learning. An interesting and challenging problem is therefore how to selectively collaborate with the favorable part of other clients in a privacy-preserving way. In this paper, we put forward the concept of "Precision Collaboration" for fragmented information sharing.
To begin with, we argue that data heterogeneity arises from inconsistent local data manifolds. In particular, the data manifolds of different local clients may share different overlaps, and maximizing the benefit of collaboration requires utilizing these overlaps precisely. Moreover, local data are usually gathered from the manifold according to a particular density; if we want to generate data from the manifold, a precise approximation of each client's distribution density can facilitate model learning.
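To make the notion of fragmented distribution overlap concrete, the following is a minimal, hypothetical sketch (not the paper's actual manifold-learning method): each client fits a simple kernel density estimate on its own data, and another client's samples are deemed "informative" only where they have non-negligible likelihood under that estimate. The client names, bandwidth, and threshold here are illustrative assumptions.

```python
import numpy as np

def gaussian_kde(samples, bandwidth):
    """Return a callable that estimates a 1-D density from samples
    via a plain Gaussian kernel density estimate (illustrative only)."""
    samples = np.asarray(samples, dtype=float)
    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        diffs = (x[:, None] - samples[None, :]) / bandwidth
        kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
        return kernel.mean(axis=1) / bandwidth
    return density

rng = np.random.default_rng(0)
client_a = rng.normal(0.0, 1.0, size=500)   # client A's local data
client_b = rng.normal(2.0, 1.0, size=500)   # client B: partial overlap with A

# Client A estimates its own density; client B's samples that are likely
# under this estimate lie in the distribution overlap and form the
# "informative" shard for collaboration, while the rest are excluded
# to avoid negative transfer.
density_a = gaussian_kde(client_a, bandwidth=0.3)
threshold = 0.05                             # hypothetical cutoff
overlap = client_b[density_a(client_b) > threshold]

print(f"{len(overlap)} of {len(client_b)} samples from B fall in the overlap")
```

In an actual FL deployment the raw samples of client B could not be shared with client A as done above; the paper's premise is precisely that overlap identification must happen in a privacy-preserving way, via the learned manifolds rather than raw data exchange.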

