PRECISION COLLABORATION FOR FEDERATED LEARNING

Abstract

Inherent heterogeneity of local data distributions, which causes inefficient model learning and significant degradation of model performance, has been a key challenge in Federated Learning (FL). So far, plenty of efforts have focused on addressing data heterogeneity by relying on a hypothetical clustering structure or a consistent information sharing mechanism. However, because of the diversity of real-world local data, these assumptions may be largely violated. In this work, we argue that information sharing in a real federated network is mostly fragmented: the distribution overlaps are not consistent but scattered among local clients. We propose the concept of "Precision Collaboration", which refers to learning precisely from the informative overlaps while avoiding the potential negative transfer induced by others. In particular, we propose to infer the local data manifolds and estimate the exact local data densities simultaneously. The learned manifold precisely identifies the overlaps from other clients, and the estimated likelihood allows us to generate samples from the manifold at an optimal sampling density. Experiments show that our proposed PCFL significantly outperforms baselines on benchmarks and in a real-world clinical scenario.

1. INTRODUCTION

Federated learning (FL) has drawn considerable interest from a variety of disciplines in recent years. FL enables collaborative model learning without the need to access the raw data across different clients, which facilitates real-world scenarios where privacy preservation is crucial, such as finance (Yang et al., 2019), healthcare (Xu et al., 2021) and criminal justice (Berk, 2012). While it is common that the data samples in local clients are non-i.i.d., existing research reveals that data heterogeneity can lead to non-guaranteed convergence, inconsistent performance and catastrophic forgetting across different clients (Qu et al., 2022). Despite the promise of FL, an increasing concern is how to effectively handle data heterogeneity before FL can be applied to real-world data scenarios. In view of this challenge, an important direction is personalization, and a variety of efforts have been made to explore it. For example, Ghosh et al. (2020) proposed to cluster the clients according to their sample distributions and build a customized model for each cluster. However, their hypothesis excludes the possibility of knowledge transfer across clusters. Li et al. (2021b) enhanced personalized model learning by introducing a global regularization term, which assumed that the shared knowledge was consistent across all clients. Considering the diversity of local data, in this paper we study a more flexible and general scenario where the distribution overlaps can be fragmented, as shown in Figure 1 (a). Since informative and ambiguous data shards coexist in other clients, collaborating on all of their data indiscriminately could harm model learning. An interesting and challenging problem is how to selectively collaborate with the favorable parts of other clients in a privacy-preserving way. In this paper, we put forward the concept of "Precision Collaboration" for fragmented information sharing.
To begin with, we argue that data heterogeneity stems from inconsistent local data manifolds. In particular, the data manifolds of different local clients could share different overlaps. Maximizing the benefit of collaboration requires a precise utilization of these overlaps. Moreover, local data are usually gathered from the manifold according to a particular density. If we want to generate data from the manifold, a precise approximation of the distribution density of each client could facilitate model learning.

To realize our proposed precision collaboration, we develop a novel framework named PCFL, shown in Figure 1. We assert that the key to precisely collaborative model learning is identifying and utilizing the distribution overlaps scattered in other clients. These overlaps between clients correspond to specific regions of the data manifold. We propose to infer the local data manifold to identify the overlaps. Since it is hard to learn the local manifold directly from the insufficient data in a local client, we first infer the underlying manifold M_g of the data from all clients, so that the data from all overlapped distributions are utilized for the manifold inference. Then the local manifold M_i ⊂ M_g of the i-th client can be determined by the local data D_i, as shown in Figure 1 (b).

From Figure 1 (c), the local data manifold M_i is used to identify the beneficial overlaps from other clients. In particular, if a subset of the data from D_j lies on M_i, this subset is the overlap between the i-th and j-th clients. To further boost local model training, we suggest sampling from M_i with an optimal sampling probability estimated from local data, as shown in Figure 1 (d), which effectively mitigates the potential distribution discrepancy. We highlight our key contributions as follows:

• While existing research studies FL under certain assumptions about information sharing, we investigate a more general learning scenario where the data sharing a common distribution are fragmented among local clients;
• We achieve a more precise collaboration for the federated network by proposing the framework PCFL. Our framework identifies the meaningful overlaps and excludes ambiguous information from other clients, which avoids potential negative transfer;
• PCFL can be used to improve other SOTA algorithms in a plug-and-play way.
• Empirical experiments corroborate that PCFL significantly outperforms all baselines on a series of benchmark datasets and a real-world clinical dataset.

2. RELATED WORK

2.1 FEDERATED LEARNING AND DATA HETEROGENEITY

Recent years have witnessed growing attention to federated learning (McMahan et al., 2017), of which several challenges have been concerning topics, including communication efficiency (Konečnỳ et al., 2016), privacy (Agarwal et al., 2018) and data heterogeneity (Karimireddy et al., 2020). While data heterogeneity can cause a lack of convergence and potential catastrophic forgetting (Qu et al., 2022), several lines of work aim to tackle the heterogeneity by learning a global model. For example, Li et al. (2020) propose a proximal term to restrict the local updates to be closer to the initial model. Mohri et al. (2019) seek a fair performance distribution by maximizing the model performance on any arbitrary target distribution. Li et al. (2021a) develop MOON, which corrects local training by maximizing the agreement of representations between local and global models. Instead of pursuing a balanced performance distribution, we are interested in achieving the best performance for each client by precisely learning the shared informative overlaps from others.

2.2. PERSONALIZED FEDERATED LEARNING

In addition to reaching a global consensus, personalized model learning also attracts widespread attention in the FL community, as it may boost the flexibility of learned models when adapting to local distributions (Cui et al., 2022; Li et al., 2021b). Many works seek a trade-off between local and global models. For example, Fallah et al. (2020) proposed to train local models that can quickly adapt to local data starting from a shared initial model, in a meta-learning way. Some works train personalized models by interpolating between global and local models (Deng et al., 2020; Dinh et al., 2020). Li et al. (2021b) achieve such a trade-off by regularizing local models to stay close to the global model. Other works suggest a partially shared model structure for efficient information transfer (Liang et al., 2020; Collins et al., 2021). Nonetheless, we are concerned that a single global model can hardly capture the varied information shared between clients. The fragmented knowledge requires precise identification when collaboratively learning from others.
To mitigate potential overfitting when learning from limited local data, some works also employ generative methods to improve model performance (Du & Wu, 2020; Zhu et al., 2021). In particular, Zhu et al. (2021) regularize local training with knowledge distilled from all clients. Du & Wu (2020) introduce a GAN to generate similar data for local clients. However, generating data at an arbitrary density could result in distribution discrepancy.
An optimal sampling density may present more benefits for local learning tasks.

3. NOTATIONS AND PROBLEM DEFINITION

3.1 NOTATIONS

Suppose there are N clients in a federated network, and each client owns a private dataset D_k with n_k data samples. The dataset D_k = {X_k, Y_k} consists of the input space X_k and output space Y_k. We use z = {x, y} to denote a data point, and z ∈ M denotes that z lies on the data manifold M. The input space and the output space are shared across all clients. In the following, we also use D_i to denote the i-th client when no confusion arises. The goal of each client is to learn the best model for predicting the label y by collaborating with others. For example, McMahan et al. (2017) propose FedAvg, which learns a global model f for all clients by minimizing the empirical risk over the samples from all clients, i.e.,

$$\min_{f \in \mathcal{F}} \frac{1}{\sum_{k=1}^{N} n_k} \sum_{k=1}^{N} \sum_{i=1}^{n_k} \ell\left(f\left(x_i^k\right), y_i^k\right), \qquad (1)$$

where $\mathcal{F}$ is the hypothesis space and $\ell$ denotes the loss objective of all clients. From Eq.(1), FedAvg assumes that the data from different clients are associated with a common data manifold M and sampling density p_z^g(z), i.e.,

$$\forall\, D_i \in \{D_0, D_1, \ldots, D_{N-1}\}, \;\text{s.t.}\; z \in D_i \subset \mathcal{M},\; z \sim p_z^g(z). \qquad (2)$$

3.2 ASSUMPTIONS ON DATA HETEROGENEITY

However, the i.i.d. assumption in Eq.(2) is largely violated when the local data distributions are significantly distinctive. In this event, learning a consensus by averaging the local gradients could cause severe performance degradation on certain clients (Li et al., 2019b; Cui et al., 2021). Research on federated learning with non-i.i.d. data mainly makes assumptions on data heterogeneity from two perspectives.

Clustered sharing. As shown in Figure 2 (a), the clients partitioned into each cluster own a common data manifold M_j and sampling density p_z^j(z), i.e.,

$$\forall\, i \in \{0, 1, \ldots, N-1\},\; \exists\, j \in \{0, 1, \ldots, K\}\; (K < N), \;\text{s.t.}\; z \in D_i \subset \mathcal{M}_j \;\text{and}\; z \sim p_z^j(z). \qquad (3)$$

From Eq.(3), clustered sharing requires that all information is shared within the clusters, and there is no knowledge transfer across clusters.

Common partial sharing.
From Figure 2 (b), a common distribution overlap is shared across all clients, while each client owns specific knowledge that cannot be leveraged by others. Formally, each client is associated with a specific data manifold M_i, and the overlapped region of the manifolds is shared across all clients, i.e.,

$$\forall\, i \in \{0, 1, \ldots, N-1\}, \;\text{s.t.}\; z \in D_i \subset \mathcal{M}_i \;\text{and}\; z \sim p_z^i(z),$$
$$\forall\, i, j, k \in \{0, 1, \ldots, N-1\}\; (i \neq j \neq k), \;\text{s.t.}\; \mathcal{M}_i \cap \mathcal{M}_j \neq \emptyset \;\text{and}\; \mathcal{M}_i \cap \mathcal{M}_j \subset \mathcal{M}_k. \qquad (4)$$

Compared with the assumptions above, we study a more general scenario, fragmented sharing, where the shared distribution overlaps are scattered among the clients. Moreover, these overlaps are inconsistent across clients, as shown in Figure 2 (c).

Fragmented sharing. The local data z ∈ D_i are sampled from the local manifold M_i with a particular density p_z^i(z), and there exist overlaps among the data manifolds, i.e.,

$$z \in \mathcal{M}_i \subset \mathbb{R}^d, \; z \sim p_z^i(z), \qquad (5a)$$
$$\exists\, i, j \in \{0, 1, \ldots, N-1\}, \;\text{s.t.}\; \mathcal{M}_i \cap \mathcal{M}_j \neq \emptyset, \qquad (5b)$$

where d in Eq.(5a) is the dimension of z. Eq.(5b) implies the shared overlaps may not be consistent across all clients, e.g., $\cap_{i=0}^{N-1} \mathcal{M}_i = \emptyset$.
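In practice, the empirical-risk objective of Eq.(1) is optimized in FedAvg by sample-weighted aggregation of client updates. The following is a minimal NumPy sketch of that aggregation step under our own toy parameters, not the paper's implementation:

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Sample-weighted average of client parameter vectors.

    Each client k contributes with weight n_k / sum_k n_k, matching the
    weighting of local empirical risks in Eq. (1).
    """
    total = sum(client_sizes)
    agg = np.zeros_like(client_params[0], dtype=float)
    for w, n_k in zip(client_params, client_sizes):
        agg += (n_k / total) * np.asarray(w, dtype=float)
    return agg

# Two toy clients: the client holding more samples pulls the average toward it.
w_global = fedavg_aggregate(
    [np.array([0.0, 0.0]), np.array([1.0, 1.0])],  # hypothetical local models
    [1, 3],                                        # local sample counts n_k
)
# w_global == [0.75, 0.75]
```

Under fragmented sharing, this uniform-by-sample weighting is exactly what can degrade certain clients: samples from non-overlapping manifold regions receive the same weight as informative ones.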

4. METHODOLOGY

4.1 PRELIMINARY: NORMALIZING FLOW

Normalizing flow. The generative method NF achieves exact likelihood estimation through an invertible transformation from a known base distribution to a complex target distribution. Given a target dataset D = {z_0, z_1, ..., z_{n-1}}, z_i ∈ R^d, and a base variable e ∈ R^d with a known density p_e(e), classic NF methods learn a diffeomorphism g: e = g(z), which relates the target density p_z to the base density p_e via the change of variables:

$$p_z(z) = p_e(g(z))\, \left|\det J_g(z)\right|, \qquad (6)$$

where J_g(z) ∈ R^{d×d} denotes the Jacobian matrix of g evaluated at z. Since g is bijective, Eq.(6) is tractable and can be computed efficiently. By fitting the dataset D, the approximated distribution p'_z(z) is optimized through a pushforward operation. To enhance the expressiveness of g, one can compose several diffeomorphisms g = g_{n-1} ∘ ··· ∘ g_1 ∘ g_0 for a larger model capacity.

4.2 AN OVERVIEW OF PCFL

Learning an optimal personalized model f_i for the i-th client requires sufficient utilization of the overlaps shared with other clients. However, due to privacy concerns, one cannot identify these overlaps by directly accessing the raw data. We instead suggest leveraging the overlaps via the learned data manifold to prevent privacy leakage.

As shown in Figure 3, our proposed precision collaboration scheme contains two parts:

• for the shared overlaps in other clients, we precisely learn from the overlaps identified by the local data manifold M_i;
• for the remaining unshared region of M_i, we advance local models with synthetic data generated from M_i at the optimal sampling density p_z^i(z).
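The change of variables in Eq.(6) can be made concrete with a single fixed affine layer. This is an illustrative sketch of our own; real NF layers (e.g. learned coupling blocks) parameterize the transformation, and all names here are assumptions:

```python
import numpy as np

def affine_flow_logpdf(z, scale, shift):
    """Exact log-density under the diffeomorphism e = g(z) = scale * z + shift,
    with a standard-normal base density p_e, following Eq. (6):
        log p_z(z) = log p_e(g(z)) + log |det J_g(z)|.
    For an elementwise affine map the Jacobian is diagonal, so the
    log-determinant is just the sum of log |scale|."""
    d = z.shape[-1]
    e = scale * z + shift                                   # forward map g(z)
    log_pe = -0.5 * np.sum(e ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)
    log_det = np.sum(np.log(np.abs(scale)))                 # log |det J_g(z)|
    return log_pe + log_det

z = np.zeros((1, 2))
lp = affine_flow_logpdf(z, scale=np.array([2.0, 2.0]), shift=np.zeros(2))
# lp[0] == 2*log(2) - log(2*pi): contracting z by g concentrates its density.
```

Composing several such layers, as in g = g_{n-1} ∘ ··· ∘ g_0, simply sums the per-layer log-determinants.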

4.3. PRECISION COLLABORATION I: LEARNING FROM THE SHARED OVERLAPS

From Figure 1 (a), different clients could share different distribution overlaps, and these overlaps correspond to the overlapped regions of the local data manifolds. While the data manifold of a local client is mostly agnostic and can hardly be inferred from limited local data, we propose to learn the global data manifold with the data on all clients. In this way, all data are utilized and contribute to the manifold inference.
h l L o h K j 0 7 Q 9 c + 4 9 Z + b e a 4 c u j 4 V h v G a 0 q e m Z 2 b n s f G 5 h c W l 5 J b + 6 V o + D J H J Y z Q n c I G r a V s x c 7 r O a 4 M J l z T B i l m e 7 r G H 3 j 2 W 8 c c u i m A f + u R i E r O 1 Z P Z 9 3 u W M J o i 5 u r o a 7 5 q i Y b O 9 c 5 w t G y V B L n w R m C g p I V z X I v + A S H Q R w k M A D g w 9 B 2 I W F m J 4 W T B g I i W t j S F x E i K s 4 w w g 5 0 i a U x S j D I r Z P 3 x 7 t W i n r 0 1 5 6 x k r t 0 C k u v R E p d W y R J q C 8 i L A 8 T V f x R D l L 9 j f v o f K U d x v Q 3 0 6 9 P G I F b o j 9 S z f O / K 9 O 1 i L Q x a G q g V N N o W J k d U 7 q k q i u y J v r X 6 o S 5 B A S J 3 G H 4 h F h R y n H f d a V J P i R A M E Q T h A C 4 y e r p w Y C M h 7 g Q D 4 l J C X M c Z h i i R V l I W o w y X 2 E v 6 n t O u m 7 M R 7 Z V n p t U + n R L Q m 5 L S w i p p Y s p L C a v T L B 2 X 2 l m x v 3 k P t K e 6 W 5 / + X u 4 V E i t w Q e x f u l H m f 3 W q F o E z b O s a O N W U a E Z V 5 + c u U n d F 3 d z 6 V J U g h 4 Q 4 h U 8 p n h L 2 t X L U Z 0 t r M l 2 7 6 q 2 r 4 6 8 6 U 7 F q 7 + e 5 E m / q l j R g 5 / s 4 f 4 L 2 R s 2 p 1 z b 3 6 5 X G T j 7 q I p a x g j W a 5 x Y a 2 E M T L f K + x g M e 8 W Q c G T f G r X H 3 k W o U c s 0 S v i z j / h 1 q n J W 5 < / l a t e x i t > p u 0 (u 0 ) < l a t e x i t s h a 1 _ b a s e 6 4 = " / F 8 I D z k G Z p 9 Q o r u B 4 C V I T k x S u K w = " > A A A C z X i c j V H L T s J A F D 3 U F + I L d e m m k R h w Q 1 q D 0 S X R j T s x k U d E Q t o y Y E N f a W d M C O L W H 3 C r v 2 X 8 A / 0 L 7 4 w l U Y n R a d q e O f e e M 3 P v t S P P T b h h v G a 0 u f m F x a X s c m 5 l d W 1 9 I 7 + 5 1 U h C E T u s 7 o R e G L d s K 2 G e G 7 A 6 d 7 n H W l H M L N / 2 W N M e n s p 4 8 5 b F i R s G l 3 w U s Y 5 v D Q K 3 7 z o W J + o q 6 o 5 F c V I S x f 1 u v m C U D b X 0 W W C m o I B 0 1 c L 8 C 6 7 R Q w g H A j 4 Y A n D C H i w k 9 L R h w k B E X A d j 4 m J C r o o z T J A j r a A s R h k W s U P 6 D m j X T t m A 9 t I z U W q H T v H 
o j U m p Y 4 8 0 I e X F h O V p u o o L 5 S z Z 3 7 z H y l P e b U R / O / X y i e W 4 I f Y v 3 T T z v z p Z C 0 c f x 6 o G l 2 q K F C O r c 1 I X o b o i b 6 5 / q Y q T Q 0 S c x D 2 K x 4 Q d p Z z 2 W V e a R N U u e 2 u p + J v K l K z c O 2 m u w L u 8 J Q 3 Y / D n O W d A 4 K J u V 8 u F F p V A 9 S U e d x Q 5 2 U a J 5 H q G K M 9 R Q J + 8 A j 3 j C s 3 a u C e 1 O u / 9 M 1 T K p Z h v f l v b w A Z R 3 k r c = < / l a t e x i t >

Split

Figure 4: Illustration of manifold learning via a NF method. For a complex distribution p_z(z), we learn a tractable injective chart g ∘ h, which maps p_z(z) to a simple distribution p_{u'}(u').

Learn the global manifold. The data z ∈ D are usually supported on an unknown lower-dimensional manifold M. In our realization, we propose to use a NF method to learn the global manifold M_g. Given the data z, a bijective transformation g_θ produces the latent representation e ∈ E:

e = g_θ(z), where z = g_θ^{-1}(e),  (7)

which avoids the risk of information loss during encoding. While a classical NF requires the latent variable e ∈ E to have the same dimension as the data space Z, following the design in (Brehmer & Cranmer, 2020), we separate the latent space E = U × V as shown in Figure 4, where U = R^{d'} denotes the coordinates on the manifold and V = 0^{d-d'} denotes the remaining coordinates, i.e., the directions orthogonal to the manifold. To model the density p_u(u), we transform the variable u to a variable u' with a given density p_{u'}(u') using a bijective model h_ϕ:

u' = h_ϕ(u), where u = Split(e),  (8)

where Split(e) denotes deleting the (d − d')-dimensional zero vector from e, and Pad(u) denotes the inverse operation. Please note that in the rest of this paper we will use g*_θ to denote Split ∘ g_θ, and g*_θ^{-1} to denote g_θ^{-1} ∘ Pad. After training the model parameters θ and ϕ, we obtain a diffeomorphism from the data z ∼ p_z(z) to a lower-dimensional space u' ∼ p_{u'}(u') via the encoder h_ϕ ∘ g*_θ. This means that we transform the original data manifold M_g to the projected data manifold U'. Note that the decoder is the inverse of the encoder: given the latent variable u' ∈ U', the data is reconstructed as z = g*_θ^{-1} ∘ h_ϕ^{-1}(u'). Following the work in (Brehmer & Cranmer, 2020), we train g_θ and h_ϕ with a two-stage optimization framework.
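The Split/Pad bookkeeping behind g*_θ can be illustrated with a toy stand-in in which the bijection g_θ is an invertible linear map. This is a sketch only: all names here are illustrative, and a real implementation would use a trained normalizing flow as in Brehmer & Cranmer (2020).

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime = 4, 2

# Hypothetical stand-in for the bijection g_theta: an invertible linear map.
A = rng.normal(size=(d, d))
assert abs(np.linalg.det(A)) > 1e-6  # invertible for this seed

def g(z):           # e = g_theta(z)
    return A @ z

def g_inv(e):       # z = g_theta^{-1}(e)
    return np.linalg.solve(A, e)

def split(e):       # drop the (d - d') trailing coordinates (the V part)
    return e[:d_prime]

def pad(u):         # inverse of Split: append a (d - d')-dimensional zero vector
    return np.concatenate([u, np.zeros(d - d_prime)])

def g_star(z):      # g*_theta = Split o g_theta
    return split(g(z))

def g_star_inv(u):  # g*_theta^{-1} = g_theta^{-1} o Pad
    return g_inv(pad(u))

# A point on the learned manifold has zero V-coordinates in latent space,
# so the encode-decode round trip reconstructs it exactly.
u = rng.normal(size=d_prime)
z_on_manifold = g_star_inv(u)
print(np.allclose(g_star(z_on_manifold), u))
print(np.allclose(g_star_inv(g_star(z_on_manifold)), z_on_manifold))
```

Off-manifold points would instead be projected onto the manifold by the same round trip, which is exactly what the reconstruction loss of the first training stage penalizes.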
In particular, we first train g_θ to obtain the projection onto the manifold by minimizing the reconstruction error. Then, we optimize h_ϕ to approximate the density by maximizing the likelihood (Brehmer & Cranmer, 2020). More implementation details can be found in the Appendix due to the page limit.

Determine the local manifold. A local data manifold M^i should contain the local data D_i. Since the original global manifold M_g and the latent space U' (U' = R^{d'}) are topologically equivalent, we propose to approximate the local manifold with the projected representation:

M^i = g*_θ^{-1} ∘ h_ϕ^{-1}(U'^i), where U'^i = ConvexHull( {h_ϕ ∘ g*_θ(x^i_j)}_{j=1}^{n_i} ),  (9)

where U'^i, called the projected local data manifold, is the convex hull of the samples transformed from D_i to U'. Note that the projected samples may have a clustered structure. In that case, we can first cluster them; the union of the convex hulls of all clusters is then the projected local data manifold, and the original local data manifold M^i is obtained as M^i = g*_θ^{-1} ∘ h_ϕ^{-1}(U'^i).

Identify the data overlaps from other clients. Since we cannot determine the data overlaps directly because of privacy concerns, we propose to identify the overlaps using the learned local manifolds. Note that the data overlaps correspond to the overlaps of the data manifolds. For example, suppose D^{i,j} is a subset of D_j; if D^{i,j} lies on M^i, then D^{i,j} is the data overlap between D_i and D_j. From Eq.(9), M^i is reconstructed by g*_θ^{-1} ∘ h_ϕ^{-1} from U'^i. Therefore, D^{i,j} can be identified as

D^{i,j} = { z^j_k | h_ϕ ∘ g*_θ(z^j_k) ∈ U'^i, k = 1, ..., n_j } ⊂ M^i.  (10)

From Eq.(10), the data overlap D^{i,j} is the subset of D_j in which each sample is transformed into the projected local data manifold U'^i.
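A minimal 2-D sketch of the overlap identification in Eqs.(9)-(10), assuming the projections h_ϕ ∘ g*_θ(·) have already been computed; the point clouds below are synthetic placeholders, not client data.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)

# Assume these are projected representations u' = h_phi(g*_theta(x)) in U' (2-D here).
U_i = rng.uniform(0.0, 1.0, size=(200, 2))   # client i's projections
U_j = rng.uniform(0.5, 1.5, size=(200, 2))   # client j's projections

# Eq.(9): the projected local manifold of client i is the convex hull of its
# projections; a Delaunay triangulation supports fast membership tests.
hull_i = Delaunay(U_i)

# Eq.(10): client j's samples whose projections fall inside client i's hull
# form the identified overlap D^{i,j}.
inside = hull_i.find_simplex(U_j) >= 0
overlap_ij = U_j[inside]

print(overlap_ij.shape[0], "of", U_j.shape[0],
      "samples of client j lie on client i's projected manifold")
```

In the full method only the hull borders (not raw samples) would be exchanged, which is what keeps the identification privacy-preserving.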
By learning from the overlaps identified from other clients, we obtain the following objective:

min_{f_i ∈ F} (1/n_i) Σ_{j=1}^{n_i} ℓ(f_i(x^i_j), y^i_j) + α · (1/(N−1)) Σ_{k=0, k≠i}^{N−1} E_{(x^k, y^k) ∈ D^{i,k}} [ℓ(f_i(x^k), y^k)],  (11)

where α > 0 is a regularization parameter, which controls the trade-off between the risk on the i-th client and that on the other clients.

4.4. PRECISION COLLABORATION II: LEARNING WITH AN OPTIMAL SAMPLING DENSITY

In Sec. 4.3, we learn personalized models from the data overlaps between clients. However, the model performance on the unshared data cannot be improved by collaborating with others. The specific region M^i_s that has no overlap with other clients is formulated as

M^i_s = g*_θ^{-1} ∘ h_ϕ^{-1}(U'^i_s), where U'^i_s = U'^i − ∪_{j=0, j≠i}^{N−1} (U'^j ∩ U'^i).  (12)

We propose to advance the model by generating data sampled from the local manifold M^i. An arbitrary sampling density could generate data D'_i that deviates from the local distribution, i.e., d(D'_i, D_i) > ϵ, which would bias the learned model. An optimal utilization of the synthetic data requires a sampling density close to p^i_z(z). Therefore, we propose to sample from M^i with an exact estimation of p^i_z(z).

Exact likelihood estimation. Note that we learn the manifold with a normalizing flow framework, which provides exact likelihood estimation simultaneously. Since we learn the global data manifold, the global data density p^g_z(z) is transformed to p_{u'}(u'). For the local data density p^i_z(z), we have the following proposition.

Proposition 1. (proof in Appendix) For any data point z ∈ M^i_s, the local density p^i_z(z) satisfies

p^i_z(z) = c · p_{u'}(h_ϕ ∘ g*_θ(z)) · |det J_{h_ϕ}(h_ϕ ∘ g*_θ(z))|^{-1} · |det( J^⊤_{g*_θ}(g*_θ(z)) J_{g*_θ}(g*_θ(z)) )|^{-1/2},  (13)

where c is a proportionality constant, and J_{h_ϕ} and J_{g*_θ} are the Jacobian matrices of h_ϕ and g*_θ, respectively.

From Proposition 1, to sample z ∈ M^i_s with density p^i_z(z), we first sample u' ∼ p_{u'}(u') and keep u' ∈ U'^i_s as defined in Eq.(12). Then we transform the sampled u' to the data space by z = g*_θ^{-1} ∘ h_ϕ^{-1}(u'). The final objective is

min_{f_i ∈ F} (1/n_i) Σ_{j=1}^{n_i} ℓ(f_i(x^i_j), y^i_j) + α · (1/(N−1)) Σ_{k=0, k≠i}^{N−1} E_{(x^k, y^k) ∈ D^{i,k}} [ℓ(f_i(x^k), y^k)] + β · E_{(x,y) ∼ p^i_z(z)} [ℓ(f_i(x), y)],  (14)

where the sampled (x, y) in the third term satisfies (x, y) ∈ M^i_s.
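The change-of-variables structure behind Proposition 1 can be sanity-checked numerically in a linear toy case. As simplifying assumptions, we take h_ϕ to be the identity (so its Jacobian factor is 1), drop the subregion restriction (so c = 1), and make the decoder g*_θ^{-1} a square invertible matrix J, so the injective factor |det(J^⊤J)|^{-1/2} reduces to |det J|^{-1} and must agree with the exact Gaussian push-forward density. A sketch, not the paper's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
d = 3
J = rng.normal(size=(d, d))   # Jacobian of the (linear) toy decoder g*^{-1}

u_prime = rng.normal(size=d)  # base sample, p_{u'} = N(0, I)
z = J @ u_prime               # decoded point (h_phi taken as the identity)

def log_gauss(x):
    return -0.5 * (x @ x) - 0.5 * len(x) * np.log(2 * np.pi)

# Change-of-variables factor |det(J^T J)|^{-1/2}, which equals |det J|^{-1} here.
log_p = log_gauss(u_prime) - 0.5 * np.linalg.slogdet(J.T @ J)[1]

# Cross-check: z = J u' with u' ~ N(0, I) is exactly N(0, J J^T).
log_p_ref = multivariate_normal(mean=np.zeros(d), cov=J @ J.T).logpdf(z)
print(np.isclose(log_p, log_p_ref))
```

In the genuinely injective case (d' < d) the same factor |det(J^⊤J)|^{-1/2} gives the density with respect to the manifold's surface measure, which is what the sampling step above relies on.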

4.5. MORE DISCUSSION ON PCFL

From the objective formulated in Eq.(14), PCFL advances a generative framework for efficient collaborative learning. In addition to the properties summarized below, more discussions can be found in the Appendix.

• Scalability: since PCFL achieves a smarter utilization of data from other clients, it is pluggable into other FL algorithms. Experiments in Sec. 5 also verify that PCFL can benefit other SOTA baselines;

• Similarity metric: PCFL identifies the overlaps in the federated network, which inspires a novel metric for measuring the similarity between clients. More information about it can be found in the Appendix.

5. EXPERIMENTS

We first illustrate the motivation of our method with experiments on synthetic data. We then compare our method with various baselines on a wide range of benchmark datasets, including image and tabular datasets. More importantly, the practicability of our method is validated in a real-world clinical federated scenario on the eICU dataset (Pollard et al., 2018). The source code is made publicly available at https://github.com/pcfl/pcfl.

5.1. SYNTHETIC EXPERIMENTS

Synthetic data. Suppose there are 96 clients D_i, i ∈ {1, 2, ..., 96}. Each data point z = (x, y) is generated from one of two objectives, y = sin(x) + ϵ or y = −sin(x) + ϵ, shown in Figure 5 (a), where ϵ ∼ N(0, 0.1) denotes label noise. Fragmented data overlaps. To generate heterogeneous and overlapped local data, we sample x from overlapping ranges. In particular, we separate the input space X into four intervals [0, π/2], [π/2, π], [π, 3π/2] and [3π/2, 2π], and each client randomly chooses two different intervals to sample data. To create conflicting learning tasks, the labels of 48 selected clients are computed by y = sin(x) + ϵ, and the labels of the remaining 48 clients by y = −sin(x) + ϵ. In this setting, learning a single global model for all clients hurts performance, since there are two conflicting learning tasks as shown in Figure 5 (c). The best collaboration strategy for each client is to identify the data overlaps sampled from the identical objective over the same intervals. For example, D_0 consists of data sampled from [0, π/2] and [π/2, π], while D_1 consists of data sampled from [π/2, π] and [π, 3π/2], as shown in Figure 5 (a). Learning an optimal model for D_0 requires precisely identifying the data overlap sampled from [π/2, π] in D_1. From Figure 5 (e), PCFL efficiently obtains local data manifolds and identifies the data overlaps between clients. Therefore, PCFL learns a better model by precision collaboration, which maximizes the benefits and avoids potential negative transfer from other clients, as shown in Figure 5 (d).
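The synthetic construction above can be reproduced in a few lines. The per-interval sample count is an assumption not specified in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLIENTS, N_PER_INTERVAL = 96, 50   # samples per interval: illustrative choice

# The input space is split into four intervals; each client samples from two of them.
intervals = [(0, np.pi / 2), (np.pi / 2, np.pi),
             (np.pi, 3 * np.pi / 2), (3 * np.pi / 2, 2 * np.pi)]

clients = []
for i in range(N_CLIENTS):
    sign = 1.0 if i < N_CLIENTS // 2 else -1.0      # y = ±sin(x) + eps
    picks = rng.choice(4, size=2, replace=False)    # two distinct intervals
    x = np.concatenate([rng.uniform(*intervals[p], N_PER_INTERVAL) for p in picks])
    y = sign * np.sin(x) + rng.normal(0, 0.1, x.shape)  # label noise N(0, 0.1)
    clients.append((x, y))

print(len(clients), clients[0][0].shape)
```

Two clients then overlap exactly when they share an interval and the same sign of the objective, which is the fragmented-overlap structure the method is designed to exploit.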


Datasets. We adopt three benchmark image datasets: CIFAR10 (Krizhevsky et al., 2009), FEMNIST (Caldas et al., 2018), CelebA (Liu et al., 2015), and a tabular dataset, Adult (Kohavi et al., 1996). We create the federated environment with data heterogeneity for CIFAR10 by randomly allocating several classes to each client following the work (McMahan et al., 2017). We use K to denote the number of clients and S to denote the number of classes in each client. For CIFAR10, K = 150, S = 5 means there are 150 clients and each client contains 5 classes of images. For FEMNIST, which has 10 classes of handwritten letters, we consider the setting of K = 200, S = 5. The number of samples in each client is determined according to a log-normal distribution (Li et al., 2019a). The task on CelebA is to classify whether the celebrity in the image is smiling (Li et al., 2021b); there are 545 clients with 21 samples per client on average. For the tabular dataset Adult, the task is to predict whether an individual's income is above 50K/year based on census features, including age, race, workclass, etc. Following the setting in (Mohri et al., 2019), all individuals are split into two clients: a PhD client and a non-PhD client. Baselines. We compare our method with various baselines, including global and personalized methods. Global baselines include: 1) FedAvg (McMahan et al., 2017); 2) FedProx (Li et al., 2020). Personalized baselines include: 1) Fed-MTL (Smith et al., 2017); 2) PerFedAvg (Fallah et al., 2020); 3) LG-FedAvg (Liang et al., 2020); 4) FedPer (Arivazhagan et al., 2019); 5) FedRep (Collins et al., 2021); 6) APFL (Deng et al., 2020); 7) L2GD (Hanzely & Richtárik, 2020); 8) Ditto (Li et al., 2021b).
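A simplified sketch of the class-based heterogeneous partition (K clients, S classes each). Unlike the paper's setup, client sizes here are not drawn from a log-normal distribution, and the labels are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def partition_by_class(labels, n_clients, classes_per_client):
    """Each client is randomly assigned a fixed set of classes and receives an
    (approximately equal) share of each assigned class's samples."""
    all_classes = np.unique(labels)
    client_classes = [rng.choice(all_classes, classes_per_client, replace=False)
                      for _ in range(n_clients)]
    class_members = {c: rng.permutation(np.flatnonzero(labels == c))
                     for c in all_classes}
    client_idx = [[] for _ in range(n_clients)]
    # hand out each class's samples round-robin among the clients holding it
    for c in all_classes:
        holders = [i for i in range(n_clients) if c in client_classes[i]]
        if not holders:
            continue
        for j, sample in enumerate(class_members[c]):
            client_idx[holders[j % len(holders)]].append(int(sample))
    return client_idx, client_classes

labels = rng.integers(0, 10, size=5000)   # stand-in for CIFAR10 labels
idx, cls = partition_by_class(labels, n_clients=10, classes_per_client=5)
print([len(i) for i in idx])
```

With K = 150 and S = 5 this reproduces the kind of label-skew heterogeneity used for CIFAR10, each client seeing only its assigned subset of classes.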

5.2. BENCHMARK EXPERIMENTS

Experimental Results. The accuracy of all methods on the CIFAR10 dataset is shown in Table 1. PCFL outperforms all baselines on this classification task. Since each client has insufficient data samples (n_i = 333), FedAvg (72.3%), which learns from all data, performs better than local training (68.9%). From Table 1, FedRep (82.2%) surpasses the other baselines by learning a global feature extractor. As a pluggable method, PCFL can be used to enhance the performance of other state-of-the-art methods. From Figure 6, PCFL improves the performance of FedRep by 5.1%, which indicates that PCFL effectively identifies informative knowledge from other clients. Similar phenomena can be observed in the experimental results on FEMNIST, where FedAvg achieves better performance than the other baselines because of the relatively milder heterogeneity. PCFL also outperforms all baselines on both learning tasks. Moreover, PCFL successfully boosts the three baselines as shown in Figure 7. More experimental results can be found in the Appendix. From Table 3, all methods achieve similar performance on the non-PhD client. Because of the severe distribution discrepancy, naive averaging may not lead to optimal accuracy on the non-PhD client. From Figure 8, by leveraging the favorable data in the non-PhD client and the learned manifold of the PhD client, PCFL substantially improves over local training (↑ 7.1%).

5.3. REAL DATA EXPERIMENTS

To further verify the practicability of PCFL, we conduct experiments on the real-world clinical dataset eICU (Pollard et al., 2018). eICU contains records of patients admitted to ICUs, together with hospital information. Naturally, hospitals located in different areas serve as local clients as in (Cui et al., 2022), where the patient data are kept confidential. We preprocess the data following the work (Sheikhalishahi et al., 2019), and each data point spans a 1-hour window. The task is to predict the in-hospital mortality of each instance using the 48-hour monitoring data. In the experiments, we randomly select 14 hospitals to form the federated network. We use a Bi-LSTM to implement this binary classification, and report AUC as the metric due to the severe label imbalance (more than 90% of the samples have negative labels). From Table 4, PCFL still maintains the best model utility. FedAvg achieves performance comparable with local training. While different hospitals serve different populations, which results in data heterogeneity, Ditto learns a robust personalized model and achieves better performance (78.3%). The results shown in Figure 9 demonstrate that PCFL can also benefit the baselines in real-world scenarios.

5.4. ABLATION STUDIES

PCFL as formulated in Eq.(14) consists of three terms: 1) the loss of local training; 2) the loss on the identified overlapped data from other clients; and 3) the loss on data sampled from the manifold. To analyze the effect of each component, we conduct ablation studies on several datasets; more ablation studies and implementation details can be found in the Appendix. When α = 0, only the local data and the data sampled from the local manifold are used for model learning. When β = 0, only the local data and the identified overlapped data from other clients are utilized. The experiments in Table 5 demonstrate that 1) both the identified distributional overlaps (β = 0) and the data sampled with the learned density (α = 0) facilitate model learning; and 2) the identified overlaps (β = 0) yield a larger performance gain than the generated data (α = 0).
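The ablation settings correspond to zeroing one coefficient in the three-term objective of Eq.(14). As a schematic only (the loss values below are placeholders, not measured results):

```python
def pcfl_objective(local_loss, overlap_loss, synthetic_loss, alpha=0.5, beta=0.5):
    """Eq.(14) as a scalar combination; alpha=0 or beta=0 recovers the ablations."""
    return local_loss + alpha * overlap_loss + beta * synthetic_loss

# placeholder per-term losses for illustration
full       = pcfl_objective(0.40, 0.20, 0.30)             # all three terms
no_overlap = pcfl_objective(0.40, 0.20, 0.30, alpha=0.0)  # "alpha = 0" ablation
no_synth   = pcfl_objective(0.40, 0.20, 0.30, beta=0.0)   # "beta = 0" ablation
print(full, no_overlap, no_synth)
```

In the actual training loop each term is an empirical average over the corresponding data source (local batches, identified overlaps, and manifold samples), not a fixed scalar.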

6. CONCLUSION

In this paper, we propose a precise collaboration framework, PCFL, for a more general FL scenario in which shared knowledge is fragmented and scattered among clients. Experiments on benchmark datasets and a real-world clinical dataset verify the superiority of our method, owing to its optimal and precise utilization of the shared information. Our framework determines the overlaps between clients, which suggests several attractive topics, such as identifying malicious clients or noisy data in the federated network. Moreover, PCFL encourages a novel similarity metric stated in Sec. C.1. This metric could be used to provide incentives or impose charges on each client, promoting the practicality of FL in real-world applications.

A PROOF OF PROPOSITION 1

Since we learn a global manifold M_g with the data from all clients, the density of the data from all clients is approximated as

p^g_z(z) = p_{u'}(h_ϕ ∘ g*_θ(z)) · |det J_{h_ϕ}(h_ϕ ∘ g*_θ(z))|^{-1} · |det( J^⊤_{g*_θ}(g*_θ(z)) J_{g*_θ}(g*_θ(z)) )|^{-1/2}.  (17)

From the definition of M^i_s in Eq.(12), if a data point z ∈ M^i_s, then z cannot be sampled from any other manifold M^j (j ≠ i) but M^i, i.e., ∀z ∈ M_g, z ∉ M^j (j ≠ i) if z ∈ M^i_s. Therefore, we have

p^g_z(z, z ∈ M^i_s | z ∈ M^i) = p^g_z(z, z ∈ M^i_s, z ∈ M^i) / p^g_z(z ∈ M^i) = p^g_z(z, z ∈ M^i_s) / p^g_z(z ∈ M^i) = (1 / p^g_z(z ∈ M^i)) · p^i_z(z, z ∈ M^i_s).

Combining this with Eq.(17), for z ∈ M^i_s we have

p^i_z(z) = c · p_{u'}(h_ϕ ∘ g*_θ(z)) · |det J_{h_ϕ}(h_ϕ ∘ g*_θ(z))|^{-1} · |det( J^⊤_{g*_θ}(g*_θ(z)) J_{g*_θ}(g*_θ(z)) )|^{-1/2},

and Proposition 1 holds.

B PIPELINE OF OUR FRAMEWORK PCFL

The pipeline of learning the global manifold M_g is elaborated in Algorithm 1. We learn a global manifold model in the federated learning setting in two phases of training: first, only the parameters of g_θ are updated, as in Lines 5-7; then the parameters of h_ϕ are updated, as in Lines 9-10. The learned manifold model h_ϕ ∘ g*_θ is utilized in our framework PCFL, whose pipeline is elaborated in Algorithm 2. To begin with, the local manifolds of the clients are extracted based on Eq.(9) and the distribution overlaps are calculated based on Eq.(10). Since only the borders of the convex hulls are exchanged, there is no leakage of sensitive information. The data from the overlapped distributions of other clients are used to train the models; they are utilized by transmitting the average gradients through the server, as in Lines 7-14.

Algorithm 1 Learn the global manifold in the federated learning framework

Input: epochs T_m, batch size B_m, initial manifold model M_g with parameters θ and ϕ.
1: for t = 0, ..., T_m − 1 do
2:   randomly select a subset of clients S_t
3:   for client D_i ∈ S_t in parallel do
4:     draw a mini-batch z^i_{t_1}, ..., z^i_{t_{B_m}} ∼ D_i
5:     if t < T_m / 2 then
6:       calculate the reconstruction loss (1/B_m) Σ_{b=1}^{B_m} ‖ z^i_b − g_θ^{-1}(g_θ(z^i_b)) ‖;
7:       calculate the gradients of the loss with respect to the parameters θ;
8:     else
9:       calculate the likelihood loss −(1/B_m) Σ_{b=1}^{B_m} [ log p_{u'}(h_ϕ ∘ g*_θ(z^i_b)) − log |det J_{h_ϕ}(h_ϕ ∘ g*_θ(z^i_b))| ];
10:      calculate the gradients of the loss with respect to the parameters ϕ;
11:    end if
12:  end for
13:  the Server aggregates the gradients of the selected clients and updates the parameters θ and ϕ.
14: end for
15: Output: the learned manifold models g_θ and h_ϕ.

Algorithm 2 Federated learning framework PCFL

Input: epochs T, batch size B, initial models f_0, ..., f_{N−1}, hyperparameters α and β;
1: all clients determine the local manifold M^i and U'^i based on Eq.(9), and send U'^i to the Server;
2: the Server calculates the overlaps of U'^i between clients, calculates U'^i_s based on Eq.(12), and sends them to each client;
3: for t = 0, ..., T − 1 do
4:   randomly select a subset of clients S_t;
5:   the selected clients send their local models to the Server;
6:   for client D_i ∈ S_t in parallel do
7:     draw a mini-batch (x^i, y^i) ∼ D_i;
8:     calculate the loss E_{(x^i, y^i) ∈ D_i}[ℓ(f_i(x^i), y^i)] + β · E_{(x,y) ∼ p^i_z(z)}[ℓ(f_i(x), y)], and update the model f_i using the gradients of the loss;
9:     for k = 0, ..., N − 1, k ≠ i do
10:      draw a mini-batch (x^k, y^k) ∼ D^{i,k};
11:      calculate the loss α · E_{(x^k, y^k) ∈ D^{i,k}}[ℓ(f_i(x^k), y^k)], and update the model f_i using the gradients of the loss;
12:    end for
13:    the Server aggregates the parameters of f_i from the other clients and sends the average to the i-th client;
14:    the i-th client D_i updates the model f_i with the received parameters and local gradients;
15:  end for
16: end for
17: Output: the learned personalized models f_0, ..., f_{N−1}.
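The two-phase schedule of Algorithm 1 (reconstruction updates for θ while t < T_m/2, likelihood updates for ϕ afterwards) can be sketched as a skeleton loop. The model updates themselves are omitted, and the client and epoch counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

T_m, n_clients, subset = 8, 6, 3   # illustrative counts, not the paper's settings
phase_log = []

for t in range(T_m):
    selected = rng.choice(n_clients, size=subset, replace=False)
    for i in selected:
        if t < T_m // 2:
            # phase 1: minimize reconstruction error, gradients w.r.t. theta
            phase_log.append(("theta", t, int(i)))
        else:
            # phase 2: maximize projected likelihood, gradients w.r.t. phi
            phase_log.append(("phi", t, int(i)))
    # the server would aggregate the selected clients' gradients here

print(sum(p == "theta" for p, _, _ in phase_log),
      sum(p == "phi" for p, _, _ in phase_log))
```

The point of the split schedule is that the manifold (θ) must be fixed before the on-manifold density (ϕ) can be estimated meaningfully.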

C MORE DISCUSSIONS ABOUT PCFL

C.1 A NEW METRIC OF CLIENT SIMILARITY

Our framework PCFL inspires a novel metric for measuring the similarity between local clients. For example, if the i-th and j-th clients have identical local manifolds, M^i = M^j, the similarity between them should be close to 1. On the contrary, if the two local manifolds are disjoint, M^i ∩ M^j = ∅, the measured similarity should be 0. In particular, we propose to measure the similarity as the Intersection over Union (IoU) of the projected local manifolds,

S(D_i, D_j) = IoU(U'^i, U'^j).

A communication-efficient client-level collaboration. Our proposed metric allows efficient collaborator identification, which reduces the communication and computation overhead. For example, we require D_i to collaborate only with the clients that have a sufficiently high similarity:

min_{f ∈ F} (1 / Σ_{k: S(D_i, D_k) ≥ ϵ} n_k) · Σ_{k=0, S(D_i, D_k) ≥ ϵ}^{N−1} Σ_{j=1}^{n_k} ℓ(f(x^k_j), y^k_j),  (22)

where ϵ ≥ 0 is a pre-defined threshold. Note that the objective in Eq.(22) differs from clustered FL methods: clustered FL learns a common model for each cluster, whereas Eq.(22) learns a personalized model for each client. Experimental results shown in Sec. E.2 verify that this method achieves comparable performance while reducing communication and computation overhead. Previous work has explored the problem of identifying similar datasets in a graph network for downstream learning tasks (Hallac et al., 2015). In particular, Jung (2020) formulates learning from distributed local datasets as a convex optimization problem and proposes to cluster the local datasets according to the learned parameters. Jung & Tran (2019) extend network lasso methods to regression tasks under a clustering assumption. These cluster-based methods could be applied to federated learning with a proper design for privacy preservation. In our experiments, we use network lasso to cluster the local datasets under the federated setting.
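The IoU similarity on projected manifolds can be estimated by Monte-Carlo sampling over the convex hulls. This is a sketch: `hull_iou` and the sample count are our own illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(3)

def hull_iou(P, Q, n_mc=200_000):
    """Monte-Carlo estimate of IoU between the convex hulls of point sets P and Q."""
    hp, hq = Delaunay(P), Delaunay(Q)
    lo = np.minimum(P.min(0), Q.min(0))
    hi = np.maximum(P.max(0), Q.max(0))
    pts = rng.uniform(lo, hi, size=(n_mc, P.shape[1]))  # uniform over joint bbox
    in_p = hp.find_simplex(pts) >= 0
    in_q = hq.find_simplex(pts) >= 0
    union = (in_p | in_q).sum()
    return (in_p & in_q).sum() / union if union else 0.0

square = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
shifted = square + np.array([0.5, 0.0])   # overlaps half of the unit square
print(hull_iou(square, square))           # identical manifolds -> IoU of 1
print(hull_iou(square, shifted))          # intersection 0.5, union 1.5 -> about 1/3
```

In practice only the hull vertices of each U'^i need to be exchanged, so the similarity can be computed server-side without sharing raw samples.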
In Table 6, we show the comparison of PCFL and the clustered methods. Our method outperforms all cluster-based methods, which demonstrates that a precise identification of the overlaps in other clients facilitates model learning. Moreover, an interesting direction is the application of our proposed similarity metric in graph networks; for example, the manifold learning of local datasets in a graph network may also be used for similarity measurement.

A classical NF method requires a fixed dimensionality of the latent space, identical to the dimension of the data. In this case, learning such a NF model can bring a huge computation overhead when the data are high-dimensional. In the first phase of our framework, we learn a low-dimensional manifold with a NF method, which significantly reduces the computation overhead. Our method learns from the local data, the data overlaps of other clients, and data sampled on the manifold; by precision collaboration, we avoid learning from all data. We compare the run-time consumption with the baselines. The experiments are conducted on the same device (NVIDIA GeForce RTX 2080 Ti), and the results on the eICU dataset are displayed in Table 7. As a pluggable method, the time consumption of PCFL is comparable to that of the corresponding baselines. Fed-MTL involves computing the correlation of the parameters among all client models, which results in more computation overhead.

Moreover, we conduct experiments on FEMNIST under more heterogeneous settings with more clients. We partition the dataset into 400 clients with the Dirichlet distributions Dir_400(0.1) and Dir_400(0.5), following the work in (Wang et al., 2019). We compare our method with the baselines, and the results are shown in Table 10. With more clients, each client has fewer training samples, and the local method shows poor performance (63.6% under Dir_400(0.5)). Global methods (FedAvg and FedProx) achieve better performance under the less heterogeneous setting (Dir_400(0.5)), while the performance of the personalized methods degrades. Under both settings (Dir_400(0.1) and Dir_400(0.5)), PCFL outperforms all baselines by identifying the informative overlaps for each client.

We also conduct experiments on the eICU dataset, in which we select the 7 most similar clients for each client to learn a personalized model. The experimental results are shown in Table 12. From Table 12, our method for identifying collaborators achieves comparable performance with the baselines while reducing computation and communication overhead by collaborating with a subset of the local clients. To explore the effect of ϵ on the performance of the learned models, we set ϵ by controlling the number of collaborators for each client. There are 14 clients in the eICU dataset, and we test the number of collaborators (C) at 1, 3, 5, etc. The results are shown in Table ??. When C = 7, the learned model achieves the highest AUC (78.0). When C > 7, the performance tends to remain unchanged.

E.3 IMPLEMENTATION DETAILS

Our method is implemented in PyTorch, and all experiments are run 5 times to report average results with standard deviations. We use a four-layer MLP for the synthetic experiment, a three-layer MLP for FEMNIST, two-layer CNNs for CIFAR10 and CelebA, and a one-layer MLP for Adult. Following the work (Collins et al., 2021), for all methods we sample 10% of the clients in every global epoch. We train the models for 200 global epochs on FEMNIST, CIFAR10 and CelebA, and for 50 on Adult. In every global epoch we run 15 local epochs for FEMNIST, CIFAR10 and Adult, and 25 for CelebA. All models are trained with stochastic gradient descent. We use grid search on the validation set of each dataset to find the optimal hyperparameters α and β: we set α = 0.5, β = 0.5 for CIFAR10, CelebA and Adult, and α = 1, β = 0.5 for FEMNIST and eICU. Besides, we test different manifold dimensions d' for each benchmark dataset, keeping d' as small as possible while ensuring reconstruction quality on the validation set. We set d' = 256 for CIFAR10 and CelebA, d' = 12 for FEMNIST, and d' = 32 for Adult and eICU. For the synthetic experiment, the data dimension is d = 3 and the manifold dimension is d' = 2, since one element of the data z identically equals 0. The source code is made publicly available at https://github.com/pcfl/pcfl.

E.4 DATASETS

In our experiments, CIFAR10, FEMNIST, CelebA and Adult are all public datasets. For the synthetic experiment, the data point z = (x, 0, y) has three elements; we add a zero element to the data so that the manifold dimension is smaller than the data dimension, which simulates the situation in real-world datasets. We create the federated environment with data heterogeneity for CIFAR10 and FEMNIST by randomly allocating several classes to each client following the work (McMahan et al., 2017). For the eICU dataset, we follow the procedure on the website https://eicu-crd.mit.edu and obtained approval for the dataset. We follow the data preprocessing in Sheikhalishahi et al. (2019) and randomly select 14 hospitals as introduced in the main text.



We show the best performance of PCFL in all tables. For all figures in experiments, we show the results of Local, FedAvg and the best baseline.




Figure 1: Overview of our proposed PCFL. (a) Fragmented distribution overlaps exist among clients; (b) learn the global data manifold and determine the local manifold for each client; (c) the data from other clients lie on the local manifold M i are identified as informative overlaps; (d) learn a precise local density for synthetic data generation.

Figure 3: Two meanings of Precision Collaboration. (1) For the shared region of the local manifold M^i, PCFL precisely learns from the overlapped data of other clients; (2) for the unshared region of M^i, PCFL generates synthetic data with the exact density p^i_z(z).

Learning an optimal personalized model f_i for the i-th client requires a sufficient utilization of the overlaps shared with other clients. However, due to privacy concerns, one cannot identify these overlaps by directly accessing the raw data. We suggest leveraging the overlaps via the learned data manifold to prevent privacy leakage. As shown in Figure 3, in general, our proposed precision collaborative learning scheme contains:


Figure 5: Illustrations of the synthetic experiments. (a) the learning tasks of the six clients; (b), (c) and (d) are the performance of the models learned by local training, FedAvg and PCFL; (e) the learned projected global data manifolds. The points denote the samples from different clients. The colored lines denote the identified local manifolds.

Figure 6: CIFAR10

Figure 7: FEMNIST

Figure 8: Adult

We also evaluate our proposed method on the tabular dataset Adult; the results on the two clients are shown in Table 3. Since the classifier on Adult is a one-layer MLP, LG-FedAvg, FedPer and FedRep degrade into local training. Compared with the PhD client, which has 413 training samples, the non-PhD client has more than 30000 training samples. From Table 3, all methods achieve similar performance on the non-PhD client. Because of the severe distribution discrepancy, naive averaging may not lead to optimal accuracy on the non-PhD client. From Figure 8, by leveraging the favorable data

send their local models to the Server;
6: for client D_i ∈ S_t in parallel do
7:
9: for k = 0, ..., N, k ≠ i do
10: draw a mini-batch (x_k, y_k) ∼ D_{i,k}
11: calculate the loss α · E_{(x_k, y_k) ∈ D_{i,k}} ℓ(f_i(x_k), y_k), and update the model f_i using the gradients of the loss;
the Server aggregates the parameters of f_i from the other clients and sends the average to the i-th client;
14: the i-th client D_i then updates the model f_i with the received parameters and local gradients.
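The α-weighted update in steps 10–11 can be sketched on a toy model. Everything here (the linear model, the data, the value of α) is illustrative rather than the paper's implementation; the sketch only shows a client descending its own loss plus a down-weighted loss on the overlap shards identified from other clients:

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_step(w, X, y, lr=0.1, weight=1.0):
    # one gradient step on the weighted loss  weight * ||Xw - y||^2 / n
    grad = weight * 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Client i's own data; labels come from a shared ground-truth rule.
true_w = np.array([1.0, -2.0, 0.5])
Xi = rng.normal(size=(100, 3))
yi = Xi @ true_w

# Overlap shards identified from two other clients (same underlying rule).
overlaps = []
for _ in range(2):
    Xk = rng.normal(size=(30, 3))
    overlaps.append((Xk, Xk @ true_w))

alpha = 0.5                      # down-weights losses on others' overlaps
w = np.zeros(3)
for _ in range(200):
    w = grad_step(w, Xi, yi)                 # step on the local loss
    for Xk, yk in overlaps:                  # alpha-weighted overlap losses
        w = grad_step(w, Xk, yk, weight=alpha)
```

Because the overlap shards share the local client's underlying rule, the extra α-weighted steps act like additional training data rather than noise.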

Figure 10: CelebA

COMPUTING RESOURCES Part of the experiments was conducted on a local server running Ubuntu 16.04. It has two physical CPUs, Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz, with 32 logical cores in total. The remaining experiments were conducted on a remote server with 8 GeForce RTX 2080 Ti GPUs.









Ablation studies of PCFL as formulated in Eq. (14).

PCFL is realized by a two-stage optimization framework. For the training of the normalizing flow in the first stage, PCFL learns a global model for all clients, which has the same computation complexity as FedAvg. The identification of the manifold overlaps has O(1) time complexity, as the server computes it only once. For the training of local models in the second stage, PCFL learns a personalized model for each client, which has the same computation complexity as other personalized methods. In summary, PCFL achieves a computation complexity similar to the baselines.
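For intuition, the density estimation performed by the first-stage normalizing flow rests on the change-of-variables formula. A one-dimensional affine flow is the smallest possible sketch; the actual flow is a deep invertible network, so this is purely illustrative:

```python
import numpy as np

def flow_log_density(x, mu, sigma):
    """Change of variables: z = (x - mu) / sigma with base z ~ N(0, 1),
    so log p(x) = log N(z; 0, 1) + log |dz/dx|, where |dz/dx| = 1 / sigma."""
    z = (x - mu) / sigma
    log_base = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)
    log_det = -np.log(sigma)
    return log_base + log_det

# Sanity check: the induced density is exactly N(mu, sigma^2) and
# integrates to (approximately) one on a fine grid.
xs = np.linspace(-10.0, 10.0, 2001)
p = np.exp(flow_log_density(xs, mu=1.0, sigma=2.0))
mass = p.sum() * (xs[1] - xs[0])   # Riemann sum, close to 1
```

The same formula, with the scalar Jacobian replaced by a log-determinant, gives the exact likelihood that PCFL uses for sampling in an optimal density.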

PCFL maintains data confidentiality as the baselines do, because no data is shared between local clients: our framework learns models by communicating model parameters only. Federated learning may still need further exploration to maintain data privacy, as some researchers claim there is information leakage when sharing models or gradients (Zhu et al., 2019). To alleviate this issue, research has proposed applying other techniques to FL methods, such as differential privacy (Wei et al., 2020) and secure multi-party computation. PCFL is also compatible with these techniques.

We test local training, FedAvg, FedRep and Ditto, each implemented with and without our method, as shown in Table 8 and Table 9. In the dataset Adult, all individuals are split into two clients, a PhD client and a non-PhD client. The non-PhD client contains 32148 training samples while the PhD client contains 413 samples; therefore the non-PhD client of Adult cannot benefit much from federated learning methods. On the other datasets, our method boosts the baselines by large margins.
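As a hedged sketch of that compatibility, a client could clip and noise its model update before upload, in the style of the Gaussian mechanism commonly used in DP federated learning. The clipping norm and noise multiplier below are illustrative values, not the paper's settings:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip a client's model update to bound its sensitivity, then add
    Gaussian noise calibrated to clip_norm (Gaussian mechanism sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(3)
update = rng.normal(size=10) * 5.0
private_update = dp_sanitize(update, rng=rng)
```

Since PCFL communicates only model parameters, such a sanitizer can wrap every upload without changing the rest of the training loop.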

Experiment results of PCFL implemented on CIFAR10, FEMNIST, and CelebA (%)

Experiment results of PCFL on eICU and Adult (%)

More experimental results on FEMNIST (Acc %), comparing methods under the Dir_400(0.1) and Dir_400(0.5) partitions.



Experimental results on eICU

