CENTRAL SERVER FREE FEDERATED LEARNING OVER SINGLE-SIDED TRUST SOCIAL NETWORKS

Abstract

Federated learning has become increasingly important for modern machine learning, especially in data-privacy-sensitive scenarios. Existing federated learning mostly adopts the central-server-based, or centralized, architecture. However, in many social network scenarios, centralized federated learning is not applicable (e.g., a central agent or server connecting all users may not exist, or the communication cost to the central server is unaffordable). In this paper, we consider a generic setting: 1) the central server may not exist, and 2) the social network is unidirectional, i.e., of single-sided trust (user A trusts user B, but user B may not trust user A). We propose a central-server-free federated learning algorithm, named the Online Push-Sum (OPS) method, to handle this challenging but generic scenario. A rigorous regret analysis is also provided, which shows how users can benefit from communication with trusted users in the federated learning scenario. This work establishes an algorithmic framework and theoretical guarantees for federated learning in the generic social network scenario.

1. INTRODUCTION

Federated learning has been well recognized as a framework able to protect data privacy (Konečný et al., 2016; Smith et al., 2017a; Yang et al., 2019). State-of-the-art federated learning adopts a centralized network architecture, in which a central node collects the gradients sent from child agents to update the global model. Despite its simplicity, the centralized method suffers from communication and computational bottlenecks at the central node, especially in federated learning, where a large number of clients are usually involved. Moreover, to prevent reverse engineering of users' identities, a certain amount of noise must be added to the gradients to protect user privacy, which partially sacrifices efficiency and accuracy (Shokri and Shmatikov, 2015).

To further protect data privacy and avoid the communication bottleneck, the decentralized architecture has recently been proposed (Vanhaesebrouck et al., 2017; Bellet et al., 2018), in which the central node is removed and each node communicates only with its neighbors (with mutual trust) by exchanging local models. Exchanging local models is usually preferred over sending private gradients for data privacy protection, because a local model aggregates a large amount of data, whereas a local gradient directly reflects only one or a batch of private data samples. Although the advantages of the decentralized architecture over its centralized counterpart have been well recognized, it can usually only be run on a network with mutual trust. That is, two nodes (or users) can exchange their local models only if they trust each other reciprocally (e.g., if node A trusts node B but node B does not trust node A, they cannot communicate). Given a social network, one can therefore only use the edges with mutual trust to run decentralized federated learning algorithms.
Two immediate drawbacks are: (1) if the mutual-trust edges do not form a connected network, federated learning is not applicable; (2) removing all single-sided edges from the communication network could significantly reduce communication efficiency. These drawbacks lead to the question: how do we effectively utilize the single-sided trust edges within a decentralized federated learning framework?

In this paper, we consider the social network scenario, where a centralized network is unavailable (e.g., there is no central node that can build up a connection with all users, or the cost of centralized communication is unaffordable). We make minimal assumptions on the social network: the data may come in a streaming fashion at each user node as the federated learning algorithm runs, and the trust between users may be single-sided, where user A trusts user B but user B may not trust user A ("trust" means "would like to send information to"). For the setting described above, we develop a decentralized learning algorithm called Online Push-Sum (OPS), which possesses the following features:

• Only models, rather than local gradients, are exchanged among clients in our algorithm. This scheme reduces the risk of exposing clients' data privacy (Aono et al., 2017).
• Our algorithm removes some constraints imposed by typical decentralized methods, which makes it more flexible in allowing arbitrary network topology. Each node only needs to know its out-neighbors instead of the global topology.
• We provide a rigorous regret analysis for the proposed algorithm and specifically distinguish two components in the online loss function: the adversarial component and the stochastic component, which model clients' private data and internal connections between clients, respectively.
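As a rough illustration of the communication pattern that OPS builds on (not the authors' implementation), the following sketch simulates the classical push-sum protocol for average consensus over a directed, strongly connected graph. Each node only needs to know its out-neighbors, so single-sided trust links suffice. All names and parameters here are illustrative.

```python
import numpy as np

# Hedged sketch (not the paper's algorithm itself): classical push-sum
# average consensus on a directed graph. Each node i keeps a value z[i]
# and a correction weight w[i]; the ratio z[i] / w[i] converges to the
# global average even though links are one-directional.

def push_sum(values, out_neighbors, rounds=200):
    n = len(values)
    z = np.array(values, dtype=float)  # running value estimates
    w = np.ones(n)                     # push-sum correction weights
    for _ in range(rounds):
        z_new, w_new = np.zeros(n), np.zeros(n)
        for i in range(n):
            # node i splits its mass evenly among itself and its out-neighbors
            share = 1.0 / (len(out_neighbors[i]) + 1)
            for j in out_neighbors[i] + [i]:
                z_new[j] += share * z[i]
                w_new[j] += share * w[i]
        z, w = z_new, w_new
    return z / w  # de-biased estimates; approach the global average

# a directed ring (single-sided links only), which is strongly connected
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 4.0], ring)  # each entry approx. 2.5
```

In an online variant such as OPS, the scalar values would be replaced by local models that also take a gradient step on freshly arriving data between communication rounds.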
Notation. We adopt the following notation in this paper:

• For random variables ξ_t^(i) subject to distributions D_t^(i), we use Ξ_{n,T} and D_{n,T} to denote the corresponding sets of random variables and distributions: Ξ_{n,T} = {ξ_t^(i)}_{1≤i≤n, 1≤t≤T} and D_{n,T} = {D_t^(i)}_{1≤i≤n, 1≤t≤T}. The notation Ξ_{n,T} ∼ D_{n,T} means that ξ_t^(i) ∼ D_t^(i) for every i ∈ [n] and t ∈ [T].
• For a decentralized network with n nodes, we use W ∈ R^{n×n} to denote the confusion matrix, where W_ij ≥ 0 is the weight that node i sends to node j (i, j ∈ [n]). We write N_i^out = {j ∈ [n] : W_ij > 0} and N_i^in = {k ∈ [n] : W_ki > 0} for the sets of out-neighbors and in-neighbors of node i, respectively.
• The norm ‖·‖ denotes the ℓ2 norm ‖·‖_2 by default.
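The neighbor-set notation above can be made concrete with a small sketch: given a nonnegative weight matrix W (where W[i, j] > 0 means node i is willing to send to node j), we recover each node's out- and in-neighbor sets. The matrix and function name are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch of the notation: N_out[i] = {j : W[i, j] > 0} and
# N_in[i] = {k : W[k, i] > 0}, read off the rows and columns of W.

def neighbor_sets(W):
    n = W.shape[0]
    N_out = [set(np.nonzero(W[i, :])[0]) for i in range(n)]  # row i
    N_in = [set(np.nonzero(W[:, i])[0]) for i in range(n)]   # column i
    return N_out, N_in

# example confusion matrix for 3 nodes with single-sided links
W = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
N_out, N_in = neighbor_sets(W)
# node 0 sends to itself and node 1, but only receives from itself and node 2
```

Note that with single-sided trust, N_out[i] and N_in[i] generally differ; OPS only requires each node to know its own N_out.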

2. RELATED WORK

The concept of federated learning was first proposed in McMahan et al. (2016), which advocates a novel learning setting that learns a shared model by aggregating locally computed gradient updates without centralizing distributed data on devices. Early examples of research into federated learning also include Konečný et al. (2015; 2016) and a widespread blog article posted by Google AI (McMahan and Ramage, 2017).

Figure 1: Different types of architectures.

To address both statistical and system challenges, Smith et al. (2017b) and Caldas et al. (2018) propose a multi-task learning framework for federated learning and a related optimization algorithm, which extends earlier work on SDCA (Shalev-Shwartz and Zhang, 2013; Yang, 2013; Yang et al., 2013) and COCOA (Jaggi et al., 2014; Ma et al., 2015; Smith et al., 2016) to the federated learning setting. Among these optimization methods, Federated Averaging (FedAvg), proposed by McMahan et al. (2016), beats conventional synchronized mini-batch SGD in terms of communication rounds and also converges on non-IID and unbalanced data. Recent rigorous theoretical analysis (Stich, 2018; Wang and Joshi, 2018; Yu et al., 2018; Lin et al., 2018) shows that FedAvg is a special case of periodic averaging SGD (also called "local SGD"), which allows nodes to perform multiple local updates between communication rounds.
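The periodic-averaging pattern behind FedAvg / local SGD can be sketched on a toy least-squares problem: each node runs H full-batch gradient steps on its own data, then all local models are averaged. This is a hedged illustration of the generic scheme; the data, hyperparameters, and names are invented for the example and do not come from any of the cited papers.

```python
import numpy as np

# Hedged sketch of FedAvg-style periodic averaging SGD ("local SGD"):
# each node takes H local gradient steps, then all models are averaged.

rng = np.random.default_rng(0)
n_nodes, d, H, rounds, lr = 4, 5, 5, 500, 0.05
w_star = rng.normal(size=d)           # common ground-truth model

data = []                             # non-identical local datasets
for _ in range(n_nodes):
    A = rng.normal(size=(20, d))
    data.append((A, A @ w_star))      # noiseless labels for simplicity

models = [np.zeros(d) for _ in range(n_nodes)]
for _ in range(rounds):
    for i, (A, b) in enumerate(data):
        for _ in range(H):            # H local steps between communications
            grad = A.T @ (A @ models[i] - b) / len(b)
            models[i] = models[i] - lr * grad
    avg = sum(models) / n_nodes       # periodic averaging (the FedAvg step)
    models = [avg.copy() for _ in range(n_nodes)]

# after enough rounds, the averaged model approaches w_star
```

Taking H = 1 recovers synchronized mini-batch SGD; larger H trades extra local computation for fewer communication rounds, which is the regime the analyses above study.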

