A REAL-TIME CONTRIBUTION MEASUREMENT METHOD FOR PARTICIPANTS IN FEDERATED LEARNING

Abstract

Federated learning is a framework for protecting the privacy of distributed data, and it has already been adopted in commercial settings. However, there is no sufficiently reasonable contribution measurement mechanism for distributing rewards among agents. In a commercial federation without such a mechanism, every agent receives the same reward, which is unfair to agents that provide better data. To address this issue, this work proposes a real-time contribution measurement method. First, the method defines the impact of each agent. Then, it combines the current round and the previous round to obtain the contribution rate of each agent. To verify the effectiveness of the proposed method, we conduct pseudo-distributed training and an experiment on the Penn Treebank dataset. Compared with the Shapley Value from game theory, the experimental results show that the proposed method is more sensitive to both data quantity and data quality while still operating in real time.

1. INTRODUCTION

Large amounts of data are generated, collected, and accessed every day by smart terminals. Because these data are private, however, they are usually difficult to use. One example is a language model that predicts the next word or even an entire reply (Ion et al., 2016). The emergence of federated learning breaks this data barrier. It uses the agents' computing power to train a model while keeping data local and private, and still obtains an excellent global model. In a commercial federation, however, each agent should be rewarded according to its contribution to the model rather than receiving the same reward as everyone else. Many methods for contribution measurement exist. For example, Wang et al. (2019) measured the contribution of each group of features in vertical federated learning, and Zhan et al. (2020) proposed an incentive mechanism that makes each agent willing to contribute better data. Most of these methods consume large computing resources, and many are calculated offline. To address this problem, this paper proposes a method for obtaining the contribution of each agent in real time, with a small amount of computation, in horizontal federated learning. Our contributions in this paper are as follows:
• We propose a method to measure agents' contributions and compare this method with the Shapley Value.
• The proposed method is sensitive to data volume and data quality, and can be used to compare agents with one another.
• During training, the contribution of each agent can be obtained in real time, with low computational complexity.
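To make the Shapley Value baseline concrete, the following is a minimal sketch of its exact computation: each agent receives its average marginal contribution over all coalitions of the other agents. It is not the method proposed in this paper. The function `shapley_values`, the `toy_utility` function, and the agent names and data sizes are all hypothetical and chosen only for illustration; in a real federation the utility of a coalition would require training and evaluating a model on that coalition's data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    utility: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # k = size of the coalition that p joins
            # Probability weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                marginal = utility(coalition | {p}) - utility(coalition)
                values[p] += weight * marginal
    return values


# Hypothetical toy setting: utility grows linearly with the pooled data size.
DATA_SIZES = {"a": 100, "b": 300, "c": 600}


def toy_utility(coalition: FrozenSet[str]) -> float:
    return sum(DATA_SIZES[p] for p in coalition) / 1000


if __name__ == "__main__":
    print(shapley_values(list(DATA_SIZES), toy_utility))
```

For n agents, this enumeration evaluates the utility on 2^(n-1) coalitions per agent, which illustrates why exact Shapley computation becomes expensive as the federation grows.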

2. RELATED WORK

In this section, we review the current state of research on federated learning and its applications, describe the existing problems of reward distribution when the federated learning framework is commercialized, and explain the necessity and importance of our work.

2.1. FEDERATED LEARNING

In federated learning, the training data stays distributed across the mobile devices so that it remains local: instead of transmitting the data to a central server, each device updates the model locally and uploads the update results to the server. This makes it possible to aggregate what is learned from each agent's data while keeping the data local and private. The central server collects the agents' updates and uses FedSGD, FedAVG (Brendan McMahan et al., 2016), and other algorithms, in combination with different optimizers (Felbab et al., 2019), to maintain the central model, and then sends the updated model to each agent. During transmission, methods such as homomorphic encryption protect the security of the exchanged data, and the process is repeated so that the model is updated iteratively.

At present, depending on how the datasets are distributed, federated learning frameworks can be classified into horizontal federated learning, vertical federated learning, and federated transfer learning (Yang et al., 2019). Horizontal federated learning is suitable for situations where the data provided by the agents share many of the same features. In (McMahan et al., 2017), Google proposed a solution for updating a horizontal federated learning model on Android phones. Vertical federated learning is suitable for situations where there is less feature overlap but more user ID overlap. 2017) use homomorphic encryption on the data transmitted between the agents and the central server for model training, which further strengthens the privacy of the agents' data. In (Lu et al., 2019; Kim et al., 2019), data verification is carried out in conjunction with the blockchain to prevent the gradient information from being maliciously tampered with. Konečnỳ et al. (2016) study how to reduce the consumption of communication resources in federated learning.
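The server-side aggregation step described above can be illustrated with a minimal sketch of FedAVG-style weighted averaging, in which each agent's parameters are weighted by its share of the total training samples. This is a simplified illustration under stated assumptions, not the implementation used in this paper: the function name `fedavg`, the flat-list representation of model parameters, and the example values are hypothetical, and encryption, client sampling, and the local training loop are omitted.

```python
from typing import List, Sequence


def fedavg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must report vectors of the same length")
        share = size / total  # this client's fraction of all training samples
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


if __name__ == "__main__":
    # Two hypothetical agents: the second holds three times as much data.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # prints [2.5, 3.5]
```

Because the weighting depends only on sample counts, this kind of aggregation by itself does not distinguish agents by data quality, which is one reason a separate contribution measure is of interest.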

2.2. APPLICATION AND COMMERCIALIZATION OF FEDERATED LEARNING

Since it was proposed, federated learning has been successfully applied to more and more scenarios. When data privacy is a concern, many companies choose federated learning so that they can cooperate while keeping their data private. For example, WeBank has successfully used federated learning in bank federations for credit evaluation and other financial applications. In (Ren et al., 2019), federated learning is applied to a dynamic IoT system. Lu et al. (2019); Kim et al. (2019)



Figure 1: Experimental results with randomized word sequences.

Hardy et al. (2017) proposed a vertical federated learning method for training a logistic model. Chen et al. (2020) proposed the FedHealth method, which uses federated learning to aggregate data and uses transfer learning to obtain a personalized model. Under the framework of federated learning, there are many different directions of research. Aono et al. (

