SSELF: ROBUST FEDERATED LEARNING AGAINST STRAGGLERS AND ADVERSARIES

Abstract

While federated learning allows efficient model training with local data at edge devices, two major issues remain to be resolved: slow devices known as stragglers and malicious attacks launched by adversaries. While the presence of both stragglers and adversaries raises serious concerns for the deployment of practical federated learning systems, no known schemes or combinations of schemes, to the best of our knowledge, effectively address these two issues at the same time. In this work, we propose Sself, a semi-synchronous entropy and loss based filtering/averaging scheme, to tackle both stragglers and adversaries simultaneously. The stragglers are handled by exploiting different staleness (arrival delay) information when combining locally updated models during periodic global aggregation. Various adversarial attacks are tackled by utilizing a small amount of public data collected at the server in each aggregation step, first to filter out the model-poisoned devices using computed entropies, and then to perform weighted averaging based on the estimated losses to combat data poisoning and backdoor attacks. A theoretical convergence bound is established to provide insight into the convergence of Sself. Extensive experimental results show that Sself outperforms various combinations of existing methods aiming to handle stragglers/adversaries.

1. INTRODUCTION

Large volumes of data collected at various edge devices (e.g., smartphones) are valuable resources for training machine learning models with good accuracy. Federated learning (McMahan et al., 2017; Li et al., 2019a;b; Konečnỳ et al., 2016) is a promising direction for large-scale learning, which enables training of a shared global model with fewer privacy concerns. However, current federated learning systems suffer from two major issues. The first is stragglers, devices that are considerably slower than average; the second is adversaries that launch various attacks. Regarding the first issue, waiting for all the stragglers at each global round can significantly slow down the overall training process in a synchronous setup. To address this, an asynchronous federated learning scheme was proposed in (Xie et al., 2019a), where the global model is updated every time the server receives a local model from a device, in the order of arrivals; the global model is updated asynchronously based on the device's staleness t − τ, the difference between the current round t and the previous round τ at which the device received the global model from the server. However, among the results received at each global round, a significant portion with large staleness does not help the global model in a meaningful way, potentially making the scheme ineffective. Moreover, since the model update is performed one-by-one asynchronously, the scheme in (Xie et al., 2019a) is vulnerable to various adversarial attacks; any attempt to combine this type of asynchronous scheme with existing adversary-resilient ideas is unlikely to be fruitful. There are different forms of adversarial attacks that significantly degrade the performance of current federated learning systems.
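As a concrete illustration of the staleness-weighted asynchronous update described above, the following sketch mixes a (possibly stale) local model into the global model with a weight that decays polynomially in the staleness t − τ. The decay function and its hyperparameters (`alpha`, `a`) are illustrative choices, not the exact rule of (Xie et al., 2019a):

```python
import numpy as np

def staleness_weight(t, tau, alpha=0.6, a=0.5):
    """Polynomial staleness decay, one common choice in asynchronous FL.

    t: current global round; tau: round at which the device pulled the model.
    alpha and a are hypothetical hyperparameters for illustration.
    """
    return alpha * (t - tau + 1) ** (-a)

def async_update(global_model, local_model, t, tau):
    """Mix a (possibly stale) local model into the global model."""
    w = staleness_weight(t, tau)
    return (1.0 - w) * global_model + w * local_model

# A stale update (tau much smaller than t) contributes less:
g = np.zeros(3)
fresh = async_update(g, np.ones(3), t=10, tau=10)  # weight 0.6 * 1**(-0.5) = 0.6
stale = async_update(g, np.ones(3), t=10, tau=2)   # weight 0.6 * 9**(-0.5) = 0.2
```

The point of the decay is visible in the two calls: the fresh update moves the global model three times as far as the one that is eight rounds stale.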
First, in untargeted attacks, an attacker can poison the updated model at the devices before it is sent to the server (model update poisoning) (Blanchard et al., 2017; Lamport et al., 2019) or can poison the dataset of each device (data poisoning) (Biggio et al., 2012; Liu et al., 2017), which degrades the accuracy of the model. In targeted attacks (or backdoor attacks) (Chen et al., 2017a; Bagdasaryan et al., 2018; Sun et al., 2019), the adversaries cause the model to misclassify only the targeted subtasks, while not degrading the overall test accuracy. To resolve these issues, a robust federated averaging (RFA) scheme was recently proposed in (Pillutla et al., 2019), which utilizes the geometric median of the received results for aggregation. However, RFA tends to lose performance rapidly as the portion of adversaries exceeds a certain threshold. In this sense, RFA is not an ideal candidate to be combined with known straggler-mitigating strategies (e.g., ignoring stragglers) where a relatively small number of devices are utilized for global aggregation; the attack ratio can be very high, significantly degrading the performance. To our knowledge, there are currently no existing methods or known combinations of ideas that can effectively handle both stragglers and adversaries at the same time, an issue that is becoming increasingly important in practical scenarios. Contributions. In this paper, we propose Sself, semi-synchronous entropy and loss based filtering/averaging, a robust federated learning strategy which can tackle both stragglers and adversaries simultaneously. In the proposed scheme, the straggler effects are mitigated by semi-synchronous global aggregation at the server, and in each aggregation step, the impact of adversaries is countered by a new aggregation method utilizing public data collected at the server. The details of our key ideas are as follows.
Targeting the straggler issue, our strategy is to perform periodic global aggregation while allowing the results sent from stragglers to be aggregated in later rounds. The key is a judicious mix of synchronous and asynchronous approaches. At each round, as a first step, we aggregate the results that come from the same initial model (i.e., the same staleness), as in the synchronous scheme. Then, we take the weighted sum of these aggregated results with different staleness, i.e., coming from different initial models, as in the asynchronous approach. Regarding the adversarial attacks, robust aggregation is realized via entropy-based filtering and loss-weighted averaging. This can be employed at the first step of our semi-synchronous strategy described above, enabling protection against model/data poisoning and backdoor attacks. To this end, our key idea is to utilize public IID (independent and identically distributed) data collected at the server. We can imagine a practical scenario where the server has some global data uniformly distributed over classes, as in the setup of (Zhao et al., 2018). This is generally a reasonable setup since data centers mostly have some collected data (although possibly only a small amount) for the learning task. For example, different types of medical data are often open to the public in various countries. Based on the public data, the server computes the entropy and loss of each received model. We use the entropy of each model to filter out the devices whose models are poisoned. In addition, by taking the loss-weighted average of the surviving models, we can protect the system against local data poisoning and backdoor attacks. We derive a theoretical bound for Sself to ensure acceptable convergence behavior. Experimental results on different datasets show that Sself outperforms various combinations of straggler/adversary defense methods with only a small portion of public data at the server. Related works.
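A minimal sketch of the entropy-based filtering and loss-weighted averaging step, assuming the server evaluates each received model on its public data. The entropy threshold and the exp(−loss) weighting are hypothetical choices for illustration, not necessarily the exact rule used by Sself:

```python
import numpy as np

def mean_entropy(probs, eps=1e-12):
    """Mean prediction entropy of a model over the public set."""
    return -np.mean(np.sum(probs * np.log(probs + eps), axis=1))

def filter_and_average(models, predict_fn, loss_fn, public_x, public_y,
                       entropy_threshold=0.5):
    """Entropy-based filtering followed by loss-weighted averaging.

    models: list of parameter vectors; predict_fn(model, x) -> class
    probabilities on the public data; loss_fn(model, x, y) -> scalar loss.
    The threshold and exp(-loss) weights are illustrative assumptions.
    """
    survivors, losses = [], []
    for m in models:
        probs = predict_fn(m, public_x)
        if mean_entropy(probs) <= entropy_threshold:  # drop high-entropy (poisoned) models
            survivors.append(m)
            losses.append(loss_fn(m, public_x, public_y))
    weights = np.exp(-np.asarray(losses))
    weights /= weights.sum()                          # lower public loss -> larger weight
    return sum(w * m for w, m in zip(weights, survivors))

# Toy check: a near-uniform (high-entropy) model is filtered out, and of the
# two survivors, the one with lower public loss gets the larger weight.
def _predict(m, x):
    return np.full((4, 2), 0.5) if m[0] < 0 else np.tile([0.9, 0.1], (4, 1))

def _loss(m, x, y):
    return abs(m[0] - 1.0)

agg = filter_and_average([np.array([-1.0]), np.array([1.0]), np.array([2.0])],
                         _predict, _loss, None, None)
```

Here the first toy model predicts uniformly (entropy ln 2 ≈ 0.69, above the threshold) and is discarded, so the aggregate is a convex combination of the other two, pulled toward the one with zero public loss.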
The authors of (Li et al., 2019c; Wu et al., 2019; Xie et al., 2019a) have recently tackled the straggler issue in a federated learning setup. The basic idea is to allow the devices and the server to update the models asynchronously. In particular, in (Xie et al., 2019a), the authors proposed an asynchronous scheme where the global model is updated every time the server receives a local model from a device. However, a fair portion of the received models with large staleness does not help the global model in meaningful ways, potentially slowing down convergence. A more critical issue is that robust methods designed to handle adversarial attacks, such as RFA (Pillutla et al., 2019), Multi-Krum (Blanchard et al., 2017), or the entropy/loss based idea proposed here, are difficult to implement in conjunction with this asynchronous scheme. To combat adversaries, various aggregation methods have been proposed in a distributed learning setup with IID data across nodes (Yin et al., 2018a;b; Chen et al., 2017b; Blanchard et al., 2017; Xie et al., 2018). The authors of (Chen et al., 2017b) suggest a geometric median based rule for aggregating the received models or gradients. In (Yin et al., 2018a), a trimmed mean approach is proposed which, for each element, removes a fraction of the largest and smallest values among the received results. In Multi-Krum (Blanchard et al., 2017), among N workers in the system, the server tolerates f Byzantine workers under the assumption 2f + 2 < N. Targeting federated learning with non-IID data, the recently introduced RFA method of (Pillutla et al., 2019) utilizes the geometric median of the models sent from devices, similar to (Chen et al., 2017b). However, as mentioned above, these methods are ineffective when combined with a straggler-mitigation scheme, potentially degrading learning performance.
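For concreteness, the coordinate-wise trimmed mean of (Yin et al., 2018a) can be sketched as follows; the trim ratio is an illustrative parameter, and the example worker values are made up:

```python
import numpy as np

def trimmed_mean(updates, trim_ratio=0.1):
    """Coordinate-wise trimmed mean in the spirit of (Yin et al., 2018a).

    updates: (n_workers, dim) array. For each coordinate, drop the largest
    and smallest trim_ratio fraction of values, then average the rest.
    """
    n = updates.shape[0]
    k = int(n * trim_ratio)
    s = np.sort(updates, axis=0)       # sort each coordinate independently
    return s[k:n - k].mean(axis=0)     # average the middle values

# One Byzantine worker sends a huge value; trimming removes it
# (along with the smallest value) before averaging:
u = np.array([[1.0], [1.1], [0.9], [1.0], [100.0]])
robust = trimmed_mean(u, trim_ratio=0.2)  # averages 1.0, 1.0, 1.1
```

A plain mean of these five values would be pulled to about 20.8 by the outlier, whereas the trimmed mean stays near 1, which is the robustness property these rules are designed for.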
Compared to Multi-Krum and RFA, our entropy/loss based scheme can tolerate adversaries even at a high attack ratio, showing remarkable advantages, especially when combined with straggler-mitigation schemes. Finally, we note that the authors of (Xie et al., 2019c) considered both stragglers and adversaries, but in a distributed learning setup with IID data across the nodes. In contrast, we target a non-IID data distribution in a federated learning scenario.

