UNDERSTANDING ADVERSARIAL TRANSFERABILITY IN FEDERATED LEARNING

Abstract

With the promise Federated Learning (FL) delivers, various topics regarding its robustness and security have been widely studied in recent years, such as the possibility of conducting adversarial attacks (or transferable adversarial attacks) in a white-box setting with full knowledge of the model (or the entire data), or of conducting poisoning/backdoor attacks during training as a malicious client. In this paper, we investigate these robustness and security issues from a different, simpler, but practical setting: a group of malicious clients has impacted the model during training by disguising their identities and acting as benign clients, and they only reveal their adversarial position after training to conduct transferable adversarial attacks with their data, which is usually a subset of the data the FL system is trained on. Our aim is to offer a full understanding of the challenges the FL system faces in this setting across a spectrum of configurations. We find that such an attack is possible, but the federated model is more robust than its centrally trained counterpart when their accuracies on clean images are comparable. Through our study, we hypothesize that this robustness stems from two factors: the decentralized training on distributed data and the averaging operation. Our work has implications for understanding the robustness of federated learning systems and poses a practical question for federated learning applications.

1. INTRODUCTION

The ever-growing usage of mobile devices such as smartphones and tablets leads to an explosive amount of distributed data collected from the user end. Such private and sensitive data, if fully utilized, would greatly improve the power of intelligent systems. Federated learning (FL) provides a solution for decentralized learning by training quality models through local updates and parameter aggregation McMahan et al. (2017). An FL system maintains a loose federation of participating clients and a centralized server that holds no data but the aggregated model. At each round, the central server distributes the global model to a random subset of participants, who update the model locally with their private data and submit the updated model back to the server for aggregation (e.g., averaging). By design, the system has no visibility into the local data, allowing it to benefit from a wide range of private data while maintaining participant privacy, and the averaging provides an efficient way to leverage the updated parameters compared with distributed SGD.

Despite the fact that FL protects privacy, its loose organization and invisibility to local data make it more vulnerable to various attacks, including data poisoning Huang et al. (2011), model poisoning (backdoor attacks) Bhagoji et al. (2019); Bagdasaryan et al. (2020), free-rider attacks Lin et al. (2019), and various reconstruction attacks that leak the data and privacy of individual participants Geiping et al. (2020); Zhu et al. (2019). Various anomaly-detection-based methods have been proposed to prevent possible poisoning attacks, such as Byzantine-tolerant aggregation Yin et al. (2018), clustering-based selection Shen et al. (2016), and anomaly detection in the spectral domain Li et al. (2020). Reputation mechanisms have been introduced to prevent free-riders Xu & Lyu (2020), and differential privacy techniques are leveraged to preserve privacy against GAN-based reconstruction attacks Augenstein et al. (2019); Hao et al. (2019). Another line of FL security research focuses on attacks during inference, i.e., adversarial attacks Biggio et al. (2013); Szegedy et al. (2013). Like any other deep learning application, federated systems are found vulnerable to adversarial examples carefully crafted to deceive the model Zizzo et al. (2020). FAT first discusses the possibility of attacking the federated model with adversarial examples and the interplay of adversarial training with FL Zizzo et al. (2020). Following it is a line of research proposing more robust FL methods tailored to defend against adversarial examples Zhou et al. (2020); Reisizadeh et al. (2020); Hong et al. (2021). Despite the great progress made, the community mainly focuses on a white-box setting where the attacker has full access to various aspects of the target model, such as its gradients and outputs. However, in most real-world scenarios where an FL system, e.g., Gboard Hard et al. (2018), is trained and deployed, the adversary has no access to any knowledge of the model or the full training set.

Different from the above two settings, we notice that FL naturally raises another security and robustness challenge: during training, a malicious client can disguise itself as a benign one, contribute regularly to the updates of the model parameters, and only reveal its adversarial identity after training. The client thus naturally obtains a subset of the data used to train the FL model and has the potential to exploit this slice of data for adversarial attacks. Most federated applications pose no criteria for eliminating hostile participants Hard et al. (2018); even if there are selection mechanisms (e.g., Krum for defending against backdoor attacks) Fang et al. (2020); Li et al. (2020); Bagdasaryan et al. (2020), the attacker undoubtedly escapes them, since no hostile actions are performed during training. The attacker, after acquiring the data, can train a substitute model to perform a transfer-based black-box attack. In this paper, we take the first step to explore this practical perspective of robustness in FL.

Figure 1: The practical setting of our interest: some clients disguise themselves as normal clients and participate regularly in the training process of an FL system, but later contribute their data to train a malicious model for attacking the trained FL system. This paper studies the possibility of such an attack across different configurations of the FL system.

Stemming from the scenarios discussed above, we propose a simple yet practical assumption: the attacker possesses some but limited amount of the users' data and no knowledge of the target model or the full training set. To better understand and evaluate the robustness of current FL systems and provide implications for future research to improve security in this regard, we investigate the adversarial transferability under FL settings. First, we establish baseline models with ResNet50 on CIFAR10 to provide a preliminary understanding of the robustness of FL under white-box attacks. Then we evaluate the transferability of adversarial examples generated from different source models to attack a federated-trained model. We further investigate two properties of FL, namely the decentralized training and the averaging operation, and their correlation with federated robustness. We have the following findings:

• We find that, while this setting indeed poses security challenges (i.e., the malicious clients can attack the federated model after training through transferable adversarial examples), the federated model is more robust under white-box attacks than its centrally trained counterpart when their accuracies on clean images are comparable.

• We investigate the transferability of adversarial examples generated from models trained with the data of varying numbers of users. We observe that, without any elaborate techniques such as dataset synthesis Papernot et al. (2017) or attention Wu et al. (2020), a regularly trained source model with only limited users' data can perform a transfer attack. With ResNet50 on the CIFAR10 dataset, we achieve a transfer rate of over 60% with only 10% of the total clients and a transfer rate of almost 90% with 20%. With strong augmentation, the source model can attack with transfer rates of almost 70% and 80% using only 5% and 7% of the total users, respectively.

• We investigate two intrinsic properties of FL, namely distributed training and the averaging operation, and discover that both the heterogeneity and dispersion degree of the decentralized data, as well as the averaging operation, can significantly decrease the transfer rate of transfer-based black-box attacks.
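The pipeline underlying these findings — heterogeneous client shards, FedAvg aggregation, and a substitute model trained on the malicious clients' slice to mount a transfer attack — can be sketched end to end on a toy problem. The sketch below is an illustrative assumption, not the paper's actual ResNet50/CIFAR10 setup: it substitutes a logistic-regression model on synthetic linear data, a single-Dirichlet partition for the non-IID split, and one-step FGSM for the attack; all names and hyperparameters (`alpha`, `eps`, `N_MALICIOUS`) are chosen for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the users' data: a linearly separable binary task.
DIM, N, N_CLIENTS, N_MALICIOUS = 20, 4000, 20, 2
X = rng.normal(size=(N, DIM))
w_true = rng.normal(size=DIM)
y = (X @ w_true > 0).astype(float)

def partition(n, n_clients, alpha):
    """Dirichlet split of sample indices; smaller alpha -> more skewed shards."""
    props = rng.dirichlet(alpha * np.ones(n_clients))
    cuts = (np.cumsum(props)[:-1] * n).astype(int)
    return np.split(rng.permutation(n), cuts)

shards = partition(N, N_CLIENTS, alpha=1.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_sgd(w, X, y, lr=0.5, steps=20):
    """A client's local update: a few gradient steps on its private shard."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# FedAvg: rounds of local training followed by data-size-weighted averaging.
w_global = np.zeros(DIM)
for _ in range(30):
    updates, sizes = [], []
    for ix in shards:
        if len(ix) == 0:          # a very skewed split can leave a shard empty
            continue
        updates.append(local_sgd(w_global, X[ix], y[ix]))
        sizes.append(len(ix))
    w_global = np.average(np.stack(updates), axis=0, weights=sizes)

# The attack: malicious clients pool their shards after training, fit a
# substitute model on that slice alone, and craft FGSM examples against it.
mal_idx = np.concatenate(shards[:N_MALICIOUS])
w_sub = local_sgd(np.zeros(DIM), X[mal_idx], y[mal_idx], steps=400)

eps = 0.5  # FGSM budget
# dL/dx of the logistic loss w.r.t. the input, under the substitute model.
grad = (sigmoid(X[mal_idx] @ w_sub) - y[mal_idx])[:, None] * w_sub[None, :]
X_adv = X[mal_idx] + eps * np.sign(grad)

def acc(X, y, w):
    return float(((sigmoid(X @ w) > 0.5) == y).mean())

clean_acc = acc(X[mal_idx], y[mal_idx], w_global)
adv_acc = acc(X_adv, y[mal_idx], w_global)
print(f"federated model: clean {clean_acc:.2f}, transferred adversarial {adv_acc:.2f}")
```

Varying `alpha` controls the heterogeneity of the client shards and `N_MALICIOUS` the attacker's share of the data; in the paper's experiments, these correspond to the dispersion of the decentralized data and the fraction of colluding clients whose effect on the transfer rate is studied.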

