FADE: ENABLING LARGE-SCALE FEDERATED ADVERSARIAL TRAINING ON RESOURCE-CONSTRAINED EDGE DEVICES

Abstract

Federated adversarial training can effectively bring adversarial robustness to privacy-preserving federated learning systems. However, the high demand for memory capacity and computing power makes large-scale federated adversarial training (AT) infeasible on resource-constrained edge devices. Few previous studies of federated adversarial training have tried to tackle both the memory and the computational constraints at the same time. In this paper, we propose a new framework named Federated Adversarial Decoupled Learning (FADE) to enable AT on resource-constrained edge devices. FADE decouples the entire model into small modules so that each fits the resource budget of an individual edge device, and each device only needs to perform AT on a single module in each communication round. We also propose an auxiliary weight decay to alleviate objective inconsistency and achieve a better accuracy-robustness balance in FADE. FADE offers theoretical guarantees for convergence and adversarial robustness, and our experimental results show that FADE can significantly reduce the consumption of memory and computing power while maintaining accuracy and robustness.

1. INTRODUCTION

As a privacy-preserving distributed learning paradigm, Federated Learning (FL) makes a meaningful step toward the practice of secure and trustworthy artificial intelligence (Konečnỳ et al., 2015; 2016; McMahan et al., 2017; Kairouz et al., 2019). In contrast to traditional centralized training, FL pushes the training to edge devices (clients), and client models are locally trained and uploaded to the server for aggregation. Since no private data is shared with other clients or the server, FL substantially improves data privacy during the training process. While FL can preserve the privacy of the participants, other threats can still impact the reliability of the machine learning model running on the FL system. One such threat is adversarial samples, which aim to cause misclassifications by adding imperceptible noise to the input data (Szegedy et al., 2013; Goodfellow et al., 2014). Previous research has shown that performing adversarial training (AT) on a large model is an effective method to attain robustness against adversarial samples while maintaining high accuracy on clean samples (Liu et al., 2020). However, large-scale AT also places high demands on both memory capacity and computing power, which is unaffordable for edge devices with limited resources, such as mobile phones and IoT devices, in FL scenarios (Kairouz et al., 2019; Li et al., 2020; Wong et al., 2020; Zizzo et al., 2020; Hong et al., 2021). Table 1 shows that strong robustness of the whole FL system cannot be attained by allowing only a small portion (e.g., 20%) of the clients to perform AT. Therefore, enabling resource-constrained edge devices (which usually constitute the majority of the participants in cross-device FL (Kairouz et al., 2019)) to perform AT is necessary for achieving strong robustness in FL.

Some previous works have tried to tackle client-wise systematic heterogeneity in FL (Li et al., 2018; Lu et al., 2020; Wang et al., 2020b; Xie et al., 2019).
The most common method for dealing with slow devices is to allow them to perform fewer epochs of local training than the others (Li et al., 2018; Wang et al., 2020b). While this method can reduce the computational cost on slow devices, the memory capacity limitation of edge devices has not been well discussed in these works.
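As background, adversarial training hardens a model by training on inputs perturbed in the direction that most increases the loss. The sketch below illustrates one such perturbation step in the style of FGSM (Goodfellow et al., 2014) for a simple logistic-regression model; it is a minimal illustration of the general idea, not the specific attack or model used in FADE, and the function and parameter names are our own.

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps=0.1):
    """Generate an FGSM-style adversarial input for logistic regression.

    x: input feature vector; y: label in {0, 1}; (w, b): model parameters.
    The input is shifted by eps in the sign direction of the loss gradient,
    i.e., the direction that most increases the cross-entropy loss.
    """
    z = w @ x + b                      # logit
    p = 1.0 / (1.0 + np.exp(-z))      # sigmoid probability
    grad_x = (p - y) * w              # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)  # adversarial example
```

In adversarial training, such perturbed inputs are generated inside every training step and the model is updated on them instead of (or alongside) the clean inputs, which is why AT roughly multiplies the per-step compute and activation-memory cost on each client.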

