ADVERSARIALLY ROBUST FEDERATED LEARNING FOR NEURAL NETWORKS

Abstract

In federated learning, data is distributed among local clients, which collaboratively train a prediction model using secure aggregation. To preserve the privacy of the clients, the federated learning paradigm requires each client to maintain a private local training data set and to upload only its summarized model updates to the server. In this work, we show that this paradigm could lead to a vulnerable model, whose performance collapses when corrupted data samples (under adversarial manipulation) are used for prediction after model deployment. To improve model robustness, we first decompose the aggregation error of the central server into bias and variance, and then propose a robust federated learning framework, named Fed BVA, that performs on-device adversarial training using bias-variance oriented adversarial examples supplied by the server via asymmetrical communications. Experiments on multiple benchmark data sets using several prevalent neural network models show that our framework is robust against white-box and black-box adversarial corruptions under both IID and non-IID settings.

1. INTRODUCTION

The explosive amount of decentralized user data collected from the ever-growing usage of smart devices (e.g., smartphones, wearable devices, and home sensors) has led to a surge of interest in decentralized learning. To protect the privacy-sensitive data of the clients, federated learning (McMahan et al., 2017; Yang et al., 2019) has been proposed. Federated learning only allows a group of clients to train local models using their own data, and then collectively merges the model updates on a central server using secure aggregation (Acar et al., 2018). Due to its strong privacy-preserving property, federated learning has attracted much attention in recent years, along with the prevalence of efficient lightweight deep models (Howard et al., 2017) and low-cost network communications (Wen et al., 2017; Konečný et al., 2016). In federated learning, the central server only inspects the secure aggregation of the local models as a whole. Consequently, it is susceptible to clients' corrupted updates (e.g., due to system failures). Recently, multiple robust federated learning models (Fang et al., 2019; Pillutla et al., 2019; Portnoy & Hendler, 2020; Mostafa, 2019) have been proposed. These works only focus on client-level robust training or on designing server-level aggregation variants with hyper-parameter tuning for Byzantine failures. However, none of them can mitigate the vulnerability of federated learning when adversarial manipulations are present at test time, which, as we show in Section 4.1, is mainly due to the generalization error in the model aggregation. Our work bridges this gap by investigating the error incurred during the aggregation of federated learning from the perspective of bias-variance decomposition (Domingos, 2000; Valentini & Dietterich, 2004).
Specifically, we show that the generalization error of the aggregated model on the central server can be decomposed into the combination of bias (triggered by the main prediction of the clients) and variance (triggered by the variations among the clients' predictions). Next, we propose to perform local robust training on clients by supplying them with a tiny amount of bias-variance perturbed examples generated on the central server via asymmetrical communications. Our experiments use neural networks with cross-entropy loss; however, other loss functions are also applicable as long as the gradients of their bias and variance are tractable to estimate. In this way, any gradient-based adversarial training strategy (Goodfellow et al., 2015; Madry et al., 2018) can be used. Compared with previous work, our major contributions include:
• We provide the exact solution of the bias-variance analysis w.r.t. the generalization error, which is well suited to neural network based federated learning. In comparison, performing adversarial attacks or training with conventional federated learning methods only focuses on the bias of the central model and ignores the variance.
• We demonstrate that the conventional federated learning framework is vulnerable to strong attack methods as communication rounds increase, even if adversarial training using locally generated adversarial examples is performed on each client.
• Without violating the clients' privacy, we show that providing a tiny amount of bias-variance perturbed data from the central server to the clients through asymmetrical communication can dramatically improve the robustness of the trained model under various settings.

2. PRELIMINARIES

2.1. SETTINGS

In federated learning, there is a central server and $K$ different clients, each with access to a private training set $D_k = \{(x_i^k, t_i^k)\}_{i=1}^{n_k}$, where $x_i^k$, $t_i^k$, and $n_k$ are the features, label, and number of training examples of the $k$-th client ($k = 1, \dots, K$). Each data set $D_k$ is exclusively owned by client $k$ and is not shared with the central server or other clients. In addition, the server holds a small public training set $D_s = \{(x_j^s, t_j^s)\}_{j=1}^{n_s}$ with $n_s$ training examples that is shared with the clients, where $n_s \ll \sum_{k=1}^{K} n_k$. This does not break the privacy constraints: for example, hospitals (local devices) that contribute to a federated medical image diagnosis system could take a few publicly accessible images as additional inputs. The goal of federated learning is to train a global classifier $f(\cdot)$ using knowledge from all the clients such that it generalizes well over the test data $D_{test}$. The notation used in this paper is summarized in the Appendix (see Table 4).

2.2. PROBLEM DEFINITION

In this paper, we study the adversarial robustness of neural networks¹ in the federated learning setting, and we define adversarially robust federated learning as follows.

Definition 2.1. (Adversarially Robust Federated Learning)
Input: (1) A set of private training data $\{D_k\}_{k=1}^K$ on $K$ different clients; (2) a tiny amount of training data $D_s$ on the central server; (3) a learning algorithm $f(\cdot)$ and a loss function $L(\cdot, \cdot)$.
Output: A trained model on the central server that is robust against adversarial perturbations.

We would like to point out that our problem definition has the following properties:
Asymmetrical communication: The communication between each client and the server is asymmetrical: the server provides both the global model parameters and limited shared data to the clients, while each client uploads only its local model parameters back to the server.
Data distribution: All training examples on the clients and the server are assumed to follow the same data distribution. However, the experiments show that our proposed algorithm also performs well under the non-IID setting, which could be common among personalized clients in real scenarios.
Shared learning algorithm: All clients are assumed to use the identical model $f(\cdot)$, including the architecture as well as the hyper-parameters (e.g., learning rate, local epochs, local batch size).
Remark. The basic assumption of this problem setting is that the learning process is clean (no malicious behavior is observed during training); however, intentionally generated adversarial poisoning data will be mixed with clean data during training. The goal is for the trained model deployed on the devices to be robust against potential future adversarial attacks.

2.3. BIAS-VARIANCE TRADE-OFF

Following (Domingos, 2000; Valentini & Dietterich, 2004), we define the optimal prediction, main prediction, bias, variance, and noise for any real-valued loss function $L(\cdot, \cdot)$ as follows.

Definition 2.2. (Optimal Prediction and Main Prediction) Given a loss function $L(\cdot, \cdot)$ and a learning algorithm $f(\cdot)$, the optimal prediction $y^*$ and the main prediction $y_m$ for an example are defined as:

$$y^*(x) = \arg\min_{y} \mathbb{E}_t[L(y, t)] \quad \text{and} \quad y_m(x) = \arg\min_{y'} \mathbb{E}_D[L(f_D(x), y')] \tag{1}$$

where $t$ and $D$ are random variables denoting the class label and the training set, and $f_D$ denotes the model trained on $D$. In short, the main prediction is the prediction whose average loss relative to all the predictions over data distributions is minimal; e.g., the main prediction for the zero-one loss is the mode of the predictions. In Section 4.1, we show that the main prediction is the average prediction of the client models for both the mean squared error (MSE) loss and the cross-entropy (CE) loss.

Definition 2.3. (Bias, Variance and Noise) Given a loss function $L(\cdot, \cdot)$ and a learning algorithm $f(\cdot)$, the expected loss $\mathbb{E}_{D,t}[L(f_D(x), t)]$ for an example $x$ can be decomposed² into bias, variance, and noise as follows:

$$B(x) = L(y_m, y^*), \qquad V(x) = \mathbb{E}_D[L(f_D(x), y_m)], \qquad N(x) = \mathbb{E}_t[L(y^*, t)] \tag{2}$$

In short, bias is the loss incurred by the main prediction w.r.t. the optimal prediction, and variance is the average loss incurred by the predictions w.r.t. the main prediction. Noise is conventionally assumed to be irreducible and independent of $f(\cdot)$.

Remark. Our definitions of optimal prediction, main prediction, bias, variance, and noise differ slightly from previous ones (Domingos, 2000; Valentini & Dietterich, 2004). For example, the conventional optimal prediction was defined as $y^*(x) = \arg\min_y \mathbb{E}_t[L(t, y)]$, which is equivalent to our definition when the loss function is symmetric in its arguments, i.e., $L(y_1, y_2) = L(y_2, y_1)$.
Note that this decomposition holds for any real-valued loss function in the binary setting (Domingos, 2000), with a bias-variance trade-off coefficient $\lambda$ that has a closed-form expression. For the multi-class setting, we inherit their definitions of bias and variance directly and treat the trade-off coefficient $\lambda$ as a hyper-parameter to tune, because no closed-form expression of $\lambda$ is available.
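As a concrete instance of Definitions 2.2 and 2.3, the minimal sketch below evaluates them for the zero-one loss, whose main prediction is the mode of the predictions. It assumes a noise-free example, so that the optimal prediction $y^*$ is the true label $t$, and it represents the models trained on different sets $D$ only by their hard-label predictions; the function names are illustrative, not from the paper.

```python
from collections import Counter


def zero_one(a, b):
    """Zero-one loss L(a, b): 0 if the two labels agree, 1 otherwise."""
    return 0 if a == b else 1


def zero_one_bias_variance(predictions, t):
    """Main prediction, bias, and variance under the zero-one loss.

    predictions: hard labels f_D(x) from models trained on different sets D.
    t: the true label, used as the optimal prediction y* (noise-free case).
    """
    # Main prediction y_m: the label with the smallest average zero-one loss, i.e. the mode.
    y_m = Counter(predictions).most_common(1)[0][0]
    bias = zero_one(y_m, t)  # B(x) = L(y_m, y*)
    # V(x) = E_D[L(f_D(x), y_m)], estimated by averaging over the given predictions.
    variance = sum(zero_one(p, y_m) for p in predictions) / len(predictions)
    return y_m, bias, variance


print(zero_one_bias_variance([2, 2, 2, 1], t=2))  # (2, 0, 0.25)
print(zero_one_bias_variance([1, 1, 2, 1], t=2))  # (1, 1, 0.25)
```

The expected zero-one loss is 0.25 = B + V in the first call and 0.75 = B - V in the second. This is the two-class, noise-free case analysed by Domingos (2000), where the variance enters with coefficient +1 on unbiased examples and -1 on biased ones, i.e., the kind of closed-form trade-off coefficient mentioned above.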

3. THE PROPOSED FRAMEWORK

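A typical framework (Kairouz et al., 2019) of privacy-preserving federated learning can be summarized as follows: (1) Client Update: each client updates its local model parameters $w_k$ by minimizing the empirical loss over its own training set; (2) Forward Communication: each client uploads its model parameter update to the central server; (3) Server Update: the server synchronously aggregates the received parameters; (4) Backward Communication: the global parameters are sent back to the clients. Our framework follows the same paradigm but with substantial modifications, as described below.

Server Update. The server has two components. The first one uses the FedAvg algorithm (McMahan et al., 2017) to aggregate the local models' parameters, i.e., $w_G = \text{Aggregate}(w_1, \dots, w_K) = \sum_{k=1}^{K} \frac{n_k}{n} w_k$, where $n = \sum_{k=1}^{K} n_k$ and $w_k$ denotes the model parameters of the $k$-th client. The second component produces adversarially perturbed examples, induced by a poisoning attack algorithm, for use in robust adversarial training.

It has been well studied (Belkin et al., 2019; Domingos, 2000; Valentini & Dietterich, 2004) that, in the classification setting, the generalization error of a learning algorithm on an example is determined by the bias, variance, and irreducible noise as defined in Eq. (2). Similar to previous work, we assume a noise-free learning scenario³ where the class label $t$ is a deterministic function of $x$ (i.e., if $x$ is sampled repeatedly, the same value of its class $t$ will be observed). This motivates us to generate adversarial examples by attacking the bias and variance induced by the clients' models:

$$\max_{\hat{x} \in \Omega(x)} B(\hat{x}; w_1, \dots, w_K) + \lambda V(\hat{x}; w_1, \dots, w_K), \quad \forall (x, t) \in D_s \tag{3}$$

where $B(\hat{x}; w_1, \dots, w_K)$ and $V(\hat{x}; w_1, \dots, w_K)$ can be empirically estimated from a finite number of clients' parameters trained on the local training sets $\{D_1, D_2, \dots, D_K\}$. Here $\lambda$ is a hyper-parameter that controls the trade-off between bias and variance, and $\Omega(x)$ is the perturbation constraint. Note that $D_s$ (on the server) is the candidate subset of all available training examples whose perturbed counterparts are generated. This is more feasible than generating adversarial examples on the clients' devices, because in real scenarios the server usually has much more computational capacity, which allows flexible poisoning attack algorithms. In this case, both the poisoned examples and the server model parameters are sent back to each client (Backward Communication), while only the clients' local parameters are uploaded to the server (Forward Communication), i.e., the asymmetrical communication discussed in Section 2.2.

Client Update. The robust training of one client's prediction model (i.e., $w_k$) can be formulated as the following minimization problem: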
$$\min_{w_k} \left( \sum_{i=1}^{n_k} L\big(f_{D_k}(x_i^k; w_k), t_i^k\big) + \sum_{j=1}^{n_s} L\big(f_{D_k}(\hat{x}_j^s; w_k), t_j^s\big) \right) \tag{4}$$

where $\hat{x}_j^s \in \Omega(x_j^s)$ is a perturbed example asymmetrically transmitted from the server.

Remark. Intuitively, the bias measures the systematic loss of a learning algorithm, and the variance measures the prediction consistency of the learner over different training sets. Therefore, our robust federated learning framework has the following advantages: (i) it encourages the clients to consistently produce the optimal prediction for perturbed examples, thereby leading to better generalization performance; (ii) local adversarial training on perturbed examples allows each client to learn a robust local model, and thus a robust global model can be aggregated from the clients.

Theoretically, there is an alternative robust federated training strategy:

$$\min_{w_k} \sum_{i=1}^{n_k} \max_{\hat{x}_i^k \in \Omega(x_i^k)} L\big(f(\hat{x}_i^k; w_k), t_i^k\big), \quad \forall k \in \{1, 2, \dots, K\} \tag{5}$$

where the perturbed training examples of each client $k$ are generated on the local device from $D_k$ instead of being transmitted from the server. This min-max formulation is similar to (Madry et al., 2018; Tramèr et al., 2018), where the inner maximization problem synthesizes the adversarial counterparts of clean examples, while the outer minimization problem finds the optimal model parameters over the perturbed training examples. Thus, each local robust model is trained individually; however, running poisoning attacks on the device greatly increases computational cost and memory usage. Moreover, this strategy only considers the client-specific loss and remains vulnerable to adversarial examples as communication rounds increase. Both phenomena are observed in our experiments (see Fig. 4 and Fig. 5).
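The FedAvg aggregation used in the Server Update and the client objective of Eq. (4) can be sketched together as below. This is a minimal illustration under assumptions, not the authors' implementation: model parameters are flat lists of floats, `predict` is a hypothetical callable returning a client's class-probability vector, and the cross-entropy is taken as $-\log f^{(t)}(x)$. In practice a client would minimize this objective with SGD rather than only evaluate it.

```python
import math


def fedavg(client_params, client_sizes):
    """FedAvg: w_G = sum_k (n_k / n) * w_k, with parameters as flat lists of floats."""
    n = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n_k / n * w[i] for w, n_k in zip(client_params, client_sizes))
        for i in range(dim)
    ]


def cross_entropy(probs, label):
    """L(f(x), t) = -log f^(t)(x) for a probability vector and an integer label."""
    return -math.log(probs[label])


def client_objective(predict, clean_data, perturbed_server_data):
    """Eq. (4) for one client: loss on its own clean examples (x_i^k, t_i^k)
    plus loss on the perturbed server examples (x_hat_j^s, t_j^s) it received."""
    clean = sum(cross_entropy(predict(x), t) for x, t in clean_data)
    shared = sum(cross_entropy(predict(x), t) for x, t in perturbed_server_data)
    return clean + shared


def uniform_predict(x):
    """Hypothetical two-class predictor that ignores its input."""
    return [0.5, 0.5]


print(fedavg([[1.0, 0.0], [3.0, 2.0]], client_sizes=[1, 3]))      # [2.5, 1.5]
print(client_objective(uniform_predict, [(None, 0)], [(None, 1)]))  # 2 * log(2), about 1.386
```

Weighting by $n_k / n$ means clients with more local data pull the global parameters proportionally harder, which is the standard FedAvg behaviour.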

4. ALGORITHM

4.1. BIAS-VARIANCE ATTACK

We first consider the maximization problem in Eq. (3) using bias-variance based adversarial attacks. It aims to find the adversarial example $\hat{x}$ (from the original example $x$) that produces large bias and variance values w.r.t. the clients' local models. Specifically, the perturbation constraint $\hat{x} \in \Omega(x)$ forces the adversarial example $\hat{x}$ to be visually indistinguishable from $x$. Here we consider the well-studied $l_\infty$-bounded adversaries⁴ (Goodfellow et al., 2015; Madry et al., 2018; Tramèr et al., 2018) such that $\Omega(x) := \{\hat{x} \mid \|\hat{x} - x\|_\infty \le \epsilon\}$ for a perturbation magnitude $\epsilon$. Furthermore, we propose the following two gradient-based algorithms to generate adversarial examples.

Bias-variance based Fast Gradient Sign Method (BV-FGSM): Following FGSM (Goodfellow et al., 2015), it linearizes the maximization problem in Eq. (3) with a one-step attack as follows:

$$\hat{x}_{BV\text{-}FGSM} := x + \epsilon \cdot \text{sign}\Big(\nabla_x \big(B(x; w_1, \dots, w_K) + \lambda V(x; w_1, \dots, w_K)\big)\Big) \tag{6}$$

Bias-variance based Projected Gradient Descent (BV-PGD): PGD can be considered a multi-step variant of FGSM (Kurakin et al., 2017) and can generate stronger adversarial examples. This motivates a BV-based PGD attack:

$$\hat{x}^{l+1}_{BV\text{-}PGD} := \text{Proj}_{\Omega(x)}\Big(\hat{x}^l + \epsilon \cdot \text{sign}\big(\nabla_{\hat{x}^l} (B(\hat{x}^l; w_1, \dots, w_K) + \lambda V(\hat{x}^l; w_1, \dots, w_K))\big)\Big) \tag{7}$$

where $\hat{x}^l$ is the adversarial example at the $l$-th step with the initialization $\hat{x}^0 = x$, and $\text{Proj}_{\Omega(x)}(\cdot)$ projects each step onto $\Omega(x)$.

Remark. The proposed framework generalizes naturally to any gradient-based adversarial attack algorithm for which the gradients of the bias $B(\cdot)$ and variance $V(\cdot)$ w.r.t. $x$ are tractable when estimated from finite training sets.
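Eqs. (6) and (7) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: `grad_bv` is a hypothetical callable that returns $\nabla_x (B + \lambda V)$ (for instance, estimated via Theorem 4.1), inputs are flat lists, and clipping to a valid pixel range is omitted. The BV-PGD step size defaults to $\epsilon$ as written in Eq. (7); a smaller per-step size is a common variant.

```python
def sign(v):
    """Sign of a scalar: 1, -1, or 0."""
    return (v > 0) - (v < 0)


def project_linf(x_adv, x, eps):
    """Proj_{Omega(x)}: clip each coordinate into [x_i - eps, x_i + eps]."""
    return [min(max(a, xi - eps), xi + eps) for a, xi in zip(x_adv, x)]


def bv_fgsm(x, grad_bv, eps):
    """Eq. (6): a single signed-gradient ascent step on B + lambda * V."""
    g = grad_bv(x)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def bv_pgd(x, grad_bv, eps, num_steps, step=None):
    """Eq. (7): repeated signed-gradient steps, each projected back onto Omega(x)."""
    step = eps if step is None else step  # Eq. (7) uses eps as the step size
    x_adv = list(x)                       # initialisation x_hat^0 = x
    for _ in range(num_steps):
        g = grad_bv(x_adv)
        x_adv = project_linf([a + step * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv


def toy_grad(z):
    """Gradient of the toy objective ||z - c||^2 with c = (0, 1)."""
    return [2 * z[0], 2 * (z[1] - 1.0)]


print(bv_fgsm([0.5, 0.5], toy_grad, eps=0.25))                        # [0.75, 0.25]
print(bv_pgd([0.5, 0.5], toy_grad, eps=0.25, num_steps=5, step=0.1))  # [0.75, 0.25]
```

On the toy objective both attacks push $x$ away from $c$ until they hit the $l_\infty$ boundary; the projection is what keeps the multi-step attack inside $\Omega(x)$.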
Compared with existing attack methods (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al., 2017; Moosavi-Dezfooli et al., 2016), the loss function our adversary optimizes is a linear combination of bias and variance, whereas existing work mainly attacks the overall classification error, which considers bias only. The following theorem states that the bias $B(\cdot)$ and variance $V(\cdot)$, as well as their gradients w.r.t. the input $x$, can be estimated using the clients' models.

Theorem 4.1. Assume that $L(\cdot, \cdot)$ is the cross-entropy loss function. Then the empirically estimated main prediction $y_m$ for an input example $(x, t)$ has the closed-form expression

$$y_m(x; w_1, \dots, w_K) = \frac{1}{K} \sum_{k=1}^{K} f_{D_k}(x; w_k).$$

Furthermore, the empirical bias and variance are estimated as follows:

$$B(x; w_1, \dots, w_K) = \frac{1}{K} \sum_{k=1}^{K} L\big(f_{D_k}(x; w_k), t\big), \qquad V(x; w_1, \dots, w_K) = L(y_m, y_m) = H(y_m)$$

Here, $H(y_m) = -\sum_{j=1}^{C} y_m^{(j)} \log y_m^{(j)}$ is the entropy of the main prediction $y_m$, and $C$ is the number of classes. Their gradients w.r.t. $x$ follow directly:

$$\nabla_x B(x; w_1, \dots, w_K) = \frac{1}{K} \sum_{k=1}^{K} \nabla_x L\big(f_{D_k}(x; w_k), t\big), \qquad \nabla_x V(x; w_1, \dots, w_K) = -\frac{1}{K} \sum_{k=1}^{K} \sum_{j=1}^{C} \big(\log y_m^{(j)} + 1\big) \nabla_x f_{D_k}^{(j)}(x; w_k)$$

The proof is given in Appendix A.2. We also consider the case where $L(\cdot, \cdot)$ is the MSE loss function; however, the gradients of the MSE bias and variance are much more computationally demanding than the concise formulas obtained for cross-entropy. More comparisons are given in Appendix A.5.1.
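As a numerical check of Theorem 4.1, the sketch below computes the empirical main prediction, bias, and variance from the clients' probability vectors for a single example, and confirms that the average loss of the client predictions w.r.t. $y_m$ equals $H(y_m)$. It assumes the clients' softmax outputs are already available as plain lists; the bias uses the usual cross-entropy $-\log f^{(t)}$ against the label, and all function names are illustrative.

```python
import math


def main_prediction(client_probs):
    """y_m = (1/K) * sum_k f_{D_k}(x): element-wise mean of the clients' class probabilities."""
    K = len(client_probs)
    return [sum(p[j] for p in client_probs) / K for j in range(len(client_probs[0]))]


def empirical_bias_variance(client_probs, t):
    """Empirical main prediction, bias, and variance of Theorem 4.1 for one example with label t."""
    y_m = main_prediction(client_probs)
    K = len(client_probs)
    # B = (1/K) * sum_k L(f_{D_k}(x), t), with L the cross-entropy -log f^(t)
    bias = sum(-math.log(p[t]) for p in client_probs) / K
    # V = H(y_m) = -sum_j y_m^(j) log y_m^(j)
    variance = -sum(y * math.log(y) for y in y_m if y > 0)
    return y_m, bias, variance


def average_loss_to_main(client_probs):
    """(1/K) * sum_k [-sum_j f_k^(j) log y_m^(j)]: the clients' average loss w.r.t. y_m.
    Because y_m is the mean of the f_k, this equals H(y_m), i.e. V = L(y_m, y_m)."""
    y_m = main_prediction(client_probs)
    K = len(client_probs)
    return sum(
        -sum(f * math.log(y) for f, y in zip(p, y_m) if f > 0) for p in client_probs
    ) / K


probs = [[0.9, 0.1], [0.5, 0.5]]  # two clients, two classes
y_m, B, V = empirical_bias_variance(probs, t=0)
print(y_m, B, V, average_loss_to_main(probs))
```

Note that the bias needs the label $t$, while the variance depends only on the clients' outputs, which matches the observation in Section 5.2.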
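4.2. FED BVA

We present a novel robust federated learning algorithm with our proposed bias-variance attacks, named Fed BVA. Following the framework defined in Eq. (3) and Eq. (4), the key components of our algorithm are (1) bias-variance attacks for generating adversarial examples on the server, and (2) adversarial training using poisoned server examples together with clean local examples on each client. Therefore, we optimize these two objectives iteratively by producing the adversarial examples $\hat{D}_s$ and updating the local model parameters $w$.

The proposed algorithm is summarized in Alg. 1. Given the server's $D_s$ and the clients' training data $\{D_k\}_{k=1}^K$ as input, the output is a robust global model on the server. In this case, the clean server data $D_s$ are shared with all the clients. First, the algorithm initializes the server's model parameters $w_G$ and the perturbed data $\hat{D}_s$, and then assigns them to the randomly selected clients (Steps 4-5). Next, each client optimizes its own local model (Steps 6-8) with the received global parameters $w_G$ as well as its own clean data $D_k$, and uploads the updated parameters, together with the gradients of its local model on each shared server example, back to the server. Finally, the server generates the perturbed data $\hat{D}_s$ (Step 9) using the proposed bias-variance attack algorithm (see Alg. 3), with aggregations (model parameter average, bias gradient average, and variance gradient average) performed in a similar manner to FedAvg (McMahan et al., 2017). These aggregations can be made privacy-preserving if additive homomorphic encryption (Acar et al., 2018) is applied.

Algorithm 1 Fed BVA
1: Input: $K$ (number of clients, with local data sets $\{D_k\}_{k=1}^K$); $f$ (learning model); $E$ (number of local epochs); $F$ (fraction of clients selected on each round); $B$ (batch size of local client); $\eta$ (learning rate); $D_s$ (shared data set on server); $\epsilon$ (perturbation magnitude).
2: Initialization: Initialize $w_G^0$ and $\hat{D}_s = \emptyset$
3: for each round $r = 1, 2, \dots$ do
4:     $m = \max(F \cdot K, 1)$
5:     $S_r \leftarrow$ random set of $m$ clients
6:     for each client $k \in S_r$ in parallel do
7:         $w_k^r, f_{D_k}, \nabla_x f_{D_k} \leftarrow$ ClientUpdate($w_G^{r-1}, \hat{D}_s, D_s, k$)
8:     end for
9:     $\hat{D}_s \leftarrow$ BVAttack($\{f_{D_k}, \nabla_x f_{D_k}\} \mid k \in S_r$)
   end for

Algorithm 2 ClientUpdate($w, \hat{D}_s, D_s, k$)
1-6: ...
7: end for
8: Calculate $f_{D_k}(x; w_k^r)$ and $\nabla_x f_{D_k}(x; w)$ for all $x \in D_s$
9: return $w$, $f_{D_k}(x; w_k^r)$, $\nabla_x f_{D_k}(x; w)$

Algorithm 3 BVAttack($\{f_{D_k}, \nabla_x f_{D_k}\} \mid k \in S_r$)
1: Initialize $\hat{D}_s = \emptyset$
2: for $(x, t) \in D_s$ do
3:     Estimate the gradients $\nabla_x B(x)$ and $\nabla_x V(x)$ using Theorem 4.1
4:     Calculate $\hat{x}$ using Eq. (6) or (7) and add it to $\hat{D}_s$
5: end for
6: return $\hat{D}_s$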
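Steps 3-4 of Alg. 3 can be illustrated as follows, assuming (as in Alg. 2, Step 8) that each client uploads its probability vector $f_{D_k}(x)$ and its Jacobian $\nabla_x f_{D_k}(x)$ for a shared example. The server applies the gradient formulas of Theorem 4.1 and takes one BV-FGSM step (Eq. (6)). The linear-softmax client `linear_client` is a hypothetical stand-in for a real network, used only so that Jacobians can be written by hand; secure aggregation is omitted.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear_client(W, x):
    """Hypothetical client f(x) = softmax(W x): returns f(x) and its Jacobian J[j][i] = df_j/dx_i."""
    f = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
    avg_row = [sum(f[c] * W[c][i] for c in range(len(W))) for i in range(len(x))]
    jac = [[f[j] * (W[j][i] - avg_row[i]) for i in range(len(x))] for j in range(len(W))]
    return f, jac


def bv_gradient(probs, jacs, t, lam):
    """Gradient of B(x) + lam * V(x) from the clients' uploaded outputs and Jacobians (Theorem 4.1)."""
    K, C, d = len(probs), len(probs[0]), len(jacs[0][0])
    y_m = [sum(p[j] for p in probs) / K for j in range(C)]
    grad = [0.0] * d
    for f, J in zip(probs, jacs):
        for i in range(d):
            g_bias = -J[t][i] / f[t]  # d/dx_i of -log f^(t)
            g_var = -sum((math.log(y_m[j]) + 1) * J[j][i] for j in range(C))
            grad[i] += (g_bias + lam * g_var) / K
    return grad


def bv_attack_step(x, probs, jacs, t, lam, eps):
    """One BV-FGSM step (Eq. (6)) for a single shared example (x, t)."""
    g = bv_gradient(probs, jacs, t, lam)
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]


def bv_objective(W_clients, x, t, lam):
    """B(x) + lam * V(x) evaluated directly, useful for checking bv_gradient by finite differences."""
    probs = [linear_client(W, x)[0] for W in W_clients]
    K, C = len(probs), len(probs[0])
    y_m = [sum(p[j] for p in probs) / K for j in range(C)]
    bias = sum(-math.log(p[t]) for p in probs) / K
    return bias + lam * -sum(y * math.log(y) for y in y_m)


W_clients = [[[1.0, -0.5], [0.2, 0.3], [-0.4, 0.8]], [[0.3, 0.1], [-0.6, 0.9], [0.5, -0.2]]]
x, t = [0.4, -0.3], 1
uploads = [linear_client(W, x) for W in W_clients]
print(bv_attack_step(x, [f for f, _ in uploads], [J for _, J in uploads], t, lam=0.01, eps=0.1))
```

Comparing `bv_gradient` against central finite differences of `bv_objective` is a cheap sanity check before swapping in real client networks.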

5.1. SETTINGS

In this section, we evaluate the adversarial robustness of our proposed algorithm on four benchmark data sets: MNIST⁵, Fashion-MNIST⁶, CIFAR-10⁷, and CIFAR-100⁷. The baseline models are as follows.
(1) to (3) ...
(4) to (6) Fed Bias, Fed Variance, Fed BVA: our proposed methods, where the asymmetrically transmitted perturbed data are generated using the gradients of the bias-only attack, the variance-only attack, and the bias-variance attack, respectively.
(7) EAT: ensemble adversarial training (Tramèr et al., 2018), where each client performs local adversarial training using Eq. (5), and the model updates are aggregated on the server using FedAvg.
(8) EAT+Fed BVA: a combination of baselines (6) and (7).
For fair comparisons, all baselines are adapted to the asymmetrical communication setting (FedAvg and EAT receive the clean $D_s$), and all their initializations are identical. Note that baselines (7) and (8) place high computational demands on client devices and are usually not preferred in real scenarios. For the defense model, we use a 4-layer CNN for MNIST and Fashion-MNIST, and the VGG9 architecture for CIFAR-10 and CIFAR-100. For black-box attacks, we apply ResNet18 (He et al., 2016), VGG11 (Simonyan & Zisserman, 2015), Xception (Chollet, 2017), and MobileNetV2 (Sandler et al., 2018) for the CIFAR data, and a variety of models for MNIST and Fashion-MNIST following the design of (Tramèr et al., 2018). Training uses the SGD optimizer with a fixed learning rate of 0.01 and a momentum of 0.9. The trade-off coefficient between bias and variance is set to $\lambda = 0.01$ for all experiments. All federated learning hyper-parameters are presented in Table 5 in the Appendix. We empirically demonstrate that these hyper-parameter settings are preferable in terms of both training accuracy and robustness (see Fig. 6 to Fig. 14 in the Appendix).
To evaluate the robustness of our federated learning algorithm, in addition to clean test accuracy, we perform FGSM (Goodfellow et al., 2015) and PGD (Kurakin et al., 2017) attacks with 10 and 20 steps against the aggregated server model on $D_{test}$. Following (Tramèr et al., 2018; Wang et al., 2019), the maximum perturbation allowed is $\epsilon = 0.3$ on MNIST and Fashion-MNIST and $\epsilon = 16/255$ on CIFAR-10 and CIFAR-100, for both the threat and defense models. For the IID setting, the data are shuffled and uniformly partitioned across clients; for the non-IID setting, the data are divided into $2F \cdot K$ shards based on sorted labels, and each client is assigned 2 shards. Thus, each client has data from at most two classes.
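The two partitioning schemes can be sketched as below. This is a minimal illustration under assumptions: it works on example indices, takes the number of clients that receive shards as an input (the text specifies $2F \cdot K$ shards in total), assigns shards uniformly at random, and drops any remainder when the data size is not divisible by the number of shards. The function names are hypothetical.

```python
import random


def iid_partition(num_examples, num_clients, seed=0):
    """IID: shuffle example indices and split them uniformly across clients."""
    idx = list(range(num_examples))
    random.Random(seed).shuffle(idx)
    return [idx[k::num_clients] for k in range(num_clients)]


def non_iid_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Non-IID: sort indices by label, cut them into shards_per_client * num_clients
    equal shards, and give each client shards_per_client randomly chosen shards."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = shards_per_client * num_clients
    size = len(order) // num_shards  # any remainder examples are dropped
    shards = [order[s * size:(s + 1) * size] for s in range(num_shards)]
    shard_ids = list(range(num_shards))
    random.Random(seed).shuffle(shard_ids)
    return [
        [i for s in shard_ids[k * shards_per_client:(k + 1) * shards_per_client] for i in shards[s]]
        for k in range(num_clients)
    ]


labels = [c for c in range(10) for _ in range(20)]  # 10 balanced classes, 20 examples each
clients = non_iid_partition(labels, num_clients=10)
print([sorted({labels[i] for i in part}) for part in clients])
```

The at-most-two-classes property holds when no shard straddles a class boundary, e.g., when every class has the same number of examples and that number is a multiple of the shard size, as in the usage above.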

5.2. RESULT ANALYSIS

To analyze the properties of our proposed Fed BVA framework, we present two visualizations on MNIST using a trained CNN model, where both bias and variance are calculated on the training examples. In Fig. 1, we visualize the gradients extracted by the bias, variance, and bias-variance adversarial attacks. Notice that the gradients of bias and variance are similar, with subtle differences in local pixel areas. However, according to Theorem 4.1, their gradient calculations are quite different: the bias requires the target label as input, whereas the variance only needs the model outputs and the main prediction. From another perspective, we also investigate how the magnitudes of bias and variance vary with model complexity. As shown in Fig. 2, with increasing model complexity (more convolutional filters in the CNN), both bias and variance decrease. This result differs from the double-descent curve or bell-shaped variance curve reported in (Belkin et al., 2019; Yang et al., 2020). The reasons are twofold. First, their bias-variance definitions come from the MSE regression decomposition, whereas our decomposition uses the concept of the main prediction and decomposes the generalization error from the classification perspective. Second, their implementations evaluate bias and variance using training batches on a single central model, which differs from the definition that requires the variance to be estimated from multiple sub-models (in our scenario, the client models).

The convergence of all baselines is presented in Fig. 3. We observe that FedAvg has the best convergence, and all robust training methods converge to a slightly higher loss. This matches the observation in (Madry et al., 2018) that training performance may be sacrificed to provide robustness for small-capacity networks. For the model performance shown in Fig. 4, we observe that the aggregation of federated learning is vulnerable to adversarial attacks, since both FedAvg and EAT show decreasing performance as the number of server-client communications increases. The other baselines that use asymmetrical communication become more robust with more communication rounds, even though only a small number of perturbed examples ($n_s = 64$) are transmitted. We also observe that once the number of communication rounds reaches 40, Fed BVA starts to outperform EAT, even though the latter is more resource-demanding than Fed BVA (shown in Fig. 5, where the pie size represents the running time). Overall, bias-variance based adversarial training via asymmetrical communication is both effective and efficient for robust federated learning.

In the comprehensive experiments in Table 1 and Table 2, our proposed models generally outperform the other baselines regardless of the source of the perturbed examples (i.e., locally generated as in EAT+Fed BVA, or asymmetrically transmitted from the server as in Fed BVA). Compared with the standard robust federated learning baseline FedAvg AT, Fed BVA improves accuracy against adversarial attacks by 4% to 13% and 2% to 9% in the IID and non-IID settings, respectively, although Fed BVA is theoretically suited to the case where clients have IID samples. In Table 3, we observe a similar trend: Fed BVA outperforms FedAvg AT on CIFAR-10 and CIFAR-100 (with 0.2% to 10% increases) when defending against different types of adversarial examples. Compared with the strong local adversarial training baseline EAT, we also observe a maximum accuracy increase of 13% in Table 3 when applying its bias-variance oriented counterpart EAT+Fed BVA. Overall, the takeaway is that without local adversarial training, a bias-variance based robust learning framework almost always outperforms the other baselines in defending against FGSM and PGD attacks. When local adversarial training is allowed (e.g., when client devices have powerful computational capabilities), combining bias-variance robust learning with local adversarial training mostly yields the best robustness.

We also conducted various additional experiments in Appendix A.5, which include: (1) a comparison of the efficiency and effectiveness of Fed BVA using the cross-entropy loss and the MSE loss; (2) a comparison of single-step and multi-step Fed BVA in terms of the generation of $\hat{D}_s$; (3) three training scenarios of Fed BVA that use client-specific or universal adversarial examples; (4) an ablation study on the number of shared perturbed examples $n_s$, the optimizer's momentum, and the number of local epochs $E$; (5) black-box attack transferability between various models on all four data sets under multiple settings.

6. RELATED WORK

Adversarial Machine Learning: While machine learning models have achieved remarkable performance on clean inputs, recent work (Goodfellow et al., 2015) showed that these trained models are vulnerable to adversarially chosen examples created by adding imperceptible noise to clean inputs. In general, the adversarial robustness of centralized machine learning models has been explored from the following aspects: adversarial attacks (Carlini & Wagner, 2017; Athalye et al., 2018; Zhu et al., 2019), defense (or robust model training) (Madry et al., 2018; Carlini et al., 2019; Tramèr et al., 2018), and interpretable adversarial robustness (Schmidt et al., 2018; Tsipras et al., 2018).

Federated Learning: Federated learning with preserved privacy (Konečný et al., 2016; McMahan et al., 2017; Hard et al., 2018) and knowledge distillation (Chang et al., 2019; Jeong et al., 2018) has become prevalent in recent years. Meanwhile, the vulnerability of federated learning to backdoor attacks has been explored by (Bagdasaryan et al., 2018; Bhagoji et al., 2019; Xie et al., 2019). Following their work, multiple robust federated learning models (Fang et al., 2019; Pillutla et al., 2019; Portnoy & Hendler, 2020; Mostafa, 2019) have also been proposed and studied. In this paper, we study the adversarial vulnerability of federated learning after model deployment from the perspective of bias-variance analysis. This is in sharp contrast to existing work, which focuses on model robustness against Byzantine failures.

Bias-Variance Decomposition: Bias-variance decomposition (Geman et al., 1992) was originally introduced to analyze the generalization error of a learning algorithm. Subsequently, a generalized bias-variance decomposition (Domingos, 2000; Valentini & Dietterich, 2004) was studied in the classification setting, enabling flexible loss functions (e.g., squared loss, zero-one loss).
More recently, the bias-variance trade-off has been experimentally evaluated on modern neural network models (Neal et al., 2018; Belkin et al., 2019; Yang et al., 2020).

7. CONCLUSION

In this paper, we proposed a novel robust federated learning framework in which the loss incurred during the server's aggregation is decomposed into a bias part and a variance part. Our approach improves model robustness through adversarial training by supplying a few bias-variance perturbed samples to the clients via asymmetrical communications. Extensive experiments evaluate its performance from various aspects on several benchmark data sets. We believe that further exploration of this direction will lead to more findings on the robustness of federated learning.



¹ Our theoretical contribution mainly focuses on classification using neural networks with cross-entropy loss and mean squared loss. However, the proposed framework is generic and allows the use of other classification loss functions as well.
² This decomposition is based on the weighted sum of bias, variance, and noise.
³ In general, $t$ is a non-deterministic function of $x$ when the irreducible noise is considered (Domingos, 2000). Namely, if $x$ is sampled repeatedly, different values of $t$ will be observed.
⁴ $l_\infty$ robustness is certainly not the only option for robust learning. However, we use this standard approach to show the limitations of prior federated learning and to evaluate the improvements of our proposed framework.
⁵ http://yann.lecun.com/exdb/mnist
⁶ https://github.com/zalandoresearch/fashion-mnist
⁷ https://www.cs.toronto.edu/~kriz/cifar.html







Figure 1: Visualizations of bias, variance, bias+variance, and perturbed images for MNIST.

Figure 2: Bias-variance curve w.r.t. the CNN model complexity on MNIST.

Figure 3: Convergence on Fashion-MNIST (PGD-20 attack).



Table 1: Accuracy of MNIST under white-box attacks in IID and non-IID settings.

               IID                                                     non-IID
Model          Clean         FGSM          PGD-10        PGD-20        Clean         FGSM          PGD-10        PGD-20
…              … ±0.001      0.669 ±0.009  0.576 ±0.005  0.267 ±0.014  0.980 ±0.002  0.491 ±0.067  0.475 ±0.057  0.158 ±0.074
FedAvg AT      0.988 ±0.000  0.802 ±0.001  0.745 ±0.014  0.512 ±0.042  0.974 ±0.005  0.649 ±0.066  0.615 ±0.045  0.363 ±0.066
Fed Bias       0.986 ±0.000  0.812 ±0.009  0.788 ±0.021  0.583 ±0.036  0.971 ±0.004  0.679 ±0.040  0.627 ±0.078  0.394 ±0.103
Fed Variance   0.985 ±0.001  0.803 ±0.007  0.779 ±0.014  0.572 ±0.019  0.973 ±0.005  0.684 ±0.004  0.622 ±0.049  0.395 ±0.049
Fed BVA        0.986 ±0.001  0.818 ±0.003  0.804 ±0.009  0.613 ±0.020  0.969 ±0.002  0.705 ±0.009  0.664 ±0.013  0.469 ±0.031
EAT            0.981 ±0.000  0.902 ±0.001  0.907 ±0.001  0.811 ±0.004  0.972 ±0.002  0.789 ±0.016  0.721 ±0.018  0.415 ±0.035
EAT+Fed BVA    0.980 ±0.001  0.901 ±0.006  0.910 ±0.004  0.821 ±0.013  0.965 ±0.005  0.811 ±0.020  0.831 ±0.013  0.670 ±0.014

Table 2: Accuracy of Fashion-MNIST under white-box attacks in IID and non-IID settings.

               IID                                                     non-IID
Model          Clean         FGSM          PGD-10        PGD-20        Clean         FGSM          PGD-10        PGD-20
…              … ±0.001      0.300 ±0.021  0.072 ±0.016  0.036 ±0.016  0.804 ±0.013  0.193 ±0.036  0.061 ±0.015  0.017 ±0.003
FedAvg AT      0.866 ±0.001  0.490 ±0.021  0.170 ±0.014  0.139 ±0.011  0.730 ±0.023  0.445 ±0.065  0.136 ±0.044  0.087 ±0.042
Fed Bias       0.862 ±0.001  0.505 ±0.015  0.199 ±0.007  0.159 ±0.003  0.709 ±0.025  0.460 ±0.038  0.149 ±0.067  0.115 ±0.054
Fed Variance   0.862 ±0.002  0.496 ±0.012  0.201 ±0.012  0.157 ±0.017  0.719 ±0.036  0.499 ±0.081  0.188 ±0.025  0.120 ±0.038
Fed BVA        0.862 ±0.003  0.528 ±0.016  0.210 ±0.023  0.180 ±0.027  0.710 ±0.045  0.495 ±0.030  0.141 ±0.021  0.093 ±0.028
EAT            0.860 ±0.005  0.773 ±0.029  0.191 ±0.012  0.103 ±0.013  0.791 ±0.012  0.597 ±0.033  0.071 ±0.050  0.027 ±0.023
EAT+Fed BVA    0.838 ±0.009  0.715 ±0.011  0.357 ±0.024  0.226 ±0.006  0.735 ±0.020  0.632 ±0.015  0.164 ±0.035  0.106 ±0.039

Table 3: Accuracy of CIFAR-10 and CIFAR-100 under white-box attacks.

               CIFAR-10                                                CIFAR-100
Model          Clean         FGSM          PGD-10        PGD-20        Clean         FGSM          PGD-10        PGD-20
…              … ±0.003      0.288 ±0.001  0.206 ±0.001  0.074 ±0.005  0.741 ±0.003  0.166 ±0.012  0.049 ±0.004  0.032 ±0.003
FedAvg         0.890 ±0.002  0.225 ±0.022  0.207 ±0.004  0.062 ±0.008  0.730 ±0.003  0.161 ±0.009  0.113 ±0.009  0.035 ±0.006
FedAvg AT      0.890 ±0.003  0.280 ±0.021  0.295 ±0.006  0.099 ±0.014  0.707 ±0.003  0.162 ±0.006  0.064 ±0.007  0.048 ±0.003
Fed Bias       0.890 ±0.004  0.280 ±0.018  0.297 ±0.011  0.103 ±0.012  0.702 ±0.002  0.163 ±0.005  0.165 ±0.007  0.061 ±0.003
Fed Variance   0.889 ±0.001  0.267 ±0.014  0.276 ±0.006  0.092 ±0.009  0.710 ±0.007  0.161 ±0.005  0.157 ±0.010  0.045 ±0.016
Fed BVA        0.889 ±0.003  0.286 ±0.013  0.301 ±0.003  0.104 ±0.012  0.709 ±0.003  0.163 ±0.007  0.165 ±0.008  0.062 ±0.005
EAT            0.833 ±0.003  0.596 ±0.003  0.667 ±0.007  0.561 ±0.002  0.661 ±0.001  0.267 ±0.002  0.206 ±0.002  0.188 ±0.001
EAT+Fed BVA    0.833 ±0.003  0.598 ±0.002  0.668 ±0.001  0.564 ±0.003  0.657 ±0.002  0.272 ±0.003  0.332 ±0.003  0.211 ±0.002

