ROBUST FEDERATED LEARNING WITH MAJORITY ADVERSARIES VIA PROJECTION-BASED RE-WEIGHTING

Abstract

Most robust aggregators for distributed or federated learning assume that adversarial clients are a minority in the system. In contrast, this paper considers the majority adversary setting. We first show that a filtering method using a few trusted clients can defend against many standard attacks. However, a new attack called Mimic-Shift can circumvent simple filtering. To counter this attack, we develop a re-weighting strategy that identifies and down-weights the potential adversaries under the majority adversary regime. We show that our aggregator converges to a neighborhood around the optimum under the Mimic-Shift attack. Empirical results further show that our aggregator suffers negligible accuracy loss with a majority of adversarial clients, outperforming strong baselines.

1. INTRODUCTION

Federated learning (FL) is a leading framework for collaboratively training a machine learning (ML) model over local datasets. The decentralized nature of FL systems has raised concerns about vulnerability, as adversaries can connect to an FL system like any benign user and corrupt the ML model while evading detection by standard means (Kairouz et al., 2021). To this end, there is a growing literature on the adversarial robustness of FL (Blanchard et al., 2017; Chen et al., 2018; Xie et al., 2019b; Rajput et al., 2019b; Xie et al., 2020; Karimireddy et al., 2021a; 2022; He et al., 2022b), particularly where adversaries can upload malicious updates. Most existing defenses assume that the adversarial clients are a minority in the system (Blanchard et al., 2017; Chen et al., 2018; Rajput et al., 2019b; Karimireddy et al., 2021a; He et al., 2022b). However, in a federated scenario, the decentralized setting makes it relatively straightforward for the adversary to become the majority and thus break existing defenses. We call such an adversary the "majority adversary".

Our work joins a growing literature on robustness with majority adversaries, e.g., Xie et al. (2019b; 2020), motivated by noted practical vulnerabilities. Although Shejwalkar et al. (2021) argue that the number of registered clients in a production system (e.g., GBoard) may be too large for the adversary to compromise a majority of them, they neglect the client-availability issue in FL. In particular, Kairouz et al. (2021) suggest that, at any given time, only a small subset (< 1%) of clients is available to the server. Such low client availability allows the adversary to become the majority and overwhelm the server using compromised networked devices (e.g., IoT devices), much like a common distributed denial-of-service (DDoS) attack (Specht & Lee, 2003; Bonguet & Bellaïche, 2017). Some other settings, such as crowd-sourced training (Ryabinin & Gusev, 2020) (a.k.a.
volunteer computing), are perhaps even more vulnerable to majority adversaries because crowd-sourcing systems do not implement access control, allowing the adversary to connect an arbitrary number of clients as volunteers.

We consider the adversarial robustness of federated learning against a class of attacks where an adversary aims to decrease the accuracy of the trained ML model by uploading malicious updates. In particular, we are interested in Mimic-type attacks (Karimireddy et al., 2022), as discussed later in this section. A key assumption in our setup is the existence of a few trusted clients, e.g., with secure hardware support. We call these trusted clients "reference clients". In practice, the number of reference clients can be as small as two per round. Similar approaches have been considered in existing works (Xie et al., 2019b; 2020). One option for secure hardware is the trusted execution environment (TEE) (Pinto & Santos, 2019), which guarantees that the program is not Byzantine. TEE is already commercially available (e.g., on Google Pixel (GoogleBlog), Apple iPhone (AppleSupport), and Samsung phones (SamsungDeveloper)), and recent work (Mo et al., 2021) brings TEE support to federated learning systems.

We propose a combination of defenses: filtering and projection-based re-weighting. Our first defense is a filtering method that constructs a spherical accept region and excludes the updates outside it. The center of the accept region is the average update from a few trusted clients with secure hardware support; the radius is the sample variance of the reference updates times a scaling factor. Although the filtering method is effective against several standard attacks (e.g., the sign-flipping attack and the Gaussian attack), it is easy to circumvent. Building on the recently proposed Mimic attack (Karimireddy et al., 2022), which was shown to break many existing defenses, we develop an improved attack method called Mimic-Shift.
Mimic-Shift clients send malicious updates that shift slightly away from the benign updates and mislead the aggregated update away from the expected update. The slight shift makes Mimic-Shift hard to detect, while the calibrated shifting direction can corrupt the aggregated model. To perform the Mimic-Shift attack, we consider a man-in-the-middle (MITM) adversary capable of intercepting messages between the clients and the server. Computer security researchers have extensively studied the MITM adversary, but existing encryption solutions can be too expensive for resource-constrained client devices. For example, AES (advanced encryption standard) encryption achieves a throughput of only around 50 MB/s even on a powerful desktop CPU (Gleeson et al., 2014). For a moderately sized 1 GB neural network, encryption alone takes more than 20 seconds on the client side, draining computational resources and increasing the client dropout rate.

Our second defense is a projection-based re-weighting method that deals with the Mimic-Shift attack under a majority adversary regime. The main idea is to measure the influence of each update on the aggregated update and then down-weight the updates with high influence. Specifically, we compute the scalar projection of the aggregated update on each client's update. The intuition is that majority adversarial clients can significantly mislead the aggregated update, resulting in a large scalar projection. Note that the filtering and re-weighting methods complement each other, because the re-weighting defense does not deal with the aforementioned standard attacks, as discussed in Section 4.

We further provide theoretical analysis of our methods under the Mimic-Shift attack. First, false positives in filtering can eliminate a benign update and perturb the aggregated update. Regarding this concern, we show that the false-positive rate decreases quickly w.r.t.
the number of reference clients and the scaling factor of the accept radius, suggesting that our filtering method can work with a few reference clients and a conservative scaling factor. For the re-weighting phase, the probability of malicious updates having larger scalar projections than benign updates increases as the adversary takes a larger share of the system, complementing existing results (He et al., 2022b) that guarantee more robustness as the adversary's share decreases. Additionally, we discuss the performance of our method under a conventional minority adversary setting. Finally, we show that, in a convex setting, our method converges at a rate of O(1/√T) to a neighborhood of the optimum. Our contributions are summarized as follows:

• We develop the Mimic-Shift attack and show that it circumvents many defense methods in federated learning.
• We develop a two-stage defense using filtering and re-weighting to defend against a broad class of attacks.
• We theoretically analyze our strategy and outline conditions under which it helps.
• Empirical results on the FEMNIST (Caldas et al., 2018), CelebA (Liu et al., 2015), and Shakespeare (McMahan et al., 2017) datasets show that our aggregator recovers a near-optimal model under a majority adversary mounting the Mimic-Shift attack, outperforming existing methods by a large margin. Our method also loses at most 2.4% accuracy under conventional minority adversary settings. Additional empirical results demonstrate that our method is robust to a broad class of attacks, including standard Gaussian and sign-flipping attacks as well as an improved Mimic-Shift-Var attack.

2. RELATED WORK

Many existing works on Byzantine-robust machine learning assume the adversary is a minority in the system, based on clustering (Blanchard et al., 2017), the median (Yin et al., 2018), voting (Bernstein et al., 2019), bucketing (Karimireddy et al., 2022), and robust estimation (Data & Diggavi, 2021). However, these methods can fail once the adversary becomes the majority. A few papers (Xie et al., 2019b; 2020) leverage a validation dataset to achieve Byzantine tolerance against arbitrary numbers of adversarial clients, but constructing a global validation dataset in a federated scenario with local private datasets is infeasible. Other redundancy-based methods with stronger robustness guarantees do not apply to federated learning because they assume a centralized setting with control over the data allocation on each node (Chen et al., 2018; Rajput et al., 2019a). Recently, clipping-based methods (Karimireddy et al., 2021b; He et al., 2022b) showed success against a minority adversary in a federated learning setting. Although it is relatively straightforward to combine clipping with our reference clients, we find that the resulting training is unstable, as discussed in Section 6.

Our paper focuses on model poisoning attacks, which aim to decrease the accuracy of the trained ML model by uploading malicious updates (He et al., 2022b). Other attacks, including data poisoning (Steinhardt et al., 2017; Wang et al., 2021) and backdoors (Wang et al., 2020; Xie et al., 2021), are beyond the scope of this work. We focus only on training the centralized model; additional considerations such as personalization are known to be handled effectively by robust centralized training followed by local fine-tuning (Li et al., 2021), and thus are beyond the scope of this paper.

3. PROBLEM SETUP

We assume a federated learning system where N benign clients collaboratively train an ML model f : X → Y with d-dimensional parameter ζ, coordinated by a server. The N benign clients include N_R reference (trusted) clients. There are N′ adversarial clients who aim to corrupt the ML model during training. The i-th client, i ∈ {1, ..., N + N′}, has n_i data samples, and is benign for i ∈ {1, ..., N} and adversarial for i ∈ {N + 1, ..., N + N′}. Federated learning is conducted over T rounds. In round t ∈ {1, ..., T}, the server broadcasts a model parameterized by ζ_{t−1} to each client. We omit the subscript t while focusing on one round. Then the i-th client optimizes ζ_{t−1} on its local data samples and reports ζ_{t,i} to the server. We define the pseudo-gradient g_{t,i} = ζ_{t−1} − ζ_{t,i} as the difference between the locally optimized model and the broadcast model from the previous round. Note that, for simplicity, we often use the term "gradient" to refer to the updates. Once all gradients are reported, the server aggregates them and produces a new model with parameters ζ_t using the rule

ζ_t = ζ_{t−1} − (∑_{i=1}^{N+N′} n_i g_{t,i}) / (∑_{i=1}^{N+N′} n_i).

Our goal is to minimize a risk function over the benign clients:

F(ζ) = ∑_{i=1}^{N} (n_i / ∑_{j=1}^{N} n_j) F_i(ζ) = ∑_{i=1}^{N} (n_i / ∑_{j=1}^{N} n_j) E_{D_i}[ℓ(f(x; ζ), y)],

where ℓ : R × Y → R is a loss function.
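The server-side aggregation rule above can be sketched in a few lines of numpy (a minimal sketch under our notation; the function and variable names are ours, not part of any FL framework):

```python
import numpy as np

def aggregate(zeta_prev, local_models, num_samples):
    """Sample-size-weighted aggregation of pseudo-gradients.

    zeta_prev:    previous global parameters zeta_{t-1}, shape (d,)
    local_models: locally optimized parameters zeta_{t,i}, each shape (d,)
    num_samples:  per-client sample counts n_i
    """
    n = np.asarray(num_samples, dtype=float)
    # Pseudo-gradient of client i: g_{t,i} = zeta_{t-1} - zeta_{t,i}.
    grads = np.stack([zeta_prev - z for z in local_models])
    # Weighted average: sum_i n_i g_{t,i} / sum_i n_i.
    g_bar = (n[:, None] * grads).sum(axis=0) / n.sum()
    return zeta_prev - g_bar
```

With equal sample counts this reduces to plain FedAvg over the reported models.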

3.1. THREAT MODEL

Following standard practice (Blanchard et al., 2017), we consider an omniscient man-in-the-middle (MITM) adversary that knows g_i, ∀i ∈ {1, ..., N}, and can tell which clients are reference users, e.g., by observing the device identifier in the message. An omniscient adversary helps explore the limits of the defense. However, an omniscient adversary is not mandatory for our Mimic-Shift attack, which can work with partial information, as Section 6 will show. The MITM adversary also owns a majority of clients in the system and adopts an attack from the Mimic family (Karimireddy et al., 2022). Specifically, we assume the following Mimic-Shift attack.

Mimic-Shift At each round, the MITM adversary first intercepts the gradients from benign clients. Then the adversary computes the average reference gradient ḡ_R = (∑_{i=1}^{N_R} n_{R_i} g_{R_i}) / (∑_{i=1}^{N_R} n_{R_i}) from the reference users indexed by R_i, and the average benign gradient ḡ = (∑_{i=1}^{N} n_i g_i) / (∑_{i=1}^{N} n_i) from all benign users, including the reference users. Then all the adversarial clients report g′ = ḡ_R + (ḡ_R − ḡ) to the server.

The Mimic-Shift attack is effective because it pushes the aggregated gradient away from the expected gradient. The reason is that ḡ is estimated with more clients and data samples: a simple concentration argument suffices to show that ḡ will be closer to E[ḡ] than ḡ_R with high probability, resulting in a ḡ_R − ḡ that points away from E[ḡ]. Empirical results in Section 6 show that Mimic-Shift decreases the accuracy twice as much as Mimic (Karimireddy et al., 2022), which sets g′ = g_i for some i ∈ {1, ..., N}. The Mimic-Shift attack is also difficult to detect because the malicious g′ has the same distance to the reference ḡ_R as the benign ḡ. We will see how this property lets Mimic-Shift bypass distance-based defenses in Section 6.
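The Mimic-Shift update defined above can be sketched as follows (a minimal numpy sketch; the function and variable names are ours):

```python
import numpy as np

def mimic_shift(benign_grads, sample_counts, ref_idx):
    """Compute the Mimic-Shift update g' = g_R + (g_R - g_bar) of Section 3.1."""
    n = np.asarray(sample_counts, dtype=float)
    g = np.stack(benign_grads)
    n_ref, g_ref_all = n[ref_idx], g[ref_idx]
    # Weighted average over the reference clients only.
    g_ref = (n_ref[:, None] * g_ref_all).sum(axis=0) / n_ref.sum()
    # Weighted average over all benign clients (reference clients included).
    g_bar = (n[:, None] * g).sum(axis=0) / n.sum()
    # Shift away from the benign mean; note ||g' - g_ref|| = ||g_bar - g_ref||,
    # which is what makes the update hard to filter by distance to g_ref.
    return g_ref + (g_ref - g_bar)
```

The final comment captures the detectability property stated in the text: the malicious update is exactly as far from ḡ_R as the benign mean is.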

4. METHOD

Our robust aggregator has two phases: filtering and re-weighting. The filtering phase, applied first, excludes the gradients outside the accept region defined by the reference gradients, defending against standard attacks. The re-weighting phase then further down-weights potentially malicious gradients within the accept region of the filtering phase, targeting the Mimic-Shift attack.

4.1. PHASE 1: FILTERING

In the filtering phase (Algorithm 1), we first compute the sample mean of the reference gradients, m_R = (1/N_R) ∑_{i=1}^{N_R} g_{R_i}, as a guess of where the good gradients might be. Next, we set m_R as the center of a spherical accept region. Then, we compute the sample variance of the reference gradients, s_R = (∑_{i=1}^{N_R} ‖g_{R_i} − m_R‖) / (N_R − 1). The sample variance s_R is further scaled up by a tunable hyper-parameter c, and the product c · s_R is the radius of the spherical accept region. Finally, we remove all the updates outside the spherical accept region. This filtering is effective against attacks that do not know where the expected gradient is (e.g., the Gaussian attack) or do not carefully calibrate the adversarial gradient (e.g., the sign-flipping attack), as shown in Section 6. Algorithm 1 Filtering.

Input:

A set of reference gradients, {g_{R_i} | i ∈ {1, ..., N_R}}; a set of reported gradients, {g_i | i ∈ {1, ..., N + N′}}; a hyper-parameter c.
Aggregator:
1: Compute the sample mean of the reference gradients, m_R := (1/N_R) ∑_{i=1}^{N_R} g_{R_i};
2: Compute the sample variance of the reference gradients, s_R := (∑_{i=1}^{N_R} ‖g_{R_i} − m_R‖) / (N_R − 1);
3: return {g_i | i ∈ {1, ..., N + N′} ∧ ‖g_i − m_R‖ ≤ c · s_R}.

4.2. PHASE 2: RE-WEIGHTING

Suppose a majority adversary misleads the aggregated gradient. In that case, the malicious gradients likely have a stronger influence on the aggregated gradient than the benign gradients. Here, we measure this influence via scalar projections. The re-weighting defense is outlined in Algorithm 2. The scaling weights are designed to augment benign gradients and down-weight adversarial updates. This is implemented using a monotonic re-scaling of the scalar projection between the aggregate and each gradient (step 4). The monotonic re-scaling amplifies the differences between the scalar projections, making large projections larger, so the potentially malicious gradients are down-weighted more. However, the monotonic re-scaling may also over-up-weight certain benign gradients that are far from ḡ*; such up-weighting leads to instability in the training process. Thus, we also include a clipping operator (step 6), which addresses the over-up-weighting issue by specifying a bound. An additional benefit of using the power function and clipping is that they promote a set of uniform weights on the benign gradients, stabilizing the training. If we set k sufficiently large that the maximum s_i exceeds N_F · (∑_{i=1}^{N_F} s_i − max_{i∈{1,...,N_F}} s_i), we have s_i = τ, ∀i ≠ arg max_{i∈{1,...,N_F}} s_i. In practice, we set k = 10 and τ = 0.6; both hyper-parameters generalize well across the three datasets in our experiments.
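Algorithm 1 amounts to a mean, a dispersion estimate, and a distance test, which can be sketched as follows (a minimal numpy sketch; names are ours):

```python
import numpy as np

def filter_updates(ref_grads, all_grads, c):
    """Keep only updates inside the spherical accept region of Algorithm 1."""
    R = np.stack(ref_grads)
    # Step 1: sample mean of the reference gradients.
    m_R = R.mean(axis=0)
    # Step 2: "sample variance" as defined in the text, an average distance to m_R.
    s_R = np.linalg.norm(R - m_R, axis=1).sum() / (len(R) - 1)
    # Step 3: accept region test with radius c * s_R.
    G = np.stack(all_grads)
    keep = np.linalg.norm(G - m_R, axis=1) <= c * s_R
    return [g for g, k in zip(all_grads, keep) if k]
```

Note that the "sample variance" here follows the paper's definition (a sum of distances divided by N_R − 1) rather than the usual squared-deviation form.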
Note that the filtering phase complements the re-weighting phase because standard attacks can corrupt the scalar projections by uploading 0 or flipping the sign. For the class of inner-product-based attacks (Xie et al., 2019a), an additional clipping operator that prevents the scalar projection s_i from becoming negative can help.

Algorithm 2 Re-weighting.
Input:
A set of filtered gradients (the output of Algorithm 1), {g_i | i ∈ {1, ..., N_F}}; two hyper-parameters k and τ.
Aggregator:
1: Compute the aggregated gradient of all users, ḡ* := (∑_{i=1}^{N_F} n_i g_i) / (∑_{i=1}^{N_F} n_i);
2: Compute the scalar projection of ḡ* on each g_i, s_i := (ḡ* · g_i) / ‖g_i‖;
3: Normalize the vector s = [s_1, ..., s_{N_F}] such that ‖s‖_1 = N_F;
4: Take the k-th power of each s_i, s := [s_1^k, ..., s_{N_F}^k];
5: Normalize the vector s such that ‖s‖_1 = N_F;
6: Clip the values smaller than τ in s, s_i := max(s_i, τ), ∀i ∈ {1, ..., N_F};
7: Normalize the vector s such that ‖s‖_1 = N_F;
8: Re-weight each g_i using s_i, g_i := g_i / s_i, ∀i ∈ {1, ..., N_F};
9: return ḡ := (∑_{i=1}^{N_F} n_i g_i) / (∑_{i=1}^{N_F} n_i).
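The nine steps of Algorithm 2 can be sketched as follows (a minimal numpy sketch; names are ours, and the ℓ1 normalization is implemented with absolute values since projections can in principle be negative):

```python
import numpy as np

def reweight(filtered_grads, sample_counts, k=10, tau=0.6):
    """Projection-based re-weighting of Algorithm 2 (k=10, tau=0.6 as in the text)."""
    n = np.asarray(sample_counts, dtype=float)
    G = np.stack(filtered_grads)
    N_F = len(G)
    # Step 1: aggregate all filtered gradients.
    g_star = (n[:, None] * G).sum(axis=0) / n.sum()
    # Step 2: scalar projection of g_star on each g_i.
    s = G @ g_star / np.linalg.norm(G, axis=1)
    # Steps 3-7: normalize to ||s||_1 = N_F, amplify via k-th power,
    # renormalize, clip from below at tau, renormalize again.
    s = s / np.abs(s).sum() * N_F
    s = s ** k
    s = s / np.abs(s).sum() * N_F
    s = np.maximum(s, tau)
    s = s / np.abs(s).sum() * N_F
    # Step 8: dividing by s_i down-weights gradients with large projections.
    G = G / s[:, None]
    # Step 9: re-aggregate.
    return (n[:, None] * G).sum(axis=0) / n.sum()
```

When all clients report identical gradients, all weights collapse to 1 and the output equals the plain weighted average, i.e., the defense does not interfere in the benign case.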

5. THEORETICAL ANALYSIS

We first show that the probability of not filtering out a benign gradient increases at a rate of O(1 − 1/N_R²) w.r.t. the number of reference clients N_R and O((1 − 1/c²)²) w.r.t. the scaling factor c in the accept radius. Then, we discuss conditions under which the probability of down-weighting the malicious gradients increases as the adversary owns more clients in the system. We further discuss how our strategy may still help even when these conditions do not hold, providing insight into some of the empirical results. Finally, we study the convergence of our method in a convex setting. Before proceeding, we state some additional assumptions. To simplify notation, we assume all users have the same number of samples, since differences in sample size can be absorbed into differences between gradients.

Assumption 1 (Uniform sample size). n_i = n_j, ∀i ≠ j.

Assumption 2 (Hierarchical gradient distribution). In all rounds, the gradients follow a hierarchical distribution: Stage 1: µ_i ∼ P(µ), σ_i ∼ P(σ), ∀i ∈ {1, ..., N}; Stage 2: g_i ∼ N(µ_i, σ_i; γ⁺), ∀i ∈ {1, ..., N}, where in Stage 1 the random variable µ has finite expected value E[µ] and finite non-zero variance Var[µ], and Stage 2 is a truncated isotropic Gaussian distribution with truncation threshold γ⁺ on the L2 norm.

Assumption 3 (Bounded gradient variance). σ_i ≤ σ⁺, ∀i ∈ {0, 1, ..., N}.

Assumption 4 (Non-degenerate reference variance). The sample variance of the reference gradients is greater than 0: 0 < s⁻_R ≤ s_R.

Assumption 5 (Bounded norms). The L2 norms of the expected and estimated gradients are bounded: 0 < γ⁻ ≤ ‖µ_i‖ ≤ γ⁺ and 0 < γ⁻ ≤ ‖g_i‖ ≤ γ⁺, ∀i ∈ {0, 1, ..., N}.

5.1. FILTERING (STAGE 1)

The filtering phase removes all gradients that are more than c · s_R away from the reference mean m_R in Euclidean distance. However, the data distribution D_i may vary across clients in a federated learning setting, and such non-i.i.d.-ness increases the risk of filtering out a benign gradient. We start our analysis by considering the expected gradients.

Lemma 6. Suppose there are N_R reference clients among N clients, and the sample variance of the reference gradients is s_R. Under Assumption 2, let c ≥ 2 · ‖E[µ] − μ̄_R‖ / s_R. Then, for any i ∈ {1, ..., N}, with probability at most 4 · Var[µ]² / (c² · s_R²), we have ‖µ_i − μ̄_R‖ ≥ c · s_R, where μ̄_R = (1/N_R) ∑_{i=1}^{N_R} µ_{R_i}.

Remark 7. The condition c ≥ 2 · ‖E[µ] − μ̄_R‖ / s_R guarantees that all µ_i outside the accept region are at least (c · s_R)/2 away from E[µ], as Figure 1 shows, enabling a concentration argument. Otherwise, bounding the probability that µ_i falls outside the accept region is hard without additional assumptions on P(µ).

Lemma 6 suggests that the probability of filtering out a benign gradient decreases at a rate of O(1/c²). The following lemma further shows that the probability of violating the assumed condition c ≥ 2 · ‖E[µ] − μ̄_R‖ / s_R decreases at a rate of O(1/N_R²).

Lemma 8. Under Assumptions 2 and 5, for a fixed accept radius c · s_R, with probability at most 4 · Var[µ]² / (N_R² · c² · s_R²), we have c ≤ 2 · ‖E[µ] − μ̄_R‖ / s_R.

The following theorem combines these results and extends them from expected to estimated gradients.

Theorem 9. Under Assumptions 2 and 3, with probability at least (1 / (2 · σ⁺ · √(2π))^d) · (1 − 4 · Var[µ]² / (c² · s_R²)) · (1 − 4 · Var[µ]² / (N_R² · c² · s_R²)), we have ‖g_i − m_R‖ ≤ c · s_R.

Theorem 9 shows that the risk of filtering out a benign gradient decreases quickly w.r.t. c and N_R, enabling our strategy of using a small number of reference clients and a conservative c.

5.2. RE-WEIGHTING (STAGE 2)

Suppose the adversary adopts the Mimic-Shift attack and bypasses the filtering defense. We would like to determine the probability of assigning a larger scalar projection to the malicious gradient g′ than to a benign gradient g_i, i ∈ {1, ..., N}.

Theorem 10. Suppose all the gradients pass the filtering phase. Let the aggregated gradient be ḡ* = w · ḡ + w′ · g′, where (w, w′) = (∑_{i=1}^{N} n_i / ∑_{i=1}^{N+N′} n_i, ∑_{i=N+1}^{N+N′} n_i / ∑_{i=1}^{N+N′} n_i). Let θ be the angle between ḡ and g′, and θ* the angle between ḡ* and g′. Assume w/w′ ≤ (‖g′‖ − cos(θ/2) · ‖ḡ‖) / (‖ḡ‖ − cos(θ/2) · ‖g′‖), such that θ* ≤ θ − θ*. Then, under Assumptions 1-5, letting γ⁻_{θ*} = γ⁻ · tanh(θ − 2 · θ*), with probability at least (1 / (2 · σ⁺ · √(2π))^d) · ∫_{α=0}^{γ⁻_{θ*}} (1 − Var[µ]/α²) · (1 − Var[µ]/(γ⁻_{θ*} − α)²) dα, we have s′ ≥ s_i, ∀i ∈ {1, ..., N}.

Theorem 10 suggests that if 2 · θ* ≤ θ, the probability of down-weighting malicious gradients more than benign gradients (i.e., s′ > s_i) increases as the adversary owns more clients and samples in the system (i.e., w′ ↑). The main idea is that the minimum deviation between g_i and ḡ required for s_i > s′ grows as w′ increases, because w′ ↑ implies θ* ↓. This deviation is shown in Figure 2a as the dark dashed line, with length at least γ⁻ · tanh(θ − 2 · θ*). Applying concentration arguments then yields the result.

Figure 2: Two cases where the adversary misleads the aggregated gradient ḡ* to different degrees. ḡ is benign; g′′ is the mirror of the malicious g′ w.r.t. ḡ*, along which the scalar projection equals s′.

5.2.1. ADDITIONAL DISCUSSION

Theorem 10 only deals with the case where ḡ* has a smaller angle with g′ than with ḡ. However, in practice, we find that our method achieves decent accuracy even when the condition 2 · θ* ≤ θ does not hold. To gain some insight, suppose there is a probability density function f over the benign gradients. Then the probability of having s_i > s′, ∀i ∈ {1, ..., N}, equals the integral of f over a cone, defined by spinning g′ around ḡ*, as Figure 2b shows. In a federated scenario, the estimated g_i concentrates around its own µ_i, which is not necessarily close to ḡ or μ̄, resulting in a small integral of f over the aforementioned cone.

5.3. CONVERGENCE ANALYSIS

So far, our analysis has focused on the robustness of a single update. We now extend the single-step analysis to a convergence analysis. Before stating the theorem, we present a lemma that quantifies the impact of false-positive filtering.

Lemma 11. Suppose F : R^{N×d} → {0, 1}^N is a filtering function. Let the mask be M = F(g_1, ..., g_N), N′_F = N − ‖M‖_1, ĝ = (1/‖M‖_1) ∑_{i=1}^{N} M_i g_i, and δ = ĝ − ḡ. Then, under Assumption 5, we have ‖δ‖ ≤ (2 · N′_F / N) · γ⁺.

In the following theorem, we use assumed conditions to ease the reading. Later, we connect these conditions to Theorems 9 and 10 and further support them with empirical results in Appendix C.

Theorem 12. Suppose the malicious gradient g′_t is filtered out with probability 0, ∀t ∈ {1, ..., T}, and down-weighted by s′_t ≥ 1. Assume at most N′_F benign gradients are filtered out at each iteration. Define the aggregated gradient ḡ*_t = w · ĝ_t + w′ · (g′_t / s′_t) = w · (μ̄_t + δ_t + ϵ_t) + w′ · (g′_t / s′_t), where ĝ_t and δ_t follow the definitions in Lemma 11 and ϵ_t ∼ (1/N) ∑_{i=1}^{N} N(0, σ_{t,i}), as Assumption 2 shows. Let ζ ∈ Z be the model parameter, ζ* the optimum, and ζ̄ = (1/T) ∑_{t=1}^{T} ζ_t. Assume sup_{ζ∈Z} ‖ζ‖ ≤ D and that F : Z → R is convex. Let C = (w · (2 · N′_F / N) · γ⁺)² + 2 · (w · γ⁺)² + (w′ · γ⁺)², C′ = 2 · w · C + C² + w · γ⁺, and η_t = D / (√T · C′). Then, under Assumptions 2 and 5, we have:

E[F(ζ̄)] − F(ζ*) ≤ (4 · C′ + 1) · D / (2 · w · √T) + (2 · D / T) ∑_{t=1}^{T} (‖δ_t‖ + (w′ / (w · s′_t)) · ‖g′_t‖ + ‖ϵ_t‖).

The convergence analysis shows that our method converges to a neighborhood of the optimum, whose size shrinks with less false-positive filtering (Theorem 9) and with a higher probability (Theorem 10) of stronger down-weighting (tuning k and τ in Algorithm 2) of the malicious gradients. Converging to a neighborhood is common in previous noisy-gradient-descent studies (Wang et al., 2021; He et al., 2022a). Our result also suggests scaling up the learning rate as the adversary owns more clients; in practice, this scaling is handled by our re-weighting phase, where the benign gradients are down-weighted by scalars less than 1.

6. EXPERIMENTS

We first compare the Mimic-Shift attack with the Mimic attack, showing that Mimic-Shift is more effective and that the improved effectiveness persists with a non-omniscient adversary (Mimic-Shift-Par). We then evaluate our defense strategy against the Mimic-Shift attack, where it outperforms strong baselines.

Additional Results

Appendix C provides more results. We evaluate our defense against a non-omniscient Mimic-Shift-Par attack, an improved Mimic-Shift-Var attack, and other standard attacks (e.g., Gaussian and sign-flipping). There are also additional experiments with fewer clients and various attack strengths, plots of the re-weighting vector s and of convergence, and studies of the defense hyper-parameters.

6.1. SETUP

We use three datasets, FEMNIST (FM) (Caldas et al., 2018), CelebA (CA) (Liu et al., 2015), and Shakespeare (SS) (McMahan et al., 2017), with realistic non-i.i.d. partitions (Caldas et al., 2018) implemented in the FedML library (He et al., 2020). The number of benign clients ranges from 143 to 500. We select 2 reference clients and 28 other benign clients at each round; the number of adversarial clients is adjusted accordingly. We report the detailed setup in Appendix B, including the various hyper-parameters. We employ seven baselines, including federated averaging (FedAvg) (Cao et al., 2021); Appendix B lists the details of these baselines. The "Oracle" aggregator operates on benign gradients only and serves as a reference. Five attacks are considered in our experiments, including Mimic-Shift-Var, designed specifically as a strong attack against our proposed defense.

Gaussian Draw a random update g′_i from an isotropic Gaussian distribution N(0, 200).
Sign-flipping Flip the sign of the estimated gradient, g′_i = −g_i, and report g′_i to the server.
Mimic-Shift Report g′ = ḡ_R + (ḡ_R − ḡ) to the server, as described in Section 3.1.
Mimic-Shift-Par Randomly eavesdrop on 20% of the clients per round and draw two of them as references.
Mimic-Shift-Var Mirror the local update using ḡ_R and report g′_i = ḡ_R + (ḡ_R − ḡ_i) to the server.
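The attack updates above can be sketched as follows (a hypothetical numpy sketch; names are ours, and since the text's N(0, 200) is ambiguous, we treat 200 as the variance here):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_attack(d):
    # Isotropic Gaussian update; 200 interpreted as the variance (an assumption).
    return rng.normal(0.0, np.sqrt(200.0), size=d)

def sign_flip_attack(g_i):
    # Report the negated local gradient.
    return -g_i

def mimic_shift_attack(g_ref, g_bar):
    # Shift the reference mean away from the benign mean (Section 3.1).
    return g_ref + (g_ref - g_bar)

def mimic_shift_var_attack(g_ref, g_i):
    # Mirror each local update around the reference mean.
    return g_ref + (g_ref - g_i)
```

Mimic-Shift-Par uses the same update as Mimic-Shift but estimates ḡ_R and ḡ from the 20% of clients the adversary eavesdrops on.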

6.2. MIMIC-TYPE ATTACK COMPARISON

Table 1 shows the accuracy of the FedAvg aggregator under the Mimic and Mimic-Shift attacks. Here, the adversary owns 80% of the system, and the Mimic adversary mimics the first client in the system. The Mimic-Shift attack consistently outperforms Mimic and is 116% more effective on average. These advantages are preserved under a non-omniscient adversary. Note: variance in Table 1 is rounded up.

6.3. DEFENSE AGAINST MIMIC-SHIFT ATTACK

This experiment considers three settings where the adversary owns 0% to 80% of the system. The hyper-parameters of each aggregator are selected based on the 80% adversary setting and applied directly to the other settings. Table 2 shows the accuracy of our strategy and the baselines under the Mimic-Shift attack. Under a majority adversary setting, our method outperforms all the baselines by a large margin. The reason is that the median-based (CM) method picks the malicious gradient once the adversary becomes the majority, and clustering-based methods (Krum and Krum-B) suffer from a similar issue. The other reference-client-assisted strategies, distance-based re-weighting (CClip-R) and filtering (Zeno′), are not effective because Mimic-Shift carefully calibrates the malicious gradients so that they are as close to the reference gradients as the benign gradients. Using only the reference clients (Ref) outperforms existing robust aggregators but suffers from low client utilization. We also find that our proposed defense yields the best accuracy under a minority adversary setting, as discussed in Section 5.2.1. Under a no-adversary setting, our strategy weights the benign gradients nearly uniformly without interfering with the training. Our method occasionally outperforms the oracle; we hypothesize that the improved results come from the up-weighting of underrepresented clients (Li et al., 2020), whose influence scores are small.



In the description and experiments, we use a power function. Optimizing the choice of monotonic re-scaling is left for future work.



Figure 1: Three cases with different hyper-parameter values c. The red dot is E[µ] and the blue dot is μ̄_R. The solid circle represents the accept region with radius c · s_R. The dashed circle and dash-dotted circle(s) are two spherical contours around E[µ]. The condition c ≥ 2 · ‖E[µ] − μ̄_R‖ / s_R holds in Case 1, where all points outside the accept region are more than (c · s_R)/2 away from E[µ]. In Cases 2 and 3, there are points between the two contours that have the same distance to E[µ] but are not always inside the accept region.

(a) The case 2 · θ* ≤ θ. (b) The general case.


Table 1: Accuracy of the FedAvg aggregator under attack with an 80% adversary.

Table 2: Accuracy of aggregators under the Mimic-Shift attack. Columns: Adv %, Data, Oracle, Ours, FedAvg, Ref, CM, Krum, Krum-B, CClip-R, Zeno′.

7. CONCLUSION

This paper shows two methods for improving the adversarial robustness of federated learning under a majority adversary regime. Empirical results in various settings and against a broad class of attacks demonstrate the proposed methods' effectiveness. Additional theoretical analysis under the Mimic-Shift attack regime shows conditions under which the proposed method helps. Further exploring the limits of learning with majority adversaries is a promising next step.

REFERENCES

Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 5650-5659. PMLR, 2018. URL https://proceedings.mlr.press/v80/yin18a.html.

