SYNERGISTIC NEUROMORPHIC FEDERATED LEARNING WITH ANN-SNN CONVERSION FOR PRIVACY PROTECTION

Abstract

Federated Learning (FL) has been widely studied in response to growing public concern over data privacy: only model parameters, rather than private data, are communicated. However, recent studies challenge the privacy guarantees of FL, showing that private data can be leaked from the communicated gradients or parameter updates. In this paper, we propose a framework called Synergistic Neuromorphic Federated Learning (SNFL) that enhances privacy during FL. Before uploading its model update, each client in SNFL first converts its Artificial Neural Network (ANN) into a Spiking Neural Network (SNN) via calibration algorithms. Because this conversion loses almost no accuracy while effectively encrypting the client model's parameters, SNFL obtains a more performant model with stronger privacy. After aggregating the clients' SNN parameters, the server distributes the parameters back to the clients. This design offers a smooth way to continue model training under the ANN architecture. The proposed framework is shown to be private, introduces only a lightweight overhead, and yields notable performance gains. Extensive experiments on several datasets demonstrate the efficacy and practicality of our method. In most of our IID and moderately Non-IID scenarios, SNFL significantly enhances model performance; for instance, it improves the accuracy of FedAvg on Tiny-ImageNet by 13.79%. Moreover, the original image cannot be reconstructed after 280 iterations of attack under SNFL, whereas it can be reconstructed after just 70 iterations under FedAvg.

1. INTRODUCTION

Recent advances in machine learning, particularly deep learning, rely heavily on large datasets to achieve strong inference performance. Due to the growing demand for data, models increasingly need to be fed with information from multiple entities. However, transferring, exchanging, and trading data among entities may violate the General Data Protection Regulation (GDPR) and incur penalties under the Act (Wachter, 2018), posing an unprecedented challenge to the field of machine learning. Federated learning (McMahan et al., 2017) has emerged and flourished as a privacy-preserving approach that trains a shared model collaboratively while keeping data local. Although the data are stored locally, clients that join federated learning need to transmit their local gradients to the server to update the shared model. Recent studies (Zhu & Han, 2020; Zhao et al., 2020; Huang et al., 2021) have revealed that sensitive local data can be leaked from these transmitted gradients via model inversion attacks (Zhu & Han, 2020). To defend against such attacks and prevent privacy leakage, defense strategies including differential privacy (Geyer et al., 2017), secure multi-party computation (Byrd & Polychroniadou, 2020), and MixUp (Zhang et al., 2017) have been developed. In exchange for privacy, the cost is either severe computational overhead (Hardy et al., 2017) or unavoidable accuracy loss (Kim et al., 2021).

What is the intrinsic source of privacy in these defense strategies? From an information-theoretic perspective, it is the asymmetry of entropy between the encryption and decryption steps for clients and servers when part of the encryption information is kept only locally. From this standpoint, as long as an encryption method is irreversible from the server's view while still allowing effective aggregation, it is feasible to improve privacy in federated learning. Recent progress in neuromorphic computing, especially the conversion from traditional artificial neural networks (ANNs) to spiking neural networks (SNNs) (Deng & Gu, 2020), provides a pair of source ANN and target SNN that both achieve high accuracy, with the source ANN not recoverable from the resulting SNN (Li et al., 2021b). This property fits naturally with the demand for privacy protection in federated learning: if we train ANNs on clients and send only the converted SNNs with partial parameters to the server for aggregation, we can expect a feasible privacy-protected FL algorithm with an effective parameter-sharing paradigm. In addition, such an ANN-SNN conversion is lightweight and, with careful design, performance-preserving or even performance-improving. Fig. 1 illustrates the pipeline of our proposed method.

Besides the natural feasibility of SNNs (Esser et al., 2016; Kim et al., 2019), this synergistic framework brings two additional benefits specific to federated learning. First, in contrast to existing noise-injection methods (e.g., differential privacy (Geyer et al., 2017)), our ANN-SNN conversion is optimized to improve performance by fine-tuning the SNN's weights rather than trading performance for noise level. As a result, our method can even outperform standard federated learning. Second, the SNN emits discrete spikes and is not differentiable, so the induced synergistic FL can be more robust to small perturbations and adversarial attacks such as white-box attacks (Liang et al., 2021).

Our contributions are summarized as follows:
• Innovation/Privacy: We design a federated learning framework in which the server and clients run two different types of models in a privacy-preserving manner. To the best of our knowledge, our work is among the first to train different types of neural network models on the server and the clients.
• Accuracy: Extensive experiments validate that SNFL delivers similar or superior accuracy compared with other common methods.
• Effectiveness: Based on the SNFL framework, we analyze the backdoor attack and develop a method to detect it simply through abnormal SNN thresholds.

2. RELATED WORK

2.1. FEDERATED LEARNING (FL)

In federated learning, each client computes a model update, i.e., a gradient, on its local data. While sharing gradients was assumed to leak little information about a client's private data, recent papers (Zhu & Han, 2020; Zhao et al., 2020; Huang et al., 2021) devised the "gradient inversion attack," in which an attacker listening to a client's communications with the server can reconstruct that client's private data. To defend against this, methods such as gradient clipping (Sun et al., 2019), gradient perturbation (Zhu & Han, 2020), and robust aggregation (Blanchard et al., 2017; Goryczka & Xiong, 2015; Yin et al., 2018) are commonly used.

2.2. SPIKING NEURAL NETWORK (SNN)

Conventionally, there are two distinct routes to obtain a deep SNN (Deng et al., 2020): (1) directly training an SNN from scratch, and (2) converting a pretrained ANN into an SNN. In this work, we focus on the conversion-based route. ANN-SNN conversion directly reuses the features learned in the ANN to obtain a performant SNN; however, it requires a trade-off between inference latency and task performance. Data-based normalization (Diehl et al., 2016) and threshold balancing (Sengupta et al., 2018) are the basic methods of ANN-to-SNN conversion. Rueckauer et al. (2016) and Han et al. (2020) then propose a soft-reset mechanism for the membrane potential to reduce information loss. Recently, Deng & Gu (2020) analyze the conversion error and propose a shift method that reduces it by half. Li et al. (2020) propose a light pipeline and an advanced pipeline that apply layer-wise calibration to adjust the network parameters and diminish the conversion error, significantly reducing the required simulation length. We adopt the layer-wise calibration algorithm to obtain high-performance, low-latency SNNs in the SNFL framework.

3. PRELIMINARIES

In this section, we briefly introduce the concept of Federated Learning (FL) and its baseline method. We also point out the privacy issue in FL that can lead to the leakage of user data.

Federated Learning (FL). FL enables mobile devices to collaboratively learn a shared prediction model while keeping all training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. Formally, assuming we have K clients, the optimization objective in FL is the following empirical risk minimization problem:

$\min_\theta L(\theta) = \frac{1}{K} \sum_{i=1}^{K} \ell_i(\theta)$, where $\ell_i(\theta) = \ell_{(x_i, y_i) \sim D_i}(x_i, y_i; \theta)$.   (1)

Here, θ is the model parameter vector and ℓ_i(θ) denotes the loss of θ evaluated on the i-th client's dataset D_i, with (x_i, y_i) an input-label pair from D_i. The goal of this objective is to achieve the minimum average loss over all clients. We assume {D_i}_{i=1}^{K} are randomly sampled from D_train; both IID and Non-IID sampling are considered in this work. To minimize the average loss over all clients without sharing the local input data, the clients upload their model parameters to the server periodically. A communication round consists of client upload, server aggregation, and server distribution. Here we introduce the FedAvg (McMahan et al., 2017) communication for the client-server update:

• In the r-th communication round, each client i uploads its local model parameter change $\Delta\theta_i^{(r)}$.
• The server aggregates the local model updates from all participating clients, given by

$\theta^{(r)} = \theta^{(r-1)} + \frac{1}{K} \sum_{i=1}^{K} \Delta\theta_i^{(r)}$,   (2)

after which the server distributes the aggregated parameters θ^(r) to the clients.
• Upon receiving the aggregated parameters from the server, the clients start their own local learning on their private datasets {D_i}_{i=1}^{K}, which creates the model parameter update for the next round,

$\Delta\theta_i^{(r+1)} = -\eta_i \nabla_{\theta^{(r)}} \ell_{(x_i, y_i) \sim D_i}(x_i, y_i; \theta^{(r)})$,   (3)

where η_i is the learning rate of local learning. The clients usually run multiple iterations of gradient descent (only one update is shown above).

Gradient-Inversion Attack. Proposed in Zhu & Han (2020), the Deep Leakage from Gradients (DLG) attack utilizes the gradient information to recover the private local data and labels; that is, given the model parameter update Δθ_i, one can obtain a similar (x_i, y_i) pair. This is done by mimicking the real gradient (which is also the parameter update, see Eq. (3)). Formally, DLG first randomly initializes a dummy input and a dummy label (x'_i, y'_i) and optimizes them by minimizing the discrepancy between the real gradient and the current gradient:

$\min_{x'_i, y'_i} \| \nabla_\theta \ell(x'_i, y'_i) - \nabla_\theta \ell(x_i, y_i) \|_F^2 + \alpha R(x'_i)$,   (4)

where ∇_θ ℓ(x_i, y_i) is the real gradient uploaded to the server and R(x') is an image-prior loss with coefficient α. By driving the current gradient ∇_θ ℓ(x'_i, y'_i) as close as possible to the real gradient, the attacker can generate the corresponding input data and label. Several defenses have been proposed; for example, Gradient Pruning (Zhu & Han, 2020), Gradient Noise (Zhu & Han, 2020), and Mixup (Zhang et al., 2017) aim to expose less, or more diffuse, information in the shared gradients. However, these methods sacrifice task performance for better privacy. In this paper, we seek a method that preserves privacy during federated learning without jeopardizing accuracy.
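As a concrete reference, the sketch below implements the FedAvg aggregation of Eq. (2) for model parameters stored as NumPy arrays; the dictionary layout, function name, and toy values are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def fedavg_aggregate(global_params, client_updates):
    """One FedAvg round (Eq. (2)): add the average client update to the
    previous global parameters.

    global_params  -- dict name -> np.ndarray, holding theta^(r-1)
    client_updates -- list of dicts with the same keys, holding delta theta_i^(r)
    """
    K = len(client_updates)
    return {name: value + sum(u[name] for u in client_updates) / K
            for name, value in global_params.items()}

# Toy usage: two clients, one weight tensor; every entry of the result is 2.0.
theta = {"w": np.zeros((3, 3))}
updates = [{"w": np.ones((3, 3))}, {"w": 3 * np.ones((3, 3))}]
theta = fedavg_aggregate(theta, updates)
```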

4. METHODOLOGY

In this section, we introduce our method, which combines Artificial Neural Networks (ANNs) and Spiking Neural Networks (SNNs) for privacy-preserving federated learning.

4.1. SPIKING NEURAL NETWORKS

Compared to artificial neurons (e.g., ReLU: max(0, x)), spiking neurons are biologically inspired: each neuron maintains a variable called the membrane potential v. Here, we describe the dynamics using the iterative expression of the Integrate-and-Fire (IF) neuron model (Liu & Wang, 2001), which is favorable for the ANN-SNN conversion regime (Rueckauer et al., 2016; Han et al., 2020). Formally, at time step t, the IF neuron receives the pre-synaptic input and charges the membrane potential, given by

$\tilde{v}(t+1) = v(t) + I(t), \qquad s(t+1) = \begin{cases} V_{th} & \text{if } \tilde{v}(t+1) \ge V_{th} \\ 0 & \text{otherwise} \end{cases}, \qquad v(t+1) = \tilde{v}(t+1) - s(t+1),$   (5)

where I(t) is the pre-synaptic input computed from the weights W and the spikes s of the previous layer. Whenever the membrane potential exceeds the firing threshold V_th, the neuron emits a spike s; otherwise, it stays silent. For fired neurons, the membrane potential is reset by subtraction, i.e., the third expression in Eq. (5). Note that the threshold V_th^(l) differs between layers; the server can further apply weight normalization (Diehl et al., 2016) to convert the output {0, V_th^(l)} into a binary spike {0, 1}.
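For intuition, the short NumPy sketch below simulates the IF dynamics of Eq. (5); the function name and the constant-input example are illustrative and only show that the time-averaged spike output approximates the ReLU activation it replaces.

```python
import numpy as np

def if_neuron(pre_synaptic_input, v_th, T):
    """Simulate the IF dynamics of Eq. (5) for T time steps.

    pre_synaptic_input -- array of shape (T, ...) holding I(t) for each step
    v_th               -- firing threshold V_th
    Returns the spike train and its average over time.
    """
    v = np.zeros_like(pre_synaptic_input[0])
    spikes = []
    for t in range(T):
        v = v + pre_synaptic_input[t]        # charge the membrane potential
        s = np.where(v >= v_th, v_th, 0.0)   # emit a spike of magnitude V_th
        v = v - s                            # reset by subtraction
        spikes.append(s)
    spikes = np.stack(spikes)
    return spikes, spikes.mean(axis=0)

# With a constant input of 0.3 and V_th = 1, the time-averaged output over
# T = 256 steps approaches 0.3, i.e. the ReLU activation it replaces.
T = 256
_, rate = if_neuron(np.full((T, 4), 0.3), v_th=1.0, T=T)
```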

4.2. SYNERGISTIC NEUROMORPHIC FEDERATED LEARNING

In the conversion from a source ANN to a target SNN, the discrepancy between the output activations of the two networks accumulates layer by layer, resulting in a significantly different output at the final layer. To address this issue, a layer-wise parameter calibration technique (Li et al., 2021b) adjusts the SNN parameters so that its firing rate stays close to the activation of the source ANN. Mathematically, let s denote the average spike rate over time in the SNN and a the activation in the source ANN; the conversion error is then e = a − s. For each channel c, the average activation is computed as $\mu_c(a) = \frac{1}{wh}\sum_{i=1}^{w}\sum_{j=1}^{h} a_{c,i,j}$, where w and h are the width and height of the feature map. The bias calibration (BC) algorithm computes the spatial mean of the error term,

$\mu_c(e) = \mu_c(a) - \mu_c(s).$   (6)

Afterwards, µ_c(e) is added to the c-th channel of the bias term b of the SNN. When calculating µ_c(e), we estimate µ_c(a) and µ_c(s) on a small calibration dataset (e.g., 128 images), which is not accessible on the server side. Because this calibration dataset is a subset of the client's private data and differs between clients, an attacker is unable to recover each client's µ_c(e). Our framework exploits this algorithm to encrypt the gradient information, thereby improving privacy in FL. The overall pipeline (Algo. 1) of SNFL is described as follows:

Client Encryption. Before uploading the parameter updates to the server, each client converts its ANN into an SNN. The conversion first replaces all ReLU neurons with IF neurons. Since the SNN has no module corresponding to the Batch Normalization (BN) layer, Rueckauer et al. (2017) propose to absorb the BN parameters into the weight and bias, which can be written as $W_S \leftarrow W_A \frac{\gamma}{\sigma}$, $b_S \leftarrow \beta + (b_A - \mu)\frac{\gamma}{\sigma}$, where W_A, b_A are the weight and bias of the ANN, W_S, b_S are the weight and bias of the SNN, µ, σ are the running mean and standard deviation, and γ, β are the affine parameters of the BN layer. Then, every client runs the bias calibration algorithm to update the bias parameters of its SNN. Each local BC process runs the client's own model on its private dataset to record the activations a^i and spike rates s^i of the i-th client's ANN and SNN, respectively, and calibrates the bias as $b_S^{i\prime} \leftarrow b_S^{i} + \mu(a^i) - \mu(s^i)$. Because a curious server cannot recover µ(a^i) − µ(s^i), it cannot recover b_S from b_S', and therefore cannot recover b_A. The BC algorithm has two advantages: (1) the uploaded SNN has higher accuracy since its parameters are calibrated, and (2) the uploaded parameters differ from those of the original ANN, which prevents leakage from gradients. In the sharing step, each client sends the weight (W_S^i), calibrated bias (b_S^i'), and thresholds (V^i) of its SNN, as well as the BN parameters (γ^i, σ^i, β^i, µ^i) of its ANN, to the server.
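The following sketch illustrates the client-side encryption step described above: folding the BN parameters into the preceding layer and then applying the bias calibration of Eq. (6). It assumes a convolutional layer followed by BN with parameters stored as NumPy arrays; the function names, array shapes, and the way the calibration batch is passed in are illustrative assumptions.

```python
import numpy as np

def absorb_bn(w_a, b_a, gamma, beta, mu, sigma):
    """Fold BN into the preceding conv layer (per output channel):
    W_S = W_A * gamma / sigma,  b_S = beta + (b_A - mu) * gamma / sigma."""
    scale = gamma / sigma                       # shape (C_out,)
    w_s = w_a * scale[:, None, None, None]      # conv weight (C_out, C_in, kh, kw)
    b_s = beta + (b_a - mu) * scale
    return w_s, b_s

def bias_calibration(b_s, ann_activation, snn_rate):
    """Eq. (6): shift the SNN bias by the channel-wise mean gap between the
    ANN activation and the SNN spike rate, both recorded on a small private
    calibration batch of shape (N, C, H, W)."""
    mu_a = ann_activation.mean(axis=(0, 2, 3))
    mu_s = snn_rate.mean(axis=(0, 2, 3))
    return b_s + (mu_a - mu_s)
```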
Server Aggregation. In each global communication round, the server receives the encrypted parameters from the clients and applies FedAvg (McMahan et al., 2017) (cf. Eq. (2)) to aggregate the client updates. The server averages the parameters uploaded by all clients to obtain the averaged SNN model (W_S^S, b_S^S, V^S) and the averaged BN parameters (γ^S, σ^S, β^S, µ^S), where the subscript S/A indicates whether a parameter belongs to the SNN or the ANN model and the superscript S indicates that it is owned by the server. The aggregated model is therefore also an SNN. Since the SNN emits discrete spikes, it can be less sensitive to small amounts of random noise, which contributes to the robustness of SNNs (Venkatesha et al., 2021); in practice, we find that aggregating SNNs loses less task performance than aggregating ANNs in most cases. Note that the clients do not need the aggregated SNN parameters, so the server takes an additional step to recover the parameters the clients require: it computes $W_A^i \leftarrow W_S^i \frac{\sigma^i}{\gamma^i}$ for the i-th client and then applies Eq. (2) to aggregate all W_A^i into W_A^S. In the sharing step, the server sends W_A^S, γ^S, σ^S, β^S, µ^S to each client.

Server Distribution. Following the FedAvg setting, the clients use the parameters returned by the server to rebuild the ANN model and continue training it on their local datasets. Note that clients do not receive bias updates from the server; instead, they keep using their original ANN biases (before calibration) in the next round of training. On the client side, this loses less information than FedAvg because the bias parameters are kept intact, which helps clients learn better local models. The server always uses the SNN model for evaluation.
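A corresponding sketch of the server side is shown below: the server averages the uploaded SNN parameters, recovers W_A = W_S · σ/γ for each client, and assembles the payload that is broadcast back. The message keys and array shapes mirror the client sketch above and are likewise illustrative assumptions.

```python
def server_aggregate(client_msgs):
    """Aggregate encrypted client uploads and prepare the broadcast payload.

    Each message is a dict with the SNN weight 'w_s', calibrated bias
    'b_s_prime', thresholds 'v_th', and BN parameters 'gamma', 'sigma',
    'beta', 'mu' (key names are illustrative)."""
    K = len(client_msgs)
    avg = lambda key: sum(m[key] for m in client_msgs) / K

    # Server-side SNN: averaged weights, calibrated biases, and thresholds.
    snn_global = {key: avg(key) for key in ("w_s", "b_s_prime", "v_th")}

    # Recover each client's W_A = W_S * sigma / gamma, then average; this is
    # what clients need to rebuild their ANNs (the calibrated bias stays private).
    w_a = sum(m["w_s"] * (m["sigma"] / m["gamma"])[:, None, None, None]
              for m in client_msgs) / K
    broadcast = {"w_a": w_a, "gamma": avg("gamma"), "sigma": avg("sigma"),
                 "beta": avg("beta"), "mu": avg("mu")}
    return snn_global, broadcast
```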

5. EXPERIMENTS

In FL, it is vital to consider the presence of semi-honest (honest-but-curious) adversaries when reasoning about privacy. Such an adversary faithfully follows the collaborative learning protocol but may be curious about the training data of other participants. Private data must therefore be kept as secure as possible, while a certain amount of information must still be transferred across parties for the sake of learning utility. In this section, we conduct experiments to demonstrate the benefits of SNFL in protecting privacy and, in most cases, improving accuracy.

5.1. PRIVACY

Attack on SNN model: We employ the gradient inversion attack (Zhu & Han, 2020) (cf. Eq. (4)) on LeNet (LeCun et al., 2015) with batch size 1 and optimize for 280 iterations. We use CIFAR10 (Krizhevsky et al., 2010) to evaluate the attack and defense performance. The leaking process is visualized in Fig. 2(a). This attack can recover every image from the ANN model's gradients. However, when the model is converted into an SNN, the attack is rendered ineffective. This is because, as mentioned in Section 4.2, the SNN replaces the differentiable ReLU neurons with non-differentiable IF neurons, so no usable gradient is exposed.

Attack on ANN model converted from SNN model: We also consider the case where an attacker ignores the bias calibration and forcibly converts the SNN model back to an ANN model, i.e., ANN1 → SNN → ANN2, where ANN2 differs from ANN1 only in its bias values. We employ a more sophisticated gradient inversion attack (Huang et al., 2021) on ResNet20 (Sengupta et al., 2018) in a more realistic setting where the attacker does not know the exact batch size. As shown in Fig. 2(b), the images reconstructed from ANN2 are noticeably blurrier than those reconstructed from ANN1.
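For reference, the sketch below shows a DLG-style optimization loop in the spirit of Eq. (4). The LBFGS optimizer, soft-label loss, and batch size of 1 follow common public DLG implementations rather than the exact configuration of our experiments; the attack only works when the victim model exposes a differentiable computation graph, which the converted SNN does not.

```python
import torch

def dlg_attack(model, real_grads, input_shape, num_classes, iters=280):
    """Sketch of the gradient-inversion objective in Eq. (4): optimize a dummy
    input/label pair so that its gradient matches a client's uploaded gradient.
    All names and hyper-parameters here are illustrative."""
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)
    optimizer = torch.optim.LBFGS([dummy_x, dummy_y])

    for _ in range(iters):
        def closure():
            optimizer.zero_grad()
            pred = model(dummy_x)
            # Cross-entropy against the (soft) dummy label.
            loss = -(dummy_y.softmax(dim=-1) * pred.log_softmax(dim=-1)).sum()
            grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
            # Squared Frobenius distance between the dummy and real gradients.
            diff = sum(((g - r) ** 2).sum() for g, r in zip(grads, real_grads))
            diff.backward()
            return diff
        optimizer.step(closure)
    return dummy_x.detach(), dummy_y.detach()
```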

5.2. ACCURACY

Implementation Details: We perform experiments on the widely adopted benchmarks CIFAR10 (Krizhevsky et al., 2010), CIFAR100 (Krizhevsky et al., 2010), and Tiny-ImageNet, constructed from ImageNet (Russakovsky et al., 2015). To simulate a federated learning scenario, we randomly split the training set of each dataset into N parties and assign one party to each client, i.e., each client owns its own local training set. We consider two partitions: IID, where the overall label distribution is the same across clients, and Non-IID, where the class proportions and the number of data points per client are unbalanced. For the Non-IID setting, we impose data shift following (Li et al., 2022): we sample q ∼ Dir_N(β) and allocate a q_i proportion of the samples of each class to client i, so the label distribution varies across clients. A smaller β produces a more unbalanced partition; we use q ∼ Dir_N(β) to denote this partitioning strategy (a partitioning sketch is given at the end of this subsection). For all experiments, we use the ResNet20 (Sengupta et al., 2018) architecture and the SGD optimizer with a weight decay of 1e-5 and momentum of 0.9. The learning rate is 0.1 and is multiplied by 0.1 at communication rounds 61 and 96. We set the total number of global communication rounds E_g to 100 and train each client for E_l = 5 epochs in every global communication round. The simulation length T of the SNN model is 256.

Validation in the IID Case: Since a real-world federated system involves many devices, a federated learning model must scale with the number of devices. In this experiment, we verify the cases of 5, 10, and 15 clients. Table 5.1 reports the test accuracy on all datasets before and after applying SNFL. Horizontally, applying the SNN conversion increases accuracy for all baseline methods, with gains of up to 13.79%. This is particularly encouraging because SNFL requires no modification to the original federated training process: one can obtain considerable accuracy gains simply by post-processing the trained global model. Comparing the accuracy gains of the different methods after applying SNFL with whole-data calibration, FedProx and MOON improve the most. Vertically, the accuracy of the baseline methods with SNFL drops more slowly as the number of clients increases. For example, on CIFAR100 with 5 clients, S-MOON is 0.71% higher than MOON, and the gap grows to 4.46% with 15 clients. This suggests that SNFL is especially well suited to federated learning with a large number of clients.

Validation in the Non-IID Case: A key challenge in FL is Non-IID data across the parties. As shown in Table 5.2, although accuracy decreases to varying degrees as β decreases, SNFL still improves model performance, with MOON and FedProx benefiting the most. For instance, on CIFAR10 with β = 0.5, S-FedProx is 3.91% higher than FedProx and S-MOON is 2.43% higher than MOON.
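The sketch below shows the Dirichlet-based Non-IID partitioning referenced above, assuming the common label-skew variant in which each class is split across clients according to Dir(β); the function and variable names are illustrative.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, beta, seed=0):
    """Label-skew partition: for every class, draw client proportions from
    Dir(beta) and split that class's sample indices accordingly.  Smaller
    beta yields a more unbalanced partition."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(np.full(num_clients, beta))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Example: CIFAR10-like labels split across 10 clients with beta = 0.5.
labels = np.random.randint(0, 10, size=50_000)
parts = dirichlet_partition(labels, num_clients=10, beta=0.5)
```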

5.3. BIAS ANALYSIS

Compared to FedAvg, SNFL treats the bias differently in two ways. (1) During the ANN-SNN conversion, each client uses its local private data to calibrate the SNN model's bias. (2) When sending the server's model back to the clients, standard FL returns all parameters including the bias, whereas the SNFL server keeps the bias as a private key and does not share it with clients. Since both the clients and the server use the bias as a private key, SNFL secures the privacy of all parties to some extent. To investigate the effect of these two operations, we design four ablation studies (Table 5.2). The extra BC operation has no significant effect on the final server accuracy when clients use their own saved bias instead of the server's bias, and there is even a slight increase in accuracy when the server does not share its bias with clients. In other words, we protect privacy while maintaining model performance, rather than trading accuracy for privacy as in previous studies. We then simulate an honest-but-curious server that attempts to infer the full parameters of one client's ANN. In this experiment, we use 10 clients under IID and Non-IID (q ∼ Dir_N(0.5)) partitions. Since the malicious server cannot directly access the ANN bias in SNFL, it can only compute a new bias from the SNN bias and the ANN's BN parameters. Fig. 3(A) shows the L2 distance between the true bias and the inferred bias. In the IID case, the disparity is much larger than in the Non-IID case, indicating that BC requires a larger bias shift (a larger gap between the ANN and the SNN) during layer-by-layer calibration in the IID case.
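The toy sketch below illustrates what such a curious server can compute: it inverts the BN folding on the calibrated bias, and the private calibration shift µ(a) − µ(s) shows up as a residual L2 gap to the true ANN bias. All values here are synthetic and only meant to illustrate the kind of gap reported in Fig. 3(A).

```python
import numpy as np

def infer_ann_bias(b_s_prime, gamma, beta, mu, sigma):
    """What an honest-but-curious server can try: invert the BN folding on the
    *calibrated* SNN bias.  The private calibration shift keeps the result
    away from the true ANN bias."""
    return (b_s_prime - beta) * sigma / gamma + mu

# Synthetic illustration of the L2 gap in Fig. 3(A).
c = 64
true_b_a = np.random.randn(c)
gamma, beta, mu, sigma = np.ones(c), np.zeros(c), np.zeros(c), np.ones(c)
b_s = beta + (true_b_a - mu) * gamma / sigma          # BN-folded bias
b_s_prime = b_s + 0.1 * np.random.randn(c)            # private calibration shift
gap = np.linalg.norm(infer_ann_bias(b_s_prime, gamma, beta, mu, sigma) - true_b_a)
```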

5.4. BACKDOOR ATTACK AND DETECTION

In federated learning, a backdoor attack (Bagdasaryan et al., 2020) attempts to cause the model to make wrong predictions on data carrying a certain characteristic (the trigger), while leaving the main task unaffected. In other words, the attacked model still exhibits high accuracy on the test dataset, but its output differs from that of the clean model whenever the input activates the backdoor trigger. Since the server cannot access the clients' training data in federated learning, it is difficult to determine through data inspection whether the global model has been poisoned. In a recent paper (Bhagoji et al., 2019), the server identifies abnormal clients through clustering or mean detection on the model weights and biases. However, such detection is cumbersome because the number of weights and biases in an ANN is enormous compared with the handful of thresholds in an SNN (e.g., ResNet20 has hundreds of thousands of weights but only about 20 thresholds, including the converted pooling layers). For SNFL, the server operates on the SNN model, which exposes this special "threshold" parameter. As shown in Fig. 3 B and C, the thresholds of poisoned clients and normal clients are significantly different: the distance between a normal client's threshold set and the cluster center is less than 17.62, whereas the distance for the poisoned client exceeds 103.53, making the distinction very clear.
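A simple sketch of such a threshold-based check is given below; it uses the median of the clients' threshold vectors as the cluster centre and a fixed distance cutoff, which is an illustrative simplification of the clustering-based detection described above.

```python
import numpy as np

def flag_poisoned_clients(thresholds, cutoff):
    """Flag clients whose uploaded SNN thresholds are far from the cluster centre.

    thresholds -- array of shape (num_clients, num_layers), one V_th per layer
    cutoff     -- distance above which a client is treated as suspicious
    """
    centre = np.median(thresholds, axis=0)              # robust cluster centre
    dist = np.linalg.norm(thresholds - centre, axis=1)
    return np.where(dist > cutoff)[0]

# Only ~20 thresholds per client need to be inspected, so this check is far
# cheaper than clustering the full weight vector.  Toy example: client 3 drifts.
thresholds = np.random.rand(10, 20) * 5
thresholds[3] += 50
suspects = flag_poisoned_clients(thresholds, cutoff=30.0)   # -> array([3])
```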

6. CONCLUSION

In this paper, we propose SNFL, a simple FL framework that protects privacy while improving model accuracy in most cases. To the best of our knowledge, this is the first work that allows the clients and the server to use different types of neural network models. SNFL can be viewed as a lightweight encryption add-on for any global federated objective. We investigate the root cause of its privacy from both theoretical and experimental perspectives and conduct experiments demonstrating that the framework does not jeopardize model performance. Empirical results show that SNFL yields both stronger privacy and more accurate models compared with strong baselines. Our work suggests several interesting directions for future study, such as exploring the applicability of SNFL to other attacks and its ability to migrate to other algorithms.

A APPENDIX

A.1 IMPACT OF NUMBER OF CLIENTS

The total number of clients is one aspect of scalability that influences the performance of a federated learning system. To study the impact of the client count, we use CIFAR10 in the IID setting. We observe in Table A.1 that accuracy drops sharply as the number of clients increases in standard FL, whereas in SNFL the server model accuracy decreases only slightly, as shown in Fig. A.1.

A.2 IMPACT OF STRAGGLERS

Because real-world networks are inherently unstable, assuming that updates will be communicated successfully from all selected devices is impractical. Hence, the model needs to be robust to devices that fail to communicate updates; these devices are referred to as stragglers. In this section, we analyze the impact of stragglers on the performance of the final SNN model. We use a total of N = 60 clients and, in each round, randomly select P·N clients to upload parameters to the server (a sampling sketch is given at the end of this appendix). We consider different participation probabilities and summarize the results in Table A.2. As the number of stragglers decreases, the accuracy of FedAvg first increases and then declines to 57.62%, whereas the accuracy of S-FedAvg increases consistently to 90.31%.

A.3 IMPACT OF SIMULATION LENGTH

The forward pass of an SNN is repeated for T steps, and the final prediction is the expectation of the last layer's output across the T steps. This allows the flexibility of adjusting T to balance the latency and accuracy of SNNs for different application scenarios. We conduct experiments on CIFAR10 with different simulation lengths T, as shown in Table A.3. Increasing T improves SNN accuracy up to a point, beyond which its influence on accuracy diminishes.
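The client-sampling step of the straggler experiment can be sketched as follows; the function name and the use of a NumPy generator are illustrative assumptions.

```python
import numpy as np

def select_uploaders(num_clients, p, rng):
    """Per round, only a random fraction p of the N clients manages to upload;
    the remaining clients are treated as stragglers."""
    k = max(1, int(round(p * num_clients)))
    return rng.choice(num_clients, size=k, replace=False)

rng = np.random.default_rng(0)
active = select_uploaders(60, p=0.5, rng=rng)   # e.g. 30 of the 60 clients upload
```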



Figure 1: Right: Workflow of Synergistic Neuromorphic Federated Learning with ANN-SNN Conversion (SNFL). Left: Each client communicates parameters ∇W_A generated by the model trained on its private local data. The attacker updates a randomized dummy input and label to minimize the gradient distance ||∇W_A − ∇W'||. When the optimization is complete, the attacker can recover the training data from the client. In SNFL, however, the client's model has been converted and calibrated to an SNN before communication.




Figure 2: Results of gradient inversion attack. The left half shows the attack on ANN model and SNN model, respectively. The right half shows the attack on ANN model and ANN model reconverted from SNN model.

Compared baselines: (1) FedAvg (McMahan et al., 2017): multiple local stochastic gradient updates on the client nodes, followed by server-side model averaging. (2) FedProx (Li et al., 2020): improves the local objective of FedAvg by adding an L2 regularization term that limits the distance between the local model and the global model; a hyper-parameter µ controls the weight of the regularizer. (3) MOON (Li et al., 2021a): corrects the local updates by maximizing the agreement between the representation learned by the current local model and the representation learned by the global model. (4) SOLO: each client trains its own model on its local private data without communicating with others.

Figure 3: Bias and threshold analysis. (A) The L2 distance between the true bias and the inferred bias. (B) Scatter plot of the thresholds of the last two layers; 2 of the 10 clients are attackers. (C) Comparison of thresholds between normal clients and poisoned clients. The dataset is CIFAR10.


Figure A.1: Impact of the number of clients (CIFAR10).

Algorithm 1: Synergistic Neuromorphic Federated Learning (SNFL). Input: a set of K clients with local datasets; B is the local minibatch size, E_L the number of local epochs, E_G the number of global communication rounds, and η the learning rate. Parameters: f_SG is the server's global SNN model, f_SL is a client's local SNN model, and f_AL is a client's local ANN model. Output: the well-trained model f_SG. The main loop initializes the global SNN model f_SG and the local ANN models f_AL with random weights and then iterates over E_G global communication rounds.

Table 5.1: Accuracy (%) on the CIFAR10, CIFAR100, and Tiny-ImageNet test sets under the IID setting. "P" is the number of participating clients. "Client Accuracy" reports the mean and variance over all clients' ANN models; "Server Accuracy" is the accuracy of the server model. "S-" means applying SNFL.

Table 5.2: Accuracy (%) on CIFAR10 and CIFAR100 with different degrees of Non-IID data. There are 10 participating clients.

The impact of bias. "✓/✗" denotes whether bias calibration (BC) is performed and whether the server shares its bias with the clients (Bias Back). "T=" denotes the simulation length T of the SNN model.

Table A.1: Impact of the number of clients, CIFAR10 (each row corresponds to a client count P).
38±0.06  94.71±0.06  94.59±0.06  95.40±0.05  94.80±0.08  95.26±0.04
10  92.02±0.11  93.09±0.07  91.39±0.08  95.11±0.09  92.90±0.07  94.73±0.09
15  88.62±0.18  91.10±0.08  84.76±0.10  95.01±0.12  89.50±0.10  94.36±0.08
60  57.01±0.18  90.14±0.02  73.46±0.05  90.88±0.02  59.68±0.22  90.57±0.038

Table A.2: Accuracy with various client drop probabilities.

Table A.3: Accuracy with different SNN simulation steps.

