WAFFLE: WEIGHT ANONYMIZED FACTORIZATION FOR FEDERATED LEARNING

Abstract

In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore, a successful breach that would have otherwise directly compromised the data instead grants whitebox access to the local model, which opens the door to a number of attacks, including exposing the very data federated learning seeks to protect. Additionally, in distributed scenarios, individual client devices commonly exhibit high statistical heterogeneity. Many common federated approaches learn a single global model; while this may do well on average, performance degrades when the i.i.d. assumption is violated, underfitting individuals further from the mean and raising questions of fairness. To address these issues, we propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks. Experiments on MNIST, FashionMNIST, and CIFAR-10 demonstrate WAFFLe's significant improvement to local test performance and fairness while simultaneously providing an extra layer of security.

1. INTRODUCTION

With the rise of the Internet of Things (IoT), the proliferation of smartphones, and the digitization of records, modern systems generate increasingly large quantities of data. These data provide rich information about each individual, opening the door to highly personalized intelligent applications, but this knowledge can also be sensitive: images of faces, typing histories, medical records, and survey responses are all examples of data that should be kept private. Federated learning (McMahan et al., 2017) has been proposed as a possible solution to this problem. By keeping user data on each local client device and only sharing model updates with the global server, federated learning represents a possible strategy for training machine learning models on heterogeneous, distributed networks in a privacy-preserving manner. While this paradigm has shown promise, a number of challenges remain for federated learning (Li et al., 2019). As in centralized distributed learning settings (Dean et al., 2012), many federated learning algorithms focus on learning a single global model. However, due to variation in user characteristics or tendencies, personal data are highly likely to exhibit significant statistical heterogeneity. To simulate this, federated learning algorithms are commonly tested in non-i.i.d. settings (McMahan et al., 2017; Smith et al., 2017; Li & Wang, 2019; Peterson et al., 2019), but data are often equally represented across clients, and ultimately a single global model is typically learned. As is usually the case for one-size-fits-all solutions, while the model may perform acceptably on average for many users, some clients may see very poor performance. Questions of fairness (Mohri et al., 2019; Li et al., 2020) arise if performance is compromised for individuals in the minority in favor of the majority. Another challenge for federated learning is security.
Data privacy is the primary motivation for keeping user data local on each device, rather than gathering them in a centralized location for training. In traditional distributed learning systems, data are exposed to additional vulnerabilities while being transmitted to and while residing in the central data repository. In lieu of the data, many federated learning approaches require clients to send weight updates to train the aggregated model. However, the threat of membership inference attacks (Shokri et al., 2017; Nasr et al., 2019) or model inversion (Fredrikson et al., 2015; Zhu et al., 2019) means that private data on each device can still be compromised if federated learning updates are intercepted or if the central server is breached. We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), leveraging Bayesian nonparametrics and neural network weight factorization to address these issues. Rather than learning a single global model, we learn a dictionary of rank-1 weight factor matrices. By selecting and weighting these factors, each local device can have a model customized to its unique data distribution, while sharing the learning burden of the weight factors across devices. We employ the Indian Buffet Process (Ghahramani & Griffiths, 2006) as a prior to encourage factor sparsity and reuse of factors, performing variational inference to infer the distribution of factors for each client. While updates to the dictionary of factors are transmitted to the server, the distribution capturing which factors a client uses is kept local. This adds an extra insulating layer of security by obfuscating which factors a client is using, hindering an adversary's ability to perform membership inference attacks or dataset reconstruction. We perform experiments on MNIST (LeCun et al., 1998), FashionMNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky, 2009) in settings exhibiting strong statistical heterogeneity.
We observe that the model customization central to WAFFLe's design leads to higher performance for each client's local distribution, while also being significantly fairer across all clients. Finally, we perform membership inference (Shokri et al., 2017) and model inversion (Fredrikson et al., 2015) attacks on WAFFLe, showing that it is much harder to expose user data than with FedAvg (McMahan et al., 2017) .

2. METHODOLOGY

2.1 LEARNING A SHARED DICTIONARY OF WEIGHT FACTORS

Single Global Model. Consider N client devices, with the i-th device having data distribution D_i, which may differ as a function of i. In many distributed learning settings, a single global model is learned and deployed to all N clients. Thus, assuming a multilayer perceptron (MLP) architecture* with layers ℓ = 1, ..., L, the set of weights θ = {W^(ℓ)}_{ℓ=1}^{L} is shared across all clients. To satisfy the global objective, θ is learned to minimize the loss on average across all clients. This is the strategy of many federated learning methods. For example, FedAvg (McMahan et al., 2017) minimizes the following objective:

min_θ L(θ) = Σ_{i=1}^{N} p_i L_i(θ),

where L_i(θ) := E_{x_i ∼ D_i}[l_i(x_i; θ)] is the local objective function, N is the number of clients, and p_i ≥ 0 is the weight of each device i. However, given statistical heterogeneity, such a one-size-fits-all model risks underfitting clients whose local distributions deviate from the population average.
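The FedAvg objective amounts to a weighted average over clients. A minimal sketch of the server-side aggregation step, assuming (as is one common choice, not mandated by the objective above) that p_i is proportional to each client's local dataset size; the toy weights are purely illustrative:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of per-client weight lists.

    client_weights: list over clients, each a list of per-layer arrays.
    client_sizes:   local dataset sizes; p_i = n_i / sum(n) is one common
                    choice satisfying p_i >= 0.
    """
    total = sum(client_sizes)
    p = [n / total for n in client_sizes]
    num_layers = len(client_weights[0])
    return [
        sum(p_i * w[layer] for p_i, w in zip(p, client_weights))
        for layer in range(num_layers)
    ]

# Toy example: two clients, one layer each.
w1 = [np.ones((2, 2))]
w2 = [3 * np.ones((2, 2))]
agg = fedavg_aggregate([w1, w2], client_sizes=[1, 3])
# p = [0.25, 0.75], so each entry of the aggregated layer is 0.25*1 + 0.75*3 = 2.5
```

Note that every client both contributes to and receives the same θ; this is the one-size-fits-all property that WAFFLe replaces with per-client factor selection.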



* While we restrict our discussion to fully connected layers here for simplicity, this can be generalized to other types of layers as well. See Appendix A for 2D convolutional layers.



Figure 1: In WAFFLe, the clients share a global dictionary of rank-1 weight factors {W_a, W_b}. Each client uses a sparse diagonal matrix Λ_i, specifying the combination of weight factors that constitute its own personalized model. Neither the client data D_i nor the factor selections Λ_i leave the local device.
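The factorization in Figure 1 can be sketched as follows. This is an illustrative toy, not the paper's implementation: the layer sizes, the dictionary size K, and the fixed Bernoulli draw standing in for the IBP-prior variational inference of Λ_i are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, K = 8, 4, 16  # hypothetical layer sizes and dictionary size

# Global dictionary shared by all clients: column k of W_a paired with
# row k of W_b forms the rank-1 factor W_a[:, k] W_b[k, :].
W_a = rng.normal(size=(d_out, K))
W_b = rng.normal(size=(K, d_in))

def client_layer(lam):
    """Personalized layer weight W_i = W_a diag(lam_i) W_b.

    lam (the diagonal of Λ_i) never leaves the device; only updates to
    the shared dictionary W_a, W_b are communicated to the server.
    """
    return (W_a * lam) @ W_b  # column-wise broadcasting implements diag(lam)

# Sparse factor selection for client i: a Bernoulli draw stands in for the
# IBP-based selection here, so each client activates only a few factors.
lam_i = rng.binomial(1, 0.25, size=K).astype(float)
W_i = client_layer(lam_i)  # shape (d_out, d_in)
```

Because an adversary who intercepts dictionary updates never observes Λ_i, it cannot tell which combination of factors, and hence which effective model, a given client is using.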

