FEDMIX: APPROXIMATION OF MIXUP UNDER MEAN AUGMENTED FEDERATED LEARNING

Abstract

Federated learning (FL) allows edge devices to collectively learn a model without directly sharing the raw data stored on each device, thus preserving privacy and eliminating the need to store data globally. While there are promising results under the assumption of independent and identically distributed (iid) local data, current state-of-the-art algorithms suffer from performance degradation as the heterogeneity of local data across clients increases. To resolve this issue, we propose a simple framework, Mean Augmented Federated Learning (MAFL), in which clients send and receive averaged local data, subject to the privacy requirements of target applications. Under our framework, we propose a new augmentation algorithm, named FedMix, which is inspired by a popular yet simple data augmentation method, Mixup, but does not require raw local data to be directly shared among devices. Our method shows greatly improved performance on standard FL benchmark datasets under highly non-iid federated settings, compared to conventional algorithms.

1. INTRODUCTION

As we enter the era of edge computing, more data is being collected directly from edge devices such as mobile phones, vehicles, and facilities. By decoupling the ability to learn from the delicate process of merging sensitive personal data, federated learning (FL) proposes a paradigm in which a global model is trained collaboratively by individual clients without direct access to the local data of other clients, thus preserving the privacy of each client (Konečný et al., 2016; McMahan et al., 2017). Federated learning lets clients perform most of the computation on their own local data, with the global server only aggregating and updating the model parameters sent by the clients. One of the standard and most widely used algorithms for federated learning is FedAvg (McMahan et al., 2017), which simply averages the model parameters trained by each client in an element-wise manner, weighted proportionally to the amount of data each client used. FedProx (Li et al., 2020b) is a variant of FedAvg that adds a proximal term to the clients' objective function, improving the statistical stability of training. While several other methods have been proposed more recently (Mohri et al., 2019; Yurochkin et al., 2019; Wang et al., 2020), they all build on the idea that the updated model parameters from clients are averaged in some manner.

Although federated learning conceptually provides an ideal learning environment for edge devices, it still faces practical challenges that prevent its widespread application (Li et al., 2020a; Kairouz et al., 2019). Among these challenges, the one we address in this paper is data heterogeneity: in many real-world settings, data is distributed non-iid across clients, i.e., each client's local data is not drawn from the same underlying distribution.
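As a concrete illustration, the FedAvg aggregation step described above can be sketched as a data-size-weighted, element-wise average of client parameter vectors. This is a minimal pure-Python sketch; the function and variable names are ours, not from any reference implementation:

```python
def fedavg(client_weights, client_sizes):
    """Element-wise average of client parameter vectors,
    weighted by the number of local data points per client."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

# Two clients with 1 and 3 local samples respectively:
# the second client's parameters dominate the average.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])  # -> [2.5, 3.5]
```

In practice each `client_weights[i]` would be the flattened parameters of a locally trained model, and the server would broadcast the result back to clients for the next round.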
Since each client learns from a different data distribution, it becomes harder to train the model efficiently, as reported in (McMahan et al., 2017). While theoretical evidence on the convergence of FedAvg in the non-iid case has recently been given in (Li et al., 2020c), efficient algorithms suited to this setting have not yet been developed or systematically examined, despite some efforts (Zhao et al., 2018; Hsieh et al., 2020). In order to mitigate the heterogeneity across clients while protecting privacy, we provide a novel yet simple framework, mean augmented federated learning (MAFL), in which each client exchanges its updated model parameters as well as its mashed (or averaged) data. The MAFL framework allows a trade-off between the amount of meaningful information exchanged and the privacy preserved across clients, depending on factors such as the number of data instances used to compute each average. We first introduce a naive approach in our framework that simply applies Mixup (Zhang et al., 2018) between local data and averaged external data from other clients to reduce a myopic bias. We then go further and ask the following seemingly impossible question: can averaged data alone, which has lost most of its discriminative information, bring an effect similar to global Mixup, in which clients directly access others' private data without regard to privacy? Toward this, we introduce our second and more important approach, termed Federated Mixup (FedMix), which simply approximates the loss function of global Mixup via Taylor expansion (it turns out that this approximation only involves the averaged data from other clients!). Figure 1 briefly describes the concept of our methods. We validate our method on standard benchmark datasets for federated learning and show its effectiveness against standard federated learning methods, especially in non-iid settings.
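The naive approach mentioned above (NaiveMix in Figure 1) can be sketched as ordinary Mixup interpolation, with the second operand replaced by the averaged data received from other clients. The sketch below is illustrative only: the names are ours, inputs are plain lists, and the mixing ratio `lam` is passed in directly (in standard Mixup it would typically be sampled from a Beta distribution):

```python
def naivemix(x_local, y_local, x_avg, y_avg, lam):
    """Mixup between a local example (x_local, y_local) and a
    received averaged example (x_avg, y_avg), with ratio lam."""
    x = [(1 - lam) * xl + lam * xa for xl, xa in zip(x_local, x_avg)]
    y = [(1 - lam) * yl + lam * ya for yl, ya in zip(y_local, y_avg)]
    return x, y

# A local example is pulled 25% of the way toward the received average.
x, y = naivemix([0.0], [1.0], [1.0], [0.0], 0.25)  # -> ([0.25], [0.75])
```

Here `y_avg` stands for the averaged (one-hot) labels that accompany the averaged inputs; only these averages, never raw examples, cross client boundaries.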
In particular, we claim that FedMix shows better performance and a smaller drop in accuracy as heterogeneity increases or as fewer clients participate per communication round, both of which make federated learning more difficult. Our contribution is threefold:

• We propose a simple framework for federated learning that averages and exchanges each client's local data. Even a naive approach in this framework, performing Mixup with other clients' mashed data, improves upon existing baselines in several settings.



Figure 1: Brief comparisons of Mixup strategies in FL and MAFL. (a) Global Mixup: Raw data is exchanged and directly used for Mixup between local and received data, which violates privacy. (b) Local Mixup: Mixup is only applied within a client's local data. (c) NaiveMix: Under MAFL, Mixup is performed between local data and received averaged data. (d) FedMix: Under MAFL, our novel algorithm approximates global Mixup using input derivatives and averaged data.

In addition to the non-iid problem, another important issue is that updating model parameters individually trained by each client is very costly, and becomes even costlier as model complexity increases. Some existing works (Smith et al., 2016; Sattler et al., 2019) target this issue, decreasing the amount of communication while maintaining the performance of FedAvg. A more practical approach to reducing communication cost is to selectively update individual models at each round, rather than having all clients participate in every parameter update. Such partial participation hardly affects test performance in ideal iid settings, but it can exacerbate the heterogeneity of weight updates across clients and, as a result, the non-iid issue (McMahan et al., 2017).

• We further develop a novel approximation of insecure global Mixup, which accesses other clients' local data, and show that the Taylor expansion of global Mixup involves only the averaged data from other clients. Based on this observation, we propose FedMix, which approximates global Mixup within our framework without accessing others' raw data.

• We validate FedMix on several FL benchmark datasets, focusing especially on non-iid data settings, where our method significantly outperforms existing baselines while preserving privacy with a minimal increase in communication cost.
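To illustrate the approximation underlying the second contribution, the sketch below compares the global Mixup loss with its first-order Taylor expansion around the scaled local input: only the averaged value `x_bar` (not the raw remote example) enters the correction term. This is a scalar toy example with a stand-in quadratic loss, not the paper's actual model or loss function:

```python
def loss(z):
    """Stand-in smooth scalar loss (quadratic, for illustration)."""
    return z * z

def dloss(z):
    """Derivative of the stand-in loss with respect to its input."""
    return 2 * z

def global_mixup_loss(x, x_bar, lam):
    # Exact global Mixup: in general this needs the raw remote example.
    return loss((1 - lam) * x + lam * x_bar)

def fedmix_approx_loss(x, x_bar, lam):
    # First-order Taylor expansion around (1 - lam) * x:
    # loss(z + lam * x_bar) ~= loss(z) + lam * x_bar * dloss(z).
    # Only the average x_bar appears; no raw remote data is needed.
    z = (1 - lam) * x
    return loss(z) + lam * x_bar * dloss(z)

exact = global_mixup_loss(1.0, 0.5, 0.1)   # 0.9025
approx = fedmix_approx_loss(1.0, 0.5, 0.1)  # 0.9000
```

For small `lam` the gap is second-order in `lam * x_bar` (here 0.0025), which is what makes the averaged data a usable surrogate in this approximation.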

