SMART MULTI-TENANT FEDERATED LEARNING

Abstract

Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous training activities could overload resource-constrained devices. In this work, we propose a smart multi-tenant FL system, MuFL, to effectively coordinate and execute simultaneous training activities. We first formalize the problem of multi-tenant FL, define multi-tenant FL scenarios, and introduce a vanilla multi-tenant FL system that trains activities sequentially to form baselines. Then, we propose two approaches to optimize multi-tenant FL: 1) activity consolidation merges training activities into one activity with a multi-task architecture; 2) after training the consolidated activity for several rounds, activity splitting divides it into groups by employing affinities among activities such that activities within a group have better synergy. Extensive experiments demonstrate that MuFL outperforms other methods while consuming 40% less energy. We hope this work will inspire the community to further study and optimize multi-tenant FL.

1. INTRODUCTION

Federated learning (FL) (McMahan et al., 2017) has attracted considerable attention as it enables privacy-preserving distributed model training among decentralized devices. It is empowering a growing number of applications in both academia and industry, such as Google Keyboard (Hard et al., 2018), medical imaging analysis (Li et al., 2019; Sheller et al., 2018), and autonomous vehicles (Zhang et al., 2021a; Posner et al., 2021). Among them, some applications contain multiple training activities for different tasks. For example, Google Keyboard includes query suggestion (Yang et al., 2018), emoji prediction (Ramaswamy et al., 2019), and next-word prediction (Hard et al., 2018); autonomous driving involves multiple computer vision (CV) tasks, including lane detection, object detection, and semantic segmentation (Janai et al., 2020). However, multiple simultaneous training activities could overload edge devices (Bonawitz et al., 2019). Edge devices have tight resource constraints, whereas training deep neural networks for the aforementioned applications is resource-intensive. As a result, the majority of edge devices can only support one training activity at a time (Liu et al., 2019); multiple simultaneous federated learning activities on the same device could overwhelm its memory, computation, and power capacities. Thus, it is important to explore solutions that effectively coordinate these training activities.

A plethora of research on FL considers only one training activity in an application. Many studies are devoted to addressing challenges including statistical heterogeneity (Li et al., 2020; Wang et al., 2020a), system heterogeneity (Chai et al., 2020; Yang et al., 2021), communication efficiency (Karimireddy et al., 2020; Zhu et al., 2021), and privacy issues (Bagdasaryan et al., 2020; Huang et al., 2021).
A common limitation is that they only focus on one training activity, but applications like Google Keyboard and autonomous vehicles require multiple training activities for different targets (Yang et al., 2018; Ramaswamy et al., 2019). Multi-tenancy of an FL system is designed by Bonawitz et al. (2019) to prevent simultaneous training activities from overloading devices. However, it mainly considers differences among training activities, neglecting potential synergies. In this work, we propose a smart multi-tenant federated learning system, MuFL, to efficiently coordinate and execute simultaneous training activities under resource constraints by considering both synergies and differences among training activities. We first formalize the problem of multi-tenant FL and define four multi-tenant FL scenarios based on two variances in Section 3: 1) whether all training activities are the same type of application, e.g., CV applications; 2) whether all clients support all training activities. Then, we define a vanilla multi-tenant FL system that supports all scenarios by training activities sequentially. Building on it, we further optimize the scenario where all training activities are the same type and all clients support all activities, by considering both synergies and differences among activities in Section 4. Specifically, we propose activity consolidation to merge training activities into one activity with a multi-task architecture that shares common layers and has specialized layers for each activity. We then introduce activity splitting to divide the activity into multiple activities based on their synergies and differences, measured by affinities between activities. We demonstrate that MuFL reduces energy consumption by over 40% while achieving superior performance to other methods via extensive experiments on three different sets of training activities in Section 5.
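As a concrete illustration of activity consolidation, the following minimal sketch shows a consolidated model with shared common layers and one specialized head per activity. The class name, activity names, and dimensions are hypothetical, chosen for illustration rather than taken from the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class ConsolidatedModel:
    """Sketch of a consolidated multi-task model: a shared trunk
    plus one specialized head per training activity (names and
    dimensions are illustrative)."""

    def __init__(self, in_dim, hidden_dim, head_dims):
        # Shared layer used by every activity.
        self.W_shared = rng.normal(size=(in_dim, hidden_dim)) * 0.1
        # One specialized output layer per activity.
        self.heads = {name: rng.normal(size=(hidden_dim, out_dim)) * 0.1
                      for name, out_dim in head_dims.items()}

    def forward(self, x, activity):
        h = np.maximum(x @ self.W_shared, 0.0)  # shared representation (ReLU)
        return h @ self.heads[activity]         # activity-specific output
```

For example, `ConsolidatedModel(8, 16, {"segmentation": 4, "depth": 1})` serves both a 4-way segmentation head and a scalar depth head from the same shared trunk, so one training pass over the trunk benefits all activities.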
We believe that MuFL is beneficial for many real-world applications such as autonomous vehicles, voice assistant systems, and robotics (more examples in Appendix A). We summarize our contributions as follows:
• We formalize the problem of multi-tenant FL and define four multi-tenant FL scenarios. To the best of our knowledge, this is the first work that investigates multi-tenant FL in depth.
• We propose MuFL, a smart multi-tenant federated learning system that efficiently coordinates and executes simultaneous training activities via activity consolidation and activity splitting, which consider both synergies and differences among training activities.
• We establish baselines for multi-tenant FL and demonstrate that MuFL elevates performance with significantly less energy consumption via extensive empirical studies.
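For reference, each individual training activity in the scenarios above runs a standard FL loop whose server-side step is FedAvg-style aggregation (McMahan et al., 2017). A minimal sketch, with model parameters flattened to plain lists of floats for simplicity (not the paper's implementation):

```python
def fedavg(client_weights, client_sizes):
    """Sketch of FedAvg aggregation: average client model parameters,
    weighted by each client's local dataset size. Each element of
    `client_weights` is one client's flattened parameter vector."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two clients holding 1 and 3 samples respectively.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])  # → [2.5, 3.5]
```

Running this aggregation once per activity per round is exactly what makes multiple simultaneous activities expensive on edge devices, which motivates consolidating them.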

2. RELATED WORK

In this section, we first review the concept of multi-tenancy in cloud computing and machine learning. Then, we provide a literature review of multi-task learning and federated learning.

Multi-tenancy of Cloud Computing and Machine Learning  Multi-tenancy has been an important concept in cloud computing. It refers to a software architecture where a single instance of software serves multiple users (Chong & Carraro, 2006; Fehling et al., 2010). Multi-tenant software architecture is one of the foundations of software as a service (SaaS) applications (Mietzner et al., 2008; Cai et al., 2013). Recently, researchers have adopted this idea for machine learning (especially deep learning) training and inference. Specifically, some studies investigate how to share GPU clusters among multiple users to train deep neural networks (DNNs) (Jeon et al., 2019; Zhao et al., 2020; Lao et al., 2021), but these methods are designed for GPU clusters with enormous computing resources and are inapplicable to edge devices with limited resources. Targeting on-device deep learning, some researchers define multi-tenancy as processing multiple computer vision (CV) applications for multiple concurrent tasks (Fang et al., 2018; Jiang et al., 2018). However, they focus on multi-tenant on-device inference rather than training. In contrast, we focus on multi-tenant federated learning (FL) training on devices, where multi-tenancy refers to multiple concurrent FL training activities.

Multi-task Learning  Multi-task learning is a popular machine learning approach to learn models that generalize on multiple tasks (Thrun, 1995; Zhang & Yang, 2021). A plethora of studies investigate parameter sharing approaches that share common layers of a similar architecture (Caruana, 1997; Eigen & Fergus, 2015; Bilen & Vedaldi, 2016; Nekrasov et al., 2019).
Besides, many studies employ new techniques to address the negative transfer problem (Kang et al., 2011; Zhao et al., 2018) among tasks, including soft parameter sharing (Duong et al., 2015; Misra et al., 2016), neural architecture search (Lu et al., 2017; Huang et al., 2018; Vandenhende et al., 2019; Guo et al., 2020; Sun et al., 2020), and dynamic loss reweighting strategies (Kendall et al., 2018; Chen et al., 2018; Yu et al., 2020). Instead of training all tasks together, task grouping trains only similar tasks together. The early works of task grouping (Kang et al., 2011; Kumar & Daumé, 2012) are not adaptable to DNNs. Recently, several studies analyze task similarity (Standley et al., 2020) and task affinities (Fifty et al., 2021) for task grouping. In this work, we adopt the idea of task grouping to consolidate and split training activities. The state-of-the-art task grouping methods (Standley et al., 2020; Fifty et al., 2021), however, are unsuitable for our scenario because they focus on inference efficiency, bypassing the intensive computation of training. Thus, we propose activity consolidation and activity splitting to group training activities based on their synergies and differences.

Federated Learning  Federated learning emerges as a promising privacy-preserving distributed machine learning technique that uses a central server to coordinate multiple decentralized clients to collaboratively train a model.

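The idea of grouping activities by pairwise affinity can be illustrated with a toy greedy heuristic. This is only a sketch under assumed, hand-picked affinity scores (the function name and two-group strategy are hypothetical, not the paper's actual splitting algorithm):

```python
from itertools import combinations

def split_by_affinity(activities, affinity):
    """Greedy two-group sketch: seed the groups with the least-affine
    pair of activities, then assign each remaining activity to the group
    with which it has the highest average affinity. `affinity` maps
    unordered pairs of activity names to a symmetric synergy score
    (higher means better synergy)."""
    def aff(a, b):
        return affinity.get((a, b), affinity.get((b, a), 0.0))

    # The two least-compatible activities anchor separate groups.
    seed_a, seed_b = min(combinations(activities, 2), key=lambda p: aff(*p))
    groups = [[seed_a], [seed_b]]
    for act in activities:
        if act in (seed_a, seed_b):
            continue
        best = max(groups, key=lambda g: sum(aff(act, m) for m in g) / len(g))
        best.append(act)
    return groups
```

For instance, with four CV activities where segmentation/depth and normal/edge each have high mutual affinity, the heuristic recovers those two groups, so each group can then be trained as its own consolidated activity.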
