FEDERATED CONTINUAL LEARNING WITH WEIGHTED INTER-CLIENT TRANSFER

Abstract

There has been a surge of interest in continual learning and federated learning, both of which are important for deploying deep neural networks in real-world scenarios. Yet little research has addressed the scenario where each client learns on a sequence of tasks from a private local data stream. This problem of federated continual learning poses new challenges to continual learning, such as utilizing knowledge from other clients while preventing interference from irrelevant knowledge. To resolve these issues, we propose a novel federated continual learning framework, Federated Weighted Inter-client Transfer (FedWeIT), which decomposes the network weights into global federated parameters and sparse task-specific parameters; each client then receives selective knowledge from other clients by taking a weighted combination of their task-specific parameters. FedWeIT minimizes interference between incompatible tasks and allows positive knowledge transfer across clients during learning. We validate FedWeIT against existing federated learning and continual learning methods under varying degrees of task similarity across clients, and our model significantly outperforms them with a large reduction in communication cost.

1. INTRODUCTION

Continual learning (Thrun, 1995; Kumar & Daume III, 2012; Ruvolo & Eaton, 2013; Kirkpatrick et al., 2017; Schwarz et al., 2018) describes a learning scenario in which a model continuously trains on a sequence of tasks. It is inspired by the human learning process: a person learns to perform numerous, highly diverse tasks over a lifespan, making use of past knowledge to learn new tasks without forgetting previously learned ones. Continual learning is a long-studied topic, since such an ability opens a path toward general artificial intelligence. However, implementing it with conventional models such as deep neural networks (DNNs) faces crucial challenges, most notably catastrophic forgetting, in which parameters or semantic representations learned for past tasks drift toward the new tasks during training. This problem has been tackled by various prior work (Kirkpatrick et al., 2017; Lee et al., 2017; Shin et al., 2017; Riemer et al., 2019). More recent works tackle other issues, such as scalability or order-robustness (Schwarz et al., 2018; Hung et al., 2019; Yoon et al., 2020). However, all of these models are fundamentally limited in that they can learn only from direct experience: they learn only from the sequence of tasks they have trained on. In contrast, humans can learn from the indirect experience of others through various means (e.g., verbal communication, books, or other media). Would it not then be beneficial to endow a continual learning framework with such an ability, so that multiple models learning on different machines can exploit knowledge of tasks already experienced by other clients? One problem that arises here is that, due to data privacy on individual clients and exorbitant communication cost, it may not be possible to communicate data directly between the clients or between the server and clients.
Federated learning (McMahan et al., 2016; Li et al., 2018; Yurochkin et al., 2019) is a learning paradigm that tackles this issue by communicating parameters instead of the raw data itself. We may have a server that receives the parameters locally trained on multiple clients, aggregates them into a single model parameter, and sends it back to the clients. Motivated by our intuition on learning from indirect experience, we tackle the problem of Federated Continual Learning (FCL), where we perform continual learning with multiple clients trained on private task sequences, which communicate their task-specific parameters via a global server. Figure 1(a) depicts an example scenario: a hospital that learns a new disease prediction task will transmit the task-specific parameters to the global server, which will redistribute them to other hospitals for their local models to utilize. This allows all participants to benefit from the new task knowledge without compromising data privacy.

Yet the problem of federated continual learning also brings new challenges. First, there is not only the catastrophic forgetting from continual learning, but also the threat of potential interference from other clients. Figure 1(b) describes this challenge with the results of a simple experiment. Here, we train a model for MNIST digit recognition while communicating the parameters from another client trained on a different dataset. When the knowledge transferred from the other client is relevant to the target task (SVHN), the model starts with high accuracy, converges faster, and reaches higher accuracy (green line), whereas the model underperforms the base model if the transferred knowledge comes from a task highly different from the target task (CIFAR-10, red line). Thus, we need to selectively utilize knowledge from other clients to minimize inter-client interference and maximize inter-client knowledge transfer.
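The server-side aggregation underlying this paradigm can be sketched as a data-size-weighted average of client parameters, as in FedAvg (McMahan et al., 2016). The sketch below is a generic illustration of that baseline, not FedWeIT's own communication scheme; the function name `fedavg` is our own.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Server-side aggregation: average client parameters,
    weighted by each client's local data size (FedAvg-style)."""
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

# Toy example: two clients, one parameter vector each.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
sizes = [10, 30]  # second client holds 3x more data
avg = fedavg(params, sizes)
print(avg)  # -> [2.5 3.5]
```

The aggregated parameter is then broadcast back to all clients for the next round of local training.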
Another problem in federated learning is communication efficiency: the cost of utilizing knowledge from other clients can become excessively large, and communication is often the main bottleneck in practical scenarios involving edge devices. Thus we want the knowledge to be represented as compactly as possible. To tackle these challenges, we propose a novel framework for federated continual learning, Federated Weighted Inter-client Transfer (FedWeIT), which decomposes the local model parameters into a dense base parameter and sparse task-adaptive parameters. FedWeIT reduces interference between different tasks, since the base parameters encode task-generic knowledge while task-specific knowledge is encoded into the task-adaptive parameters. Beyond this generic knowledge, we also want each client to selectively utilize task-specific knowledge obtained at other clients. To this end, we allow each model to take a weighted combination of the task-adaptive parameters broadcast from the server, such that it can select the task-specific knowledge helpful for the task at hand. FedWeIT is communication-efficient, since the task-adaptive parameters are highly sparse and need to be communicated only once, when created. Moreover, when communication efficiency is not a critical issue, as in cross-silo federated learning (Kairouz et al., 2019), our framework can be used to incentivize each client based on the attention weights on its task-adaptive parameters. We validate our method on multiple scenarios with varying degrees of task similarity across clients, against various federated learning and local continual learning models. The results show that our model obtains significantly superior performance over all baselines and adapts faster to new tasks, with largely reduced communication cost.
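The decomposition described above can be sketched as follows. This is a minimal illustration under our own naming (`base`, `mask`, `own_adaptive`, `alpha` are assumptions, not the paper's notation or implementation): a client's effective weights are a masked base parameter, plus its own sparse task-adaptive parameter, plus an attention-weighted sum of task-adaptive parameters received from other clients.

```python
import numpy as np

def compose_weights(base, mask, own_adaptive, foreign_adaptives, alpha):
    """Compose a client's effective layer weights for the current task.

    base              : dense base parameter shared via the server
    mask              : per-task mask selecting relevant base columns
    own_adaptive      : this client's highly sparse task-adaptive parameter
    foreign_adaptives : task-adaptive parameters received from other clients
    alpha             : attention weights for selective inter-client transfer
    """
    theta = base * mask + own_adaptive
    for a_j, w_j in zip(foreign_adaptives, alpha):
        theta = theta + w_j * a_j  # weighted inter-client transfer
    return theta

# Toy example: one layer of shape (4, 3), two other clients.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 3))
mask = (rng.random(3) > 0.5).astype(float)           # sparse column mask
own = np.where(rng.random((4, 3)) > 0.8, 0.1, 0.0)   # highly sparse adaptive term
others = [np.zeros((4, 3)), np.ones((4, 3)) * 0.05]
alpha = np.array([0.0, 0.3])  # attend only to the relevant client

theta = compose_weights(base, mask, own, others, alpha)
print(theta.shape)  # -> (4, 3)
```

Because `own` and the foreign adaptives are sparse and the mask zeroes out part of the base, only a compact set of parameters needs to be communicated per task, which is the source of the communication savings claimed above.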
The main contributions of this paper are as follows:

• We introduce a new problem of Federated Continual Learning (FCL), where multiple models continuously learn on distributed clients; this setting poses new challenges such as preventing inter-client interference while enabling inter-client knowledge transfer.

• We propose a novel and communication-efficient framework for federated continual learning, which allows each client to adaptively update the federated parameter and selectively utilize past knowledge from other clients by communicating sparse parameters.



Figure 1: (a) Concept. A continual learner at a hospital, which learns on a sequence of disease prediction tasks, may want to utilize relevant task parameters from other hospitals. FCL allows such inter-client knowledge transfer via the communication of task-decomposed parameters. (b) Challenge of FCL. Interference from other clients, resulting from sharing irrelevant knowledge, may hinder the optimal training of target clients (red), while relevant knowledge from other clients is beneficial for their learning (green).

