PERSONALIZED FEDERATED HYPERNETWORKS FOR PRIVACY PRESERVATION IN MULTI-TASK REINFORCEMENT LEARNING

Abstract

Multi-Agent Reinforcement Learning currently focuses on implementations where all data and training can be centralized on one machine. But what if local agents are split across multiple tasks and need to keep data private from one another? We develop the first application of Personalized Federated Hypernetworks (PFH) to Reinforcement Learning (RL). We then present a novel application of PFH to few-shot transfer, and demonstrate significant initial gains in learning. PFH has never been demonstrated beyond supervised learning benchmarks, so we apply it to an important domain: RL price-setting for energy demand response. We consider a general case in which agents are split across multiple microgrids and energy consumption data must be kept private within each microgrid. Together, our work explores how the fields of personalized federated learning and RL can come together to make learning efficient across multiple tasks while keeping data secure.

1. INTRODUCTION

As Reinforcement Learning (RL) is brought to bear on pressing societal issues such as the green energy transition, the types of environments that RL must perform well in may display characteristics exotic to classical RL environments. Real applications at scale may require privacy guarantees which are not provided by modern multi-agent RL algorithms, as they may train on privileged or corporate data (Lowe et al., 2017; Sunehag et al., 2017; Rashid et al., 2018); any app that personalizes an RL agent to individual users must take care to protect their privacy by not storing all their data in a central server. Real-world applications will also likely feature heterogeneous tasks; every user, robot, and energy system will have different traits that cannot be accounted for by "one size fits all" algorithms. As previous work in privacy-preserving RL (Qi et al., 2021; Wang et al., 2020c; Ren et al., 2019; Anwar & Raychowdhury, 2021) does not extend to personalized models, the competing goals of privacy and personalization must each be accomplished at the other's expense. One approach toward privacy preservation by decentralizing data servers within supervised learning is federated learning (Shokri & Shmatikov, 2015). Federated learning algorithms train a global model from gradient updates sent by individual clients training on their own data, which is never sent to the central server. An extension of this technique is personalized federated learning using hypernetworks (PFH; Shamsian et al., 2021), which allows for behavior tailored to individual heterogeneous tasks by splitting the model into a global common component (i.e., the hypernetwork) and a local individual component (a local network generated by the hypernetwork) tailored to each client. This task specialization allows for learning common features together in the global component while learning client-specific knowledge in the local component.
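The global/local split can be sketched as follows. This is a minimal NumPy illustration with hypothetical shapes and random (untrained) parameters; the actual PFH of Shamsian et al. (2021) learns the hypernetwork weights and the per-client embeddings jointly by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 4            # size of each client's learned embedding (assumed)
HIDDEN = 16              # hypernetwork hidden width (assumed)
IN_DIM, OUT_DIM = 8, 2   # local network: state features -> action scores
N_LOCAL_PARAMS = IN_DIM * OUT_DIM + OUT_DIM  # weights + bias of local net

# Global common component: shared hypernetwork parameters.
W1 = rng.normal(0, 0.1, (EMBED_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, N_LOCAL_PARAMS))

# Local individual component: one embedding per client (e.g. per microgrid).
client_embeddings = {c: rng.normal(0, 1, EMBED_DIM) for c in range(3)}

def generate_local_net(embedding):
    """Hypernetwork forward pass: client embedding -> local network weights."""
    h = np.tanh(embedding @ W1)
    flat = h @ W2
    W = flat[: IN_DIM * OUT_DIM].reshape(IN_DIM, OUT_DIM)
    b = flat[IN_DIM * OUT_DIM :]
    return W, b

def local_forward(state, W, b):
    """Client-side network generated by the shared hypernetwork."""
    return state @ W + b

# Each client gets its own network from the same shared hypernetwork.
state = rng.normal(0, 1, IN_DIM)
outputs = {c: local_forward(state, *generate_local_net(e))
           for c, e in client_embeddings.items()}
```

Because every client's network is a function of the shared hypernetwork, common structure is learned once globally, while each low-dimensional embedding carries only the client-specific variation.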
We present a novel application of PFH to RL in a realistic power systems setting that requires both privacy and heterogeneity in agents to accommodate diverse, sensitive environments. An RL controller optimizing hourly transactive energy pricing has been shown to optimize energy usage (Li & Hong, 2014; Spangher, 2021; Vázquez-Canteli et al., 2019; Agwan et al., 2021) by incentivizing consumers, at the scale of groups of buildings (microgrids) or office workers within buildings, to shift demand to different times of day. By guiding consumers to defer energy demands to hours when solar generation is especially active, it is possible to drop a building's carbon impact to 48% of normal operation through RL price-setting (Jang et al., 2022), which could have massive implications for grid sustainability. However, RL can be extremely data hungry; prior transactive control attempts required about 80 years of training data (Agwan et al., 2021). To increase the amount of data available, we consider multiple RL agents, each managing their own (slightly different) microgrid through energy prices and collecting data in parallel. This microgrid environment is a multi-task, multi-agent setup in which the management of each microgrid, through prices, constitutes a task. We characterize our problem as multi-agent because we have multiple RL agents optimizing a shared reward (total profit), and multi-task because optimizing profit in each of the different microgrids presents tasks that are related but also independent due to differences in size, number of batteries in each building, etc. We hypothesize that we can accelerate training by incorporating data from multiple microgrids with different characteristics. Learning to set prices using data from multiple microgrids (source tasks) also opens the door to few-shot learning in new microgrids (target tasks), wherein we learn to generate near-optimal prices for a microgrid very quickly.
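To make the data-parallel setup concrete, here is a minimal sketch of one federated round across several microgrid agents. The environment and update rule are illustrative placeholders (a FedAvg-style average, not our actual training procedure); the point is that each client improves its policy on private local data and only a parameter update ever leaves the microgrid.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CLIENTS, DIM = 3, 5          # hypothetical number of microgrids / param size

global_params = np.zeros(DIM)  # shared policy parameters held by the server

def local_rl_step(params, client_id):
    """Placeholder for local policy training on one microgrid's private data.

    Returns only a parameter update; the raw consumption data never
    leaves the client.
    """
    private_data = rng.normal(client_id, 1.0, size=DIM)  # stays on-device
    return 0.1 * (private_data - params)  # stand-in for a gradient step

# One federated round: collect client updates, aggregate on the server.
updates = [local_rl_step(global_params, c) for c in range(N_CLIENTS)]
global_params = global_params + np.mean(updates, axis=0)
```

In the PFH variant, the server would instead hold the hypernetwork and aggregate client updates into the hypernetwork parameters and per-client embeddings, preserving the same property that raw data stays local.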
However, energy data is a domain in which privacy concerns are paramount. It is our hope to contribute to privacy protection by aggregating learning, not data, at one central source. Not only would keeping data on buildings' energy consumption at one central location present a major privacy concern if that central machine were compromised, but message passing of the raw data could present an additional source of vulnerability. Although each microgrid might have access to the data of only a few buildings at a time, the scale of damage would be much larger if data were stored in a central server across multiple microgrids. We now present a hypothetical setting in which our architecture would be useful. One could imagine a hacker learning when the hypothetical company CovertAI trains its new 80-quintillion-parameter language model CPT-4 from the energy consumption of CovertAI's compute warehouses. The hacker could sabotage power lines at the right moment to erase learning gains. They may then turn their attention to residential neighborhoods. Here, they could figure out when people are not home from the energy consumption of domestic buildings, timing a theft; they could also disaggregate energy signals to learn which appliances a homeowner has, or glean sensitive health information if medical devices produce noticeable patterns in energy consumption. Applying PFH to the energy application remedies both of these competing concerns: PFH takes privacy preservation into account by design, and accounts for heterogeneous tasks by generating RL agents individualized to each microgrid's size, number of solar panels, batteries, etc. We demonstrate that PFH learns the underlying factors that define an environment by applying it to the microgrid price-setting problem, where we observe increases on the scale of millions of dollars in total microgrid profit (reward) over federated and local learning.
We also demonstrate how PFH can be used for few-shot transfer learning for new local agents entering the system, reporting drastic training speed-ups (>100x) when transferring from source tasks to target tasks. Thus PFH drastically increases the feasibility of RL for energy price-setting. Methodologically, our paper is novel in adapting a state-of-the-art privacy-preserving algorithm to RL. To our knowledge, we are the first to explicitly apply personalized federated learning to multi-task, multi-agent RL when centralized learning and joint action-values are unavailable. Application-wise, our paper is also novel in its improvement of energy demand response across heterogeneous microgrids. We hope our work highlights an important microgrid environment to the RL community, helps establish the use of PFH within RL, and allows RL to address problems where learning speed and privacy are fundamental.

2. DEFINITIONS

Energy Demand Response is a technique used by grid operators to incentivize consumers to shift demand to times when it is better for grid stability or climate emissions, such as when solar energy peaks. Demand response serves the same function as grid-level batteries in easing the volatility of wind and solar energy, and is seen as an important tool in the energy transition (Albadi & El-Saadany, 2007).



See Appendix A for a discussion of related work.

