COMMUNICATION IN MULTI-AGENT REINFORCEMENT LEARNING: INTENTION SHARING

Abstract

Communication is one of the core components for learning coordinated behavior in multi-agent systems. In this paper, we propose a new communication scheme named Intention Sharing (IS) for multi-agent reinforcement learning to enhance coordination among agents. In the proposed IS scheme, each agent generates an imagined trajectory: a simulated future trajectory produced from learned models of the environment dynamics and of the other agents' actions, which represents the agent's future action plan. Each agent then compresses this imagined trajectory into an intention message for communication by applying an attention mechanism that learns the relative importance of the components of the imagined trajectory based on the messages received from other agents. Numerical results show that the proposed IS scheme significantly outperforms other communication schemes in multi-agent reinforcement learning.

1. INTRODUCTION

Reinforcement learning (RL) has achieved remarkable success in various complex control problems such as robotics and games (Gu et al. (2017); Mnih et al. (2013); Silver et al. (2017)). Multi-agent reinforcement learning (MARL) extends RL to multi-agent systems, which model many practical real-world problems such as connected cars and smart cities (Roscia et al. (2013)). Several distinct problems inherent to the nature of multi-agent learning arise in MARL (Gupta et al. (2017); Lowe et al. (2017)). One such problem is how to learn coordinated behavior among multiple agents, and various approaches to this problem have been proposed (Jaques et al. (2018); Pesce & Montana (2019); Kim et al. (2020)). One promising approach to learning coordinated behavior is learning a communication protocol among the agents (Foerster et al. (2016); Sukhbaatar et al. (2016); Jiang & Lu (2018); Das et al. (2019)).

The recent line of research on communication for MARL adopts end-to-end training based on a differentiable communication channel (Foerster et al. (2016); Jiang & Lu (2018); Das et al. (2019)). That is, a message-generation network is defined at each agent and connected to other agents' policy or critic networks through communication channels. The message-generation network is then trained using the gradient of the other agents' policy or critic losses. Typically, the message-generation network is conditioned on the current observation or on the hidden state of a recurrent network with observations as input. Thus, the trained message encodes past and current observation information so as to minimize the other agents' policy or critic loss. It has been shown that, owing to this capability of sharing observation information, such communication schemes outperform communication-free MARL algorithms, such as the widely used independent learning, in partially observable environments. In this paper, we consider the following further question for communication in MARL: "How can we harness the benefit of communication beyond sharing partial observations?" To address this question, we propose each agent's intention as the content of the message. Sharing intention through communication is common in natural multi-agent systems such as human society.
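To make the mechanism concrete, the following toy NumPy sketch illustrates one possible form of intention-message generation: a short imagined rollout under a learned model, compressed by scaled dot-product attention keyed on the message received from other agents. All function names, dimensions, and the stand-in policy and dynamics models are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def imagined_trajectory(obs, policy, dynamics, horizon):
    """Roll out a (hypothetical) learned dynamics model for `horizon` steps.

    `policy` and `dynamics` stand in for the agent's policy and the learned
    environment/other-agent models; both are placeholders here.
    """
    traj = []
    for _ in range(horizon):
        act = policy(obs)
        traj.append(np.concatenate([obs, act]))   # imagined (obs, action) pair
        obs = dynamics(obs, act)                  # predicted next observation
    return np.stack(traj)          # shape: (horizon, obs_dim + act_dim)

def intention_message(traj, received_msg, W_q, W_k, W_v):
    """Compress the imagined trajectory into an intention message using
    scaled dot-product attention, with the query derived from the
    received message so that importance depends on other agents."""
    q = W_q @ received_msg                      # query from others' message
    k = traj @ W_k                              # one key per imagined step
    v = traj @ W_v                              # one value per imagined step
    alpha = softmax(k @ q / np.sqrt(len(q)))    # importance of each step
    return alpha @ v                            # weighted sum -> message

# Toy usage with random weights (all dimensions are illustrative).
rng = np.random.default_rng(0)
obs_dim, act_dim, msg_dim, horizon = 3, 2, 4, 5
d = obs_dim + act_dim
policy = lambda o: rng.normal(size=act_dim)
dynamics = lambda o, a: o + 0.1 * rng.normal(size=obs_dim)
traj = imagined_trajectory(rng.normal(size=obs_dim), policy, dynamics, horizon)
msg = intention_message(traj, rng.normal(size=msg_dim),
                        rng.normal(size=(msg_dim, msg_dim)),
                        rng.normal(size=(d, msg_dim)),
                        rng.normal(size=(d, msg_dim)))
print(msg.shape)  # (4,)
```

In a trained system, the projection matrices and models would be learned end-to-end through the differentiable communication channel; here they are random, and the sketch only shows how attention weights over the imagined steps produce a fixed-size message.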

