LEARNING TO COOPERATE AND COMMUNICATE OVER IMPERFECT CHANNELS

Abstract

Information exchange in multi-agent systems improves cooperation among agents, especially in partially observable settings. This can be framed as a joint problem in which agents simultaneously learn to communicate and to solve a shared task. In the real world, communication is often carried out over imperfect channels, which requires the agents to deal with uncertainty due to potential information loss. In this paper, we consider a cooperative multi-agent system in which agents act and exchange information in a decentralized manner over a limited and unreliable channel. To cope with such channel constraints, we propose a novel communication approach based on independent Q-learning. Our method allows agents to dynamically adapt how much information to share by sending messages of different sizes, depending on their local observations and the channel properties. In addition to this message size selection, agents learn to encode and decode messages to improve their policies. We show that our approach outperforms approaches without adaptive capabilities and discuss its limitations in different environments.

1. INTRODUCTION

In multi-agent systems, cooperation and communication are closely related. Whenever a task requires agents with partial views to cooperate, exchanging information about one's view and intent can reduce uncertainty and allows for better-founded decisions. Communication lets agents solve tasks more efficiently and can even be necessary to achieve acceptable results (Singh et al., 2019). As an example, consider a safety-critical autonomous driving scenario (Li et al., 2021). By letting the cars exchange sensor data or abstract details about detected objects in the scene, occluded objects can be considered in the planning processes of all cars, reducing the risk of collisions.

Multi-agent reinforcement learning (MARL) comprises learning methods for problems in which multiple agents interact with a shared environment (Buşoniu et al., 2010; Hernandez-Leal et al., 2019). The goal is to find an optimal policy for the agents that maximizes the outcome of their actions with respect to the environment's reward signal. Key challenges in MARL include non-stationarity (Papoudakis et al., 2019), the credit assignment problem (Zhou et al., 2020), and partial observability (Oroojlooyjadid & Hajinezhad, 2019). We focus on cooperative environments with partial observability.

As communication is essential in cooperative environments, many works include a predefined information exchange between agents (Melo et al., 2011; Schneider et al., 2021). Additionally, there is ongoing research on integrating learnable communication into MARL approaches. Pioneering work provided first empirical evidence that communication between agents can be learned with deep MARL (Foerster et al., 2016; Lowe et al., 2017; Sukhbaatar et al., 2016). This improves performance on existing environments and makes it possible to address a new class of problems that require communication between agents.
Building upon these ideas, many researchers have proposed methods to improve the performance and stability of these approaches (Gupta et al., 2020; Jiang & Lu, 2018; Li et al., 2021). While related work investigates the effects of using different fixed message sizes (Li et al., 2022) and multiple communication rounds (Das et al., 2019), of selectively sending messages (Singh et al., 2019), and of sending messages only to other agents in proximity (Jiang & Lu, 2018), most of these approaches are designed for communication channels without capacity limitations or message losses. Recent approaches have started to investigate such settings, e.g. by learning central controllers for coordinated

