CHEAP TALK DISCOVERY AND UTILIZATION IN MULTI-AGENT REINFORCEMENT LEARNING

Abstract

By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be constantly accessible and known to the agents a priori. In this work, we lift these requirements such that the agents must discover the cheap talk channels and learn how to use them. Hence, the problem has two main parts: cheap talk discovery (CTD) and cheap talk utilization (CTU). We introduce a novel conceptual framework for both parts and develop a new algorithm based on mutual information maximization that outperforms existing algorithms in CTD/CTU settings. We also release a novel benchmark suite to stimulate future research in CTD/CTU.

1. INTRODUCTION

Effective communication is essential for many multi-agent systems in the partially observable setting, which is common in many real-world applications like elevator control (Crites & Barto, 1998) and sensor networks (Fox et al., 2000). Communicating the right information at the right time becomes crucial to completing tasks effectively. In the multi-agent reinforcement learning (MARL) setting, communication often occurs on free channels known as cheap talk channels. The agents' goal is to learn an effective communication protocol over the channel. The transmitted messages can be either discrete or continuous (Foerster et al., 2016).

Existing work often assumes the agents have prior knowledge about these channels (e.g., channel capacities and noise levels). However, such assumptions do not always hold. Even if the existence of these channels can be assumed, they might not be persistent, i.e., available at every state. Consider the real-world application of inter-satellite laser communication. In this case, a communication channel is functional only when the satellites are within line of sight, so positioning becomes essential (Lakshmi et al., 2008). Thus, without these assumptions, agents in realistic MARL settings need the capability to discover where best to communicate before learning a protocol.

In this work, we investigate the setting where these assumptions on cheap talk channels are lifted. Precisely, these channels are only effective in a subset of the state space. Hence, agents must discover where these channels are before they can learn how to use them. We divide this problem into two sequential steps: cheap talk discovery (CTD) and cheap talk utilization (CTU). The problem is a strict generalization of the common setting used in the emergent communication literature with 1

