EXPLORING ZERO-SHOT EMERGENT COMMUNICATION IN EMBODIED MULTI-AGENT POPULATIONS

Anonymous

Abstract

Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings. Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting is that it does not allow the emergent protocols to generalize beyond the training partners. Furthermore, emergent communication has so far primarily focused on the use of symbolic channels. In this work, we extend this line of work to a new modality, by studying agents that learn to communicate via actuating their joints in a 3D environment. We show that under realistic assumptions, a non-uniform distribution of intents and a common-knowledge energy cost, these agents can find protocols that generalize to novel partners. We also explore and analyze specific difficulties associated with finding these solutions in practice. Finally, we propose and evaluate initial training improvements to address these challenges, involving both specific training curricula and providing, during training, the latent feature to be coordinated on.

1. INTRODUCTION

The ability to communicate effectively with other agents is part of a necessary skill repertoire of intelligent agents and, by definition, can only be studied in multi-agent contexts. Over the last few years, a number of papers have studied emergent communication in multi-agent settings (Lazaridou et al., 2016; Havrylov & Titov, 2017; Cao et al., 2018; Bouchacourt & Baroni, 2018; Eccles et al., 2019; Graesser et al., 2019; Chaabouni et al., 2019; Lowe et al., 2019b). This work typically assumes a symbolic (discrete) cheap-talk channel, through which agents can send messages that have no impact on the reward function or transition dynamics. A common task is the so-called referential game, in which a sender observes an intent that must be communicated to a listener via a message. In these cheap-talk settings, the solution space typically contains many equivalent but mutually incompatible (self-play) policies. For example, permuting bits in the channel and adapting the receiver policy accordingly would preserve payoffs, but differently permuted senders and receivers are mutually incompatible. This makes it difficult for independently trained agents to utilize the cheap-talk channel at test time, a setting which is formalized as zero-shot (ZS) coordination (Hu et al., 2020). In contrast, we study how gesture-based communication can emerge under realistic assumptions. Specifically, this work considers emergent communication in the context of embodied agents that learn to communicate through actuating and observing their joints in simulated physical environments. In other words, our setup is a referential game, where each message is a multi-step process that produces an entire trajectory of limb motion (continuous actions) in a simulated 3D world.
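The incompatibility of independently trained cheap-talk protocols can be illustrated with a minimal toy model (our own sketch, not the paper's setup): each self-play pair converges to an arbitrary bijection between intents and discrete messages, here modeled by a random permutation rather than actual RL training. Self-play accuracy is perfect by construction, but pairing a sender with a receiver from a different run fails whenever their permutations disagree.

```python
import random

def train_pair(num_intents, seed):
    """Stand-in for one self-play training run: the pair settles on an
    arbitrary bijection intent -> message (a random permutation here;
    real training would use RL, but the symmetry argument is the same)."""
    rng = random.Random(seed)
    perm = list(range(num_intents))
    rng.shuffle(perm)
    sender = lambda intent: perm[intent]            # intent -> message
    inverse = {m: i for i, m in enumerate(perm)}
    receiver = lambda message: inverse[message]     # message -> intent
    return sender, receiver, perm

def accuracy(sender, receiver, num_intents):
    """Fraction of intents the receiver decodes correctly."""
    hits = sum(receiver(sender(i)) == i for i in range(num_intents))
    return hits / num_intents

N = 8
s1, r1, p1 = train_pair(N, seed=0)
s2, r2, p2 = train_pair(N, seed=1)

print(accuracy(s1, r1, N))  # self-play: 1.0 by construction
print(accuracy(s1, r2, N))  # cross-play (zero-shot): typically well below 1.0
```

Cross-play accuracy is exactly the fraction of intents on which the two permutations agree, which makes the symmetry-breaking problem explicit: every permutation is an equally good self-play optimum, so independent runs have no reason to converge to the same one.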
Not only does body language play a crucial role in social interactions, but zoomorphic agents, robotic manipulators, and prelingual infants are generally not expected to use symbolic language to communicate at all. From a practical point of view, it is clear that future AI agents will need to signal to and interpret the body language of other (human) agents, e.g., when self-driving cars decide whether it is safe to cross an intersection. Relatedly, there has been work on the emergence of grounded language for robots (Steels et al., 2012; Spranger, 2016). To the best of our knowledge, however, we are the first to explore deep reinforcement learning for emergent communication in the context of embodied agents using articulated motion.

