NEURAL AGENTS STRUGGLE TO TAKE TURNS IN BIDIRECTIONAL EMERGENT COMMUNICATION

Abstract

The spontaneous exchange of turns is a central aspect of human communication. Although turn-taking conventions come to us naturally, artificial dialogue agents struggle to coordinate, and must rely on hard-coded rules to engage in interactive conversations with human interlocutors. In this paper, we investigate the conditions under which artificial agents may naturally develop turn-taking conventions in a simple language game. We describe a cooperative task where success is contingent on the exchange of information along a shared communication channel where talking over each other hinders communication. Despite these environmental constraints, neural-network based agents trained to solve this task with reinforcement learning do not systematically adopt turn-taking conventions. However, we find that agents that do agree on turn-taking protocols end up performing better. Moreover, agents that are forced to perform turn-taking can learn to solve the task more quickly. This suggests that turn-taking may help to generate conversations that are easier for speakers to interpret.



1 INTRODUCTION

Figure 1: Illustration of our proposed game. Both agents can exchange utterances through a shared communication channel. At each step of the conversation, agents can decide to either speak or stay silent. However, information cannot be transmitted if both agents decide to speak at the same time.

Natural conversations involve a rapid exchange of utterances where speakers coordinate on-the-fly to avoid talking over each other. This turn-taking phenomenon is ubiquitous across cultures (Stivers et al., 2009) and is even found in some forms of animal communication (Pika et al., 2018; Demartsev et al., 2018). The ability to engage in spontaneous turn-taking develops early in humans, even before linguistic competence (Nguyen et al., 2021), and allows us to hold fluent conversations with very little downtime between utterances (Heldner & Edlund, 2010).

In contrast, fluid turn-taking is difficult to replicate in artificial dialogue systems. Modern conversational agents often rely on explicit cues, for instance pressing "enter" in text-based chatbots, the use of specific wake-words (Gao et al., 2020), or long silences of pre-determined length (Skantze, 2021).

The goal of this paper is to provide a testbed for studying the conditions under which artificial agents may develop a turn-taking convention to resolve a cooperative task. We describe a simple two-player game where agents observe partial views on an object which they must reconstruct. Agents can exchange information by emitting symbols across a shared communication channel over multiple rounds. The game exhibits two

