REVISITING POPULATIONS IN MULTI-AGENT COMMUNICATION

Abstract

Despite evidence from sociolinguistics that larger groups of speakers tend to develop more structured languages, the use of populations has failed to yield significant benefits in emergent multi-agent communication. In this paper we reassess the validity of the standard training protocol and illustrate its limitations. Specifically, we analyze population-level communication at the equilibrium in sender-receiver Lewis games. We find that receivers co-adapt to senders they are interacting with, which limits the effect of the population. Informed by this analysis, we propose an alternative training protocol based on "partitioning" agents. Partitioning isolates sender-receiver pairs, limits co-adaptation, and results in a new global optimization objective where agents maximize (1) their respective "internal" communication accuracy and (2) their alignment with other agents. In experiments, we find that agents trained in partitioned populations are able to communicate successfully with new agents which they have never interacted with and tend to develop a shared language. Moreover, we observe that larger populations develop languages that are more compositional. Our findings suggest that scaling up to populations in multi-agent communication can be beneficial, but that it matters how we scale up.

1. INTRODUCTION

Uncovering the mechanisms that underlie our ability to communicate using language is an important stepping stone towards developing machine learning models that are capable of coordinating and interacting via natural language. Over the last few years, there has been increasing interest in simulating the emergence of language using artificial agents trained with reinforcement learning to communicate in order to achieve a cooperative task (Lazaridou & Baroni, 2020). Typically, agents are trained to perform a variant of the Lewis signaling game (Lewis, 1969; Skyrms, 2010) wherein a sender emits a message describing an object and a receiver attempts to reconstruct the object based on the description. This line of work has applications to semi-supervised learning. For example, agents that develop languages exhibiting universal properties of natural languages may be used as useful initializations for downstream tasks such as image captioning (Lazaridou et al., 2020) or representation learning (Dessì et al., 2021). Most previous research has focused on communication between a single pair of agents. However, there is mounting evidence that the communication protocols developed in this restricted setting become highly specialized and exhibit properties that are at odds with those found in human languages (Bouchacourt & Baroni, 2018; Chaabouni et al., 2019): for example, agents are able to solve the task successfully while using languages that are not compositional (Kottur et al., 2017; Chaabouni et al., 2020). These idiosyncrasies of the emergent languages can preclude their use in practical applications (Lazaridou et al., 2020). As a possible solution, a growing body of work is advocating for scaling up the emergent communication literature to populations of more than two agents communicating simultaneously (Harding Graesser et al., 2019; Kim & Oh, 2021; Rita et al., 2022a; Chaabouni et al., 2022).
Indeed, there is substantial evidence within the language sciences that population dynamics shape language structure (Raviv et al., 2019; Nölle et al., 2020). In spite of this fact, several negative results have been obtained, showing that training agents in populations yields marginal benefits without explicit pressure towards, e.g., population diversity (Rita et al., 2022a) or emulation mechanisms (Chaabouni et al., 2022). In this paper, we call into question the way such populations are trained. By studying a simple referential game, we evaluate populations on two desirable features observed in natural language:

• Agents are able to communicate with new partners within the same population (Gupta et al., 2021).
• Larger populations tend to develop more structured languages (Nölle et al., 2020).

We provide evidence that populations of artificial agents do not always possess these features (as also attested by previous work, e.g., Kim & Oh (2021); Chaabouni et al. (2022)). To shed light on this phenomenon, we analyze the behaviour of agents in a population at the equilibrium (§2). We find that with the standard training procedure, the functional form of the objective is the same as that of a single pair of agents, because receivers co-adapt to their training partners. As our main contribution, we propose an alternative training procedure which partitions sender-receiver pairs and limits co-adaptation of receiver agents (§3). We show that this new training paradigm maximizes a different objective at the population level; in particular, it explicitly promotes mutual intelligibility across different agents. In experiments, we find that agents trained in partitioned populations are able to communicate successfully with new communication partners with which they have never interacted during training, and that the languages spoken by the various agents tend to be similar to one another (§5).
In addition, we observe that (1) languages developed in partitioned populations tend to be more compositional and (2) there is a population size effect whereby larger populations develop more structured languages ( §6). Our results show that there are multiple ways to generalize from single agent pairs to larger populations, and that these design choices matter when it comes to studying the emergent language.

2. COMMUNICATION GAME

We study communication in referential games, a variant of the Lewis signaling game (Lewis, 1969) proposed by Lazaridou et al. (2017). The game proceeds as follows: during each round, a sender agent π observes an object x ∈ X (e.g., an arbitrary categorical entity or a natural image) sampled from the input space X according to a distribution p, and generates a message m ∼ π(• | x). Messages consist of variable-length sequences of tokens picked from a discrete vocabulary V. Note that the tokens themselves are arbitrary and meaningless (typically, they are represented as numbers from 1 to |V|). A receiver agent ρ then observes the message m and must predict the original object from among a set of candidates C = {x, y_1, . . . , y_{|C|−1}} containing x and |C| − 1 distractors, where each distractor y is sampled uniformly without replacement from the input space excluding the original object, X \ {x}. Concretely, this is implemented by computing a score f(y, m) for each candidate y and defining the probability of a candidate conditioned on the message as

ρ(y | m, C) = e^{f(y, m)} / Σ_{y'∈C} e^{f(y', m)}.

Based on the receiver's success, the sender agent receives a reward R(x, ρ(• | m, C)).

In practice, both senders and receivers are implemented as neural networks π_θ and ρ_ψ with parameters θ and ψ estimated by gradient descent. The sender is trained to maximize its expected reward using the REINFORCE algorithm (Williams, 1992), while the receiver maximizes the expected log-likelihood of identifying the original object, log ρ_ψ(x | m, C) (also known as the InfoNCE objective; Oord et al., 2018). Denoting by E_{x∼p} the expectation over x sampled from p, the corresponding training objectives are:

J_s(θ) = E_{x∼p} E_{m∼π_θ(•|x)} E_{C∼p} [R(x, ρ_ψ(• | m, C))]   (1)
J_r(ψ) = E_{x∼p} E_{m∼π_θ(•|x)} E_{C∼p} [log ρ_ψ(x | m, C)]   (2)
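To make the game mechanics concrete, the following is a minimal tabular sketch of the referential game: a sender samples a message, a receiver scores candidates with a softmax over f(y, m), the sender is updated with a plain REINFORCE gradient on its reward, and the receiver with the gradient of log ρ(x | m, C). The tabular parameterization, learning rate, and problem sizes are illustrative choices for this sketch, not the paper's setup (the paper uses neural network agents).

```python
import math
import random

random.seed(0)

# Toy referential game: objects are integers 0..N-1, messages are single tokens.
# Illustrative tabular agents: the sender holds logits over tokens per object,
# the receiver holds a score table f(object, message).
N_OBJECTS, VOCAB = 5, 5
sender_logits = [[0.0] * VOCAB for _ in range(N_OBJECTS)]
receiver_scores = [[0.0] * VOCAB for _ in range(N_OBJECTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def play_round(lr=0.1, n_candidates=3):
    x = random.randrange(N_OBJECTS)
    # Sender samples a message m ~ pi(. | x)
    probs = softmax(sender_logits[x])
    m = random.choices(range(VOCAB), weights=probs)[0]
    # Candidate set C: the target plus distractors sampled without replacement
    distractors = random.sample([y for y in range(N_OBJECTS) if y != x],
                                n_candidates - 1)
    cands = [x] + distractors
    # Receiver: softmax over scores f(y, m), then sample a guess ~ rho(. | m, C)
    cprobs = softmax([receiver_scores[y][m] for y in cands])
    guess = random.choices(range(n_candidates), weights=cprobs)[0]
    reward = 1.0 if guess == 0 else 0.0  # success iff the target was picked
    # Sender update: REINFORCE, grad log pi(m | x) scaled by the reward
    for t in range(VOCAB):
        grad = (1.0 if t == m else 0.0) - probs[t]
        sender_logits[x][t] += lr * reward * grad
    # Receiver update: gradient of log rho(x | m, C) (the InfoNCE objective)
    for i, y in enumerate(cands):
        grad = (1.0 if i == 0 else 0.0) - cprobs[i]
        receiver_scores[y][m] += lr * grad
    return reward

for _ in range(8000):  # training rounds
    play_round()
accuracy = sum(play_round() for _ in range(2000)) / 2000
```

After training, `accuracy` should sit well above the chance level of 1/3 for three candidates, since the pair converges toward a shared code over the toy object space.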

2.1. POPULATION LEVEL TRAINING

The two-player referential game can be generalized to larger populations of agents (Mordatch & Abbeel, 2018; Chaabouni et al., 2022). In the most general case, we consider a population of N_s senders and N_r receivers linked by a bipartite communication graph G that defines connections between sender-receiver pairs (π_{θ_i}, ρ_{ψ_j}) (Harding Graesser et al., 2019; Kim & Oh, 2021). At training time, sender-receiver pairs are repeatedly sampled and trained to perform a round of the game. Importantly, only agent pairs that are connected in the communication graph are sampled. Throughout this paper, we will refer to this type of training as Standard training.
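The Standard training loop above can be sketched as follows: a bipartite graph is stored as a set of (sender, receiver) edges, and each training step samples one connected pair to play a round of the game. The names and graph shape are illustrative; one edge is removed here only to show that unconnected pairs are never sampled.

```python
import itertools
import random

random.seed(0)

N_SENDERS, N_RECEIVERS = 3, 3
# Bipartite communication graph G as a set of (sender, receiver) edges;
# disconnecting the pair (0, 0) illustrates that only connected pairs train together.
G = set(itertools.product(range(N_SENDERS), range(N_RECEIVERS))) - {(0, 0)}

def sample_pair(graph):
    """Sample a connected sender-receiver pair uniformly at random."""
    return random.choice(sorted(graph))

def train(steps, graph):
    counts = {edge: 0 for edge in graph}
    for _ in range(steps):
        i, j = sample_pair(graph)
        # here: play one referential-game round between sender i and receiver j,
        # updating them with the REINFORCE / InfoNCE objectives of Section 2
        counts[(i, j)] += 1
    return counts

counts = train(800, G)
```

Over many steps every connected pair is visited roughly uniformly, while the disconnected pair (0, 0) is never trained; the partitioned protocol of Section 3 changes which pairs are allowed to co-adapt, not this sampling mechanism.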

