INFORMATION-THEORETIC UNDERPINNINGS OF GENERALIZATION AND TRANSLATABILITY IN EMERGENT COMMUNICATION

Abstract

Traditional emergent communication (EC) methods often fail to generalize to novel settings or align with representations of natural language. Here, we show how controlling the Information Bottleneck (IB) tradeoff between complexity and informativeness (a principle thought to guide human languages) helps address both of these problems in EC. Using VQ-VIB, a recent method for training agents while controlling the IB tradeoff, we find that: (1) increasing pressure for informativeness, which encourages agents to develop a shared understanding beyond task-specific needs, leads to better generalization to more challenging tasks and novel inputs; (2) VQ-VIB agents develop an EC space that encodes some semantic similarities and facilitates open-domain communication, similar to word embeddings in natural language; and (3) when translating between English and EC, greater complexity leads to improved performance of teams of simulated English speakers and trained VQ-VIB listeners, but only up to a threshold corresponding to the English complexity. These results indicate the importance of informational constraints for improving self-play performance and human-agent interaction.

1. INTRODUCTION

We wish to develop artificial agents that communicate in grounded settings, via communication that enables high task utility, generalizability to novel settings, and good human-agent cooperation. Emergent communication (EC) methods, wherein agents learn to communicate with each other in an unsupervised manner by maximizing a reward function, take a step towards this vision by producing agents that use grounded communication (Lowe et al., 2017; 2020; Lazaridou & Baroni, 2020). While numerous EC methods have succeeded in training agents to communicate with each other to solve a particular task, they still fall short of the vision of generalizable and human-interpretable communication. For example, agents trained to discriminate between two types of images fail to discriminate between sixteen images (Chaabouni et al., 2021b), and messages often violate human expectations for meanings (Kottur et al., 2017).

In this work, we take steps towards addressing these limitations by building on the information-theoretic EC approach of Tucker et al. (2022). This approach connects EC with the Information Bottleneck (IB, Tishby et al., 1999) framework for semantic systems (Zaslavsky et al., 2018; Zaslavsky, 2020), via the vector-quantized variational Information Bottleneck (VQ-VIB) neural architecture (Tucker et al., 2022). Specifically, VQ-VIB agents are trained to optimize a tradeoff between maximizing utility (how well they perform a task), maximizing informativeness (how well a listener can infer a speaker's meaning, independently of any downstream task), and minimizing communicative complexity (roughly, the number of bits allocated for communication). While previous EC methods typically focus on task-specific utility maximization (Lowe et al., 2017), there is broad empirical evidence suggesting that human languages are guided by the IB informativeness-complexity tradeoff (Zaslavsky et al., 2018; 2019; 2021; 2022; Mollica et al., 2021).
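The three-way tradeoff described above can be summarized schematically. In the sketch below, the weights λ_U, λ_I, λ_C and the exact decomposition are illustrative notation of ours, not the authors' precise objective; we write it only to make the structure of the tradeoff concrete.

```latex
% Speaker encodes input X into message M; a listener acts on M.
% Utility: expected task reward u(X, M).
% Informativeness: how well a listener can recover the speaker's
%   meaning from M, independently of any downstream task.
% Complexity: the mutual information I(X; M) between inputs and
%   messages, i.e., roughly the bits allocated to communication.
\max_{\theta}\;
  \lambda_U \, \mathbb{E}\!\left[\, u(X, M) \,\right]
  \;+\; \lambda_I \, \mathrm{Informativeness}(M)
  \;-\; \lambda_C \, I(X; M)
```

Under this reading, setting λ_I to zero recovers purely utility-driven EC training, while raising λ_I or lowering λ_C moves agents along the IB informativeness-complexity tradeoff studied in this paper.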
Therefore, we hypothesize that taking informativeness into account could improve the generalizability of EC to novel settings, while adjusting complexity could improve translatability between EC and human languages. Results from our experiments support this hypothesis. First, we show that encouraging informativeness allows EC agents to generalize beyond their training distribution to handle more challenging tasks.

