SOCIAL NETWORK STRUCTURE SHAPES INNOVATION: EXPERIENCE SHARING IN RL WITH SAPIENS

Abstract

Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary; it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innovation is more robustly achieved by dynamic social network structures. In dynamic settings, humans oscillate between innovating individually or in small clusters, and then sharing outcomes with others. To our knowledge, the role of social network structure in innovation has not been systematically studied in RL. Here, we use a multi-level problem setting (WordCraft) with three different innovation tasks to test the hypothesis that social network structure affects the performance of distributed RL algorithms. We systematically design networks of DQNs sharing experiences from their replay buffers in varying structures (fully-connected, small world, dynamic, ring) and introduce a set of behavioral and mnemonic metrics that extend the classical reward-focused evaluation framework of RL. Comparing the level of innovation achieved by different social network structures across different tasks shows that, first, consistent with human findings, experience sharing within a dynamic structure achieves the highest level of innovation in tasks with a deceptive nature and large search spaces. Second, experience sharing is not as helpful when there is a single clear path to innovation. Third, the metrics we propose can help explain the success of different social network structures on different tasks, with the diversity of experiences at the individual and group level lending crucial insights.

1. INTRODUCTION

Unlike herds or swarms, human social networks solve different tasks with different topologies (Momennejad, 2022). Human and computational studies show that properties of both the social network structure and the task affect the ability of groups to search collectively: social network structures with high connectivity are better suited for quick convergence in problems with clear global optima (Coman et al., 2016; Momennejad et al., 2019), while partially-connected structures perform best in deceptive tasks, where acting greedily in the short term leads to missing the optimal solution (Derex & Boyd, 2016; Lazer & Friedman, 2007; Cantor et al., 2021; Du et al., 2021; Adjodah et al., 2019). Despite this evidence, work in distributed reinforcement learning (RL) has focused on fully-connected architectures (Mnih et al., 2016; Horgan et al., 2018; Espeholt et al., 2018; Nair et al., 2015; Christianos et al., 2020; Schmitt et al., 2019; Jaderberg et al., 2018). Here, we test the performance of different social network structures in groups of RL agents that share their experiences in a distributed RL learning paradigm. We refer to such groups as multi-agent topologies, introduce SAPIENS, a learning framework for Structuring multi-Agent toPologies for Innovation through ExperieNce Sharing¹, and evaluate it on a deceptive task that models collective innovation. Innovations represent the expansion of an agent's behavioral repertoire with new problem-solving abilities and are, therefore, a necessary ingredient of continuous learning (Leibo et al., 2019). They arise from tinkering, recombination, and adoption of existing innovations (Solé et al., 2013; Derex & Boyd, 2016) and have been characterized as a type of combinatorial search constrained by semantics.

¹ We provide an implementation of SAPIENS and code to reproduce the simulations we report.

