SOCIAL NETWORK STRUCTURE SHAPES INNOVATION: EXPERIENCE SHARING IN RL WITH SAPIENS

Abstract

Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary; it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innovation is more robustly achieved by dynamic social network structures. In dynamic settings, humans oscillate between innovating individually or in small clusters, and then sharing outcomes with others. To our knowledge, the role of social network structure in innovation has not been systematically studied in RL. Here, we use a multi-level problem setting (WordCraft), with three different innovation tasks, to test the hypothesis that the social network structure affects the performance of distributed RL algorithms. We systematically design networks of DQNs sharing experiences from their replay buffers in varying structures (fully-connected, small world, dynamic, ring) and introduce a set of behavioral and mnemonic metrics that extend the classical reward-focused evaluation framework of RL. Comparing the level of innovation achieved by different social network structures across different tasks shows that, first, consistent with human findings, experience sharing within a dynamic structure achieves the highest level of innovation in tasks with a deceptive nature and large search spaces. Second, experience sharing is not as helpful when there is a single clear path to innovation. Third, the metrics we propose can help understand the success of different social network structures on different tasks, with the diversity of experiences on an individual and group level lending crucial insights.

1. INTRODUCTION

Unlike herds or swarms, human social networks solve different tasks with different topologies (Momennejad, 2022). Human and computational studies show that properties of both the social network structure and task affect the abilities of groups to search collectively: social network structures with high connectivity are better suited for quick convergence in problems with clear global optima (Coman et al., 2016; Momennejad et al., 2019), while partially-connected structures perform best in deceptive tasks, where acting greedily in the short-term leads to missing the optimal solution (Derex & Boyd, 2016; Lazer & Friedman, 2007; Cantor et al., 2021; Du et al., 2021; Adjodah et al., 2019). Despite this evidence, works in distributed reinforcement learning (RL) have focused on fully-connected architectures (Mnih et al., 2016; Horgan et al., 2018; Espeholt et al., 2018; Nair et al., 2015; Christianos et al., 2020; Schmitt et al., 2019; Jaderberg et al., 2018). Here, we test the performance of different social network structures in groups of RL agents that share their experiences in a distributed RL learning paradigm. We refer to such groups as multi-agent topologies, introduce SAPIENS, a learning framework for Structuring multi-Agent toPologies for Innovation through ExperieNce Sharing, and evaluate it on a deceptive task that models collective innovation. Innovations represent the expansion of an agent's behavioral repertoire with new problem-solving abilities and are, therefore, a necessary ingredient of continuous learning (Leibo et al., 2019). They arise from tinkering, recombination and adoption of existing innovations (Solé et al., 2013; Derex & Boyd, 2016) and have been characterized as a type of combinatorial search constrained by semantics dictating the feasible combinations of innovations (Solé et al., 2013; Derex & Boyd, 2016).
We adopt this definition: innovations are a type of collective search task with: a) a multi-level search space, where innovations arise out of recombination of existing ones (Hafner, 2021), and b) rewards that increase monotonically with the level of innovation, in order to capture the human intrinsic motivation for progress (Solé et al., 2013). Laboratory and field studies of human groups have shown that collective innovation is highly contingent on the social network structure (Momennejad, 2022; Migliano & Vinicius, 2022; Derex & Boyd, 2016). The reason for this lies in the exploration versus exploitation dynamics of social networks. High clustering and long shortest paths in partially-connected structures help maintain diversity in the collective to the benefit of exploration, while high connectivity quickly leads to conformity, which benefits exploitation (Lazer & Friedman, 2007). Of particular interest are structures that achieve a balance in this trade-off: small-worlds are static graphs that, due to a modular structure with long-range connections, achieve both high clustering and a small shortest path length (Watts & Strogatz, 1998). Another example is dynamic structures, where agents are able to periodically change neighbors (Volz & Meyers, 2007). These two families of graphs have the attractive property that they both locally protect innovations and quickly disseminate good solutions (Derex & Boyd, 2016). Despite progress on multiple fronts, many open questions remain before we get a clear understanding of how social network structure shapes innovation. On the cognitive science side, computational and human laboratory studies of collective innovation are few and have studied a single task where two innovations are combined to create a new one (Derex & Boyd, 2016; Cantor et al., 2021), while most works study other types of collective search that do not resemble innovation (Mason & Watts, 2012; Mason et al., 2008; Lazer & Friedman, 2007; Fang et al., 2010).
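The structural properties discussed above (clustering and shortest path length) can be checked numerically on small graphs. The following is a minimal sketch in plain Python and is not the SAPIENS implementation: the function names, the group size of 20 agents, and the use of two hand-picked shortcut edges on a ring lattice as a stand-in for a small-world graph are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def fully_connected(n):
    """Every agent is a neighbor of every other agent."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ring_lattice(n, k=2):
    """Each agent is linked to its k nearest neighbors (k/2 on each side)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph


def with_shortcuts(graph, shortcuts):
    """Copy a graph and add long-range edges (hypothetical small-world stand-in)."""
    new = {node: set(nbrs) for node, nbrs in graph.items()}
    for a, b in shortcuts:
        new[a].add(b)
        new[b].add(a)
    return new


def average_clustering(graph):
    """Mean fraction of each node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in graph.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            linked = sum(1 for a, b in pairs if b in graph[a])
            total += linked / len(pairs)
    return total / len(graph)


def average_shortest_path(graph):
    """Mean BFS distance over all ordered pairs; assumes a connected graph."""
    total, count = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


# Illustrative comparison on 20 agents (sizes and shortcuts are assumptions).
full = fully_connected(20)
ring = ring_lattice(20, k=4)
small_world = with_shortcuts(ring, [(0, 10), (5, 15)])

for name, g in [("full", full), ("ring", ring), ("small-world", small_world)]:
    print(name, round(average_clustering(g), 3), round(average_shortest_path(g), 3))
```

Because adding an edge can never lengthen a shortest path, the shortcut graph necessarily has a shorter average path than the ring it was built from, while the clustering of most nodes is left untouched. This is the qualitative trade-off the small-world argument relies on; the specific numbers depend entirely on the assumed graph size and shortcuts.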
Furthermore, laboratory studies have collected purely behavioral data (Mason et al., 2008; Derex & Boyd, 2016), while studies of collective memory have shown significant influence of social interactions on individual memory (Coman et al., 2016), indicating that mnemonic data may be another good source of information. In distributed RL, studies hypothesize that groups outperform single agents, not just in speed but also in performance, because of the increased diversity of experiences collected by heterogeneous agents, but they do not explicitly measure this diversity (Nair et al., 2015; Horgan et al., 2018). In this case two steps seem natural: introducing appropriate metrics of diversity, and increasing it, not only through heterogeneity, but also through the social network topology. To achieve this we propose SAPIENS, a distributed RL learning framework for modeling a group of agents exchanging experiences according to a social network topology. We study instantiations of SAPIENS where multiple DQN learners (Mnih et al., 2013) share experience tuples from their replay buffers with their neighbors in different static and dynamic social network structures and compare them to other distributed RL algorithms (Mnih et al., 2016; Nair et al., 2015). We employ WordCraft (Jiang et al., 2020b) as a test-bed and design three custom tasks (Figures 1 and 2) covering innovation challenges of different complexity: (i) a task with a single innovation path to an easy-to-find global optimum. This type of task can be used to model a linear innovation structure, such as the evolution of the fork from the knife to having two, three and, eventually, four tines (Solé et al., 2013), and is not a deceptive task. (ii) a task with two paths that individually lead to local optima, but when combined, can merge toward the global optimum.
Inspired by previous studies in cognitive science (Derex & Boyd, 2016; Cantor et al., 2021), this task captures innovations that were repurposed after their invention, such as Gutenberg's screw press leading to the printing press (Solé et al., 2013). (iii) a task with ten paths, only one of which leads to the global optimum, which captures search in vast spaces. In addition to the two deceptive tasks in WordCraft, we also evaluate SAPIENS algorithms on a deceptive task implemented in a grid world. We empirically show that the performance of SAPIENS depends on the interplay between social network structure and task demands. Dynamic structures perform most robustly, converging quickly in the easy task, avoiding local optima in the second task, and exploring efficiently in the third task. To interpret these findings, we propose and compute novel behavioral and mnemonic metrics that quantify, among others, the diversity of experiences.

Contributions. Our contributions are two-fold. From a cognitive science perspective, SAPIENS is, to our knowledge, the first computational study of hypotheses in human studies relating social network structure to collective innovation that employs deep RL as the individual learning mechanism. Compared to the simple agent-based models employed by previous computational studies (Lazer & Friedman, 2007; Cantor et al., 2021; Mason & Watts, 2012), deep RL offers three main advantages: i) it enables empirical experiments with more complex test-beds and larger search spaces



We provide an implementation of SAPIENS and code to reproduce the simulations we report.
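As a rough illustration of what experience sharing over a social network topology can look like, the sketch below passes experience tuples between per-agent replay buffers along the edges of a graph. It is a simplified stand-in and not the released SAPIENS code: the function names, the buffer capacity, the number of tuples shared per round, the four-agent ring, and the unique-tuple count used as a diversity measure are all assumptions chosen for brevity.

```python
import random
from collections import deque


def make_buffers(n_agents, capacity=100):
    """One bounded replay buffer per agent (oldest tuples are evicted first)."""
    return {agent: deque(maxlen=capacity) for agent in range(n_agents)}


def share_experiences(buffers, graph, n_shared, rng):
    """Each agent sends up to n_shared sampled tuples to every neighbor."""
    # Sample all outgoing batches first, so that tuples received in this
    # round are not forwarded again within the same round.
    outgoing = {}
    for agent, buffer in buffers.items():
        size = min(n_shared, len(buffer))
        outgoing[agent] = rng.sample(list(buffer), size)
    for agent, batch in outgoing.items():
        for neighbor in graph[agent]:
            buffers[neighbor].extend(batch)


def group_diversity(buffers):
    """Number of distinct tuples held across all buffers (illustrative metric)."""
    return len({t for buffer in buffers.values() for t in buffer})


# Hypothetical example: four agents on a ring, one placeholder tuple each.
rng = random.Random(0)
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
buffers = make_buffers(4)
for agent in buffers:
    # (state, action, reward, next_state); the values are placeholders.
    buffers[agent].append((f"s{agent}", "a", 0.0, f"s{agent}_next"))

share_experiences(buffers, ring, n_shared=1, rng=rng)
print({agent: len(buffer) for agent, buffer in buffers.items()})
print(group_diversity(buffers))
```

In this toy setup, one round of sharing on a ring gives each agent the experiences of its two direct neighbors only, so information from the opposite side of the ring needs further rounds to arrive. Swapping `ring` for a fully-connected adjacency dictionary would spread every tuple to every agent in a single round, which is the kind of difference between structures that the study examines.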

