SIMILARITY-BASED COOPERATION

Abstract

As machine learning agents act more autonomously in the world, they will increasingly interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's Dilemma, standard game theory predicts that ML agents will fail to cooperate with each other. Prior work has shown that one way to enable cooperative outcomes in the one-shot Prisoner's Dilemma is to make the agents mutually transparent, i.e., to allow them to access one another's source code (Rubinstein, 1998; Tennenholtz, 2004), or their weights in the case of ML agents. However, full transparency is often unrealistic, whereas partial transparency is commonplace. Moreover, it is challenging for agents to learn their way to cooperation in the full transparency setting. In this paper, we introduce a more realistic setting in which agents only observe a single number indicating how similar they are to each other. We prove that this allows for the same set of cooperative outcomes as the full transparency setting. We also demonstrate experimentally that cooperation can be learned using simple ML methods.

1. INTRODUCTION

As AI systems start to autonomously interact with the world, they will also increasingly interact with each other. We already see this in contexts such as trading agents (CFTC & SEC, 2010), but the number of domains where separate AI agents interact with each other in the world is sure to grow; for example, consider autonomous vehicles. In the language of game theory, AI systems will play general-sum games with each other. For example, autonomous vehicles may find themselves in Game-of-Chicken-like dynamics with each other (cf. Fox et al., 2018). In many of these interactions, cooperative or even peaceful outcomes are not a given. For example, standard game theory famously predicts and recommends defecting in the one-shot Prisoner's Dilemma. Even when cooperative equilibria exist, there are typically many equilibria, including uncooperative and asymmetric ones. For instance, in the infinitely repeated Prisoner's Dilemma, mutual cooperation is played in some equilibria, but so is mutual defection, and so is the strategy profile in which one player cooperates 70% of the time while the other cooperates 100% of the time. Moreover, the strategies from different equilibria typically do not cooperate with each other.

A recent line of work at the intersection of AI/(multi-agent) ML and game theory aims to increase AI/ML systems' ability to cooperate with each other (Stastny et al., 2021; Dafoe et al., 2021; Conitzer & Oesterheld, 2022). Prior work has proposed to make AI agents mutually transparent to allow for cooperation in equilibrium (McAfee 1984; Howard 1988; Rubinstein 1998, Section 10.4; Tennenholtz 2004; Barasz et al. 2014; Critch 2019; Oesterheld 2019b). Roughly, this literature considers for any given 2-player normal-form game Γ the following program meta game: Both players submit a computer program, e.g., some neural net, to choose actions in Γ on their behalf. Each program then receives as input the program submitted by the other player.
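
To make the program meta game concrete, the following is a minimal Python sketch for the one-shot Prisoner's Dilemma. It is an illustration under stated assumptions rather than the construction of any particular cited paper: the payoff numbers, the names `PAYOFFS`, `CLIQUE_BOT`, `DEFECT_BOT`, `run`, and `play`, and the choice to represent programs as source strings are all hypothetical. For convenience, each program is also handed its own source text, which avoids having to write a self-referential program. The example program cooperates only when the opponent's source is identical to its own, in the spirit of the equality-based programs discussed in the program equilibrium literature.

```python
from typing import Dict, Tuple

# Illustrative Prisoner's Dilemma payoffs (assumed values, not from the paper).
# Keys are (action of player 1, action of player 2); values are their payoffs.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Programs are represented by their source text (a hypothetical encoding).
# Each program maps (its own source, the opponent's source) to an action.
CLIQUE_BOT = "lambda my_source, other_source: 'C' if other_source == my_source else 'D'"
DEFECT_BOT = "lambda my_source, other_source: 'D'"


def run(program: str, opponent: str) -> str:
    """Run `program` with the opponent's source as input and return its action."""
    policy = eval(program)  # acceptable for a toy sketch; never eval untrusted code
    return policy(program, opponent)


def play(program_1: str, program_2: str) -> Tuple[int, int]:
    """Play the base game with each submitted program reading the other one."""
    action_1 = run(program_1, program_2)
    action_2 = run(program_2, program_1)
    return PAYOFFS[(action_1, action_2)]


if __name__ == "__main__":
    print(play(CLIQUE_BOT, CLIQUE_BOT))  # both submit the same program
    print(play(DEFECT_BOT, CLIQUE_BOT))  # player 1 deviates to unconditional defection
```

Under these assumed payoffs, two copies of `CLIQUE_BOT` cooperate with each other, while any player who submits a different program is met with defection and therefore cannot do better than the mutual-defection payoff. This is the intuition for why such a pair of programs can form a cooperative equilibrium of the meta game.
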
Prior work has shown that the program meta game has cooperative equilibria in the Prisoner's Dilemma. Unfortunately, there are multiple obstacles to cooperation based on full mutual transparency. 1) While partial transparency is the norm, settings of full transparency are rare. For example, while GPT-3's architecture and training regime are public knowledge, the exact learned model is not. 2) Games played with full transparency in general have many equilibria, including ones that are much worse for some or all players than the Nash equilibria of the underlying game (see the folk theorems given by Rubinstein 1998, Section 10.4, and Tennenholtz 2004). In particular, full mutual transparency can make the problem of equilibrium selection harder. 3) The full transparency setting poses challenges to modern ML methods. In particular, it requires at least one of the models to receive as input a model that has at least as many parameters as itself. Meanwhile, most modern

