SIMILARITY-BASED COOPERATION

Abstract

As machine learning agents act more autonomously in the world, they will increasingly interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's Dilemma, standard game theory predicts that ML agents will fail to cooperate with each other. Prior work has shown that one way to enable cooperative outcomes in the one-shot Prisoner's Dilemma is to make the agents mutually transparent to each other, i.e., to allow them to access one another's source code (Rubinstein, 1998; Tennenholtz, 2004), or their weights in the case of ML agents. However, full transparency is often unrealistic, whereas partial transparency is commonplace. Moreover, it is challenging for agents to learn their way to cooperation in the full transparency setting. In this paper, we introduce a more realistic setting in which agents only observe a single number indicating how similar they are to each other. We prove that this allows for the same set of cooperative outcomes as the full transparency setting. We also demonstrate experimentally that cooperation can be learned using simple ML methods.

1. INTRODUCTION

As AI systems start to autonomously interact with the world, they will also increasingly interact with each other. We already see this in contexts such as trading agents (CFTC & SEC, 2010), but the number of domains where separate AI agents interact with each other in the world is sure to grow; for example, consider autonomous vehicles. In the language of game theory, AI systems will play general-sum games with each other. For example, autonomous vehicles may find themselves in Game-of-Chicken-like dynamics with each other (cf. Fox et al., 2018). In many of these interactions, cooperative or even peaceful outcomes are not a given. For example, standard game theory famously predicts and recommends defecting in the one-shot Prisoner's Dilemma. Even when cooperative equilibria exist, there are typically many equilibria, including uncooperative and asymmetric ones. For instance, in the infinitely repeated Prisoner's Dilemma, mutual cooperation is played in some equilibria, but so is mutual defection, and so is the strategy profile in which one player cooperates 70% of the time while the other cooperates 100% of the time. Moreover, the strategies from different equilibria typically do not cooperate with each other. A recent line of work at the intersection of AI/(multi-agent) ML and game theory aims to increase AI/ML systems' ability to cooperate with each other (Stastny et al., 2021; Dafoe et al., 2021; Conitzer & Oesterheld, 2022).

Prior work has proposed to make AI agents mutually transparent to allow for cooperation in equilibrium (McAfee 1984; Howard 1988; Rubinstein 1998, Section 10.4; Tennenholtz 2004; Barasz et al. 2014; Critch 2019; Oesterheld 2019b). Roughly, this literature considers for any given 2-player normal-form game Γ the following program meta game: both players submit a computer program, e.g., some neural net, to choose actions in Γ on their behalf. The computer program then receives as input the computer program submitted by the other player.
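To make the program meta game concrete, the following is a minimal sketch in which submitted programs are Python source strings. The interpreter, the injected `MY_SOURCE` variable, and the "cooperate iff the opponent's source equals my own" strategy are illustrative assumptions of this sketch, not the formal definitions from the literature.

```python
# Minimal sketch of a program meta game for the Prisoner's Dilemma.
# Programs are source strings that must define act(opponent_source) -> "C"/"D".
# The injected variable MY_SOURCE is an assumption of this sketch.

def play(source_1: str, source_2: str) -> tuple:
    """Run both submitted programs, each receiving the other's full source."""
    def run(own: str, other: str) -> str:
        namespace = {"MY_SOURCE": own}
        exec(own, namespace)
        return namespace["act"](other)

    return run(source_1, source_2), run(source_2, source_1)

# Cooperates exactly with syntactically identical programs (cf. Tennenholtz, 2004).
CLIQUE_BOT = """
def act(opponent_source):
    return "C" if opponent_source == MY_SOURCE else "D"
"""

# Unconditional defector.
DEFECT_BOT = """
def act(opponent_source):
    return "D"
"""
```

Two copies of `CLIQUE_BOT` cooperate with each other, while against `DEFECT_BOT` the syntactic-equality check triggers defection, which is what makes cooperating safe. Note, however, that each program must read the opponent's entire source to make its decision.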
Prior work has shown that the program meta game has cooperative equilibria in the Prisoner's Dilemma. Unfortunately, there are multiple obstacles to cooperation based on full mutual transparency. 1) While partial transparency is the norm, settings of full transparency are rare. For example, while GPT-3's architecture and training regime are public knowledge, the exact learned model is not. 2) Games played with full transparency in general have many equilibria, including ones that are much worse for some or all players than the Nash equilibria of the underlying game (see the folk theorems given by Rubinstein 1998, Section 10.4, and Tennenholtz 2004). In particular, full mutual transparency can make the problem of equilibrium selection harder. 3) The full transparency setting poses challenges to modern ML methods. In particular, it requires at least one of the models to receive as input a model that has at least as many parameters as itself. Meanwhile, most modern successes of ML use models that are orders of magnitude larger than their inputs. Consequently, we are not aware of successful projects on learning general-purpose models such as neural nets in the full transparency setting.

Contributions. In this paper we introduce a novel variant of program meta games called difference (diff) meta games that enables cooperation in equilibrium while also addressing Obstacles 1-3. As in the program meta game, we imagine that two players each submit a program or policy to instruct an agent to play a given game, such as the Prisoner's Dilemma. The main idea is that before choosing an action, the agents are given information about how similar the two players' policies are to each other w.r.t. how they make the present decision. We formally introduce this setup in Section 3. For an informal illustration, see Figure 1a.
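One round of such a setup can be sketched in a few lines. The particular choices below (policies map a perceived difference to a cooperation probability; the diff function measures the worst-case gap between the two policies on a grid of inputs and adds independent uniform noise per player) are illustrative assumptions, not the paper's formal Definition 1.

```python
import random

def diff(policy_1, policy_2):
    """Perceived differences (one per player): worst-case gap between the
    policies' cooperation probabilities, plus independent Uniform[0, 1] noise.
    This particular diff function is an assumption of the sketch."""
    grid = [i / 10 for i in range(11)]
    base = max(abs(policy_1(d) - policy_2(d)) for d in grid)
    return base + random.uniform(0.0, 1.0), base + random.uniform(0.0, 1.0)

def play(policy_1, policy_2):
    """One round: each policy sees only a single real number, not the opponent."""
    d1, d2 = diff(policy_1, policy_2)
    a1 = "C" if random.random() < policy_1(d1) else "D"
    a2 = "C" if random.random() < policy_2(d2) else "D"
    return a1, a2

# A threshold policy: cooperate iff the perceived difference is at most theta.
def threshold(theta):
    return lambda d: 1.0 if d <= theta else 0.0
```

Two copies of `threshold(1.0)` always see a base difference of 0, so the noisy perceived difference never exceeds their threshold and they cooperate with probability 1, even though neither policy is ever handed more than one number about its opponent.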
Because it requires a much lower degree of mutual transparency, we find the diff meta game setup more realistic than the full mutual transparency setting. Thus, it addresses Obstacle 1 to cooperation based on full mutual transparency. Diff meta games can still have cooperative equilibria when the underlying base game does not. Specifically, in Prisoner's Dilemma-like games, there are equilibria in which both players submit policies that cooperate with similar policies and thus with each other. We call this phenomenon similarity-based cooperation (SBC). For example, consider the Prisoner's Dilemma as given in Table 1 for G = 3. (We study such examples in more detail in Section 3.) Imagine that the players can only submit threshold policies that cooperate if and only if the perceived difference to the opponent is at most θi. As a measure of difference, the policies observe diff(θ1, θ2) = |θ1 − θ2| + N, where N is sampled independently for each player according to the uniform distribution over [0, 1]. For instance, if Player 1 submits a threshold of 1/2 and Player 2 submits a threshold of 3/4, then the perceived difference is 1/4 + N. Hence, Player 1 cooperates with probability P(1/4 + N ≤ 1/2) = 1/4 and Player 2 cooperates with probability P(1/4 + N ≤ 3/4) = 1/2. It turns out that (θ1 = 1, θ2 = 1), which leads to mutual cooperation with probability 1, is a Nash equilibrium of the meta game. Intuitively, the only way for either player to defect more is to lower their threshold. But then |θ1 − θ2| will increase, which will cause the opponent to defect more (at a rate of 1/2). This outweighs the benefit of defecting more oneself. In Section 4, we prove a folk theorem for diff meta games. Roughly speaking, this result shows that merely observing a diff value is sufficient for enabling all the cooperative outcomes that full mutual transparency enables.
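The cooperation probabilities in this example have a simple closed form: player i cooperates iff |θ1 − θ2| + N ≤ θi with N ~ Uniform[0, 1], so the probability is the threshold minus the gap, clamped to [0, 1]. A short sketch checking the numbers from the example:

```python
def coop_prob(theta_self: float, theta_other: float) -> float:
    """P(|theta_self - theta_other| + N <= theta_self) for N ~ Uniform[0, 1],
    i.e. the player's own threshold minus the gap, clamped to [0, 1]."""
    gap = abs(theta_self - theta_other)
    return min(max(theta_self - gap, 0.0), 1.0)

# The example from the text: thresholds 1/2 and 3/4.
print(coop_prob(0.5, 0.75))   # Player 1 cooperates with probability 1/4
print(coop_prob(0.75, 0.5))   # Player 2 cooperates with probability 1/2
print(coop_prob(1.0, 1.0))    # at (1, 1), mutual cooperation with probability 1
```

Consistent with the intuition above, lowering θ1 from 1 to 1 − ε reduces one's own cooperation probability by 2ε (the gap grows while the threshold shrinks), but reduces the opponent's cooperation probability by only ε.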
Specifically, we show that for every individually rational strategy profile σ (i.e., every strategy profile that is better for each player than their minimax payoff), there is a diff function such that σ is played in an equilibrium of the resulting diff meta game. Next, we address Obstacle 2 to full mutual transparency: the multiplicity of equilibria. First, note that any given measure of similarity will typically only enable a specific set of equilibria, much smaller than the set of individually rational strategy profiles. For instance, in the above example, all equilibria are symmetric. In general, one would hope that similarity-based cooperation will




Figure 1: (a) Illustration of the diff meta game of a Prisoner's Dilemma. (b) A graphical representation of diff meta games (Definition 1). Nodes with two incoming edges are determined by applying one of the parent nodes to the other.

