D3C: REDUCING THE PRICE OF ANARCHY IN MULTI-AGENT LEARNING

Abstract

Even in simple multi-agent systems, fixed incentives can lead to outcomes that are poor for the group and each individual agent. We propose a method, D3C, for online adjustment of agent incentives that reduces the loss incurred at a Nash equilibrium. Agents adjust their incentives by learning to mix their incentive with that of other agents, until a compromise is reached in a distributed fashion. We show that D3C improves outcomes for each agent and the group as a whole on several social dilemmas including a traffic network with Braess's paradox, a prisoner's dilemma, and several reinforcement learning domains.

1. INTRODUCTION

We consider a setting composed of multiple interacting artificially intelligent agents. These agents may be instantiated by humans, corporations, or machines with specific individual incentives. However, it is well known that interactions between individual agent goals can lead to inefficiencies at the group level, for example, in environments exhibiting social dilemmas (Braess, 1968; Hardin, 1968; Leibo et al., 2017). In order to resolve these inefficiencies, agents must reach a compromise.

Any arbitration mechanism that leverages a central coordinator¹ faces challenges when attempting to scale to large populations. The coordinator's task becomes intractable as it must both query preferences from a larger population and make a decision accounting for the exponential growth of agent interactions. If agents or their designers are permitted to modify their incentives over time, the principal must collect all this information again, exacerbating the computational burden. Moreover, a central coordinator represents a single point of failure for the system, whereas one motivation for multi-agent systems research inspired by nature (e.g., humans, ants, the body) is robustness to node failures (Edelman and Gally, 2001). Therefore, we focus on decentralized approaches.

A trivial form of decentralized compromise is to require every agent to minimize the group loss (maximize welfare). Leaving the optimization problem aside, this removes inefficiency, but, like a mechanism with a central coordinator, it requires communicating all goals between all agents. This communication step is expensive, with real consequences for existing distributed systems such as wireless sensor networks (Kulkarni et al., 2010), where transmitting a signal saps a node's energy budget. There is also the obvious issue that this compromise may not appeal to an individual agent, especially one that is expected to trade its low-loss state for a higher average group loss.
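To make the inefficiency concrete, the following sketch (not from the paper; payoffs are illustrative) enumerates the pure Nash equilibria of a two-player prisoner's dilemma expressed in losses and compares the worst equilibrium's group loss to the group optimum. The ratio is the price of anarchy that D3C aims to reduce.

```python
from itertools import product

# Hypothetical loss tables for a prisoner's dilemma (lower is better).
# Actions: 0 = cooperate, 1 = defect. Entry: (loss of player 1, loss of player 2).
LOSS = {
    (0, 0): (1, 1),  # mutual cooperation
    (0, 1): (3, 0),  # player 1 exploited
    (1, 0): (0, 3),  # player 2 exploited
    (1, 1): (2, 2),  # mutual defection
}

def is_nash(profile):
    """A profile is a pure Nash equilibrium if no player can lower
    its own loss by unilaterally switching actions."""
    for player in (0, 1):
        for alt in (0, 1):
            deviation = list(profile)
            deviation[player] = alt
            if LOSS[tuple(deviation)][player] < LOSS[profile][player]:
                return False
    return True

nash = [p for p in product((0, 1), repeat=2) if is_nash(p)]
worst_nash = max(sum(LOSS[p]) for p in nash)       # worst equilibrium group loss
optimum = min(sum(losses) for losses in LOSS.values())  # group-optimal loss
print(nash)                  # [(1, 1)]: mutual defection is the only equilibrium
print(worst_nash / optimum)  # price of anarchy = 4 / 2 = 2.0
```

Here individually rational play locks both agents into mutual defection, doubling the group loss relative to mutual cooperation, even though each agent would also fare better under the cooperative outcome.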
One additional, more subtle consequence of optimizing group loss is that it cannot distinguish between behaviors in environments where the group loss is constant sum, for instance, in zero-sum games. But zero-sum games have rich structure to which we would like agents to respond. Electing a team leader (or voting on a decision) implies one candidate (decision) wins while another loses. Imagine two agents that differ on a binary preference, each trying to minimize its probability of losing. A group loss is indifferent between outcomes; we prefer that the agents play the game (and, in this case, argue their points).

Design Criteria: We seek an approach to compromise in multi-agent systems that applies to the setting just described. The celebrated Myerson-Satterthwaite theorem (Arrow, 1970; Satterthwaite, 1975; Green and Laffont, 1977; Myerson and Satterthwaite, 1983) states that no mechanism exists that simultaneously achieves optimal efficiency (welfare-maximizing behavior), budget balance (no taxing agents and burning side-payments), appeals to rational individuals (individuals want to opt in to the mechanism), and is incentive compatible (resulting behavior is a Nash equilibrium). Given



¹ For example, the VCG mechanism (Clarke, 1971).

