A RISK-AVERSE EQUILIBRIUM FOR MULTI-AGENT SYSTEMS

Anonymous

Abstract

In multi-agent systems, intelligent agents are tasked with making decisions that lead to optimal outcomes when the actions of the other agents are as expected, whilst also being prepared for their unexpected behaviour. In this work, we introduce a novel risk-averse solution concept that allows the learner to accommodate low-probability actions by finding the strategy with minimum variance, given any level of expected utility. We first prove the existence of such a risk-averse equilibrium, and propose a fictitious-play-type learning algorithm for smaller games that enjoys provable convergence guarantees in game classes including zero-sum and potential games. Furthermore, we propose an approximation method for larger games based on iterative population-based training that generates a population of risk-averse agents. Empirically, our equilibrium is shown to reduce utility variance, specifically in the sense that other agents' low-probability behaviour is better accounted for by our equilibrium than by other solutions. Importantly, we show that our population of agents approximating a risk-averse equilibrium is particularly effective against unseen opposing populations, especially in guaranteeing a minimum level of performance, which is critical to safety-aware multi-agent systems.

1. INTRODUCTION

Game Theory (GT) has become an important analytical tool in solving Machine Learning (ML) problems; the idea of "gamification" has become popular in recent years (Wellman, 2006; Lanctot et al., 2017), particularly in multi-agent systems research. The importance of risk-aversion in the single-agent decision-making literature (Zhang et al., 2020; Mihatsch & Neuneier, 2002; Chow et al., 2017) is well established, whilst many open questions remain in the game theory research domain. This paper aims to add to the multi-agent strategic decision-making literature by developing the notion of risk-aversion through the lens of a new equilibrium concept. One reason that risk-aversion is important is that multi-agent interaction is rife with strategic uncertainty; this is because performance does not depend solely on one's own actions. It is rarely the case that one will have certainty over the execution and the strategy of the opponent, in situations ranging from board games to economic negotiations (Calford, 2020). This presents a dilemma for autonomous decision-makers in human-AI interaction, as one can no longer rely on perfect execution or complete strategy knowledge. Therefore, an important issue is what happens when actors take dangerous, low-probability actions that could be considered mistakes. Such issues can arise in an array of circumstances, from misunderstandings of reward structures to execution fatigue, leading to the execution of an unexpected pure strategy. Hedging against unexpected play is important for agents, as it can otherwise lead to large costs. As demonstrated in Fig. 1, a mistake in the execution of the pure-strategy Nash equilibrium (NE) could lead to both cars overtaking and crashing into each other, a critically negative outcome in a multi-agent system. Traditional equilibrium solutions in GT (e.g.,
NE, Trembling Hand Perfect Equilibrium (THPE) (Bielefeld, 1988)) lack the ability to handle this style of risk because either: 1) they assume strategies are executed perfectly, and/or 2) large costs may be undervalued when only a low probability is attached to them. We address these shortcomings by introducing a new framework for studying risk in multi-agent systems through mean-variance analysis. In our framework, strategies are evaluated not only in terms of expected utility against the opponent, but also in terms of the potential utility variance if the opponent played

low probability pure strategies. For example, the driving example in Fig. 1 describes a simple scenario where, due to the critical nature of wanting to avoid crashing, the benefits of overtaking may be entirely negated by the possibility of low probability play leading to crashes.

                     Stay in Lane    Overtake
    Stay in Lane        5, 5          0, 20
    Overtake           20, 0        -50, -50

Figure 1: Cars are rewarded for reaching their destination. They are behind slow tractors but can stay in their lanes and arrive safely, but slowly. They can overtake to arrive quickly, but if the other also overtakes they will crash, leading to large negative payoffs. The Overtake strategy is high-risk, high-reward and susceptible to errors, and is selected under a Nash equilibrium. The Stay in Lane strategy is low-risk, low-reward with low variance and is selected by our mean-variance RAE approach. (The original figure marks the Risk Averse Equilibrium and the Pure Strategy Nash Equilibrium in the payoff matrix.)

We summarise the contributions of our paper here:
1. We introduce a novel risk-averse equilibrium (RAE) based on the mean-variance components of the available strategies. Our framework generalises the single-agent mean-variance decision framework to multi-agent settings.
2. We show that the RAE always exists in finite games, and that it is solvable in the class of games with the fictitious-play property. This, as we later show, unlocks a powerful array of computational methods for solving games.
3. We demonstrate that: 1) RAE is able to locate a minimum variance solution for any given level of utility; 2) as a by-product, RAE can be used as a Nash equilibrium selection tool in the presence of a "risk-dominant" equilibrium; 3) RAE is able to find a low risk strategy in a safety-sensitive autonomous driving environment.
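To make the mean-variance evaluation in the Fig. 1 game concrete, the following sketch computes the expected utility and utility variance of each row strategy against an opponent who mostly stays in lane. The opponent model (Stay in Lane with probability 0.95, trembling to Overtake with probability 0.05) is our illustrative assumption, not a quantity specified in the paper.

```python
# Illustrative sketch (not the paper's algorithm): mean-variance
# evaluation of the two pure strategies from the Fig. 1 driving game.

def mean_and_variance(payoffs, probs):
    """Expected utility and utility variance of a row strategy
    against a mixed column strategy."""
    mean = sum(p * u for p, u in zip(probs, payoffs))
    var = sum(p * (u - mean) ** 2 for p, u in zip(probs, payoffs))
    return mean, var

# Assumed opponent mix: P(Stay in Lane) = 0.95, P(Overtake) = 0.05.
opponent = [0.95, 0.05]
stay_mean, stay_var = mean_and_variance([5, 0], opponent)       # Stay in Lane row
over_mean, over_var = mean_and_variance([20, -50], opponent)    # Overtake row

print(f"Stay in Lane: mean={stay_mean:.2f}, var={stay_var:.2f}")
print(f"Overtake:     mean={over_mean:.2f}, var={over_var:.2f}")
```

Under this opponent model, Overtake has the higher expected utility, but its utility variance is roughly two orders of magnitude larger than that of Stay in Lane, which is the trade-off the RAE concept is designed to expose.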

2. RELATED WORK

There exist three relevant bodies of work: works that empirically study the presence of risk-aversion in humans, works that aim to develop new equilibrium frameworks, and works that study how risk-averse agents alter classical game-theoretic results. On the empirical side, Camerer & Karjalainen (1994) were the first to show that humans prefer to bet on known-probability devices rather than on the choices of other humans, suggesting strategic uncertainty avoidance. Bohnet & Zeckhauser (2004) similarly found that subjects are more trusting of an objective randomisation device than of other humans. Eichberger et al. (2008) found that more trust is placed in game theorists than in "grannies", as the latter are a source of strategic ambiguity. Similar behaviour is noted in game settings that more closely model multi-agent interactions, especially in the form of ambiguity aversion, such as in games beyond 3-color Ellsberg urn tasks (Kelsey & Le Roux, 2015), in public goods and weakest-link games (Kelsey & Le Roux, 2017), and in the presence of strategic complements and strategic substitutes (Kelsey & le Roux, 2018). For an extensive survey of the experimental evidence, we refer readers to Harrison & Rutström (2008). The equilibrium literature can be divided into three distinct sections. Harsanyi et al. (1988) introduced risk-dominant Nash equilibria (NE) (Nash, 1951), which suggests that increasing levels of strategic ambiguity will lead to the equilibrium with the lowest deviation losses. Risk dominance is limited in that it is restricted to the set of NE strategies; an equilibrium may therefore be risk-dominant in comparison to other NE whilst not being particularly risk-averse at all. Bielefeld (1988) set out the THPE, which deals with strategic risk by accounting for off-equilibrium play. However, THPE is sensitive to strictly dominated strategies and, because all trembles happen with marginal probability, large downside risk can be masked (in Fig. 1, an error probability of 0.01 would only impact the utility function by 0.5, whereas we later show that our variance solution values the error at 47.6γ, where γ > 0 and non-marginal), which is problematic for safety-sensitive systems (e.g., autonomous driving). McKelvey & Palfrey (1995) utilise the Quantal Response Equilibrium (QRE) to introduce errors into strategy selection, but with lower probabilities on big mistakes, which also discounts the impact of large downside risk. We argue that QRE undervalues big costs, which are particularly damaging in real-world settings, whereas
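The masking argument above can be sketched numerically. With a small tremble probability ε, the impact of a large downside on expected utility scales linearly in ε, while its contribution to the variance scales with the squared payoff gap, so variance keeps catastrophic outcomes visible. The payoffs below are from Fig. 1 (playing Overtake against an opponent who intends Stay in Lane); the simple two-outcome tremble model is our assumption for illustration, so the resulting numbers need not match the paper's 0.5 and 47.6γ figures exactly.

```python
# Hedged illustration: a small tremble barely moves the mean but
# dominates the variance, because variance weights the squared gap
# between the intended payoff and the error payoff.

def tremble_impact(u_expected, u_error, eps):
    mean = (1 - eps) * u_expected + eps * u_error
    utility_shift = u_expected - mean  # how far the tremble moves the mean
    variance = ((1 - eps) * (u_expected - mean) ** 2
                + eps * (u_error - mean) ** 2)
    return utility_shift, variance

# Overtake vs. an opponent who intends Stay in Lane but errs w.p. 0.01.
shift, var = tremble_impact(20.0, -50.0, 0.01)
print(f"mean shift = {shift:.2f}, variance = {var:.2f}")
```

Here a 1% error moves the expected utility by well under one unit, while the variance is on the order of the squared 70-point payoff gap times ε, illustrating why a variance-based criterion does not mask the downside.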

