PRINCIPAL TRADE-OFF ANALYSIS

Abstract

The focus on equilibrium solutions in games underemphasizes the importance of understanding their overall structure. A different set of tools is needed for learning and representing the general structure of a game. In this paper we illustrate "Principal Trade-off Analysis" (PTA), a decomposition method that embeds games into a low-dimensional feature space, and argue that the embeddings are more revealing than previously demonstrated. Here, we develop an analogy to Principal Component Analysis (PCA). PTA represents an arbitrary two-player zero-sum game as the weighted sum of pairs of orthogonal 2D feature planes. We show that each of the feature planes represents a unique strategic trade-off (cyclic mode) and that truncation of the sequence provides insightful model reduction. We demonstrate the validity of PTA on a pair of games (Blotto, Pokemon). In Blotto, PTA identifies game symmetries and specifies strategic trade-offs associated with distinct win conditions. These symmetries reveal limitations of PTA unaddressed in previous work. For Pokemon, PTA recovers clusters that naturally correspond to Pokemon types, correctly identifies the designed trade-off between those types, and discovers a rock-paper-scissors (RPS) cycle in the Pokemon generation type, all absent any specific information except game outcomes.

1. INTRODUCTION

In recent years, algorithms have achieved superhuman performance in a number of complex games such as Chess, Go, Shogi, Poker and Starcraft (Silver et al., 2018; Heinrich & Silver, 2016; Moravčík et al., 2017; Vinyals et al., 2019). Despite impressive game play, enhanced understanding of the game is typically only achieved by additional analysis of the algorithm's game play post facto (Silver, 2018). Current work emphasizes the "policy problem", developing strong agents, despite growing demand for a task theory which addresses the "problem problem", i.e. which games are worth study and play (Omidshafiei et al., 2020; Clune, 2019).

A task theory requires a language that characterizes and categorizes games, namely, a toolset of measures and visualization techniques that evaluate and illustrate game structure. Summary visuals and measures are especially important for complex games where direct analysis is intractable. In this vein, tournaments are used to sample the game and to empirically evaluate agents. The empirical analysis of tournaments has a long history in sports analytics (Lewis, 2004; Bozóki et al., 2016), ecology and animal behavior (Laird & Schamp, 2006; Silk, 1999), and biology (Stuart-Fox et al., 2006; Sinervo & Lively, 1996). While the primary interest in these cases is typically in ranking agents/players, tournament graphs also reveal significant information about the nature of the game being played (Tuyls et al., 2018).

This paper describes mathematical techniques for extracting useful information about the underlying game structure directly from tournament data. While these methods can be applied to the various contexts in which tournaments are already employed in machine learning (e.g., population-based training), they open up a range of new research questions regarding the characterization of natural games, synthesis of artificial games (cf. Omidshafiei et al. (2020)), game approximation via simplified dynamics, and the strategic perturbation of games.

Fine structural characteristics of a tournament graph can be represented by low-dimensional embeddings that map competitive relationships to embedded geometry. We review and expand on methods introduced by Balduzzi et al. (2018b), who proposed a canonical series of maps that provide a complete description of a sample tournament in terms of a sum of simple games, namely, disc games. PTA provides a simplified global understanding of a tournament compatible with a broad set of objectives beyond finding equilibrium solutions.

Note that our objectives are as empirical as they are game theoretic. Empirical game theory, the study of games from actual game play data (e.g. sports analytics), studies games as they are played by a particular population, rather than by an idealized player. Thus, empirical game theory has its own valid objectives beyond finding equilibria or optimal players. Exclusive focus on optima ignores the global structure of a game as it is experienced by the majority of players. What decision dilemmas do they face? What game dynamics do they experience? What game space must they navigate in the process of optimization? How should they exploit a chosen opponent or population, or form teams? All of these questions are more easily addressed given a simplified global representation that isolates each important independent aspect of a game. PTA offers such a summary.

Our contributions are as follows. First, we compare PCA (Pearson, 1901) to disc game embedding, and show that disc game embeddings inherit key algebraic properties responsible for the success of PCA. Based on this analogy, we propose PTA as a general technique for visualizing data arising from competitive tasks or pairwise choice tasks. Indeed, while we focus on games for their charisma, any data set representing a skew-symmetric comparison of objects is amenable to PTA.
Second, via a series of examples, we show that PTA provides a much richer framework for analyzing trade-offs in games than previously demonstrated. Our examples exhibit a wide variety of strategic trade-offs that can be clearly visualized with PTA. Unlike previous work, we focus on the relation between embedding coordinates, which represent performance relations, and underlying agent attributes in order to elucidate the principal trade-offs responsible for cyclic competition in each game. Moreover, we consider the full information content of PTA by analyzing multiple leading disc games and by studying the decay in their importance. Important strategic trade-offs can arise in later disc games, so the focus of previous empirical work on the leading disc game alone is myopic. These examples also raise conceptual limitations not addressed in previous work, and thus outline future directions for development.

2. RELATED WORK

Our work builds directly on Balduzzi et al. (2018b), which used the embedding approach to introduce a comprehensive agent evaluation scheme. Their scheme uses the real Schur form (PTA) in conjunction with the Hodge decomposition to overcome deficiencies in standard ranking models. Our work also complements efforts to explore cyclic structures in competitive systems (Candogan et al., 2011; Strang et al., 2022b), in economics (Linares, 2009; May, 1954), and, tangentially, in multiclass classification problems (Bilmes et al., 2001; Huang et al., 2006). Cycles challenge traditional gradient methods and can slow training (Omidshafiei et al., 2020; Balduzzi et al., 2018a). Moreover, cyclic structures in games are often intricate and difficult to disentangle, particularly among intermediate competitors. Games of skill frequently exhibit this "spinning top geometry" (Czarnecki et al., 2020). By summarizing cyclic structures, PTA helps identify areas of the strategy space that cause difficulty during training, or that should be targeted for diverse team design (Balduzzi et al., 2019; Garnelo et al., 2021). Here, we show that PTA can identify fundamental trade-offs that summarize otherwise opaque cyclic structure. Trade-offs play an important role in decision tasks and evolutionary processes outside of games, so general tools that isolate and reify trade-offs are of generic utility (Omidshafiei et al., 2020; Tuyls et al., 2018). In that sense, our attempt to visualize game structure is in line with generic data visualization efforts, which aim to convert complicated data into elucidating graphics (cf. Healy (2018); Garnelo et al. (2021)).
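
To make the decomposition referred to above concrete, the following is a minimal sketch, not the implementation of Balduzzi et al. (2018b), of how a skew-symmetric payoff matrix might be split into a weighted sum of disc games. The function names `principal_tradeoffs` and `reconstruct`, the tolerance `tol`, and the rock-paper-scissors matrix are illustrative assumptions. Instead of calling a Schur routine, the sketch uses an eigendecomposition of the Hermitian matrix iF, which for a skew-symmetric F should yield the same 2D planes whenever the nonzero eigenvalues are distinct. Each disc game is evaluated as a 2D cross product of embedding coordinates, under the sign convention chosen here.

```python
import numpy as np


def principal_tradeoffs(F, tol=1e-10):
    """Split a skew-symmetric matrix F into disc games (illustrative sketch).

    Returns (omegas, embeddings): omegas[k] is the weight of disc game k
    (sorted in decreasing order) and embeddings[k, i] holds the 2D
    coordinates of agent i in that disc game.
    """
    F = np.asarray(F, dtype=float)
    if not np.allclose(F, -F.T):
        raise ValueError("F must be skew-symmetric")

    # 1j * F is Hermitian, so eigh applies; its eigenvalues are real and
    # come in (+omega, -omega) pairs, plus zeros.
    evals, evecs = np.linalg.eigh(1j * F)

    # Keep one eigenvector per pair (the positive eigenvalue), largest first.
    keep = evals > tol
    order = np.argsort(-evals[keep])
    omegas = evals[keep][order]
    vecs = evecs[:, keep][:, order]

    # For v = a + i b with (iF) v = omega v we get F a = omega b and
    # F b = -omega a; sqrt(2) * a and sqrt(2) * b are orthonormal.
    u = np.sqrt(2.0) * vecs.real
    w = np.sqrt(2.0) * vecs.imag

    # Scale by sqrt(omega) so that the cross product carries the weight.
    coords = np.stack([w.T, u.T], axis=-1)  # shape (K, n, 2)
    embeddings = np.sqrt(omegas)[:, None, None] * coords
    return omegas, embeddings


def reconstruct(embeddings, rank=None):
    """Sum the first `rank` disc games (all of them if rank is None)."""
    Y = embeddings[:rank]
    x, y = Y[..., 0], Y[..., 1]
    # Entry (i, j) of disc game k is the 2D cross product y_k(i) x y_k(j).
    return np.einsum("ki,kj->ij", x, y) - np.einsum("ki,kj->ij", y, x)


# Rock-paper-scissors payoff matrix: entry (i, j) > 0 means i beats j.
F_rps = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])
omegas_rps, emb_rps = principal_tradeoffs(F_rps)
F_rps_hat = reconstruct(emb_rps)
```

Passing a smaller `rank` to `reconstruct` keeps only the leading disc games, which is one way to examine how much of the payoff matrix each successive trade-off accounts for.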

3.1. FUNCTIONAL FORM GAMES

A two-player zero-sum functional form game is defined by an attribute space Ω ⊆ R^T and an evaluation function f that returns the advantage of one agent over another given their attributes. Agents in the game can be represented by their attribute vectors x, y ∈ Ω, the entries of which could represent agent traits, strategic policies, weights in a neural net governing their actions, or, more generally, any attributes that influence competitive behavior. The function f is of the form f : Ω × Ω → R. The value f(x, y) quantifies the advantage of agent x over agent y with a real number. The evaluation function must be fair, that is, the advantage of one competitor over another should not depend on the order in which they are listed. Consequently, f must be skew-symmetric, f(x, y) = -f(y, x) (Strang et al., 2022b). If f(x, y) > 0 we say that x beats y and the outcome is a tie if

