FEINT IN MULTI-PLAYER GAMES

Abstract

This paper introduces the first formalization, implementation, and quantitative evaluation of Feint in Multi-Player Games. Our work first formalizes Feint from the perspective of Multi-Player Games, in terms of temporal impacts, spatial impacts, and their collective effects. The formalization is built upon the Non-transitive Active Markov Game Model, where Feint can have considerable impact. Our work then considers the practical implementation of Feint in Multi-Player Games, under the state-of-the-art approach to multi-agent modeling to date (namely Multi-Agent Reinforcement Learning). Finally, our work quantitatively examines the effectiveness of our design, and the results show that our design of Feint can (1) greatly improve the reward gains from the game; (2) significantly improve the diversity of Multi-Player Games; and (3) incur only negligible overheads in terms of time consumption. We conclude that our design of Feint is effective and practical in making Multi-Player Games more interesting.

1. INTRODUCTION

Game simulations that only use the Markov Game Model (Filar (1976)) or its variants (Wampler et al. (2010); Kim et al. (2022)) breed the need for diversity and randomness to improve the game experience. The trend of incorporating more details into simulated games demands: ➊ non-transitivity (i.e. no dominant gaming strategies), which allows players to dynamically change game strategies; in this way, newly-incorporated strategies can maintain a high level of diversity, which guarantees a high degree of unexploitability (Liu et al. (2021)); and ➋ strict requirements on temporal impacts (and their implications for spatial and collective impacts), since modern game simulations are highly time-sensitive (Nota & Thomas (2020)). Therefore, new optimizations on these game models are expected to be elegant and easy to implement, to preserve the original spirit of these games. Our work first builds upon representative examples of the above two trends, by unifying two pieces of state-of-the-art progress in Multi-Player Games: ➊ we use Unified Behavioral and Response Diversity (described in Liu et al. (2021)), which exploits non-transitivity (i.e. no single dominant strategy in many complex games) to highlight the importance of diversity in game policies; moreover, we address the issue that their work fails to consider the intensity and future impacts of complex interactions among agents; and ➋ we incorporate Long-Term Behavior Learning (described in Kim et al. (2022)), which proposes the Active Markov Game Model to emphasize the convoluted future impacts of complex interactions among agents. Based on the above two results, we unify them into a new model called the Non-transitive Active Markov Game Model (NTAMGM), and use it throughout this work. This unification satisfies the need for a game model where (A) agents have intense and time-critical interactions; and (B) the design space of game policies is highly diverse.
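As a concrete illustration of non-transitivity, the sketch below checks a Rock-Paper-Scissors payoff matrix for a dominant strategy; the matrix and the helper function are illustrative assumptions for exposition, not part of the paper's implementation:

```python
import numpy as np

# Rock-Paper-Scissors payoff matrix for the row player:
# M[i][j] is the row player's payoff when playing strategy i
# against strategy j. The cyclic win/lose structure is the
# canonical example of a non-transitive game.
M = np.array([
    [ 0, -1,  1],   # Rock:     ties Rock, loses to Paper, beats Scissors
    [ 1,  0, -1],   # Paper:    beats Rock, ties Paper, loses to Scissors
    [-1,  1,  0],   # Scissors: loses to Rock, beats Paper, ties Scissors
])

def has_dominant_strategy(payoff):
    """Return True if some row weakly dominates every other row."""
    n = payoff.shape[0]
    for i in range(n):
        if all(np.all(payoff[i] >= payoff[j]) for j in range(n) if j != i):
            return True
    return False

print(has_dominant_strategy(M))  # False: no single strategy dominates
```

Because no pure strategy dominates, a player must keep mixing (and switching) strategies, which is exactly the property that makes policy diversity valuable.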
The definition of NTAMGM is described below.

• Non-transitive Active Markov Game Model: We define a K-agent Non-transitive Active Markov Game Model as a tuple ⟨K, S, A, P, R, Θ, U⟩: K = {1, ..., k} is the set of k agents; S is the state space; A = {A_i}_{i=1}^{K} is the set of action spaces for each agent, where there are no dominant actions; P performs state transitions from the current state based on agents' actions: P : S × A_1 × A_2 × ... × A_K → P(S), where P(S) denotes the set of probability distributions over the state space S; R = {R_i}_{i=1}^{K} is the set of reward functions for each agent; Θ = {Θ_i}_{i=1}^{K} is the set of policy parameters for each agent; and U = {U_i}_{i=1}^{K} is the set of policy update functions for each agent.

Prior works incorporate Feint only in Two-Player Games (Wampler et al. (2010); Won et al. (2021a)), and our work begins by addressing the limitations of the version derived from these works (denoted as the basic formalization of Feint). We find that the basic formalization of Feint overlooks the complexity of potential impacts in Multi-Player Games, and therefore cannot be generalized to Multi-Player Games. To this end, we deliver the first comprehensive formalization of Feint, by separating the complex impacts into ➊ the temporal dimension; ➋ the spatial dimension; and ➌ the collective impacts of these two dimensions. We also show how the above components of our formalization can be synergistically put together. Based on the proposed formalization, we lay out the implementation roadmap, under both Inference Learning and Reinforcement Learning models, to justify the applicability of our proposed formalization. To properly examine the benefits of our method, we first build two complex scenarios, using Multi-Agent Deep Deterministic Policy Gradient (MADDPG; Lowe et al. (2017)) and Multi-Agent Actor-attention Critic (MAAC; Iqbal & Sha (2019)), with six agents in total. Then, we implement our formalization upon these two extensively-engineered scenarios.
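The tuple ⟨K, S, A, P, R, Θ, U⟩ above can be sketched as a plain data structure; the concrete types below (ints for states and actions, Python callables for P, R, and U) are illustrative assumptions chosen only to make the components explicit, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int    # placeholder state representation
Action = int   # placeholder action representation

@dataclass
class NTAMGM:
    agents: List[int]                               # K = {1, ..., k}
    states: List[State]                             # S
    actions: Dict[int, List[Action]]                # A = {A_i}, no dominant actions
    transition: Callable[..., Dict[State, float]]   # P: S x A_1 x ... x A_K -> P(S)
    rewards: Dict[int, Callable[..., float]]        # R = {R_i}
    policy_params: Dict[int, list]                  # Theta = {Theta_i}
    policy_updates: Dict[int, Callable[..., list]]  # U = {U_i}

# A toy two-agent instance: the transition returns a probability
# distribution over next states, as P(S) requires.
game = NTAMGM(
    agents=[1, 2],
    states=[0, 1],
    actions={1: [0, 1], 2: [0, 1]},
    transition=lambda s, a1, a2: {0: 0.5, 1: 0.5},
    rewards={1: lambda s, a1, a2: 1.0, 2: lambda s, a1, a2: -1.0},
    policy_params={1: [0.0], 2: [0.0]},
    policy_updates={1: lambda p: p, 2: lambda p: p},
)
```

Note that the transition function takes one action per agent, mirroring the joint signature S × A_1 × ... × A_K → P(S).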
Our quantitative evaluations show that our formalization and implementations have great potential in practice. We first show that our work can make the game more interesting, via the following two metrics: for Diversity Gain, our method increases the exploitation of the search space by 1.98X, measured by the Exploitability metric; and for Gaming Reward Gain, our method achieves 1.90X and 2.86X gains when using MADDPG and MAAC respectively. We then show that our method incurs only negligible overheads, using per-episode execution time as the metric: our method adds less than 5% to the time consumption. We conclude that our design of Feint is effective and practical in making Multi-Player Games more interesting.
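For reference, the Exploitability metric used above can be illustrated in minimal form for a symmetric zero-sum matrix game; this is the standard game-theoretic definition (best-response payoff gain against a mixed strategy), a hypothetical sketch rather than the paper's evaluation code:

```python
import numpy as np

def exploitability(payoff, sigma):
    """Exploitability of mixed strategy sigma in a symmetric
    zero-sum matrix game: how much a best response gains over
    the value sigma achieves against itself (0 at equilibrium)."""
    br_value = np.max(payoff @ sigma)   # best pure response vs sigma
    own_value = sigma @ payoff @ sigma  # sigma's value vs itself
    return br_value - own_value

# Rock-Paper-Scissors: the uniform mixture is unexploitable,
# while a pure strategy is fully exploitable.
RPS = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(exploitability(RPS, np.ones(3) / 3))        # ~0: equilibrium
print(exploitability(RPS, np.array([1., 0., 0.])))  # 1: pure Rock is beaten
```

A lower exploitability means opponents gain less by best-responding, which is why it serves as a proxy for how well the policy space has been covered.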

2.1. EXISTING MARL MODELS

Multi-Agent Reinforcement Learning (MARL) aims to learn optimal policies for agents in a multi-agent environment, which consists of various agent-agent and agent-environment interactions.
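A minimal sketch of such agent-agent and agent-environment interactions is a joint step loop, where every agent acts on the shared state and the environment returns per-agent rewards; all names below (the toy dynamics, `step`, `rollout`) are illustrative assumptions, not a specific MARL framework API:

```python
import random

def step(state, actions):
    # Toy joint transition: the next state depends on every agent's
    # action (agent-agent interaction) and on the current state
    # (agent-environment interaction); rewards are per-agent.
    next_state = (state + sum(actions)) % 5
    rewards = [1.0 if a == next_state % 2 else 0.0 for a in actions]
    return next_state, rewards

def rollout(policies, episodes=1, horizon=20, seed=0):
    """Run episodes and accumulate each agent's return."""
    rng = random.Random(seed)
    returns = [0.0] * len(policies)
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            actions = [pi(state, rng) for pi in policies]
            state, rewards = step(state, actions)
            returns = [g + r for g, r in zip(returns, rewards)]
    return returns

# Three agents with uniform-random policies over two actions.
uniform = lambda s, rng: rng.choice([0, 1])
print(rollout([uniform, uniform, uniform]))
```

Actual MARL algorithms (e.g. MADDPG or MAAC, used later in this work) replace the uniform policies with learned actor networks and use the collected rewards to update them.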



Based on the above assumption of Multi-Player Games, our goal is to incorporate Feint, a set of actions to mislead opponents for strategic advantages, into Multi-Player Games. Prior works incorporate Feint only in the context of Two-Player Games (e.g. Wampler et al. (2010); Won et al. (2021a)).

Only a limited number of works tackle this issue. Wampler et al. (2010) is an early example that incorporates Feint as a proof-of-concept, focusing on constructing animations for nuanced game strategies to obtain more unpredictability from NPCs. More recently, Won et al. (2021a) uses a set of pre-defined Feint actions for animation, which further serves an optimized control strategy based on Online Reinforcement Learning (i.e. for animating combat scenes). However, these prior works (1) solely focus on Two-Player Games and cannot be effectively generalized to multi-player scenarios; and (2) lack a comprehensive exploration of the potential implications of Feint actions in game strategies.

