MULTI-AGENT MULTI-GAME ENTITY TRANSFORMER

Abstract

Building large-scale generalist pre-trained models for many tasks is emerging as a promising direction in reinforcement learning (RL). Works such as Gato and the Multi-Game Decision Transformer have demonstrated outstanding performance and generalization capabilities across many games and domains. However, there remains a research gap in developing highly capable generalist models for multi-agent RL (MARL), which could substantially accelerate progress towards general AI. To fill this gap, we propose the Multi-Agent multi-Game ENtity TrAnsformer (MAGENTA), which takes an entity perspective orthogonal to previous time-sequential modeling. Specifically, to handle the different state/observation spaces of different games, we draw an analogy between games and languages, treating each game as a distinct language: we train a separate "tokenizer" for each game and a single transformer shared across games. The input features are split by entity and tokenized into the same continuous space. Two types of transformer-based models are then proposed as permutation-invariant architectures that handle varying numbers of entities and capture attention over different entities. MAGENTA is trained on Honor of Kings, StarCraft II micromanagement, and Neural MMO with a single set of transformer weights. Extensive experiments show that MAGENTA can play games across various categories with arbitrary numbers of agents and improves fine-tuning efficiency on new games and scenarios by 50%-100%. See our project page at https://sites.google.com/view/rl-magenta.
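The pipeline the abstract describes can be made concrete with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the linear per-game tokenizers, and the single-head self-attention layer are illustrative assumptions. The sketch shows the two key properties: game-specific tokenizers project differently-shaped entity features into one shared token space processed by shared weights, and attention followed by pooling is invariant to entity order and entity count.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # shared token dimension (illustrative)

# Per-game "tokenizers": linear maps from each game's per-entity
# feature size into the shared D-dimensional token space.
tokenizers = {
    "HoK":  rng.standard_normal((16, D)),   # assume 16 features per entity
    "SMAC": rng.standard_normal((10, D)),   # assume 10 features per entity
}

# Shared transformer weights: one single-head self-attention layer,
# reused across all games.
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode(game, entity_feats):
    """Tokenize a game's entities, attend over them, and mean-pool.

    entity_feats has shape (num_entities, game_feature_dim); the entity
    count may vary freely. Self-attention is permutation-equivariant,
    so mean-pooling its output makes the result permutation-invariant.
    """
    tokens = entity_feats @ tokenizers[game]          # (N, D)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))              # (N, N) attention
    return (attn @ v).mean(axis=0)                    # (D,) pooled code

# Different games, different entity counts, one shared set of weights.
obs_hok = rng.standard_normal((5, 16))    # 5 entities in an HoK frame
obs_smac = rng.standard_normal((9, 10))   # 9 entities in a SMAC frame
z_hok = encode("HoK", obs_hok)
z_smac = encode("SMAC", obs_smac)

# Permutation invariance: shuffling entity order leaves the code unchanged.
perm = rng.permutation(5)
assert np.allclose(encode("HoK", obs_hok[perm]), z_hok)
```

Treating the observation as a *set* of entity tokens, rather than a fixed-length vector or a time sequence, is what lets one set of transformer weights serve games with arbitrary and varying numbers of agents.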

1. INTRODUCTION

In recent years, transformer-based models, as a route to building large-scale generalist models, have made substantial progress in natural language processing (Brown et al., 2020; Devlin et al., 2018), computer vision (Dosovitskiy et al., 2020; Bao et al., 2021), and graph learning (Yun et al., 2019; Rong et al., 2020). Furthermore, they are showing their potential in reinforcement learning (RL) (Reed et al., 2022; Lee et al., 2022; Wen et al., 2022) by modeling and solving sequential decision-making problems. However, there are several inherent challenges in building large-scale general RL agents. First, because agents are typically trained and tested in the same environment, RL is inclined to overfit the training environment while lacking generalizability to unseen environments; as a result, a model often needs to be retrained from scratch for a new task. Second, it is challenging for a single model to adapt to environments that differ in their numbers of agents, states, observations, actions, and dynamics. Third, training from scratch normally incurs a high computational cost, especially for large-scale RL. For example, AlphaStar requires 16 TPUs training for 14 days, and Honor of Kings (HoK) requires 19,600 CPU cores and 168 V100 GPUs training for nearly half a month. Thus, building a general, reusable, and efficient RL model has become an increasingly important goal for both industrial and academic research.

To this end, we investigate whether a single model, with a single set of parameters, can be trained by playing multiple multi-agent games in an online manner, a question left open after Gato (Reed et al., 2022) and MGDT (Lee et al., 2022). We consider training on Honor of Kings (HoK), StarCraft II micromanagement (SMAC), and Neural MMO (NMMO), informally asking: Can models learn general knowledge about games across various categories?
In this paper, we answer this question by proposing the Multi-Agent multi-Game ENtity TrAnsformer (MAGENTA). We consider this problem as few-shot transfer learning with the hypothesis, where a

