ENTITY DIVIDER WITH LANGUAGE GROUNDING IN MULTI-AGENT REINFORCEMENT LEARNING

Abstract

We investigate the use of natural language to drive the generalization of policies in multi-agent settings. Unlike in single-agent settings, the generalization of policies must also consider the influence of other agents. Moreover, as the number of entities grows in multi-agent settings, more agent-entity interactions are needed for language grounding, and the enormous search space can impede learning. Further, given a single general instruction, e.g., beating all enemies, agents are required to decompose it into multiple subgoals and figure out the right one to focus on. Inspired by previous work, we address these issues at the entity level and propose a novel framework for language grounding in multi-agent reinforcement learning, entity divider (EnDi). EnDi enables agents to independently learn subgoal division at the entity level and act in the environment based on the associated entities. The subgoal division is regularized by agent modeling to avoid subgoal conflicts and promote coordinated strategies. Empirically, EnDi demonstrates strong generalization to unseen games with new dynamics and outperforms existing methods.

1. INTRODUCTION

The generalization of reinforcement learning (RL) agents to new environments is challenging, even when those environments differ only slightly from the ones seen during training (Finn et al., 2017). Recently, language grounding has proven to be an effective way to grant RL agents generalization ability (Zhong et al., 2019; Hanjie et al., 2021). By relating the dynamics of the environment to a text manual that specifies those dynamics at the entity level, a language-based agent can adapt to new settings with unseen entities or dynamics. In addition, language-based RL provides a framework for enabling agents to reach user-specified goal states described in natural language (Küttler et al., 2020; Tellex et al., 2020; Branavan et al., 2012). Language can express abstract goals as sets of constraints on states and thereby drive generalization. However, multi-agent settings differ in several respects. First, the policies of other agents also affect the dynamics of the environment, yet the text manual provides no such information; the generalization of policies must therefore also account for the influence of others. Second, as the number of entities grows in multi-agent settings, so does the number of agent-entity interactions needed for language grounding, and the enormous search space can impede learning. Third, it is often unrealistic to give each agent detailed instructions specifying exactly what to do. On the contrary, a single general goal instruction, e.g., beating all enemies or collecting all treasures, is more convenient and practical. Agents are therefore required to learn subgoal division and cultivate coordinated strategies from one general instruction. The key to generalization in previous works (Zhong et al., 2019; Hanjie et al., 2021) is grounding language to dynamics at the entity level. By doing so, agents can reason over the dynamic rules of all the entities in the environment.
Since the dynamics of individual entities are the basic components of the environment dynamics, such language grounding is invariant to new distributions of dynamics or tasks, making generalization more reliable. Inspired by this, in multi-agent settings, the influence of other agents' policies should also be captured at the entity level for better generalization. In addition, after jointly grounding the text manual and the language-based goal (goal instruction) to environment entities, each entity is associated with explicit dynamic rules and relationships
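The entity-level grounding described above can be pictured as an attention step from entity representations to manual tokens, so that each entity receives a text summary of the rules that concern it. The following is a minimal NumPy sketch of that idea, not the authors' actual architecture; all names, dimensions, and the dot-product attention choice are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ground_text_to_entities(entity_feats, token_embs):
    """For each entity, attend over the manual's token embeddings and
    return a per-entity text summary (a sketch of entity-level grounding)."""
    scores = entity_feats @ token_embs.T   # (n_entities, n_tokens) similarities
    attn = softmax(scores, axis=-1)        # attention weights over tokens
    return attn @ token_embs               # (n_entities, d) grounded features

# Hypothetical toy inputs: 4 entities and a 10-token manual, both dim 8.
rng = np.random.default_rng(0)
entities = rng.normal(size=(4, 8))
tokens = rng.normal(size=(10, 8))
grounded = ground_text_to_entities(entities, tokens)
print(grounded.shape)  # one text summary vector per entity
```

In a multi-agent extension of this picture, the per-entity summaries would additionally be combined with information about other agents' policies before subgoal division, as the paragraph above suggests.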

