HOW TO AVOID BEING EATEN BY A GRUE: STRUCTURED EXPLORATION STRATEGIES FOR TEXTUAL WORLDS

Abstract

Text-based games are long puzzles or quests, characterized by sequences of sparse and potentially deceptive rewards. They provide an ideal platform on which to develop agents that perceive and act upon the world through a combinatorially-sized natural-language state-action space. Standard Reinforcement Learning agents are poorly equipped to explore such spaces effectively and often struggle to overcome bottlenecks: states that agents cannot pass through simply because they do not see the right action sequence often enough for it to be sufficiently reinforced. We introduce Q*BERT, an agent that learns to build a knowledge graph of the world by answering questions, leading to greater sample efficiency. To overcome bottlenecks, we further introduce MC!Q*BERT, an agent that uses a knowledge-graph-based intrinsic motivation to detect bottlenecks and a novel exploration strategy to efficiently learn a chain of policy modules to overcome them. We present an ablation study and results demonstrating how our method outperforms the current state-of-the-art on nine text games, including the popular game Zork, where, for the first time, a learning agent gets past the bottleneck where the player is eaten by a Grue.

1. INTRODUCTION

Text-adventure games such as Zork1 (Anderson et al., 1979) (Fig. 1) are simulations featuring language-based state and action spaces. Prior game-playing work has focused on a few challenges inherent to this medium: (1) Partial observability: the agent must reason about the world solely through incomplete textual descriptions (Narasimhan et al., 2015; Côté et al., 2018; Ammanabrolu & Riedl, 2019b). (2) Commonsense reasoning: needed to enable the agent to interact more intelligently with objects in its surroundings (Fulda et al., 2017; Yin & May, 2019; Adolphs & Hofmann, 2019; Ammanabrolu & Riedl, 2019a). (3) A combinatorial state-action space: most games have action spaces exceeding a billion possible actions per step; for example, Zork1 has 1.64 × 10^14 possible actions at every step (Hausknecht et al., 2020; Ammanabrolu & Hausknecht, 2020).

Despite these challenges, modern text-adventure agents such as KG-A2C (Ammanabrolu & Hausknecht, 2020), TDQN (Hausknecht et al., 2020), and DRRN (He et al., 2016) have relied on surprisingly simple exploration strategies such as ε-greedy or sampling from the distribution of possible actions. Most text-adventure games have relatively linear plots in which players must solve a sequence of puzzles to advance the story and gain score. To solve these puzzles, players are free to explore both new and previously unlocked areas of the game, collect clues, and acquire the tools needed to solve the next puzzle and unlock the next portion of the game. From a Reinforcement Learning perspective, these puzzles can be viewed as bottlenecks that act as partitions between different regions of the state space. We contend that existing Reinforcement Learning agents are unaware of such latent structure and are thus poorly equipped to solve these types of problems. In this paper we introduce two new agents, Q*BERT and MC!Q*BERT, both designed with this latent structure in mind.
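The order of magnitude of this action space can be recovered with back-of-the-envelope arithmetic. The sketch below assumes figures not stated in this section but commonly reported for Zork1 in the cited literature: a vocabulary of 697 words and actions of up to five words.

```python
# Back-of-the-envelope size of a word-level action space.
# Assumed figures (hypothetical for this sketch, taken from values
# commonly reported for Zork1): 697-word vocabulary, 5-word actions.
VOCAB_SIZE = 697
MAX_ACTION_WORDS = 5

# Naively, each of the 5 slots in an action can be any vocabulary
# word, so the word-level action space has |V|^5 elements.
naive_actions = VOCAB_SIZE ** MAX_ACTION_WORDS
print(f"{naive_actions:.3e}")  # prints 1.645e+14
```

Under these assumptions the count works out to roughly 1.64 × 10^14, matching the figure quoted above; the point is that almost all of these strings are invalid, which is why undirected exploration strategies fare so poorly.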
The first agent, Q*BERT, improves on existing text-game agents that use knowledge graph-based state representations by framing knowledge graph construction during exploration as a question-answering task. To train Q*BERT's knowledge graph extractor, we introduce the Jericho-QA dataset for question-answering in text-games. We show that it leads to improved knowledge graph

