DIAGNOSING AND EXPLOITING THE COMPUTATIONAL DEMANDS OF VIDEO GAMES FOR DEEP REINFORCEMENT LEARNING

Abstract

Humans learn by interacting with their environments and perceiving the outcomes of their actions. A landmark in artificial intelligence has been the development of deep reinforcement learning (dRL) algorithms capable of doing the same in video games, on par with or better than humans. However, it remains unclear whether the successes of dRL models reflect advances in visual representation learning, the effectiveness of reinforcement learning algorithms at discovering better policies, or both. To address this question, we introduce the Learning Challenge Diagnosticator (LCD), a tool that separately measures the perceptual and reinforcement learning demands of a task. We use the LCD to discover a novel taxonomy of challenges in the Procgen benchmark, and demonstrate that its predictions are highly reliable and can guide algorithmic development. More broadly, the LCD reveals multiple failure cases that can occur when optimizing dRL algorithms over entire video game benchmarks like Procgen, and provides a pathway towards more efficient progress.

1. INTRODUCTION

Gibson famously argued that "The function of vision is not to solve the inverse problem and reconstruct a veridical description of the physical world. [... It] is to keep perceivers in contact with behaviorally relevant properties of the world they inhabit" (reviewed in Warren, 2021). The field of deep reinforcement learning (dRL) has followed Gibson's tenet since the seminal introduction of deep Q-networks (DQN) (Mnih et al., 2015). DQNs rely on reward feedback to train their policies and perceptual systems simultaneously, learning to play games tabula rasa. This end-to-end approach of training on individual environments and tasks has supported steady progress in the field of dRL, and newer reinforcement learning algorithms have yielded agents that achieve human or super-human performance in a variety of challenges, from Chess to Go and from Atari games to StarCraft (Mnih et al., 2015; Silver et al., 2017; 2018; Vinyals et al., 2019). But Gibson also argued that the ecological niche of animals allows them to exploit task-agnostic mechanisms to simplify the perceptual or behavioral demands of important tasks, such as humans' reliance on optic flow for navigation (Warren, 2021). In the decades since Gibson's writings, it has been found that humans can efficiently find or learn bespoke perceptual features that aid performance on a single task (Li et al., 2004; Scott et al., 2007; Roelfsema et al., 2010; Emberson, 2017), or they can exploit previously learned generalist representations and task abstractions that are useful across multiple tasks and environments (Wiesel & Hubel, 1963; Watanabe et al., 2001; Emberson, 2017; Lehnert et al., 2020; O'Reilly, 2001).
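The end-to-end recipe can be illustrated schematically. The sketch below is not the DQN of Mnih et al. (2015); it is a minimal Q-learning loop with a linear function approximator on a toy chain environment (all names and the environment itself are illustrative assumptions), showing how a single scalar reward signal jointly drives the value updates and the (here trivial, one-hot) "perceptual" encoding that a DQN would instead learn with a convolutional network.

```python
import numpy as np

# Toy chain environment: states 0..4; action 0 moves left, action 1 moves right.
# Reward is 1.0 only upon reaching the rightmost state -- the single scalar
# signal from which the agent must learn to act.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

def features(state):
    # One-hot "perceptual" encoding; in a DQN this role is played by a CNN
    # trained end-to-end from the same reward feedback.
    phi = np.zeros(N_STATES)
    phi[state] = 1.0
    return phi

rng = np.random.default_rng(0)
W = np.zeros((N_ACTIONS, N_STATES))  # linear Q(s, a) = W[a] @ features(s)
gamma, alpha, eps = 0.9, 0.1, 0.1    # discount, learning rate, exploration

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(W @ features(s)))
        s2, r, done = step(s, a)
        target = r if done else r + gamma * np.max(W @ features(s2))
        # Temporal-difference update: reward feedback shapes the value function.
        W[a] += alpha * (target - W[a] @ features(s)) * features(s)
        s = s2

# After training, the greedy policy should move right from every state.
print([int(np.argmax(W @ features(s))) for s in range(N_STATES - 1)])
```

The point of the sketch is the absence of any task-specific prior: the agent starts from zeros and everything it knows is distilled from reward, which is exactly the tabula-rasa regime the paragraph above describes.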
While there have been attempts at building similarly flexible dRL agents through meta-reinforcement learning (Frans et al., 2018; Xu et al., 2018; 2020; Houthooft et al., 2018; Gupta et al., 2018; Chelu et al., 2020; Pong et al., 2021), these approaches ignore the complexities of perceptual learning and carry large computational burdens that limit them to simplistic scenarios. There is a pressing need for approaches to training dRL agents that can meet the computational demands of a wide variety of environments and tasks. One way to build generalist agents is to first reliably diagnose where the computational challenges of a given environment and task lie, and then adjust the agent to those demands. Is the perceptual challenge onerous? Is the reward signal for credit assignment especially sparse? Even partial answers to these questions are instructive for improving an agent, for instance, by pre-determining the extent to which
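To make one of the diagnostic questions above concrete, consider reward sparsity. The snippet below is not the LCD itself; it is a hypothetical helper (the function name and toy data are assumptions for illustration) that estimates how rare reward feedback is across sampled trajectories, one crude proxy for how hard credit assignment will be.

```python
import numpy as np

def reward_sparsity(episode_rewards):
    """Fraction of timesteps with zero reward, pooled over episodes.

    A value near 1.0 flags an environment where credit assignment is hard
    because feedback is rare; a value near 0.0 indicates dense feedback.
    """
    rewards = np.concatenate([np.asarray(ep, dtype=float) for ep in episode_rewards])
    return float(np.mean(rewards == 0.0))

# A dense-reward task (feedback at every step) vs. a sparse one (feedback
# only at the end of some episodes). Numbers are made up for illustration.
dense = [[0.1, 0.2, 0.1], [0.3, 0.1]]
sparse = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]

print(reward_sparsity(dense))   # → 0.0
print(reward_sparsity(sparse))  # → 0.875
```

A battery of such measurements, one per computational demand, is the kind of per-task profile that could inform how an agent is configured before training.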

