MEMORY GYM: PARTIALLY OBSERVABLE CHALLENGES TO MEMORY-BASED AGENTS

Abstract

Memory Gym is a novel benchmark that challenges Deep Reinforcement Learning agents to memorize events across long sequences, be robust to noise, and generalize. It consists of the partially observable 2D and discrete control environments Mortar Mayhem, Mystery Path, and Searing Spotlights. These environments are believed to be unsolvable by memory-less agents because they feature strong dependencies on memory and frequent agent-memory interactions. Empirical results based on Proximal Policy Optimization (PPO) and a Gated Recurrent Unit (GRU) underline the strong memory dependency of the contributed environments. The difficulty of these environments can be smoothly scaled, and distinct levels of difficulty (some of them as yet unsolved) emerge for Mortar Mayhem and Mystery Path. Surprisingly, Searing Spotlights poses a tremendous challenge to GRU-PPO, which remains an open puzzle. Even though the randomly moving spotlights reveal parts of the environment's ground truth, environmental ablations hint that they pose a severe perturbation to agents that leverage recurrent model architectures as their memory.

1. INTRODUCTION

Memory is a vital mechanism that allows intelligent living beings to make favorable decisions sequentially under imperfect information and uncertainty. One's immediate sensory perception may not suffice if information from past events cannot be recalled; reasoning, imagination, planning, and learning may then become unattainable. When developing autonomously learning decision-making agents, the agent's memory mechanism is required to maintain a representation of former observations to ground its next decision. Adding a memory mechanism such as a recurrent neural network (Werbos, 1990) or a transformer (Vaswani et al., 2017) has led to successfully learned policies in both virtual and real-world tasks. For instance, Deep Reinforcement Learning (DRL) methods master complex video games such as StarCraft II (Vinyals et al., 2019) and DotA 2 (Berner et al., 2019). Examples of successes in real-world problems are dexterous in-hand manipulation (Andrychowicz et al., 2020) and controlling tokamak plasmas (Degrave et al., 2022). In addition to leveraging memory, these tasks require vast amounts of computational resources and additional methods (e.g. domain randomization, incorporating domain knowledge, etc.), which makes them unsuitable for solely benchmarking an agent's ability to interact with its memory meaningfully.

We propose Memory Gym as a novel, open-source benchmark consisting of three unique environments: Mortar Mayhem, Mystery Path, and Searing Spotlights. These environments challenge memory-based agents to memorize events across long sequences, generalize, be robust to noise, and be sample efficient. By fulfilling the desiderata that we define in this work, we believe that Memory Gym has the potential to complement existing benchmarks and thereby accelerate the development of DRL agents that leverage memory. All three environments feature visual observations and discrete action spaces, and are notably not solvable without memory.
This allows users to verify early on whether their developed memory mechanism is working. To fit the problem of sequential decision-making, agents have to frequently leverage their memory to solve the tasks posed by Memory Gym. In contrast, several related environments ask the agent only to memorize initial cues, which requires infrequent agent-memory interactions. Our environments are smoothly configurable. This
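To make the notion of frequent agent-memory interaction concrete, the following sketch shows how a recurrent memory mechanism folds each new observation into a hidden state that then grounds the next action, as in the GRU-based policy discussed above. The GRU cell, observation/hidden dimensions, and random (untrained) weights are illustrative assumptions, not part of the Memory Gym codebase.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HID_DIM, N_ACTIONS = 16, 32, 4  # hypothetical sizes for illustration

def init(shape):
    # Randomly initialized parameters; a real agent would learn these (e.g. via PPO).
    return rng.normal(0.0, 0.1, shape)

Wz, Uz, bz = init((HID_DIM, OBS_DIM)), init((HID_DIM, HID_DIM)), np.zeros(HID_DIM)
Wr, Ur, br = init((HID_DIM, OBS_DIM)), init((HID_DIM, HID_DIM)), np.zeros(HID_DIM)
Wh, Uh, bh = init((HID_DIM, OBS_DIM)), init((HID_DIM, HID_DIM)), np.zeros(HID_DIM)
W_pi = init((N_ACTIONS, HID_DIM))  # policy head on top of the memory state

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, o):
    """Standard GRU update: fold the current observation o into the hidden state h."""
    z = sigmoid(Wz @ o + Uz @ h + bz)              # update gate
    r = sigmoid(Wr @ o + Ur @ h + br)              # reset gate
    h_tilde = np.tanh(Wh @ o + Uh @ (r * h) + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde

def act(h):
    """Greedy action from policy logits computed on the memory state."""
    return int(np.argmax(W_pi @ h))

# At every step of a partially observable episode, the agent updates its memory
# and conditions its action on it -- not only on the current observation.
h = np.zeros(HID_DIM)
for t in range(8):
    obs = rng.normal(size=OBS_DIM)  # stand-in for an encoded visual observation
    h = gru_step(h, obs)
    action = act(h)
```

Because the hidden state is updated at every step, tasks like Mortar Mayhem's repeated command execution force the agent to both write to and read from this state throughout the episode, rather than only memorizing an initial cue.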

Availability

https://github.com/MarcoMeter/drl-memory-gym/

