RAPID TASK-SOLVING IN NOVEL ENVIRONMENTS

Abstract

We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.

1. INTRODUCTION

An ideal AI system would be useful immediately upon deployment in a new environment, and would become more useful as it gained experience there. Consider, for example, a household robot deployed in a new home. Ideally, the new owner could turn the robot on and ask it to get started, say, by cleaning the bathroom. The robot would use general knowledge about household layouts to find the bathroom and cleaning supplies. As it carried out this task, it would gather information for use in later tasks, noting, for example, where the clothes hampers are in the rooms it passes. When faced with its next task, say, doing the laundry, it would use its newfound knowledge of the hamper locations to collect the laundry efficiently. Humans make this kind of rapid task-solving in novel environments (RTS) look easy (Lake et al., 2017), but as yet it remains an aspiration for AI. Prominent deep RL systems display some of the key abilities required, namely exploration and planning, but they need many episodes over which to explore (Ecoffet et al., 2019; Badia et al., 2020) and to learn models for planning (Schrittwieser et al., 2019). This is in part because they treat each new environment in isolation, relying on generic exploration and planning algorithms. We propose to overcome this limitation by treating RTS as a meta-reinforcement learning (meta-RL) problem, in which agents learn exploration policies and planning algorithms from a distribution over RTS challenges. Our contributions are to:

1. Develop two domains for studying meta-learned RTS: the minimal and interpretable Memory&Planning Game and the scaled-up One-Shot StreetLearn.

2. Show that previous meta-RL agents fail at RTS because of limitations in their ability to plan using recently gathered information.
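The RTS setting described above can be sketched as an episode loop: each episode samples a fresh environment, and the agent then solves a sequence of tasks within that same environment, so knowledge gathered on early tasks can pay off on later ones. The following is a minimal, hypothetical Python sketch; GraphEnv, random_agent, and run_rts_episode are illustrative names and a toy environment, not the paper's implementation.

```python
import random


class GraphEnv:
    """Toy stand-in for an RTS environment: a ring of rooms the agent
    navigates. All names here are illustrative, not the paper's code."""

    def __init__(self, n_nodes=6, seed=0):
        self.rng = random.Random(seed)
        self.n_nodes = n_nodes
        # Each node connects to its two ring neighbours.
        self.neighbors = {i: [(i + 1) % n_nodes, (i - 1) % n_nodes]
                          for i in range(n_nodes)}

    def reset_map(self):
        """Start a new episode: drop the agent somewhere in a fresh map."""
        self.pos = self.rng.randrange(self.n_nodes)

    def sample_goal(self):
        """Start a new task within the *same* map."""
        self.goal = self.rng.randrange(self.n_nodes)
        return self.goal

    def step(self, action):
        self.pos = self.neighbors[self.pos][action]
        return self.pos, self.pos == self.goal


def random_agent(obs, goal, rng=random.Random(1)):
    """Placeholder policy; a real RTS agent would condition on its
    memory of the environment gathered during earlier tasks."""
    return rng.randrange(2)


def run_rts_episode(env, agent, num_tasks=3, max_steps=500):
    """One RTS episode: a sequence of tasks in a single novel environment."""
    env.reset_map()
    steps = 0
    for _ in range(num_tasks):
        goal = env.sample_goal()
        obs, done = env.pos, env.pos == goal
        while not done and steps < max_steps:
            obs, done = env.step(agent(obs, goal))
            steps += 1
    return steps
```

An RTS-competent agent would minimize the returned step count by exploring during early tasks and planning over that accumulated knowledge on later ones; the random policy here serves only to make the episode structure concrete.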

