MODEL-BASED NAVIGATION IN ENVIRONMENTS WITH NOVEL LAYOUTS USING ABSTRACT 2-D MAPS

Abstract

Efficiently training agents with planning capabilities has long been one of the major challenges in decision-making. In this work, we focus on zero-shot navigation on a given abstract 2-D occupancy map, treated as an image, much as a human navigates by reading a paper map. To learn this ability, we need to efficiently train an agent on environments covering only a small proportion of possible maps and to share knowledge effectively across environments. We hypothesize that model-based navigation can better adapt an agent's behavior to a task, since it disentangles variations in map layout from goal location and enables longer-term planning at novel locations compared to reactive policies. We propose to learn a hypermodel that extracts patterns from a limited number of abstract maps and goal locations, to maximize alignment between hypermodel predictions and real trajectories so as to extract information from multi-task off-policy experience, and to construct denser feedback for planners via n-step goal relabelling. We train our approach on DeepMind Lab environments with layouts from different maps, and demonstrate superior performance on zero-shot transfer to novel maps and goals.

1. INTRODUCTION

If we provide a rough solution to a problem to an agent, can the agent learn to follow the solution effectively? In this paper, we study this question in the context of maze navigation, where an agent is placed in a maze whose layout it has never seen before and is expected to navigate to a goal without first training on, or even exploring, this novel maze. This task may appear impossible without further guidance, but we provide the agent with additional information: an abstract 2-D occupancy map illustrating the rough layout of the environment, along with indicators of its start and goal locations ("task context" in Figure 1). This is akin to a tourist trying to find a landmark in a new city: without any further help this would be very challenging, but equipped with a 2-D map bearing a "you are here" symbol and an indicator of the landmark, the tourist can easily plan a path to the landmark without needing to explore or train extensively.

Navigation is a fundamental capability of all embodied agents, both artificial and natural, and has therefore been studied in many settings. In our case, we are most concerned with zero-shot navigation in novel environments, where the agent cannot perform further training or even exploration of the new environment; everything needed to accomplish the task is, in principle, provided by the abstract 2-D map. This differs from the large body of approaches based on simultaneous localization and mapping (SLAM) typically used in robot navigation (Thrun et al., 2005), where the agent can explore and build an accurate occupancy map of the environment prior to navigation. Recently, navigation approaches based on deep reinforcement learning (RL) have also emerged, although they often require extensive training in the same environment (Mirowski et al., 2017; 2018).
Some deep RL approaches can even navigate novel environments with new layouts without further training; however, these approaches typically learn a strategy of first efficiently exploring the new environment to understand the layout and find the goal, then exploiting that knowledge for the remainder of the episode to repeatedly reach the goal quickly (Jaderberg et al., 2017). In contrast, since the solution is essentially provided to the agent via the abstract 2-D map, we require a more stringent form of zero-shot navigation in which the agent should not explore the new environment; instead, we expect it to produce a near-optimal path on its first (and only) approach to the goal.
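To make the task context concrete, one possible encoding of the abstract 2-D occupancy map with start and goal indicators is a multi-channel image, which a convolutional agent can consume directly. The following is a minimal sketch under that assumption; the function name `encode_task_context` and the exact channel layout are illustrative, not the paper's implementation.

```python
import numpy as np

def encode_task_context(occupancy, start, goal):
    """Stack occupancy, start, and goal indicators into an (H, W, 3) image.

    occupancy : (H, W) array, 1 = wall, 0 = free cell
    start, goal : (row, col) grid coordinates
    """
    occupancy = np.asarray(occupancy, dtype=np.float32)
    h, w = occupancy.shape
    start_ch = np.zeros((h, w), dtype=np.float32)
    goal_ch = np.zeros((h, w), dtype=np.float32)
    # One-hot indicator channels marking the start and goal cells.
    start_ch[start] = 1.0
    goal_ch[goal] = 1.0
    return np.stack([occupancy, start_ch, goal_ch], axis=-1)

# Tiny 5x5 maze: border walls, free 3x3 interior.
maze = np.ones((5, 5), dtype=np.float32)
maze[1:4, 1:4] = 0.0
ctx = encode_task_context(maze, start=(1, 1), goal=(3, 3))
print(ctx.shape)  # (5, 5, 3)
```

Varying the layout and goal then amounts to varying this input image, which is what allows a single agent to be trained across many maps and evaluated zero-shot on unseen ones.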

