LEARNING SIMULTANEOUS NAVIGATION AND CONSTRUCTION IN GRID WORLDS

Abstract

We propose to study a new learning task, mobile construction: enabling an agent to build designed structures in 1/2/3D grid worlds while navigating within the same evolving environments. Unlike existing robot learning tasks such as visual navigation and object manipulation, this task is challenging because of the interdependence between accurate localization and strategic construction planning. In pursuit of generic and adaptive solutions to this partially observable Markov decision process (POMDP) via deep reinforcement learning (RL), we design a Deep Recurrent Q-Network (DRQN) with explicit recurrent position estimation in these dynamic grid worlds. Our extensive experiments show that pre-training this position estimation module before Q-learning can significantly improve the construction performance measured by the intersection-over-union (IoU) score, achieving the best results in our benchmark against various baselines, including model-free and model-based RL, a handcrafted SLAM-based policy, and human players.
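The IoU metric named above can be computed directly on occupancy grids. Below is a minimal illustrative sketch; the function name and the convention of returning 1.0 for an empty design are our own assumptions, not taken from the paper:

```python
import numpy as np

def grid_iou(built, design):
    """Intersection-over-union between a built occupancy grid and the
    designed one (illustrative sketch; the empty-design convention of
    returning 1.0 is our own assumption)."""
    built = np.asarray(built, dtype=bool)
    design = np.asarray(design, dtype=bool)
    union = np.logical_or(built, design).sum()
    if union == 0:  # nothing designed and nothing built: perfect match
        return 1.0
    return float(np.logical_and(built, design).sum() / union)
```

The same computation extends unchanged to 1D, 2D, or 3D arrays, since the logical operations and sums are dimension-agnostic.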

1. INTRODUCTION

Intelligent agents, from animal architects (e.g., mound-building termites and burrowing rodents) to human beings, can build structures while navigating inside the same dynamically evolving environment, revealing robust and coordinated spatial skills such as localization, mapping, and planning. Can we create artificial intelligence (AI) to perform similar mobile construction tasks? Handcrafting such an AI with existing robotics techniques is difficult. A fundamental challenge is the tight interdependence between robot localization and long-term planning for environment modification. If GPS and similar techniques are unavailable (often due to occlusions), robots have to rely on simultaneous localization and mapping (SLAM) or structure from motion (SfM) for pose estimation. But mobile construction violates the basic static-environment assumption of classic visual SLAM methods, and even challenges SfM methods designed for dynamic scenes (Saputra et al., 2018). Thus, we need to tackle this interdependence challenge: strategically modifying the environment while efficiently updating a memory of the evolving structure, in order to perform accurate localization and construction, as shown in Figure 1.

Deep reinforcement learning (DRL) offers another possibility, especially given its recent success in game playing and robot control. Can deep networks learn a generic and adaptive policy that controls the AI to build calculated structures as temporary localization landmarks, which eventually evolve into the designed one? To answer this question, we design an efficient simulation environment with a series of mobile construction tasks in 1/2/3D grid worlds. This reasonably simplifies the environment dynamics and sensing models while keeping the tasks non-trivial, and allows us to focus on the aforementioned interdependence challenge before advancing to other real-world complexities.
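For intuition, a 1D instance of such a task can be sketched as a tiny Gym-style environment. This is an illustrative mock-up with an assumed interface, observation window, and reward shaping, not the released environment:

```python
import numpy as np

class MobileConstruction1D:
    """Toy 1D mobile-construction task (assumed interface, not the
    paper's released Gym environment). The agent moves left/right on a
    ring of cells and drops bricks; it observes only a local window of
    the evolving height map, never its own position."""

    MOVE_LEFT, MOVE_RIGHT, DROP = 0, 1, 2

    def __init__(self, size=16, design=None, half_window=2, seed=0):
        self.size = size
        self.half_window = half_window
        self.design = (np.zeros(size, dtype=int) if design is None
                       else np.asarray(design, dtype=int))
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.heights = np.zeros(self.size, dtype=int)
        self.pos = int(self.rng.integers(self.size))  # hidden from the agent
        return self._obs()

    def _obs(self):
        # local window of heights centered on the (hidden) agent position
        idx = (self.pos + np.arange(-self.half_window,
                                    self.half_window + 1)) % self.size
        return self.heights[idx].copy()

    def step(self, action):
        reward = 0.0
        if action == self.MOVE_LEFT:
            self.pos = (self.pos - 1) % self.size
        elif action == self.MOVE_RIGHT:
            self.pos = (self.pos + 1) % self.size
        else:  # DROP: +1 for a brick the design still needs, -1 otherwise
            reward = 1.0 if self.heights[self.pos] < self.design[self.pos] else -1.0
            self.heights[self.pos] += 1
        return self._obs(), reward, False, {}
```

Because the observation window shows only local heights, the agent must infer where it is from the structure it has built so far, which is exactly the localization/construction interdependence discussed above.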
To show that the tasks in grid worlds are still non-trivial and challenging, we benchmark the performance of several baselines, including human players, a handcrafted policy with rudimentary SLAM and planning, several model-free DRL algorithms that have achieved state-of-the-art performance in other learning tasks (see Table 1 for comparisons), and a model-based DRL method using Deep Q-Networks (DQN) augmented with Monte Carlo tree search (MCTS) as the planning mechanism. Although our tasks may seem similar to other grid/pixelized tasks such as Atari games (Mnih et al., 2013), the results reveal that those baseline algorithms, especially model-free DRL methods, perform significantly worse than the human baseline.

Figure 1: Challenge in mobile construction. An AI needs to navigate in an environment (square area) and build a structure according to a design (stacked cubes in a/e). The AI in (a-d) builds poorly because it learns localization and construction jointly from raw observations. Specifically, similar structures seen in (c) confuse the AI due to wrong localization. In contrast, a better construction is achieved in (e-h) using a pre-trained localization module, even in this non-static environment.

Table 1: Mobile construction vs. existing learning tasks. Loc: robot localization. Plan: long-term planning. Env-Mod: environment structure modification. Env-Eval: evaluation of the accuracy of environment modifications. This shows the novelty of the task, whose features differ fundamentally from typically benchmarked tasks, requiring joint efforts of localization, planning, and learning. Tasks compared: Robot Manipulation (Fan et al., 2018; Yang et al., 2019; Labbé et al., 2020; Li et al., 2020); Robot Locomotion (Duan et al., 2016); Visual Navigation (Zhu et al., 2017; Gupta et al., 2017; Mo et al., 2018; Zeng et al., 2020); Atari (Mnih et al., 2013) / Minecraft (Oh et al., 2016; Guss et al., 2019; Platanios et al., 2020) / First-Person-Shooting (Lample & Chaplot, 2017); Real-Time Strategy Games (Synnaeve et al., 2016; Jaderberg et al., 2019); Physical Reasoning (Bapst et al., 2019; Bakhtin et al., 2019); Mobile Construction (Ours).
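As a reference point for the model-free family benchmarked above, here is a minimal tabular Q-learning loop on a toy chain world. This is a stand-in illustration only (the actual baselines use deep function approximation); the environment, hyperparameters, and function name are our own:

```python
import numpy as np

def q_learning_chain(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     eps=0.1, seed=0):
    """Tabular Q-learning on a toy chain world: the agent moves left or
    right and receives reward +1 on reaching the rightmost (terminal)
    state. Illustrative stand-in for the DQN-family baselines."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = int(rng.integers(2)) if rng.random() < eps else int(q[s].argmax())
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # bootstrap target is zero at the terminal state
            target = r + gamma * q[s2].max() * (s2 != n_states - 1)
            q[s, a] += alpha * (target - q[s, a])
            s = s2
    return q
```

After training, the greedy policy moves right from every non-terminal state, with Q-values decaying geometrically with distance from the goal.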
A recent study (Stooke et al., 2021) found that decoupling representation learning from RL policy learning can outperform the joint learning of the two in standard RL algorithms. Inspired by this, we pre-train an explicit position estimation module using recurrent neural networks within the above DRL baselines. Our experimental results show that this proposed method outperforms the other RL baselines. In summary, our contributions include:
• a suite of novel and easily extensible learning tasks focusing on the interdependent localization and planning problem, released as open-source, fast Gym environments;
• a comprehensive benchmark of baseline methods, which demonstrates the learning challenge in these tasks;
• an effective approach combining DRQN with an explicit position estimation deep network, outperforming the other baselines;
• a detailed ablation study providing insights into the causes of the challenge, which could inspire future algorithms to solve the problem more effectively.
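The role of an explicit position estimation module can be illustrated with a hand-rolled discrete Bayes filter over grid positions. This is a classical stand-in for the learned recurrent estimator, not the paper's network; the interface and names are our own:

```python
import numpy as np

def update_belief(belief, action_delta, heights, local_obs):
    """One step of a discrete Bayes filter over the agent's position in
    a 1D ring-shaped grid world (classical stand-in for a learned
    recurrent position estimator). Predict with the commanded motion,
    then reweight each candidate cell by whether the local observation
    matches the current height map there."""
    n = belief.size
    # predict: shift the belief by the commanded motion (deterministic here)
    belief = np.roll(belief, action_delta)
    # correct: likelihood of local_obs if the agent were at each cell p
    k = len(local_obs) // 2
    likelihood = np.empty(n)
    for p in range(n):
        idx = (p + np.arange(-k, k + 1)) % n
        likelihood[p] = 1.0 if np.array_equal(heights[idx], local_obs) else 1e-6
    belief = belief * likelihood
    return belief / belief.sum()
```

When the built structure is locally ambiguous (identical windows at several cells), the likelihood spreads mass over all matching positions; building distinctive intermediate structures is precisely what collapses this ambiguity, echoing the landmark-building intuition in the introduction.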

2. RELATED WORKS

RL baselines. With the great success of model-free RL methods in game-playing (Mnih et al., 2013; Silver et al., 2016) and robot control (Cheng et al., 2019; Zhang et al., 2015) , we consider a family




Open-source availability: https://ai4ce.github.io/SNAC/.

