DIFFERENTIABLE SPATIAL PLANNING USING TRANSFORMERS

Abstract

We consider the problem of spatial path planning. In contrast to classical solutions, which optimize a new plan from scratch and assume access to the full map with ground-truth obstacle locations, we learn a planner from data in a differentiable manner, which allows us to leverage statistical regularities from past data. We propose Spatial Planning Transformers (SPT), which, given an obstacle map, learn to generate actions by planning over long-range spatial dependencies, unlike prior data-driven planners that propagate information locally via convolutional structure in an iterative manner. In the setting where the ground-truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework with the structure of mapper and planner built into it, which allows seamless generalization to out-of-distribution maps and goals. SPTs outperform the prior state-of-the-art across all setups for both manipulation and navigation tasks, leading to an absolute improvement of 7-19%.

1. INTRODUCTION

[Figure: Why learn to plan (explicitly)? More effective than reactive CNNs and implicit-planning LSTMs; differentiable, so perception can be learned end-to-end using only action supervision.]

Our objective is to develop methods that can learn to plan from data. However, a natural question is why we need learning for a problem which has stable classical solutions. There are two key reasons. First, classical methods do not capture statistical regularities present in the natural world (e.g., walls are mostly parallel or perpendicular to each other), because they optimize a plan from scratch for each new setup. This also makes analytical planning methods slow at inference time, which is an issue in dynamic scenarios where a more reactive policy might be required for fast adaptation from failures. A learned planner represented via a neural network can not only capture such regularities but is also efficient at inference, as the plan is just the result of a forward pass through the network. Second, a critical assumption of classical algorithms is that a global ground-truth obstacle map must be known to the agent ahead of time. This is in stark contrast to biological agents, whose cognitive maps are not pixel-accurate ground-truth locations but are built through actions in the environment; e.g., rats build an implicit map of the environment incrementally through trajectories, enabling them to take shortcuts (Tolman, 1948). A learned solution can not only deal with partial, noisy maps but also help build maps on the fly while acting in the environment, by backpropagating through the generated long-range plans.



Figure 1: Spatial Path Planning: The raw observations (top left) and obstacles can be represented spatially via a top-down map in navigation (left) and via the configuration space in manipulation (right).

The problem of path planning has been a bedrock of robotics. Given an obstacle map of an environment and a goal location in the map, the task is to output a shortest path to the goal location starting from any position in the map. We consider path planning with spatial maps. Building a top-down spatial map is common practice in robotic navigation, as it provides a natural representation of physical space (Durrant-Whyte & Bailey, 2006). In fact, even robotic manipulation can be naturally phrased via a spatial map using the formalism of configuration spaces (Lozano-Perez, 1990), as shown in Figure 1. This problem has been studied in robotics for several decades, and classic go-to planning algorithms include Dijkstra's algorithm (Dijkstra, 1959), PRM (Kavraki et al., 1996), RRT (LaValle & Kuffner Jr, 2001), and RRT* (Karaman & Frazzoli, 2011).
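As a concrete reference point for the classical setting described above, a minimal planner on such a spatial map can be sketched as breadth-first search over a grid with known obstacles (equivalent to Dijkstra's algorithm under uniform step costs; the function and variable names here are illustrative, not from the paper):

```python
from collections import deque

def shortest_path(grid, start, goal):
    """BFS shortest path on a 2D obstacle grid.

    grid: list of lists, 0 = free cell, 1 = obstacle.
    start, goal: (row, col) tuples.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Reconstruct the path by following parent pointers back to start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal not reachable from start
```

Note that such a planner requires the full ground-truth obstacle grid up front and recomputes the plan from scratch for every new map and goal; these are exactly the two assumptions the learned SPT planner relaxes.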

