DIFFERENTIABLE SPATIAL PLANNING USING TRANSFORMERS

Abstract

We consider the problem of spatial path planning. In contrast to classical solutions, which optimize a new plan from scratch and assume access to the full map with ground-truth obstacle locations, we learn a planner from data in a differentiable manner that allows us to leverage statistical regularities from past data. We propose Spatial Planning Transformers (SPT), which, given an obstacle map, learn to generate actions by planning over long-range spatial dependencies, unlike prior data-driven planners that propagate information locally via convolutional structure in an iterative manner. In the setting where the ground-truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework that has the structure of a mapper and planner built into it, which allows seamless generalization to out-of-distribution maps and goals. SPTs outperform prior state-of-the-art across all the setups for both manipulation and navigation tasks, leading to an absolute improvement of 7-19%.

1. INTRODUCTION

[Figure callout: Why learn to plan (explicitly)? (i) More effective than reactive CNNs and implicit-planning LSTMs; (ii) differentiable, so perception can be learned end-to-end using only action supervision.]

The problem of path planning has been a bedrock of robotics. Given an obstacle map of an environment and a goal location in the map, the task is to output a shortest path to the goal starting from any position in the map. We consider path planning with spatial maps. Building a top-down spatial map is common practice in robotic navigation as it provides a natural representation of physical space (Durrant-Whyte & Bailey, 2006). In fact, even robotic manipulation can be naturally phrased via a spatial map using the formalism of configuration spaces (Lozano-Perez, 1990), as shown in Figure 1.

Our objective is to develop methods that can learn to plan from data. A natural question, however, is why we need learning for a problem that has stable classical solutions. There are two key reasons. First, classical methods do not capture statistical regularities present in the natural world (e.g., walls are mostly parallel or perpendicular to each other) because they optimize a plan from scratch for each new setup. This also makes analytical planning methods slow at inference time, which is an issue in dynamic scenarios where a more reactive policy might be required for fast adaptation from failures. A learned planner represented via a neural network can not only capture such regularities but is also efficient at inference, as the plan is just the result of a forward pass through the network. Second, a critical assumption of classical algorithms is that a global ground-truth obstacle space must be known to the agent ahead of time.
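As a point of reference for the classical formulation above, the sketch below computes shortest-path distances to a goal on a tiny obstacle grid with breadth-first search; the grid, goal, and function name are illustrative, not from the paper.

```python
from collections import deque

def shortest_path_distances(grid, goal):
    """BFS over free cells; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    return dist

# 3x3 map with one obstacle in the center; goal at the top-right corner.
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
dist = shortest_path_distances(grid, goal=(0, 2))
print(dist[(2, 0)])  # -> 4: the path must go around the obstacle
```

Note that this optimizes each query from scratch and needs the full ground-truth map, which is exactly the pair of assumptions the learned planner relaxes.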
This is in stark contrast to biological agents, whose cognitive maps are not pixel-accurate ground-truth layouts but are built through actions in the environment; e.g., rats build an implicit map of the environment incrementally through trajectories, enabling them to take shortcuts (Tolman, 1948). A learned solution could not only provide the ability to deal with partial, noisy maps but also help build maps on the fly while acting in the environment, by backpropagating through the generated long-range plans. Prior learned planners propagate information locally via convolutional structure, so the number of iterations they need grows with the size of the map. In theory, however, optimal paths can be computed much more efficiently, with a total number of iterations on the order of the number of obstacles rather than the map size. For instance, consider two points with no obstacle between them: an efficient planner could directly connect them with the interpolated distance. Nonetheless, this is possible only if the model can perform long-range reasoning in the obstacle space, which is a challenge. In this work, our goal is to capture this long-range spatial relationship. Transformers (Vaswani et al., 2017) are well suited for this kind of computation as they treat the inputs as sets and propagate information across all the points within the set. Building on this, we propose Spatial Planning Transformers (SPT), which consist of attention heads that can attend to any part of the input. The key idea behind the design of the proposed model is that value can be propagated between distant points if there are no obstacles between them. This reduces the number of required iterations to O(n_O), where n_O is the number of obstacles in the map. Figure 2 shows a simple example where long-distance value propagation can cover the entire map within 3 iterations while local value propagation takes more than 5 iterations; this difference grows with the complexity of the obstacle space and the map size.
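The iteration-count gap can be seen numerically in a toy sketch of the two propagation schemes: "local" propagation updates each cell only from its 4 neighbours (as convolution-based planners do), while "long-range" propagation additionally connects any two cells sharing an unobstructed row or column, so value jumps across open space in one step. The grid, the line-of-sight rule, and all names are illustrative simplifications, not the paper's attention mechanism.

```python
import itertools
import math

def sweeps_to_converge(grid, goal, long_range=False):
    """Count synchronous (Jacobi) update sweeps until goal distances stop changing."""
    rows, cols = len(grid), len(grid[0])
    free = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0]
    val = {cell: math.inf for cell in free}
    val[goal] = 0

    def clear_line(a, b):
        # True if a and b share a row or column with no obstacle between them.
        (r1, c1), (r2, c2) = a, b
        if r1 == r2:
            return all(grid[r1][c] == 0 for c in range(min(c1, c2), max(c1, c2) + 1))
        if c1 == c2:
            return all(grid[r][c1] == 0 for r in range(min(r1, r2), max(r1, r2) + 1))
        return False

    sweeps = 0
    while True:
        new = dict(val)
        for a, b in itertools.permutations(free, 2):
            d = abs(a[0] - b[0]) + abs(a[1] - b[1])  # path length along the segment
            if d == 1 or (long_range and clear_line(a, b)):
                new[b] = min(new[b], val[a] + d)
        if new == val:
            return sweeps
        val, sweeps = new, sweeps + 1

empty_5x5 = [[0] * 5 for _ in range(5)]
local = sweeps_to_converge(empty_5x5, goal=(0, 0))                      # -> 8
long_rng = sweeps_to_converge(empty_5x5, goal=(0, 0), long_range=True)  # -> 2
```

Even on an empty 5x5 map, local propagation needs as many sweeps as the longest path (8), while long-range propagation converges in 2; the gap widens further with larger maps.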
We compare the performance of SPTs with prior state-of-the-art learned planning approaches, VIN (Tamar et al., 2016) and GPPN (Lee et al., 2018), across both navigation and manipulation setups. SPTs achieve significantly higher accuracy than these prior methods for the same inference time and show over 10% absolute improvement when the maps are large. Next, we turn to the case when the map is not known a priori. This is a practical setting in which the agent either has access to only a partially known map or knows the environment only through its trajectories. In psychology, this is known as going from route knowledge to survey knowledge (Golledge et al., 1995), where animals aggregate knowledge from trajectories into a cognitive map. We operationalize this setup by formulating an end-to-end differentiable framework which, in contrast to generic parametric policy learning (Glasmachers, 2017), has the structure of a mapper and planner built into it. We first pre-train the SPT planner to capture a generic data-driven prior, and then backpropagate through it to learn a mapper that maps raw observations to an obstacle map. This allows us to learn without requiring map supervision or interaction. The learned mapper and planner not only allow us to plan for new goal locations at inference but also generalize to unseen maps. Our end-to-end mapping and planning approach provides a unified solution for both navigation and manipulation. We perform thorough experiments on both real-world navigation maps and manipulation setups. Our approach outperforms prior state-of-the-art by a significant margin on both mapping and planning accuracy without assuming access to the map at training or inference.
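The end-to-end structure described above (frozen pre-trained planner, trainable mapper, action-only supervision) can be sketched in a few lines. The tiny stand-in networks, sizes, and synthetic data below are all hypothetical; a real SPT planner is a transformer over map cells, not a linear layer.

```python
import torch
import torch.nn as nn

class Mapper(nn.Module):
    """Raw observation -> predicted obstacle-occupancy map (illustrative sizes)."""
    def __init__(self, obs_dim=32, map_cells=25):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, map_cells), nn.Sigmoid())
    def forward(self, obs):
        return self.net(obs)

class Planner(nn.Module):
    """Obstacle map -> action logits (stand-in for a pre-trained SPT)."""
    def __init__(self, map_cells=25, n_actions=4):
        super().__init__()
        self.net = nn.Linear(map_cells, n_actions)
    def forward(self, pred_map):
        return self.net(pred_map)

mapper, planner = Mapper(), Planner()
for p in planner.parameters():      # planner is pre-trained and frozen
    p.requires_grad_(False)

opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
obs = torch.randn(8, 32)                    # synthetic batch of raw observations
expert_actions = torch.randint(0, 4, (8,))  # action-only supervision, no map labels

logits = planner(mapper(obs))
loss = nn.functional.cross_entropy(logits, expert_actions)
loss.backward()   # gradients flow *through* the frozen planner into the mapper
opt.step()
```

The key property is that `loss.backward()` updates only the mapper: the planner's weights stay fixed, yet they shape the gradient the mapper receives, so the mapper is trained to produce maps the planner can plan on, without map supervision.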



Figure 1: Spatial Path Planning: The raw observations (top left) and obstacles can be represented spatially via a top-down map in navigation (left) and via a configuration space in manipulation (right).

This problem has been studied in robotics for several decades, and classic planning algorithms include Dijkstra's algorithm (Dijkstra, 1959), PRM (Kavraki et al., 1996), RRT (LaValle & Kuffner Jr, 2001), and RRT* (Karaman & Frazzoli, 2011).

Figure 2: Local vs. long-distance value propagation. The figure shows an example of the number of iterations required to propagate distance values over a map using local and long-distance value propagation. The obstacle map and goal location are shown on the left, and the distance value predictions over 5 iterations are shown on the right (distance values increase from blue to yellow). Prior methods based on convolutional networks use local value propagation and require many iterations to propagate values accurately over the whole map (top right). Our method is based on long-distance value propagation between points without any obstacle between them. This type of value propagation can cover the whole map in 3 iterations in this example (bottom right).

