ABSTRACT-TO-EXECUTABLE TRAJECTORY TRANSLATION FOR ONE-SHOT TASK GENERALIZATION

Abstract

Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that generalizes to unseen tasks is challenging. In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution. Specifically, our method solves complex long-horizon tasks in three steps: building a paired abstract environment by simplifying geometry and physics, generating abstract trajectories, and solving the original task with an abstract-to-executable trajectory translator. In the abstract environment, complex dynamics such as physical manipulation are removed, making abstract trajectories easier to generate. However, this introduces a large domain gap between abstract trajectories and the actual executed trajectories, as abstract trajectories lack low-level details and are not aligned frame-to-frame with the executed trajectories. In a manner reminiscent of language translation, our approach leverages a seq-to-seq model to overcome this gap, enabling the low-level policy to follow the abstract trajectory. Experimental results on various unseen long-horizon tasks with different robot embodiments demonstrate the practicality of our method for achieving one-shot task generalization. Videos and more details can be found in the supplementary materials and on the project page.

1. INTRODUCTION

Training long-horizon robotic policies in complex physical environments is important for robot learning. However, directly learning a policy that generalizes to unseen tasks is challenging for Reinforcement Learning (RL) based approaches (Yu et al., 2020; Savva et al., 2019; Shen et al., 2021; Mu et al., 2021). The state/action spaces are usually high-dimensional, requiring many samples to learn policies for various tasks. One promising idea is to decouple plan generation and plan execution. In classical robotics, a high-level planner generates an abstract trajectory using symbolic planning with a simpler state/action space than the original problem, while a low-level agent executes the plan in the fully physical environment (Kaelbling & Lozano-Pérez, 2013; Garrett et al., 2020b). In our work, we promote this abstract-to-executable philosophy via a learning-based approach. Provided with an abstract trajectory, robots can aim for one-shot task generalization. Instead of memorizing high-dimensional policies for each task, the robot can leverage the power of planning in the low-dimensional abstract space and focus on learning low-level executors. This two-level framework works well for classical robotics tasks such as motion control for robot arms, where a motion planner generates a kinematic motion plan at a high level and a PID controller executes the plan step by step. However, such a decomposition and abstraction is not always trivial for more complex tasks. In general domains, it requires either expert knowledge (e.g., PDDL (Garrett et al., 2020b;a)) to design the abstraction manually or enormous numbers of samples to distill suitable abstractions automatically (e.g., HRL (Bacon et al., 2017; Vezhnevets et al., 2017)). We refer readers to Abel (2022) for an in-depth investigation of this topic.
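To make the classical two-level decomposition concrete, the sketch below shows a minimal PID controller of the kind that could track one dimension of a kinematic motion plan step by step. The class interface and gain values are illustrative only, not tied to any specific robot or to our method:

```python
import numpy as np

class PID:
    """Minimal single-axis PID controller for tracking one waypoint of a
    kinematic plan. Gains are illustrative, not tuned for a real robot."""

    def __init__(self, kp=1.0, ki=0.0, kd=0.1, dt=0.01):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def control(self, target, current):
        # Standard PID terms: proportional, integral, derivative of the error.
        err = target - current
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

A low-level executor would call `control` once per timestep against each waypoint produced by the planner; the learning-based executor proposed here replaces exactly this hand-designed tracking loop for tasks where plan and execution do not align so cleanly.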
On the other hand, designing imperfect high-level agents whose state space does not precisely align with the low-level executor can be much easier and more flexible. High-level agents can be planners with abstract models and simplified dynamics in the simulator (obtained by discarding some physical features, e.g., enabling a "magic" gripper (Savva et al., 2019; Torabi et al., 2018)), or existing "expert" agents such as humans or agents pre-trained on different manipulators. Though imperfect, their trajectories still contain meaningful information to guide low-level execution on novel tasks. For example, different robots may share a similar procedure of reaching, grasping, and moving when manipulating a rigid box, even with different grasping poses. As a trade-off, executing their trajectories with low-level executors becomes non-trivial. As shown by an example below, there may not be a frame-to-frame correspondence between the abstract and executable trajectories due to this mismatch. Sometimes the low-level agent needs to discover novel solutions by slightly deviating from the plan in order to follow the rest of it. Furthermore, the dynamics mismatch may require low-level agents to attend to the entire abstract trajectory rather than just a part of it. To benefit from abstract trajectories without perfect alignment between high- and low-level states, we propose TRajectory TRanslation (abbreviated as TR²), a learning-based framework that translates abstract trajectories into executable trajectories on unseen tasks at test time. The key feature of TR² is that we do not require frame-to-frame alignment between the abstract and executable trajectories. Instead, we utilize a powerful sequence-to-sequence translation model inspired by machine translation (Sutskever et al., 2014; Bahdanau et al., 2014) to translate abstract trajectories into executable actions even when there is a significant domain gap.
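To illustrate why attending over the whole plan helps, the sketch below shows a single cross-attention step, in plain NumPy, in which the low-level state queries every frame of the abstract trajectory; the resulting action can thus depend on the entire plan rather than on one aligned frame. This is a toy stand-in, not the actual TR² architecture, and the weight matrices `W_q`, `W_k`, `W_v`, `W_a` are hypothetical placeholders:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_and_act(abstract_traj, state, W_q, W_k, W_v, W_a):
    """One cross-attention step: the current low-level state (query) attends
    over all T frames of the abstract trajectory (keys/values), and the
    attended context is mapped to a bounded action."""
    q = state @ W_q                                 # query:  (d,)
    keys = abstract_traj @ W_k                      # keys:   (T, d)
    vals = abstract_traj @ W_v                      # values: (T, d)
    weights = softmax(keys @ q / np.sqrt(len(q)))   # attention over all T frames
    context = weights @ vals                        # plan-conditioned context: (d,)
    # Map state + context to an action; tanh keeps it in a bounded range.
    return np.tanh(np.concatenate([state, context]) @ W_a)
```

Because `weights` spans every abstract frame, nothing forces a one-to-one correspondence between the current timestep and a particular frame of the plan, which is the property the translation model relies on.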
This process is naturally reminiscent of language translation, which is well handled by seq-to-seq models. We illustrate the idea with the simple Box Pusher task shown in Fig. 1. The black agent needs to push the green target box to the blue goal position. We design the high-level agent as a point mass that can magically attract the green box to move along with it. For this high-level agent, it is easy to generate an abstract trajectory by either motion planning or heuristic methods. Since TR² places no strict constraints on the high-level agent, we can train TR² to translate the abstract trajectory, which includes the waypoints to the target, into a physically feasible trajectory. Our TR² framework learns to translate the magical abstract trajectory into a strategy of moving around the box and pushing it in the correct direction, closing the domain gap between the high- and low-level agents. Our contributions are: (1) We provide a practical solution for learning policies for long-horizon complex robotic tasks in three steps: build a paired abstract environment (e.g., using a point mass with magical grasping as the high-level agent), generate abstract trajectories, and solve the original task with abstract-to-executable trajectory translation. (2) Seq-to-seq models, specifically transformer-based auto-regressive models (Vaswani et al., 2017; Chen et al., 2021; Parisotto et al., 2020), free us from the restriction of strict alignment between abstract and executable trajectories, providing additional flexibility in high-level agent design and abstract trajectory generation and helping bridge the domain gap. (3) The combination of abstract trajectories and the transformer enables TR² to solve unseen long-horizon tasks. Evaluating our method on a navigation-based task and three manipulation tasks, we find that our agent achieves strong one-shot generalization to new tasks while remaining robust to intentional interventions or mistakes via re-planning.
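As an example of how cheap abstract trajectories can be to produce, the heuristic below generates waypoints for a point-mass agent with magical attraction in a Box Pusher-style setting: walk straight to the box, then drag it straight to the goal. This is a hypothetical sketch of such a heuristic, not code from the paper, and the function name and step size are our own choices:

```python
import numpy as np

def abstract_box_pusher_traj(agent, box, goal, step=0.1):
    """Heuristic abstract trajectory for a point-mass high-level agent with
    'magic' attraction: move to the box, then drag it straight to the goal.
    Returns a (T, 2) array of agent waypoints."""
    agent, box, goal = (np.asarray(p, dtype=float) for p in (agent, box, goal))

    def segment(a, b):
        # Linearly interpolate from a to b in increments of at most `step`.
        n = max(int(np.ceil(np.linalg.norm(b - a) / step)), 1)
        return [a + (b - a) * t / n for t in range(1, n + 1)]

    return np.array(segment(agent, box) + segment(box, goal))
```

Note that a real low-level agent cannot follow these waypoints literally (it has no magic attraction and must circle behind the box to push it), which is exactly the domain gap the trajectory translator is trained to close.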
Our method is evaluated on various tasks and environments with different embodiments. In all experiments, it shows significant improvements over the baselines. We also perform real-world experiments on the Block Stacking task to verify its capability to handle noise on a real robot system. Please refer to the anonymous project page for more visualizations.

2. RELATED WORKS

One-Shot Imitation Learning Recent studies (Duan et al., 2017; Finn et al., 2017; Pathak et al., 2018; Yu et al., 2018; Zhou et al., 2019; Lynch & Sermanet, 2020; Stepputtis et al., 2020) have shown that it is feasible to teach a robot new skills from only a single demonstration, akin to how humans can acquire a wide variety of abilities. To achieve one-shot



Figure 1: The Box Pusher task: move the green target box to the blue goal position. The arrows on the map show how the agents move.

