EXAMPLE-BASED PLANNING VIA DUAL GRADIENT FIELDS

Anonymous

Abstract

Path planning is one of the key abilities of an intelligent agent. However, both learning-based and sampling-based planners still require explicitly defining the task by manually designing reward functions or optimisation objectives, which limits their scope of application. Formulating path planning from a new perspective, example-based planning seeks the most efficient path that increases the likelihood of a target distribution specified by a set of target examples. In this work, we introduce Dual Gradient Fields (DualGFs), an offline-learning example-based planning framework built upon score matching. A DualGF consists of two gradient fields: a target gradient field that guides task completion and a support gradient field that keeps motion within physical constraints. During learning, instead of interacting with the environment, the agent is trained on two sets of offline examples: the target gradients are trained on target examples and the support gradients on support examples, where the support examples are randomly sampled from free space, i.e., states without collisions. The DualGF is a weighted mixture of the two fields, combining their respective merits. To update the mixing ratio adaptively, we further propose a field-balancing mechanism based on Lagrangian relaxation. Experimental results across four tasks (navigation, tracking, particle rearrangement, and room rearrangement) demonstrate the scalability and effectiveness of our method. Our code and demonstrations can be found at https://sites.google.com/view/dualgf.
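To make the mixture-of-fields idea concrete, the following is a minimal sketch in 2D. All functions and constants here are illustrative stand-ins (hand-written gradients, a single circular obstacle), not the learned score networks or the exact balancing mechanism of the paper: the planner follows a weighted sum of a target field and a support field, and the mixing weight is adjusted by a dual-ascent update on the constraint violation, in the spirit of Lagrangian relaxation.

```python
import numpy as np

goal = np.array([1.0, 1.0])       # hypothetical target state
obstacle = np.array([0.5, 0.4])   # hypothetical forbidden region (centre)
radius = 0.2                      # forbidden-region radius

def target_grad(x):
    # Stand-in for the target field: gradient of the log-density of a
    # Gaussian centred on the goal, i.e. it points toward the goal.
    return goal - x

def support_grad(x):
    # Stand-in for the support field: a repulsive gradient pointing
    # away from the obstacle, growing as the state approaches it.
    d = x - obstacle
    return d / (np.dot(d, d) + 1e-6)

def constraint_violation(x):
    # Positive when the state is inside the forbidden region.
    return max(0.0, radius - np.linalg.norm(x - obstacle))

# Plan by following the weighted mixture of the two fields; the mixing
# weight lam is updated by dual ascent on the constraint violation
# (with a small slack), so it grows near collisions and decays otherwise.
x = np.array([0.0, 0.0])
lam, step, dual_step = 1.0, 0.05, 5.0
for _ in range(200):
    g = target_grad(x) + lam * support_grad(x)   # weighted mixture
    x = x + step * g
    lam = max(0.0, lam + dual_step * (constraint_violation(x) - 0.01))

print("final state:", np.round(x, 2))
```

In this toy setting the state is deflected around the obstacle while the target field dominates far from it; the adaptive weight plays the role of the paper's field-balancing mechanism, trading off task progress against constraint satisfaction.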



In this paper, we consider a novel data-driven planning paradigm: example-based planning, in which the user specifies the task by providing a set of target examples rather than programming a task-specific objective. Benefiting from this paradigm, example-based planning scales to various tasks, particularly tasks with implicit goals, i.e., tasks specified by a target distribution instead of a single goal state. Besides, the agent needs to infer the environmental constraints to move safely in the physical world. Previous approaches either learn physical constraints from interacting with the



Figure 1: Our task setting. Left: the agent learns the task specification from target examples and the physical constraints from support examples during training. Right: the agent plans a path under novel conditions during the test phase.

Planning paths to reach a goal is a fundamental capability of an intelligent agent (Russell, 2010) and has a wide range of real-world applications, such as navigation (Patle et al., 2019), object tracking (Zhong et al., 2019), and object rearrangement (King et al., 2016). Existing planning algorithms, whether sampling-based (LaValle et al., 1998a; Karaman & Frazzoli, 2011) or learning-based (Kulathunga, 2021; Yu et al., 2020; Tamar et al., 2016), need exhaustive test-time sampling to search for a path, or reward functions for learning. This severely limits the scope of application of planning, since for many real-world tasks it is hard to design the objectives/rewards with human priors, e.g., tidying up a house or rearranging a desktop.

