DEXDEFORM: DEXTEROUS DEFORMABLE OBJECT MANIPULATION WITH HUMAN DEMONSTRATIONS AND DIFFERENTIABLE PHYSICS

Abstract

In this work, we aim to learn dexterous manipulation of deformable objects using multi-fingered hands. Reinforcement learning approaches for dexterous rigid object manipulation would struggle in this setting due to the complexity of physical interaction with deformable objects. At the same time, previous trajectory optimization approaches with differentiable physics for deformable manipulation would suffer from local optima caused by the explosion of contact modes from hand-object interactions. To address these challenges, we propose DexDeform, a principled framework that abstracts dexterous manipulation skills from human demonstrations and refines the learned skills with differentiable physics. Concretely, we first collect a small set of human demonstrations using teleoperation. We then train a skill model on these demonstrations to plan over action abstractions in imagination. To explore the goal space, we further apply augmentations to the deformable shapes present in the demonstrations and use a gradient-based optimizer to refine the actions planned by the skill model. Finally, we adopt the refined trajectories as new demonstrations for finetuning the skill model. To evaluate the effectiveness of our approach, we introduce a suite of six challenging dexterous deformable object manipulation tasks. Compared with baselines, DexDeform is able to better explore and generalize across novel goals unseen in the initial human demonstrations.

1. INTRODUCTION

The recent success of learning-based approaches to dexterous manipulation has been widely observed on tasks with rigid objects (OpenAI et al., 2020; Chen et al., 2022; Nagabandi et al., 2020). However, a substantial portion of human dexterous manipulation skills involves interactions with deformable objects (e.g., making bread, stuffing dumplings, and using sponges). Consider the three simplified variants of such interactions shown in Figure 1. Folding (row 1) requires the front four fingers of a downward-facing hand to cooperate in carefully lifting and folding the dough. Bun (row 4) requires two hands to simultaneously pinch and push the wrapper. Flip (row 3) is an in-hand manipulation task that requires the fingers to flip the dough into the air and deform it with agility. In this paper, we consider the problem of deformable object manipulation with a simulated Shadow Dexterous Hand (ShadowRobot, 2013).

The benefits of human-level dexterity can be seen through the lens of versatility (Feix et al., 2015; Chen et al., 2022). When holding the fingers together, the robot hand can function as a spatula to fold deformable objects (Fig. 1, row 1). When pinching with the fingertips, it can maintain a stable grip on the object while manipulating its shape (Fig. 1, row 2). Using a spherical grasp, the robot hands are able to quickly squeeze the dough into a folded shape (Fig. 1, row 3). It is therefore critical to learn a manipulation policy that autonomously controls the robot hand with human-like dexterity and has the potential to adapt to various scenarios.

Figure 1: We present a framework for learning dexterous manipulation of deformable objects, covering tasks with a single hand (Folding and Wrap, rows 1-2), in-hand manipulation (Flip, row 3), and dual hands (Bun, Rope, Dumpling, rows 4-6). Images in the rightmost column represent goals.
Additionally, using a multi-fingered hand adds convenience to demonstration collection: (1) controlling deformable objects with the hands is a natural choice for humans, resulting in an easy-to-adapt teleoperation pipeline; (2) there exists a vast amount of in-the-wild human videos of dexterous deformable object manipulation (e.g., building a sand castle, making bread), and vision-based teleoperation techniques can be employed to collect demonstrations at scale (Sivakumar et al., 2022).

As with any dexterous manipulation task, the contact modes associated with these tasks are naturally complex. The inclusion of soft bodies adds further difficulty through the tremendous growth in the dimension of the state space: compared to their rigid-body counterparts, soft-body dynamics carry infinite degrees of freedom (DoFs). It therefore remains challenging to reason over the complex transitions in the contact state between the fingers and the objects.

Given the high dimensionality of the state space, learning a manipulation policy typically requires a large number of samples. With no or an insufficient amount of demonstrations, interactions with the environment are needed to improve the policy. Indeed, past works in dexterous manipulation have leveraged reinforcement learning (RL) approaches for this purpose (Rajeswaran et al., 2017; Chen et al., 2022). However, the sample complexity of most RL algorithms becomes a limitation in deformable object manipulation scenarios due to the large state space. Recent works have found trajectory optimization with first-order gradients from a differentiable simulator to be an alternative solution for soft-body manipulation (Huang et al., 2021; Li et al., 2022a; Lin et al., 2022). However, such gradient-based optimizers are found to be sensitive to initial conditions, such as contact points.
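The sensitivity of gradient-based trajectory optimization to initial conditions can be illustrated with a toy example (a minimal sketch in plain Python; the cost landscape and function names are illustrative stand-ins, not taken from any of the cited simulators):

```python
def grad_descent(cost_grad, a0, lr=0.05, steps=200):
    """Plain first-order gradient descent on a scalar action parameter."""
    a = float(a0)
    for _ in range(steps):
        a -= lr * cost_grad(a)
    return a

# Toy non-convex cost C(a) = (a^2 - 1)^2, standing in for a contact-rich
# objective: it has two distinct minima (a = +1 and a = -1) and a
# stationary point at a = 0 where the gradient vanishes.
def cost_grad(a):
    return 4.0 * a * (a ** 2 - 1.0)

print(grad_descent(cost_grad, a0=0.5))   # converges near +1
print(grad_descent(cost_grad, a0=-0.5))  # converges near -1
print(grad_descent(cost_grad, a0=0.0))   # stuck at 0: zero gradient
```

Depending only on the starting point (analogous to the initial contact configuration), the optimizer reaches different local optima or makes no progress at all, which is precisely the failure mode attributed above to the explosion of contact modes.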
It remains unclear how to leverage the efficiency of gradient-based optimizers while overcoming their sensitivity to initial conditions. In this work, we aim to learn dexterous manipulation of deformable objects using multi-fingered hands. To address the inherent challenges posed by the high-dimensional state space, we propose DexDeform, a principled framework that abstracts dexterous manipulation skills from human demonstrations and refines the learned skills with differentiable physics. DexDeform consists of three components: (1) a skill model, trained on a small set of teleoperated human demonstrations, that plans over action abstractions in imagination; (2) an exploration procedure that augments the deformable shapes observed in the demonstrations into novel goals and uses a gradient-based optimizer with differentiable physics to refine the actions planned by the skill model; and (3) a finetuning stage that adopts the refined trajectories as new demonstrations for the skill model.
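The plan-refine-finetune loop can be sketched as follows. This is a hypothetical, self-contained toy (all class and function names are illustrative stand-ins, not the paper's implementation): the "simulator" is a one-dimensional point with dynamics x_{t+1} = x_t + a_t, whose final-state cost is trivially differentiable, and the "skill model" is a single gain parameter.

```python
import numpy as np

class ToySim:
    """Stand-in differentiable simulator: a 1-D point with dynamics
    x_{t+1} = x_t + a_t, so the final state is x0 + sum(a)."""

    def rollout(self, actions, x0=0.0):
        return x0 + float(np.sum(actions))

    def cost(self, actions, goal):
        return (self.rollout(actions) - goal) ** 2

    def refine(self, actions, goal, lr=0.1, steps=50):
        # First-order gradient descent on the action sequence; for these
        # dynamics, dC/da_t = 2 * (rollout - goal) for every t.
        a = np.asarray(actions, dtype=float).copy()
        for _ in range(steps):
            a -= lr * 2.0 * (self.rollout(a) - goal)
        return a

class ToySkillModel:
    """Stand-in skill model: plans a constant per-step action proportional
    to the goal, and finetunes that gain from demonstrations."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.gain = 0.05  # deliberately poor initial skill

    def plan(self, goal):
        return [self.gain * goal] * self.horizon

    def finetune(self, demos):
        # Imitate the mean first action per unit goal across demonstrations.
        gains = [acts[0] / goal for goal, acts in demos if goal != 0.0]
        if gains:
            self.gain = float(np.mean(gains))

def dexdeform_iteration(skill, sim, goals, threshold=1e-3):
    new_demos = []
    for goal in goals:                           # goals from augmented shapes
        actions = skill.plan(goal)               # (1) plan in imagination
        actions = sim.refine(actions, goal)      # (2) gradient refinement
        if sim.cost(actions, goal) < threshold:  # keep successful rollouts
            new_demos.append((goal, actions))
    skill.finetune(new_demos)                    # (3) adopt as demonstrations
    return skill
```

After one such iteration the toy skill model's plans already reach new goals without further refinement; in the actual framework the skill model is a learned network and the refinement runs through the full differentiable soft-body simulator.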


Project page: https://sites.google.com/view/dexdeform

