DIFFMIMIC: EFFICIENT MOTION MIMICKING WITH DIFFERENTIABLE PHYSICS

Abstract

Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard exploration. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task into a much simpler state matching problem. In particular, DPS learns a stable policy via analytical gradients with ground-truth physical priors, leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic achieves better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a Backflip after 10 minutes of training and to cycle it after 3 hours of training, while existing approaches may require about a day of training to cycle a Backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable cloth simulation in future research.

1. INTRODUCTION

Motion mimicking aims to find a policy that generates control signals for recovering demonstrated motion trajectories. It plays a fundamental role in physics-based character animation and also serves as a prerequisite for many applications such as control stylization and skill composition. Although tremendous progress in motion mimicking has been witnessed in recent years, existing methods (Peng et al., 2018a; 2021) mostly adopt reinforcement learning (RL) schemes, which require alternately learning a reward function and a control policy. Consequently, RL-based methods often take tens of hours or even days to imitate a single motion sequence, severely limiting their scalability. In addition, RL-based motion mimicking relies heavily on the quality of its designed (Peng et al., 2018a) or learned (Peng et al., 2021) reward functions, which further hinders its generalization to complex real-world applications.

Recently, differentiable physics simulators (DPS) have achieved impressive results in many research fields, such as robot control (Xu et al., 2022) and graphics (Li et al., 2022). Specifically, DPS treats physics operators as differentiable computational graphs, so gradients from objectives (i.e., rewards) can be directly propagated through the environment dynamics to control policy functions. Instead of alternating between learning reward functions and control policies, the control policy learning task can be resolved in a straightforward and efficient optimization manner with
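To make the idea of backpropagating through environment dynamics concrete, the following is a minimal sketch on a hypothetical 1-D point mass (not the DiffMimic system itself, which optimizes a policy network inside a full rigid-body simulator). The dynamics are a semi-implicit Euler step, the objective is a state-matching loss against a demonstrated trajectory, and the gradient with respect to the open-loop controls is computed analytically by reverse-mode differentiation through the simulation steps:

```python
# Hypothetical sketch: differentiating a state-matching loss through
# simple point-mass dynamics, then improving the controls by gradient
# descent. All names and constants here are illustrative assumptions.

def rollout(x, v, controls, dt):
    """Semi-implicit Euler for a unit mass: v' = v + dt*u, x' = x + dt*v'."""
    xs = []
    for u in controls:
        v = v + dt * u
        x = x + dt * v
        xs.append(x)
    return xs

def loss_and_grad(x0, v0, controls, targets, dt):
    """State-matching loss sum_t (x_t - x*_t)^2 and its analytical
    gradient w.r.t. the controls (reverse pass through the dynamics)."""
    xs = rollout(x0, v0, controls, dt)
    errs = [x - xt for x, xt in zip(xs, targets)]
    loss = sum(e * e for e in errs)
    grad = [0.0] * len(controls)
    gx = gv = 0.0
    for t in reversed(range(len(controls))):
        gx += 2.0 * errs[t]   # dL/dx_t from the matching loss
        gv += dt * gx         # adjoint of x_t = x_{t-1} + dt * v_t
        grad[t] = dt * gv     # adjoint of v_t = v_{t-1} + dt * u_t
    return loss, grad

dt, T = 0.1, 10
demo_controls = [1.0] * T                       # "demonstration" controls
targets = rollout(0.0, 0.0, demo_controls, dt)  # demonstrated trajectory

controls = [0.0] * T
for _ in range(500):                            # plain gradient descent
    loss, grad = loss_and_grad(0.0, 0.0, controls, targets, dt)
    controls = [u - 1.0 * g for u, g in zip(controls, grad)]
```

Because the gradient flows through the true dynamics, every update is informed by the physical model rather than by sampled returns, which is the source of the faster, lower-variance convergence discussed above.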



Our code is available at https://github.com/diffmimic/diffmimic. Qualitative results can be viewed at https://diffmimic-demo-main-g7h0i8.streamlitapp.com/.

