LEARNING CROSS-DOMAIN CORRESPONDENCE FOR CONTROL WITH DYNAMICS CYCLE-CONSISTENCY

Abstract

At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires obtaining correspondence between humans and robots; sim-to-real requires correspondence between physics simulators and the real world; transfer learning requires correspondences between different robotics environments. This paper aims to learn correspondence across domains differing in representation (vision vs. internal state), physics parameters (mass and friction), and morphology (number of limbs). Importantly, correspondences are learned using unpaired and randomly collected data from the two domains. We propose dynamics cycles that align dynamic robot behavior across two domains using a cycle-consistency constraint. Once this correspondence is found, we can directly transfer a policy trained on one domain to the other, without any additional fine-tuning on the second domain. We perform experiments across a variety of problem domains, both in simulation and on a real robot. Our framework is able to align uncalibrated monocular video of a real robot arm to dynamic state-action trajectories of a simulated arm without paired data. Video demonstrations of our results are available at: https://sjtuzq.github.io/cycle_dynamics.html.

1. INTRODUCTION

Humans have a remarkable ability to learn motor skills by mimicking behaviors of agents that look and act very differently from them. For example, developmental psychologists have shown that 18-month-old children are able to infer the intentions and imitate the behaviors of adults (Meltzoff, 1995). Imitation is not easy: children likely need to infer correspondences between their observations and their internal representations, which effectively aligns the two domains. Learning such a cross-domain correspondence is particularly valuable for robotics and control. For example, in imitation learning, if we want robots to imitate the motor skills of humans (or robots with different morphologies), we need to find the correspondence in both visual observations and morphology dynamics. Similarly, when transferring a policy trained in simulation to a real robot, we again need to align visual inputs and physics parameters across different environments.

To align skills across different domains, several prior approaches have proposed learning feature representations that are invariant across the domains (Gupta et al., 2017; Sermanet et al., 2018). Policies or visual representations are trained to be invariant to changes that are irrelevant to the downstream task, while maintaining information useful for cross-domain alignment. However, these methods require paired and aligned trajectories, usually collected by pre-trained policies or human labeling, which is often too expensive for real-world learning problems. Additionally, invariance is a rather strong constraint and might not be universally suitable: different invariances can benefit different downstream tasks, as has recently been studied in self-supervised visual representation learning (Tian et al., 2020).
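To make the cycle-consistency idea concrete, the sketch below checks a dynamics cycle on a toy problem. All names and dynamics here are illustrative assumptions, not the paper's actual models: domain X has 1-D dynamics f_x(x, a) = x + a, and domain Y is a rescaled copy with f_y(y, a) = y + 2a. A state correspondence G: X → Y (with inverse F) closes the cycle when mapping to Y, stepping forward there, and mapping back agrees with stepping forward directly in X.

```python
import numpy as np

def f_x(x, a):
    # Toy dynamics in domain X (assumed for illustration)
    return x + a

def f_y(y, a):
    # Toy dynamics in domain Y: a rescaled copy of domain X
    return y + 2.0 * a

def cycle_loss(G, F, states, actions):
    """Mean squared dynamics cycle-consistency error:
    map X -> Y with G, step forward in Y, map back with F,
    and compare against stepping forward directly in X."""
    direct = f_x(states, actions)
    cycled = F(f_y(G(states), actions))
    return np.mean((direct - cycled) ** 2)

states = np.linspace(-1.0, 1.0, 5)
actions = np.array([0.5, -0.2, 0.1, 0.3, -0.4])

# A correspondence consistent with the dynamics (y = 2x) closes the cycle...
good = cycle_loss(lambda x: 2.0 * x, lambda y: y / 2.0, states, actions)
# ...while an inconsistent one (y = 3x) leaves a residual error.
bad = cycle_loss(lambda x: 3.0 * x, lambda y: y / 3.0, states, actions)
print(good, bad)  # good is ~0, bad is strictly positive
```

In the paper's setting, G and F are learned networks and the dynamics of the two domains are given by the environments themselves; minimizing this kind of cycle error over unpaired trajectories is what drives the correspondence toward alignment.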

