CURRICULUM-BASED CO-DESIGN OF MORPHOLOGY AND CONTROL OF VOXEL-BASED SOFT ROBOTS

Abstract

Co-design of morphology and control of a Voxel-based Soft Robot (VSR) is challenging because it entails a notoriously difficult bi-level optimization. In this paper, we present a Curriculum-based Co-design (CuCo) method for learning to design and control VSRs through an easy-to-difficult process. Specifically, we gradually expand the design space from a small size to the target size through a predefined curriculum. At each learning stage of the curriculum, we use reinforcement learning to simultaneously train the design policy and the control policy, which is enabled by incorporating the design process into the environment and using differentiable policy representations. The converged morphology and the learned policies from the previous stage are inherited and serve as the starting point for the next stage. In empirical studies, we show that CuCo creates larger robots with better performance more efficiently than prior approaches that learn from scratch in the design space of the target size, because it reuses the practical design and control patterns learned within each stage.

1. INTRODUCTION

The philosophy of embodied cognition (Pfeifer & Bongard, 2006; Pfeifer et al., 2014) has inspired the view in robotics that a robot's ability to interact with the environment depends on both its brain (control policy) and its body (morphology), which are inherently coupled (Spielberg et al., 2019; Gupta et al., 2021). However, finding an optimal robot morphology and its controller for a given task is often infeasible. The major challenge is the enormous combined design and policy space. First, the freedom to choose the number of multi-material modules and the ways they are connected makes the design space notoriously difficult to explore (Medvet et al., 2022). For instance, in a robot simulator (Liu et al., 2020), there are over 4 × 10^8 possible morphologies for a robot composed of only 12 modules. Second, evaluating a morphology requires a separate training procedure for its unique controller.

In this work, we consider the co-optimization of design and control of Voxel-based Soft Robots (VSRs) (Bhatia et al., 2021), a form of modular soft robots composed of elastic, multi-material cubic blocks. Unlike fragile fully-integrated robots, they can be easily disassembled and reassembled to adapt to a wide range of environments (Shah et al., 2020; Pigozzi et al., 2022). To explore the modular robot design space efficiently, prior approaches commonly rely on artificial evolution (Sims, 1994; Cheney et al., 2013; Medvet et al., 2021), which maintains a population of design prototypes and adopts a bi-level optimization loop: the outer loop optimizes morphology based on the fitness of the individual controllers trained in the inner loop. These methods, however, tend to learn from scratch in the target design space, where the combinatorial explosion is severe. As a result, they spend a large amount of time on policy optimization and evaluation. Additionally, their separate training procedures prevent design and control experience from being shared across different robots.
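
To make the cost structure of the bi-level loop described above concrete, the following is a minimal Python sketch of a population-based outer loop wrapped around a per-morphology inner loop. It is an illustration under stated assumptions, not the procedure of any cited method: the names `train_controller`, `mutate`, and `bilevel_evolution` are hypothetical, a morphology is reduced to a flat list of 0/1 voxel flags, and the inner "training run" is replaced by a toy fitness function.

```python
import random


def train_controller(morphology, rng):
    """Hypothetical inner loop: stands in for a full controller training run.

    Assumption: returns a toy fitness (filled-voxel count plus noise)
    instead of actually training a control policy.
    """
    return sum(morphology) + rng.random()


def mutate(morphology, rng):
    """Flip one randomly chosen voxel (toy mutation operator)."""
    child = list(morphology)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def bilevel_evolution(num_voxels=25, pop_size=8, generations=5, seed=0):
    """Outer loop over morphologies; each evaluation triggers one inner run."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(num_voxels)] for _ in range(pop_size)
    ]
    inner_runs = 0  # counts separate controller training procedures
    best = None     # (fitness, morphology) of the best design seen so far

    for _ in range(generations):
        scored = []
        for morphology in population:
            # Every candidate gets its own controller, trained from scratch.
            fitness = train_controller(morphology, rng)
            inner_runs += 1
            scored.append((fitness, morphology))

        scored.sort(key=lambda pair: pair[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]

        # Keep the top half and refill the population with mutated copies.
        survivors = [m for _, m in scored[: pop_size // 2]]
        offspring = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + offspring

    return best, inner_runs
```

In this sketch the number of inner training runs is `pop_size * generations` (40 with the default arguments), and nothing learned for one morphology is reused for another, which is the inefficiency noted above.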

