CURRICULUM-BASED CO-DESIGN OF MORPHOLOGY AND CONTROL OF VOXEL-BASED SOFT ROBOTS

Abstract

Co-design of morphology and control of a Voxel-based Soft Robot (VSR) is challenging because of the notoriously difficult bi-level optimization it involves. In this paper, we present a Curriculum-based Co-design (CuCo) method for learning to design and control VSRs through an easy-to-difficult process. Specifically, we gradually expand the design space from a small size to the target size through a predefined curriculum. At each learning stage of the curriculum, we use reinforcement learning to train the design policy and the control policy simultaneously, which is enabled by incorporating the design process into the environment and using differentiable policy representations. The converged morphology and the learned policies from the previous stage are inherited and serve as the starting point for the next stage. In empirical studies, we show that CuCo is more efficient at creating larger, better-performing robots by reusing the practical design and control patterns learned within each stage, compared with prior approaches that learn from scratch in the target-size design space.

1. INTRODUCTION

The philosophy of embodied cognition (Pfeifer & Bongard, 2006; Pfeifer et al., 2014) has inspired the view in robotics that a robot's ability to interact with its environment depends on both its brain (control policy) and its body (morphology), which are inherently coupled (Spielberg et al., 2019; Gupta et al., 2021). However, finding an optimal robot morphology and controller for a given task is often infeasible. The major challenge is the enormous combined design and policy space. First, the freedom to choose the number of multi-material modules and the ways they are connected makes the design space notoriously difficult to explore (Medvet et al., 2022). For instance, in a robot simulator (Liu et al., 2020), there are over 4 × 10^8 possible morphologies for a robot composed of only 12 modules. Second, evaluating a morphology requires a separate training procedure for its unique controller. In this work, we consider the co-optimization of design and control of Voxel-based Soft Robots (VSRs) (Bhatia et al., 2021), a form of modular soft robot composed of elastic, multi-material cubic blocks. Unlike fragile, fully integrated robots, VSRs can be easily disassembled and reassembled to adapt to a wide range of environments (Shah et al., 2020; Pigozzi et al., 2022). To explore the modular robot design space efficiently, prior approaches commonly rely on artificial evolution (Sims, 1994; Cheney et al., 2013; Medvet et al., 2021), which maintains a population of design prototypes and adopts a bi-level optimization loop: the outer loop optimizes morphology based on the fitness of the individual controllers trained in the inner loop. These methods, however, tend to learn from scratch in the target design space, where the combinatorial explosion is severe. As a result, they spend a large amount of time on policy optimization and evaluation.
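To make the combinatorial explosion concrete, here is a back-of-the-envelope count. It assumes, for illustration only, that each cell of an n × n grid independently takes one of the 5 voxel kinds shown in Figure 1(C), and it ignores connectivity and other validity constraints. It is therefore an upper bound on the design space rather than a count of valid robots, and it is not how the 4 × 10^8 figure above was obtained.

```python
NUM_VOXEL_TYPES = 5  # assumption: every cell picks one of 5 voxel kinds


def design_space_size(n: int, num_types: int = NUM_VOXEL_TYPES) -> int:
    """Upper bound: ways to assign one of `num_types` voxel kinds to each cell of an n x n grid.

    Connectivity and other validity constraints are ignored, so valid robots are fewer.
    """
    return num_types ** (n * n)


if __name__ == "__main__":
    # 3x3 and 7x7 follow Figure 1(A); 5x5 is a hypothetical intermediate stage
    for n in (3, 5, 7):
        print(f"{n}x{n}: {design_space_size(n):.2e} assignments")
```

Under this assumption the count grows from 5^9 ≈ 2.0 × 10^6 for a 3 × 3 grid to 5^49 ≈ 1.8 × 10^34 for a 7 × 7 grid, which illustrates why searching the target space from scratch is costly.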
In addition, their separate training procedures make it difficult to share design and control experience across different robots. In view of these challenges, we propose a Curriculum-based Co-design (CuCo) method for learning to design and control VSRs from easy to difficult (Figure 1). CuCo draws inspiration from Curriculum Learning (CL), which holds that starting with easier tasks can help an agent learn better when it is later presented with more difficult ones (Bengio et al., 2009). The key to our approach is to expand the design space gradually from a small size to the target size through a predefined curriculum. Specifically, at each stage of the curriculum, we learn practical design and control patterns via Reinforcement Learning (RL) (Schulman et al., 2017), which is enabled by incorporating the design process into the environment and using differentiable policy representations. The converged morphology and the learned policies from the previous stage are inherited and serve as the starting point for the next stage. By exploiting this previously acquired design and control knowledge, CuCo can quickly produce robust, better-performing morphologies in larger design spaces. To handle incompatible state-action spaces and make control patterns transferable across morphologies, we model the local observations of all voxels as a sequence and adopt the self-attention mechanism (Vaswani et al., 2017) in our control policy to capture the internal dependencies between voxels, which accommodates dynamic changes in morphology. We also add material information to each voxel's observation so that the control policy is conditioned on the robot's morphology; this eliminates the need for a population of prototypes and enables experience to be shared across different robots, improving sample efficiency and flexibility.
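The voxel-sequence control idea above can be sketched as a single self-attention layer in NumPy. Everything here is an assumption for illustration: the per-voxel observation size (`OBS_DIM`), the one-hot material encoding, the single head with a residual connection, and the `tanh` output head are hypothetical choices rather than CuCo's actual architecture, and the weights are random rather than trained with RL.

```python
import numpy as np

NUM_MATERIALS = 5  # 5 voxel kinds, as in Figure 1(C); one-hot encoding is assumed
OBS_DIM = 4        # hypothetical size of each voxel's local sensor reading
D_MODEL = 16       # hypothetical embedding width


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class AttentionController:
    """Single-head self-attention policy shared by robots of any size (sketch)."""

    def __init__(self, rng, scale=0.1):
        d_in = OBS_DIM + NUM_MATERIALS
        self.W_embed = rng.normal(0.0, scale, (d_in, D_MODEL))
        self.W_q = rng.normal(0.0, scale, (D_MODEL, D_MODEL))
        self.W_k = rng.normal(0.0, scale, (D_MODEL, D_MODEL))
        self.W_v = rng.normal(0.0, scale, (D_MODEL, D_MODEL))
        self.W_out = rng.normal(0.0, scale, (D_MODEL, 1))

    def __call__(self, local_obs, materials):
        # local_obs: (N, OBS_DIM) float array, one row per voxel
        # materials: (N,) int array of material ids in [0, NUM_MATERIALS)
        one_hot = np.eye(NUM_MATERIALS)[materials]                       # (N, NUM_MATERIALS)
        x = np.concatenate([local_obs, one_hot], axis=1) @ self.W_embed  # (N, D_MODEL) tokens
        q, k, v = x @ self.W_q, x @ self.W_k, x @ self.W_v
        weights = softmax(q @ k.T / np.sqrt(D_MODEL), axis=-1)           # (N, N) voxel-to-voxel
        h = x + weights @ v                                              # residual connection
        return np.tanh(h @ self.W_out).ravel()                           # (N,) actions in [-1, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = AttentionController(rng)
    for n_voxels in (9, 25):  # e.g. a 3x3 robot and a 5x5 robot
        obs = rng.normal(size=(n_voxels, OBS_DIM))
        mats = rng.integers(0, NUM_MATERIALS, size=n_voxels)
        print(n_voxels, policy(obs, mats).shape)
```

Because the same weight matrices are applied to every token, one set of parameters handles 9 voxels and 25 voxels alike, which is what allows a controller learned in a small grid to be reused in a larger one. Without positional encodings this layer is permutation-equivariant, so any positional information would have to be carried in the local observations themselves.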
For the design policy, we propose to use a Neural Cellular Automaton (NCA) (Mordvintsev et al., 2020), which takes multiple actions to grow a robot from an initial seed (morphology). An NCA encodes complex patterns in a neural network and can generate different developmental outcomes with a compact set of trainable parameters. In this paper, we make the following contributions: (1) We propose CuCo, a curriculum-based co-design approach for learning to design and control VSRs. (2) We demonstrate that the practical design and control patterns learned in small design spaces can be exploited to produce high-performing morphologies in larger design spaces. (3) We show that our NCA-based design policy and self-attention-based control policy can share design and control patterns across morphologies of different sizes, improving sample efficiency and flexibility. (4) Comprehensive experimental studies show that CuCo is more efficient at creating larger, better-performing robots than prior approaches that learn from scratch in the target design space.
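A minimal NumPy sketch of NCA-style growth follows, together with one plausible way to reuse a smaller morphology as the seed of a larger grid. All specifics are assumptions for illustration: material channel 0 standing for an empty cell, the extra hidden channels, the 3 × 3 perception window, the rule that only cells next to the current body may change, and centring the inherited morphology. The update network is random and untrained; in CuCo the design policy is trained with RL, which this sketch does not show.

```python
import numpy as np

NUM_MATERIALS = 5              # assumption: channel 0 = empty, 1..4 = voxel kinds
HIDDEN_CHANNELS = 4            # hypothetical extra state channels per cell
C = NUM_MATERIALS + HIDDEN_CHANNELS


def seed_state(size, morphology=None):
    """NCA state (size, size, C). Default seed is a single voxel in the centre.
    A smaller `morphology` (2-D array of material ids) is centred in the grid,
    a hypothetical way of inheriting the previous curriculum stage's design."""
    state = np.zeros((size, size, C))
    state[:, :, 0] = 1.0                           # every cell starts empty
    if morphology is None:
        morphology = np.array([[1]])               # single non-empty voxel (kind 1 assumed)
    h, w = morphology.shape
    top, left = (size - h) // 2, (size - w) // 2
    state[top:top + h, left:left + w, :NUM_MATERIALS] = np.eye(NUM_MATERIALS)[morphology]
    return state


def alive(state):
    """Boolean mask of non-empty cells."""
    return state[..., :NUM_MATERIALS].argmax(-1) != 0


def near_alive(state):
    """Cells whose 3x3 neighbourhood contains a non-empty cell."""
    size = state.shape[0]
    a = np.pad(alive(state), 1)                    # pads with False
    return np.any([a[dy:dy + size, dx:dx + size] for dy in range(3) for dx in range(3)], axis=0)


def perceive(state):
    """Concatenate each cell's 3x3 neighbourhood (zero-padded) into one vector."""
    size = state.shape[0]
    padded = np.pad(state, ((1, 1), (1, 1), (0, 0)))
    patches = [padded[dy:dy + size, dx:dx + size] for dy in range(3) for dx in range(3)]
    return np.concatenate(patches, axis=-1)        # (size, size, 9 * C)


class NCADesigner:
    """Shared per-cell update rule applied for a fixed number of growth steps."""

    def __init__(self, rng, hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (9 * C, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, C))

    def step(self, state):
        h = np.maximum(perceive(state) @ self.W1, 0.0)   # same small MLP in every cell (ReLU)
        mask = near_alive(state)[..., None]              # only cells next to the body update
        return state + (h @ self.W2) * mask

    def grow(self, state, steps):
        for _ in range(steps):
            state = self.step(state)
        return state[..., :NUM_MATERIALS].argmax(-1)     # (size, size) material ids


if __name__ == "__main__":
    nca = NCADesigner(np.random.default_rng(0))
    small = nca.grow(seed_state(3), steps=3)                # stage 1: 3x3 design
    large = nca.grow(seed_state(7, small), steps=4)         # later stage seeded by it
    print(small, large, sep="\n\n")
```

Because the update rule is local and its weights are shared by every cell, the same parameters apply unchanged to a 3 × 3 or a 7 × 7 grid, which is the property the curriculum relies on.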

2. RELATED WORK

Brain-body Co-design. Co-designing a robot to solve a given task has challenged robot designers for decades (Sims, 1994; Wang et al., 2019b; Nygaard et al., 2020; Medvet et al., 2021; Hejna et al., 2021; Ma et al., 2021). Related work can be roughly divided into two categories: gradient-free and gradient-based. Most gradient-free methods rely on artificial evolution to explore



Figure 1: Schematic of our approach CuCo. (A) We investigate the brain-body co-design problem of VSRs by establishing an easy-to-difficult optimization process from a small design space (e.g., 3 × 3) to the target space (e.g., 7 × 7). (B) In the 3 × 3 design space, the design policy performs a finite number of steps to develop a VSR, starting from a single voxel. (C) A VSR is composed of 5 kinds of voxels.

