CURRICULUM REINFORCEMENT LEARNING VIA MORPHOLOGY-ENVIRONMENT CO-EVOLUTION

Abstract

Throughout their long history, natural species have survived by evolving physical structures that adapt to environmental changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through "morphology-environment co-evolution (MECE)", in which the morphology is continually updated to adapt to the changing environment, while the environment is progressively modified to pose new challenges and stimulate improvement of the morphology. This produces a curriculum for training a generalizable RL agent whose morphology and policy are optimized across different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for these two policies, based solely on the learning dynamics of the RL agent; and (2) we design a scheduler that automatically determines when to change the environment and the morphology. Extensive ablation studies verify that both designs are critical to the success of MECE. In experiments on two classes of tasks, the morphology and RL policies trained via MECE generalize significantly better to unseen test environments than those produced by SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between morphology and environment is the key to this success.
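To make the training scheme above concrete, below is a minimal Python sketch of how such a co-evolution loop could be organized, assuming a plateau-based scheduler and a learning-progress reward. All names and interfaces (Scheduler, learning_progress, mece_loop, and the agent/policy objects) are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only -- names and interfaces are hypothetical,
# not the implementation described in this paper.

class Scheduler:
    """Decides *when* to trigger a morphology or environment change.
    Here: switch once the agent's learning progress stalls for `patience`
    consecutive checks (one plausible criterion, assumed for this sketch)."""

    def __init__(self, patience: int = 3, threshold: float = 0.01):
        self.patience, self.threshold = patience, threshold
        self.stale = 0

    def should_switch(self, progress: float) -> bool:
        self.stale = 0 if progress > self.threshold else self.stale + 1
        return self.stale >= self.patience


def learning_progress(returns: list) -> float:
    """Reward signal derived from the RL agent's learning dynamics:
    improvement of its average return between the two halves of a
    recent window (one plausible instantiation)."""
    if len(returns) < 2:
        return 0.0
    mid = len(returns) // 2
    return sum(returns[mid:]) / (len(returns) - mid) - sum(returns[:mid]) / mid


def mece_loop(agent, morph_policy, env_policy, env, morphology, n_iters=100):
    """Co-evolution outer loop: the agent trains in the current
    (environment, morphology) pair; its learning dynamics reward the two
    policies that mutate the morphology and the environment."""
    morph_sched, env_sched = Scheduler(), Scheduler()
    returns = []
    for _ in range(n_iters):
        returns.append(agent.train(env, morphology))   # inner RL training step
        progress = learning_progress(returns[-10:])    # recent-window progress
        if morph_sched.should_switch(progress):
            morphology = morph_policy.update(morphology, reward=progress)
        if env_sched.should_switch(progress):
            env = env_policy.update(env, reward=progress)
    return agent, morphology
```

In this sketch both outer policies share the same learning-progress signal and scheduler criterion; the paper designs a distinct reward for each policy, so this should be read only as a structural outline of the co-evolution loop.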

1. INTRODUCTION

Deep reinforcement learning (RL) has achieved unprecedented success on several challenging tasks (Lillicrap et al., 2016; Mnih et al., 2015). Although current RL can excel at a specified task in a fixed environment through massive training, it usually struggles to generalize to unseen tasks and/or adapt to new environments. A promising strategy to overcome this problem is to train the agent on multiple tasks in different environments (Wang et al., 2019a; Portelas et al., 2019; Gur et al., 2021; Jaderberg et al., 2017) via multi-task learning or meta-learning (Salimans et al., 2017; Finn et al., 2017). However, this increases the training cost, and the space of possible environments/tasks can be too large to be fully explored by RL. How to select the most informative and representative environments/tasks for training an RL agent to develop generalizable skills therefore becomes a critical open challenge.

Curriculum learning (Narvekar et al., 2020) for RL aims to develop a sequence of tasks that progressively improves an agent's generalization performance over multiple training stages. However, the curriculum usually relies on human heuristics, e.g., moving from easy environments to hard ones or chasing the tasks with the greatest progress, and lacks sufficient innate incentives from the agent itself to drive changes that improve the learning process.

Unlike RL agents, which do not actively seek new environments to improve their learning capability, natural species are fully motivated to do so in order to survive in a competitive world, and one underlying mechanism that drives them is evolution. Evolution is a race with the changing environment for every species, and a primary goal is to accelerate adaptation to new environments. Besides merely optimizing its control policy, evolution more notably changes the morphology of a species, i.e., its skeletal structure and the attributes of each part, to make it adapt to the environment. In fact, improving the morphology can be more critical, because there can exist a variety of actions or skills an agent cannot perform (no matter how well the control policy is optimized) without certain structures, e.g., more than one leg, a long-enough limb, or a 360-degree rotation joint. For RL, we claim that a good morphology should improve the agent's adaptiveness and versatility, i.e., enable it to learn faster and make more progress in different environments. From prehistoric humans to modern Homo sapiens, there is a definite association between the rise of civilization and the Homo sapiens'

