TASK-AGNOSTIC MORPHOLOGY EVOLUTION

Abstract

Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by its form. So, how should one go about finding a morphology well suited to solving tasks in a given environment? Current approaches that co-adapt morphology and behavior use a specific task's reward as a signal for morphology optimization. However, this often requires expensive policy optimization and results in task-dependent morphologies that are not built to generalize. In this work, we propose a new approach, Task-Agnostic Morphology Evolution (TAME), to alleviate both of these issues. Without any task or reward specification, TAME evolves morphologies by only applying randomly sampled action primitives on a population of agents. This is accomplished using an information-theoretic objective that efficiently ranks agents by their ability to reach diverse states in the environment and the causality of their actions. Finally, we empirically demonstrate that, across 2D, 3D, and manipulation environments, TAME can evolve morphologies that match the multi-task performance of those learned with task-supervised algorithms. Our code and videos can be found at https://sites.google.com/view/task-agnostic-evolution.

1. INTRODUCTION

Recently, deep reinforcement learning has shown impressive success in continuous control problems across a wide range of environments (Schulman et al., 2017; Barth-Maron et al., 2018; Haarnoja et al., 2018). The performance of these algorithms is usually measured by the reward a pre-specified morphology achieves on a pre-specified task. Arguably, such a setting, where both the morphology and the task are fixed, limits the expressiveness of behavior learning. Biological agents, on the other hand, adapt their morphology (through evolution) while simultaneously solving a multitude of tasks. This is because an agent's performance is intertwined with its morphology: morphology fundamentally endows an agent with the ability to act. But how should one design morphologies that perform well across tasks?

Recent works have approached morphology design using alternating optimization schemes (Hazard et al., 2018; Wang et al., 2019; Luck et al., 2020). Here, one step evaluates the performance of candidate morphologies through behavior optimization, while the second step improves the morphology design, typically through gradient-free optimization (see the illustrative sketch at the end of this section). It follows that the quality of the final morphology depends directly on the quality of the learned behavior, as inadequate policy learning gives the morphology learner a noisy signal. This raises the question: is behavior learning a necessary crutch upon which morphology optimization should stand?

Unfortunately, behavior learning across a multitude of tasks is both difficult and expensive, and a precise evaluation of each new candidate morphology requires explicit policy training. As a result, current research on morphology optimization primarily focuses on improving morphology for just one task (Wang et al., 2019; Ha, 2019). By exploiting task-specific signals, learned morphologies demonstrate impressive performance, but they provide no guarantee of success outside the portion of the environment covered by the given task. This is at odds with biological morphologies, which are usually able to complete many tasks within their environment. Fundamentally, we want agents that are generalists, not specialists, and as such we seek to shift the paradigm of morphology optimization to multi-task environments.

One obvious solution would be to learn multiple behaviors in the behavior-learning step. However, such an approach has two challenges. First, multi-task RL is both algorithmically and computationally demanding, and is hence itself an active area of research
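To make the alternating co-adaptation scheme concrete, the sketch below illustrates its generic outer/inner loop structure in Python. It is a minimal illustration under stated assumptions, not the procedure of any cited work: the function names (`sample_mutations`, `train_and_evaluate`, `co_adapt`), the Gaussian mutation rule, and the toy proxy return are placeholders, and in a real system the inner step would be full (and expensive) policy optimization on the task.

```python
import random

def sample_mutations(morphology, k, sigma=0.1):
    """Gradient-free proposal step: perturb each design parameter.
    (Stand-in: a real system would mutate the full kinematic tree.)"""
    return [[x + random.gauss(0.0, sigma) for x in morphology] for _ in range(k)]

def train_and_evaluate(morphology):
    """Stand-in for the expensive inner loop: train a policy for this
    morphology on the task, then report its average return. Here a toy
    proxy (negative distance to an arbitrary 'good' design) is used so
    the sketch runs end to end."""
    target = [1.0, 0.5, 0.25]
    return -sum((m - t) ** 2 for m, t in zip(morphology, target))

def co_adapt(initial_morphology, n_outer_iters=50, population_size=8):
    """Alternate between behavior evaluation and morphology improvement."""
    best_morphology, best_return = initial_morphology, float("-inf")
    for _ in range(n_outer_iters):
        for candidate in sample_mutations(best_morphology, population_size):
            # The design signal is only as good as this evaluation: an
            # under-trained policy feeds noise back to the morphology step.
            ret = train_and_evaluate(candidate)
            if ret > best_return:
                best_morphology, best_return = candidate, ret
    return best_morphology

print(co_adapt([0.5, 0.5, 0.5]))
```

Note how every candidate design requires its own behavior evaluation; when that evaluation is full policy training (and, in the multi-task case, training on many tasks), the outer loop becomes prohibitively expensive, which is the cost TAME aims to avoid.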

