TASK-AGNOSTIC MORPHOLOGY EVOLUTION

Abstract

Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form. So, how should one go about finding a morphology fit for solving tasks in a given environment? Current approaches that co-adapt morphology and behavior use a specific task's reward as a signal for morphology optimization. However, this often requires expensive policy optimization and results in task-dependent morphologies that are not built to generalize. In this work, we propose a new approach, Task-Agnostic Morphology Evolution (TAME), to alleviate both of these issues. Without any task or reward specification, TAME evolves morphologies by only applying randomly sampled action primitives on a population of agents. This is accomplished using an information-theoretic objective that efficiently ranks agents by their ability to reach diverse states in the environment and the causality of their actions. Finally, we empirically demonstrate that across 2D, 3D, and manipulation environments TAME can evolve morphologies that match the multi-task performance of those learned with task supervised algorithms. Our code and videos can be found at https://sites.google.com/view/task-agnostic-evolution.

1. INTRODUCTION

Recently, deep reinforcement learning has shown impressive success in continuous control problems across a wide range of environments (Schulman et al., 2017; Barth-Maron et al., 2018; Haarnoja et al., 2018). The performance of these algorithms is usually measured via the reward achieved by a pre-specified morphology on a pre-specified task. Arguably, such a setting where both the morphology and the task are fixed limits the expressiveness of behavior learning. Biological agents, on the other hand, both adapt their morphology (through evolution) and are simultaneously able to solve a multitude of tasks. This is because an agent's performance is intertwined with its morphology: morphology fundamentally endows an agent with the ability to act. But how should one design morphologies that are performant across tasks? Recent works have approached morphology design using alternating optimization schemes (Hazard et al., 2018; Wang et al., 2019; Luck et al., 2020). Here, one step evaluates the performance of morphologies through behavior optimization while the second step improves the morphology design, typically through gradient-free optimization. It thus follows that the final morphology's quality depends directly on the quality of the learned behavior, as inadequate policy learning will provide a noisy signal to the morphology learner. This raises the question: is behavior learning a necessary crutch upon which morphology optimization must stand? Unfortunately, behavior learning across a multitude of tasks is both difficult and expensive, and a precise evaluation of each new candidate morphology requires explicit policy training. As a result, current research on morphology optimization primarily focuses on improving morphology for just one task (Wang et al., 2019; Ha, 2019).
By exploiting task-specific signals, learned morphologies demonstrate impressive performance but provide no guarantees of success outside of the portion of the environment covered by the given task. This is at odds with biological morphologies, which are usually able to complete many tasks within their environment. Fundamentally, we want agents that are generalists, not specialists, and as such we seek to shift the paradigm of morphology optimization to multi-task environments. One obvious solution would be to learn multiple behaviors in the behavior-learning step. However, such an approach has two challenges. First, multi-task RL is both algorithmically and computationally prohibitive, and hence is itself an active area of research (Fu et al., 2016; Yu et al., 2020). Second, it is unrealistic to assume that we can enumerate all the tasks we would want an agent to perform before its inception. In this work, we propose a framework for morphology design without the requirements of behavior learning or task specification. Instead, inspired by contemporary work in unsupervised skill discovery (Eysenbach et al., 2018; Sharma et al., 2019) and empowerment (Mohamed & Rezende, 2015), we derive a task-agnostic objective to evaluate the quality of a morphology. The key idea behind this evaluator is that a performant morphology is likely one that exhibits strong exploration and control by easily reaching a large number of states in a predictable manner. We formalize this intuition with an information-theoretic objective and use it as a fitness function in an evolutionary optimization loop.

Figure 1. Visual motivation of our information-theoretic objective, contrasting three cases: predictable but not diverse; diverse but not predictable; predictable and diverse. We seek to evolve morphologies that can easily explore a large number of states and remain predictable while doing so.
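To make the intuition concrete, one simple way to score "diverse yet predictable" behavior is the mutual information I(A; S) between a sampled action primitive A and the resulting state S: the score is high only when the morphology reaches many distinct states (high H(S)) and each primitive leads to a predictable outcome (low H(S | A)). The sketch below is our own illustrative simplification, not the estimator from the paper; `mi_fitness`, the equal-width discretization, and the toy data are assumptions made purely for exposition.

```python
import numpy as np

def mi_fitness(primitive_ids, final_states, n_bins=8):
    """Score a morphology by a plug-in estimate of I(A; S), where A is the
    discrete id of the sampled action primitive and S is a discretized
    final state. High MI requires both diverse reachable states (high H(S))
    and causal, predictable actions (low H(S | A))."""
    primitive_ids = np.asarray(primitive_ids)
    states = np.asarray(final_states, dtype=float)
    # Discretize each state dimension into equal-width bins, then hash the
    # per-dimension bin indices into a single discrete state id.
    lo, hi = states.min(axis=0), states.max(axis=0)
    width = np.where(hi > lo, hi - lo, 1.0)
    bins = np.minimum(((states - lo) / width * n_bins).astype(int), n_bins - 1)
    state_ids = np.array([hash(tuple(row)) for row in bins])

    def entropy(labels):
        # Plug-in (maximum-likelihood) entropy in nats.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    # I(A; S) = H(A) + H(S) - H(A, S)
    joint = np.array([hash((a, s)) for a, s in zip(primitive_ids, state_ids)])
    return entropy(primitive_ids) + entropy(state_ids) - entropy(joint)
```

Under this score, a morphology whose primitives reliably reach distinct regions of the state space outranks one whose outcomes are independent of the chosen primitive (no control) or one that always ends in the same state (no diversity).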
Candidate morphologies are mutated and then execute randomly sampled action primitives in their environment. The resulting data is used to estimate each agent's fitness per the information-theoretic objective. Our contributions are summarized as follows. First, we derive an easily computable information-theoretic objective to rank morphologies by their ability to explore and control their environment. Second, using this metric in conjunction with Graph Neural Networks, we develop Task-Agnostic Morphology Evolution (TAME), an unsupervised algorithm for discovering morphologies with an arbitrary number of limbs using only randomly sampled action primitives. Third, we empirically demonstrate that across 2D, 3D, and manipulation environments TAME can evolve morphologies that match the multi-task performance of those learned with task-supervised algorithms.
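The mutate-evaluate-select structure of such a loop can be sketched generically. Everything below is a hypothetical stand-in: TAME evolves graph-structured morphologies and scores them with the information-theoretic objective, whereas this toy represents a "morphology" as a list of limb lengths and uses a dummy fitness, purely to show the shape of reward-free evolutionary selection.

```python
import random

def evolve(population, fitness_fn, mutate_fn, generations=20, elite_frac=0.25):
    """Generic evolution loop: score each candidate with a reward-free
    fitness function, keep the top fraction (elitism), and refill the
    population with mutated copies of the survivors."""
    n_elite = max(1, int(len(population) * elite_frac))
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        elites = ranked[:n_elite]
        children = [mutate_fn(random.choice(elites))
                    for _ in range(len(population) - n_elite)]
        population = elites + children
    return max(population, key=fitness_fn)

# Toy demo: limb lengths evolve toward a target length; this dummy fitness
# stands in for the information-theoretic score estimated from rollouts.
random.seed(0)
target = 1.0
fitness = lambda m: -sum((x - target) ** 2 for x in m)
mutate = lambda m: [x + random.gauss(0.0, 0.1) for x in m]
pop = [[random.uniform(0.0, 2.0) for _ in range(3)] for _ in range(16)]
best = evolve(pop, fitness, mutate)
```

Because elites are carried over unchanged each generation, the best fitness in the population is monotonically non-decreasing, so the loop can only improve on the best random initialization.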

2. RELATED WORK

Our approach to morphology optimization builds on a broad set of prior work. For conciseness, we summarize the most relevant ones.

Morphology Optimization. Optimizing hardware has been a long-studied problem, yet most approaches share two common attributes: first, they focus on a single task, and second, they explicitly learn behavior for that task. Sims (1994) pioneered the field of morphology optimization by simultaneously evolving morphologies of 3D blocks and their policy networks. Cheney et al. (2013) and Cheney et al. (2018) reduce the search space by constraining form and function to oscillating 3D voxels. More recently, Nygaard et al. (2020) evolve the legs of a real-world robot. Unlike TAME, these approaches depend on task reward as a fitness function to maintain and update a population of agents. Quality-diversity based objectives (Lehman & Stanley, 2011; Nordmoen et al., 2020) augment regular task fitness with unsupervised objectives to discover a diverse population of agents. These approaches are complementary to ours, as quality-diversity metrics could be incorporated into the TAME algorithm for similar effects. RL has also been applied to optimize the parameters of an agent's pre-defined structure. Ha (2019) uses a population-based policy-gradient method, Schaff et al. (2019) utilize a distribution over hardware parameters, Luck et al. (2020) learn a morphology-conditioned value function, and Chen et al. (2020) treat hardware as policy parameters by simulating the agent with computational graphs. While these RL-based approaches explicitly learn task behavior to inform morphology optimization, we do not learn any policies due to their computational expense. Moreover, all these methods are gradient-based, restricting them to fixed-topology optimization where morphologies cannot have a varying number of joints.

Graph Neural Networks. Graph Neural Networks have been shown to be effective representations for policy learning across arbitrary agent topologies (Wang et al., 2018; Huang et al., 2020). These representations have also been used for agent design. Pathak et al. (2019) treat agent construction as an RL problem by having modular robots learn to combine. Most related to our work, Neural Graph

