DESIGNING AND USING GOAL-CONDITIONED TOOLS

Abstract

When limited by their own morphologies, humans and some species of animals have the remarkable ability to use objects from the environment to accomplish otherwise impossible tasks. Embodied agents might similarly unlock a range of additional capabilities through tool use. Recent techniques for jointly optimizing morphology and control via deep learning produce effective solutions for tasks such as designing locomotion agents. But while designing a single-goal morphology makes sense for locomotion, manipulation involves a wide variety of strategies depending on the task goals at hand. An agent must be capable of rapidly prototyping specialized tools for different goals. Therefore, we propose learning a designer policy, rather than a single design. A designer policy is conditioned on task goals, and outputs a design for a tool that helps solve the task. A design-agnostic controller policy can then perform manipulation using these tools. In this work, we introduce a reinforcement learning framework for learning these policies. Through simulated manipulation tasks, we show that this framework is more sample efficient than black-box optimization methods in multi-goal settings. It can also generalize to previously unseen goals via zero-shot interpolation or finetuning. Finally, we demonstrate that our framework allows tradeoffs between the complexity of design and control policies when required by practical constraints.

1. INTRODUCTION

Humans and animals are able to make use of tools to solve manipulation tasks when they are constrained by their own morphologies. For example, when an item has been lost below the sofa, one might quickly deduce that a long stick will help retrieve it. Chimpanzees have been observed using tools to access termites as food and to hold water (Goodall, 1964), and cockatoos are able to create stick-like tools by cutting shapes from wood (Auersperg et al., 2016). To accomplish a range of tasks with flexibility and resourcefulness comparable to humans, embodied agents should also be able to leverage tools. However, while any object in a human or robot's environment is a potential tool, these objects often need to be correctly selected or combined to form a useful aid for the task goal at hand. For this reason, we investigate not only how agents can perform control using tools, but also how they can design appropriate tools when presented with a particular task goal, such as a target position or object location.

For an embodied agent to design and use tools in realistic environments with minimal supervision, it must be able to efficiently learn design and control policies from reward signals specified based only on task completion. Furthermore, it should form specialized tools based on the task goal at hand, as shown in Figure 1. Finally, it should be able to work with the materials it has available, rather than attempting to create potentially unrealizable designs.

Without detailed supervision, how can an agent acquire effective policies for both tool design and control? The combined space of potential designs and control policies grows exponentially even for simple tasks, and the majority of candidate tools and trajectory executions may not make any progress towards task completion. As a result, zeroth-order optimization techniques like evolutionary strategies, as well as naive reinforcement learning approaches, require many samples from the environment to find solutions.
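The decomposition described above — a goal-conditioned designer policy that outputs a tool design once per episode, and a design-agnostic controller that then acts with that tool — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the network sizes, goal and design parameterizations, and tanh policies are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def designer_policy(goal, weights):
    # Maps a task goal (e.g. a 2-D target position) to tool design
    # parameters (e.g. segment lengths, joint angles), squashed to [-1, 1].
    hidden = np.tanh(goal @ weights["w1"])
    return np.tanh(hidden @ weights["w2"])

def controller_policy(obs, goal, design, weights):
    # Design-agnostic controller: conditions on the current observation,
    # the task goal, and the design emitted by the designer policy.
    x = np.concatenate([obs, goal, design])
    return np.tanh(x @ weights["w"])  # actuator commands in [-1, 1]

# Hypothetical dimensions: 2-D goal, 4 design parameters,
# 6-D observation, 3 actuators.
designer_w = {"w1": rng.normal(size=(2, 16)), "w2": rng.normal(size=(16, 4))}
controller_w = {"w": rng.normal(size=(2 + 4 + 6, 3))}

goal = np.array([0.5, -0.2])
design = designer_policy(goal, designer_w)  # chosen once per episode
obs = np.zeros(6)
action = controller_policy(obs, goal, design, controller_w)
```

In an RL loop, the design would be committed at the start of each episode and the controller queried at every timestep; both policies would be trained from the same task-completion reward.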
Prior works have studied joint learning of agent morphologies and control policies for locomotion tasks (Pathak et al., 2019; Luck et al., 2019; Hejna et al., 2021; Gupta et al., 2021), and methods leveraging graph neural networks (GNNs) have shown promising performance

