DESIGNING AND USING GOAL-CONDITIONED TOOLS

Abstract

When limited by their own morphologies, humans and some species of animals have the remarkable ability to use objects from the environment to accomplish otherwise impossible tasks. Embodied agents might similarly unlock a range of additional capabilities through tool use. Recent techniques for jointly optimizing morphology and control via deep learning produce effective solutions for tasks such as designing locomotion agents. But while designing a single-goal morphology makes sense for locomotion, manipulation involves a wide variety of strategies that depend on the task goals at hand. An agent must be capable of rapidly prototyping specialized tools for different goals. Therefore, we propose learning a designer policy, rather than a single design. A designer policy is conditioned on task goals and outputs a design for a tool that helps solve the task. A design-agnostic controller policy can then perform manipulation using these tools. In this work, we introduce a reinforcement learning framework for learning these policies. Through simulated manipulation tasks, we show that this framework is more sample-efficient than black-box optimization methods in multi-goal settings. It can also interpolate zero-shot or finetune to tackle previously unseen goals. Finally, we demonstrate that our framework allows tradeoffs between the complexity of the design and control policies when required by practical constraints.

1. INTRODUCTION

Humans and animals are able to use tools to solve manipulation tasks when they are constrained by their own morphologies. For example, when an item has been lost below the sofa, one might quickly deduce that a long stick will help retrieve it. Chimpanzees have been observed using tools to access termites as food and to hold water (Goodall, 1964), and cockatoos are able to create stick-like tools by cutting shapes from wood (Auersperg et al., 2016). To accomplish a range of tasks as flexibly and resourcefully as humans, embodied agents should also be able to leverage tools. However, while any object in a human's or robot's environment is a potential tool, these objects often need to be correctly selected or combined to form a useful aid for the task goal at hand. For this reason, we investigate not only how agents can perform control using tools, but also how they can design appropriate tools when presented with a particular task goal, such as a target position or object location. For an embodied agent to design and use tools in realistic environments with minimal supervision, it must be able to efficiently learn design and control policies from reward signals based only on task completion. Furthermore, it should form specialized tools based on the task goal at hand, as shown in Figure 1. Finally, it should work with the materials it has available, rather than attempting to create potentially unrealizable designs.

Without detailed supervision, how can an agent acquire effective policies for both tool design and control? The combined space of potential designs and control policies grows exponentially even for simple tasks, and the majority of candidate tools and trajectory executions may make no progress toward task completion. As a result, zeroth-order optimization techniques such as evolutionary strategies, as well as naive reinforcement learning approaches, require many environment samples to find solutions.
Prior works have studied joint learning of agent morphologies and control policies for locomotion tasks (Pathak et al., 2019; Luck et al., 2019; Hejna et al., 2021; Gupta et al., 2021), and methods leveraging graph neural networks (GNNs) have shown promising performance improvements using only task rewards as supervision (Yuan et al., 2022). However, these approaches optimize designs for a generic goal, such as maintaining balance or forward speed. Designing a single-goal morphology is suited to locomotion, but manipulation requires a range of strategies depending on the given task. An agent must be capable of rapidly prototyping specialized tools for different manipulation goals.

In this work, we tackle the challenge of learning design and control solely from task-progress rewards by building on recent work in joint morphology and control optimization for locomotion agents, which performs RL in a multi-stage Markov decision process (MDP) with GNN policies and value functions to achieve improved sample efficiency and performance in the joint learning setting. Combined with a simple chain-link tool design parameterization, this allows us to efficiently learn designer and controller policies together in a high-dimensional combined space. We train these policies on multiple goal settings for each task so that they can produce designs best suited to each goal and manipulate tools with varying geometries. Lastly, we investigate how tunable parameters can control the trade-off between the complexity of design and manipulation under resource constraints.

Our main contribution is a learning framework for embodied agents to design and use rigid tools for manipulation tasks. We leverage a multi-stage reinforcement learning pipeline to learn goal-specific tools alongside manipulation policies that can perform control with a range of tools.
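To make the two-stage structure concrete, the following is a minimal Python sketch (all names and the heuristic designer are our own illustration, not the paper's learned GNN policies): an episode begins with a design step that maps the goal to chain-link tool parameters, after which a design-agnostic controller acts with the resulting tool held fixed.

```python
import math
from dataclasses import dataclass

@dataclass
class ChainLinkDesign:
    """Chain-link tool: rigid segments with relative joint angles.
    (Hypothetical parameterization used only for illustration.)"""
    lengths: list
    angles: list

def designer_policy(goal, n_links=3):
    """Goal-conditioned design stage: map a task goal (target position)
    to tool parameters. A learned GNN policy would replace this heuristic."""
    reach = math.hypot(*goal)
    return ChainLinkDesign(
        lengths=[reach / n_links] * n_links,  # size segments to the goal distance
        angles=[0.0] * n_links,               # default to a straight stick
    )

def controller_policy(obs, design):
    """Design-agnostic control stage: the tool geometry is appended to the
    observation, so a single policy can manipulate tools of varying shape."""
    tool_features = design.lengths + design.angles
    # A learned policy would map (obs + tool_features) to an action;
    # here we return a fixed-dimension zero action as a placeholder.
    return [0.0] * 2

def run_episode(goal, horizon=5):
    """Multi-stage MDP rollout: one design step, then control steps."""
    design = designer_policy(goal)            # stage 1: design the tool
    obs = [0.0, 0.0]
    trajectory = []
    for _ in range(horizon):                  # stage 2: control with fixed tool
        trajectory.append(controller_policy(obs, design))
    return design, trajectory
```

Because the controller consumes the design as part of its input, the same control policy can be reused across the different goal-specific tools the designer produces.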
We demonstrate that this approach can jointly learn these policies sample-efficiently across a variety of sparse-reward manipulation tasks, outperforming zeroth-order stochastic optimization approaches. By introducing a tradeoff parameter between the complexity of the design and control components, our approach lets us adjust the learned components to fit resource and environmental constraints, such as available materials or energy costs. To the best of our knowledge, this work is the first to study learning goal-dependent tool design and control without any prior knowledge about the task.
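One plausible way to realize such a tradeoff parameter (a sketch under our own assumptions; the exact cost terms are not specified here) is to shape the task reward with complementary penalties on design and control complexity, so a single coefficient shifts the burden between the two components:

```python
def shaped_reward(task_reward, design_cost, control_cost, alpha=0.5):
    """Trade off design vs. control complexity with one coefficient.

    alpha near 1 penalizes elaborate tools (e.g. total material used);
    alpha near 0 penalizes control effort (e.g. energy spent) instead.
    (Hypothetical shaping term, not the paper's exact formulation.)
    """
    return task_reward - alpha * design_cost - (1.0 - alpha) * control_cost
```

Sweeping `alpha` during training would then yield a spectrum of solutions, from simple tools that demand dexterous control to elaborate tools that make control nearly trivial.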

2. RELATED WORK

Computational approaches to agent design. Many works have studied the problem of optimizing the design of robotic agents and end-effectors via model-based optimization (Kawaharazuka et al., 2020; Allen et al., 2022), generative modeling (Wu et al., 2019; Ha et al., 2020), evolutionary strategies (Hejna et al., 2021), stochastic optimization (Exarchos et al., 2022), or reinforcement learning (Li et al., 2021). These methods provide feedback to the design procedure by having the agent execute predefined trajectories or perform motion planning. In contrast, we aim to jointly learn control policies along with designing tool structures. In settings where the desired design is known but must be assembled from subcomponents, geometric reasoning (Nair et al., 2020) and reinforcement learning (Ghasemipour et al., 2022) have been used to compose objects into tools.

Learning robotic tool use. Several approaches have been proposed for empowering robots to learn to use tools. Affordance learning is one common paradigm (Fang et al., 2018; Brawer et al., 2020; Xu et al., 2021). Noguchi et al. (2021) integrate tool and gripper action spaces in a Transporter-style framework. Learned or simulated dynamics models (Allen et al., 2019; Xie et al., 2019; Girdhar et al., 2020; Lin et al., 2022) have also been used for model-based optimization of tool-aided control. These methods assume that a helpful tool is already present in the scene, whereas we focus on optimizing tool design in conjunction with learning manipulation, which is a more likely scenario for a generalist robot operating, for example, in a household.

Joint optimization of morphology and control. One approach for jointly solving tool design and manipulation problems is to formulate and solve nonlinear programs, which have been shown to be especially effective at longer-horizon sequential manipulation tasks (Toussaint et al., 2018; 2021).
In this work, we aim to apply our framework to arbitrary environments, and so we select a purely learning-based approach at the cost of increasing the complexity of the search space.



Figure 1: An agent may need to design and use different tools to fetch a high-up book (orange) or push it into the bookshelf (green). Therefore, it should rapidly prototype tools for the tasks at hand.

