PRE-TRAINING FOR ROBOTS: LEVERAGING DIVERSE MULTITASK DATA VIA OFFLINE RL

Abstract

Recent progress in deep learning highlights the tremendous potential of utilizing diverse datasets for achieving effective generalization, and makes it enticing to leverage broad datasets to attain more robust generalization in robotic learning as well. However, in practice, we often want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore, we ask: how can we leverage existing diverse offline datasets in combination with small amounts of task-specific data to solve new tasks, while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. We present pre-training for robots (PTR), a framework based on offline RL that attempts to effectively learn new tasks by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. At its core, PTR applies an existing offline RL method, such as conservative Q-learning (CQL), but extends it to include several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To the best of our knowledge, PTR is the first offline RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens.

1. INTRODUCTION

Robotic learning methods based on reinforcement learning (RL) or imitation learning (IL) have led to a number of impressive results (Levine et al., 2016; Kalashnikov et al., 2018; Young et al., 2020; Kalashnikov et al., 2021; Ahn et al., 2022), but the generalization abilities of policies learned in this way are typically limited by the quantity and breadth of the data available to train them. In practice, the cost of real-world data collection for each task means that such methods often use smaller datasets, which leads to more limited generalization. A natural way to circumvent this limitation is to incorporate existing diverse robotic datasets into the training pipeline of a robot learning algorithm, analogously to how pre-training on diverse prior datasets has enabled rapid fine-tuning in supervised learning fields such as computer vision and NLP. But how can we devise algorithms that enable effective pre-training for robotic RL? In most cases, answering this question requires a method that can pre-train on existing data from a wide range of tasks and domains, and then provide a good starting point for efficiently learning a new task in a new domain. Prior approaches utilize such existing data by running imitation learning (IL) (Young et al., 2020; Ebert et al., 2021; Shafiullah et al., 2022) or by using representation learning methods (Nair et al., 2022) for pre-training and then fine-tuning with imitation learning. However, this may not necessarily lead to representations that can reason about the consequences of their actions. In contrast, end-to-end RL offers a more general paradigm that can be effective for both pre-training and fine-tuning, and is applicable even when the assumptions made by prior work are violated. Therefore, we ask: can we devise a simple and unified framework where both the pre-training and fine-tuning processes use RL?
This presents significant challenges: leveraging large amounts of offline multi-task data requires high-capacity models, and training such models with RL can be very challenging (Bjorck et al., 2021).
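To make the offline RL backbone concrete, the following is a minimal numpy sketch of the conservative Q-learning (CQL) objective that PTR builds on, written for a toy discrete-action setting. The function name, shapes, and the single fixed conservatism coefficient `alpha` are illustrative assumptions; the paper's actual design decisions (continuous actions, network architecture, fine-tuning procedure) are not captured here.

```python
import numpy as np

def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Toy conservative Q-learning loss (discrete actions).

    q_values:   (batch, num_actions) array of Q(s, .) predictions
    actions:    (batch,) indices of the actions taken in the dataset
    td_targets: (batch,) bootstrapped Bellman targets r + gamma * max_a' Q_target(s', a')
    alpha:      weight on the conservative regularizer
    """
    batch = np.arange(len(actions))
    q_data = q_values[batch, actions]  # Q(s, a) at dataset actions

    # Numerically stable logsumexp over the action dimension
    q_max = q_values.max(axis=1, keepdims=True)
    logsumexp = np.log(np.exp(q_values - q_max).sum(axis=1)) + q_max[:, 0]

    # Conservative term: push down Q on all actions (via logsumexp),
    # push up Q on actions actually present in the dataset
    conservative_term = (logsumexp - q_data).mean()

    # Standard TD error on dataset transitions
    bellman_term = 0.5 * ((q_data - td_targets) ** 2).mean()

    return alpha * conservative_term + bellman_term
```

For a batch with uniform Q-values, the Bellman term vanishes when the targets match, and the loss reduces to the log of the number of actions, which illustrates that the regularizer penalizes spreading value mass over out-of-distribution actions.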

