GOAL-AUXILIARY ACTOR-CRITIC FOR 6D ROBOTIC GRASPING WITH POINT CLOUDS

Anonymous

Abstract

6D robotic grasping beyond top-down bin-picking scenarios is a challenging task. Previous solutions based on 6D grasp synthesis followed by robot motion planning usually operate in an open-loop setting, without considering perception feedback or the dynamics and contacts of objects, which makes them sensitive to grasp synthesis errors. In this work, we propose a novel method for learning closed-loop control policies for 6D robotic grasping using point clouds from an egocentric camera. We combine imitation learning and reinforcement learning in order to grasp unseen objects and handle the continuous 6D action space, where expert demonstrations are obtained from a joint motion and grasp planner. We introduce a goal-auxiliary actor-critic algorithm, which uses grasping goal prediction as an auxiliary task to facilitate policy learning. The supervision on grasping goals can be obtained from the expert planner for known objects or from hindsight goals for unknown objects. Overall, our learned closed-loop policy achieves success rates over 90% on grasping various ShapeNet objects and YCB objects in simulation. The policy also transfers well to the real world, with only one failure out of ten different unseen objects grasped in the presence of perception noise¹.

1. INTRODUCTION

Robotic grasping of arbitrary objects is a challenging task. A robot needs to deal with objects it has never seen before and generate a motion trajectory to grasp them. Due to the complexity of the problem, the majority of works in the literature focus on bin-picking tasks, where top-down grasping is sufficient to pick up an object. Both grasp detection approaches (Redmon & Angelova, 2015; Pinto & Gupta, 2016; Mahler et al., 2017) and reinforcement learning-based methods (Kalashnikov et al., 2018; Quillen et al., 2018) have been introduced to tackle the top-down grasping problem. However, these methods struggle in environments where 6D grasping, i.e., 3D translation and 3D rotation of the robot gripper, is necessary, such as grasping a cereal box standing on a tabletop or placed in a cabinet. While 6D grasp synthesis has been studied using 3D models of objects (Miller & Allen, 2004; Eppner et al., 2019) and partial observations (ten Pas et al., 2017; Yan et al., 2018; Mousavian et al., 2019), these methods only generate 6D grasp poses of the robot gripper for an object, rather than a trajectory of gripper poses to reach and grasp it. As a result, a motion planner is needed to plan the grasping motion according to the grasp poses. The planned trajectory is usually executed in an open-loop fashion, since re-planning is expensive, and perception feedback during grasping as well as the dynamics and contacts of the object are often ignored, which makes grasping sensitive to grasp synthesis errors.

To overcome the limitations of this paradigm of 6D grasp synthesis followed by robot motion planning, we introduce a novel method for learning closed-loop 6D grasping policies from partially observed point clouds of objects. Our policy directly outputs the control action of the robot gripper, i.e., the relative 6D pose transformation of the gripper.
For the state representation, we adopt an egocentric view with a wrist camera mounted on the robot gripper, which avoids self-occlusion by the robot arm during grasping compared to using an external static camera. Additionally, we aggregate point clouds of the object from previous time steps to resolve ambiguities in the current view and to encode the observation history. Our point cloud representation provides richer 3D information for 6D grasping and generalizes better to different objects compared to RGB images.
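The aggregation step above can be sketched as follows (a simplified illustration under the assumption that camera poses are available from forward kinematics of the wrist-mounted camera; the function name is hypothetical): each per-step cloud is transformed into the latest camera frame and the results are concatenated.

```python
import numpy as np

def aggregate_clouds(clouds, camera_poses):
    """Fuse per-step point clouds (each Nx3, in its own camera frame) into
    the latest camera frame, given world-from-camera 4x4 poses per step."""
    T_latest_inv = np.linalg.inv(camera_poses[-1])
    fused = []
    for pts, T_cam in zip(clouds, camera_poses):
        # camera at step t -> world -> latest camera frame
        T = T_latest_inv @ T_cam
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
        fused.append((pts_h @ T.T)[:, :3])
    return np.concatenate(fused, axis=0)

# Example: the first camera is shifted by 1m along x relative to the
# second (latest) camera, so its point moves accordingly after fusion.
T1 = np.eye(4); T1[:3, 3] = [1.0, 0.0, 0.0]
fused = aggregate_clouds(
    [np.array([[0.0, 0.0, 0.0]]), np.array([[0.0, 0.0, 2.0]])],
    [T1, np.eye(4)],
)
```

In practice the fused cloud would also be downsampled to a fixed number of points before being fed to a point cloud encoder, but that detail is omitted here.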



¹ Videos and code are available at https://sites.google.com/view/gaddpg

