END-TO-END INVARIANCE LEARNING WITH RELATIONAL INDUCTIVE BIASES IN MULTI-OBJECT ROBOTIC MANIPULATION

Abstract

Although reinforcement learning has seen remarkable progress in recent years, solving robust dexterous object-manipulation tasks in multi-object settings remains a challenge. In this paper, we focus on models that can learn manipulation tasks in fixed multi-object settings and extrapolate this skill zero-shot, without any drop in performance, when the number of objects changes. We consider the generic task of moving a single cube out of a set to a goal position. We find that previous approaches, which primarily leverage attention- and graph neural network-based architectures, scale quadratically in the number of objects K and do not exhibit this invariance when the number of input objects changes. We analyse the effects of different relational inductive biases on generalization and then propose an efficient plug-and-play module that overcomes these limitations. Besides exceeding the performance of prior approaches in their training environment, we show that our approach, which scales linearly in K, allows agents to extrapolate and generalize zero-shot to any number of objects.

1. INTRODUCTION

Deep reinforcement learning (RL) has witnessed remarkable progress in recent years, particularly in domains such as video games and other synthetic toy settings (Mnih et al., 2015; Silver et al., 2016; Vinyals et al., 2019). On the other hand, applying deep RL to real-world robotic setups, such as learning seemingly simple dexterous manipulation tasks in multi-object settings, still faces many fundamental limitations that are the focus of numerous recent works (Duan et al., 2017; Janner et al., 2018; Deisenroth et al., 2011; Kroemer et al., 2018; Andrychowicz et al., 2020; Rajeswaran et al., 2017; Lee et al., 2021; Funk et al., 2021). The RL problem in robotic setups is much more challenging (Dulac-Arnold et al., 2019). Compared to discrete toy environments, state and action spaces are continuous, and solving tasks typically requires long horizons over which the agent must apply long sequences of precise low-level control actions. Accordingly, exploration under easy-to-define sparse rewards becomes feasible only with enormous amounts of data. This is usually impossible in the real world and likewise prohibitive in computationally demanding realistic physics simulators. To alleviate this, manually designing task-specific dense reward functions is usually required, but this can often lead to undesirable or very narrow solutions. Numerous approaches exist to alleviate this further, e.g. imitation learning from expert demonstrations (Abbeel & Ng, 2004), curricula (Narvekar et al., 2020), or model-based learning (Kaelbling et al., 1996). Another promising path is to constrain the possible solution space of a learning agent by encoding suitable inductive biases in its architecture (Geman et al., 1992). Choosing inductive biases that leverage the underlying problem structure can help to learn solutions that facilitate desired generalization capabilities (Mitchell, 1980; Baxter, 2000; Hessel et al., 2019).
In robotics, multi-object manipulation tasks naturally suit a compositional description of their current state in terms of symbol-like entities (such as physical objects, robot parts, etc.). These representations can be obtained directly in simulator settings and are ultimately expected to be inferred robustly by learned object-perception modules (Greff et al., 2020a; Kipf et al., 2021; Locatello et al., 2020). While such a compositional understanding is in principle considered crucial for any systematic generalization ability (Greff et al., 2020b; Spelke, 1990; Battaglia et al., 2018; Garnelo et al., 2016), it remains an open question how to design an agent that can process this type of input data to leverage this promise.
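To make the scaling argument from the abstract concrete, the following sketch contrasts a pairwise self-attention encoder, whose K x K score matrix costs O(K^2) in the number of object entities, with a permutation-invariant pooled encoding that costs O(K). This is a minimal illustration under our own naming (pairwise_attention, pooled_encoding are hypothetical helpers), not the paper's actual module.

```python
import numpy as np

def pairwise_attention(objs):
    """Self-attention over K object features of dimension d.

    Builds a (K, K) score matrix, so compute and memory grow as O(K^2).
    """
    K, d = objs.shape
    scores = objs @ objs.T / np.sqrt(d)                 # (K, K) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ objs                               # (K, d) attended features

def pooled_encoding(objs):
    """Permutation-invariant mean pooling over K objects: O(K) compute.

    The output is identical for any reordering of the input objects,
    and is defined for any K, which is what enables zero-shot transfer
    to a different number of objects.
    """
    return objs.mean(axis=0)                            # (d,)
```

Because the pooled encoding never forms pairwise terms, its cost and output dimensionality are independent of K, whereas the attention encoder's intermediate score matrix grows quadratically.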

