UNIVERSAL EMBODIED INTELLIGENCE: LEARNING FROM CROWD, RECOGNIZING THE WORLD, AND REINFORCED WITH EXPERIENCE

Abstract

Interactive artificial intelligence for motion control is an interesting topic, especially when universal knowledge that adapts to multiple tasks and environments is desired. Despite increasing efforts in reinforcement learning (RL) studies assisted by transformers, such methods may be limited by the offline training pipeline, which prohibits exploration and generalization. Motivated by cognitive and behavioral psychology, such an agent should be able to learn from others, recognize the world, and practice based on its own experience. In this study, we propose the Online Decision MetaMorphFormer (ODM) framework, which attempts to achieve these learning modes with a unified model architecture that both highlights the agent's own body perception and produces action and observation predictions. ODM can be applied to any arbitrary agent with a multi-joint body, located in different environments, and trained on different types of tasks. A large-scale pretraining dataset is used to warm up ODM, while the targeted environment continues to reinforce the universal policy. Extensive online experiments, as well as few-shot and zero-shot tests in unseen environments and never-experienced tasks, verify ODM's performance and generalization ability. Our study sheds light on research into general artificial intelligence in embodied and cognitive fields. Code, results, and video examples can be found at https://baimaxishi.github.io/.

1. INTRODUCTION

Research on embodied intelligence focuses on learning a control policy for an agent with a given morphology (joints, limbs, motion capabilities), and it has long been debated whether the control policy should be general or task-specific. With the advance of large-scale data technology and cloud computing, the idea of artificial general intelligence (AGI) has received substantial interest (Reed et al., 2022). Accordingly, a natural motivation is to develop a universal control policy that serves agents of different morphologies and adapts easily to different scenes. It is argued that such a smart agent should be able to identify its 'active self' by recognizing egocentric, proprioceptive perception, react to exteroceptive observations, and possess a forward model of the world (Hoffmann & Pfeifer, 2012). However, few machine learning frameworks achieve all of this so far, although previous studies have made similar attempts in one or several aspects.

Reinforcement learning (RL) learns a policy interactively from environment feedback and can therefore be viewed as a general solution to the embodied control problem. Conventional RL can solve single-task problems in an online paradigm, but it is relatively difficult to implement, slow in practice, and lacks generalization and adaptation ability. Offline RL eases implementation but at the cost of performance degradation. Inspired by recent progress of large models in the language and vision fields, transformer-based RL (Reed et al., 2022; Chen et al., 2021; Lee et al., 2022; Janner et al., 2021; Zheng et al., 2022; Xu et al., 2022) casts RL trajectories as a long time sequence and trains the model in an auto-regressive manner. Such methods provide an effective approach to training a generalist agent for different tasks and environments, but they usually perform worse than classic RL and fail to capture morphology information.
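The trajectory-as-sequence idea behind transformer-based RL can be illustrated with a minimal sketch of Decision-Transformer-style tokenization: each timestep contributes a (return-to-go, state, action) triple, and the concatenated sequence is what an auto-regressive model is trained on. This is purely illustrative; the function names are hypothetical and do not come from the ODM codebase.

```python
# Illustrative sketch (not the ODM implementation): turn an RL trajectory
# into the interleaved token sequence used by autoregressive sequence models.

def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at each timestep: discounted sum of future rewards."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        rtg.append(running)
    return list(reversed(rtg))

def tokenize_trajectory(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep.
    An autoregressive transformer is then trained to predict each
    action token from all preceding tokens in this sequence."""
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens += [("R", g), ("s", s), ("a", a)]
    return tokens

# Example: a 3-step trajectory with rewards 1, 0, 2
seq = tokenize_trajectory(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
# seq holds 9 tokens: (R, 3.0), (s, "s0"), (a, "a0"), (R, 2.0), ...
```

Conditioning on the return-to-go token is what lets such models be steered toward high-reward behavior at inference time, but the training remains offline, which is the limitation this work targets.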
In contrast, MetaMorph (Gupta et al., 2022) chooses to encode the agent's body mor-

