HELP ME EXPLORE: COMBINING AUTOTELIC AND SOCIAL LEARNING VIA ACTIVE GOAL QUERIES

Abstract

Most approaches to open-ended skill learning train a single agent in a purely sensorimotor world. But because no human child learns everything on their own, we argue that sociality will be a key component of open-ended learning systems. This paper enables learning agents to blend individual and socially-guided skill learning through a new interaction protocol named Help Me Explore (HME). In social episodes triggered at the agent's demand, a social partner suggests a goal at the frontier of the agent's capabilities and, when the goal is reached, follows up with a new adjacent goal just beyond it. In individual episodes, the agent practices skills autonomously by pursuing goals it has already discovered through either its own experience or social suggestions. The idea of augmenting individual goal exploration with social goal suggestions is simple, general, and powerful. We demonstrate its efficiency on hard exploration problems: continuous mazes and a 5-block robotic manipulation task. With minimal social interventions, the HME agent outperforms both the purely social and purely individual agents.
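The alternation between social and individual episodes described above can be sketched as follows. This is a toy illustration only: the class and method names (`HMEAgent`, `ChainPartner`, `frontier_goal`, `beyond_goal`), the query-probability mechanism, and the integer "chain" goal space are our assumptions for exposition, not the paper's actual implementation.

```python
import random


class HMEAgent:
    """Toy sketch of the Help Me Explore (HME) interaction protocol."""

    def __init__(self, query_prob=0.2, seed=0):
        self.query_prob = query_prob   # how often the agent asks for social help
        self.discovered_goals = set()  # goals found individually or socially
        self.rng = random.Random(seed)

    def run_episode(self, partner, reach):
        """One episode: social (at the agent's demand) or individual."""
        if not self.discovered_goals or self.rng.random() < self.query_prob:
            # Social episode: the partner proposes a goal at the frontier
            # of the agent's current capabilities...
            goal = partner.frontier_goal()
            if reach(goal):
                self.discovered_goals.add(goal)
                # ...and, once it is reached, a new goal just beyond it.
                beyond = partner.beyond_goal(goal)
                if reach(beyond):
                    self.discovered_goals.add(beyond)
        else:
            # Individual episode: autonomously practice a known goal.
            goal = self.rng.choice(sorted(self.discovered_goals))
            reach(goal)


class ChainPartner:
    """Toy partner over integer goals 0, 1, 2, ...; the frontier advances
    each time the agent reaches the goal just beyond it."""

    def __init__(self):
        self.frontier = 0

    def frontier_goal(self):
        return self.frontier

    def beyond_goal(self, goal):
        self.frontier = goal + 1
        return self.frontier


# Usage: in this toy world every suggested goal is reachable, so social
# queries steadily extend the chain of discovered goals from 0 upward.
agent = HMEAgent(seed=0)
partner = ChainPartner()
reach = lambda g: True
for _ in range(50):
    agent.run_episode(partner, reach)
```

With occasional social queries, the agent's discovered-goal set grows as a contiguous chain from the origin, while most episodes remain individual practice, mirroring the "minimal social interventions" regime described in the abstract.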

1. INTRODUCTION

Open-ended learning is an important challenge in artificial intelligence (AI) where the goal is to design embodied artificial agents able to grow diverse repertoires of skills across their lives (Doncieux et al., 2018; Stooke et al., 2021). Goal-conditioned reinforcement learning (GC-RL), because it offers the possibility to train an agent on multiple goals in parallel, recently emerged as a key component in this quest (Schaul et al., 2015; Andrychowicz et al., 2017; Liu et al., 2022). But where do goals come from? Almost always, they are sampled from a fixed distribution over a predefined goal space; i.e. they come from an engineer. Beyond the heavy engineering burden it presupposes, this approach is fundamentally limited in realistic environments with infinite possibilities because engineers cannot foresee how the agents will learn or what they will be able to achieve. Instead, we must draw inspiration from the study of human learning and pursue a developmental approach: agents should be intrinsically motivated to learn to represent, generate, pursue and master their own goals; i.e. they must be autotelic (Steels, 2004; Colas et al., 2022).

In a recent paper introducing the autotelic framework, Colas et al. identify two challenges: 1) learning goal representations; 2) exploring the corresponding goal space and mastering the associated skills (Colas et al., 2022). Although, eventually, all autotelic agents must learn their own goal representations (challenge 1: Eysenbach et al. (2018); Nair et al. (2018b); Colas et al. (2020)), most existing approaches assume a pre-defined goal space and focus on its exploration and mastery (challenge 2: Schaul et al. (2015); Colas et al. (2019); Akakzia et al.; Colas et al. (2022)). In this second challenge - the one we focus on - agents must learn to organize their own learning trajectories by prioritizing goals with the objective of maximizing long-term skill mastery. Exploring and developing skills in unknown goal spaces can be hard when these spaces are not uniform: some goals might be easy, others hard; most of them might be unreachable. So how can an agent select which goals to practice now? Because they direct the agent's behavior, the selected goals impact the discovery of future goals (exploration) and the mastery of known ones (exploitation): yet another instance of the exploration-exploitation dilemma (Thrun, 1992). Existing methods belong to the family of automatic curriculum learning strategies (Portelas et al., 2020). They propose to replace the hard-to-optimize distal objective of general skill mastery with proxies such as intermediate difficulty (Florensa et al., 2018) or learning progress (Colas et al., 2019; Blaes et al., 2019; Akakzia et al.).
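The goal-selection step behind a learning-progress curriculum can be sketched as follows. The `LPGoalSampler` class and its parameters are hypothetical names of ours; the mechanism shown, sampling goals in proportion to the absolute change in recent success rate between two time windows, is one common instantiation of the learning-progress proxy, not the specific algorithm of any cited paper.

```python
import random
from collections import deque


class LPGoalSampler:
    """Sample goals in proportion to absolute learning progress,
    estimated as the change in success rate between an old and a
    recent window of episode outcomes."""

    def __init__(self, goals, window=10, eps=0.1, seed=0):
        self.goals = list(goals)
        self.window = window
        self.eps = eps  # residual uniform sampling keeps all goals explored
        self.results = {g: deque(maxlen=2 * window) for g in self.goals}
        self.rng = random.Random(seed)

    def update(self, goal, success):
        """Record the outcome (True/False) of one episode on `goal`."""
        self.results[goal].append(float(success))

    def learning_progress(self, goal):
        r = list(self.results[goal])
        if len(r) < 2 * self.window:
            return 0.0  # not enough data yet
        old, new = r[: self.window], r[self.window :]
        return abs(sum(new) / self.window - sum(old) / self.window)

    def sample(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(self.goals)
        weights = [self.learning_progress(g) for g in self.goals]
        if sum(weights) == 0:
            return self.rng.choice(self.goals)
        return self.rng.choices(self.goals, weights=weights)[0]


# Usage: a mastered goal and an unreachable goal both show zero progress,
# so the sampler focuses on the goal whose success rate is changing.
sampler = LPGoalSampler(["easy", "improving", "impossible"], window=10, eps=0.0)
for _ in range(20):
    sampler.update("easy", True)        # already mastered: no progress
    sampler.update("impossible", False)  # unreachable: no progress
for i in range(20):
    sampler.update("improving", i >= 10)  # failures, then successes
```

Note how this proxy sidesteps the exploration-exploitation dilemma mentioned above: goals with flat success rates, whether mastered or unreachable, receive no sampling mass beyond the small uniform-exploration term.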

