HELP ME EXPLORE: COMBINING AUTOTELIC AND SOCIAL LEARNING VIA ACTIVE GOAL QUERIES

Abstract

Most approaches to open-ended skill learning train a single agent in a purely sensorimotor world. But because no human child learns everything on their own, we argue that sociality will be a key component of open-ended learning systems. This paper enables learning agents to blend individual and socially-guided skill learning through a new interaction protocol named Help Me Explore (HME). In social episodes triggered at the agent's demand, a social partner suggests a goal at the frontier of the agent's capabilities and, when that goal is reached, follows up with a new adjacent goal just beyond it. In individual episodes, the agent practices skills autonomously by pursuing goals it has already discovered through either its own experience or social suggestions. The idea of augmenting individual goal exploration with social goal suggestions is simple, general and powerful. We demonstrate its efficiency on hard exploration problems: continuous mazes and a 5-block robotic manipulation task. With minimal social interventions, the HME agent outperforms both the purely social and purely individual agents.

1. INTRODUCTION

Open-ended learning is an important challenge in artificial intelligence (AI) where the goal is to design embodied artificial agents able to grow diverse repertoires of skills across their lives (Doncieux et al., 2018; Stooke et al., 2021). Goal-conditioned reinforcement learning (GC-RL), because it offers the possibility to train an agent on multiple goals in parallel, recently emerged as a key component in this quest (Schaul et al., 2015; Andrychowicz et al., 2017; Liu et al., 2022). But where do goals come from? Almost always, they are sampled from a fixed distribution over a predefined goal space; i.e. they come from an engineer. Beyond the heavy engineering burden it presupposes, this approach is fundamentally limited in realistic environments with infinite possibilities because engineers cannot foresee how the agents will learn or what they will be able to achieve. Instead, we must draw inspiration from the study of human learning and pursue a developmental approach: agents should be intrinsically motivated to learn to represent, generate, pursue and master their own goals, i.e. they must be autotelic (Steels, 2004; Colas et al., 2022). Among the challenges this entails, we focus on goal-space exploration and skill mastery: agents must learn to organize their own learning trajectories by prioritizing goals with the objective of maximizing long-term skill mastery. Exploring and developing skills in unknown goal spaces can be hard when these spaces are not uniform: some goals might be easy, others hard; most of them might be unreachable. So how can an agent select which goals to practice now? Because they direct the agent's behavior, the selected goals impact the discovery of future goals (exploration) and the mastery of known ones (exploitation), yet another instance of the exploration-exploitation dilemma (Thrun, 1992). Existing methods belong to the family of automatic curriculum learning (ACL) strategies (Portelas et al., 2020).
They propose to replace the hard-to-optimize distal objective of general skill mastery with proxies such as intermediate difficulty (Florensa et al., 2018), learning progress (Colas et al., 2019; Blaes et al., 2019; Akakzia et al., 2020) or novelty (Pong et al., 2019; Ecoffet et al., 2019). But all ACL strategies are so far limited to generating goals from the distribution of effects the agents have already experienced. Can agents demonstrate efficient exploratory capacities if they only target goals they have already achieved (Campos et al., 2020)? To solve this problem, we must once again draw inspiration from the study of human learning, this time from socio-cultural psychology. Philosophers, psychologists, linguists and roboticists alike have argued for the central importance of rich socio-cultural environments in human development (Wood et al., 1976; Vygotsky & Cole, 1978; Berk, 1994; Tomasello, 1999; Lindblom & Ziemke, 2003; Lupyan, 2012; Colas, 2021). Humans are social beings wired to interact with and learn from others (Vygotsky, 1978; Tomasello, 1999; 2019). When they explore, it is often through socially-guided interactions (e.g. by having a parent organize the playground) and it often relies on the inter-generational, population-based exploration of others, a phenomenon known as the cultural ratchet (Tomasello, 1999). In guided-play interactions (Weisberg et al., 2013; Yu et al., 2018), caretakers scaffold the learning environments of children and help them practice new skills just beyond their current capacities, i.e. in Vygotsky's zone of proximal development (Vygotsky, 1978). Can AI learn from these insights? We believe so. This paper introduces a novel social interaction protocol for autotelic agents named Help Me Explore (HME). HME consists of a succession of individual and social episodes. In individual episodes, the agent pursues self-generated goals sampled from the set of outcomes it has already experienced during training.
In social episodes, a social partner suggests a novel goal to the agent and decomposes it into two consecutive sub-goals: 1) a frontier goal that the agent has already discovered and, if it is reached, 2) a beyond goal never achieved by the agent but just beyond its current abilities. The frontier goal acts as a stepping stone facilitating the discovery and mastery of the beyond goal. Combined with a powerful autotelic RL algorithm, HME makes significant steps towards teachable autotelic agents: autonomous agents that leverage external social signals and interweave them with their self-organized skill learning processes to gain autonomy, creativity and skill mastery (Sigaud et al., 2021). Our contributions are twofold:

• Help Me Explore (HME). This social interaction protocol lets agents blend social and individual goal explorations through social goal suggestions. It is generally applicable to any multi-goal environment and is tested in a 5-block robotic manipulation domain, where it enables agents to master 5-block stacking, a notoriously hard exploration challenge.

• A socially-sensitive autotelic agent (HME agent). We augment an existing autotelic RL agent with two social capabilities: 1) an active learning mechanism allowing the agent to self-monitor its learning progress and, when it stagnates, query the social partner for a goal suggestion; and 2) the ability to internalize social goal suggestions and rehearse them autonomously during individual episodes. This ability is inspired by Vygotsky's concept of internalization, which describes how learners model social processes occurring in the zone of proximal development to augment their future autonomy and capacities (Vygotsky, 1978). These two mechanisms drastically reduce the amount of social interaction required to achieve skill mastery.
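The alternation between individual and social episodes described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `ToyPartner`, `ToyAgent`, the goal names and the assumption that every pursued goal is reached are all hypothetical placeholders.

```python
import random

class ToyPartner:
    """Suggests a frontier goal (already known to the agent) and an
    adjacent beyond goal (never achieved, just past its abilities)."""
    def __init__(self, adjacency):
        self.adjacency = adjacency  # goal -> harder neighbouring goals

    def suggest(self, known_goals):
        frontier = random.choice(sorted(known_goals))
        beyond = random.choice(self.adjacency[frontier])
        return frontier, beyond

class ToyAgent:
    def __init__(self, goals):
        self.discovered = set(goals)   # outcomes experienced so far
        self.internalized = set()      # social suggestions kept for rehearsal

    def pursue(self, goal):
        self.discovered.add(goal)      # toy assumption: every pursuit succeeds
        return True

def hme_episode(agent, partner, query_social):
    """One episode of the HME protocol."""
    if query_social:                   # social episode, at the agent's demand
        frontier, beyond = partner.suggest(agent.discovered)
        agent.internalized.add(beyond)          # internalize for later rehearsal
        if agent.pursue(frontier):              # stepping stone first...
            agent.pursue(beyond)                # ...then the goal just beyond
    else:                              # individual episode: known goals only
        goal = random.choice(sorted(agent.discovered | agent.internalized))
        agent.pursue(goal)
```

Note how internalized beyond goals re-enter the individual sampling pool, so a single social query keeps paying off across later autonomous episodes.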



In a recent paper introducing the autotelic framework, Colas et al. (2022) identify two challenges: 1) learning goal representations; and 2) exploring the corresponding goal space and mastering the associated skills. Although, eventually, all autotelic agents must learn their own goal representations (challenge 1; Eysenbach et al., 2018; Nair et al., 2018b; Colas et al., 2020), most existing approaches assume a pre-defined goal space and focus on its exploration and mastery (challenge 2; Schaul et al., 2015; Colas et al., 2019; Akakzia et al., 2020).

Figure 1: The HME social interaction protocol and the social autotelic agent. The autotelic agent (blue) interacts with its world and discovers new configurations (grey box, right). Through its goal source selector (blue box, middle), the agent decides whether to sample from its known goals (blue box, bottom left) or to query the social partner for a social goal (pink box, top left). Socially-suggested goals are further internalized within the agent for later use during individual episodes.
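The goal source selector's active-query decision can be sketched in a few lines. This is a hedged illustration only: the window size, threshold and the use of a per-episode success-rate history as the progress signal are assumptions, not the paper's exact mechanism.

```python
def should_query(progress_history, window=10, eps=0.01):
    """Goal-source selector sketch: query the social partner when the
    agent's recent learning progress has stagnated.

    Assumption: `progress_history` is a per-episode success-rate
    estimate in [0, 1], appended to after every individual episode.
    """
    if len(progress_history) < window:
        return False                        # not enough evidence yet
    recent = progress_history[-window:]
    progress = abs(recent[-1] - recent[0])  # crude learning-progress estimate
    return progress < eps                   # stagnation -> ask for a social goal
```

Under this sketch, a flat success rate triggers a query while a steadily improving one keeps the agent sampling its own goals, matching the paper's goal of minimizing social interventions.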

