LEARNING WITH AMIGO: ADVERSARIALLY MOTIVATED INTRINSIC GOALS

Abstract

A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGO, a novel agent incorporating, as a form of meta-learning, a goal-generating teacher that proposes Adversarially Motivated Intrinsic GOals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging, yet achievable, goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.

1. INTRODUCTION

The success of Deep Reinforcement Learning (RL) on a wide range of tasks, while impressive, has so far been mostly confined to scenarios with reasonably dense rewards (e.g. Mnih et al., 2016; Vinyals et al., 2019), or to those where a perfect model of the environment can be used for search, such as the game of Go and others (e.g. Silver et al., 2016; Duan et al., 2016; Moravčík et al., 2017). Many real-world environments offer extremely sparse rewards, if any at all. In such environments, random exploration, which underpins many current RL approaches, is unlikely to yield sufficient reward signal to train an agent, or is very sample-inefficient, as it requires the agent to stumble onto novel rewarding states by chance.

In contrast, humans are capable of dealing with rewards that are sparse and lie far in the future. For example, to a child, the future adult life involving education, work, or marriage provides no useful reinforcement signal. Instead, children devote much of their time to play, generating objectives and posing challenges to themselves as a form of intrinsic motivation. Solving such self-proposed tasks encourages them to explore, experiment, and invent; sometimes, as in many games and fantasies, without any direct link to reality or to any source of extrinsic reward. This kind of intrinsic motivation might be a crucial feature to enable learning in real-world environments (Schulz, 2012).

To address this discrepancy between naïve deep RL exploration strategies and human capabilities, we present a novel meta-learning method wherein part of the agent learns to self-propose Adversarially Motivated Intrinsic Goals (AMIGO). In AMIGO, the agent is decomposed into a goal-generating teacher and a goal-conditioned student policy. The teacher acts as a constructive adversary to the
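To make the "constructively adversarial" objective concrete, the following is a minimal sketch of one way the teacher's reward could be shaped so that goals that are too easy or unreachable are penalized, while goals at the frontier of the student's ability are rewarded. The specific reward values, the difficulty threshold `t_star`, and the simple threshold-raising rule are illustrative assumptions, not the paper's exact specification.

```python
def teacher_reward(goal_reached: bool, steps_taken: int, t_star: int) -> float:
    """Hypothetical teacher reward for a constructively adversarial objective.

    The teacher is rewarded only when the student reaches the proposed goal
    *slowly enough* (more than t_star steps), signalling a goal that is
    challenging yet achievable. Goals reached too quickly (too easy) or not
    at all (too hard) both yield a penalty.
    """
    if goal_reached and steps_taken > t_star:
        return 1.0   # challenging yet achievable: reward the teacher
    return -1.0      # too easy, or never reached within the episode


def update_threshold(t_star: int, recent_successes: int, patience: int = 10) -> int:
    """Illustrative curriculum step: once the student has reached several
    goals beyond the current threshold, raise the bar so the teacher must
    propose harder goals."""
    if recent_successes >= patience:
        return t_star + 1
    return t_star
```

In this sketch, the adversarial pressure (penalizing unreached goals) is balanced by the constructive constraint (penalizing trivially easy goals), so the teacher's best strategy is to track the student's current competence frontier.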

