LEARNING SUBGOAL REPRESENTATIONS WITH SLOW DYNAMICS

Abstract

In goal-conditioned Hierarchical Reinforcement Learning (HRL), a high-level policy periodically sets subgoals for a low-level policy, and the low-level policy is trained to reach those subgoals. A proper subgoal representation function, which abstracts a state space to a latent subgoal space, is crucial for effective goal-conditioned HRL, since different low-level behaviors are induced by reaching subgoals in the compressed representation space. Observing that the high-level agent operates at an abstract temporal scale, we propose a slowness objective to effectively learn the subgoal representation (i.e., the high-level action space). We provide a theoretical grounding for the slowness objective: selecting slow features as the subgoal space enables efficient hierarchical exploration. As a result of this improved exploration ability, our approach significantly outperforms state-of-the-art HRL and exploration methods on a number of benchmark continuous-control tasks. Thanks to the generality of the proposed subgoal representation learning method, empirical results also demonstrate that the learned representation and the corresponding low-level policies can be transferred between distinct tasks.

1. INTRODUCTION

Deep Reinforcement Learning (RL) has demonstrated increasing capabilities in a wide range of domains, including playing games (Mnih et al., 2015; Silver et al., 2016), controlling robots (Schulman et al., 2015; Gu et al., 2017), and navigating complex environments (Mirowski et al., 2016; Zhu et al., 2017). Solving temporally extended tasks with sparse or deceptive rewards is one of the major challenges for RL. Hierarchical Reinforcement Learning (HRL), which enables control at multiple time scales via a hierarchical structure, provides a promising way to solve such challenging tasks.

Goal-conditioned methods have long been recognized as an effective paradigm in HRL (Dayan & Hinton, 1993; Schmidhuber & Wahnsiedler, 1993; Nachum et al., 2019). In goal-conditioned HRL, higher-level policies periodically set subgoals for lower-level policies, and the lower-level policies are incentivized to reach those subgoals. A proper subgoal representation function, abstracting the state space to a latent subgoal space, is crucial for effective goal-conditioned HRL: the abstract subgoal space, i.e., the high-level action space, simplifies high-level policy learning, and explorative low-level behaviors can be induced by setting different subgoals in this compressed space. Recent work in goal-conditioned HRL has concentrated on implicitly learning the subgoal representation in an end-to-end manner with hierarchical policies (Vezhnevets et al., 2017; Dilokthanakul et al., 2019), e.g., using a variational autoencoder (Péré et al., 2018; Nair & Finn, 2019; Nasiriany et al., 2019), or directly using the state space (Levy et al., 2019) or a handcrafted space (Nachum et al., 2018) as the subgoal space. Sukhbaatar et al. (2018) proposed to learn subgoal embeddings via self-play, and Ghosh et al. (2018) designed a representation learning objective using an actionable distance metric, but both methods require a pretraining process. Near-Optimal
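To make the slowness idea concrete, the following is a minimal sketch of a contrastive slowness objective for a subgoal representation function phi: temporally adjacent states are pulled together in the latent space, while randomly paired states are pushed at least a margin apart to rule out the trivial constant representation. The function name, the hinge-style repulsive term, the margin value, and the linear feature map below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def slowness_loss(phi, states, next_states, rand_states, margin=1.0):
    """Contrastive slowness objective (a sketch; the exact form may differ).

    Pull temporally adjacent states together in the latent space and push
    randomly paired states at least `margin` apart (hinge term)."""
    z, z_next, z_rand = phi(states), phi(next_states), phi(rand_states)
    # Attractive term: consecutive states should map to similar features.
    attract = np.mean(np.sum((z_next - z) ** 2, axis=1))
    # Repulsive term: prevents collapse to a constant representation.
    repel = np.mean(np.maximum(0.0, margin - np.sum((z_rand - z) ** 2, axis=1)))
    return attract + repel

# Example with a linear feature map standing in for a neural network:
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # project 3-D states to 2-D
phi = lambda s: s @ W
s = np.zeros((4, 3))
# Identical state pairs (zero attraction) and distant negatives (zero hinge):
print(slowness_loss(phi, s, s, s + 2.0))  # -> 0.0
```

In practice phi would be a neural network trained by gradient descent on minibatches of transitions; the sketch only illustrates the shape of the objective.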



1 Videos available at https://sites.google.com/view/lesson-iclr
2 Open-source code available at https://github.com/SiyuanLee/LESSON

