TOMA: TOPOLOGICAL MAP ABSTRACTION FOR REINFORCEMENT LEARNING

Abstract

Animals are able to discover the topological map (graph) of their surrounding environment and use it for navigation. Inspired by this biological phenomenon, researchers have recently proposed to learn a graph representation for a Markov decision process (MDP) and use such graphs for planning in reinforcement learning (RL). However, existing learning-based graph generation methods suffer from several drawbacks. One drawback is that they do not learn an abstraction for the graph, which results in high memory and computation cost. This also makes the generated graph non-robust, which degrades planning performance. Another drawback is that existing methods cannot be used to facilitate exploration, which is important in RL. In this paper, we propose a new method, called topological map abstraction (TOMA), for learning-based graph generation. TOMA can learn an abstract graph representation for an MDP, which requires much less memory and computation than existing methods. Furthermore, TOMA can be used to facilitate exploration. In particular, we propose planning to explore, in which TOMA accelerates exploration by guiding the agent towards unexplored states. A novel experience replay module called vertex memory is also proposed to improve exploration performance. Experimental results show that TOMA outperforms existing methods, achieving state-of-the-art performance.

1. INTRODUCTION

Animals are able to discover the topological map (graph) of their surrounding environment (O'Keefe and Dostrovsky, 1971; Moser et al., 2008), which they use as hints for navigation. For example, earlier maze experiments on rats (O'Keefe and Dostrovsky, 1971) reveal that rats can create a mental representation of the maze and use this representation to reach food placed in the maze. In the cognitive science community, researchers summarize these discoveries in cognitive map theory (Tolman, 1948), which states that animals can extract and encode the structure of the environment in a compact and abstract map representation. Inspired by this biological phenomenon, researchers have proposed to generate a topological graph representation for a Markov decision process (MDP) and use such graphs for planning in reinforcement learning (RL). Early graph generation methods (Mannor et al., 2004) are usually prior-based, applying some human prior to aggregate similar states into vertices. Recently, researchers have proposed learning-based graph generation algorithms that learn such state aggregation automatically; these methods have been shown to outperform prior-based methods (Metzen, 2013). They generally treat the states in a replay buffer as vertices. For the edges, some methods like SPTM (Savinov et al., 2018) train a reachability predictor via self-supervised learning and combine it with human experience to construct the edges. Other methods like SoRB (Eysenbach et al., 2019) exploit a goal-conditioned agent to estimate the distance between vertices, based on which edges are constructed. These existing methods suffer from the following drawbacks. Firstly, they do not learn an abstraction for the graph and usually consider all the states in the buffer as vertices (Savinov et al., 2018), which results in high memory and computation cost. This drawback also makes the generated graph non-robust, which degrades planning performance.
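The distance-based edge construction used by methods such as SoRB can be sketched as follows. This is an illustrative simplification, not the original implementation: `estimate_distance` stands in for a learned goal-conditioned distance estimate (here replaced by Euclidean distance), and the buffer contents are toy 2-D states.

```python
import itertools
import math

def estimate_distance(s1, s2):
    # Placeholder for a learned goal-conditioned distance estimate;
    # Euclidean distance is used here for illustration only.
    return math.dist(s1, s2)

def build_graph(buffer_states, max_dist=1.0):
    """Treat every buffered state as a vertex and connect each pair whose
    estimated distance falls below a threshold (edges added in both
    directions for simplicity)."""
    vertices = list(buffer_states)
    edges = set()
    for i, j in itertools.combinations(range(len(vertices)), 2):
        if estimate_distance(vertices[i], vertices[j]) <= max_dist:
            edges.add((i, j))
            edges.add((j, i))
    return vertices, edges

# Example: four 2-D states; only nearby states become connected.
states = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.0)]
V, E = build_graph(states, max_dist=1.0)
```

Note that the pairwise check is quadratic in the buffer size, which illustrates the computation cost incurred when every buffered state is kept as a vertex.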
Secondly, existing methods cannot be used to facilitate exploration, which is important in RL. In particular, methods like SPTM rely on human-sampled trajectories to generate the graph, which is infeasible in RL exploration. Methods like SoRB require training another goal-conditioned agent. Such training

