EFFICIENT EXPLORATION VIA FRAGMENTATION AND RECALL

Abstract

Efficient exploration and model-building are critical for learning in large state spaces. However, agents can face problems such as getting stuck in local optima during exploration, and catastrophic forgetting when constructing models in heterogeneous environments. Here, we propose and apply the concept of Fragmentation-and-Recall to solve spatial (FarMap) and reinforcement learning (FarCuriosity) problems. Agents construct local maps or local models, respectively, which are used to predict the current observation. High-surprisal points lead to a fragmentation event. At fracture points, we store the current map or model fragment in a long-term memory (LTM) and initialize a new fragment. Conversely, fragments are recalled (and thus reused) from LTM if the observations of their fracture points match the agent's current observation during exploration. The set of fracture points defines a set of intrinsic potential subgoals. Agents choose their next subgoal from the set of near and far potential subgoals in the current fragment or LTM, respectively. Thus, local maps and model fragments guide exploration locally and avoid catastrophic forgetting when learning in heterogeneous environments, while LTM promotes global exploration. We evaluate FarMap and FarCuriosity on complex procedurally-generated spatial environments and on reinforcement learning benchmarks to demonstrate that the proposed methods use memory more efficiently and achieve better task performance overall.

1. INTRODUCTION

Human episodic memory breaks our continuous experience of the world into episodes or fragments, divided by event boundaries that involve large changes of place, context, affordances, and perceptual inputs (Baldassano et al., 2017; Ezzyat & Davachi, 2011; Newtson & Engquist, 1976; Richmond & Zacks, 2017; Swallow et al., 2009; Zacks & Swallow, 2007). The episodic nature of memory is a core component of how we construct models of the world. It has been conjectured that episodic memory makes it easier to perform memory search, and to use the retrieved memories in chunks that are relevant for the current context. Humans also continue to learn and memorize new information throughout their lives, without needing to reconfigure all previously stored memories. These observations suggest a certain locality or fragmented nature to how we model the world. Chunking of experience has been shown to play a key role in perception, learning, and cognition in humans and animals (De Groot, 1946; Egan & Schwartz, 1979; Gobet et al., 2001; Gobet & Simon, 1998; Simon, 1974). In the hippocampus, place cells appear to chunk spatial information by defining separate maps when there has been a sufficiently large change in context or in other non-spatial or spatial variables, through a process called remapping; see Colgin et al. (2008); Fyhn et al. (2007). Grid and place cells in the hippocampal formation have also been shown to fragment their representations when the external world or their own behaviors have changed only gradually rather than discontinuously (Derdikman et al., 2009; Low et al., 2021). Recently, Klukas et al. (2021) proposed how such remapping could occur even during continuous navigation through a continuous environment, modeling the process as one of online clustering based on observational surprisal. Similarly, when fitting complicated manifolds or functions, it is common to build a set of simpler local models of the manifold or function.
Inspired by these ideas, we propose building models of complicated spaces by fitting a sequence of local models, and using the local models obtained through an online process of fragmentation to aid the exploratory process of moving through a large space and building a model of it. We propose a new framework for exploration based on the concept of online Fragmentation-and-Recall, schematized in Figure 1. This framework combines two ideas: 1) when faced with a complex world, it can be more efficient to build and combine multiple (and implicitly simpler) local models than to build a single global (and implicitly complex) model, and 2) boundaries between local models should occur where a local model ceases to be predictive. In what follows, as an agent explores, it predicts its next observation. Based on a measure of surprisal between its observation and its prediction, a fragmentation event can occur, at which point the agent writes the current model into long-term memory (LTM) and initializes a new local model. While exploring the space, the agent consults its LTM and recalls an existing model if it matches the current observations. For the spatial domain, this is very similar to Klukas et al. (2021). The agent uses its current local model to act locally, and its LTM to act more globally. We apply this concept to spatial exploration and to more general reinforcement learning exploration problems, and call the corresponding approaches FarMap and FarCuriosity, respectively. We evaluate the proposed framework on procedurally-generated spatial environments and reinforcement learning benchmarks.
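As a rough illustration of the loop described above, the following Python sketch shows one way the fragment, store, and recall decisions could fit together. It is not the implementation used in this paper: the `Fragment` and `FarAgent` classes, the running-mean local model, the Euclidean-distance stand-in for surprisal, and both threshold values are assumptions chosen only to keep the example short.

```python
import numpy as np


class Fragment:
    """Hypothetical local model: a running mean of the observations it has seen."""

    def __init__(self, anchor):
        self.anchor = anchor.copy()  # observation at the fracture point
        self.mean = anchor.copy()    # the fragment's prediction of the next observation
        self.count = 1

    def update(self, obs):
        # Incremental mean update with the new observation.
        self.count += 1
        self.mean = self.mean + (obs - self.mean) / self.count


class FarAgent:
    """Toy Fragmentation-and-Recall loop (assumed thresholds and distance measure)."""

    def __init__(self, surprisal_threshold=1.0, recall_threshold=0.1):
        self.surprisal_threshold = surprisal_threshold
        self.recall_threshold = recall_threshold
        self.ltm = []        # long-term memory: fragments that are not currently active
        self.current = None  # the active local fragment

    def step(self, obs):
        obs = np.asarray(obs, dtype=float)
        if self.current is None:
            self.current = Fragment(obs)
            return "init"
        # Recall: reuse a stored fragment whose fracture-point observation matches.
        for i, frag in enumerate(self.ltm):
            if np.linalg.norm(obs - frag.anchor) <= self.recall_threshold:
                # Swap: the active fragment goes to LTM, the recalled one becomes active.
                self.ltm[i], self.current = self.current, frag
                self.current.update(obs)
                return "recall"
        # Surprisal (assumed here to be the prediction error of the local model).
        surprisal = float(np.linalg.norm(obs - self.current.mean))
        if surprisal > self.surprisal_threshold:
            # Fragmentation: store the current fragment and start a new one.
            self.ltm.append(self.current)
            self.current = Fragment(obs)
            return "fragment"
        self.current.update(obs)
        return "continue"


agent = FarAgent()
observations = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0], [0.0, 0.0]]
events = [agent.step(o) for o in observations]
print(events)
```

In this toy run the agent builds one fragment near the origin, fragments when it jumps to a distant observation, and recalls the first fragment when it returns to the original fracture point. Only one fragment is active at a time, which is the property that keeps the online model small.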
Experimental results support the effectiveness of the proposed framework: FarMap explores the spatial environment with much less memory usage and is faster than its baselines (Yamauchi, 1997) by large margins, and FarCuriosity achieves better performance than the baseline fragmentation-less curiosity module (Burda et al., 2019) on standard heterogeneous Atari game benchmarks¹. The contributions of this paper are three-fold:
• We propose a new framework for exploration based on Fragmentation-and-Recall that divides the exploration space into multiple fragments and recalls previously explored ones.
• We implement our framework in spatial exploration tasks, referring to it as FarMap, with short-term and long-term memory. Our experiments show that FarMap reduces online memory size and wall-clock time relative to baselines.
• We implement our framework in a curiosity-driven reinforcement learning exploration setting, referring to it as FarCuriosity. FarCuriosity avoids catastrophic forgetting and achieves better performance than the baseline in heterogeneous environments.

2. RELATED WORK

Frontier-based spatial exploration in SLAM. SLAM (simultaneous localization and mapping) agents must efficiently explore spaces to build maps. A standard approach to exploration in SLAM is to define the frontier between observed and unobserved regions of a 2D environment, and then select exploratory goal locations from the set of frontier states (Yamauchi, 1997). Frontier-based exploration has been extended to 3D environments (Dai et al., 2020; Dornhege & Kleiner, 2011) and used as a building block of more sophisticated exploration strategies (Stachniss et al., 2004). Although conceptually simple, frontier-based exploration can be quite effective compared to more sophisticated decision-theoretic exploration (Holz et al., 2010). A cost of frontier-based exploration is

¹ A heterogeneous environment is an environment with diverse states, which requires larger model capacity to memorize visited states for generating intrinsic reward. Please refer to Section 4.2 for more details.
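To make the frontier-based exploration described in this section concrete, the sketch below finds frontier cells in a small 2D occupancy grid and picks the nearest one as an exploratory goal. It is a minimal sketch under stated assumptions: the cell encoding (`UNKNOWN`, `FREE`, `OCCUPIED`), the 4-neighbour definition of adjacency, the function names, and the nearest-by-Manhattan-distance goal rule are all illustrative choices, not details taken from Yamauchi (1997) or from this paper.

```python
import numpy as np

# Hypothetical cell encoding for the occupancy grid.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def find_frontier(grid):
    """Return (row, col) indices of free cells that touch an unknown cell (4-neighbourhood)."""
    grid = np.asarray(grid)
    unknown = grid == UNKNOWN
    # Pad with False so that cells on the border have well-defined neighbours.
    padded = np.pad(unknown, 1, constant_values=False)
    near_unknown = (
        padded[:-2, 1:-1]    # neighbour above
        | padded[2:, 1:-1]   # neighbour below
        | padded[1:-1, :-2]  # neighbour to the left
        | padded[1:-1, 2:]   # neighbour to the right
    )
    return np.argwhere((grid == FREE) & near_unknown)


def nearest_frontier(grid, agent_pos):
    """Pick the frontier cell closest to the agent in Manhattan distance (ignores obstacles)."""
    frontier = find_frontier(grid)
    if len(frontier) == 0:
        return None  # nothing left to explore
    dist = np.abs(frontier - np.asarray(agent_pos)).sum(axis=1)
    return tuple(int(v) for v in frontier[np.argmin(dist)])


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
goal = nearest_frontier(grid, agent_pos=(2, 0))
print(find_frontier(grid).tolist(), goal)
```

A real system would plan a path to the chosen goal rather than rank frontier cells by straight Manhattan distance, since that distance ignores walls.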



Figure 1: Overview of our approach. Given an observation from the environment, the FarMap or FarCuriosity agent decides whether to fragment the space based on how well it can predict the observation. If fragmentation occurs, the current map (or model) fragment is stored in long-term memory (LTM); the agent then initializes a new map (or model) fragment. Conversely, if the current observation closely matches an observation stored in LTM, the agent loads the corresponding map (or model) fragment from LTM (recall). Based on the current fragment, the agent selects an action to explore the environment.

