EMERGENCE OF MAPS IN THE MEMORIES OF BLIND NAVIGATION AGENTS

Abstract

Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines, specifically artificial intelligence (AI) navigation agents, also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence for mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether biological or artificial. Unlike in animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms. Specifically, we train 'blind' agents, with sensing limited to egomotion and nothing else, to perform PointGoal navigation ('go to ∆x, ∆y') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully-connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that blind agents (1) are surprisingly effective navigators in new environments (∼95% success); (2) utilize memory over long horizons (remembering ∼1,000 steps of past experience in an episode); (3) exhibit intelligent behavior enabled by this memory (following walls, detecting collisions, taking shortcuts); (4) build representations of the environment in which maps and collision-detection neurons emerge; and (5) form emergent maps that are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation.

1. INTRODUCTION

Decades of research into intelligent animal navigation posits that organisms build and maintain internal spatial representations (or maps)¹ of their environment that enable the organism to determine and follow task-appropriate paths (Tolman, 1948; O'keefe & Nadel, 1978; Epstein et al., 2017). Hamsters, wolves, chimpanzees, and bats leverage prior exploration to determine and follow shortcuts they may never have taken before (Chapuis & Scardigli, 1993; Peters, 1976; Menzel, 1973; Toledo et al., 2020; Harten et al., 2020). Even blind mole rats and animals rendered situationally blind in dark environments demonstrate shortcut behaviors (Avni et al., 2008; Kimchi et al., 2004; Maaswinkel & Whishaw, 1999). Ants forage for food along meandering paths but take near-optimal return trips (Müller & Wehner, 1988), though there is some controversy about whether insects like ants and bees are capable of forming maps (Cruse & Wehner, 2011; Cheung et al., 2014).

Analogously, mapping and localization techniques have long played a central role in enabling non-biological navigation agents (or robots) to exhibit intelligent behavior (Thrun et al., 2005; Institute, 1972; Ayache & Faugeras, 1988; Smith et al., 1990). More recently, the machine learning community has produced a surprising phenomenon: neural-network models for navigation that curiously do not contain any explicit mapping modules but still achieve remarkably high performance (Savva et al., 2019; Wijmans et al., 2020; Kadian et al., 2020; Chattopadhyay et al., 2021; Khandelwal et al., 2022; Partsey et al., 2022; Reed et al., 2022). For instance, Wijmans et al. (2020) showed that a simple 'pixels-to-actions' architecture (using a CNN and RNN) can navigate to a given point in a novel environment with near-perfect accuracy; Partsey et al. (2022) further generalized this result to more realistic sensors and actuators. Reed et al. (2022) showed that a similar general-purpose architecture (a transformer) can perform a wide variety of embodied tasks, including navigation. The mechanisms explaining this ability remain unknown. Understanding them is of both scientific and practical importance, due to the safety considerations involved in deploying such systems.

In this work, we investigate the following question: is mapping an emergent phenomenon? Specifically, do artificial intelligence (AI) agents learn to build internal spatial representations (or 'mental' maps) of their environment as a natural consequence of learning to navigate? The specific task we study is PointGoal navigation (Anderson et al., 2018), where an AI agent is introduced into a new (unexplored) environment and tasked with navigating to a relative location: 'go 5m north, 2m west relative to start'². This is analogous to the direction and distance of foraging locations communicated by the waggle dance of honey bees (Von Frisch, 1967). Unlike animal navigation studies, experiments with AI agents allow us to precisely isolate mapping from alternative mechanisms proposed for animal navigation: the use of visual landmarks (Von Frisch, 1967), orientation by the arrangement of stars (Lockley, 1967), and gradients of olfaction or other senses (Ioalè et al., 1990). We achieve this isolation by judiciously designing the agent's perceptual system and the learning paradigm such that these alternative mechanisms are rendered implausible. Our agents are effectively 'blind'; they possess a minimal perceptual system capable of sensing only egomotion, i.e. the change in the agent's location and orientation as it moves: no vision, no audio, no olfaction, no haptics, no magnetic sensing, no other sensing of any kind.
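Egomotion sensing of this sort supports classic path integration: accumulating per-step pose changes yields the agent's position relative to its start, from which the remaining goal vector can be recomputed in the agent's own frame. The sketch below is a minimal illustration, not the paper's implementation; the function names and the unicycle-style convention (rotate, then translate) are our assumptions.

```python
import math


def integrate_egomotion(steps):
    """Track pose (x, y, heading) relative to the episode start by
    accumulating egomotion readings (forward_distance, heading_change).
    Convention (assumed): the heading change is applied before moving."""
    x = y = heading = 0.0
    for dist, dtheta in steps:
        heading += dtheta
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
    return x, y, heading


def goal_in_agent_frame(goal_xy, pose):
    """Re-express the start-relative goal in the agent's current frame,
    i.e. the 'go to Δx, Δy' vector the agent still has to cover."""
    gx, gy = goal_xy
    x, y, heading = pose
    dx, dy = gx - x, gy - y
    c, s = math.cos(-heading), math.sin(-heading)
    return (c * dx - s * dy, s * dx + c * dy)


# Move 1m forward, then turn 90° left and move 1m: the agent ends at
# (1, 1) facing +y, so a goal at (1, 2) lies 1m straight ahead.
pose = integrate_egomotion([(1.0, 0.0), (1.0, math.pi / 2)])
print(goal_in_agent_frame((1.0, 2.0), pose))
```

Note that the agent is never given this computation explicitly; the question the paper asks is whether something like it (and more, such as free-space maps) emerges inside the learned memory.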
This perceptual system is deliberately impoverished to isolate the contribution of memory, and is inspired by blind mole rats, which perform localization via path integration and use the Earth's magnetic field as a compass (Kimchi et al., 2004). Further still, our agents are composed of navigation-agnostic, generic, and ubiquitous architectural components (fully-connected layers and LSTM-based recurrent neural networks), and our experimental setup provides no inductive bias towards mapping: no map-like or spatial structural components in the agent, no mapping supervision, no auxiliary tasks, nothing other than a reward for making progress towards the goal. Surprisingly, even under these deliberately harsh conditions, we find the emergence of map-like spatial representations in the agent's non-spatial, unstructured memory, enabling it not only to successfully navigate to the goal but also to exhibit intelligent behavior (taking shortcuts, following walls, detecting collisions) similar to that in the aforementioned animal studies, and to predict free space in the environment. Essentially, we demonstrate an 'existence proof', or an ontogenetic developmental account, for the emergence of mapping without any prior predisposition towards it. Our results also explain the surprising finding in recent literature noted above (that ostensibly map-free neural networks achieve strong autonomous navigation performance) by demonstrating that these 'map-free' systems in fact learn to construct and maintain map-like representations of their environment.

Concretely, we ask and answer the following questions:

1) Is it possible to effectively navigate with just egomotion sensing? Yes. We find that our 'blind' agents are highly effective at navigating new environments, reaching the goal with a 95.1%±1.3% success rate. They traverse moderately efficient (though far from optimal) paths, achieving 62.9%±1.6% of optimal path efficiency.
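To make the setup concrete, below is a heavily simplified sketch of such a navigation-agnostic policy: a fully-connected encoding of the observation feeding a recurrent core whose hidden state is the agent's only memory of the episode. This is our illustration, not the paper's code; we substitute a plain Elman-style recurrent update in NumPy for the actual LSTM, the sizes and names are invented, and the untrained greedy action choice stands in for the learned, reward-trained policy.

```python
import numpy as np


class BlindPolicy:
    """Sketch of a 'blind' recurrent navigation policy. The observation is
    only egomotion (Δx, Δy, Δθ) plus the relative goal vector (r, θ): no
    vision, no audio, no other sensing. All structure is navigation-agnostic;
    any map-like representation would have to emerge inside `self.h`."""

    def __init__(self, obs_dim=5, hidden=64, n_actions=4, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1  # small random init; a real agent would be RL-trained
        self.W_in = rng.normal(0.0, scale, (hidden, obs_dim))
        self.W_hh = rng.normal(0.0, scale, (hidden, hidden))
        self.b_h = np.zeros(hidden)
        self.W_out = rng.normal(0.0, scale, (n_actions, hidden))
        self.h = np.zeros(hidden)  # the agent's only memory of the episode

    def step(self, obs):
        # Elman-style recurrent update (standing in for the paper's LSTM):
        # the new hidden state mixes the encoded observation with old memory.
        self.h = np.tanh(self.W_in @ obs + self.W_hh @ self.h + self.b_h)
        logits = self.W_out @ self.h
        # Greedy choice over discrete actions, e.g. FORWARD / LEFT / RIGHT / STOP.
        return int(np.argmax(logits))
```

A memoryless ablation corresponds to zeroing `self.h` before every step, which is the variant the paper reports as failing almost completely.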
We stress that these are novel testing environments; the agent has not memorized paths within a training environment but has learned efficient navigation strategies that generalize to novel environments, such as emergent wall-following behavior.

2) What mechanism explains this strong performance by 'blind' agents? Memory. We find that memoryless agents completely fail at this task, achieving nearly 0% success. More importantly, we find that agents with memory utilize information stored over long temporal and spatial horizons, and that collision-detection neurons emerge within this memory. Navigation performance as a function of the number of past actions/observations encoded in the agent's memory does not



¹ Throughout this work, we use 'maps' to refer to a spatial representation of the environment that enables intelligent navigation behavior like taking shortcuts. We provide a detailed discussion and contrast w.r.t. a 'cognitive map' as defined by O'keefe & Nadel (1978) in Apx. B.1.
² The description in English is purely for explanatory purposes; the agent receives relative goal coordinates.

