CHARACTERIZING NEURAL REPRESENTATION OF COGNITIVELY-INSPIRED RL AGENTS DURING AN EVIDENCE ACCUMULATION TASK

Abstract

Evidence accumulation is thought to be fundamental for decision-making in humans and other mammals. It has been extensively studied in neuroscience and cognitive science with the goal of explaining how sensory information is sequentially sampled until sufficient evidence has accumulated to favor one decision over others. Neuroscience studies suggest that the hippocampus encodes a low-dimensional, ordered representation of evidence through sequential neural activity. Cognitive modelers have proposed a mechanism by which such sequential activity could emerge through the modulation of recurrent weights by the change in the amount of evidence. This gives rise to neurons tuned to a specific magnitude of evidence, which resemble neurons recorded in the hippocampus. Here we integrated a cognitive science model inside a Reinforcement Learning (RL) agent and trained the agent to perform a simple evidence accumulation task inspired by behavioral experiments on animals. We compared the agent's performance with that of agents equipped with GRUs and RNNs. We found that the agent based on a cognitive model learned faster and generalized better while having significantly fewer parameters. We also compared the emergent neural activity across agents and found that in some cases, GRU-based agents developed neural representations similar to those of agents based on a cognitive model. This study illustrates how integrating cognitive models and artificial neural networks can lead to brain-like neural representations that can improve learning.

1. INTRODUCTION

Converging evidence from cognitive science and neuroscience suggests that the brain represents physical and abstract variables in a structured form, as mental or cognitive maps. These maps are thought to play an essential role in learning and reasoning (Tolman, 1948; Ekstrom & Ranganath, 2018; Behrens et al., 2018). Cognitive maps are characterized by neurons that activate sequentially as a function of the magnitude of the variable they encode. For instance, neurons called place cells activate sequentially as a function of spatial distance from some landmark (Moser et al., 2015; Muller, 1996; Sheehan et al., 2021). Similarly, time cells activate sequentially as a function of elapsed time from some event (Pastalkova et al., 2008; MacDonald et al., 2011; Cruzado et al., 2020; Salz et al., 2016). Similar sequential activity has also been observed for sound frequency (Aronov et al., 2017), probability (Knudsen & Wallis, 2021), and accumulated evidence (Nieh et al., 2021; Morcos & Harvey, 2016b). For example, in the "accumulating towers task" Nieh et al. (2021) trained mice to move along a virtual track and observe objects (towers) on the left- and right-hand sides. When mice arrived at the end of the track, they had to turn left or right, depending on which side had more towers, to receive a reward. The difference in the number of towers here corresponds to the amount of evidence for turning left vs. turning right. Nieh et al. (2021) recorded the activity of hundreds of individual neurons from the mouse hippocampus, a part of the brain commonly thought to play a key role in navigation in physical and abstract spaces (Bures et al., 1997; Eichenbaum, 2014; Moser et al., 2015). The results indicated the existence of cells tuned to a particular difference in the number of towers, such that a population of neurons tiles the entire evidence axis (Nieh et al., 2021) (see also Morcos & Harvey (2016b)).
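The accumulating towers task described above can be sketched as a minimal environment. All concrete details here (track length, tower probability, single-pixel observation encoding, action codes, tie-breaking toward left) are illustrative assumptions for the sketch, not the parameters of the original experiment or of the paper's simulations.

```python
import random

class TowersEnv:
    """Minimal sketch of an accumulating-towers environment (hypothetical parameters)."""

    def __init__(self, track_length=20, tower_prob=0.3, seed=None):
        self.track_length = track_length
        self.tower_prob = tower_prob
        self.rng = random.Random(seed)

    def reset(self):
        self.pos = 0
        self.evidence = 0          # (# right towers) - (# left towers) seen so far
        return self._observe()

    def _observe(self):
        # Each side is a single pixel: 1 if a tower appears at this position, else 0.
        left = int(self.rng.random() < self.tower_prob)
        right = int(self.rng.random() < self.tower_prob)
        self.evidence += right - left
        return (left, right)

    def step(self, action):
        """action: 0 = move forward, 1 = turn left, 2 = turn right."""
        if self.pos >= self.track_length - 1:
            # At the end of the track, reward depends on turning toward the
            # side with more towers (ties default to left in this sketch).
            correct = 2 if self.evidence > 0 else 1
            reward = 1.0 if action == correct else 0.0
            return None, reward, True
        self.pos += 1
        return self._observe(), 0.0, False
```

An oracle policy that moves forward and then turns toward the side with more observed towers always earns the reward, which makes the evidence variable the only quantity an agent needs to track.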
This provides valuable insight into how abstract variables are represented in the brain. Cognitive scientists have developed elaborate models of evidence accumulation to explain response times in a variety of behavioral tasks (Laming, 1968; Link, 1975; Ratcliff, 1978). These models hypothesize that the brain contains an internal variable that represents progress toward the decision. A neural-level cognitive model proposed that the brain could implement this process using a framework based on the Laplace transform (Howard et al., 2018). The Laplace framework gives rise to map-like representations, and it has been successful in describing the emergence of sequentially activated time cells (Shankar & Howard, 2012) and place cells (Howard et al., 2014; Howard & Hasselmo, 2020).

Artificial neural networks (ANNs) are commonly thought to have a distributed representation that lacks a map-like structure. While ANNs excel in many domains, they still struggle at many tasks that humans find relatively simple. Unlike humans, ANNs typically require a large number of training examples and fail to generalize to examples that are outside the training distribution (Bengio, 2017; LeVine, 2017; Marcus, 2020). Using cognitive models informed by neural data as an inductive bias for ANNs is an important direction that can not only help advance current AI systems but also improve our understanding of cognitive mechanisms in the brain. Here we integrate the Laplace framework into reinforcement learning (RL) agents. The Laplace framework is based on recurrent neurons with analytically computed weights. We use the Laplace domain to generate a map-like representation of the amount of evidence. This representation is then fed into a trainable RL module based on the A2C architecture (Mnih et al., 2016).
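The Laplace framework's recurrent neurons with analytically computed weights can be sketched numerically: a bank of leaky integrators with fixed decay rates s encodes the Laplace transform of an input, and an approximate inverse (the Post inversion formula) produces units tuned to specific magnitudes of the encoded variable. The parameter values below (approximation order K, the range and number of decay rates, Euler step size) are illustrative choices, not those used in the paper.

```python
import math
import numpy as np

K = 4                                                 # order of the Post approximation
s = np.logspace(np.log10(0.05), np.log10(0.5), 32)    # fixed decay rates, log-spaced

def run_laplace(inputs, alpha, dt=1.0):
    """Evolve F(s,t) with modulated recurrence: dF/dt = alpha * (f - s*F).

    With alpha = 1 the units encode the Laplace transform of the input as a
    function of elapsed time; gating alpha by the rate of change of an
    evidence count instead makes them encode accumulated evidence.
    """
    F = np.zeros_like(s)
    history = []
    for f, a in zip(inputs, alpha):
        F = F + a * (f - s * F) * dt                  # Euler step of the recurrence
        history.append(F.copy())
    return np.array(history)                          # shape (T, n_units)

def inverse_laplace(F_history):
    """Post inversion: f~(x*) = (-1)^K / K! * s^(K+1) * d^K F / ds^K, with x* = K/s."""
    d = F_history
    for _ in range(K):                                # numerical K-th derivative along s
        d = np.gradient(d, s, axis=-1)
    return ((-1) ** K) * s ** (K + 1) * d / math.factorial(K)

# A single impulse at t=0 gives F(s,t) ~ exp(-s*t); the inverse yields units
# whose activity peaks sequentially, at times that grow with 1/s.
T = 150
inputs = np.zeros(T)
inputs[0] = 1.0
acts = inverse_laplace(run_laplace(inputs, np.ones(T)))
peaks = acts.argmax(axis=0)                           # peak time of each unit
```

Because each unit peaks at a characteristic magnitude x* = K/s, the population tiles the encoded axis, mirroring the sequentially activated time and evidence cells described above.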
We compare map-based agents to standard RL agents that use simple recurrent neural networks (RNNs) and Gated Recurrent Units (GRUs) (Chung et al., 2014) in terms of performance and the similarity of their neural activity to activity recorded in the brain. The contributions of this work are as follows:

• We integrated a cognitive model for evidence accumulation based on the Laplace transform into an RL agent.

• We showed that symbolic operations in the Laplace domain give rise to individual neurons that are tuned to the magnitude of the evidence, just like neurons in neuroscience studies (Nieh et al., 2021; Morcos & Harvey, 2016a).

• We found that agents based on the Laplace framework learn faster and generalize better than agents based on commonly used RNNs. This indicates that the RL agents were able to efficiently use the brain-like sequential representation of evidence.

• We found that GRUs performed much better than RNNs, suggesting that gating plays an important role in constructing a neural representation of time-varying latent variables. This is consistent with the cognitive modeling work, which uses gating to convert a representation of elapsed time into a representation of accumulated evidence.

Figure 1: Schematic of the accumulating towers environment. In this simple example, two towers appeared on the right and one tower appeared on the left, so the agent has to turn right once it reaches the end of the track. Each tower is encoded with a single pixel value.
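As noted above, the map-like representation is fed into a trainable RL module based on A2C. A minimal sketch of such a readout (a softmax policy head and a linear value head over the evidence-tuned units) is given below; the dimensions, action set, and initialization are hypothetical, and the paper's actual A2C module may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_actions = 32, 3    # hypothetical: 32 evidence-tuned units; forward/left/right

W_pi = rng.normal(scale=0.1, size=(n_actions, n_units))   # policy head weights
w_v = rng.normal(scale=0.1, size=n_units)                 # value head weights

def actor_critic(evidence_map):
    """Map the fixed evidence representation to action probabilities and a value."""
    logits = W_pi @ evidence_map
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax policy over actions
    value = float(w_v @ evidence_map)         # scalar state-value estimate
    return probs, value

probs, value = actor_critic(rng.normal(size=n_units))
```

Only these readout weights need to be trained (e.g., with the standard A2C policy-gradient and value losses), since the Laplace-domain representation itself uses analytically computed recurrent weights.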

