UNIFIED NEURAL REPRESENTATION MODEL FOR PHYSICAL SPACE AND LINGUISTIC CONCEPTS

Abstract

The spatial processing system of the brain uses grid-like neural representations (grid cells) to support vector-based navigation. Experiments also suggest that neural representations for concepts (concept cells) exist in the human brain, and that conceptual inference relies on navigation in conceptual spaces. We propose a unified model called "disentangled successor information" (DSI) that explains neural representations for physical space and linguistic concepts. DSI generates grid-like representations in a 2-dimensional space that closely resemble those observed in the brain. Moreover, the same model creates concept-specific representations from linguistic inputs, corresponding to concept cells. Mathematically, DSI vectors approximate value functions for navigation as well as word vectors obtained by word embedding methods, thus enabling both spatial navigation and conceptual inference based on vector calculation. Our results suggest that representations for space and concepts can emerge from a shared mechanism in the human brain.

1. INTRODUCTION

In the brain, grid cells in the entorhinal cortex (EC) represent space with grid-like firing patterns (Hafting et al., 2005; Doeller et al., 2010; Jacobs et al., 2013). This neural representation is often related to vector-based spatial navigation because grid cells provide a global metric over space. Theoretically, an animal can estimate the direction to a goal when representations of the current position and the goal position are given (Fiete et al., 2008; Bush et al., 2015). Furthermore, self-position can be estimated by integrating self-motion when sensory information is not available (McNaughton et al., 2006). These functions form the basis of robust spatial navigation by animals.

There are not only spatial but also conceptual representations in EC. Neurons called "concept cells" have been found in the human medial temporal lobe, including EC (Quiroga, 2012; Reber et al., 2019). Concept cells respond to specific concepts, namely stimuli related to a specific person, a famous place, or a specific category like "foods" or "clothes". Recent experiments also suggest that grid-like representations appear not only for physical space but also for conceptual spaces with a 2-dimensional structure (e.g. the lengths of a neck and legs, or the intensities of two odors), and that these representations support vector-based conceptual inference (Bao et al., 2019; Constantinescu et al., 2016; Park et al., 2021). Thus, a shared processing mechanism for physical and conceptual spaces is expected to exist in EC. The existence of such a shared neural mechanism may also explain why humans use the sense of physical space (such as directionality) to communicate abstract concepts (conceptual metaphor; Lakoff & Johnson, 1980). However, the principle behind such universal computation in the brain remains unclear. In this paper, we propose a representation model which we call the disentangled successor information (DSI) model.
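The vector-based navigation idea described above can be sketched minimally: once the current position and the goal position are decoded into coordinates, the heading is obtained by simple vector subtraction. This is an illustrative toy, not the paper's model; it assumes positions are already decoded to 2-D coordinates.

```python
import numpy as np

def goal_direction(current, goal):
    """Vector-based navigation in its simplest form: the heading toward a
    goal is the normalized difference between the goal-position vector and
    the current-position vector (both assumed already decoded to 2-D
    coordinates)."""
    v = np.asarray(goal, dtype=float) - np.asarray(current, dtype=float)
    return v / np.linalg.norm(v)  # unit heading vector

# Example: a goal to the north-east of the current position
heading = goal_direction((0.0, 0.0), (3.0, 3.0))
```

Grid cells are thought to supply the metric that makes such a subtraction meaningful across the whole environment.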
DSI is an extension of the successor representation (SR), which stems from reinforcement learning theory and has become a promising computational model of the hippocampus and EC (Dayan, 1993; Stachenfeld et al., 2017; Momennejad et al., 2017; Momennejad, 2020). Like the eigenvectors of SR, DSI forms grid-like codes in a 2-D space, and these representations support vector-based spatial navigation because DSI approximates value functions for navigation in the framework of linear reinforcement learning (Todorov, 2006; 2009; Piray & Daw, 2021). Remarkably, when we apply DSI to text data by regarding a sequence of words as a sequence of states, DSI forms concept-specific representations like concept cells. Furthermore, we show a mathematical correspondence between DSI and word embedding models in natural language processing (NLP) (Mikolov et al., 2013a;b; Pennington et al., 2014; Levy & Goldberg, 2014), which allows intuitive vector-based conceptual inference as in those models. Our model reveals a new theoretical relationship between spatial and linguistic representation learning, and suggests the hypothesis that a shared computational principle underlies grid-like and concept-specific representations in the hippocampal system.

2. CONTRIBUTIONS AND RELATED WORKS

We summarize the contributions of this work as follows. (1) We extended SR to successor information (SI), which theoretically connects reinforcement learning and word embedding, and thus spatial navigation and conceptual inference. (2) We found that dimension reduction with constraints for grid-like representations (decorrelative NMF) generates disentangled word vectors with concept-specific units, which has not been reported previously. (3) Combining these results, we demonstrated that a computational model for grid cells can be extended to represent and compute linguistic concepts in an intuitive and biologically plausible manner, which has not been shown in previous studies.

Our model is an extension of the successor representation (SR), which has recently been viewed as a plausible model of the hippocampus and EC (Dayan, 1993; Stachenfeld et al., 2017; Momennejad et al., 2017; Momennejad, 2020). Furthermore, the default representation (DR), which is based on linear reinforcement learning theory, has also been proposed as a model of EC (Piray & Daw, 2021). We show that our model can extract linguistic concepts, which has not been shown for SR or DR. Furthermore, we demonstrate vector-based compositionality of words in our model, which expands the range of compositionality of EC representations (Piray & Daw, 2021) to semantic processing. Our model produces biologically plausible grid-like representations in 2-D space, which support spatial navigation. Previous studies have revealed that non-negative and orthogonal constraints are important for obtaining realistic grid-like representations (Dordek et al., 2016; Sorscher et al., 2019). Furthermore, recurrent neural networks form grid-like representations through learning path integration, and those representations support efficient spatial navigation (Banino et al., 2018; Cueva & Wei, 2018; Gao et al., 2019).
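The kind of constrained dimension reduction mentioned above can be sketched as projected gradient descent on a least-squares factorization with non-negativity and a decorrelation penalty on the factor columns. This is a hypothetical stand-in for the paper's decorrelative NMF (the function name, penalty form, and hyperparameters are illustrative assumptions; the actual objective may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

def decorrelative_nmf(M, k, lam=0.1, lr=1e-3, steps=2000):
    """Factorize M ~ W @ H with W, H >= 0 while penalizing correlations
    between the columns of W (a sketch of 'decorrelative NMF'; the actual
    objective in the paper may differ)."""
    n, m = M.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(steps):
        R = W @ H - M                      # reconstruction residual
        C = W.T @ W
        np.fill_diagonal(C, 0.0)           # penalize only cross-correlations
        gW = R @ H.T + lam * (W @ C)       # residual + decorrelation gradient
        gH = W.T @ R
        W = np.maximum(W - lr * gW, 0.0)   # projected step keeps W non-negative
        H = np.maximum(H - lr * gH, 0.0)
    return W, H

M = rng.random((40, 30))
W, H = decorrelative_nmf(M, k=5)
```

Non-negativity and decorrelation are the two ingredients that prior work (Dordek et al., 2016; Sorscher et al., 2019) identifies as important for realistic grid-like codes.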
Some of those models have reproduced experimentally observed scaling ratios between grid cell modules (Banino et al., 2018; Sorscher et al., 2019). However, previous models have not been applied to the learning of linguistic concepts or other complex conceptual spaces in real-world data. Whittington et al. (2020) proposed a unified model for spatial and nonspatial cognition; however, their model was applied only to simple graph structures, and concept specificity like that in our model was not observed. Analogical inference in our model is the same function provided by word embedding methods in NLP (Mikolov et al., 2013a;b; Pennington et al., 2014; Levy & Goldberg, 2014). However, a unique feature of DSI representations is that each dimension of the vectors corresponds to a specific concept, like concept cells in the human brain (Quiroga, 2012; Reber et al., 2019). Our model thus provides a biologically plausible interpretation of word embedding: each word is represented by a combination of disentangled conceptual units, inference is a recombination of those concepts, and such representations emerge through the same constraints as grid cells. It was recently shown that transformer-based models (Vaswani et al., 2017; Brown et al., 2020), currently the state of the art in NLP, generate grid-like representations when applied to spatial learning (Whittington et al., 2022). Like our model, this finding implies a relationship between spatial and linguistic processing in the brain. However, concept-specific representations have not been found in such models. Furthermore, the clear theoretical interpretation in our study depends on the analytical solution for skip-gram (Levy & Goldberg, 2014); such an analytical solution is currently unknown for transformer-based models.
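The analogical inference shared by word embedding methods and our model can be illustrated with toy hand-crafted vectors in which each dimension acts like a concept-specific unit (here "royalty" and "femaleness"). Real DSI or word2vec vectors are learned from data; this vocabulary and its coordinates are purely illustrative.

```python
import numpy as np

# Toy "disentangled" word vectors: dimensions = [royalty, femaleness].
# Each dimension mimics a concept-specific unit; values are hand-picked.
vocab = {
    "king":   np.array([1.0, 0.0]),
    "queen":  np.array([1.0, 1.0]),
    "man":    np.array([0.0, 0.0]),
    "woman":  np.array([0.0, 1.0]),
    "person": np.array([0.0, 0.5]),
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the
    three input words (the standard word-embedding analogy test)."""
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: float(np.linalg.norm(candidates[w] - target)))

result = analogy("king", "man", "woman")  # -> "queen"
```

In this disentangled reading, the analogy is literally a recombination of conceptual units: subtract the "male royalty" pattern's gender, add femaleness, and the remaining active units match "queen".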

3.1. DISENTANGLED SUCCESSOR INFORMATION

Let us assume $N_s$ discrete states exist in the environment. The successor representation (SR) between two states $s$ and $s'$ is defined as

$$\mathrm{SR}(s, s') = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t \delta(s_t, s') \,\middle|\, s_0 = s\right] = \sum_{t=0}^{\infty} \gamma^t P(s_t = s' \mid s_0 = s),$$
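For a row-stochastic transition matrix $T$, the discounted sum above has the standard closed form $(I - \gamma T)^{-1}$. A minimal sketch, using a random walk on a 1-D ring of 8 states as a stand-in environment (the ring environment and its size are illustrative choices, not the paper's setup):

```python
import numpy as np

def successor_representation(T, gamma):
    """SR(s, s') = sum_t gamma^t P(s_t = s' | s_0 = s)
    = [(I - gamma * T)^(-1)]_{s, s'} for row-stochastic T."""
    return np.linalg.inv(np.eye(T.shape[0]) - gamma * T)

# Random walk on a ring of 8 states: step left or right with probability 0.5.
N, gamma = 8, 0.9
T = np.zeros((N, N))
for s in range(N):
    T[s, (s - 1) % N] = 0.5
    T[s, (s + 1) % N] = 0.5

SR = successor_representation(T, gamma)
# Each row sums to 1 / (1 - gamma): total discounted occupancy is identical
# from every start state because T is row-stochastic.
```

The same closed form underlies the eigenvector analyses of SR (Stachenfeld et al., 2017), where low-frequency eigenvectors over a 2-D state space are spatially periodic, grid-like maps.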

