PREDICTION AND GENERALISATION OVER DIRECTED ACTIONS BY GRID CELLS

Abstract

Knowing how the effects of directed actions (e.g. moving North, South, East or West, or turning left or right) generalise is key to rapid generalisation across new situations. Markovian tasks can be characterised by a state space and a transition matrix, and recent work has proposed that neural grid codes provide an efficient representation of the state space, as eigenvectors of a transition matrix reflecting diffusion across states, that allows efficient prediction of future state distributions. Here we extend the eigenbasis prediction model, utilising tools from Fourier analysis, to prediction over arbitrary translation-invariant directed transition structures (i.e. displacement and diffusion), showing that a single set of eigenvectors can support predictions over arbitrary directed actions via action-specific eigenvalues. We show how to define a "sense of direction" to combine actions to reach a target state (ignoring task-specific deviations from translation-invariance), and demonstrate that adding the Fourier representations to a deep Q network aids policy learning in continuous control tasks. We show the equivalence between the generalised prediction framework and traditional models of grid cell firing driven by self-motion to perform path integration, either using oscillatory interference (via Fourier components as velocity-controlled oscillators) or continuous attractor networks (via analysis of the update dynamics). We thus provide a unifying framework for the role of the grid system in predictive planning, sense of direction and path integration: supporting generalisable inference over directed actions across different tasks.

1. INTRODUCTION

A "cognitive map" encodes relations between objects and supports flexible planning (Tolman [40]), with hippocampal place cells and entorhinal cortical grid cells thought to instantiate such a map (O'Keefe and Dostrovsky [32]; Hafting et al. [20]). Each place cell fires when the animal is near a specific location, whereas each grid cell fires at multiple locations arranged in a periodic triangular grid across the environment. Together, these cells could support representation and flexible planning in state spaces where common transition structure is preserved across states and tasks, affording generalisation and inference, e.g., in spatial navigation where Euclidean transition rules are ubiquitous (Whittington et al. [43]). Recent work suggests that place cell firing provides a local representation of state occupancy, while grid cells comprise an eigenbasis of place cell firing covariance (Dordek et al. [15]; Stachenfeld et al. [38]; Sorscher et al. [37]; Kropff and Treves [26]). Accordingly, grid cell firing patterns could be learned as eigenvectors of a symmetric (diffusive) transition matrix over state space, providing a basis set enabling prediction of occupancy distributions over future states. This "intuitive planning" operates by replacing multiplication of state representations by the transition matrix with multiplication of each basis vector by the corresponding eigenvalue (Baram et al. [2]; Corneil and Gerstner [13]). Thus a distribution over state space represented as a weighted sum of eigenvectors can be updated by re-weighting each eigenvector by its eigenvalue to predict future state occupancy. Fast prediction and inference of the common effects of actions across different environments is important for survival. Intuitive planning, in its original form, supports this ability only under a single transition structure, most often corresponding to symmetrical diffusion (Baram et al. [2]).
Here we show that a single (Fourier) eigenbasis allows representation and prediction under the many different directed transition structures corresponding to different "translation invariant" actions (whose effects are the same across states, such as moving North or South or left or right in an open environment), with predictions under different actions achieved by action-specific eigenvalues. We define a "sense of direction" quantity, i.e., the optimal combinations of directed actions that most likely lead to the goal, based on the underlying translation-invariant transition structure (e.g., ignoring local obstacles). We then show how this method could be adapted to support planning in tasks that violate translation invariance (e.g. with local obstacles), and show how adding these Fourier representations to a deep RL network improves performance in a continuous control task. We propose that the medial entorhinal grid cells support this planning function, as linear combinations of Fourier eigenvectors and therefore eigenvectors themselves, and show how traditional models of grid cells performing path integration are consistent with prediction under directed actions. Hence we demonstrate that the proposed spectral model acts as a unifying theoretical framework for understanding grid cell firing.
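As an illustration of the claim that one Fourier eigenbasis can serve several translation-invariant actions, the following is a minimal sketch on a 1D ring, not the implementation used in this work. The ring size `N`, the helper names `shift_matrix` and `action_eigenvalues`, and the choice of deterministic one-step "left" and "right" actions are all assumptions made for the example.

```python
import numpy as np

N = 16  # number of states on the ring (assumed for illustration)
x = np.arange(N)

# Unitary DFT matrix: column k is the Fourier mode exp(2*pi*i*k*x/N) / sqrt(N)
F = np.exp(2j * np.pi * np.outer(x, x) / N) / np.sqrt(N)


def shift_matrix(n_states, step):
    """Deterministic translation: T[s, s'] = 1 if s' = (s + step) mod n_states."""
    T = np.zeros((n_states, n_states))
    idx = np.arange(n_states)
    T[idx, (idx + step) % n_states] = 1.0
    return T


def action_eigenvalues(T):
    """Eigenvalues of a translation-invariant T in the shared Fourier basis."""
    D = F.conj().T @ T @ F
    # The same basis F must diagonalise every translation-invariant T
    assert np.allclose(D, np.diag(np.diag(D)))
    return np.diag(D)


T_right = shift_matrix(N, +1)
T_left = shift_matrix(N, -1)
T_diffuse = 0.5 * np.eye(N) + 0.25 * T_right + 0.25 * T_left

lam_right = action_eigenvalues(T_right)      # exp(+2*pi*i*k/N)
lam_left = action_eigenvalues(T_left)        # exp(-2*pi*i*k/N)
lam_diffuse = action_eigenvalues(T_diffuse)  # 0.5 + 0.5*cos(2*pi*k/N)

# Predict the outcome of "right, right, left" from state 3 by re-weighting
# Fourier coefficients with the action-specific eigenvalues.
p0 = np.zeros(N)
p0[3] = 1.0
p_hat = p0 @ F
p_pred = ((p_hat * lam_right**2 * lam_left) @ F.conj().T).real
```

Only the eigenvalues change between actions: the basis `F` is computed once, and composing actions reduces to multiplying their eigenvalues elementwise.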

2. "INTUITIVE PLANNING" WITH A SINGLE TRANSITION STRUCTURE

Intuitive planning represents the occupancy distribution over the state space as a weighted sum of the eigenvectors of a single transition matrix (usually corresponding to symmetric diffusion), so that the effect of one step of the transition dynamics on the distribution can be predicted by reweighting each of the eigenvectors by the corresponding eigenvalue. This generalises to calculating the cumulative effect of discounted future transitions (Baram et al. [2]). Consider a transition matrix $T \in \mathbb{R}^{N \times N}$ with $T_{ss'} = P(s_{t+1} = s' \mid s_t = s)$, where $s_t$ encodes the state at time $t$ and $N$ is the number of states. Then $T^n$ is the $n$-step transition matrix, and has the same set of eigenvectors as $T$. Specifically, the eigendecompositions of $T$ and $T^n$ are:

$$T = Q \Lambda Q^{-1}, \qquad T^n = Q \Lambda^n Q^{-1} \tag{1}$$

where each column of the matrix $Q$ is an eigenvector of $T$ and $\Lambda = \mathrm{diag}(\sigma_P(T))$, where $\sigma_P(T)$ is the set of eigenvalues of $T$. Similarly, any polynomial in $T$, $p(T)$, shares the same set of eigenvectors as $T$, with the set of eigenvalues $\sigma_P(p(T)) = p(\sigma_P(T))$. Hence:

$$\sum_{k=0}^{\infty} (\gamma T)^k = (I - \gamma T)^{-1} = Q \, \mathrm{diag}(w) \, Q^{-1}, \quad \text{where } w_i = \frac{1}{1 - \gamma \lambda_i} \text{ for } \lambda_i \in \sigma_P(T) \tag{2}$$

The series converges for $\gamma < 1$, since the eigenvalues of a stochastic matrix have modulus at most 1. The resolvent form (Eq. 2) is an infinite discounted summation of transitions, which under a policy and transition structure corresponding to diffusion, is equivalent to the successor representation (SR, Fig. 1E) with discounting factor $\gamma$ (Dayan [14]; Stachenfeld et al. [38]). See Mahadevan and Maggioni [29] for a related spectral approach using Fourier decomposition of $T$ for estimating the value function. The SR has been shown to be useful for navigation via gradient ascent on the future probability of occupying the target state, and has a linear relationship with the true underlying Euclidean distances in spatial tasks (hence "intuitive planning", see Fig. 1 and Fig. 2D-E). The eigenvectors of the diffusion transition matrix generally show grid-like patterns, suggesting a close relationship to grid cells.
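The eigenbasis prediction and resolvent form described above can be checked numerically. The sketch below is illustrative only: it assumes a lazy symmetric random walk on a 20-state ring (in the spirit of Fig. 1), and the function name `ring_diffusion` and the values of `N`, `gamma`, `n_steps` and `start` are hypothetical choices rather than settings taken from this work.

```python
import numpy as np


def ring_diffusion(n_states, p_stay=0.5):
    """Symmetric random walk on a ring: stay with p_stay, else step left or right."""
    T = np.zeros((n_states, n_states))
    p_move = (1.0 - p_stay) / 2.0
    for s in range(n_states):
        T[s, s] = p_stay
        T[s, (s - 1) % n_states] = p_move
        T[s, (s + 1) % n_states] = p_move
    return T


N, gamma, n_steps, start = 20, 0.9, 3, 5  # assumed example values
T = ring_diffusion(N)

# T is symmetric, so eigh gives real eigenvalues and an orthonormal Q (Q^{-1} = Q^T)
lam, Q = np.linalg.eigh(T)

# Eq. 1: n-step prediction by raising each eigenvalue to the n-th power
T_n = Q @ np.diag(lam**n_steps) @ Q.T

# Eq. 2: discounted sum of all future transitions by re-weighting eigenvectors
w = 1.0 / (1.0 - gamma * lam)
SR = Q @ np.diag(w) @ Q.T

# Both shortcuts agree with the direct matrix computations
assert np.allclose(T_n, np.linalg.matrix_power(T, n_steps))
assert np.allclose(SR, np.linalg.inv(np.eye(N) - gamma * T))

# Row `start` of T_n is the predicted occupancy distribution n_steps ahead
p_future = T_n[start]
```

Because this example uses a symmetric matrix, `Q.T` stands in for the inverse of `Q`; a non-symmetric transition matrix would need `np.linalg.eig` and an explicit inverse instead.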
However, intuitive planning is restricted to predictions over a



Figure 1: Demonstration of intuitive planning on a diffusive transition task on a 1D ring track. A: Example transition matrix $T$; B: $P(s_{t+1} = s' \mid s_t = 5)$; C, D: the same for $T^3$, showing the predicted distribution three time steps ahead; E: the resolvent form/SR (Eq. 2) computed from the eigenbasis of the transition matrix; F: SR values for state 5, which can be used for navigation.

