REPRESENTING LATENT DIMENSIONS USING COMPRESSED NUMBER LINES

Abstract

Humans use log-compressed number lines to represent different quantities, including elapsed time, traveled distance, numerosity, and sound frequency. Inspired by recent work in cognitive science and computational neuroscience, we developed a neural network that learns to construct log-compressed number lines. The network computes a discrete approximation of a real-domain Laplace transform using an RNN with analytically derived weights, giving rise to a log-compressed timeline of the past. The network learns to extract latent variables from the input and uses them for global modulation of the recurrent weights, turning the timeline into a number line over task-relevant dimensions. The number line representation greatly simplifies learning on a set of problems that require learning associations in different spaces, problems that humans can typically solve easily. This approach illustrates how combining deep learning with cognitive models can yield systems that learn to represent latent variables in a brain-like manner and exhibit human-like behavior consistent with the Weber-Fechner law.

1. INTRODUCTION

The human ability to map sensory inputs onto number lines is critical for rapid learning, reasoning, and generalization. Recordings of activity from individual neurons in mammalian brains suggest a particular form of representation that could give rise to mental number lines over different variables. For instance, the presentation of a salient stimulus to an animal triggers sequential activation of neurons called time cells, which are characterized by temporally tuned unimodal basis functions (MacDonald et al., 2011; Tiganj et al., 2017; Eichenbaum, 2014). Each time cell reaches its peak activity at a particular time after the onset of some salient stimulus. Together, a population of time cells constitutes a temporal number line, or a timeline, of the stimulus history (Howard et al., 2015; Tiganj et al., 2018). Similarly, as animals navigate spatial environments, neurons called place cells exhibit spatially tuned unimodal basis functions (Moser et al., 2008). A population of place cells constitutes a spatial number line that can be used for navigation (Bures et al., 1997; Banino et al., 2018). The same computational strategy seems to be used to represent other variables as well, including numerosity (Nieder & Miller, 2003), integrated evidence (Morcos & Harvey, 2016), pitch of tones (Aronov et al., 2017), and conjunctions of these variables (Nieh et al., 2021). Critically, many of these "neural number lines" appear to be log-compressed (Cao et al., 2021; Nieder & Miller, 2003), providing a natural account of the Weber-Fechner law observed in psychophysics (Chater & Brown, 2008; Fechner, 1860/1912). Here we present a method by which deep neural networks can construct continuous, log-compressed number lines of latent task-relevant dimensions. Modern deep neural networks are excellent function approximators that learn in a distributed manner: weights are adjusted individually for each neuron.
Neural activity in the brain suggests a representation where a population of neurons together encodes a distribution over a latent variable in the form of a number line. In other words, a latent variable is not represented as a scalar (e.g., a count of objects could be encoded with a single neuron whose firing rate is proportional to the count), but as a function supported by a population of neurons, each tuned to a particular magnitude of the latent variable. To build deep neural networks with this property, we use global modulation such that the recurrent weights of a population of cells are adjusted simultaneously. We show that this gives rise to log-compressed number lines and can greatly facilitate associative learning in the latent space. Inspired by experiments on animals, we conduct experiments in which a neural network learns number lines for spatial distance and for the count of objects appearing over time. We designed an experimental setup where the network needs to predict when a target event will happen. In our experiments, the time to the target event depends either on the elapsed time, the traveled spatial distance, or the count of how many times some object appears in the input. Critically, just as in the experiments with animals, these variables are not directly observable from the inputs; they are hidden and have to be learned from the spatiotemporal dynamics of the input. For example, if the target event happens after some object appears a certain number of times, the network needs to learn to identify the object and learn the correct number of appearances (Experiment 3; see also an illustration in Fig. 5). Similarly, people can estimate distance when riding a bicycle using their motor outputs and sensory inputs: we can learn a non-linear mapping of motor outputs and sensory inputs onto velocity and integrate velocity to estimate distance (this concept is an inspiration for Experiments 2a and 2b; see also an illustration in Fig. 4).
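The velocity intuition above can be sketched numerically. In the sketch below (our own illustration, not the paper's implementation; the function name, parameter values, and velocity profiles are ours), a single shared modulatory signal alpha(t), playing the role of velocity, rescales the decay rate of every unit at every step. Two runs with different velocity profiles but the same total traveled distance then end in identical population states:

```python
import numpy as np

def modulated_decay(F0, alpha, s, dt):
    """Evolve Laplace-domain units dF/dt = -s * alpha(t) * F under global
    modulation: one shared signal alpha rescales every unit's decay rate,
    so the final state is exp(-s * integral of alpha), i.e., a function of
    traveled distance rather than elapsed time."""
    F = F0.copy()
    for a_t in alpha:
        F = np.exp(-s * a_t * dt) * F
    return F

s = np.geomspace(0.5, 5.0, num=6)   # log-spaced decay rates, one per unit
dt = 0.01
n = 500
F0 = np.ones_like(s)                # state right after a stimulus at distance 0

fast = np.full(n, 2.0)              # constant velocity 2 -> total distance 10
slow = np.linspace(0.0, 4.0, n)     # ramping velocity   -> same total distance 10

F_fast = modulated_decay(F0, fast, s, dt)
F_slow = modulated_decay(F0, slow, s, dt)
# Both runs cover the same distance, so the states match: exp(-s * 10)
print(np.max(np.abs(F_fast - F_slow)))
```

The population state is thus insensitive to how fast the distance was covered, which is exactly what a spatial number line requires.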
Building on models from computational and cognitive neuroscience (Shankar & Howard, 2012; Howard et al., 2014; 2018), we propose a neural network architecture that gives rise to a number line supported by unimodal basis functions. The network is composed of three layers. The input is fed into a recurrent layer whose weights are analytically computed to approximate the real-domain Laplace transform of the input. Critically, we use properties of the Laplace domain and apply global modulation to the recurrent weights to convert functions of time into functions of other variables, such as distance or count. The output of the Laplace layer is mapped through a linear layer with analytically computed weights implementing the inverse Laplace transform. The inverse gives rise to a log-compressed number line supported by unimodal basis functions. Depending on the modulatory signal, this population can resemble, for instance, time cells, place cells, or count cells. The output is then mapped through a trainable dense layer with a sigmoid activation function to the network's output. This approach augments the capabilities of cognitive models, which typically rely on handcrafted features, by enabling them to learn latent variables. At the same time, the structure and properties of the cognitive model are preserved, allowing the resulting system to retain strong explanatory power for neural activity in the brain and for behavioral data. To the best of our knowledge, this work is the first time that the Laplace transform and its inverse have been implemented as part of a neural network trainable with error backpropagation: in our experiments, the gradient flows through the Laplace and inverse Laplace transforms. Global modulation of analytically computed recurrent weights is also a novel approach to learning in Recurrent Neural Networks (RNNs).
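To illustrate how the inverse step yields unimodal basis functions, consider the Post approximation to the inverse Laplace transform, which is proportional to (-1)^k / k! * s^(k+1) * d^k F / ds^k (the approximation used in the line of models this work builds on, e.g., Shankar & Howard, 2012). For a delta input that occurred tau seconds ago, F(s) = e^(-s*tau) and the k-th derivative has a closed form, so the resulting population profile can be computed directly. The sketch below is our own illustration with our own parameter choices:

```python
import numpy as np
from math import factorial

k = 4                                # order of the Post inverse-transform approximation
s = np.geomspace(0.5, 50.0, num=40)  # log-spaced decay rates, one per unit
tau = 2.0                            # a delta input occurred tau seconds ago

# For a delta input, F(s) = exp(-s*tau) and d^k F/ds^k = (-tau)^k exp(-s*tau),
# so the Post inverse (-1)^k / k! * s^(k+1) * d^k F / ds^k has a closed form:
f_tilde = s**(k + 1) * tau**k * np.exp(-s * tau) / factorial(k)

# Across the population, activity forms a unimodal bump on the log-spaced s axis,
# peaking near s = (k + 1) / tau; equivalently, the unit with decay rate s is
# maximally active roughly k/s seconds after the stimulus.
peak = s[np.argmax(f_tilde)]
print(peak)
```

Because the units are log-spaced in s, the bump is equally wide (in units of grid points) for recent and remote stimuli, which is the log compression that gives rise to Weber-Fechner-like behavior.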
The Laplace transform uses a diagonal weight matrix, which makes it robust to the exploding and vanishing gradient problems associated with backpropagation through time. These problems have been somewhat reduced by gated networks, such as Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997; Greff et al., 2016) and Gated Recurrent Units (GRU) (Chung et al., 2014). Recent approaches have bounded the gradients, such as the Coupled Oscillatory RNN (coRNN) (Rusch & Mishra, 2020), or used a formalism that does not require learning the recurrent weights, such as the Legendre Memory Unit (LMU) (Voelker et al., 2019). Echo state networks (Jaeger, 2001) used RNNs with fixed, non-trainable recurrent weights, but without global modulation. Similarly, the multiscale temporal structure approach used fixed weights with a spectrum of time constants (Mozer, 1992).

2. METHODS

We first describe the construction of a log-compressed timeline using an approximation of the real-domain Laplace transform and its inverse. Then we describe how the timeline can be turned into a more general number line via global weight modulation in the Laplace domain.

2.1. CONTINUOUS-TIME FORMULATION OF LAPLACE AND INVERSE LAPLACE TRANSFORM

Given a one-dimensional input signal f(t), we define a modified version of the Laplace transform F(s; t):

F(s; t) = \int_0^t e^{-s(t-t')} f(t') \, dt'.    (1)

This modified version differs from the standard Laplace transform only in the variable s: instead of s being a complex value composed of a real and an imaginary part, we use real and positive s. This
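As a concrete illustration, Eq. (1) can be approximated in discrete time by a bank of leaky integrators, i.e., an RNN whose recurrent weight matrix is diagonal with entries e^{-s dt}. The sketch below (our own illustration, not the paper's implementation; variable names and parameter values are ours) checks the approximation against the closed form for a constant input, for which F(s; t) = (1 - e^{-st}) / s:

```python
import numpy as np

def laplace_layer(f, s, dt):
    """Discrete approximation of Eq. (1): F(s; t) = int_0^t exp(-s(t-t')) f(t') dt'.
    Implemented as an RNN with fixed diagonal recurrent weights (one unit per
    decay rate s): F[t] = exp(-s*dt) * F[t-1] + f[t] * dt."""
    decay = np.exp(-s * dt)          # diagonal recurrent weights, one per s
    F = np.zeros_like(s)
    history = []
    for f_t in f:
        F = decay * F + f_t * dt     # elementwise update: no mixing across units
        history.append(F.copy())
    return np.array(history)

# Log-spaced decay rates give log-compressed coverage of the past.
s = np.geomspace(0.1, 10.0, num=8)
dt = 0.001
T = 2.0
f = np.ones(int(T / dt))             # constant input f(t) = 1
F = laplace_layer(f, s, dt)

# For f = 1 the integral in Eq. (1) has the closed form (1 - exp(-s*t)) / s.
expected = (1 - np.exp(-s * T)) / s
print(np.max(np.abs(F[-1] - expected)))  # small discretization error
```

Because the recurrence is elementwise, the Jacobian of each step is the diagonal matrix diag(e^{-s dt}), so gradients neither mix across units nor explode, consistent with the stability argument above.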

