CORTICO-CEREBELLAR NETWORKS AS DECOUPLED NEURAL INTERFACES

Abstract

The brain solves the credit assignment problem remarkably well. For credit to be correctly assigned across multiple cortical areas, a given area should, in principle, wait for others to finish their computation. How the brain deals with this locking problem has remained unclear. Deep learning methods suffer from similar locking constraints, in both the forward and backward phase. Recently, decoupled neural interfaces (DNI) were introduced as a solution to the forward and backward locking problems. Here we propose that a specialised brain region, the cerebellum, helps the cerebral cortex solve the locking problem, closely matching the computations and architecture of DNI. In particular, we propose that the classical cerebellar forward and inverse models are equivalent to solving the backward and forward locking problems, respectively. To demonstrate the potential of this framework we focus on modelling a given brain area as a recurrent neural network in which the cerebellum approximates the temporal feedback signals otherwise provided by backpropagation through time (BPTT). We tested the cortico-cerebellar DNI (CC-DNI) model in a range of sensorimotor and cognitive tasks that have been shown to be cerebellar-dependent. First, we show that the CC-DNI unlocking mechanisms can facilitate learning in a simple target reaching task. Next, building on the sequential MNIST task, we demonstrate that these results generalise to more complex sensorimotor tasks. Our cortico-cerebellar model readily applies to a wider range of modalities; to demonstrate this, we tested the model on a cognitive task, caption generation. Models without the cerebellar-DNI component exhibit deficits similar to those observed in cerebellar patients in both motor and cognitive tasks. Moreover, we used CC-DNI to generate a set of specific neuroscience predictions.
Finally, we introduce a CC-DNI model with the highly sparse connectivity observed in the cerebellum, which substantially reduces the number of parameters while improving learning through decorrelation. Overall, our work offers a novel perspective on the cerebellum as a brain-wide decoupling machine for efficient credit assignment and opens a new avenue of research between deep learning and neuroscience.

1. INTRODUCTION

Efficient credit assignment in the brain is a critical part of learning. However, how the brain solves the credit assignment problem remains a mystery. One of the central issues of credit assignment across multiple stages of processing is the need to wait for previous stages to finish their computation before others can proceed (Rumelhart et al., 1986; Schmidhuber, 1990; Lee et al., 2015; Marblestone et al., 2016; Jaderberg et al., 2017). In deep artificial neural networks these constraints are explicit. During the forward phase a given layer has to wait for all previous layers to finish before it can proceed, a constraint known as the forward lock. Similarly, during the backward phase a given layer has to wait for all the layers above it to finish computing their gradients, the backward lock. Recently, a framework was introduced to decouple artificial neural networks: decoupled neural interfaces (DNI; Jaderberg et al., 2017)¹, effectively breaking the forward and/or backward locks. Here, we propose that a specialised brain area, the cerebellum, performs a similar role in the brain. In the classical view the cerebellum is key for fine motor control and learning, constructing internal models of behaviour (Marr, 1969; Albus, 1971; Raymond and Medina, 2018; Wolpert et al., 1998; Miall et al., 1993). More recently, however, the idea that the cerebellum is also involved in cognition has gained significant traction (Schmahmann et al., 2019; Wagner and Luo, 2020; Brissenden and Somers, 2019). An increasing body of behavioural, anatomical and imaging studies points to a role of the cerebellum in cognition in humans and non-human primates (Schmahmann et al., 2019; Brissenden and Somers, 2019; Guell et al., 2015; 2018). Impairments in cerebellar patients occur across a range of tasks including language (Guell et al., 2015), working memory (Deverett et al., 2019), planning (Baker et al., 1996), and others (Fiez et al., 1992).
These observations suggest that the cerebellum implements a universal function across the brain (Marr, 1969; Albus, 1971; Raymond and Medina, 2018; Diedrichsen et al., 2019). Moreover, experimental studies of cortico-cerebellar interactions have demonstrated that cerebellar output is crucial for maintaining the neocortical representations that drive behaviour (Chabrol et al., 2019; Gao et al., 2018). However, to the best of our knowledge, no theoretical framework has considered what the function of such interactions between the cerebellum and cortical areas might be. In an attempt to reduce the gap between these experimental observations and existing computational approaches, we introduce DNI as a cortico-cerebellar model: cortico-cerebellar DNI (CC-DNI). Consistent with a universal cerebellar role, we theorise that the cerebellum serves to break the locks inherent to both feedforward and feedback information processing in the brain, akin to DNI. In particular, we posit that the two classical internal models of the cerebellum, forward and inverse models, are equivalent to DNI-mediated unlocking of feedback (gradient) and feedforward communication, respectively. On this view the cerebellum provides not only motor or sensory estimates, but estimates of any modality encoded by a particular brain region. Inspired by neuroscientific studies, we test our model on sensorimotor tasks: (i) a target reaching task (Sanes et al., 1990; Butcher et al., 2017; Nashef et al., 2019) and (ii) a set of more complex temporal tasks based on the MNIST dataset, as well as (iii) a cognitive task, caption generation (Guell et al., 2015). Our results support the cortico-cerebellar DNI models we study and show that they generally speed up learning by unlocking the main network, qualitatively consistent with a wide range of behavioural observations (Guell et al., 2015; Sanes et al., 1990; Butcher et al., 2017; Nashef et al., 2019).
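To make the forward-unlocking idea concrete, the following NumPy sketch (our own toy illustration, not the model used in this work) shows a downstream module proceeding immediately with a learned prediction of an upstream module's output, in the spirit of a cerebellar inverse model supplying an estimate rather than waiting for the cortical computation to finish. All names, dimensions and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Upstream module f1 (assumed slow) computes h1 = W1 @ x.
W1 = rng.normal(scale=0.5, size=(6, 3))
P = np.zeros((6, 3))   # predictor of f1's output, playing the cerebellar role
lr = 0.05

for _ in range(1000):
    x = rng.normal(size=3)
    h1_hat = P @ x                        # downstream proceeds immediately
    h1 = W1 @ x                           # true upstream output, available later
    P -= lr * np.outer(h1_hat - h1, x)    # train predictor toward the true output

# Relative error of the predictor: near zero once the mapping is learned.
err = np.linalg.norm(P - W1) / np.linalg.norm(W1)
```

Once the predictor is accurate, the downstream module is no longer forward locked to the upstream computation: its input estimate is available as soon as x is.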
Two defining features of the cerebellum are the large expansion at the granule cell input layer, with 50 billion neurons (the most numerous cell type in the brain), and its highly sparse connectivity (each granule cell receives ∼4 synapses) (Sanger et al., 2020). These features have long been suggested to speed up learning in the cerebellum through decorrelation (Albus, 1971; Sanger et al., 2020; Cayco-Gajic et al., 2017). Building on these studies we introduce a new DNI model, sparse CC-DNI. Consistent with classical cerebellar models (Albus, 1971; Cayco-Gajic et al., 2017), we show that input sparsity can improve learning in the presence of high correlations. We conclude with a discussion of the implications and predictions of this new brain-wide model of the cerebellum.
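The decorrelation argument can be illustrated with a minimal NumPy sketch (our own illustration, in the spirit of Albus, 1971, and Cayco-Gajic et al., 2017, not the sparse CC-DNI model itself): correlated inputs are projected through a sparse, expanded layer in which each "granule cell" samples only 4 inputs, followed by a threshold nonlinearity. The input statistics, population sizes and threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_gc, k = 50, 2000, 4   # mossy-fibre inputs, granule cells, synapses per cell
n_samples = 500

# Correlated inputs: a strong shared component plus private noise.
shared = rng.normal(size=(n_samples, 1))
x = 0.9 * shared + 0.45 * rng.normal(size=(n_samples, n_in))

# Each granule cell samples k random inputs (sparse connectivity).
W = np.zeros((n_gc, n_in))
for g in range(n_gc):
    idx = rng.choice(n_in, size=k, replace=False)
    W[g, idx] = rng.normal(size=k)

# Expanded representation with a threshold (rectifying) nonlinearity.
h = np.maximum(W @ x.T - 1.0, 0.0).T

def mean_abs_corr(a):
    """Mean absolute pairwise correlation across units (columns)."""
    c = np.corrcoef(a, rowvar=False)
    off = c[~np.eye(c.shape[0], dtype=bool)]
    return float(np.nanmean(np.abs(off)))

corr_in = mean_abs_corr(x)            # highly correlated inputs
corr_gc = mean_abs_corr(h[:, :200])   # decorrelated expansion (subset of cells)
```

Under these assumptions the granule-cell population is markedly less correlated than its inputs, which is the property argued to aid downstream learning.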

2. CEREBELLUM AS A DECOUPLING MACHINE

We first describe DNIs following Jaderberg et al. (2017) and then establish the link to cortico-cerebellar networks. Assume that a feedforward neural network consists of N layers, with the i-th layer (1 ≤ i ≤ N) performing a "computational step" $f_i$ with parameters $\theta_i$. Given input x at layer 1, the output of the network at its final layer is therefore given by $f_N(f_{N-1}(\dots f_2(f_1(x)) \dots))$. We use $F_i^j$ to denote the composition of steps from layer i to layer j (inclusive). Finally, let $h_i$ denote the (hidden) activity at layer i, so that $h_i = f_i(h_{i-1})$ with $h_0 = x$. To illustrate the locking constraints of standard artificial neural networks used in deep learning, suppose that a network is learning via backpropagation, with current input-target pair $(x, y^{\mathrm{targ}})$. To update the layer parameters $\theta_i$, the gradient $\frac{\partial L}{\partial \theta_i}$ is required, where $L = L(y, y^{\mathrm{targ}})$ is the loss comparing the target value against the model output $y = F_1^N(x)$ under some loss function L; we then apply gradient descent on the parameters, $\theta_i \leftarrow \theta_i - \alpha \frac{\partial L}{\partial \theta_i}$, with learning rate $\alpha > 0$. Suppose, however, that the network has only recently received the input x and is currently only at layer i of the forward computation. In order to update the corresponding parameters $\theta_i$, the layer must first wait for all remaining layers $f_j$ (j > i) to finish so that the loss can be computed. Only then are the various gradients of the loss backpropagated, and $\frac{\partial L}{\partial \theta_i}$ finally becomes available. These two requirements make layer i "backward locked" to $F_{i+1}^N$, enforcing a strong dependence of the layer's learning on the speed of forward and backward propagation through the rest of the network.
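The backward unlocking provided by a synthetic gradient can be sketched as follows. In this toy NumPy example (our own illustration under assumed dimensions and learning rates, not the CC-DNI implementation), the first layer of a two-layer linear network updates immediately using a gradient predicted from its own activity and the target (as in the label-conditioned DNI variant), while the predictor itself is trained toward the true gradient once it becomes available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer linear network y = W2 @ (W1 @ x), trained to match y_targ = A @ x.
W1 = rng.normal(scale=0.5, size=(8, 4))
W2 = rng.normal(scale=0.1, size=(2, 8))
A = rng.normal(size=(2, 4))            # ground-truth map defining the task

# Synthetic-gradient module: predicts dL/dh1 from (h1, y_targ).
M = np.zeros((8, 10))
lr, lr_M = 0.05, 0.02

for _ in range(4000):
    x = rng.normal(size=4)
    y_targ = A @ x

    h1 = W1 @ x                        # layer 1 forward pass
    z = np.concatenate([h1, y_targ])
    g_syn = M @ z                      # synthetic gradient: layer 1 updates now,
    W1 -= lr * np.outer(g_syn, x)      # without waiting for the top of the network

    y = W2 @ h1                        # ... later, the top of the network finishes
    e = y - y_targ                     # dL/dy for L = 0.5 * ||y - y_targ||^2
    g_true = W2.T @ e                  # true dL/dh1, available only after backprop
    W2 -= lr * np.outer(e, h1)
    M -= lr_M * np.outer(g_syn - g_true, z)   # train the gradient predictor

# Evaluate on fresh inputs.
X = rng.normal(size=(4, 200))
final_mse = float(np.mean((W2 @ (W1 @ X) - A @ X) ** 2))
```

Because the true gradient is linear in (h1, y_targ) for this network, a linear predictor suffices here; in general the synthetic-gradient module is itself a small network.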



¹ DNIs are related to earlier work on using network critics to train neural networks (Schmidhuber, 1990).

