A STUDY OF BIOLOGICALLY PLAUSIBLE NEURAL NETWORK: THE ROLE AND INTERACTIONS OF BRAIN-INSPIRED MECHANISMS IN CONTINUAL LEARNING

Abstract

Humans excel at continually acquiring, consolidating, and retaining information from an ever-changing environment, whereas artificial neural networks (ANNs) exhibit catastrophic forgetting. There are considerable differences between biological neural networks and their artificial counterparts in the complexity of synapses, the processing of information, and the learning mechanisms, which may explain the mismatch in performance. We consider a biologically plausible framework that constitutes separate populations of exclusively excitatory and inhibitory neurons adhering to Dale's principle, in which the excitatory pyramidal neurons are augmented with dendrite-like structures for context-dependent processing of stimuli. We then conduct a comprehensive study of the role and interactions of different brain-inspired mechanisms, including sparse non-overlapping representations, Hebbian learning, synaptic consolidation, and replay of the past activations that accompanied the learning event. Our study suggests that employing multiple complementary mechanisms in a biologically plausible architecture, as in the brain, may be effective in enabling continual learning in ANNs.

1. INTRODUCTION

The human brain excels at continually learning from a dynamically changing environment, whereas standard artificial neural networks (ANNs) are inherently designed for training on stationary i.i.d. data. Sequential learning of tasks in continual learning (CL) violates this strong assumption, resulting in catastrophic forgetting. Although ANNs are inspired by biological neurons (Fukushima, 1980), they omit numerous design principles and learning mechanisms of the brain. These fundamental differences may account for the mismatch in performance and behavior. Biological neural networks are characterized by considerably more complex synapses and dynamic, context-dependent processing of information. Moreover, individual neurons have a specific role: each presynaptic neuron has an exclusively excitatory or inhibitory impact on its postsynaptic partners, as postulated by Dale's principle (Strata et al., 1999). Furthermore, distal dendritic segments in pyramidal neurons, which comprise the majority of excitatory cells in the neocortex, receive additional context information and enable context-dependent processing of information. This, in conjunction with inhibition, allows the network to learn task-specific patterns and avoid catastrophic forgetting (Yang et al., 2014; Iyer et al., 2021; Barron et al., 2017). In addition, the replay of non-overlapping and sparse neural activities of previous experiences in the neocortex and hippocampus is considered to play a critical role in memory formation, consolidation, and retrieval (Walker & Stickgold, 2004; McClelland et al., 1995). To protect information from erasure, the brain employs synaptic consolidation, in which plasticity rates are selectively reduced in proportion to strengthened synapses (Cichon & Gan, 2015).
To this end, we consider a biologically plausible framework in which each layer constitutes separate populations of exclusively excitatory and inhibitory neurons adhering to Dale's principle (Cornford et al., 2020), and the excitatory neurons (mimicking pyramidal cells) are augmented with dendrite-like structures for context-dependent processing of information (Iyer et al., 2021). Dendritic segments process an additional context signal encoding task information and subsequently modulate the feedforward activity of the excitatory neurons (Figure 1). We then systematically study the effect of controlling the overlap in representations, employing the "fire together, wire together" learning paradigm, and employing experience replay and synaptic consolidation. Our empirical study shows that:

i. An ANN architecture equipped with context-dependent processing of information by dendrites and with separate populations of excitatory pyramidal and inhibitory neurons adhering to Dale's principle can learn effectively in the CL setup.

ii. Enforcing different levels of activation sparsity in the hidden layers using k-winner-take-all activations, together with a complementary dropout mechanism that encourages the model to use a different set of active neurons for each task, can effectively control the overlap in representations and hence reduce interference.

iii. Task similarities need to be considered when enforcing such constraints, to allow for a balance between forward transfer and interference.

iv. Mimicking the ubiquitous "fire together, wire together" learning rule of the brain through a Hebbian update step on the connections between the context signal and the dendritic segments further strengthens context gating and facilitates the formation of task-specific subnetworks.

v. Synaptic consolidation using Synaptic Intelligence (Zenke et al., 2017), with importance measures adjusted to account for the differing effects of weight changes in excitatory and inhibitory neurons, further reduces forgetting.

vi. Replaying the activations of previous tasks in a context-specific manner is critical for consolidating information across different tasks, especially in the challenging Class-IL setting.
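The interplay of activation sparsity and activity-dependent dropout described in item ii can be illustrated with a minimal numpy sketch. The function names and the exponential form of the retention probability are illustrative assumptions on our part, not the exact formulation used in the study:

```python
import numpy as np

def k_winner_take_all(activations, k):
    """Keep only the k largest activations; zero out the rest."""
    out = np.zeros_like(activations)
    top_k = np.argpartition(activations, -k)[-k:]  # indices of the k largest
    out[top_k] = activations[top_k]
    return out

def heterogeneous_dropout(activations, counts, rho=1.0, rng=None):
    """Preferentially drop neurons that were most active on previous tasks,
    encouraging each task to recruit a different set of neurons.

    counts: per-neuron activation counts accumulated over past tasks.
    rho:    strength of the bias (rho = 0 keeps every neuron).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Retention probability decays with past activity (illustrative form).
    p_keep = np.exp(-rho * counts / max(counts.max(), 1e-8))
    keep = rng.random(activations.shape) < p_keep
    return activations * keep
```

During training, the activation counts would be updated after each task, so that neurons heavily recruited by task t tend to be silenced while learning task t+1, yielding the non-overlapping representations referred to above.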



We will make the code available upon acceptance.
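As a rough illustration of the synaptic consolidation in item v, the following sketches the path-integral importance estimate of Synaptic Intelligence (Zenke et al., 2017). The class interface is our own illustrative design, and `exc_scale` is a hypothetical stand-in for the adjustment between excitatory and inhibitory parameters; neither is taken from the paper's implementation:

```python
import numpy as np

class SynapticIntelligence:
    """Per-parameter importance from the running path integral of loss
    reduction, consolidated at task boundaries (after Zenke et al., 2017)."""

    def __init__(self, theta, damping=0.1, exc_scale=1.0):
        self.omega = np.zeros_like(theta)       # running path integral
        self.importance = np.zeros_like(theta)  # consolidated importance
        self.ref = theta.copy()                 # anchor after last task
        self.damping = damping                  # avoids division by zero
        self.exc_scale = exc_scale              # hypothetical E/I adjustment

    def accumulate(self, grad, delta_theta):
        # Loss decrease attributable to each parameter along the trajectory.
        self.omega += -grad * delta_theta

    def consolidate(self, theta):
        # At a task boundary, convert the path integral into importance.
        drift = theta - self.ref
        self.importance += self.exc_scale * self.omega / (drift**2 + self.damping)
        self.ref = theta.copy()
        self.omega[:] = 0.0

    def penalty(self, theta, strength=1.0):
        # Quadratic penalty pulling important parameters toward the anchor.
        return strength * np.sum(self.importance * (theta - self.ref)**2)
```

In training, `accumulate` would be called after every optimizer step, `consolidate` at the end of each task, and `penalty` added to the loss of subsequent tasks.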



Figure 1: Architecture of one hidden layer in the biologically plausible framework. Each layer consists of separate populations of exclusively excitatory pyramidal cells and inhibitory neurons that adhere to Dale's principle. The shade indicates the strength of weights or activations, with darker shades indicating higher values. (a) The pyramidal cells are augmented with dendritic segments which receive an additional context signal c, and the dendritic segment whose weights are most aligned with the context vector (bottom row) is selected to modulate the output activity of the feedforward neurons for context-dependent processing of information. (b) The Hebbian update step further strengthens the association between the context and the winning dendritic segment with the maximum absolute value (indicated with a darker shade for the bottom row). Finally, heterogeneous dropout maintains an activation count for each pyramidal cell (indicated with the gray shade) and drops the neurons that were most active for the previous task (darkest shade dropped) to enforce non-overlapping representations. The top-k remaining cells then project to the next layer (increased shade).
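The two steps in the caption can be sketched in numpy as follows. The sigmoid gating follows the active-dendrites formulation of Iyer et al. (2021), but the function signatures and the learning-rate form of the Hebbian step are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def dendritic_modulation(feedforward, dendrite_weights, context):
    """(a) Gate each pyramidal cell's feedforward activity by the response
    of the dendritic segment most aligned with the context vector.

    feedforward:      (n_neurons,) feedforward activations
    dendrite_weights: (n_neurons, n_segments, ctx_dim) per-cell segments
    context:          (ctx_dim,) task context signal
    """
    segment_acts = dendrite_weights @ context       # (n_neurons, n_segments)
    win_idx = np.abs(segment_acts).argmax(axis=1)   # winning segment per cell
    winners = segment_acts[np.arange(len(feedforward)), win_idx]
    gated = feedforward / (1.0 + np.exp(-winners))  # sigmoid modulation
    return gated, win_idx

def hebbian_update(dendrite_weights, context, win_idx, active_mask, lr=0.01):
    """(b) 'Fire together, wire together': for each active cell, pull the
    winning segment's weights toward the context that co-occurred with it."""
    for n in np.flatnonzero(active_mask):
        w = dendrite_weights[n, win_idx[n]]
        dendrite_weights[n, win_idx[n]] = w + lr * (context - w)
    return dendrite_weights
```

Repeated Hebbian updates make the winning segment increasingly selective for its context, strengthening the gating that routes each task to its own subnetwork.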

