NOISE TRANSFORMS FEED-FORWARD NETWORKS INTO SPARSE CODING NETWORKS

Abstract

A hallmark of biological neural networks, distinguishing them from their artificial counterparts, is the high degree of sparsity in their activations. Here, we show that by simply injecting symmetric random noise during training on reconstruction or classification tasks, artificial neural networks with ReLU activation functions eliminate this difference: the neurons converge to a sparse coding solution in which only a small fraction are active for any input. The resulting network learns receptive fields like those of primary visual cortex and remains sparse even when the noise is removed in later stages of learning.

1. INTRODUCTION

The brain is highly sparse, with an estimated 15% of neurons firing at any given time (Attwell & Laughlin, 2001). The most immediate explanation for why is metabolic efficiency: action potentials consume ∼20% of the brain's energy (Sterling & Laughlin, 2015; Attwell & Laughlin, 2001; Sengupta et al., 2010). However, sparsity confers further advantages in the brain (Olshausen & Field, 2004). One significant advantage is improving the signal-to-noise ratio (SNR) of neural signals. Sparsity improves SNR by (i) silencing weakly firing neurons activated by noise and (ii) increasing the separability of data points (Ahmad & Scheinkman, 2019; Xie et al., 2022). Inhibitory interneurons that suppress all but the most active neurons from firing are an important mechanism for enforcing this sparsity (Haider et al., 2010). Theoretical and empirical results support their involvement in both silencing noise and separating neural representations. Examples include horizontal interneurons in the retina (Sterling & Laughlin, 2015) and Golgi interneurons in cerebellar-like structures (Fleming et al., 2022; Lin et al., 2014; Xie et al., 2022). Biologically, as depicted in Fig. 1, these inhibitory interneurons implement a negative feedback loop: the more active the excitatory neurons are, the more active the interneuron becomes, and hence the more it inhibits those excitatory neurons. Simplified models of this circuit, written as ordinary differential equations (ODEs), converge to a state in which approximately the k most active neurons remain on (Gozel & Gerstner, 2021). We refer to this as a Top-K activation function (also known as k-Winners-Take-All). There is empirical support for a number of interneuron circuits approximating the Top-K operation (Sterling & Laughlin, 2015; Fleming et al., 2022; Lin et al., 2014).
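The Top-K operation described above can be sketched in a few lines. This is an illustrative NumPy sketch, not an implementation from any cited work; the function name and example values are our own assumptions:

```python
import numpy as np

def top_k(x, k):
    """Top-K (k-Winners-Take-All) activation: keep the k most active
    units in x and silence the rest, mimicking the interneuron's
    inhibition of all but the strongest responses."""
    out = np.zeros_like(x)
    winners = np.argsort(x)[-k:]   # indices of the k largest activations
    out[winners] = x[winners]
    return out

acts = np.array([0.1, 2.0, -0.5, 1.2, 0.3])
sparse_acts = top_k(acts, 2)       # only the two most active neurons survive
```

In the biological circuit this selection emerges gradually from the feedback dynamics; the hard cutoff here corresponds to the fixed point those ODEs approximately converge to.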
By contrast, in the field of deep learning, while inhibition is possible, analogous interneuron circuits that enforce sparsity across a layer have not been widely adopted. The only truly sparse activation function in common use is the ReLU (Glorot et al., 2011). Moreover, mechanisms that enforce sparse neuronal activity are rarely used, and when given the choice, networks prefer to be dense. This is because sparsity can limit model capacity, creating information bottlenecks that harm performance (Goodfellow et al., 2015). Here, we find that by simply introducing isotropic, symmetric noise centered about zero during training, a layer of artificial neurons converges to a sparse coding solution. This solution mimics a simplified version of the biological inhibitory interneuron circuit. Letting the network gradually implement this inhibitory interneuron also results in better performance than explicitly enforcing any form of inhibition from the start of training. Concretely, the network synchronizes every neuron's bias term, setting all biases to approximately the same negative value, and every neuron's weight vector, setting all weight vectors to the same L2 norm.¹

