BRAIN-LIKE APPROACHES TO UNSUPERVISED LEARNING OF HIDDEN REPRESENTATIONS - A COMPARATIVE STUDY

Abstract

Unsupervised learning of hidden representations has been one of the most vibrant research directions in machine learning in recent years. In this work we study the brain-like Bayesian Confidence Propagating Neural Network (BCPNN) model, recently extended to extract sparse distributed high-dimensional representations. The saliency and separability of the hidden representations when trained on the MNIST dataset are studied using an external linear classifier and compared with other unsupervised learning methods that include restricted Boltzmann machines and autoencoders.

1. INTRODUCTION

Artificial neural networks have made remarkable progress in supervised pattern recognition in recent years. In particular, deep neural networks have dominated the field largely due to their capability to discover hierarchies of salient data representations. However, most recent deep learning methods rely extensively on supervised learning from labelled samples for extracting and tuning data representations. Given the abundance of unlabeled data there is an urgent demand for unsupervised or semi-supervised approaches to learning of hidden representations (Bengio et al., 2013). Although early concepts of greedy layer-wise pretraining allow for exploiting unlabeled data, ultimately the application of deep pre-trained networks to pattern recognition problems rests on label-dependent end-to-end weight fine-tuning (Erhan et al., 2009). At the same time, we observe a surge of interest in more brain-plausible networks for unsupervised and semi-supervised learning problems that build on some fundamental principles of neural information processing in the brain (Pehlevan & Chklovskii, 2019; Illing et al., 2019). Most importantly, these brain-like computing approaches rely on local learning rules and label-independent biologically compatible mechanisms to build data representations, whereas deep learning methods predominantly make use of error back-propagation (backprop) for learning the weights. Although efficient, backprop has several issues that make it an unlikely candidate model for synaptic plasticity in the brain. The most apparent issue is that the synaptic connection strength between two biological neurons is expected to comply with Hebb's postulate, i.e. to depend only on the available local information provided by the activities of pre- and postsynaptic neurons. This is violated in backprop, since synaptic weight updates need gradient signals to be communicated from distant output layers.
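The locality constraint of Hebb's postulate can be made concrete with a toy rate-based sketch (illustrative only, not the paper's model; the layer sizes, learning rate, and `hebbian_update` helper are assumptions): the change to each weight uses only the local pre- and postsynaptic activities, with no gradient signal propagated back from a distant output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_update(w, x_pre, y_post, lr=0.01):
    # Local Hebbian rule: the change in w[i, j] depends only on the
    # activity of postsynaptic unit i and presynaptic unit j.
    return w + lr * np.outer(y_post, x_pre)

w = rng.normal(scale=0.1, size=(4, 8))  # 4 hidden units, 8 inputs
x = rng.random(8)                       # presynaptic activity
y = w @ x                               # postsynaptic activity
w_new = hebbian_update(w, x, y)
```

By contrast, a backprop update of the same weights would require an error term computed at the output layer, which is the non-local dependency the text refers to.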
Please refer to (Whittington & Bogacz, 2019; Lillicrap et al., 2020) for a detailed review of possible biologically plausible implementations of and alternatives to backprop. In this work we utilize the MNIST dataset to compare two classical learning systems, the autoencoder (AE) and the restricted Boltzmann machine (RBM), with two brain-like approaches to unsupervised learning of hidden representations: the recently proposed model by Krotov and Hopfield (referred to as the KH model) (Krotov & Hopfield, 2019), and the BCPNN model (Ravichandran et al., 2020), which both rely on biologically plausible learning strategies. In particular, we qualitatively examine the extracted hidden representations and quantify their label-dependent separability using a simple linear classifier on top of all the networks under investigation. This classification step is not part of the learning strategy, and we use it merely to evaluate the resulting representations. Special emphasis is on the feedforward BCPNN model with a single hidden layer, which frames the update and learning steps of the neural network as probabilistic computations. Probabilistic approaches are widely used in both deep learning models (Goodfellow et al., 2016) and computational models of brain function (Doya et al., 2007). One disadvantage of probabilistic models is that exact inference and learning on distributed representations is often intractable and forces approximate approaches like sampling-based or variational methods (Rezende et al., 2014). In this work, we adopt a modular BCPNN architecture, previously used in abstract models of associative memory (Sandberg et al., 2002; Lansner et al., 2009), action selection (Berthet et al., 2012), and in application to brain imaging (Benjaminsson et al., 2010; Schain et al., 2013) and data mining (Orre et al., 2000).
Spiking versions of BCPNN have also been used in biologically detailed models of different forms of cortical associative memory (Lundqvist et al., 2011; Fiebig & Lansner, 2017; Tully et al., 2014). The modules in BCPNN, referred to as hypercolumns (HCs), comprise a set of functional minicolumns (MCs) that compete in a soft winner-take-all manner. The abstract view of an HC in this cortical-like network is that it represents some attribute, e.g. edge orientation, in a discrete coded manner. A minicolumn comprises a unit that conceptually represents one discrete value (a realization of the given attribute) and, as a biological parallel, it accounts for a local subnetwork of around a hundred recurrently connected neurons with similar receptive field properties (Mountcastle, 1997). Such an architecture was initially generalized from the primary visual cortex, but has since gained further support from experimental work and has been featured in spiking computational models of cortex (Rockland, 2010; Lansner, 2009). Finally, in this work we highlight additional mechanisms of bias regulation and structural plasticity, introduced recently to the BCPNN framework (Ravichandran et al., 2020), which enable unsupervised learning of hidden representations. The bias regulation mechanism ensures that the activities of all units in the hidden layer are maintained near their target activity by regulating their bias parameter. Structural plasticity learns a set of sparse connections from the input layer to the hidden layer by maximizing a local greedy information-theoretic score.
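The two mechanisms described above can be sketched in a few lines (a minimal illustration under our own naming; the HC/MC sizes, learning rate, and function names are assumptions, not the published implementation). Soft winner-take-all is a softmax applied independently within each hypercolumn, so the minicolumn activities of one HC sum to one, and bias regulation nudges each unit's bias toward a target activity level.

```python
import numpy as np

def hc_softmax(support, n_hc, n_mc):
    # Soft winner-take-all within each hypercolumn: normalize the
    # minicolumn activities inside each HC so they sum to one.
    z = support.reshape(n_hc, n_mc)
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return (e / e.sum(axis=1, keepdims=True)).reshape(-1)

def regulate_bias(bias, mean_activity, target, lr=0.01):
    # Raise the bias of units whose running activity is below the
    # target, lower it for units above the target.
    return bias + lr * (target - mean_activity)

support = np.array([2.0, 0.5, 0.1, 0.0, 1.0, 1.0])  # 2 HCs x 3 MCs
act = hc_softmax(support, n_hc=2, n_mc=3)
```

The neural implementation of the per-HC softmax would be lateral inhibition among the minicolumns of a hypercolumn, as discussed later in the text.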

2. RELATED WORKS

A popular unsupervised learning approach is to train a hidden layer to reproduce the input data as, for example, in AE and RBM. The AE and RBM networks trained with a single hidden layer are relevant here since learning weights of the input-to-hidden-layer connections relies on local gradients, and the representations can be stacked on top of each other to extract hierarchical features. However, stacked autoencoders and deep belief nets (stacked RBMs) have typically been used for pre-training procedures followed by end-to-end supervised fine-tuning (using backprop) (Erhan et al., 2009). The recently proposed KH model (Krotov & Hopfield, 2019) addresses the problem of learning solely with local gradients by training the hidden representations with a purely unsupervised method. In this network the input-to-hidden connections are trained, and additional (non-plastic) lateral inhibition provides competition within the hidden layer. For evaluating the representation, the weights are frozen, and a linear classifier trained with labels is used for the final classification. Our approach shares some common features with the KH model, e.g. learning hidden representations solely by unsupervised methods, and evaluating the representations by a separate classifier (Illing et al. (2019) provide an extensive review of methods with similar goals). All the aforementioned models employ either competition within the hidden layer (KH), or feedback connections from hidden to input (RBM and AE). The BCPNN model uses only feedforward connections, along with an implicit competition via a local softmax operation, the neural implementation of which would be lateral inhibition. It has also been observed that, for unsupervised learning, sparse feedforward connectivity performs better than full connectivity (Illing et al., 2019).
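The evaluation protocol shared by these models (freeze the unsupervised features, then fit a separate linear classifier on labels) might look like the following in outline. This is a sketch only: a simple ridge readout on synthetic features stands in for the actual classifier and the MNIST representations, and all names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linear_probe(features, labels, n_classes, ridge=1e-3):
    # Ridge regression to one-hot targets; the unsupervised feature
    # extractor itself is frozen and never modified by this step.
    Y = np.eye(n_classes)[labels]
    F = np.hstack([features, np.ones((len(features), 1))])  # bias column
    return np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ Y)

def probe_accuracy(W, features, labels):
    F = np.hstack([features, np.ones((len(features), 1))])
    return float(np.mean((F @ W).argmax(axis=1) == labels))

# Two well-separated synthetic clusters stand in for hidden representations.
feats = np.vstack([rng.normal(0, 0.3, (50, 5)), rng.normal(3, 0.3, (50, 5))])
labs = np.array([0] * 50 + [1] * 50)
W = fit_linear_probe(feats, labs, n_classes=2)
```

The probe's accuracy then measures the label-dependent separability of the frozen representations, exactly the quantity the comparison in this work is built on.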
In addition to the unsupervised methods, networks employing supervised learning, such as convolutional neural networks (CNNs), obtain this sparse connectivity by imposing fixed spatial filters (Lindsay, 2020). The BCPNN model takes an alternative approach where, along with learning the weights of the feedforward connections, which is regarded as biological synaptic plasticity, a sparse connectivity between the input and hidden layers is learnt simultaneously, in analogy with structural plasticity in the brain (Butz et al., 2009).
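The exact greedy information-theoretic score is specified in Ravichandran et al. (2020); purely for illustration, and under our own assumed formulation, a structural-plasticity step that scores candidate input connections by an empirical mutual-information estimate and greedily keeps the top-k could be sketched as:

```python
import numpy as np

def mutual_info_binary(x, y):
    # Empirical mutual information between two binary sequences.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def select_inputs(inputs, hidden_unit, k):
    # Greedily keep the k input channels most informative about the
    # hidden unit; all other candidate connections are pruned.
    scores = [mutual_info_binary(inputs[:, j], hidden_unit)
              for j in range(inputs.shape[1])]
    return np.argsort(scores)[-k:]

rng = np.random.default_rng(2)
h = rng.integers(0, 2, 200)          # binarized hidden-unit activity
X = rng.integers(0, 2, (200, 6))     # binarized input channels
X[:, 0] = h                          # channel 0 is perfectly informative
kept = select_inputs(X, h, k=2)
```

The score used here (pairwise mutual information) and the hard top-k pruning are placeholder choices; the point is only that the selection is local to each hidden unit and independent of any label.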

