A COMPUTATIONAL FRAMEWORK TO UNIFY REPRESENTATION SIMILARITY AND FUNCTION IN BIOLOGICAL AND ARTIFICIAL NEURAL NETWORKS

Abstract

Artificial neural networks (ANNs) are powerful tools for studying neural representations in the ventral visual stream of the brain, and these representations have, in turn, inspired new ANN designs that improve task performance. However, a unified framework for merging these two directions has so far been lacking. In this study, we propose an integrated framework, the Deep Autoencoder with Neural Response (DAE-NR), which incorporates information from the visual cortex into ANN models to achieve better image reconstruction and higher representation similarity between biological and artificial neurons. The same visual stimuli (i.e., natural images) are presented to both the mouse brain and the DAE-NR. The encoder of the DAE-NR jointly learns the dependencies of a neural spike-encoding task and an image-reconstruction task. For the spike-encoding task, features derived from a specific hidden layer of the encoder are transformed by a mapping function to predict the ground-truth neural responses under the constraint of image reconstruction. Simultaneously, for the image-reconstruction task, the latent representation produced by the encoder is passed to a decoder to restore the original image under the guidance of neural information. In the DAE-NR, the encoder, the mapping function, and the decoder are thus all implicitly constrained by both tasks. Our experiments demonstrate that only with this joint learning do DAE-NRs improve visual image reconstruction and increase the representation similarity between biological and artificial neurons. The DAE-NR offers a new perspective on the integration of computer vision and neuroscience.
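The joint objective described above can be summarized in a minimal sketch. All names (`latent_dim`, `n_neurons`, `lambda_nr`), the specific layer sizes, and the use of mean-squared error for the spike-prediction term are illustrative assumptions, not the paper's exact architecture or loss.

```python
# Minimal sketch of the DAE-NR joint objective: an encoder shared by an
# image-reconstruction decoder and a mapping function that predicts
# neural responses. Hyperparameters and loss choices are assumptions.
import torch
import torch.nn as nn

class DAENR(nn.Module):
    def __init__(self, img_dim=784, latent_dim=32, n_neurons=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(img_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim))
        # Mapping function: hidden features -> predicted neural responses.
        self.mapping = nn.Linear(latent_dim, n_neurons)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.mapping(z)

model = DAENR()
x = torch.rand(8, 784)   # batch of flattened natural-image stimuli
r = torch.rand(8, 50)    # recorded neural responses (random stand-in)
x_hat, r_hat = model(x)

lambda_nr = 0.5          # assumed weighting between the two tasks
loss = (nn.functional.mse_loss(x_hat, x)
        + lambda_nr * nn.functional.mse_loss(r_hat, r))
loss.backward()          # both tasks constrain encoder, mapping, decoder
```

Because the two loss terms are summed before backpropagation, gradients from the spike-prediction task flow into the shared encoder, which is the mechanism by which neural information guides the reconstruction.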

1. INTRODUCTION

Computer vision has achieved performance almost comparable to that of the human visual system on some tasks, mainly thanks to recent advances in deep learning. Image reconstruction is one of the essential tasks in computer vision Hinton & Salakhutdinov (2006); Kingma & Welling (2014); Ravishankar et al. (2020). One standard solution, the autoencoder (AE) framework, embeds the high-dimensional input into a low-dimensional latent space with an encoder and then reconstructs the image with a decoder Hinton & Salakhutdinov (2006); Goodfellow et al. (2016). Despite the popularity and practical success of AE models, the choice of prior largely influences the image reconstruction performance of deep autoencoders (DAEs) Tomczak & Welling (2018). Moreover, such architectures usually lack biological interpretability. Inspired by neuroscience, computer vision researchers have been interested in how information from biological neurons can be used to achieve brain-like performance (such as robustness and the ability to learn from small samples). Biology-inspired AE models may both improve performance on image reconstruction tasks and bring biological interpretability Federer et al. (2020); Schrimpf et al. (2018); Safarani et al. (2021). To this end, the key question is how to integrate biological information into AEs.

On the other hand, computational neuroscience is interested in building models that map stimuli to neural responses. Traditional models have difficulty expressing the nonlinear relationship between stimulus and neural response, whereas deep learning empowers computational neuroscience models and helps reveal the relationship between stimuli and neural spikes Klindt et al. (2017). Although biological and

