WIRING UP VISION: MINIMIZING SUPERVISED SYNAPTIC UPDATES NEEDED TO PRODUCE A PRIMATE VENTRAL STREAM

Abstract

After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, they are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing unsupervised proxies for labels, we here explore a complementary strategy: reducing the number of supervised synaptic updates required to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and lower energy expenditure to coordinate these updates, and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies. First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ∼80% of the match to the adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved "at birth" (i.e., with no training at all). Third, we find that, by training only ∼5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. Applied in combination, these three strategies yield new models that achieve ∼80% of a fully trained model's match to the brain while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just adult primate visual processing during inference, but also how the ventral visual stream might be "wired up" by evolution (a model's "birth" state) and by developmental learning (a model's updates based on visual experience).

1. INTRODUCTION

Particular artificial neural networks (ANNs) are the leading mechanistic models of visual processing in the primate ventral visual stream (Schrimpf et al., 2018; Kubilius et al., 2019; Dapello et al., 2020). After training on large-scale datasets such as ImageNet (Deng et al., 2009) by updating weights based on labeled images, the internal representations of these ANNs partly match neural representations in the primate visual system from early visual cortex V1 through V2 and V4 to high-level IT (Yamins et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014; Cadena et al., 2017; Tang et al., 2018; Schrimpf et al., 2018; Kubilius et al., 2019), and model object recognition behavior can partly account for primate object recognition behavior (Rajalingham et al., 2018; Schrimpf et al., 2018). Recently, such models have been criticized because their learning departs from brain development (Marcus, 2004; Grossberg, 2020; Zador, 2019). For example, all current top models of the primate ventral stream rely on trillions of supervised synaptic updates, i.e. the training of millions of parameters with millions of labeled examples over dozens of epochs. In biological systems, by contrast, the at-birth synaptic wiring encoded by the genome already provides enough structure for macaques to exhibit adult-like visual representations after a few months (Movshon & Kiorpes, 1988; Kiorpes & Movshon, 2004; Seibert, 2018), which limits the amount of experience-dependent learning. Furthermore, different neuronal populations in cortical circuits undergo different plasticity mechanisms: neurons in supragranular and infragranular layers adapt more rapidly than those in layer 4, which receives inputs from lower areas (Diamond et al., 1994; Schoups et al., 2001), whereas all synapses in current artificial networks change under the same plasticity mechanism.
While current models provide a basic understanding of the neural mechanisms of adult ventral stream inference, can we start to build models that explain how the ventral stream "wires itself up": models of the initial state at birth and of how it develops during postnatal life?
Related Work. Several papers have addressed related questions in machine learning: distilled student networks can be trained on the outputs of a teacher network (Hinton et al., 2015; Cho & Hariharan, 2019; Tian et al., 2019), and, in pruning studies, networks with knocked-out synapses perform reasonably well (Cheney et al., 2017; Morcos et al., 2018), demonstrating that models with many trained parameters can be compressed, which is further supported by the convergence of training gradients onto a small subspace (Gur-Ari et al., 2018). Tian et al. (2020) show that a pre-trained encoder's fixed features can be used to train a thin decoder with performance close to full fine-tuning, and recent theoretically driven work has found that training only BatchNorm layers (Frankle et al., 2020) or determining the right parameters from a large pool of weights (Frankle et al., 2019; Ramanujan et al., 2019) can already achieve high classification accuracy. Unsupervised approaches are also starting to develop useful representations without requiring many labels by inferring internal labels such as clusters or representational similarity (Caron et al., 2018; Wu et al., 2018; Zhuang et al., 2019; Hénaff et al., 2019; Konkle & Alvarez, 2020; Zhuang et al., 2020). Many attempts are also being made to make the learning algorithms themselves more biologically plausible (e.g. Lillicrap et al., 2016; Scellier & Bengio, 2017; Pozzi et al., 2020). Nevertheless, all of these approaches still require many synaptic updates in the form of labeled samples, or precise machinery to determine the right set of weights.
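The BatchNorm-only training result mentioned above can be illustrated with a short PyTorch sketch; the toy convolutional stack and layer sizes here are illustrative stand-ins, not the architectures used in the cited work. The idea is simply to freeze every parameter and then re-enable gradients only for the BatchNorm affine terms:

```python
import torch.nn as nn

# Toy convolutional stack standing in for a larger backbone (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
)

# Freeze all parameters, then re-enable only the BatchNorm affine terms
# (the per-channel scale and shift), as in BatchNorm-only training.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.weight.requires_grad = True
        m.bias.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")
```

Passing only the parameters with `requires_grad=True` to the optimizer then trains a tiny fraction of the network while leaving the convolutional weights at their initial values.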
In this work, we take first steps toward relating findings in machine learning to neuroscience, using such models to explore hypotheses about the product of evolution (a model's "birth state") while simultaneously reducing the number of supervised synaptic updates (a model's visual-experience-dependent development) without sacrificing high brain predictivity. Our contributions follow from a framework in which evolution endows the visual system with a well-chosen, yet still largely random, "birth" pattern of synaptic connectivity (architecture + initialization), and developmental learning corresponds to training a fraction of the synaptic weights using very few supervised labels. We do not view the proposed changes as fully biological models of postnatal development; we only claim that they correspond more concretely to biology than current models do. Solving the entire problem of development at once is beyond the scope of one study, but even partial progress in this direction will likely inform further work. Specifically,
1. we build models with a fraction of supervised updates (training epochs and labeled images) that retain high similarity to the primate ventral visual stream (quantified by a brain predictivity score from benchmarks on Brain-Score (Schrimpf et al., 2018)),
2. we improve the "at-birth" synaptic connectivity to achieve reasonable brain predictivity with no training at all,
3. we propose a thin, "critical training" technique which reduces the number of trained synapses while maintaining high brain predictivity,
4. we combine these three techniques to build models with two orders of magnitude fewer supervised synaptic updates but high brain predictivity relative to a fully trained model.
Code and pre-trained models will be available through GitHub.
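As a toy illustration of the second contribution (improving the "at-birth" connectivity), the sketch below draws a layer's initial weights from a zero-mean Gaussian with a hand-picked per-layer scale. The `birth_init` helper and the `scale` value are hypothetical stand-ins for the actual initialization schemes; the point is only that the "birth" state is specified by a small number of distribution parameters rather than by millions of individually trained weights.

```python
import torch
import torch.nn as nn

def birth_init(layer, scale):
    """Hypothetical per-layer initializer: draw weights from a zero-mean
    Gaussian whose standard deviation is a single chosen parameter."""
    with torch.no_grad():
        layer.weight.normal_(mean=0.0, std=scale)
        if layer.bias is not None:
            layer.bias.zero_()

torch.manual_seed(0)
conv = nn.Conv2d(3, 16, kernel_size=3)
birth_init(conv, scale=0.05)  # scale is an assumed, illustrative value
print(round(conv.weight.std().item(), 3))
```

Under this framing, evolution's contribution is compressed into the architecture plus a handful of per-layer distribution parameters, and any remaining brain predictivity must come from the (few) supervised updates applied afterwards.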

2. MODELING PRIMATE VISION

We evaluate all models on a suite of ventral stream benchmarks in Brain-Score (Schrimpf et al., 2018), and we base the new models presented here on the CORnet-S architecture, as this is currently the most accurate model of adult primate visual processing (Kubilius et al., 2019).
Brain-Score benchmarks. To obtain quantified scores for brain-likeness, we use a thorough set of benchmarks from Brain-Score (Schrimpf et al., 2018). To keep scores comparable, we only included those neural benchmarks with the same predictivity

