WIRING UP VISION: MINIMIZING SUPERVISED SYNAPTIC UPDATES NEEDED TO PRODUCE A PRIMATE VENTRAL STREAM

Abstract

After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, these models are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the required number of supervised synaptic updates to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we here demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies: First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ∼80% of the match to the adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved "at birth" (i.e. no training at all). Third, we find that, by training only ∼5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, we find that these new models achieve ∼80% of a fully trained model's match to the brain, while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be "wired up" by evolution (a model's "birth" state) and by developmental learning (a model's updates based on visual experience).

1. INTRODUCTION

Particular artificial neural networks (ANNs) are the leading mechanistic models of visual processing in the primate visual ventral stream (Schrimpf et al., 2018; Kubilius et al., 2019; Dapello et al., 2020). After training on large-scale datasets such as ImageNet (Deng et al., 2009) by updating weights based on labeled images, internal representations of these ANNs partly match neural representations in the primate visual system from early visual cortex V1 through V2 and V4 to high-level IT (Yamins et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014; Cadena et al., 2017; Tang et al., 2018; Schrimpf et al., 2018; Kubilius et al., 2019), and model object recognition behavior can partly account for primate object recognition behavior (Rajalingham et al., 2018; Schrimpf et al., 2018). Recently, such models have been criticized due to how their learning departs from brain development (Marcus, 2004; Grossberg, 2020; Zador, 2019). For example, all the current top models of the primate ventral stream rely on trillions of supervised synaptic updates, i.e. the training of millions of parameters with millions of labeled examples over dozens of epochs. In biological systems, on the other hand, the at-birth synaptic wiring as encoded by the genome already provides structure that is sufficient for macaques to exhibit adult-like visual representations after a few months (Movshon & Kiorpes, 1988; Kiorpes & Movshon, 2004; Seibert, 2018), which restricts the amount of experience

