A NEW PHOTORECEPTOR-INSPIRED CNN LAYER ENABLES DEEP LEARNING MODELS OF RETINA TO GENERALIZE ACROSS LIGHTING CONDITIONS

Abstract

As we move our eyes, and as lighting changes in our environment, the light intensity reaching our retinas changes dramatically and on multiple timescales. Despite these changing conditions, our retinas effortlessly extract visual information that allows downstream brain areas to make sense of the visual world. Such processing capabilities are desirable in many settings, including computer vision systems that operate in dynamic lighting environments like in self-driving cars, and in algorithms that translate visual inputs into neural signals for use in vision-restoring prosthetics. To mimic retinal processing, we first require models that can predict retinal ganglion cell (RGC) responses reliably. While existing state-of-the-art deep learning models can accurately predict RGC responses to visual scenes under steady-state lighting conditions, these models fail under dynamic lighting conditions. This is because changes in lighting markedly alter RGC responses: adaptation mechanisms dynamically tune RGC receptive fields on multiple timescales. Because current deep learning models of the retina have no in-built notion of light level or these adaptive mechanisms, they are unable to accurately predict RGC responses under lighting conditions that they were not trained on. We present here a new deep learning model of the retina that can predict RGC responses to visual scenes at different light levels without requiring training at each light level. Our model combines a fully trainable biophysical front end capturing the fast and slow adaptation mechanisms in the photoreceptors with convolutional neural networks (CNNs) capturing downstream retinal processing. We tested our model's generalization performance across light levels using monkey and rat retinal data. 
Whereas conventional CNN models without the photoreceptor layer failed to predict RGC responses when the lighting conditions changed, our model with the photoreceptor layer as a front end fared much better in this challenge. Overall, our work demonstrates a new hybrid approach that equips deep learning models with biological vision mechanisms enabling them to adapt to dynamic environments.
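The front end described above couples fast photoreceptor dynamics with slow adaptation to the ambient light level. As a rough caricature of this idea (not the paper's actual biophysical equations), a minimal sketch with assumed first-order dynamics and divisive gain control:

```python
import numpy as np

def photoreceptor_front_end(stimulus, dt=0.01, tau_fast=0.02, tau_slow=1.0, k=1.0):
    # Toy photoreceptor layer: a fast low-pass response whose gain is
    # divisively normalized by a slow estimate of the mean light level.
    # Parameter names, time constants, and dynamics are illustrative
    # assumptions, not the authors' trained biophysical model.
    fast, slow = 0.0, 0.0
    out = np.empty(len(stimulus), dtype=float)
    for t, s in enumerate(stimulus):
        fast += dt / tau_fast * (s - fast)           # fast response to the input
        slow += dt / tau_slow * (fast - slow)        # slow adaptation state
        out[t] = fast / (1.0 + k * max(slow, 0.0))   # divisive gain control
    return out

# A step of light held at two intensities an order of magnitude apart:
dim = photoreceptor_front_end(np.ones(2000))
bright = photoreceptor_front_end(10.0 * np.ones(2000))
```

In this sketch the response to a sustained step decays as the slow state builds up, and a tenfold change in input intensity produces a much smaller change in steady-state output. A conventional CNN trained at one light level has no such state, which is why its predictions break down when the lighting changes; in the paper's model the analogous (fully trainable) front end hands the downstream CNN a signal that is already normalized across light levels.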

1. INTRODUCTION

A key problem in visual neuroscience is to generate models that can accurately predict how neurons will respond to visual stimuli. Along with their role in basic neuroscience, these models have applications in prosthetic devices and can form the basis for bio-inspired computer vision systems that aim to mimic the impressively robust functions of the human visual system. Machine learning models have become increasingly ubiquitous for such applications in neuroscience, given their strong performance in computer vision tasks like object recognition (Chollet, 2017; Simonyan & Zisserman, 2015; Krizhevsky et al., 2017). For example, convolutional neural networks (CNNs) have been used to predict responses to visual stimuli of neurons in visual cortex (Kindel et al., 2019; Cadena et al., 2017) and retina (McIntosh et al., 2016; Tanaka et al., 2019; Yan et al., 2022; Goldin et al., 2022). We focus here on the retina. Under carefully controlled experimental conditions with constant lighting, CNN models can predict responses of retinal ganglion cells (RGCs, the "output" cells of the retina, whose axons form the optic nerve) to visual stimuli with high accuracy (McIntosh et al., 2016). In natural vision, however, lighting conditions are highly dynamic: the amount of light falling on the retina can change by several orders of magnitude at mul-

