CORTICALLY MOTIVATED RECURRENCE ENABLES TASK EXTRAPOLATION

Abstract

Feedforward deep neural networks have become the standard class of models in computer vision. Yet they differ strikingly from their biological counterparts, which predominantly perform "recurrent" computations. Why have biological neurons evolved to employ recurrence so pervasively? In this paper, we show that a recurrent network can flexibly adapt its computational budget during inference and generalize within-task across difficulty levels. We also contribute a recurrent module, LocRNN, whose design is based on a prior computational model of local recurrent intracortical connections in primates and which supports such dynamic task extrapolation. LocRNN learns highly accurate solutions to the challenging visual reasoning problems of Mazes and PathFinder that we use here. More importantly, it can run fewer or more recurrent iterations at inference to zero-shot generalize to less- and more-difficult instantiations of each task without requiring extra training data, a potential functional advantage of recurrence that biological visual systems capitalize on. Feedforward networks, by contrast, have fixed computational graphs and exhibit this trend only partially, potentially owing to image-level similarities across difficulty levels. We also posit an intriguing tradeoff between recurrent networks' representational capacity and their stability in the recurrent state space. Our work encourages further study of the role of recurrence in deep learning models, especially in the context of out-of-distribution generalization and task extrapolation, and of its implications for task performance and stability.

1. INTRODUCTION

Deep learning based models for computer vision have recently matched and even surpassed human-level performance on various semantic tasks (Dosovitskiy et al., 2020; Vaswani et al., 2021; He et al., 2021; Liu et al., 2022). While the gap between human and machine task performance has been diminishing with more successful deep learning architectures, differences in their architectures and in critical behaviors such as adversarial vulnerability (Athalye et al., 2018), texture bias (Geirhos et al., 2018), and lack of robustness to perceptual distortions (Hendrycks & Dietterich, 2019; Geirhos et al., 2019) have increased significantly. We are interested in one stark architectural difference between artificial and biological vision that we believe underlies the above-mentioned behavioral differences: recurrent neural processing. While biological neurons predominantly process input stimuli with recurrence, existing high-performing deep learning architectures are largely feedforward in nature. In this work, we argue for further incorporation of recurrent processing in future deep learning architectures: we compare matched recurrent and feedforward networks to show how the former can extrapolate representations learned on a task to unseen difficulty levels without extra training data. Feedforward networks are strictly limited on this front, as they cannot dynamically change their computational graph.

Ullman (1984) introduced a research direction in visual cognition that is of fundamental importance to understanding the human ability to extrapolate learned representations within-task across difficulties. Ullman hypothesized that all visual tasks we perform are supported by combinations of a small set of key elemental operations that are applied in a sequential manner (analogous to recurrent processing), like an instruction set.
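To make the contrast concrete, the following is a minimal sketch of why a recurrent network's compute budget is adjustable at inference while a feedforward network's is not: because the recurrent update reuses the same weights at every step, the number of iterations is a free parameter at test time. All names and sizes here are illustrative, not the actual LocRNN architecture or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights for a tiny recurrent cell (hypothetical sizes).
W_in = rng.standard_normal((8, 4)) * 0.1   # input -> hidden
W_rec = rng.standard_normal((8, 8)) * 0.1  # hidden -> hidden, shared across steps
W_out = rng.standard_normal((1, 8)) * 0.1  # hidden -> readout

def run_recurrent(x, n_steps):
    """Apply the same recurrent update n_steps times, then read out.

    Weight sharing across iterations means n_steps can be chosen at
    inference time, e.g. more steps for harder task instances. A
    feedforward network of fixed depth has no analogous knob.
    """
    h = np.zeros(8)
    for _ in range(n_steps):
        h = np.tanh(W_in @ x + W_rec @ h)
    return W_out @ h

x = rng.standard_normal(4)
easy = run_recurrent(x, n_steps=5)    # smaller compute budget
hard = run_recurrent(x, n_steps=20)   # larger budget, identical parameters
```

The parameter count is identical in both calls; only the depth of the unrolled computational graph changes, which is the property exploited for zero-shot task extrapolation.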
Instances of varying difficulty levels of a task can be solved by dynamically piecing together shorter or longer sequences of operations corresponding to that task. This approach of decomposing tasks into a sequence of elemental opera-

