INCREMENTAL PREDICTIVE CODING: A PARALLEL AND FULLY AUTOMATIC LEARNING ALGORITHM

Abstract

Neuroscience-inspired models, such as predictive coding, have the potential to play an important role in the future of machine intelligence. However, they are not yet used in industrial applications, one limitation being their lack of efficiency. In this work, we address this by proposing incremental predictive coding (iPC), a variation of the original framework derived from the incremental expectation-maximization algorithm, in which every operation can be performed in parallel without external control. We show both theoretically and empirically that iPC is more efficient than the original algorithm of Rao and Ballard (1999), while maintaining performance comparable to backpropagation in image classification tasks. This work has general implications for computational neuroscience and machine learning, and specific applications in scenarios where automation and parallelization are important, such as distributed computing and implementations of deep learning models on analog and neuromorphic chips.

1. INTRODUCTION

In recent years, deep learning has reached and surpassed human-level performance in a multitude of tasks, such as game playing (Silver et al., 2017; 2016), image recognition (Krizhevsky et al., 2012; He et al., 2016), natural language processing (Chen et al., 2020), and image generation (Ramesh et al., 2022). These successes are achieved entirely using deep artificial neural networks trained via backpropagation (BP), a learning algorithm that is often criticized for its biological implausibilities (Grossberg, 1987; Crick, 1989; Abdelghani et al., 2008; Lillicrap et al., 2016; Roelfsema & Holtmaat, 2018; Whittington & Bogacz, 2019), such as lacking local plasticity and autonomy. In fact, backpropagation requires a global control signal to trigger computations, since gradients must be computed sequentially backwards through the computation graph. These properties are not only important for biological plausibility: parallelization, locality, and automation are key to building efficient models that can be trained end-to-end on non-von-Neumann machines, such as analog chips (Kendall et al., 2020). A learning algorithm with most of the above properties is predictive coding (PC). PC is an influential theory of information processing in the brain (Mumford, 1992; Friston, 2005), where learning happens by minimizing the prediction error of every neuron. PC can be shown to approximate backpropagation in layered networks (Whittington & Bogacz, 2017), as well as on any other model (Millidge et al., 2020), and can exactly replicate its weight update if some external control is added (Salvatori et al., 2022b). The differences from BP are also interesting: PC allows for much more flexible training and testing (Salvatori et al., 2022a), has a rich mathematical formulation (Friston, 2005; Millidge et al., 2022), and is an energy-based model (Bogacz, 2017).
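To make the mechanism concrete, the following is a minimal NumPy sketch of predictive coding on a single generative layer (an illustrative toy, not the implementation used in this paper): a latent state x1 produces a top-down prediction of a clamped data layer x0, the latent state is relaxed by gradient descent on the prediction-error energy, and the weights are then updated with a local, Hebbian-like rule. All variable names (x0, x1, W, mu1) and the network size are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Energy of a one-latent-layer generative PC model:
#   E = 0.5*||e0||^2 + 0.5*||e1||^2, with
#   e0 = x0 - W @ f(x1)   (prediction error at the clamped data layer)
#   e1 = x1 - mu1         (error of the latent state against a fixed prior)
f = np.tanh
df = lambda x: 1.0 - np.tanh(x) ** 2   # derivative of tanh

d0, d1 = 4, 3                          # toy layer sizes (assumed)
W = 0.1 * rng.standard_normal((d0, d1))
mu1 = np.zeros(d1)                     # prior mean for the latent layer
x0 = rng.standard_normal(d0)           # clamped data point
x1 = mu1.copy()                        # latent state, initialized at the prior

def energy(x1, W):
    e0 = x0 - W @ f(x1)
    e1 = x1 - mu1
    return 0.5 * (e0 @ e0 + e1 @ e1)

lr_x, lr_w = 0.1, 0.05
E_start = energy(x1, W)

# Inference phase: relax the latent state by descending the energy.
# Each update uses only locally available error signals.
for _ in range(50):
    e0 = x0 - W @ f(x1)
    e1 = x1 - mu1
    x1 += lr_x * (df(x1) * (W.T @ e0) - e1)

# Learning phase: local, Hebbian-like weight update (gradient of -E w.r.t. W).
W += lr_w * np.outer(x0 - W @ f(x1), f(x1))
E_end = energy(x1, W)
```

Note that every update depends only on the pre- and post-synaptic activities and the local error, which is what makes the scheme amenable to parallel, controller-free execution.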
This makes PC unique: it is the only model that jointly allows training on neuromorphic chips, implements influential models of cortical functioning in the brain, and can match the performance of backpropagation in different tasks. Its main drawback, however, is efficiency, as it is slower than BP. In this work, we address this problem by proposing a variation of PC that is significantly more efficient than the original formulation. Simply put, PC is based on the assumption that brains implement an internal generative model of the world, needed to predict incoming stimuli (or data) (Friston et al., 2006; Friston, 2010; Friston et al., 2016). When presented with a stimulus that differs from the prediction, learning happens by

