PREDICTIVE CODING WITH APPROXIMATE LAPLACE MONTE CARLO

Abstract

Predictive coding (PC) accounts of perception now form one of the dominant computational theories of the brain, where they prescribe a general algorithm for inference and learning over hierarchical Gaussian latent variable generative models. Despite this, they have enjoyed little export to the broader field of machine learning, where comparable generative modelling techniques have flourished. In part, this has been due to the poor performance of models trained with PC when evaluated by both sample quality and marginal likelihood. By adopting the perspective of PC as a variational Bayes algorithm under the Laplace approximation, we identify the source of these deficits as the exclusion of an associated Hessian term from the standard PC objective function. To remedy this, we make three primary contributions: we begin by proposing a simple Monte Carlo-estimated evidence lower bound which relies on sampling from the Hessian-parameterised variational posterior. We then derive a novel block-diagonal approximation to the full Hessian matrix that has lower memory requirements and favourable mathematical properties. Lastly, we present an algorithm that combines our method with standard PC to reduce memory complexity further. We evaluate models trained with our approach against the standard PC framework on image benchmark datasets. Our approach produces higher log-likelihoods and qualitatively better samples that more closely capture the diversity of the data-generating distribution.

1. INTRODUCTION

In the last two decades, conceptions of the brain as an organ actively engaged in Bayesian inference have become exceedingly prominent in cognitive neuroscience (Pouget et al., 2013; Clark, 2013; Kanai et al., 2015). Under this paradigm, the brain adopts a probabilistic generative model of the world, with perception corresponding to inference over latent states, and learning to inference over its parameters. Predictive coding (PC) (Rao and Ballard, 1999; Friston, 2018), arguably the most notable instantiation of this perspective, describes a method for parameter learning in hierarchical latent Gaussian generative models with arbitrarily complex and highly non-linear parameterisations governing their conditional distributions. This computational scheme remains one of the foremost and most popular models of cortical function (Mumford, 1992; Hosoya et al., 2005; Hohwy et al., 2008; Bastos et al., 2012; Shipp, 2016; Feldman and Friston, 2010; Fountas et al., 2022), which underscores the importance of evaluating it as a technique for training deep generative models of the kind presupposed in the brain. From a machine learning perspective, PC bears a close mathematical relationship to Bayesian techniques such as the variational auto-encoder (VAE) (Kingma and Welling, 2014), which likewise relies on optimising an evidence lower bound (ELBO); a key advantage over VAEs ostensibly lies in PC's use of non-amortised inference (Cremer et al., 2018). Furthermore, PC benefits from design principles inherited from its origins as a theory of cognitive function, namely asynchronous and local error computation (Whittington and Bogacz, 2019), suggesting a far greater amenability to implementation on energy-efficient neuromorphic hardware.
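To make the inference scheme concrete, the following is a minimal sketch of PC-style perception in a two-layer hierarchical Gaussian model, in the spirit of the tutorial treatment in Bogacz (2017). All names, shapes, and hyperparameters here are illustrative assumptions rather than the paper's implementation: the latent state is updated by gradient descent on a quadratic prediction-error energy, using only local, layer-wise error signals.

```python
import numpy as np

# Illustrative two-layer Gaussian generative model (not the paper's code):
#   z ~ N(mu_prior, sigma_z^2 I),   x ~ N(W tanh(z), sigma_x^2 I)
rng = np.random.default_rng(0)
dim_x, dim_z = 8, 4
W = rng.normal(size=(dim_x, dim_z))   # generative weights, held fixed here
mu_prior = np.zeros(dim_z)
sigma_x, sigma_z = 1.0, 1.0

x = rng.normal(size=dim_x)            # a single observation

def energy(z):
    """PC objective: negative log joint density up to additive constants."""
    err_x = x - W @ np.tanh(z)
    err_z = z - mu_prior
    return (err_x @ err_x) / (2 * sigma_x**2) + (err_z @ err_z) / (2 * sigma_z**2)

# Perception: settle the latent state by gradient descent on the energy,
# driven by precision-weighted prediction errors at each layer.
z = mu_prior.copy()
f0 = energy(z)
lr = 0.02
for _ in range(500):
    eps_x = (x - W @ np.tanh(z)) / sigma_x**2   # bottom-up sensory error
    eps_z = (z - mu_prior) / sigma_z**2         # top-down prior error
    # d tanh(z)/dz = 1 - tanh(z)^2, applied elementwise
    z += lr * ((1 - np.tanh(z)**2) * (W.T @ eps_x) - eps_z)

print(f0, energy(z))  # the energy decreases as z settles on the posterior mode
```

Under the Laplace approximation, the mode that z settles on serves as the mean of a Gaussian variational posterior, which is what links this iterative scheme to variational Bayes.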
In this work, we show that generative models trained with PC (of the kind described in Bogacz, 2017; Tschantz et al., 2022; Millidge et al., 2022) have poor log marginal likelihoods when evaluated on common image datasets, and poor sample quality, despite producing good reconstructions. To diagnose these issues, we begin by adopting the perspective of PC as a variational Bayes algorithm under the Laplace approximation (Friston, 2003; 2005; 2008). Under this approximation, quadratic

