HYPERREALISTIC NEURAL DECODING: RECONSTRUCTION OF FACE STIMULI FROM FMRI MEASUREMENTS VIA THE GAN LATENT SPACE

Abstract

We introduce a new framework for hyperrealistic reconstruction of perceived naturalistic stimuli from brain recordings. To this end, we embrace the use of generative adversarial networks (GANs) at the earliest step of our neural decoding pipeline by acquiring functional magnetic resonance imaging data as subjects perceived face images created by the generator network of a GAN. Subsequently, we used a decoding approach to predict the latent state of the GAN from brain data. Hence, latent representations for stimulus (re-)generation are obtained, leading to state-of-the-art image reconstructions. Altogether, we have developed a highly promising approach for decoding sensory perception from brain activity and systematically analyzing neural information processing in the human brain.

1. INTRODUCTION

In recent years, the field of neural decoding has been gaining more and more traction as advanced computational methods have become increasingly available for application to neural data. This is a welcome development in both neuroscience and neurotechnology, since reading neural information will not only help us understand and explain human brain function but also find applications in brain-computer interfaces and neuroprosthetics to help people with disabilities.

Figure 1: The mapping between sensory stimuli (left) and fMRI recordings (right). Neural encoding seeks to find a transformation from stimulus to the observed brain response via a latent representation (middle). Conversely, neural decoding seeks to find the information present in the observed brain responses by a mapping from brain activity back to the original stimulus.

Neural decoding can be conceptualized as the inverse problem of mapping brain responses back to sensory stimuli via a latent space (20). Such a mapping can be idealized as a composite function of linear and nonlinear transformations (Figure 1). The linear transformation models the mapping from brain responses to the latent space. The latent space should effectively capture the defining properties of the underlying neural representations. The nonlinear transformation models the mapping from the latent space to sensory stimuli. The systematic correspondences between latent representations of discriminative convnets and neural representations of sensory cortices are well established (23; 14; 2; 7; 8; 6). As such, exploiting these systematic correspondences in neural decoding of visual experience has pushed the state of the art forward (20). This includes linear reconstruction of perceived handwritten characters (15), neural decoding of perceived and imagined object categories (10), and reconstruction of natural images (17; 16) and faces (9; 21).
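The composite mapping described above can be written compactly. The symbols here are our own shorthand, not notation fixed by the text: a linear map W takes a brain response y to the latent space, and a nonlinear generator g maps latent features to a stimulus reconstruction:

```latex
\hat{x} \;=\; g(\mathbf{W}\,y),
\qquad \mathbf{W}:\ \text{brain responses} \to \text{latent space},
\qquad g:\ \text{latent space} \to \text{stimuli}
```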
Yet, there is still much room for improvement, since state-of-the-art results still fall short of providing photorealistic reconstructions. At the same time, generative adversarial networks (GANs) have emerged as perhaps the most powerful generative models to date (5; 11; 12; 1) and can potentially bring neural decoding to the next level. However, because the true latent representations of GANs are not readily available for preexisting neural data (unlike those of the aforementioned discriminative convnets), the adoption of GANs in neural decoding has been relatively slow (see (16) for an earlier attempt with GANs and (21) for a related attempt with VAEs). In this study, we introduce a powerful yet simple framework for HYperrealistic reconstruction of PERception (HYPER), which elegantly integrates GANs in neural decoding by combining the following components (Figure 2):

(i) GAN. We used a pretrained GAN, which allows for the generation of meaningful data samples from randomly sampled latent vectors. This model is used both for generating the stimulus set and for the ultimate reconstruction of perceived stimuli. In the current study, we used the progressive growing of GANs (PGGAN) model (11), which generates photorealistic faces that resemble celebrities.

(ii) fMRI. We made use of neural data with a known latent representation, obtained by presenting the stimulus set produced by the above-mentioned generative model and recording the brain responses to these stimuli. In the current study, we collected fMRI recordings in response to the images produced by the PGGAN and split the resulting dataset into separate training and test sets.

(iii) Decoding model. We used a decoding model that maps the neural data to the latent space of the generative model. Using this model, we then obtained latent vectors for the neural responses corresponding to the stimulus images in the test set.

Feeding these latent vectors back into the generative model resulted in the hyperrealistic reconstructions of perception.
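The three components can be sketched end to end on synthetic data. All dimensions, the noise level, and the ridge penalty below are illustrative assumptions, and the pretrained PGGAN is replaced by a fixed nonlinear map so the example is self-contained:

```python
# Toy sketch of the HYPER pipeline: known latents -> simulated fMRI
# responses -> linear (ridge) decoder -> regenerated stimuli.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_voxels, n_train, n_test = 16, 200, 400, 20

# (i) GAN: stand-in generator mapping latent vectors to "images".
G = rng.normal(size=(latent_dim, 64))
def generator(z):
    return np.tanh(z @ G)

# (ii) fMRI: simulate brain responses to the generated stimuli as a
# noisy linear function of the (known) latent vectors.
Z_train = rng.normal(size=(n_train, latent_dim))
Z_test = rng.normal(size=(n_test, latent_dim))
B = rng.normal(size=(latent_dim, n_voxels))      # latent -> voxel map
Y_train = Z_train @ B + 0.1 * rng.normal(size=(n_train, n_voxels))
Y_test = Z_test @ B + 0.1 * rng.normal(size=(n_test, n_voxels))

# (iii) Decoding model: ridge regression from responses to latents.
lam = 1.0
W = np.linalg.solve(Y_train.T @ Y_train + lam * np.eye(n_voxels),
                    Y_train.T @ Z_train)

# Predict latents for held-out responses and regenerate the stimuli.
Z_hat = Y_test @ W
reconstructions = generator(Z_hat)
print(reconstructions.shape)  # (20, 64)
```

The closed-form ridge solve stands in for whatever regularized linear decoder is fitted in practice; the key property of the framework is that the regression targets (the latents) are exact, not themselves estimated.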

2.1. TRAINING ON SYNTHETIC IMAGES WITH KNOWN LATENT FEATURES

State-of-the-art face reconstruction techniques use deep neural networks to encode vectors of latent features for the images presented during the fMRI experiment (9; 21). These feature vectors have been shown to have a linear relationship with measured brain responses. However, this approach entails information loss, since the target images must be reconstructed from the linear prediction using an approximate inversion network, such as a variational decoder, which imposes a severe bottleneck on the maximum achievable reconstruction quality.
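This bottleneck can be illustrated with a toy calculation. The linear feature map and the pseudoinverse "inversion network" below are illustrative stand-ins, not the models used in (9; 21): even when the feature vector is decoded perfectly, inverting a many-to-one feature extractor cannot recover the original image.

```python
# A feature map that discards information cannot be inverted exactly,
# so reconstruction error remains even with perfect (noise-free)
# feature decoding. All maps and dimensions are assumptions.
import numpy as np

rng = np.random.default_rng(1)
d_img, d_feat = 32, 8                 # features are lower-dimensional

A = rng.normal(size=(d_img, d_feat))  # stand-in convnet feature map
def features(x):
    return x @ A

A_pinv = np.linalg.pinv(A)            # stand-in "inversion network"
def approx_invert(f):
    return f @ A_pinv

x = rng.normal(size=(1, d_img))       # the "presented image"
x_rec = approx_invert(features(x))    # best-case reconstruction
err = np.linalg.norm(x - x_rec)       # substantially greater than zero
```

By contrast, training on GAN-generated stimuli makes the true latent vectors known, so no approximate inversion network is needed: the generator itself maps predicted latents back to images.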



Figure 2: Schematic illustration of the HYPER framework. Face images are generated from randomly sampled latent features z ∈ Z by a face-generating GAN, as denoted by the dotted box. These faces are then presented as visual stimuli during brain scanning. Next, a linear decoding model learns the mapping from brain responses to the original latent representation, after which it predicts latent features ẑ for unseen brain responses. Ultimately, these predicted latent features are fed to the GAN for image reconstruction.

