BRAINBERT: SELF-SUPERVISED REPRESENTATION LEARNING FOR INTRACRANIAL RECORDINGS

Abstract

We create a reusable Transformer, BrainBERT, for intracranial field potential recordings, bringing modern representation-learning approaches to neuroscience. Much as in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and with much less data, by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects with electrodes in new positions and to unrelated tasks, showing that the representations robustly disentangle the neural signal. Just as in NLP, where one can study language by investigating what a language model learns, this approach enables investigating the brain by studying what a model of the brain learns. As a first step along this path, we demonstrate a new analysis of the intrinsic dimensionality of the computations in different areas of the brain. To construct BrainBERT, we combine super-resolution spectrograms of neural data with a masking approach designed for generating contextual representations of audio. In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain the way language models unlocked language.

1. INTRODUCTION

Methods that analyze neural recordings face an inherent tradeoff between power and explainability. Linear decoders, by far the most popular, provide explainability: if something is decodable, it is computed and available in that area of the brain. The decoder itself is unlikely to be performing the task we want to decode, relying instead on the brain to do so. Unfortunately, many interesting tasks and features may not be linearly decodable from the brain, for reasons including a paucity of annotated training data, noise from nearby neural processes, and the inherent spatial and temporal resolution of the instrument. More powerful methods that perform non-linear transformations have lower explainability: there is a danger that the task is being performed not by the brain, but by the decoder itself. In the limit, one could conclude from a CNN-based decoder that object class is computed by the retina, yet it is well established that the retina does not contain explicit information about objects. Self-supervised representation learning provides a balance between these two extremes. We learn representations that are generally useful for representing neural recordings, without any knowledge of a task being performed, and then employ a linear decoder. The model we present here, BrainBERT, learns a complex non-linear transformation of neural data using a Transformer. Using BrainBERT, one can linearly decode neural recordings with much higher accuracy and from far fewer examples than from raw features. BrainBERT is pretrained once across a pool of subjects, and then provides off-the-shelf capabilities for analyzing new subjects with new electrode locations, even when data is scarce. Neuroscientific experiments tend to have little data compared to other machine learning settings, making this added sample efficiency critical.
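The probing setup described above, a frozen non-linear embedding followed by a trained linear decoder, can be sketched as follows. This is a minimal illustration on synthetic data: the `embed` function is a hypothetical stand-in for a pretrained encoder, not BrainBERT itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # frozen "encoder" weights (illustrative)

def embed(x):
    # Stand-in for a frozen pretrained encoder: a fixed non-linear
    # transform of the raw signal. Its parameters are never trained here.
    return np.tanh(x @ W)

# Synthetic "neural recordings": 200 trials of 64 samples, binary labels.
X = rng.standard_normal((200, 64))
y = (X[:, 0] > 0).astype(int)

# Only the linear probe is trained; if the label is decodable from the
# embeddings by this linear map, the information is present in them.
probe = LogisticRegression(max_iter=1000).fit(embed(X), y)
acc = probe.score(embed(X), y)
```

The key property is that all task-specific learning is confined to the linear layer, so above-chance accuracy reflects information in the representation rather than computation by the decoder.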
Other applications, such as brain-computer interfaces, can also benefit from shorter training regimes, as well as from BrainBERT's significant performance improvements. In addition, the embeddings of the neural data provide a new means by which to investigate the brain. BrainBERT provides contextualized neural embeddings, in the same way that masked language modeling provides contextual word embeddings. Such methods have proven themselves in areas like speech recognition, where a modest amount of speech, 200 to 400 hours, leads to models from which one can linearly decode the word being spoken. We use a comparable amount of unannotated neural recordings, 43.7 hours across all subjects (4,551 electrode-hours), to build similarly reusable and robust representations. To build contextualized embeddings, BrainBERT borrows from masked language modeling (Devlin et al., 2019) and masked audio modeling (Baevski et al., 2020; Liu et al., 2021). Given neural activity, as recorded by a stereo-electroencephalographic (SEEG) probe, we compute a spectrogram per electrode. We mask random parts of that spectrogram and train BrainBERT to produce embeddings from which the original can be reconstructed. But unlike speech audio, neural activity has fractal and scale-free characteristics (Lutzenberger et al., 1995; Freeman, 2005), meaning that similar patterns appear at different time scales and different frequencies, and identifying these patterns is often a challenge. To that end, we adapt modern neural signal processing techniques for producing super-resolution time-frequency representations of neurophysiological signals (Moca et al., 2021). Such techniques come with a variable trade-off in time-frequency resolution, which we account for in our adaptive masking strategy. Finally, the activity captured by intracranial electrodes is often sparse.
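The mask-and-reconstruct objective described above can be sketched as follows on a toy spectrogram. The mask extents are illustrative placeholders, not the paper's settings, and the zeroed reconstruction stands in for model output.

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 100))  # (frequency bins, time steps)

masked = spec.copy()
mask = np.zeros_like(spec, dtype=bool)

# Mask one contiguous band of time steps and one band of frequency bins,
# analogous to hiding patches of the spectrogram during pretraining.
t0 = int(rng.integers(0, 100 - 10))
f0 = int(rng.integers(0, 40 - 5))
mask[:, t0:t0 + 10] = True
mask[f0:f0 + 5, :] = True
masked[mask] = 0.0

# The model sees `masked` and must infer the hidden activity from context;
# the reconstruction loss is computed only at the masked positions.
recon = np.zeros_like(spec)  # hypothetical stand-in for model output
loss = np.mean((recon[mask] - spec[mask]) ** 2)
```

Restricting the loss to masked positions forces the encoder to use surrounding context rather than copy its input.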
To incentivize the model to better represent short bursts of packet activity, we use a content-aware loss that places more weight on non-zero spectrogram elements. Our contributions are:
1. the BrainBERT model, a reusable, off-the-shelf, subject-agnostic, and electrode-agnostic model that provides embeddings for intracranial recordings;
2. a demonstration that BrainBERT systematically improves the performance of linear decoders;
3. a demonstration that BrainBERT generalizes to previously unseen subjects with new electrode locations; and
4. a novel analysis of the intrinsic dimensionality of the computations performed by different parts of the brain, made possible by BrainBERT embeddings.
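The content-aware weighting can be sketched as below. The weight value and the L1 form are illustrative assumptions; the point is only that errors on non-zero elements count more.

```python
import numpy as np

def content_aware_l1(pred, target, nonzero_weight=5.0, eps=1e-6):
    """L1 reconstruction loss that up-weights non-zero spectrogram
    elements, so short, sparse bursts of activity are not washed out
    by the mostly near-zero background. The weight is a placeholder."""
    w = np.where(np.abs(target) > eps, nonzero_weight, 1.0)
    return float(np.mean(w * np.abs(pred - target)))
```

Under this loss, the same reconstruction error costs more when it falls on an active (non-zero) element than on silent background.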

2. METHOD

The core of BrainBERT is a stack of Transformer encoder layers (Vaswani et al., 2017) . In pretraining, BrainBERT receives an unannotated time-frequency representation of the neural signal as input. This input is randomly masked, and the model learns to reconstruct the missing portions.
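The backbone just described can be sketched in PyTorch as below. All sizes (model width, head count, depth, spectrogram shape) are illustrative placeholders, not the paper's hyperparameters, and the linear input/output projections are assumptions for the sketch.

```python
import torch
import torch.nn as nn

d_model, n_freq_bins = 128, 40  # illustrative sizes

# A stack of Transformer encoder layers over spectrogram time steps.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=6,
)
in_proj = nn.Linear(n_freq_bins, d_model)   # spectrogram frame -> model dim
out_proj = nn.Linear(d_model, n_freq_bins)  # embedding -> reconstructed frame

spec = torch.randn(2, 100, n_freq_bins)     # (batch, time steps, freq bins)
emb = encoder(in_proj(spec))                # contextual embeddings per step
recon = out_proj(emb)                       # used only for pretraining loss
```

After pretraining, the reconstruction head is discarded and the embeddings `emb` are what downstream linear decoders consume.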



Code: https://github.com/czlwang/BrainBERT



Figure 1: (a) Locations of intracranial electrodes (yellow dots) projected onto the surface of the brain across all subjects for each hemisphere. (b) Subjects watched movies while neural data was recorded (bottom, example electrode trace). (c) Neural recordings were converted to spectrograms, which are embedded with BrainBERT. The resulting embeddings are useful for many downstream tasks, like sample-efficient classification. BrainBERT can be used off-the-shelf, zero-shot, or, if data is available, by fine-tuning for each subject and/or task. (d) During pretraining, BrainBERT is optimized to produce embeddings that enable reconstruction of a masked spectrogram, for which it must learn to infer the masked neural activity from the surrounding context.

