GROUP-LEVEL BRAIN DECODING WITH DEEP LEARNING

Abstract

Decoding experimental variables from brain imaging data is gaining popularity, with applications in brain-computer interfaces and the study of neural representations. Decoding is typically subject-specific and does not generalise well over subjects. Here, we propose a method that uses subject embedding, analogous to word embedding in Natural Language Processing, to learn and exploit the structure of between-subject variability as part of a decoding model, our adaptation of the WaveNet architecture for classification. We apply this to magnetoencephalography data, in which 15 subjects viewed 118 different images, with 30 examples per image, and classify images using the entire 1s window following image presentation. We show that the combination of deep learning and subject embedding is crucial for closing the performance gap between subject- and group-level decoding models. Importantly, group models outperform subject models on low-accuracy subjects (but impair high-accuracy subjects) and can be helpful for initialising subject models. The potential of such group modelling is even higher with bigger datasets. To better enable physiological interpretation at the group level, we demonstrate the use of permutation feature importance, developing insights into the spatio-temporal and spectral information encoded in the models. All code is available on GitHub.

1. INTRODUCTION

In recent years, decoding has gained popularity in neuroscience (Kay et al., 2008), specifically decoding external variables (e.g. stimulus category) from internal states (i.e. brain activity). Such analyses can be useful for brain-computer interface (BCI) applications (Willett et al., 2021) or for gaining neuroscientific insights (Guggenmos et al., 2018; Kay et al., 2008). Applying deep learning methods to such data is also beneficial for the machine learning community: these small, noisy, high-dimensional datasets test the limits of popular architectures on real data and motivate research into new methods (Zubarev et al., 2019; Kostas et al., 2021). Applications of decoding to brain recordings typically fit separate (often linear) models per dataset, per subject (Guggenmos et al., 2018; Dash et al., 2020b). This has the benefit that the decoding is tuned to the dataset/subject, but the drawback that it cannot leverage knowledge that could be transferred across datasets/subjects. Such transfer is especially desirable in neuroimaging, because gathering more data is expensive and often impossible (e.g. in clinical populations). Further practical drawbacks of subject-specific (subject-level) models include increased computational load, a higher chance of overfitting, and the inability to adapt to new subjects. We aim to leverage data from multiple subjects and train a shared model that can generalise across subjects (group-level). A conceptual visualisation of subject-level (SL) and group-level (GL) models is given in Figure 1.

Magnetoencephalography (MEG) measures the magnetic fields induced by electrical activity in the brain, and it is one of the main noninvasive brain recording methodologies, next to electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI). Due to its high temporal resolution and relatively good spatial resolution, MEG is an excellent method for studying the fast dynamics of brain activity.
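The contrast between subject-level and naive group-level modelling can be made concrete with a short sketch in plain Python (a toy illustration under stated assumptions, not this paper's data or architecture: the nearest-class-mean classifier, the dimensions, and the simulated subject offsets are all hypothetical stand-ins):

```python
import random
from statistics import mean

random.seed(0)
N_SUBJECTS, TRIALS_PER_CLASS, N_FEAT = 3, 20, 5

def make_trials(subject_shift):
    """Toy trials for one subject: the two classes differ in their mean,
    and each subject adds its own offset (between-subject variability)."""
    trials = []
    for label in (0, 1):
        for _ in range(TRIALS_PER_CLASS):
            x = [random.gauss(2.0 * label + subject_shift, 1.0)
                 for _ in range(N_FEAT)]
            trials.append((x, label))
    return trials

data = {s: make_trials(subject_shift=3.0 * s) for s in range(N_SUBJECTS)}

def fit_prototypes(trials):
    """Nearest-class-mean classifier: store one mean vector per class."""
    return {label: [mean(col)
                    for col in zip(*[x for x, l in trials if l == label])]
            for label in (0, 1)}

def accuracy(model, trials):
    dist = lambda x, p: sum((a - b) ** 2 for a, b in zip(x, p))
    hits = sum(min(model, key=lambda lb: dist(x, model[lb])) == l
               for x, l in trials)
    return hits / len(trials)

# Subject-level (SL): one model per subject, evaluated on that subject's
# trials (training trials are reused for evaluation here, for brevity).
sl_acc = mean(accuracy(fit_prototypes(data[s]), data[s])
              for s in range(N_SUBJECTS))

# Naive group-level (GL): one shared model on the pooled trials of all
# subjects, ignoring which subject each trial came from.
pooled = [t for s in range(N_SUBJECTS) for t in data[s]]
gl_acc = accuracy(fit_prototypes(pooled), pooled)

# The subject offsets blur the pooled class means, so the naive GL model
# underperforms the per-subject SL models on this toy data.
print(f"SL accuracy: {sl_acc:.2f}, naive GL accuracy: {gl_acc:.2f}")
```

The point of the sketch is only the failure mode: pooling trials while ignoring subject identity mixes between-subject variability into the class structure, which is exactly what the naive group model in Figure 1b does.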
MEG is highly suitable for decoding analyses (Du et al., 2019), which are mostly done using SL models. This is because between-subject variability limits the performance of a single model shared across subjects unless the structure of that variability is captured (Olivetti et al., 2014; Li et al., 2021). Such an approach, which we call naive group modelling, effectively pretends that all data come from the same subject (see Figure 1b). Between-subject variability has multiple sources, such as different anatomical structures, different head positions in the scanner, and differences in signal-to-noise ratio (Saha & Baumert, 2020). To overcome this, we propose a general architecture capable of jointly decoding multiple subjects with the help of subject embeddings (Figure 2). The scope of this paper is full-epoch decoding; comparisons with the sliding-window decoding approaches often used in neuroscience are left for future work. To clarify how we aim to improve on SL models, we next describe the two main approaches to evaluating decoding models, which have different underlying assumptions and goals. One approach is to construct separate train and test splits for each subject, made up of different, non-overlapping trials. This can be called within-subject splitting evaluation. SL models are by definition evaluated this way, and it is a very common setup in the neuroscience literature (Guggenmos et al., 2018; Cooney et al., 2019b; Cichy & Pantazis, 2017; Dash et al., 2020b; a; Nath et al., 2020). In this work, our main aim is to improve over SL models in the context of within-subject splitting evaluation and to improve the prediction of left-out trials, by using a single group decoding model that generalises across subjects. We call this GL method across-subject decoding.
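The mechanism behind the proposed subject embeddings can be sketched as follows (a minimal sketch in plain Python; all names, dimensions, and the simple linear read-out are illustrative assumptions, whereas the actual model in this paper is a WaveNet-style deep classifier): each subject ID indexes a learned embedding vector that is concatenated with the trial features, so one shared set of weights can condition its computation on the subject.

```python
import random

random.seed(0)
N_SUBJECTS, EMB_DIM, N_FEATURES, N_CLASSES = 15, 4, 8, 3

# One learnable embedding vector per subject (randomly initialised here;
# in practice these are trained jointly with the decoder weights).
subject_embeddings = [[random.gauss(0.0, 0.1) for _ in range(EMB_DIM)]
                      for _ in range(N_SUBJECTS)]

# A single shared linear read-out over [trial features ++ subject embedding].
weights = [[random.gauss(0.0, 0.1) for _ in range(N_FEATURES + EMB_DIM)]
           for _ in range(N_CLASSES)]

def class_scores(trial_features, subject_id):
    """Score each class for one trial, conditioned on the subject's embedding."""
    x = trial_features + subject_embeddings[subject_id]  # concatenation
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

trial = [random.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
scores_a = class_scores(trial, subject_id=0)
scores_b = class_scores(trial, subject_id=1)

# The same trial is scored differently for different subjects: the shared
# model can account for between-subject variability instead of pretending
# all data comes from one subject.
assert scores_a != scores_b
```

In the actual architecture the embedding feeds a deep non-linear network rather than a linear read-out, but the mechanism is the same: a shared model is conditioned on a learned per-subject vector, analogous to how word embeddings condition language models on token identity.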
We are motivated by the fact that GL models that perform well in this manner can be useful for gaining neuroscientific insights that are relevant at the group level, as we will show in Sections 4.4 and 4.5. The other prominent approach to evaluating group models, leave-one-subject-out (LOSO) analysis, is presented in Section 4.3. In this scenario, GL models are trained on data from multiple subjects and tested on a new, unseen subject (Zubarev et al., 2019), which can be especially useful in zero-shot BCI applications. Although we find no improvement from our embedding-aided group model in this setting, we expect this may change with larger datasets containing many more subjects. Our aim is to improve across-subject decoding of MEG data by using a group model that generalises across subjects. To be clear, this objective and the datasets we use are not related to any kind of direct BCI application. We make the following contributions using a MEG dataset with a visual task (Cichy et al., 2016):

1. A GL model with subject embeddings is introduced, substantially improving over naive group modelling.
2. Insight is provided into how non-linearity and subject embeddings help group modelling.
3. Neuroscientific insights are gained from the deep learning-based decoding model.
4. Analysis of model weights reveals how meaningful spatio-temporal and spectral information is encoded.

Figure 1: Comparison of subject-level (SL) and naive group-level (GL) modelling. (a) A separate model is trained on the trials (examples) of each subject. (b) A single, shared model is trained on the trials of all subjects without capturing between-subject variability. Each trial is C x T (channels x timesteps) dimensional. Each of the s subjects has t trials.

2. RELATED WORK

Decoding can be applied to most tasks/modalities, such as images (Cichy et al., 2016), phonemes (Mugler et al., 2014), words (Cooney et al., 2019b; Hultén et al., 2021), sentences (Dash et al., 2020b), and motor movements such as imagined handwriting (Willett et al., 2021), jaw movements (Dash et al., 2020a), or finger movements (Elango et al., 2017). Here, we used image categorisation because it is a widely studied decoding task and we had access to a dataset that is relatively large for the field of neuroimaging. Our results should readily generalise to other decoding modalities. Chaibub Neto

