BRAIN SIGNAL GENERATION AND DATA AUGMENTATION WITH A SINGLE-STEP DIFFUSION PROBABILISTIC MODEL

Abstract

Brain-computer interfaces based on deep learning rely on large amounts of high-quality data, and finding publicly available brain signal datasets that meet all requirements is a challenge. Brain signals synthesized with generative models may provide a solution to this problem. Our work builds on diffusion probabilistic models (DPMs) and aims to generate brain signals with the properties needed to develop deep-learning-based classification models. We show that our DPM can generate high-quality event-related potentials (ERPs) and motor imagery (MI) signals. Furthermore, through progressive distillation of the model, subject-specific data can be produced in a one-step reverse process. We augment publicly available datasets and demonstrate the impact of the generated signals on a deep learning classification model. DPMs are versatile models, and this work shows that brain signal processing is one of the many tasks in which they can be useful.

1. INTRODUCTION

Electroencephalography (EEG) is undoubtedly one of the most popular brain mapping technologies and is widely used in research and clinical diagnosis (de Aguiar Neto & Rosa (2019), van Mierlo et al. (2020), Wang et al. (2020)). EEG records the neural activity of the brain in a non-invasive manner (Biasiucci et al. (2019)). It is less complex and cheaper than other brain imaging technologies and offers one of the best temporal resolutions. However, its spatial resolution is quite poor, as it depends heavily on the number of electrodes used for signal recording and on the non-invasiveness of the technique (Craik et al. (2019b)). Brain-computer interfaces (BCIs) connect the brain to external processing devices, making it possible to perform tasks using only brain signals. BCIs can help people with limited movement and communication abilities in everyday life (Pandarinath et al. (2017)). They are also applied in many other fields, from healthcare (Galán et al. (2008), Vilela & Hochberg (2020)) to entertainment (Finke et al. (2009)). BCIs are often based on EEG because the technology can measure signals with only a few milliseconds of difference, at a relatively low cost and with more comfort. The measurements are then processed by a decoder unit in the BCI that turns the recorded temporal and frequency patterns into actions (Lotte et al. (2018)).

In recent years, deep learning (DL) algorithms have become increasingly common in EEG signal processing (Roy et al. (2019), Craik et al. (2019a), Kotowski et al. (2020)). DL models can decode brain signals with high accuracy; however, developing them requires a large amount of high-quality data. The size and quality of publicly available datasets are limited, and they are often insufficient and imbalanced. Recording a new dataset can be highly resource-consuming and requires professionals to check the measurements. Another option for augmenting datasets is data synthesis (Lashgari et al. (2020)). Score-based models (Tashiro et al. (2021), Song et al. (2021)), diffusion probabilistic models (DPMs) (Ho et al. (2020), Luo & Hu (2021)) and generative adversarial networks (GANs) (Liu et al. (2021), Chan et al. (2021)) hold the state of the art in deep-learning-based generative modelling. Recent advances show the performance and effectiveness of DPMs over GANs in both image (Dhariwal & Nichol (2021)) and audio generation (Kong et al. (2021)). There are a handful of works on brain signal generation with GANs (Xu et al. (2022), Hartmann et al. (2018), Fahimi et al. (2019), Panwar et al. (2020)). Diffusion models need many iterations during sampling to synthesize data, making them significantly slower than GANs. Recent studies (Luhman & Luhman (2021), Kong & Ping (2021)) presented multiple ways to speed up inference, from which we applied progressive distillation (Salimans & Ho (2022)). To the best of our knowledge, there are no works examining the capabilities of DPMs or score-based models in multi-channel EEG signal generation tasks.

The structure of this work is as follows: in Section 2 we present the background on the DPM framework used in this paper, followed by a brief description of the progressive distillation process in Section 3. Our EEGWave architecture is presented in Section 4. The experiments, together with the datasets and procedures used, are described in Section 5. Finally, we conclude our work in Section 6.

2. CONTINUOUS-TIME DIFFUSION MODELS

The distribution of the training dataset is given as p(x). Let x ∈ R^{E×L}, where E is the number of electrodes (EEG channels) and L is the length of the recorded sequence. In the continuous-time diffusion framework (Kingma et al. (2021)), the forward and reverse diffusion processes operate on latent variables denoted by z_t. For every time step t ∈ [0, 1], the latent variables have the same shape as the training data samples (z_t ∈ R^{E×L}). The forward diffusion process, a Gaussian process in continuous time, is given as

q(z_t | x) = N(z_t; α_t x, σ_t² I),   (1)

where α_t and σ_t² are smooth, differentiable, positive scalar-valued functions. With α_t and σ_t², the log signal-to-noise ratio is defined as λ_t = log(α_t² / σ_t²), which decreases strictly monotonically as t → 1. For any 0 ≤ s ≤ t ≤ 1, the following Gaussian conditional distribution can be given:

q(z_t | z_s) = N(z_t; (α_t / α_s) z_s, σ_{t|s}² I),   σ_{t|s}² = (1 − e^{λ_t − λ_s}) σ_t².   (2)

The reverse process is based on the posterior distribution

q(z_s | z_t, x) = N(z_s; μ̃_{s|t}(z_t, x), σ̃_{s|t}² I),   (3)

where s ≤ t and

μ̃_{s|t}(z_t, x) = e^{λ_t − λ_s} (α_s / α_t) z_t + (1 − e^{λ_t − λ_s}) α_s x,   σ̃_{s|t}² = (1 − e^{λ_t − λ_s}) σ_s².   (4)

In this framework, x in the reverse process is predicted by a neural network x̂_θ(z_t, λ_t) with parameter set θ. Data inference is done by sampling a latent white-noise variable z_1 at t = 1, choosing a noise-controlling factor γ, and iteratively applying the following until t = 0 (Salimans & Ho (2022)):

z_s = μ̃_{s|t}(z_t, x̂_θ(z_t, λ_t)) + ((σ̃_{s|t}²)^{1−γ} (σ_{t|s}²)^{γ})^{1/2} ε,   ε ∼ N(0, I).   (5)

During training, the model aims to maximize the variational lower bound (ELBO) on the log-likelihood of the data. With a re-parameterization, the weighted ELBO can be given as the weighted mean squared error objective

min_θ L(θ) = E_{ε,t}[ ω(λ_t) ∥x − x̂_θ(z_t, λ_t)∥₂² ],   (6)

where the weighting ω(λ_t) can be chosen freely; we use the truncated SNR weighting ω(λ_t) = max(e^{λ_t}, 1) (Salimans & Ho (2022)), as we found this approach the most efficient one.
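As a concrete illustration, the forward diffusion process q(z_t | x) of Section 2 can be sketched in a few lines of NumPy. The cosine schedule α_t = cos(πt/2), σ_t = sin(πt/2) used here is one common variance-preserving choice and is an assumption for this sketch, not necessarily the schedule used in this work; the channel and length dimensions are likewise illustrative.

```python
import numpy as np

def cosine_schedule(t):
    """Variance-preserving schedule (assumed here): alpha_t^2 + sigma_t^2 = 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def forward_diffuse(x, t, rng):
    """Sample z_t ~ q(z_t | x) = N(alpha_t * x, sigma_t^2 * I)."""
    alpha, sigma = cosine_schedule(t)
    eps = rng.standard_normal(x.shape)
    return alpha * x + sigma * eps

def log_snr(t):
    """lambda_t = log(alpha_t^2 / sigma_t^2); strictly decreasing as t -> 1."""
    alpha, sigma = cosine_schedule(t)
    return np.log(alpha**2) - np.log(sigma**2)

rng = np.random.default_rng(0)
x = rng.standard_normal((22, 512))  # E = 22 channels, L = 512 samples (illustrative)
z = forward_diffuse(x, t=0.5, rng=rng)
```

Note that at t = 0.5 this schedule gives α_t = σ_t, so the log signal-to-noise ratio crosses zero there.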

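One step of the ancestral sampling rule z_s = μ̃_{s|t} + noise described in Section 2 can likewise be sketched. The denoising network x̂_θ is stubbed out as a hypothetical `x_hat` argument, and the cosine schedule is again an assumption; γ interpolates between the two posterior variances.

```python
import numpy as np

def cosine_schedule(t):
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def ancestral_step(z_t, x_hat, t, s, gamma, rng):
    """One reverse step z_t -> z_s (s < t).

    x_hat stands in for the network prediction x_hat_theta(z_t, lambda_t).
    """
    a_t, sig_t = cosine_schedule(t)
    a_s, sig_s = cosine_schedule(s)
    lam_t = np.log(a_t**2 / sig_t**2)
    lam_s = np.log(a_s**2 / sig_s**2)
    r = np.exp(lam_t - lam_s)                 # e^{lambda_t - lambda_s} < 1
    mu = r * (a_s / a_t) * z_t + (1.0 - r) * a_s * x_hat
    var_ts = (1.0 - r) * sig_t**2             # sigma^2_{t|s}
    var_st = (1.0 - r) * sig_s**2             # sigma~^2_{s|t}
    std = np.sqrt(var_st**(1.0 - gamma) * var_ts**gamma)
    return mu + std * rng.standard_normal(z_t.shape)

rng = np.random.default_rng(0)
z_t = rng.standard_normal((22, 512))
z_s = ancestral_step(z_t, x_hat=np.zeros((22, 512)), t=0.8, s=0.6, gamma=0.3, rng=rng)
```

A full sampler would start from white noise z_1 and apply this step over a decreasing grid of time points down to t = 0.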

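Finally, the weighted mean squared error objective (6) of Section 2 reduces to a few lines once a weighting is fixed. This sketch assumes the truncated SNR weighting ω(λ_t) = max(e^{λ_t}, 1) and the cosine schedule from above; `x_hat` again stands in for the network output.

```python
import numpy as np

def cosine_schedule(t):
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def weighted_mse_loss(x, x_hat, t):
    """omega(lambda_t) * ||x - x_hat||^2 with omega = max(e^{lambda_t}, 1)."""
    alpha, sigma = cosine_schedule(t)
    snr = alpha**2 / sigma**2          # e^{lambda_t}
    omega = np.maximum(snr, 1.0)       # truncated SNR weighting
    return omega * np.sum((x - x_hat)**2)

x = np.zeros((4, 8))
x_hat = np.ones((4, 8))
loss_lo_noise = weighted_mse_loss(x, x_hat, t=0.1)  # high SNR: weight grows
loss_hi_noise = weighted_mse_loss(x, x_hat, t=0.9)  # low SNR: weight clipped at 1
```

Truncating the weight at 1 keeps errors at very noisy time steps from being down-weighted to zero, which is why early steps still contribute to training.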