REPRESENTATION LEARNING FOR IMPROVED INTERPRETABILITY AND CLASSIFICATION ACCURACY OF CLINICAL FACTORS FROM EEG

Abstract

Despite extensive standardization, diagnostic interviews for mental health disorders encompass substantial subjective judgment. Previous studies have demonstrated that EEG-based neural measures can function as reliable objective correlates of depression, or even predictors of depression and its course. However, their clinical utility has not been fully realized because of 1) the lack of automated ways to deal with the inherent noise associated with EEG data at scale, and 2) the lack of knowledge of which aspects of the EEG signal may be markers of a clinical disorder. Here we adapt an unsupervised pipeline from the recent deep representation learning literature to address these problems by 1) learning a disentangled representation using β-VAE to denoise the signal, and 2) extracting interpretable features associated with a sparse set of clinical labels using a Symbol-Concept Association Network (SCAN). We demonstrate that our method is able to outperform the canonical baseline classification method on a number of factors, including participant age and depression diagnosis. Furthermore, our method recovers a representation that can be used to automatically extract denoised Event Related Potentials (ERPs) from novel, single EEG trajectories, and supports fast supervised re-mapping to various clinical labels, allowing clinicians to re-use a single EEG representation regardless of updates to the standardized diagnostic system. Finally, single factors of the learned disentangled representations often correspond to meaningful markers of clinical factors, as automatically detected by SCAN, allowing for human interpretability and post-hoc expert analysis of the recommendations made by the model.

1. INTRODUCTION

Mental health disorders make up one of the main causes of the overall disease burden worldwide (Vos et al., 2013), with depression (e.g., Major Depressive Disorder, MDD) believed to be the second leading cause of disability (Lozano et al., 2013; Whiteford et al., 2013), and around 17% of the population experiencing its symptoms at some point throughout their lifetime (McManus et al., 2016; 2009; Kessler et al., 1993; Lim et al., 2018). At the same time, diagnosing mental health disorders has many well-identified limitations (Insel et al., 2010). Despite the existence of diagnostic manuals like Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders (SCID) (DSM-V, 2013), diagnostic consistency between expert psychiatrists and psychologists with decades of professional training can be low, resulting in different diagnoses in upwards of 30% of the cases (Cohen's Kappa = 0.66) (Lobbestael et al., 2011). Even if higher inter-rater reliability were achieved, many psychological disorders do not have a fixed symptom profile, with depression alone having many hundreds of possible symptom combinations (Fried & Nesse, 2015). This means that any two people with the same SCID diagnosis can exhibit entirely different symptom expressions. This is a core challenge for developing an objective, symptom-driven diagnostic tool in this domain.
Electroencephalography (EEG) is a measurement of post-synaptic electrical potentials that can be taken non-invasively at the scalp. EEG signals can function as important biomarkers of clinical disorders (Hajcak et al., 2019), but they are difficult to clean and interpret at scale. For example, components of the EEG signal can often significantly overlap or interfere with each other. Furthermore, nearby electronics, line noise, hardware quality, signal drift and other variations in the electrode-scalp connection can all distort the recorded EEG signal. Hence, the extraction of EEG data of sufficient quality is usually a laborious, semi-automated process executed by lab technicians with extensive training.
A typical EEG analysis pipeline consists of collecting EEG recordings evoked from a large number of stimulus presentations (trials) in order to have sufficient data to average out the noise. Independent Components Analysis (ICA) is often used to visually identify and remove the component that corresponds to eye blinks (Delorme & Makeig, 2004; Makeig et al., 2004; Jung et al., 2000) (although see Weber et al. (2020) and Nolan et al. (2010) for examples of fully automated artifact removal pipelines). This can be followed by a trial rejection stage where anomalous trials are identified and removed from the EEG data scroll, sometimes also through visual examination. The cleaned up EEG recordings from a large number of trials are then averaged to produce an Event Related Potential (ERP) (Luck, 2012). This allows a clinician to extract specific ERP components relevant to the clinical factor of interest, average out the event-locked activity within them, and then either perform a statistical group comparison, or-in



Figure 1: Pipeline schematic. Participants are presented with images from the International Affective Picture System (IAPS). EEG trajectories recorded from the same participant over multiple trials are averaged to create ERPs. Each EEG sample consists of 256 time samples (-248 to 772 ms, where 0 is stimulus onset time) of stimulus-locked activity recorded from three sites as participants view either neutral (red) or positive (blue) IAPS images. ERP responses to positive and negative images are concatenated and normalised between [0, 1] across all channels simultaneously. The resulting "images" are used to train β-VAE or AE. A well disentangled pre-trained β-VAE model is used to train SCAN. Each SCAN training example consists of a 5-hot binary classification label y presented to the SCAN encoder, and its corresponding EEG "image" x presented to the β-VAE encoder. β-VAE weights are fixed during SCAN training. To obtain classification, an EEG "image" is presented to the β-VAE encoder, then the inferred z_x means are fed through the SCAN decoder, where a per-class softmax is applied to obtain the predicted label (red pathway). To analyse SCAN classification decisions, a 1-hot binary vector is fed into SCAN encoder. Samples from the inferred distribution z_y are then fed through the β-VAE decoder to visualise the corresponding ERP reconstructions (purple pathway).
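The preprocessing described in the Figure 1 caption (averaging trials into ERPs, concatenating the two stimulus conditions, and normalising to [0, 1] across all channels at once) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function name `build_erp_image`, the array layout (trials, channels, time), the choice to concatenate along the time axis, and the toy signal are all assumptions made for the example.

```python
import numpy as np


def build_erp_image(trials_a, trials_b):
    """Turn single-trial EEG from two stimulus conditions into one normalised "image".

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_times).
    Returns an array of shape (n_channels, 2 * n_times) with values in [0, 1].
    """
    # Average over trials so that activity not locked to the stimulus cancels out.
    erp_a = trials_a.mean(axis=0)
    erp_b = trials_b.mean(axis=0)

    # Assumption: the two condition ERPs are placed side by side along the time axis.
    image = np.concatenate([erp_a, erp_b], axis=1)

    # Min-max normalise with a single min and max shared by all channels.
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


# Synthetic example: 40 trials per condition, 3 recording sites, 256 time samples.
rng = np.random.default_rng(0)
times_ms = np.linspace(-248, 772, 256)  # 4 ms steps, 0 = stimulus onset
evoked = np.exp(-((times_ms - 300.0) ** 2) / (2 * 80.0 ** 2))  # toy evoked response

trials_a = 1.0 * evoked + rng.normal(scale=2.0, size=(40, 3, 256))
trials_b = 2.0 * evoked + rng.normal(scale=2.0, size=(40, 3, 256))

image = build_erp_image(trials_a, trials_b)
print(image.shape, image.min(), image.max())
```

Sharing one minimum and maximum across channels keeps the relative amplitudes between recording sites intact, which per-channel scaling would discard.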

