EFFECTIVE SELF-SUPERVISED TRANSFORMERS FOR SPARSE TIME SERIES DATA

Abstract

Electronic health records (EHRs) recorded in hospital settings such as intensive care units (ICUs) typically contain a wide range of numeric time series data characterized by high sparsity and irregular observations. Self-supervised Transformer architectures have shown outstanding performance on a variety of structured tasks in natural language processing and computer vision. However, the sparse, irregular time series nature of ICU EHR data poses challenges for the application of Transformers that have not been widely explored. One major challenge is the quadratic scaling of self-attention layers, which can significantly limit the input sequence length. In this work, we introduce TESS, Transformers for EHR data with Self-Supervised learning, a self-supervised Transformer-based architecture designed to extract robust representations from EHR data. We propose input binning to aggregate the time series inputs and sparsity information into a regular sequence of fixed length, enabling the training of larger and deeper Transformers. We demonstrate that significant compression of ICU EHR data is possible without sacrificing useful information, likely due to the highly correlated nature of observations within small time bins. We then introduce self-supervised prediction tasks that provide rich and informative signals for model pre-training. TESS outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 ICU EHR datasets.

1. INTRODUCTION

Electronic health record (EHR) data collected in the hospital contains an immense amount of information about patients. This data typically comes in the form of vital sign measurements, lab results, and diagnoses/treatments. Patients in an Intensive Care Unit (ICU) are particularly heavily monitored, with frequent vital sign observations and diagnostic tests. The resulting multivariate numeric time series is high-dimensional, sparse, and irregularly distributed across time, making it challenging to apply standard time series analysis methods that are primarily designed for densely sampled data. These challenges are not unique to health care; data with such characteristics commonly arises in fields such as finance, banking, and e-commerce (Cao et al., 2021; Gómez-Losada & Duch-Brown, 2019; Zhang et al., 2015). Good models of clinical outcomes need to extract predictive signal from the values, frequencies, and missingness patterns of such data. Hand-crafting such features is a non-trivial and time-consuming task, which has led to the exploration of deep learning for problems arising in healthcare. However, when labels are noisy and scarce, such methods too are susceptible to overfitting.

Self-supervised learning (SSL) (Chopra et al., 2005; Caron et al., 2021) has risen in popularity as a tool to reduce the dependence of representation learning on large amounts of labelled data. SSL relies on the premise that domain experts have prior knowledge about the patterns in high-dimensional data; by translating this domain knowledge into pseudo-tasks, practitioners can ensure that this knowledge is transferred to representation learning models prior to fine-tuning.
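To make the data characteristics concrete, the input-binning idea from the abstract, aggregating irregular (time, feature, value) observations into a regular, fixed-length sequence while retaining the missingness pattern, can be sketched as below. This is a minimal illustration, not TESS's actual preprocessing pipeline; the function name, the choice of per-bin means, and the bin counts are all illustrative assumptions.

```python
import numpy as np

def bin_events(times, feat_ids, values, n_features, t_max, n_bins):
    """Aggregate irregular (time, feature, value) events into fixed-length bins.

    Returns per-bin mean values and a presence mask that preserves the
    sparsity (missingness) pattern of the raw observations.
    """
    binned = np.zeros((n_bins, n_features))
    counts = np.zeros((n_bins, n_features))
    # Map each observation time to a bin index in [0, n_bins - 1].
    idx = np.minimum((np.asarray(times) / t_max * n_bins).astype(int), n_bins - 1)
    for b, f, v in zip(idx, feat_ids, values):
        binned[b, f] += v
        counts[b, f] += 1
    observed = counts > 0
    binned[observed] /= counts[observed]  # mean of observations in each bin
    return binned, observed

# Example: 5 irregular observations of 2 features over a 48-hour stay,
# compressed into 4 bins of 12 hours each.
times = [0.5, 1.0, 13.0, 30.0, 47.0]
feats = [0, 0, 1, 0, 1]
vals = [80.0, 82.0, 7.4, 85.0, 7.3]
x, mask = bin_events(times, feats, vals, n_features=2, t_max=48.0, n_bins=4)
# x has shape (4, 2): a regular sequence a Transformer can consume directly,
# with mask encoding which (bin, feature) slots were actually observed.
```

Whatever the exact aggregation, the key point is that the output length is fixed by the number of bins rather than the number of raw observations, which keeps the quadratic cost of self-attention bounded.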

