UNSUPERVISED REPRESENTATION LEARNING FOR TIME SERIES WITH TEMPORAL NEIGHBORHOOD CODING

Abstract

Time series are often complex and rich in information but sparsely labeled, and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that, in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting patients' underlying latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks across multiple datasets.

1. INTRODUCTION

Real-world time-series data is high dimensional and complex, with unique properties that pose many challenges for data modeling (Yang & Wu, 2006). In addition, these signals are often sparsely labeled, making supervised learning tasks even more challenging. Unsupervised representation learning can extract informative low-dimensional representations from raw time series by leveraging the data's inherent structure, without the need for explicit supervision. These representations are more generalizable and robust, as they are less specialized for solving a single supervised task. Unsupervised representation learning is well studied in domains such as vision (Donahue & Simonyan, 2019; Denton et al., 2017; Radford et al., 2015) and natural language processing (Radford et al., 2017; Young et al., 2018; Mikolov et al., 2013), but remains underexplored for time series settings. Frameworks designed for time series need to be efficient and scalable, because signals encountered in practice can be long, high dimensional, and sampled at high frequency. Moreover, they should account for and be able to model dynamic changes that occur within samples, i.e., the non-stationarity of signals. The ability to model the dynamic nature of time series data is especially valuable in medicine. Health care data is often organized as a time series, with multiple data types collected from various sources at different sampling frequencies, and riddled with artifacts and missing values. Throughout their stay at the hospital or over the course of disease progression, patients transition gradually between distinct clinical states, with periods of relative stability, improvement, or unexpected deterioration requiring escalation of care that alters the patient's trajectory.
A particular challenge in medical time-series data is the lack of well-defined or readily available labels needed to identify the underlying clinical state of an individual, or to train models that extract low-dimensional representations of these states. For instance, in the critical care context, a patient's stay in the critical care unit (CCU) is captured continuously by the bedside monitor as streaming physiological signals. Obtaining labels for the patient's state over extended periods of these signals is practically impossible, as the underlying physiological state can be unknown even to clinicians. This further motivates the use of unsupervised representation learning in these contexts. Learning rich representations can be crucial for tracking disease progression, predicting patients' future trajectories, and tailoring treatments to these underlying states. In this paper, we propose a self-supervised framework for learning representations of complex, multivariate, non-stationary time series. This approach, called Temporal Neighborhood Coding (TNC), is designed for temporal settings where the latent distribution of the signals changes over time, and it aims to capture the progression of the underlying temporal dynamics. TNC is efficient, easily scalable to high dimensions, and applicable to different time series settings. We assess the quality of the learned representations on multiple datasets and show that the representations are general and transferable to downstream tasks such as classification and clustering. We further demonstrate that our method outperforms existing approaches for unsupervised representation learning, and even performs comparably to supervised techniques on classification tasks. The contributions of this work are three-fold:

1. We present a novel neighborhood-based unsupervised learning framework for non-stationary multivariate time series data.
2. We introduce the concept of a temporal neighborhood with stationary properties as the distribution of similar windows in time. The neighborhood boundaries are determined automatically from the properties of the signal using statistical testing.
3. We incorporate concepts from Positive-Unlabeled learning, specifically sample weight adjustment, to account for the potential bias introduced when sampling negative examples for the contrastive loss.

2. METHOD

We introduce a framework for learning representations that encode the underlying state of a multivariate, non-stationary time series. Our self-supervised approach, TNC, takes advantage of the local smoothness of a signal's generative process to learn generalizable representations for windows of time series. This is done by ensuring that, in the representation space, the distribution of signals proximal in time is distinguishable from the distribution of signals far apart in time, i.e., proximity in time is identifiable in the encoding space. We represent a multivariate time series as X ∈ R^(D×T), where D is the number of features and T is the number of measurements over time. X_[t−δ/2, t+δ/2] denotes a window of the time series of length δ, centered around time t, that includes the measurements of all features taken in the interval [t − δ/2, t + δ/2]. Throughout the paper, we refer to this window as W_t for notational simplicity. Our goal is to learn the underlying representation of W_t; by sliding this window over time, we can obtain the trajectory of the signal's underlying states.

We define the temporal neighborhood N_t of a window W_t as the set of all windows with centroids t* sampled from a normal distribution t* ∼ N(t, η·δ), where N is a Gaussian centered at t, δ is the window size, and η is the parameter that defines the range of the neighborhood. Relying on the local smoothness of a signal's generative process, the neighborhood distribution is characterized as a Gaussian to model the gradual transitions in temporal data; intuitively, it approximates the distribution of samples similar to W_t. The η parameter determines the neighborhood range and depends on the signal's characteristics, namely how gradually the time series's statistical properties change over time. It can be set by domain experts based on prior knowledge of the signal's behavior, or, for a more robust estimate, it can be determined by analyzing the stationarity properties of the signal for every W_t. Since the neighborhood represents similar samples, its range should identify the approximate time span within which the signal remains stationary and the generative process does not change. For this purpose, we use the Augmented Dickey-Fuller (ADF) statistical test to
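To make the neighborhood definition concrete, the following is a minimal sketch (not the authors' implementation) of drawing neighboring windows for a given W_t. The function name `sample_neighbor_windows` and the choice of treating η·δ as the Gaussian's standard deviation are our illustrative assumptions.

```python
import numpy as np

def sample_neighbor_windows(X, t, delta, eta, n_samples, rng=None):
    """Sample neighboring windows of W_t with centroids t* ~ N(t, eta * delta).

    X is a (D, T) array; returns an array of shape (n_samples, D, delta).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, T = X.shape
    half = delta // 2
    windows = []
    while len(windows) < n_samples:
        # Draw a centroid from the Gaussian neighborhood distribution,
        # treating eta * delta as its standard deviation (an assumption).
        t_star = int(round(rng.normal(loc=t, scale=eta * delta)))
        start = t_star - half
        # Rejection step: keep only windows that lie inside the signal.
        if 0 <= start and start + delta <= T:
            windows.append(X[:, start:start + delta])
    return np.stack(windows)
```

Sliding t over the full signal and encoding each sampled window then yields the trajectory of representations described above.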
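The stationarity-based choice of the neighborhood range can be sketched with the ADF test from `statsmodels`. The expansion schedule, the p-value cutoff of 0.01, the univariate input, and the function name `estimate_eta` below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def estimate_eta(x, t, delta, max_eta=4, p_cutoff=0.01):
    """Grow the span around t while the ADF test still judges the segment
    stationary; return the largest passing eta (minimum range of 1)."""
    T = len(x)
    eta = 1
    for candidate in range(1, max_eta + 1):
        lo = max(0, t - candidate * delta)
        hi = min(T, t + candidate * delta)
        # adfuller returns (statistic, p-value, ...); a small p-value
        # rejects the unit-root null, i.e., the segment looks stationary.
        p_value = adfuller(x[lo:hi])[1]
        if p_value < p_cutoff:
            eta = candidate
        else:
            break
    return eta
```

For multivariate signals, one reasonable variant is to require every feature's segment to pass the test before widening the range.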

