LATENT-SPACE SEMI-SUPERVISED TIME SERIES DATA CLUSTERING

Abstract

Time series data is abundantly available in the real world, but large, labeled datasets are scarce for many types of learning tasks. Semi-supervised models, which leverage a small amount of expert-labeled data alongside a larger unlabeled dataset, have been shown to outperform purely unsupervised models. Existing semi-supervised time series clustering algorithms scale poorly because they perform all learning operations within the original data space. We propose an autoencoder-based semi-supervised learning model, along with multiple semi-supervised objective functions, which improves the quality of the autoencoder's learned latent space through the addition of a small number of labeled examples. Experiments on a variety of datasets show that our methods usually improve k-Means clustering performance. Our methods achieve a maximum average ARI of 0.897, a 140% increase over an unsupervised CAE model, and a maximum improvement of 44% over an existing semi-supervised model.

1. INTRODUCTION

Time series data can be defined as any data which contains multiple sequentially ordered measurements. Real-world examples of time series data are abundant throughout many domains, including finance, weather, and medicine. One common learning task is to partition a set of time series into clusters. This unsupervised learning task can reveal the underlying structure of a dataset without the need for a supervised learning objective or ground-truth labels. Clustering time series data is a challenging problem because time series data may be high-dimensional and is not always segmented cleanly, leading to issues with alignment and noise.

The most basic methods for time series clustering apply general clustering algorithms to raw time series data. Familiar algorithms such as hierarchical clustering or k-Means may be applied using Euclidean Distance (ED) for comparisons. Although ED can perform well in some cases, it is susceptible to noise and temporal shifting. The Dynamic Time Warping (DTW) measure (Berndt & Clifford, 1994) provides invariance to temporal shifts, but is expensive to compute for clustering tasks. A more scalable alternative to DTW is the k-Shape algorithm, which clusters using the shape-based distance (SBD) measure to compare whole time series (Paparrizos & Gravano, 2017). Shapelet-based approaches such as Unsupervised Shapelets (Zakaria et al., 2012) can mitigate issues with shifting and noise, but are limited to extracting a single pattern/feature from each time series.

An alternative approach for clustering time series data is to apply dimensionality reduction through the use of an autoencoder. Autoencoders are capable of learning low-dimensional projections of high-dimensional data. Both LSTM and convolutional autoencoders have been shown to learn effective latent representations of time series data, and these models can extract a large number of features at each time step.
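To make the contrast concrete, the following minimal NumPy sketch (our own illustration, not any of the cited implementations) compares ED with a textbook dynamic-programming DTW on a pair of identically shaped but temporally shifted series:

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

t = np.linspace(0, 2 * np.pi, 64)
a = np.sin(t)
b = np.sin(t + np.pi / 2)  # same waveform, shifted by a quarter period

ed = float(np.linalg.norm(a - b))   # compares points at identical time indices
dtw = dtw_distance(a, b)            # may warp the time axis before comparing
print(ed, dtw)  # ED is large despite identical shape; DTW is much smaller
```

The quadratic loop over all index pairs is exactly the scalability bottleneck discussed above: a single comparison of two length-N series costs O(N²), and clustering requires many such comparisons.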
After training an autoencoder, the learned low-dimensional latent representation can be fed to an arbitrary clustering algorithm to perform the clustering task. Because autoencoder models reduce the dimensionality of the data, they naturally mitigate issues with noise and provide a degree of invariance to temporal shifting.

Recently, the field of semi-supervised learning has shown great success at boosting the performance of unsupervised models using small amounts of labeled data. Dau et al. (2016) propose a solution for semi-supervised clustering using DTW; however, this solution inherits DTW's scalability issues. He et al. (2019) propose a constraint-propagation approach for semi-supervised clustering, which may (but is not required to) be used in conjunction with DTW. However, this solution still performs time series comparisons within the raw data space, which may cause scalability issues for large datasets.

In this paper, we present a semi-supervised deep learning model based on a convolutional autoencoder (CAE), which may be used to perform clustering on time series datasets. We also present new semi-supervised learning objectives, adapted from well-known internal clustering metrics, which can significantly improve clustering performance when provided with a small number of labeled time series. We perform experiments to show that our semi-supervised model can improve performance relative to an unsupervised model when applied to clustering tasks. We also implement a lightly modified batch-based version of the semi-supervised learning solution presented in Ren et al. (2018), and show that our proposed solutions are competitive. In the best case, our semi-supervised model improves ARI by 140% over an unsupervised CAE model when applying k-Means clustering, and by 44% over a similar semi-supervised model.
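The encode-then-cluster pipeline described above can be sketched as follows. This is a minimal illustration on a toy dataset, using PCA as a stand-in for a trained convolutional encoder (the real model, datasets, and hyperparameters are described later in the paper); the clustering result is scored with the Adjusted Rand Index (ARI), the metric used throughout:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Toy dataset: two classes of noisy length-128 series (sine vs. square wave).
t = np.linspace(0, 4 * np.pi, 128)
X = np.vstack(
    [np.sin(t) + 0.1 * rng.standard_normal(128) for _ in range(50)]
    + [np.sign(np.sin(t)) + 0.1 * rng.standard_normal(128) for _ in range(50)]
)
y = np.array([0] * 50 + [1] * 50)

# Stand-in encoder: project each series to an 8-dimensional latent space.
latent = PCA(n_components=8, random_state=0).fit_transform(X)

# Cluster in the latent space, then score against ground truth with ARI.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(latent)
ari = adjusted_rand_score(y, labels)
print(ari)  # high on this easily separable toy problem
```

Note that k-Means here operates on 8-dimensional vectors rather than length-128 series, which is the source of the scalability advantage: all pairwise work happens in the compact latent space.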
In the remainder of this paper, Section 2 reviews related work on time series clustering, Section 3 presents our proposed method for semi-supervised time series clustering, and Section 4 discusses our experimental methodology and presents our experimental results. Finally, Section 5 details our conclusions and avenues for future research.

2. RELATED WORK

One of the most common ways to perform time series clustering is to apply the k-Means algorithm. By default, k-Means uses Euclidean Distance (ED). ED is efficient to calculate and in many cases shows good results (Ding et al., 2008). However, ED comparison will fail when two similar time series are shifted temporally relative to one another, and ED comparisons are sensitive to noisy data. The Dynamic Time Warping (DTW) measure (Berndt & Clifford, 1994) improves on ED by computing a warping path between a pair of time series. This approach solves issues with temporal shifting, but requires O(N²) time to compute for two time series of length N. Recent work has provided bounds for this computation (Keogh & Ratanamahatana, 2005; Lemire, 2009), but the scalability of DTW remains an issue for large datasets and long time series. The k-Shape algorithm (Paparrizos & Gravano, 2015) is a scalable alternative that offers performance similar to DTW at a lower computational cost. The Unsupervised Shapelets (Zakaria et al., 2012) clustering method operates by forming clusters around common subsequences extracted from the data. This approach provides invariance against shifts, since the shapelet may appear anywhere within each time series, and also provides some invariance against noise or outliers, since elementwise comparisons occur only between shapelets rather than full time series. In this regard, the UShapelet algorithm has some advantages over DTW and k-Shape. However, this method is constrained to extracting a single shapelet/feature from each time series.

Recently, semi-supervised learning has shown the benefit of augmenting a large unlabeled dataset with a small amount of labeled data, and there is some existing work on applying semi-supervised learning to time series clustering. The semi-supervised time series clustering solution presented in Dau et al. (2016) proposes a modified version of DTW, which operates in a semi-supervised manner using supervised constraints. However, this method still relies on performing DTW comparisons within the original data space, and as such is not a scalable solution for large datasets or long time series. Another methodology for semi-supervised time series clustering is He et al. (2019), a graph-based approach that uses supervised examples to generate positive and negative constraints between points. This approach does not rely on DTW, but the algorithm still performs comparisons in the original data space, which can be problematic as the length of the time series grows.

Popular deep learning architectures such as LSTMs and CNNs may also be applied to time series data. Both LSTM and CNN networks may be arranged as autoencoders, allowing for unsupervised feature learning for clustering, compression, or anomaly detection tasks. Holden et al. (2015) use a Convolutional Autoencoder (CAE) model to learn a featurized representation of gait data. Autoencoder architectures may also be applied for anomaly detection, as in Bao et al. (2017). Performing comparisons on embedded samples avoids many of the issues of direct pairwise comparisons. Since autoencoders reduce di-

