SOM-CPC: UNSUPERVISED CONTRASTIVE LEARNING WITH SELF-ORGANIZING MAPS FOR STRUCTURED REPRESENTATIONS OF HIGH-RATE TIME SERIES

Abstract

Continuous monitoring with an ever-increasing number of sensors has become ubiquitous across many application domains. Acquired data are typically high-dimensional and difficult to interpret, but they are also hypothesized to lie on a low-dimensional manifold. Dimensionality reduction techniques have therefore been sought. Recently, expressive non-linear deep learning (DL) models have gained popularity over more conventional methods like Principal Component Analysis (PCA) and Self-Organizing Maps (SOMs). However, the resulting latent space of a DL model often remains difficult to interpret. In this work we propose SOM-CPC, a model that jointly optimizes Contrastive Predictive Coding and a SOM to find an organized 2D manifold, while preserving higher-dimensional information. We address a largely unexplored and challenging set of scenarios comprising high-rate time series, and show on both synthetic and real-life data (medical sleep data and audio recordings) that SOM-CPC outperforms both DL-based feature extraction, followed by PCA, K-means or a SOM, and strong deep-SOM baselines that jointly optimize a DL model and a SOM. SOM-CPC has great potential to expose latent patterns in high-rate data streams and may therefore contribute to a better understanding of many different processes and systems.

1. INTRODUCTION

The improvement and abundance of sensor technology have led to large amounts of high-dimensional, information-rich continuous data streams. However, gaining actionable insights from these data is challenging due to their low interpretability. The main objective of this study is, therefore, to develop an algorithm for acquiring a structured and interpretable representation of (high-rate) time series. We define such an interpretable representation as one that has the ability to be informative and to facilitate exploration of the underlying structure (Lipton, 2018). According to the manifold hypothesis, high-dimensional real-world data lie on a low-dimensional manifold, comprising disentangled latent factors of variation. The area of unsupervised representation learning is concerned with models that learn this manifold from a set of training data, without the bias of human annotations. Dimensionality reduction techniques like Principal Component Analysis (PCA), possibly in combination with clustering methods like K-means, have conventionally been used for this purpose. Acquiring an interpretable representation with PCA requires omitting all but a few principal components. This, however, may discard important information that cannot be linearly projected onto these few dimensions. A Self-Organizing Map (Kohonen, 1990), on the other hand, is an extension of K-means clustering that creates a low-dimensional interpretable visualization, while still representing the data in multiple dimensions. However, SOMs typically act on features, which need to be selected heuristically and may, therefore, strongly depend on the use case and/or data modality. Deep learning (DL) models have become popular alternatives for non-linear dimensionality reduction that can be applied directly on raw data.
Such models have been combined with joint clustering objectives in the latent space (Xie et al., 2016; Yang et al., 2017; Madiraju, 2018; Lee & Schaar, 2020). These methods, however, typically do not create a (visually) interpretable representation, and sometimes make use of label information during training (Lee & Schaar, 2020). To enhance interpretability, latent space representations of DL models are often visualized using a t-distributed stochastic neighbor embedding (t-SNE) (Hinton & Roweis, 2002). Despite its frequent use, t-SNE does not allow direct deployment on unseen data, as it does not learn a reusable mapping between the multi-dimensional and the low-dimensional space. To acquire visually interpretable data representations from raw data, without assuming that data must live in two or three dimensions only, non-linear DL encoders have been combined with SOMs (Ferles et al., 2018; Pesteie et al., 2018; Fortuin et al., 2019; Forest et al., 2019; Manduchi et al., 2021; Forest et al., 2021). In the resulting joint training strategy of these deep-SOM models, the SOM objective can be seen as a regularizer on the encoding procedure, as it promotes a cluster-friendly feature space. Most of these models have focused on autoencoders as feature extractors. However, similar to Mrabah et al. (2020), we hypothesize that their reconstruction objective may hamper the clustering or structured representation learning objective: while within-cluster similarities should remain preserved for latent clustering, reconstruction demands a preservation of all factors of similarity. Moreover, in the context of time series representation learning, other self-supervised models, which take the temporal nature of the data into account during training, might be more suitable. Contrastive self-supervised learning approaches have quickly become popular thanks to their superior representation learning performance in many domains (see Le-Khac et al. (2020) for a review).
While many of these models rely on data augmentations during training in order to construct pairs of similar data points, Contrastive Predictive Coding (CPC) (Oord et al., 2019) leverages the temporal dimension for this purpose, making it a natural choice for self-supervised representation learning of time series. In CPC, the temporal dimension not only serves as a pretext task, but simultaneously enforces latent smoothness over time. The contributions of this work are as follows:

• We propose a new model in the deep-SOM family: SOM-CPC, which is suitable for learning structured and interpretable 2D representations of (high-rate) time series by encoding subsequent data windows to a topologically ordered set of quantization vectors.

• Using regression and classification probing tasks, we show that SOM-CPC preserves more information in its 2D representation than CPC followed by PCA and a linear classifier or K-means, or than directly encoding CPC's latent space to two dimensions. SOM-CPC's joint optimization, moreover, facilitates a smooth temporal trajectory through 2D space.

• We show that SOM-CPC quantitatively and qualitatively outperforms deep-SOM models with a reconstruction objective in terms of both clustering and topological ordering. It, moreover, requires fewer auxiliary loss functions (and associated hyperparameter tuning) thanks to its natural tendency to incorporate temporal smoothness. Lastly, SOM-CPC's training behavior shows that the SOM clustering objective aligns better with the CPC objective than with a reconstruction loss.
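As background for the CPC objective referenced above, the following is a minimal NumPy sketch of an InfoNCE-style contrastive loss over a batch of temporal windows; function and variable names are illustrative and not taken from the paper, and real CPC implementations additionally involve an encoder and autoregressive context network.

```python
import numpy as np

def info_nce_loss(predictions, futures):
    """Illustrative InfoNCE-style loss (names hypothetical, not the paper's code).

    predictions: (B, F) model predictions of future latent windows
    futures:     (B, F) encoded true future windows; row b is the positive
                 for prediction b, all other rows act as negatives.
    """
    scores = predictions @ futures.T                  # (B, B) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls each prediction towards its true future window relative to the other windows in the batch, which is what yields the temporally smooth latent space exploited by SOM-CPC.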

2. PRELIMINARIES

2.1 KOHONEN SELF-ORGANIZING MAPS

Kohonen's Self-Organizing Map (SOM) (Kohonen, 1990) is an algorithm to find a visually interpretable topological data representation. It has been found useful for revealing intricate patterns and structure in a plethora of applications. The algorithm's output, the low-dimensional visualization, is often referred to as a SOM as well. We choose a 2D visualization to enhance interpretability. We define a set of data points Z, and quantized counterparts q_Φ(z) ∈ Φ for z ∈ Z. The set Φ := {ϕ_1, . . . , ϕ_k} is a trainable quantization codebook containing k vectors or prototypes ϕ_i ∈ R^F, 1 ≤ i ≤ k. The j-th prototype ϕ_{i=j}^{(n)} = q_Φ(z) is the 'winning vector' for data point z at iteration n of the training procedure. The learned codebook vectors are placed on a pre-defined 2D grid by assigning an xy-coordinate to each vector at initialization. Note that this creates a 2D representation, while each data point z still lives in R^F, with F ≫ 2. This is conceptually different from the way in which PCA achieves dimensionality reduction to 2D, where all information in the 3rd and higher principal components is strictly omitted. During training of a SOM, each ϕ_i is updated as follows (Kohonen, 1990), with z ∈ Z:

ϕ_i^{(n+1)} = ϕ_i^{(n)} + η^{(n)} S(ϕ_i^{(n)}, ϕ_{i=j}^{(n)}) (z − ϕ_i^{(n)}),
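The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration of one Kohonen update step, assuming a Gaussian neighborhood function S that decays with distance on the 2D grid; function names and the choice of Gaussian kernel are illustrative, not prescribed by the text.

```python
import numpy as np

def som_update(prototypes, grid_xy, z, lr, sigma):
    """One Kohonen SOM update for a single data point z (illustrative sketch).

    prototypes: (k, F) codebook vectors phi_i
    grid_xy:    (k, 2) fixed 2D coordinates assigned at initialization
    z:          (F,)   data point
    lr:         learning rate eta^(n)
    sigma:      width of the (assumed Gaussian) neighborhood function S
    """
    # Winning vector: the prototype closest to z in R^F.
    j = np.argmin(np.linalg.norm(prototypes - z, axis=1))
    # Neighborhood weights decay with squared 2D grid distance to the winner.
    d2 = np.sum((grid_xy - grid_xy[j]) ** 2, axis=1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    # Move every prototype towards z, weighted by its neighborhood value.
    return prototypes + lr * S[:, None] * (z - prototypes)
```

Note that the winner is selected in the F-dimensional feature space, while the neighborhood is computed on the fixed 2D grid; it is this combination that produces the topological ordering of the codebook.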

