A TRANSFORMER-BASED FRAMEWORK FOR MULTI-VARIATE TIME SERIES REPRESENTATION LEARNING

Abstract

In this work, we propose, for the first time, a transformer-based framework for unsupervised representation learning of multivariate time series. Pre-trained models can potentially be used for downstream tasks such as regression, classification, forecasting, and missing value imputation. We evaluate our models on several benchmark datasets for multivariate time series regression and classification and show that they exceed current state-of-the-art performance, even when the number of training samples is very limited, while also offering computational efficiency. We show that unsupervised pre-training of our transformer models offers a substantial performance benefit over fully supervised learning, even without leveraging additional unlabeled data, i.e., by reusing the same data samples through the unsupervised objective.

1. INTRODUCTION

Multivariate time series (MTS) are an important type of data that is ubiquitous in a wide variety of domains, including science, medicine, finance, engineering, and industrial applications. Despite the recent abundance of MTS data in the much-touted era of "Big Data", the availability of labeled data in particular is far more limited: extensive data labeling is often prohibitively expensive or impractical, as it may require much time and effort, special infrastructure, or domain expertise. For this reason, in all of the aforementioned domains there is great interest in methods that can offer high accuracy using only a limited amount of labeled data, or by leveraging the existing plethora of unlabeled data. There is a large variety of modeling approaches for univariate and multivariate time series, with deep learning models recently challenging or replacing the state of the art in tasks such as forecasting, regression, and classification (De Brouwer et al., 2019; Tan et al., 2020a; Fawaz et al., 2019b). However, unlike in domains such as Computer Vision or Natural Language Processing (NLP), the dominance of deep learning for time series is far from established: in fact, non-deep-learning methods such as TS-CHIEF (Shifaz et al., 2020), HIVE-COTE (Lines et al., 2018), and ROCKET (Dempster et al., 2020) currently hold the record on time series regression and classification dataset benchmarks (Tan et al., 2020a; Bagnall et al., 2017), matching or even outperforming sophisticated deep architectures such as InceptionTime (Fawaz et al., 2019a) and ResNet (Fawaz et al., 2019b).

In this work, we investigate, for the first time, the use of a transformer encoder for unsupervised representation learning of multivariate time series, as well as for the tasks of time series regression and classification. Transformers are an important, recently developed class of deep learning models, first proposed for the task of natural language translation (Vaswani et al., 2017), which have since come to monopolize state-of-the-art performance across virtually all NLP tasks (Raffel et al., 2019). A key factor in the widespread success of transformers in NLP is their aptitude for learning how to represent natural language through unsupervised pre-training (Brown et al., 2020; Raffel et al., 2019; Devlin et al., 2018).

Besides NLP, transformers have also set the state of the art in several domains of sequence generation, such as polyphonic music composition (Huang et al., 2018). Transformer models are based on a multi-headed attention mechanism that offers several key advantages and renders them particularly suitable for time series data (see Appendix section A.4 for details). Inspired by the impressive results attained through unsupervised pre-training of transformer models in NLP, as our main contribution, in the present work we develop a generally applicable transformer-based framework for unsupervised representation learning of multivariate time series.
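To make the attention mechanism concrete, the following is a minimal sketch of (single-head) scaled dot-product self-attention applied to a multivariate time series. It is an illustration only, not the paper's implementation: the function name, dimensions, and random projection matrices are all invented for this example. The key property it demonstrates is that every time step attends directly to every other time step, regardless of temporal distance.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one time series.

    X: (seq_len, feat_dim) multivariate series; Wq, Wk, Wv: learned
    projections (here random, for illustration). Each output time step
    is a weighted mix of value vectors from ALL time steps.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (seq_len, d_k)

# Toy example: a 6-step series with 4 variables, projected to dimension 8.
rng = np.random.default_rng(0)
seq_len, feat_dim, d_k = 6, 4, 8
X = rng.standard_normal((seq_len, feat_dim))
Wq, Wk, Wv = (rng.standard_normal((feat_dim, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8)
```

In a full transformer encoder, several such heads run in parallel on lower-dimensional projections and their outputs are concatenated; this sketch keeps a single head for brevity.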

