MULTIVARIATE TIME-SERIES IMPUTATION WITH DISENTANGLED TEMPORAL REPRESENTATIONS

Abstract

Multivariate time series often suffer from missing values. Many time series imputation methods have been developed in the literature. However, they all rely on an entangled representation to model the dynamics of time series, which may fail to fully exploit the multiple factors (e.g., periodic patterns) present in the data. Moreover, entangled representations usually carry no semantic meaning and therefore lack interpretability. In addition, many recent models process the whole time series at once to identify temporal dynamics, so they do not scale to long time series. In contrast to existing approaches, we propose TIDER, a novel matrix factorization-based method with disentangled temporal representations that account for multiple factors, namely trend, seasonality, and local bias, to model complex dynamics. The learned disentanglement makes the imputation process more reliable and offers explainability for the imputation results. Moreover, TIDER is scalable to long time series. Empirical results show that our method outperforms existing approaches on three typical real-world datasets, especially on long time series, reducing mean absolute error by up to 50%. It also scales well to long datasets on which existing deep-learning-based methods struggle. Disentanglement validation experiments further highlight the robustness and accuracy of our model.

1. INTRODUCTION

Multivariate time series analysis (e.g., forecasting (Zeng et al., 2021) and classification (Li et al., 2022)) has a wide spectrum of applications, such as traffic flow forecasting (Liu et al., 2020), electricity demand prediction (Kaur et al., 2021), motion detection (Laddha et al., 2021), and health monitoring (Tonekaboni et al., 2021). Most of these approaches assume intact input when building models. However, real-world multivariate time series tend to have missing values caused by factors such as device malfunction, communication failure, or costly measurement, which impairs the performance of these approaches or even renders them inapplicable. In light of this, many time series imputation methods have been proposed to infer missing values from the observed ones. A multivariate time series, denoted as $X \in \mathbb{R}^{N \times T}$, consists of N univariate time series (called channels) spanning T time steps. Hence, it offers two perspectives for imputation: modeling cross-channel correlations and exploiting temporal dynamics. Earlier methods (Batista et al., 2002; Acuna & Rodriguez, 2004; Box et al., 2015) either aggregate observed entries across channels by estimating the similarity between distinct channels, or rely solely on local smoothness or a linearity assumption within the same channel to fill in missing values. Since these methods cannot model nonlinear dynamics and complex correlations, they may not perform well in practice.
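To make the setting concrete, the following minimal sketch represents a multivariate time series as an N × T array with missing entries marked as NaN, and fills each channel by linear interpolation over time. It only illustrates the kind of within-channel linearity assumption mentioned above and is not the implementation of any cited method. The function name `interpolate_channels` and the toy data are hypothetical, and encoding missing values as NaN is an assumption made for this example.

```python
import numpy as np


def interpolate_channels(X):
    """Fill NaNs in each channel (row) of X by linear interpolation over time.

    Hypothetical helper for illustration; assumes missing entries are NaN.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    t = np.arange(X.shape[1])  # time indices 0..T-1
    for i in range(X.shape[0]):
        observed = ~np.isnan(X[i])
        if observed.any():
            # np.interp interpolates linearly between observed points and
            # holds the nearest observed value constant beyond the ends.
            filled[i] = np.interp(t, t[observed], X[i, observed])
        # A channel with no observations is left as NaN.
    return filled


# Toy series with N = 2 channels and T = 4 time steps.
X = np.array([
    [1.0, np.nan, 3.0, np.nan],
    [np.nan, 2.0, np.nan, 6.0],
])
X_filled = interpolate_channels(X)
print(X_filled)
```

Because each channel is filled independently, this baseline ignores cross-channel correlations entirely, and it cannot recover nonlinear or periodic behavior inside a gap, which is the limitation of such earlier methods noted above.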

