HIDDEN MARKOV MIXTURE OF GAUSSIAN PROCESS FUNCTIONAL REGRESSION: UTILIZING MULTI-SCALE STRUCTURE FOR TIME-SERIES FORECASTING

Abstract

The mixture of Gaussian process functional regressions (GPFRs) assumes that a batch of time-series or sample curves are generated by independent random processes with different temporal structures. In real situations, however, these structures switch from one to another in a random manner over a long time scale, so the assumption of independent curves does not hold in practice. To remove this limitation, we propose the hidden Markov based GPFR mixture model (HM-GPFR), which describes the curves with temporal structure at both a fine and a coarse level. Specifically, the temporal structure is described by a Gaussian process model at the fine level and by a hidden Markov process at the coarse level. The whole model can be regarded as a random process with state-switching dynamics. To further enhance the robustness of the model, we also place a prior on the model parameters and develop the Bayesian hidden Markov based GPFR mixture model (BHM-GPFR). Experimental results demonstrate that the proposed methods achieve both high prediction accuracy and good interpretability.

1. INTRODUCTION

The time-series considered in this paper has a multi-scale structure with a coarse level and a fine level. We have observations (y_1, ..., y_T), where each y_t = (y_{t,1}, ..., y_{t,L}) is itself a time-series of length L. The whole time-series is arranged as

y_{1,1}, y_{1,2}, ..., y_{1,L}, y_{2,1}, y_{2,2}, ..., y_{2,L}, ..., y_{T,1}, y_{T,2}, ..., y_{T,L}.    (1)

The subscripts of {y_t}_{t=1}^T are called coarse level indices, while the subscripts of {y_{t,i}}_{i=1}^L are called fine level indices. Throughout this paper, we take the electricity load dataset as a concrete example. It consists of T = 365 consecutive daily records, and each day contains L = 96 samples recorded every quarter-hour. In this example, the coarse level indices denote the day, while the fine level indices correspond to the 15-minute time resolution. The aim is to forecast both short-term and long-term electricity loads based on historical records. There may be partial observations y_{T+1,1}, ..., y_{T+1,M} with M < L, in which case the entire observed time-series has the form

y_{1,1}, y_{1,2}, ..., y_{1,L}, y_{2,1}, y_{2,2}, ..., y_{2,L}, ..., y_{T,1}, y_{T,2}, ..., y_{T,L}, y_{T+1,1}, ..., y_{T+1,M}.    (2)

The task is to predict a future response y_{t*,i*}, where t* ≥ T + 1 and 1 ≤ i* ≤ L are positive integers. The coarse level and the fine level provide different structural information about the data generation process. At the coarse level, each y_t can be regarded as a time-series, and there is a certain cluster structure (Shi & Wang, 2008; Wu & Ma, 2018) underlying the time-series {y_t}_{t=1}^T: we can divide {y_t}_{t=1}^T into groups such that the time-series within each group share a similar evolving trend. In the electricity load dataset, such groups correspond to different electricity consumption patterns. We use z_t to denote the cluster label of y_t.
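The arrangement in (1) can be made concrete with a short sketch. The code below uses the electricity-load sizes from the text (T = 365 days, L = 96 quarter-hourly samples per day), but the data are synthetic placeholders, not the actual dataset:

```python
import numpy as np

# Hypothetical sketch: arrange the flat multi-scale series of equation (1)
# into a T x L matrix whose rows are the fine-level series y_t.
T, L = 365, 96
flat = np.random.randn(T * L)     # y_{1,1}, ..., y_{T,L} in the order of (1)

Y = flat.reshape(T, L)            # row t is y_t = (y_{t,1}, ..., y_{t,L})
y_first_day = Y[0]                # the fine-level series of day 1

# With 0-based indices, y_{t,i} sits at flat position t * L + i.
assert Y[2, 5] == flat[2 * L + 5]
```

Row-major `reshape` preserves exactly the ordering of (1), so coarse-level operations (clustering days) act on rows while fine-level operations (within-day regression) act along columns.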
At the fine level, the observations {y_{t,i}}_{i=1}^L can be regarded as a realization of a stochastic process whose properties are determined by the cluster label z_t. The mixture of Gaussian process functional regressions (mix-GPFR) model (Shi & Wang, 2008; Shi & Choi, 2011) is powerful for analyzing functional data or batch data, and it is applicable to the multi-scale time-series forecasting task. Mix-GPFR assumes there are K Gaussian process functional regression (GPFR) (Shi et al., 2007) components, and associates with each y_t a latent variable z_t indicating which GPFR component generates y_t. Since GPFR is good at capturing temporal dependency, this model successfully utilizes the structural information at the fine level. However, the temporal information at the coarse level is totally ignored, since mix-GPFR assumes {z_t}_{t=1}^T are i.i.d. In this work, we propose to model the temporal dependency at the coarse level by a hidden Markov model, which characterizes the switching dynamics of z_1, ..., z_T via a transition probability matrix. We refer to the proposed model as HM-GPFR.

Mix-GPFR can effectively predict y_{T+1,M+1}, ..., y_{T+1,L} when M > 0. To predict the responses y_{T+1,i*}, we must determine the cluster label z_{T+1} based on the observations y_{T+1,1}, ..., y_{T+1,M}; otherwise we do not know which evolving pattern governs y_{T+1}. If there is no observation on day T + 1 (i.e., M = 0), then mix-GPFR fails to identify the stochastic process that generates y_{T+1}. For the same reason, mix-GPFR is not suitable for long-term forecasting (t* > T + 1). In contrast, HM-GPFR can infer z_{t*} for any t* via the transition probabilities of the hidden Markov model, even when M = 0. Therefore, HM-GPFR makes use of the coarse level temporal information and solves the cold start problem of mix-GPFR.
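The coarse-level inference that lets HM-GPFR predict beyond day T can be sketched as follows. Given a filtered state distribution p(z_T | data) and a transition matrix A, the distribution of z_{T+h} follows by repeated matrix-vector products, even with no observation from day T + 1 (M = 0). The values of A and p_T below are illustrative, not learned parameters:

```python
import numpy as np

K = 3                                  # number of GPFR components (illustrative)
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])        # A[k, l] = P(z_t = l | z_{t-1} = k)
p_T = np.array([0.6, 0.3, 0.1])        # assumed p(z_T = k | y_1, ..., y_T)

def predict_state(p, A, h):
    """Distribution of z_{T+h} obtained by h coarse-level transitions."""
    for _ in range(h):
        p = p @ A
    return p

p_next = predict_state(p_T, A, 1)      # p(z_{T+1} | history), no new data needed
```

Long-term forecasts (t* > T + 1) use the same recursion with larger h; the resulting mixture weights over the K GPFR components then drive the fine-level prediction.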
Moreover, when a new day's records y_{T+1} have been fully observed, mix-GPFR must be re-trained to utilize y_{T+1}, whereas HM-GPFR can adjust its parameters incrementally without retraining the model.
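The incremental adjustment mentioned above can be sketched as one forward-algorithm step: when y_{T+1} is fully observed, the filtered state distribution is extended by predicting through the transition matrix, weighting by the per-component likelihoods, and renormalizing. The likelihood values `lik[k] ~ p(y_{T+1} | z_{T+1} = k)` are placeholders standing in for the GPFR component likelihoods:

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.3, 0.7]])            # transition matrix for K = 2 states (illustrative)
p_T = np.array([0.5, 0.5])            # p(z_T | y_1, ..., y_T)
lik = np.array([0.2, 0.8])            # p(y_{T+1} | z_{T+1} = k), assumed given

# One forward step: predict, weight by likelihood, renormalize.
p_pred = p_T @ A                      # p(z_{T+1} | y_1, ..., y_T)
p_new = p_pred * lik
p_new /= p_new.sum()                  # p(z_{T+1} | y_1, ..., y_{T+1})
```

Only the most recent filtered distribution is needed, so each new day costs O(K^2) at the coarse level rather than a full re-training pass over all T + 1 days.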

2. RELATED WORKS

The Gaussian process (GP) (Rasmussen & Williams, 2006) is a powerful non-parametric Bayesian model, and it has been applied to time-series forecasting (Girard et al., 2002; Brahim-Belhouari & Bermak, 2004; Girard & Murray-Smith, 2005). Shi et al. proposed the GPFR model to process batch data (Shi et al., 2007). To effectively model multi-modal data, a mixture structure was further introduced into GPFR, yielding the mix-GPFR model (Shi & Wang, 2008; Shi & Choi, 2011). GP-related methods for electricity load prediction have been evaluated thoroughly in (Wu & Ma, 2018; Li et al., 2019; Cao et al., 2021). However, in these works daily records are treated as i.i.d. samples, and the temporal information at the coarse level is ignored. Multi-scale time-series models were proposed in (Ferreira et al., 2006; Ferreira & Lee, 2007a;b), and further developments in this direction have been achieved in recent years. The time-series considered in this work differs from those multi-scale time-series in that the coarse level has no aggregated observations of the samples at the fine level. In this paper, we mainly emphasize the multi-scale structure of the time-series.

3.1. HIDDEN MARKOV MODEL

For a sequence of observations {y_t}_{t=1}^T, the hidden Markov model (HMM) (Rabiner & Juang, 1986; Elliott et al., 2008) assumes there is a hidden state variable z_t associated with each y_t. The sequence of hidden states {z_t}_{t=1}^T forms a homogeneous Markov process. Usually, {z_t}_{t=1}^T are categorical variables taking values in {1, ..., K}, and the transition dynamics is governed by P(z_t = l | z_{t-1} =



Figure 1: An illustration of multi-scale time-series.

