ETSFORMER: EXPONENTIAL SMOOTHING TRANSFORMERS FOR TIME-SERIES FORECASTING

Abstract

Transformers have recently been actively studied for time-series forecasting. While often showing promising results in various scenarios, traditional Transformers are not designed to fully exploit the characteristics of time-series data and thus suffer from some fundamental limitations: e.g., they are generally neither decomposable nor interpretable, and are neither effective nor efficient for long-term forecasting. In this paper, we propose ETSformer, a novel time-series Transformer architecture, which exploits the principle of exponential smoothing methods to improve Transformers for time-series forecasting. Specifically, ETSformer leverages a novel level-growth-seasonality decomposed Transformer architecture which leads to more interpretable and disentangled decomposed forecasts. We further propose two novel attention mechanisms, the exponential smoothing attention and frequency attention, which are specially designed to overcome the limitations of the vanilla attention mechanism for time-series data. Extensive experiments on the long sequence time-series forecasting (LSTF) benchmark validate the efficacy and advantages of the proposed method. Code is attached in the supplementary material and will be made publicly available.
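For reference, the classical exponential smoothing principle the abstract alludes to can be sketched in a few lines. This is an illustration of the standard recursion, not the paper's attention mechanism; the function name and the `alpha` hyperparameter are our own notation.

```python
def exponential_smoothing(x, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    Equivalent to a weighted average of past observations, where the k-th
    most recent observation receives an exponentially decaying weight
    proportional to (1 - alpha)^k.
    """
    s = x[0]  # initialize the smoothed state with the first observation
    out = [s]
    for value in x[1:]:
        s = alpha * value + (1 - alpha) * s
        out.append(s)
    return out
```

The exponentially decaying weights give recent observations more influence than distant ones, which is the inductive bias the proposed exponential smoothing attention carries over to the attention setting.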

1. INTRODUCTION

Transformer models have achieved great success in the fields of natural language processing (Vaswani et al., 2017; Devlin et al., 2019), computer vision (Carion et al., 2020; Dosovitskiy et al., 2021), and, more recently, time-series (Li et al., 2019; Wu et al., 2021; Zhou et al., 2021; Zerveas et al., 2021; Zhou et al., 2022). While the success of Transformer models has been widely attributed to the self-attention mechanism, alternative forms of attention, infused with the appropriate inductive biases, have been introduced to tackle the unique properties of the underlying task or data (You et al., 2020; Raganato et al., 2020). In time-series forecasting, decomposition-based architectures such as Autoformer and FEDformer (Wu et al., 2021; Zhou et al., 2022) have incorporated time-series specific inductive biases, leading to increased accuracy and more interpretable forecasts (by decomposing forecasts into seasonal and trend components). Their success has been motivated by: (i) disentangling seasonal and trend representations via seasonal-trend decomposition (Cleveland & Tiao, 1976; Cleveland et al., 1990; Woo et al., 2022), and (ii) replacing the vanilla pointwise dot-product attention, which handles time-series patterns such as seasonality and trend inefficiently, with time-series specific attention mechanisms such as the Auto-Correlation mechanism and Frequency-Enhanced Attention. While these existing works introduce the promising direction of interpretable, decomposed time-series forecasting for Transformer-based architectures, they suffer from two drawbacks. Firstly, they suffer from entangled seasonal-trend representations, evidenced in Figure 1, where the trend forecasts exhibit periodic patterns which should only be present in the seasonal component, and the seasonal component does not accurately track the (multiple) periodicities present in the ground truth seasonal component.
This arises from their decomposition mechanism, which detects the trend via a simple moving average over the input signal and detrends the signal by removing the detected trend component, an arguably naive approach. This method has many known pitfalls (Hyndman & Athanasopoulos, 2018), such as the trend-cycle component being unavailable for the first and last few observations, and over-smoothing rapid rises and falls. Secondly, their proposed replacements for the vanilla attention mechanism are not human interpretable, as demonstrated in Section 3.3. Model inspection and diagnosis allows us to better understand the fore-
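The moving-average detrending described above, and its edge pitfall, can be illustrated with a minimal sketch. This is not the Autoformer/FEDformer code; the function name and the `window` hyperparameter are our own, and we use NaN to mark positions where the centered window does not fit.

```python
import numpy as np

def moving_average_detrend(x, window):
    """Estimate trend by a centered moving average and detrend by subtraction.

    Positions where the full window does not fit are left as NaN, showing
    the pitfall that the trend-cycle estimate is unavailable for the first
    and last few observations.
    """
    x = np.asarray(x, dtype=float)
    trend = np.full_like(x, np.nan)
    half = window // 2
    for t in range(half, len(x) - half):
        trend[t] = x[t - half : t + half + 1].mean()
    detrended = x - trend  # NaN at the edges propagates into the "seasonal" part
    return trend, detrended

# Example: a rapid level shift is over-smoothed by the moving average.
x = np.concatenate([np.zeros(10), np.ones(10)])
trend, detrended = moving_average_detrend(x, window=5)
```

In the example, the step change at t = 10 is smeared across the window, so spurious "seasonal" residue appears around the shift: exactly the over-smoothing and entanglement issues raised above.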

