RECURSIVE TIME SERIES DATA AUGMENTATION

Abstract

Time series observations can be seen as realizations of an underlying dynamical system governed by rules that we typically do not know. For time series learning tasks we create our model using available data. Training on the available realizations, where data is limited, often induces severe over-fitting, thereby preventing generalization. To address this issue, we introduce a general recursive framework for time series augmentation, which we call the Recursive Interpolation Method (RIM). New augmented time series are generated from the original time series using a recursive interpolation function and used in training. We perform theoretical analysis to characterize the proposed RIM and to guarantee its performance under certain conditions. We apply RIM to diverse synthetic and real-world time series cases and achieve strong performance over non-augmented data on a variety of learning tasks. Our method is also computationally more efficient and leads to better performance than state-of-the-art time series data augmentation.

1. INTRODUCTION

The recent success of machine learning (ML) algorithms depends on the availability of large amounts of data and prodigious computing power, which in practice are not always available. In real-world applications, it is often impossible to sample indefinitely, and ideally we would like the ML model to make good decisions with a limited number of samples. To overcome these issues, we can exploit additional information, such as structure or invariance in the data, that helps ML algorithms learn efficiently and focus on the features most important for solving the task. In ML, the exploitation of structure in the data has been handled using four different yet complementary approaches: 1) architecture design, 2) transfer learning, 3) data representation, and 4) data augmentation. Our focus in this work is on data augmentation in the context of time series learning.

Time series representations do not expose the full information of the underlying dynamical system Prado (1998) in a way that ML can easily recognize. For instance, financial time series data contain patterns at various scales that can be learned to improve performance. At a more fundamental level, time series are one-dimensional projections of a hypersurface of data called the phase space of a dynamical system. This projection results in a loss of information about the dynamics of the system. However, we can still make inferences about the dynamical system that projects a time series realization. Our approach is to use these inferences to generate additional time series data from the original realization, building richer representations and improving time series pattern identification, which results in better parameters and reduced variance. We show that our methodology is applicable to a variety of ML algorithms. Time series learning problems depend on the observed historical data used for training. We often use a set of time series data to train the ML model.
Each element in the set can be viewed as a sample drawn from the underlying stochastic dynamical system. However, each historical time series sample is only one particular realization of the real-world stochastic dynamical system that we are trying to learn. Our work focuses on problems where the available realizations are limited, but is not restricted to them. In fact, our method can be applied to any time series learning task, whether we have a single realization, as in stock price prediction, or numerous realizations, as in speech recognition, where many audio clips are available for training. Let us consider the stock price prediction problem. The task is to predict or classify the trend of the future price. Ideally, we want our model to perform well by capturing the stochastic dynamics of stock markets. However, we only train the model on a single time series realization or on limited historical realizations. As a result, we do not truly capture the characteristic behaviour of the underlying dynamical system. Training on the original data, and hence on one or a few realizations of the underlying dynamical system, usually induces over-fitting. This is not ideal, as we want our model to perform well on the stochastic system instead of just a specific realization of that system.

Contributions. The contributions of our work are as follows:
• We present a time series augmentation technique based on recursive interpolation.
• We provide a theoretical analysis of learning improvement for the proposed time series augmentation method:
- We show that our recursive augmentation allows us to control by how much the augmented time series trajectory deviates from the original trajectory (Theorem 3.1), and that a natural trade-off is induced when the augmentation deviates considerably from the original time series (Theorem 3.2).
- We demonstrate that our learning bound depends on the dimension and properties of the time series, as well as on the neural network structure (Theorems 3.3 and 3.4).
- We believe that this work is the first to offer a theoretical ML framework for time series data augmentation with guarantees for variance reduction in the learned model (Theorem 3.5).
• We empirically demonstrate learning improvements using synthetic data as well as real-world time series datasets.

Outline of the paper. Section 2 presents the literature review. Section 3 defines the notation and the problem setting, and provides the main theoretical results. Section 4 describes the experimental results, and Section 5 concludes with a summary and a discussion of future work.
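To make the idea of recursive interpolation concrete, the sketch below shows one possible scheme in which each augmented trajectory is a convex combination of the previous iterate and an interpolated version of it. The mixing weight `alpha`, the linear interpolation operator, and the function names are illustrative assumptions, not the authors' exact algorithm; the point is only that recursion gives a knob controlling how far the augmented trajectory drifts from the original.

```python
import numpy as np

def interpolate(x: np.ndarray) -> np.ndarray:
    """Illustrative linear interpolation operator: resample the series
    at midpoints between observations, then map back to the original grid."""
    t = np.arange(len(x))
    t_mid = t[:-1] + 0.5                      # midpoints of the time grid
    x_mid = np.interp(t_mid, t, x)            # values at the midpoints
    # stretch the midpoint series back onto the original grid
    return np.interp(t, np.linspace(0, len(x) - 1, len(x_mid)), x_mid)

def rim_augment(x: np.ndarray, alpha: float = 0.8, depth: int = 3):
    """Recursively mix the series with its interpolated version.
    Each recursion level yields one augmented trajectory."""
    augmented = []
    current = x.astype(float)
    for _ in range(depth):
        current = alpha * current + (1.0 - alpha) * interpolate(current)
        augmented.append(current.copy())
    return augmented

x = np.sin(np.linspace(0, 4 * np.pi, 100))
series = rim_augment(x, alpha=0.8, depth=3)
```

As `alpha` approaches 1, each augmented trajectory stays close to the original, mirroring the controlled-deviation property described in Theorem 3.1.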

2. RELATED WORK

Augmentation for Computer Vision. In the computer vision context, there are multiple ways to augment image data, such as cropping, rotation, translation, flipping, and noise injection. Among them, the mixup technique proposed in Zhang et al. (2018) is similar to our approach: they train a neural network on convex combinations of pairs of images and their labels. However, simply applying a static image technique to dynamic time series data is not appropriate. Chen et al. (2020) showed that data augmentation has an effect similar to an averaging operation over the orbits of a certain group of transformations that keep the data distribution invariant.

Time Series. There is an exhaustive list of transformations applied to time series that are usually used as data augmentation Wen et al. (2020a). Fawaz et al. (2018) described transformations in the time domain such as time warping and time permutation. Other methods belong to the magnitude domain, such as magnitude warping, Gaussian noise injection, quantization, scaling, and rotation Wen & Keyes (2019). There are further transformations in the frequency and time-frequency domains based on the Discrete Fourier Transform (DFT): these perturb the amplitude and phase spectra of the time series and apply the inverse DFT to generate a new time series signal Gao et al. (2020). Beyond transformations in different domains, there are also more advanced methods, including decomposition-based methods such as Seasonal and Trend decomposition using Loess (STL) and its variants Cleveland et al. (1990); Wen et al. (2020b), statistical generative models Kang et al. (2020), and learning-based methods. The learning-based methods can be further divided into embedding-space methods DeVries & Taylor (2017) and deep generative models (DGMs) Esteban et al. (2017); Yoon et al. (2019). These approaches are problem dependent and do not offer theoretically guaranteed learning improvement. In addition, the learning-based methods require large amounts of training data.
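Two of the transformation families above can be sketched in a few lines: Gaussian noise injection in the magnitude domain, and a frequency-domain augmentation that perturbs the phase spectrum while keeping the amplitude spectrum and inverting the DFT. The noise scales and function names below are illustrative assumptions, not prescriptions from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(x: np.ndarray, sigma: float = 0.03) -> np.ndarray:
    """Magnitude-domain augmentation: inject Gaussian noise."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def dft_phase_perturb(x: np.ndarray, scale: float = 0.1) -> np.ndarray:
    """Frequency-domain augmentation: perturb the phase spectrum,
    keep the amplitude spectrum, and invert the DFT."""
    spectrum = np.fft.rfft(x)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum) + rng.normal(0.0, scale, size=spectrum.shape)
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=len(x))

x = np.sin(np.linspace(0, 4 * np.pi, 128))
x_jit = jitter(x)        # same shape, small pointwise noise
x_freq = dft_phase_perturb(x)  # same amplitude spectrum, shifted phases
```

Unlike these fixed, one-shot transformations, the recursive scheme studied in this paper generates a family of trajectories whose deviation from the original is explicitly controlled.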

