TIME SERIES ANOMALY DETECTION VIA HYPOTHESIS TESTING FOR DYNAMICAL SYSTEMS

Abstract

Real-world systems, such as robots, weather, energy systems, and stock markets, are complicated and high-dimensional. Hence, without prior knowledge of the system dynamics, detecting or forecasting abnormal events from sequential observations of the system is challenging. In this work, we address the problem caused by high dimensionality by viewing time series anomaly detection as hypothesis testing on dynamical systems. This perspective prevents the dimension of the problem from growing linearly with the time horizon, and naturally leads to a novel anomaly detection model, termed DyAD (Dynamical system Anomaly Detection). Furthermore, as existing time series anomaly detection algorithms are usually evaluated on relatively small datasets, we release a large-scale dataset on detecting battery failures in electric vehicles. We benchmark several popular algorithms on both public datasets and our newly released dataset. Our experiments demonstrate that the proposed model achieves state-of-the-art results.

1. INTRODUCTION

Hypothesis testing aims to decide whether the observed data supports or rejects a default belief known as the null hypothesis. Applications are abundant. In this work, we view anomaly detection as an application of hypothesis testing. This perspective is nothing profound: samples from the null hypothesis can be viewed as in-distribution, and rejection can be viewed as detecting anomalies. Despite being rather straightforward, this view has not been carefully investigated in large-scale anomaly detection tasks, because most classical hypothesis testing methods suffer from the curse of dimensionality. In this work, we address the problem incurred by high dimensionality by focusing on time series data collected from unknown dynamical systems. We exploit the structure of dynamical systems and show that, although the time series data can be high-dimensional due to the long time horizon, the problem remains tractable. More specifically, the concentration that leads to statistical confidence comes not from independent variables but from martingales. We thus turn the high dimensionality caused by the long time horizon to our advantage. Furthermore, our analysis leads to a detection procedure in which anomalies in the system (e.g., errors and attacks) can be isolated from the rarity of the system input (e.g., control commands), which reduces misclassification rates. By combining the above analysis with autoencoder-based probabilistic models, we develop a new model termed DyAD (DYnamical system Anomaly Detection). We show that the theory-motivated DyAD model achieves state-of-the-art performance on public datasets including MSL (Mars Science Laboratory rover) (Hundman et al., 2018) and SMAP (Soil Moisture Active Passive satellite) (O'Neill et al., 2010). To further validate our findings, we then release a much larger dataset (roughly 50 times larger in terms of data points) and use it to benchmark several popular baselines.
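To make the hypothesis-testing view concrete, the following is a minimal one-dimensional sketch, not the DyAD model itself. It assumes that each sample has already been reduced to a scalar anomaly score and that a set of scores from normal (null-hypothesis) data is available. The names `empirical_p_value`, `is_anomaly`, and the significance level `alpha` are hypothetical and chosen only for illustration.

```python
import random
from bisect import bisect_left


def empirical_p_value(null_scores, score):
    """Estimate P(null score >= score) from a sample of null scores."""
    ordered = sorted(null_scores)
    # Number of null scores that are at least as large as the observed one.
    n_at_least = len(ordered) - bisect_left(ordered, score)
    # Add-one correction keeps the estimate strictly positive.
    return (n_at_least + 1) / (len(ordered) + 1)


def is_anomaly(null_scores, score, alpha=0.05):
    """Reject the null hypothesis (flag an anomaly) when the p-value is small."""
    return empirical_p_value(null_scores, score) <= alpha


# Illustrative null data: standard Gaussian scores (an assumption of this sketch).
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]

typical_flag = is_anomaly(null_scores, 0.1)  # a score well inside the null range
extreme_flag = is_anomaly(null_scores, 6.0)  # a score far in the upper tail
```

A test of this kind is straightforward for a scalar score; the difficulty discussed above arises when the observation is a long, high-dimensional time series rather than a single number.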
Our released dataset focuses on the battery safety problem in electric vehicles. In recent years, electric vehicle (EV) adoption rates have increased exponentially due to their environmental friendliness, improved cruising range, and reduced costs brought by onboard lithium batteries (Schmuch et al., 2018; Mauler et al., 2021). Yet, large-scale battery deployment can lead to unexpected fire incidents and product recalls (Deng et al., 2018). Hence, accurately evaluating the health status of EV batteries is crucial to the safety of drivers and passengers. To promote research in this field, we release a dataset collected from 301 electric vehicles, each recorded over a period of 3 months to 3 years. Only battery-related data collected at charging stations is released, to preserve anonymity. 50 of the 301 vehicles eventually suffered from battery

