TIME SERIES SUBSEQUENCE ANOMALY DETECTION VIA GRAPH NEURAL NETWORKS

Abstract

Time series subsequence anomaly detection is an important task in a large variety of real-world applications, ranging from health monitoring to AIOps, and is challenging due to complicated underlying temporal dynamics and unpredictable anomalous patterns. First, how to effectively learn the temporal dependency in time series remains a challenge. Second, diverse and complicated anomalous subsequences, together with the lack of labels, make accurate detection difficult. For example, the popular subsequence anomaly detection algorithm, time series discord, fails to handle recurring anomalies. Third, many existing algorithms require a proper subsequence length for effective detection, which is difficult or even impossible to obtain in practice. In this paper, we present a novel approach to subsequence anomaly detection that combines practical heuristics of time series discords and temporal relationships with deep neural networks. By performing length selection based on multi-scale information and incorporating prior knowledge using graph neural networks, our method can adaptively learn the appropriate subsequence length as well as integrated representations, from both priors and raw data, that are favorable to anomaly detection. In particular, our graph incorporates both semantic and temporal relationships between subsequences. The experimental results demonstrate the effectiveness of the proposed algorithm, which achieves superior performance on multiple time series anomaly benchmarks in comparison with state-of-the-art algorithms. Code and datasets are available online¹.

1. INTRODUCTION

Detecting anomalies in time series data has a large variety of practical applications, such as tracing patients' bio-signals for disease detection (Chauhan & Vig, 2015), monitoring operational data of cloud infrastructure to locate malfunctions (Zhang et al., 2021), and finding risks in IoT sensing time series (Cook et al., 2019). It has received a great amount of research interest (Keogh et al., 2005; Yankov et al., 2007; Boniol & Palpanas; Shen et al., 2020; Lu et al., 2022). The time series anomaly detection (TSAD) problem is commonly formulated as locating anomalies at each point of the time series (namely point-wise TSAD). However, this formulation fails to consider the temporal relationships among anomalous points: in many real-world scenarios, anomalies do not occur point by point but persist consecutively over a time interval. For instance, some demand patterns in a power system change during holidays. Figure 1 shows a comparison of point-wise anomalies and subsequence anomalies. In this paper, we investigate TSAD from a subsequence perspective by identifying anomalous patterns in a time interval, which is called time series subsequence anomaly detection. Generally speaking, a subsequence anomaly is a sequence of observations that deviates considerably from some concept of normality. This somewhat "vague" definition itself hints at the challenges of the subsequence anomaly detection problem. Moreover, a distinguishing feature of time series is temporal dependency; thus, how to learn and utilize the temporal dependency of different time series data is a key challenge in time series anomaly detection. Another key challenge in time series subsequence anomaly detection is how to determine the appropriate subsequence length, as illustrated in Figure 2. This problem becomes worse when one series contains multiple abnormal subsequences with different lengths. In Figure 2, an anomalous subsequence lies inside the dark grey zone.
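To make the window-length issue concrete, the following is a minimal sketch (not the paper's implementation; the helper name, candidate lengths, and toy series are illustrative choices) of extracting subsequences at several candidate scales, which is the starting point for any subsequence-level detector:

```python
import numpy as np

def sliding_windows(series: np.ndarray, length: int, stride: int = 1) -> np.ndarray:
    """Extract all subsequences of a given length from a 1-D series."""
    n = len(series) - length + 1
    return np.stack([series[i:i + length] for i in range(0, n, stride)])

# Candidate lengths span several scales; an anomaly that blends in at one
# scale may stand out once enough of its context is included at a larger one.
series = np.sin(np.linspace(0, 20 * np.pi, 1000))
for length in (8, 16, 32):
    windows = sliding_windows(series, length)
    print(length, windows.shape)
```

A detector that fixes a single `length` up front must guess this trade-off in advance, which is exactly the difficulty the multi-scale length selection discussed later is meant to remove.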
If we directly detect anomalies using this length, the anomaly might not be found, as it is very similar to normal subsequences, e.g., the green zone. Instead, it is better to select a longer window (marked in light grey) that includes the anomaly together with its context, so that the anomalous pattern is highlighted.

Early research on anomaly detection mainly relies on shallow representations (Breunig et al., 2000; Schölkopf et al., 2001; Tax & Duin, 2004). Later, Deep-SVDD (Ruff et al., 2018) enhanced the representation learning capability using neural networks. Recently, TSAD methods based on Deep-SVDD (Carmona et al., 2022; Shen et al., 2020) have become prevalent due to their excellent performance. They introduce neural architectures suitable for modeling time series and detect anomalies by computing the distance between a target and its corresponding reference in latent representation space, where the reference represents normal patterns. The main issue is that these deep learning-based methods have difficulty enforcing assumptions about anomalies and typically require large training datasets to learn accurate models. In contrast, time series discord (Keogh et al., 2005; Yeh et al., 2016; Nakamura et al., 2020; Lu et al., 2022) is another category of distance-based TSAD methods. Discords are subsequences that are maximally different from all others in the time series, where the difference is measured by z-normalized Euclidean distance. The most appealing merit of discords is that anomalies can be discovered by merely examining the test data, without a training phase. In spite (or perhaps because) of their extremely simple assumptions, discords are competitive with deep learning methods. However, several important limitations prevent their broader application. First, they fail to detect anomalous patterns that recur at least twice, because each occurrence becomes the others' nearest neighbor. Second, they rely on an explicit distance measure (z-normalized Euclidean distance), which cannot flexibly account for diversified anomalies, as some anomalous patterns may be subtle in the data space. Details about the Deep-SVDD and discord algorithms are provided in Appendix B.

Moreover, most existing methods use a predefined window length to detect anomalies, which is difficult or even impossible to tune in practice. Here we emphasize the importance of an appropriate window size, which highlights the normal or anomalous pattern of a subsequence. On the one hand, the duration of anomalies varies: if we use a large window to detect spikes and dips, the anomalies might be "hidden" in the normal data, whereas for a long-term anomaly, a short window cannot depict the full picture. On the other hand, even given a prior anomaly length, it is still necessary to intelligently infer a suitable length according to the data characteristics. To the best of our knowledge, none of the existing algorithms can intelligently detect anomalous subsequences with different lengths and characteristics. An extensive literature review of related work is provided in Appendix A. In this paper, we aim to resolve the aforementioned challenges in TSAD by fusing practical heuristics of time series discords with deep neural networks, and propose GraphSAD, a graph neural network-based subsequence anomaly detection approach. Specifically, we construct graphs in which nodes represent subsequences and edges encode the relationships between the corresponding subsequences. A unique feature of our graph is that we consider both pair-wise semantic and temporal relationships between subsequences; as a result, the temporal dependency of the time series is incorporated into the graph and utilized in detection. Besides, in order to learn the subsequence length intelligently, we introduce a multi-scale feature encoder that generates representations at multiple subsequence lengths, together with a length selection strategy that selects the proper length. The proposed algorithm GraphSAD can intelligently detect different anomalous subsequences, which greatly im-

¹ https://anonymous.4open.science/r/GraphSAD-B082

Figure 1: Point-wise Anomalies (Top) versus Subsequence Anomalies (Bottom). The top is a website traffic time series with anomalies labeled by red dots that might be caused by cyberattacks. The bottom is an insect's activity signal recorded with an EPG apparatus, where the time intervals marked in grey are subsequences exhibiting different anomalous characteristics, including period length variation, spike, and temporal morphological change.
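As a concrete illustration of the discord notion described above, here is a minimal brute-force sketch (not the paper's or any cited library's implementation; the window length `m`, spike position, and exclusion rule are illustrative choices). Each subsequence is z-normalized, and its score is the distance to its nearest non-overlapping neighbor; the discord is the subsequence with the largest score:

```python
import numpy as np

def znorm(x: np.ndarray) -> np.ndarray:
    """Z-normalize a subsequence; constant subsequences map to zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 1e-8 else np.zeros_like(x)

def discord_scores(series: np.ndarray, m: int) -> np.ndarray:
    """Nearest-neighbor distance of each length-m subsequence to all others,
    excluding trivial (overlapping) matches; the discord is the argmax."""
    subs = np.stack([znorm(series[i:i + m]) for i in range(len(series) - m + 1)])
    n = len(subs)
    scores = np.empty(n)
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        # exclude self and overlapping neighbors (trivial matches)
        lo, hi = max(0, i - m + 1), min(n, i + m)
        dists[lo:hi] = np.inf
        scores[i] = dists.min()
    return scores

# A single spike anomaly: subsequences containing it are far from everything else.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.05 * rng.standard_normal(1000)
series[500] += 4.0
scores = discord_scores(series, m=32)
print(int(scores.argmax()))
```

Note that if the spike above were duplicated elsewhere in the series (far enough apart not to be trivial matches), each occurrence would become the other's nearest neighbor and both scores would collapse, which is exactly the recurring-anomaly failure mode discussed above.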
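The graph construction described in the introduction (nodes as subsequences, with both semantic and temporal edges) can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' construction: semantic edges here are plain Euclidean k-nearest neighbors and temporal edges simply link consecutive subsequences.

```python
import numpy as np

def build_subsequence_graph(subs: np.ndarray, k: int = 3) -> np.ndarray:
    """Adjacency matrix over subsequences with two edge types:
    temporal edges connect consecutive subsequences in time, and
    semantic edges connect each node to its k most similar subsequences."""
    n = len(subs)
    adj = np.zeros((n, n), dtype=int)
    # temporal edges: consecutive subsequences
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1
    # semantic edges: k nearest neighbors under Euclidean distance
    dists = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    for i in range(n):
        for j in np.argsort(dists[i])[:k]:
            adj[i, j] = adj[j, i] = 1
    return adj

# Toy example: 6 subsequences of length 4 (hypothetical values); nodes 2 and 5
# share a recurring pattern, so a semantic edge links them even though they
# are far apart in time.
subs = np.array([[0, 1, 0, -1], [0, 1, 0, -1], [5, 5, 5, 5],
                 [0, 1, 0, -1], [0, 1, 0, -1], [5, 5, 5, 5]], dtype=float)
adj = build_subsequence_graph(subs, k=1)
print(adj)
```

The semantic edge between the two recurring patterns is precisely what lets a graph-based detector relate occurrences that a discord would treat as mutual nearest neighbors, while the temporal edges preserve the ordering information of the series.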

