DEEP GRAPH-LEVEL ORTHOGONAL HYPERSPHERE COMPRESSION FOR ANOMALY DETECTION

Anonymous

Abstract

Graph-level anomaly detection aims to identify abnormal samples in a set of graphs in an unsupervised manner. It is non-trivial to find a reasonable decision boundary between normal and anomalous data without using any anomalous data in the training stage, especially for graph data. This paper first proposes a novel deep graph-level anomaly detection model that learns graph representations by maximizing the mutual information between substructure features and global structure features while exploring a hypersphere anomaly decision boundary. We implement an orthogonal projection layer to keep the training data distribution consistent with the decision hypersphere, thus avoiding erroneous evaluations. More importantly, we further propose projecting the normal data into the interval region between two co-centered hyperspheres, which makes the normal data distribution more compact and effectively overcomes the issue of outliers falling close to the center of the hypersphere. Numerical and visualization results on a few graph datasets demonstrate the effectiveness and superiority of our methods in comparison to many baselines and state-of-the-art methods.

1. INTRODUCTION

Anomaly detection is an essential task with various applications, such as detecting abnormal patterns or actions in credit-card fraud, medical diagnosis, and sudden natural disasters (Aggarwal, 2017). Usually, in anomaly detection, the training data contain only normal data and are used to train a model that can distinguish unusual patterns from normal ones. Anomaly detection on tabular data and images has been extensively studied recently (Ruff et al., 2018; Goyal et al., 2020; Chen et al., 2022; Liznerski et al., 2021; Sohn et al., 2021). In contrast, there is little work on graph data, even though graph anomaly detection is very useful in various problems, such as identifying abnormal communities in social networks or detecting unusual protein structures in biology experiments. Compared with other types of data, graph data is inherently complicated and rich in structural and relational information. The complexity of graph structure helps us learn graph-level representations with discriminative patterns in many supervised tasks (e.g., graph classification). For graph-level anomaly detection, however, the intricate graph structure brings many obstacles to this unsupervised learning problem. Graph anomaly detection usually comprises four families: anomalous edge (Ouyang et al., 2020; Xu et al., 2020), node (Zhu & Zhu, 2020; Bojchevski & Günnemann, 2018), sub-graph (Wang et al., 2018; Zheng et al., 2018), and graph-level detection (Zheng et al., 2019; Chalapathy et al., 2018). Herein, the target of graph-level algorithms is to explore a regular group pattern and distinguish the abnormal manifestations of the group. Group abnormal behaviors usually foreshadow some unusual events and thus play an important role in practical applications. In the past five years, few approaches have focused on graph-level anomaly detection because of the difficulty of representing graphs as feature vectors without using any label information.
Graph kernels can measure the similarity between graphs and, implicitly or non-strictly, treat the result as a representation. Based on this, graph anomaly detection is usually performed in two stages. In our experiments (see Section 4), we also find that one-class SVM with graph kernels sometimes yields unsatisfactory performance since graph kernels may not be effective enough to quantify the similarity between graphs. To the best of our knowledge, there is therefore considerable room for improvement in graph anomaly detection. Concerning end-to-end models, Ma et al. (2022) proposed a global and local knowledge distillation method for graph-level anomaly detection, which learns rich global and local normal pattern information by random joint distillation of graph and node representations. The method needs to train two GCNs jointly at a high time cost. Zhao & Akoglu (2021) combined the Deep SVDD objective function and graph isomorphism network to learn a hypersphere of normal samples. Qiu et al. (2022) also sought a hypersphere decision boundary and optimized the representations learned by k GNNs to be close to the reference GNN while maximizing the differences between the k GNNs, but did not consider the relationship between the graph-level representation and node features. Considering all approaches based on the hypersphere assumption in graph anomaly detection, we find that the practical decision region may be an ellipsoid instead of a standard hypersphere, which causes errors when the standard hypersphere evaluation is employed. In addition, our experiments confirm that anomalous data may appear in parts of the decision region that are not filled with normal data, especially near the center of the hypersphere.
In order to effectively explore a better representation without label information and obtain a more suitable decision boundary with high efficiency, we propose in this paper a one-class deep graph-level anomaly detection method and its improved version. The first proposed model, Deep Orthogonal Hypersphere Contraction (DOHSC), uses the mutual information between local feature maps and the global representation to learn a high-quality representation and simultaneously optimizes it to be distributed within a hypersphere. An orthogonal projection layer then renders the decision region more hyperspherical and compact to decrease evaluation errors.
To address the phenomenon of anomalous data falling close to the hyperspherical center, we propose an improved architecture for graph-level anomaly detection, Deep Orthogonal Bi-Hypersphere Compression (DO2HSC). From a cross-sectional point of view, DO2HSC limits the decision area (of normal data) to an interval enclosed by two co-centered hyperspheres and likewise learns the orthogonally projected representation. The frameworks of the two methods are shown in Figure 1. Furthermore, we define a new evaluation scheme for DO2HSC, and comprehensive experimental results verify the effectiveness of all proposed methods. In summary, the main contributions of our work are as follows.
• First, we present a new graph-level hypersphere contraction algorithm for anomaly detection tasks, which is jointly trained via a mutual information loss between local and global representations and a hypersphere decision loss.
• Second, we impose an orthogonal projection layer on the proposed model to bring the training data distribution close to the standard hypersphere, thus avoiding errors arising from inconsistencies between the assessment criteria and actual conditions.
• Finally, we propose an improved graph-level deep orthogonal bi-hypersphere compression model to further explore a decision region enclosed by two co-centered hyperspheres, which can effectively prevent anomalous data from falling close to the hyperspherical center and significantly surpasses the baselines in the experiments.
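As a rough illustration of the difference between a single hypersphere and a region enclosed by two co-centered hyperspheres, the sketch below scores already-embedded samples under both decision rules. The score definitions, function names, radii, and sample points are illustrative assumptions and are not taken from the formulation of DOHSC or DO2HSC.

```python
import math


def distance_to_center(z, center):
    """Euclidean distance between an embedding and the hypersphere center."""
    return math.sqrt(sum((zi - ci) ** 2 for zi, ci in zip(z, center)))


def hypersphere_score(z, center, radius):
    """Assumed single-hypersphere score: positive means outside the sphere."""
    return distance_to_center(z, center) - radius


def bi_hypersphere_score(z, center, r_min, r_max):
    """Assumed bi-hypersphere score: positive means outside [r_min, r_max]."""
    d = distance_to_center(z, center)
    return max(r_min - d, d - r_max)


center = [0.0, 0.0]
near_center = [0.1, 0.0]  # hypothetical anomaly lying close to the center
in_shell = [0.0, 0.8]     # hypothetical normal sample between the two radii
far_away = [3.0, 4.0]     # hypothetical anomaly far from the center

for name, z in [("near_center", near_center),
                ("in_shell", in_shell),
                ("far_away", far_away)]:
    single = hypersphere_score(z, center, radius=1.0)
    double = bi_hypersphere_score(z, center, r_min=0.5, r_max=1.0)
    print(f"{name}: single={single:+.2f}, bi={double:+.2f}")
```

Under these assumed scores, the point near the center is accepted by the single hypersphere (negative score) but rejected by the two-sphere interval (positive score), which is the failure case the bi-hypersphere region is meant to address.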

Figure 1: Architecture of the proposed models (right top: DOHSC; right bottom: DO2HSC).
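One way to picture the role of an orthogonal projection is the NumPy sketch below, which maps an elongated (ellipsoidal) cloud of embeddings to one whose feature directions are orthonormal. It is a minimal sketch under stated assumptions: the SVD-based construction, the function name `orthogonal_projection`, and the synthetic embeddings are hypothetical and may differ from the layer used in the proposed models.

```python
import numpy as np


def orthogonal_projection(Z, out_dim):
    """Return W such that the columns of Z @ W are orthonormal.

    Assumed construction: with the thin SVD Z = U diag(S) V^T, choosing
    W = V[:, :k] diag(1 / S[:k]) gives Z @ W = U[:, :k].
    """
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Vt[:out_dim].T has shape (d, k); dividing by S[:out_dim] scales columns.
    return Vt[:out_dim].T / S[:out_dim]


rng = np.random.default_rng(0)
# Hypothetical embeddings stretched along some axes (an ellipsoidal cloud).
Z = rng.normal(size=(200, 4)) * np.array([5.0, 1.0, 1.0, 0.2])

W = orthogonal_projection(Z, out_dim=3)
Z_proj = Z @ W

# The Gram matrix of the projected features is the identity, so no single
# direction dominates the distance to the center.
gram = Z_proj.T @ Z_proj
print(np.round(gram, 6))
```

After this projection every retained direction has the same scale, so a distance-based hypersphere criterion is no longer distorted by directions with much larger variance.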

