TOWARDS GRAPH-LEVEL ANOMALY DETECTION VIA DEEP EVOLUTIONARY MAPPING

Abstract

Graph-level anomaly detection aims to identify anomalous individual graphs in a graph set. Owing to its significance in real-world applications, such as identifying rare molecules in chemistry and detecting potential fraud in online social networks, graph-level anomaly detection has received considerable attention. Unlike node- and edge-level anomaly detection, which identifies anomalies within a single graph, graph-level anomaly detection faces greater challenges because both intra- and inter-graph structural and attribute patterns must be taken into account to distinguish anomalies that exhibit deviating structures, rare attributes, or both. Although deep graph representation learning is effective in fusing high-level representations and capturing the characteristics of individual graphs, most existing works fall short in graph-level anomaly detection because of their limited capability to explore information across graphs, the imbalanced data distribution of anomalies, and the low interpretability of black-box graph neural networks (GNNs). To overcome these limitations, we propose a novel deep evolutionary graph mapping framework named GmapAD, which adaptively maps each graph into a new feature space based on its similarity to a set of representative nodes chosen from the graph set. By automatically adjusting the candidate nodes using a specially designed evolutionary algorithm, anomalies and normal graphs are mapped to separate areas in the new feature space where a clear boundary between them can be learned. The selected candidate nodes can therefore be regarded as a benchmark for explaining anomalies because anomalies are more dissimilar/similar to the benchmark than normal graphs.
Through extensive experiments on nine real-world datasets, we demonstrate that exploring both intra- and inter-graph structural and attribute information is critical to spotting anomalous graphs, and that our framework outperforms the state of the art on all datasets used in the experiments.¹

1. INTRODUCTION

Graph-level anomalies are abnormal or rare individual graphs in a graph set. These anomalies can be observed in various application fields, such as rare molecules and abnormal proteins in biochemistry, brain disorders in brain networks/graphs, and fraud in online social networks (Noble & Cook, 2003; Akoglu et al., 2015). Detecting this category of anomalies has shown great benefits in facilitating downstream anomaly handling, alleviating the detrimental impact of anomalies on society, and boosting real-world applications (e.g., health monitoring and drug discovery). However, graph-level anomaly detection differs significantly from node- and edge-level anomaly detection, which investigates an individual graph. Graph-level anomaly detection targets anomalous individuals among various graphs. Not only the unique spatial structure and the node/edge attributes associated with each graph, but also the cross-graph structural and attribute patterns should be critically analyzed to identify potential anomalies in the graph set (Ma et al., 2021).

Figure 1: An overview of the learning feature space for graph-level anomaly detection. Each graph's representation is extracted from its own nodes' representations generated by message-passing GNNs. Anomalies are then identified in the feature space. In a better feature space for anomaly detection (Feature Space II), anomalies (G_2) and normal graphs (G_1 and G_3) should be well separated.

Recent studies in deep graph representation have put great effort into encoding both complex graph structural information and attribute information into vectors and then conducting graph analysis within the representation space (Wu et al., 2020). Although many graph neural networks (GNNs) have been developed to learn expressive node representations via message-passing schemes (Kipf & Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017) and to read out the graph representation from the nodes of a single graph (Baek et al., 2021; Xu et al., 2019; Gallicchio & Micheli, 2020a), there remain significant challenges to directly applying existing GNNs to graph-level anomaly detection. (1) Most importantly, simply reading out a graph's representation from its own nodes cannot explicitly and fully capture inter-graph information. For example, the representations of G_1, G_2, and G_3 shown in Figure 1 only maintain their intra-graph information, while the rich cross-graph information is lost. This leads to unsatisfactory detection results and motivates us to design special readout functions that capture both intra- and inter-graph information. (2) In the feature space, anomalies are also expected to be located away from normal graphs so that a clear boundary between them can be effectively learned, as in Feature Space II in Figure 1. (3) Lastly, the graph representation or the readout function should be interpretable. Human-understandable insights into the detected anomalies are vital for anomaly handling in real applications, but GNNs have been criticized for their low interpretability (Yuan et al., 2022; Pang et al., 2021).

To address the above-mentioned challenges, in this work we propose a novel graph mapping technique to learn effective representations for graph-level anomaly detection. Unlike existing works that learn a graph's representation using its own nodes, our framework, Graph mapping Anomaly Detection (GmapAD), comprehensively explores both the complicated intra- and inter-graph structural and attribute information to map graphs into an interpretable latent space where anomalies and normal graphs are well separated.
Specifically, GmapAD achieves a high degree of discriminativeness between anomalies and normal graphs by considering all nodes in the graph set and mapping each graph into the designed representation space according to the similarity between the graph and the nodes. Moreover, we notice that applying a simple graph mapping is non-trivial because of the massive number of nodes in the graph set, and because some nodes might contain uninformative (or even misleading and defective) information for distinguishing anomalies, as validated in our experiment in Section 5.2. As a result, we further consider the informativeness of each node and propose a differential evolutionary algorithm to iteratively select the best-performing set of nodes for graph mapping. Eventually, anomalies and normal graphs are projected to different regions in the new feature space and can be distinguished effectively. For validation, we conduct extensive experiments on nine real-world graph datasets, comparing against state-of-the-art methods using four commonly used metrics, i.e., precision, recall, F1-score, and AUC. We also analyze the challenges and show the effectiveness of the GmapAD modules through additional ablation tests. The results demonstrate that our proposed framework is superior to existing works. In a nutshell, the main contributions of this work are as follows:

• To the best of our knowledge, this is the first graph-level anomaly detection framework that explores both intra- and inter-graph information to find clues about anomalous graphs. The structural and attribute information/patterns within single graphs and across graphs can be effectively captured by our proposed GmapAD, which is also extendable to work jointly with state-of-the-art graph neural networks.
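To make the mapping and node-selection idea described in this section more concrete, the following is a minimal sketch and not the authors' implementation (the linked repository is the authoritative source). It assumes graph and node embeddings have already been computed (e.g., by a GNN) and are given as NumPy arrays. The function names `map_graphs`, `fitness`, and `select_nodes`, the use of Euclidean distance as the (dis)similarity measure, the nearest-centroid fitness, and the thresholded real-valued encoding of node subsets are all illustrative assumptions.

```python
import numpy as np


def map_graphs(graph_embs, candidate_embs):
    """Map each graph to its vector of distances to the candidate nodes.

    graph_embs: (num_graphs, d); candidate_embs: (num_candidates, d).
    Returns an array of shape (num_graphs, num_candidates).
    Euclidean distance is an assumption made for this sketch.
    """
    diff = graph_embs[:, None, :] - candidate_embs[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def fitness(mask, graph_embs, node_embs, labels):
    """Hypothetical fitness: nearest-centroid accuracy in the mapped space.

    labels uses 0 for normal graphs and 1 for anomalies; both must occur.
    """
    if not mask.any():
        return 0.0  # an empty candidate set cannot map anything
    mapped = map_graphs(graph_embs, node_embs[mask])
    c_normal = mapped[labels == 0].mean(axis=0)
    c_anom = mapped[labels == 1].mean(axis=0)
    pred_anom = (np.linalg.norm(mapped - c_anom, axis=1)
                 < np.linalg.norm(mapped - c_normal, axis=1))
    return float((pred_anom == (labels == 1)).mean())


def select_nodes(graph_embs, node_embs, labels, pop_size=10,
                 generations=20, f=0.5, cr=0.7, seed=0):
    """DE/rand/1/bin over real vectors in [0, 1]; entries > 0.5 select a node."""
    rng = np.random.default_rng(seed)
    n = node_embs.shape[0]
    pop = rng.random((pop_size, n))
    scores = np.array([fitness(p > 0.5, graph_embs, node_embs, labels)
                       for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), 0.0, 1.0)
            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random(n) < cr
            cross[rng.integers(n)] = True
            trial = np.where(cross, mutant, pop[i])
            trial_score = fitness(trial > 0.5, graph_embs, node_embs, labels)
            # Greedy selection: keep the trial if it is at least as good.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = int(np.argmax(scores))
    return pop[best] > 0.5, float(scores[best])
```

Calling `select_nodes(graph_embs, node_embs, labels)` returns a boolean mask over the nodes and the fitness of that subset; `map_graphs(graph_embs, node_embs[mask])` then gives the mapped graph features, on which any standard classifier could be trained. In practice the fitness would more plausibly be measured on held-out data rather than on the training graphs used here.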



¹ The code is available at https://github.com/GmapAD/GmapAD

