GENERATED GRAPH DETECTION

Abstract

Graph generative models have become increasingly effective at approximating data distributions and augmenting data. Although still largely confined to research sandboxes, they have raised public concerns about malicious misuse and the spread of misinformation, much as Deepfake visual and audio media have already done to society. It is never too early to regulate the prevalence of generated graphs. As a preventive response, we are the first to formulate the generated graph detection problem, which aims to distinguish generated graphs from real ones. We propose the first framework to systematically investigate a set of sophisticated models and their performance in four classification scenarios. Each scenario switches between seen and unseen datasets/generators during testing, moving closer to real-world settings and progressively challenging the classifiers. Extensive experiments show that all the models are qualified for generated graph detection, with specific models having advantages in specific scenarios. Given the validated generality of the classifiers and their robustness to unseen datasets/generators, we conclude that our solution can remain effective for a decent while in curbing generated graph misuse.

1. INTRODUCTION

Graph generative models aim to learn the distributions of real graphs and generate synthetic ones Xie et al. (2022); Liu et al. (2021); Wu et al. (2021b). Generated graphs have found applications in numerous domains, such as social networks Qiu et al. (2018), e-commerce Li et al. (2020), and chemoinformatics Kearnes et al. (2016). In particular, with the development of deep learning, graph generative models have witnessed significant advances over the past five years Stoyanovich et al. (2020); Liao et al. (2019); Kipf & Welling (2016); You et al. (2018a).

However, every coin has two sides: there is a concern that synthetic graphs can be misused. For example, molecular graphs are used to design new drugs Simonovsky & Komodakis (2018); You et al. (2018a). Generated graphs can be misused in this process, so it is important for a pharmaceutical company to vet the authenticity of molecular graphs. Synthetic graphs also make deep graph learning models more vulnerable to well-designed attacks. Existing graph-level backdoor attacks Xi et al. (2021) and membership inference attacks Wu et al. (2021a) require the attackers to train their local models on data with the same or a similar distribution as that of the target models. Adversarial graph generation enables attackers to generate graphs that are close to real graphs, which helps the attackers build better attack models locally and hence keep those attacks stealthier (since the attackers can minimize interaction with the target models). This advantage also applies to the latest graph attacks such as the property inference attack Zhang et al. (2022) and the GNN model stealing attack Shen et al. (2022). As a result, it is essential to regulate the prevalence of generated graphs. In this paper, we proactively target the generated graph detection problem, i.e., we study whether generated graphs can be differentiated from real graphs with machine learning classifiers. To detect generated graphs, we train graph neural network (GNN)-based classifiers, which have shown their effectiveness in encoding and classifying graphs Zhang et al. (2020); Kipf & Welling (2017); Hamilton et al. (2017). Figure 2 illustrates the general pipeline of generated graph detection. To evaluate the classifiers' accuracy and generalizability, we test graphs from varying datasets and/or varying generators that are progressively extended towards settings unseen during training.

The seen setting for a dataset or generator means that the graphs used in the training and testing stages come from the same dataset or are produced by the same generator, respectively; that is, they share the same or a similar distribution. The unseen setting is the opposite.
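The paper does not specify its GNN architectures here; as a minimal, runnable sketch of the general detection pipeline (encode a graph with message passing, pool node embeddings into a graph-level representation, then score real vs. generated), with toy random weights standing in for trained parameters:

```python
import numpy as np

def gnn_score(adj, feats, w_msg, w_out):
    """One round of mean-neighbour message passing, mean pooling over
    nodes, then a linear read-out producing a real-vs-generated logit.
    The weights are illustrative stand-ins, not trained GNN parameters."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)   # node degrees
    h = np.tanh(((adj @ feats) / deg) @ w_msg)         # aggregate neighbours
    pooled = h.mean(axis=0)                            # graph-level embedding
    return float(pooled @ w_out)                       # logit: >0 => "generated"

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy 3-node graph
feats = rng.normal(size=(3, 4))   # node features
w_msg = rng.normal(size=(4, 8))   # message-passing weights
w_out = rng.normal(size=8)        # read-out weights
logit = gnn_score(adj, feats, w_msg, w_out)
```

In a real instantiation the weights would be learned by minimizing a binary cross-entropy loss over labelled real and generated graphs, typically with several message-passing rounds rather than one.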
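The seen/unseen distinction above determines how train/test splits are constructed in each of the four scenarios. A small sketch of that split logic, using hypothetical dataset/generator names and placeholder graph identifiers (none of these names are from the paper):

```python
# Hypothetical catalogue of generated graphs, keyed by (dataset, generator).
corpus = {
    ("molecules", "GraphRNN"): ["g1", "g2"],
    ("molecules", "VGAE"):     ["g3", "g4"],
    ("citations", "GraphRNN"): ["g5", "g6"],
    ("citations", "VGAE"):     ["g7", "g8"],
}

def make_scenario(train_keys, test_keys):
    """Collect training and testing graphs for one scenario."""
    train = [g for k in train_keys for g in corpus[k]]
    test = [g for k in test_keys for g in corpus[k]]
    return train, test

# Easiest scenario: seen dataset, seen generator -- train and test share both.
s1_train, s1_test = make_scenario(
    [("molecules", "GraphRNN")], [("molecules", "GraphRNN")])

# Hardest scenario: unseen dataset and unseen generator -- the test set
# draws from a (dataset, generator) pair never shown during training.
s4_train, s4_test = make_scenario(
    [("molecules", "GraphRNN")], [("citations", "VGAE")])
```

The two intermediate scenarios (seen dataset with unseen generator, and vice versa) follow by varying only one of the two keys between training and testing.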

