TOWARDS THE DETECTION OF DIFFUSION MODEL DEEPFAKES

Anonymous authors
Paper under double-blind review

Abstract

Diffusion models (DMs) have recently emerged as a promising method for image synthesis. They have surpassed generative adversarial networks (GANs) in both diversity and quality, and have achieved impressive results in text-to-image and image-to-image modeling. However, to date, little attention has been paid to the detection of DM-generated images, which is critical to prevent adverse impacts on our society. Although prior work has shown that GAN-generated images can be reliably detected using automated methods, it is unclear whether the same methods are effective against DMs. In this work, we address this challenge and take a first look at detecting DM-generated images. We approach the problem from two different angles: First, we evaluate the performance of state-of-the-art detectors on a variety of DMs. Second, we analyze DM-generated images in the frequency domain and study different factors that influence the spectral properties of these images. Most importantly, we demonstrate that GANs and DMs produce images with different characteristics, which requires adaptation of existing classifiers to ensure reliable detection. We believe this work provides the foundation and starting point for further research to detect DM deepfakes effectively.

1. INTRODUCTION

In the recent past, diffusion models (DMs) have shown great promise as a method for synthesizing images. Such models match or surpass the performance of generative adversarial networks (GANs) and power text-to-image systems such as DALL-E 2 (Ramesh et al., 2022), Imagen (Saharia et al., 2022), and Stable Diffusion (Rombach et al., 2022). Advances in image synthesis have resulted in generated images of such high quality that humans can hardly tell whether a given picture is real or artificially generated (a so-called deepfake) (Nightingale & Farid, 2022). This progress has many practical implications and poses a danger to our digital society: deepfakes can be used for disinformation campaigns, as such images appear particularly credible due to their sensory comprehensibility. Disinformation aims to discredit opponents in public perception, to create sentiment for or against certain social groups, and thus to influence public opinion. In effect, deepfakes erode trust in institutions and individuals, lend support to conspiracy theories, and foster political polarization. Despite the importance of this topic, research on effective deepfake detection remains limited. Previous work on the detection of GAN-generated images (e.g., Wang et al. (2020), Gragnaniello et al. (2021), and Mandelli et al. (2022a)) showed promising results, but it remains unclear whether any of these methods can be applied to DM-generated images. In this paper, we present a first look at detection methods for DM-generated media. We tackle the problem from two different angles. On the one hand, we investigate whether DM-generated images can be effectively detected by existing methods that claim to be universal. We study ten models in total: five GANs and five DMs.
We find that existing detection methods suffer from severe performance degradation when applied to DM-generated images, with the area under the receiver operating characteristic curve (AUROC) dropping by 15.2% on average compared to GANs. These results hint at a structural difference between images generated by GANs and DMs. We show that existing detection methods can be improved by fine-tuning, which makes detection almost perfect. However, our results also suggest that recognizing DM-generated images is a more difficult task than recognizing GAN-generated images. On the other hand, we analyze DM-generated images in the frequency domain and compare them to GAN-generated images. Although DMs do not exhibit strong frequency artifacts compared to GANs, their spectrum deviates from that of real images. We hypothesize that discrepancies in spectral properties are a possible reason for the differences we identify. Therefore, we analyze the spectral properties of DM- and GAN-generated images in detail and find that high frequencies are systematically mismatched. Further analysis suggests that too little weight is given to these frequencies during training due to the choice of the training objective. We believe that our results provide the foundation for further research on the effective detection of deepfakes generated by DMs.
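The spectral comparison sketched above is commonly performed via an azimuthally (radially) averaged power spectrum. The following is a minimal illustration of this kind of analysis, not the exact pipeline used in our experiments; the function name and bin count are our own choices:

```python
import numpy as np

def radial_power_spectrum(img: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a grayscale image.

    A common tool for comparing real and generated images: spectra of
    generated images tend to deviate from real ones at high frequencies.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2)
    # Group pixels into concentric rings and average the power in each ring.
    bins = np.linspace(0.0, r.max(), n_bins + 1)
    which = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    spectrum = np.zeros(n_bins)
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            spectrum[b] = power.ravel()[mask].mean()
    return np.log(spectrum + 1e-12)
```

Comparing the last few bins (highest frequencies) between sets of real and generated images then quantifies the high-frequency mismatch described in the text.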

2. RELATED WORK

Universal Fake Image Detection While in recent years a variety of successful methods to detect artificially generated images have been proposed (Verdoliva, 2020), generalization to unseen data remains a challenging task (Cozzolino et al., 2019). Constructing an effective classifier for a specific generator is considered straightforward, which is why more research effort is put into designing universal detectors (Xuan et al., 2019; Chai et al., 2020; Wang et al., 2020; Cozzolino et al., 2021; Gragnaniello et al., 2021; Girish et al., 2021; Mandelli et al., 2022a). This is especially important in the context of deceptive media, since new generative models emerge on a frequent basis and manually updating detectors is too slow to stop the propagation of harmful content.

Frequency-Based Deepfake Detection Zhang et al. (2019) were the first to demonstrate that the spectrum of GAN-generated images contains visible artifacts in the form of a periodic, grid-like pattern due to transposed convolution operations. These findings were later reproduced by Wang et al. (2020) and extended to the discrete cosine transform (DCT) by Frank et al. (2020). Another characteristic was discovered by Durall et al. (2020), who showed that GANs are unable to correctly reproduce the spectral distribution of the training data. In particular, generated images contain increased magnitudes at high frequencies. While several works attribute these spectral discrepancies to transposed convolutions (Zhang et al., 2019; Durall et al., 2020) or, more generally, up-sampling operations (Frank et al., 2020; Chandrasegaran et al., 2021), no consensus on their origin has yet been reached. Some works explain them by the spectral bias of convolution layers due to linear dependencies (Dzanic et al., 2020; Khayatkhoei & Elgammal, 2022), while others suggest the discriminator is not able to provide an accurate training signal (Chen et al., 2021; Schwarz et al., 2021).

Detection of DM-Generated Images Despite the massive attention from the scientific community and beyond, DMs have not yet been studied from the perspective of image forensics. A very specific use case is considered by Mandelli et al. (2022b), where the authors evaluate methods for detecting western blot images synthesized by different models, including DDPM (Ho et al., 2020). More related to our analysis is the work of Wolter et al. (2022), which proposes to detect generated images based on their wavelet-packet representation, combining features from pixel and frequency space. While the focus lies on GAN-generated images, they demonstrate that images generated by ADM (Dhariwal & Nichol, 2021) can be detected using their approach. They also report that the classifier "appears to focus on the highest frequency packet", which is consistent with our findings. Another interesting observation is made by Rissanen et al. (2022), who analyze the generative process of diffusion models in the frequency domain. They state that diffusion models have an inductive bias according to which, during the reverse process, higher frequencies are added to existing lower frequencies.

3. BACKGROUND ON DMS

DMs were first proposed by Sohl-Dickstein et al. (2015) and later advanced by Ho et al. (2020), who also pointed out the connections between DMs and score-based generative models (Song & Ermon, 2019; 2020; Song et al., 2022b). Since then, numerous modifications and improvements have been proposed, leading to higher perceptual quality (Nichol & Dhariwal, 2021; Dhariwal & Nichol, 2021; Choi et al., 2022; Rombach et al., 2022) and increased sampling speed (Song et al., 2022a). Both Kingma et al. (2021) and Song et al. (2022b) experiment with adding Fourier features to improve the learning of high-frequency content, the former reporting that this leads to much better likelihoods.
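As a brief illustration of the DDPM framework of Ho et al. (2020) referenced above, the forward (noising) process can be sampled in closed form. The following numpy sketch uses an illustrative linear schedule and is not tied to any particular implementation:

```python
import numpy as np

def linear_beta_schedule(T: int = 1000, beta_1: float = 1e-4, beta_T: float = 0.02) -> np.ndarray:
    """Linear variance schedule beta_1, ..., beta_T (Ho et al., 2020)."""
    return np.linspace(beta_1, beta_T, T)

def forward_diffuse(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng) -> tuple:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).

    Training then amounts to predicting the injected noise eps from x_t
    with a simple MSE loss ("L_simple" in Ho et al., 2020).
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)
```

By the final timestep, `alpha_bar` is close to zero, so x_T is approximately pure Gaussian noise; the learned reverse process undoes this corruption step by step.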

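The up-sampling artifacts discussed in Section 2 can be made visible in a few lines. As a toy demonstration (not an experiment from any cited work), nearest-neighbor up-sampling, standing in for the up-sampling steps inside a generator, replicates the low-resolution spectrum and suppresses energy near the new Nyquist frequency, producing a periodic spectral pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
low = rng.standard_normal((32, 32))        # low-resolution "feature map"

# Nearest-neighbor 2x up-sampling: each value is copied into a 2x2 block.
up = low.repeat(2, axis=0).repeat(2, axis=1)

# Centered magnitude spectrum of the up-sampled map.
spec = np.abs(np.fft.fftshift(np.fft.fft2(up)))

# Up-sampling scales the replicated spectrum by |2*cos(pi*f)| per axis, so
# magnitudes near the new Nyquist frequency (the border of the shifted
# spectrum) are strongly attenuated relative to low frequencies.
center = spec[24:40, 24:40].mean()  # low-frequency region
border = spec[:4, :].mean()         # highest vertical frequencies
```

In actual generators the pattern additionally depends on the learned filters, which is one reason different architectures leave different spectral fingerprints.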
