DEEP WATERMARKS FOR ATTRIBUTING GENERATIVE MODELS

Abstract

Generative models can now create content that is indistinguishable from natural data. The open-source release of such models has raised concerns about their misuse for malicious purposes. One potential mitigation strategy is to attribute generative models via watermarking. Current watermarking methods exhibit a significant tradeoff between robust attribution accuracy and generation quality, and also lack principles for designing watermarks that improve this tradeoff. This paper investigates the use of latent semantic dimensions as watermarks, which allows us to analyze how design variables, including the choice of watermarking dimensions, the watermarking strength, and the capacity of the watermarks, affect the accuracy-quality tradeoff. Compared with the previous state of the art, our method requires minimal computation and is more applicable to large-scale models. We demonstrate the efficacy of our method on StyleGAN2 and a latent diffusion model.

1. INTRODUCTION

Generative models can now create synthetic content, such as images and audio, that is indistinguishable from natural data (Karras et al., 2020; Rombach et al., 2022; Ramesh et al., 2022; Hawthorne et al., 2022). This poses a serious threat when exploited with malicious intent, e.g., for disinformation (Breland, 2019) or malicious impersonation (Satter, 2019). Such potential threats delay the industrial adoption of generative models, as conservative model inventors hesitate to release their source code (Yu et al., 2020). For example, OpenAI initially refused to release the full source code of GPT-2 (Radford et al., 2019).1

One potential solution is model attribution (Yu et al., 2018; Kim et al., 2020; Yu et al., 2020), where a model distributor tweaks each user-end model so that it generates content carrying a model-specific watermark. In practice, we consider the scenario where the model distributor or a regulator maintains a database of user-specific keys, each corresponding to one user's downloaded model. When malicious use occurs, the regulator can identify the responsible user by attribution. We further assume that the distributed model is white-box, which renders a separate watermarking module appended on top of the generator ineffective, since a malicious user can simply remove such a module from the network. Instead, we propose a deep watermarking method that is free from this limitation: it embeds the watermark directly into the generative model itself.

Formally, let a set of n generative models be G := {g_i(·)}_{i=1}^{n}, where g_i(·): R^{d_z} → R^{d_x} maps an easy-to-sample distribution p_z to a watermarked data distribution p_{x,i} in the content space, and is parameterized by a binary-coded key φ_i ∈ Φ := {0, 1}^{d_φ}. Let f(·): R^{d_x} → Φ be a mapping that attributes content to its source model.
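The distributor-side bookkeeping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and function names (`KeyRegistry`, `sample_key`) and the key length `D_PHI` are hypothetical, and the registry simply stores each user's binary key φ_i and looks up a decoded key to attribute content to a user.

```python
import secrets

D_PHI = 32  # key length d_phi (illustrative choice)

def sample_key(d_phi: int = D_PHI):
    """Sample a binary-coded key phi uniformly from {0, 1}^{d_phi}."""
    return tuple(secrets.randbelow(2) for _ in range(d_phi))

class KeyRegistry:
    """Distributor/regulator-side database mapping user IDs to keys."""

    def __init__(self):
        self._keys = {}  # user_id -> key tuple

    def register(self, user_id: str):
        """Assign a fresh key to a user and record it."""
        key = sample_key()
        self._keys[user_id] = key
        return key

    def attribute(self, decoded_key):
        """Return the user whose registered key matches the decoded key,
        or None if no registered key matches."""
        for user_id, key in self._keys.items():
            if key == decoded_key:
                return user_id
        return None
```

In a deployment, `decoded_key` would come from the attribution mapping f applied to suspect content; here the lookup is exact matching, whereas a robust system would tolerate a few flipped bits.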
We consider four performance metrics of a watermarking mechanism. The attribution accuracy of g_i is defined as

A(g_i) = E_{z~p_z}[1(f(g_i(z)) = φ_i)].  (1)

The generation quality of g_i measures the difference between p_{x,i} and the data distribution used for learning G, e.g., via the Fréchet Inception Distance (FID) score (Heusel et al., 2017) for images.
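The expectation in Eq. (1) can be estimated by Monte Carlo sampling: draw z from p_z, generate content with g_i, decode a key with f, and count exact matches with φ_i. A minimal sketch, assuming `g_i`, `f`, and the latent sampler `sample_z` are supplied by the caller (all names here are illustrative):

```python
import numpy as np

def attribution_accuracy(g_i, f, phi_i, sample_z, n_samples=1000):
    """Monte Carlo estimate of Eq. (1):
    A(g_i) = E_{z ~ p_z}[1(f(g_i(z)) == phi_i)].

    g_i:      generator mapping a latent z to content x
    f:        attribution mapping from content to a decoded key
    phi_i:    the ground-truth binary key of g_i
    sample_z: callable drawing one latent sample z ~ p_z
    """
    hits = 0
    for _ in range(n_samples):
        z = sample_z()
        hits += int(np.array_equal(f(g_i(z)), phi_i))
    return hits / n_samples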



1 OpenAI initially withheld GPT-2 due to concerns over potential malicious use (Brockman et al., 2020). Similarly, the source code of DALL-E (Ramesh et al., 2021) and DALL-E 2 (Ramesh et al., 2022) has not been released for the same reason (Mishkin & Ahmad, 2022).

