DECENTRALIZED ATTRIBUTION OF GENERATIVE MODELS

Abstract

Growing applications of generative models have led to new threats such as malicious personation and digital copyright infringement. One solution to these threats is model attribution, i.e., the identification of the user-end model from which the content in question was generated. Existing studies showed the empirical feasibility of attribution through a centralized classifier trained on all user-end models. However, this approach is not scalable in reality, as the number of models keeps growing. Nor does it provide an attributability guarantee. To this end, this paper studies decentralized attribution, which relies on binary classifiers associated with each user-end model. Each binary classifier is parameterized by a user-specific key and distinguishes its associated model distribution from the authentic data distribution. We develop sufficient conditions on the keys that guarantee a lower bound on attributability. Our method is validated on the MNIST, CelebA, and FFHQ datasets. We also examine the trade-off between generation quality and robustness of attribution against adversarial post-processing.

1. INTRODUCTION

Figure 1: FFHQ dataset projected onto the space spanned by two keys φ_1 and φ_2. We develop sufficient conditions for model attribution: perturbing the authentic dataset along different keys with mutual angles larger than a data-dependent threshold guarantees attributability of the perturbed distributions. (a) A threshold of 90 degrees suffices for benchmark datasets (MNIST, CelebA, FFHQ). (b) Smaller angles may not guarantee attributability.

Recent advances in generative models (Goodfellow et al., 2014) have enabled the creation of synthetic content that is indistinguishable even to the naked eye (Pathak et al., 2016; Zhu et al., 2017; Zhang et al., 2017; Karras et al., 2017; Wang et al., 2018; Brock et al., 2018; Miyato et al., 2018; Choi et al., 2018; Karras et al., 2019a;b; Choi et al., 2019). Such successes have raised serious concerns about emerging threats posed by the applications of generative models (Kelly, 2019; Breland, 2019). This paper is concerned with two particular types of threat, namely malicious personation (Satter, 2019) and digital copyright infringement. In the former, the attacker uses generative models to create and disseminate inappropriate or illegal content; in the latter, the attacker steals the ownership of a copyrighted content (e.g., an art piece created with the assistance of a generative model) by making modifications to it. We study model attribution, a solution that may address both threats. Model attribution is defined as the identification of the user-end model from which the content in question was generated. Existing studies demonstrated the empirical feasibility of attribution through a centralized classifier trained on all existing user-end models (Yu et al., 2018). However, this approach is not scalable in reality, where the number of models keeps growing. Nor does it provide an attributability guarantee.
To this end, we propose in this paper a decentralized attribution scheme: instead of a centralized classifier, we use a set of binary linear classifiers, one associated with each user-end model. Each classifier is parameterized by a user-specific key and distinguishes its associated model distribution from the authentic data distribution. For correct attribution, we expect one-hot classification outcomes for generated content, and a zero vector for authentic data. To achieve correct attribution, we study sufficient conditions on the user-specific keys that guarantee a lower bound on attributability. The resulting conditions are used to develop an algorithm for computing the keys. Lastly, we assume that attackers can post-process generated content in an attempt to evade attribution, and study the trade-off between generation quality and robustness of attribution against such post-processing.
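To make the decentralized decision rule concrete, the following is a minimal sketch (not the paper's implementation): each user's binary linear classifier outputs sign(φ_i^T x), and a sample is attributed to model i only when classifier i alone fires. The keys and query points here are synthetic placeholders for illustration.

```python
import numpy as np

def attribute(x, keys):
    """Attribute a sample x via a set of binary linear classifiers.

    keys: (N, d) array of unit-norm user-specific keys phi_i.
    Returns index i if the classification outcome is one-hot, i.e.,
    sign(phi_i^T x) = 1 and sign(phi_j^T x) = -1 for all j != i;
    returns -1 (treated as authentic/unattributable) otherwise.
    """
    outputs = np.sign(keys @ x)           # N classifier outputs in {-1, 0, 1}
    fired = np.flatnonzero(outputs == 1)  # indices of classifiers that fired
    return int(fired[0]) if fired.size == 1 else -1

# Toy example with two orthogonal keys in R^2.
keys = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
print(attribute(np.array([2.0, -1.0]), keys))   # only key 0 fires -> 0
print(attribute(np.array([-1.0, -1.0]), keys))  # no key fires -> -1 (authentic)
```

Note the all-negative (zero-vector-like) outcome for authentic-looking data, matching the expectation stated above.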

Problem formulation

We assume that for a given dataset D ⊂ R^{d_x}, the registry generates user-specific keys Φ := {φ_1, φ_2, ...}, where φ_i ∈ R^{d_x} and ||φ_i|| = 1 (||·|| denotes the l_2 norm). A user-end generative model is denoted by G_φ(·; θ) : R^{d_z} → R^{d_x}, where z and x are the latent and output variables, respectively, and θ are the model parameters. When necessary, we will suppress θ and φ to reduce the notational burden. The dissemination of the user-end models is accompanied by a public service that tells whether a queried content belongs to G_φ (labeled 1) or not (labeled -1). We model the underlying binary linear classifier as f_φ(x) = sign(φ^T x). Note that linear models are necessary for the development of the sufficient conditions of attribution presented in this paper, although sufficient conditions for nonlinear classifiers are worth exploring in the future.

The following quantities are central to our investigation:

(1) Distinguishability of G_φ measures the accuracy of f_φ at classifying G_φ against D:

D(G_φ) := (1/2) E_{x ∼ P_{G_φ}, x_0 ∼ P_D} [ 1(f_φ(x) = 1) + 1(f_φ(x_0) = -1) ].   (1)

Here P_D is the authentic data distribution, and P_{G_φ} the user-end model distribution dependent on φ. G is (1 - δ)-distinguishable for some δ ∈ (0, 1] when D(G) ≥ 1 - δ.

(2) Attributability measures the averaged multi-class classification accuracy of each model distribution over the collection G := {G_{φ_1}, ..., G_{φ_N}}:

A(G) := (1/N) Σ_{i=1}^{N} E_{x ∼ P_{G_{φ_i}}} [ 1(φ_i^T x > 0 and φ_j^T x < 0, ∀ j ≠ i) ].   (2)

G is (1 - δ)-attributable when A(G) ≥ 1 - δ.

(3) Lastly, we denote by G(·; θ_0) (or, for short, G_0) the root model trained on D, and assume P_{G_0} = P_D. We will measure the (lack of) generation quality of G_φ by the FID score (Heusel et al., 2017) and the l_2 norm of the mean output perturbation

Δx(φ) := E_{z ∼ P_z} [ G_φ(z; θ) - G(z; θ_0) ],   (3)

where P_z is the latent distribution.
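Both metrics above are expectations of indicator functions, so they can be estimated by Monte Carlo over samples. The sketch below assumes synthetic Gaussian stand-ins for the authentic and model distributions; it is an illustration of the definitions, not the paper's evaluation code.

```python
import numpy as np

def distinguishability(phi, gen_samples, data_samples):
    """Monte-Carlo estimate of D(G_phi): the average of the rate at which
    generated samples are labeled 1 and authentic samples are labeled -1."""
    acc_gen = np.mean(gen_samples @ phi > 0)    # 1(f_phi(x) = 1)
    acc_data = np.mean(data_samples @ phi < 0)  # 1(f_phi(x_0) = -1)
    return 0.5 * (acc_gen + acc_data)

def attributability(keys, samples_per_model):
    """Monte-Carlo estimate of A(G): a sample from model i is correct only if
    phi_i^T x > 0 and phi_j^T x < 0 for every j != i."""
    accs = []
    for i, samples in enumerate(samples_per_model):
        scores = samples @ keys.T                       # (n, N) values phi_j^T x
        others_neg = np.all(np.delete(scores, i, axis=1) < 0, axis=1)
        accs.append(np.mean((scores[:, i] > 0) & others_neg))
    return np.mean(accs)

rng = np.random.default_rng(0)
d = 8
keys = np.eye(d)[:2]                           # two orthogonal unit keys (toy choice)
data = rng.normal(-2.0, 0.3, size=(500, d))    # "authentic" data, negative along all keys
gen0 = data.copy(); gen0[:, 0] += 4.0          # model 0: data perturbed along key 0
gen1 = data.copy(); gen1[:, 1] += 4.0          # model 1: data perturbed along key 1
print(distinguishability(keys[0], gen0, data))  # close to 1.0
print(attributability(keys, [gen0, gen1]))      # close to 1.0
```

Because the two perturbation directions are orthogonal and large relative to the data spread, each model is both distinguishable from the data and attributable within the collection, previewing the geometric conditions developed in the paper.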
This paper investigates the following question: what are the sufficient conditions on the keys such that the user-end generative models can achieve distinguishability individually and attributability collectively, while maintaining their generation quality?

Contributions We claim the following contributions:

1. We develop sufficient conditions on the keys for distinguishability and attributability, which connect these metrics with the geometry of the data distribution, the angles between keys, and the generation quality.

2. The sufficient conditions lead to simple design rules for the keys: keys should be (1) data-compliant, i.e., φ^T x < 0 for x ∼ P_D, and (2) orthogonal to each other. We validate these rules using DCGAN (Radford et al., 2015) and StyleGAN (Karras et al., 2019a) on benchmark datasets including MNIST (LeCun & Cortes, 2010), CelebA (Liu et al., 2015), and FFHQ (Karras et al., 2019a). See Fig. 1 for a visualization of the attributable distributions perturbed from the authentic FFHQ dataset.

3. We empirically test the trade-off between generation quality and robust attributability under random post-processes, including image blurring, cropping, noising, JPEG conversion, and a combination of all of these.
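The two design rules can be illustrated with a toy key-generation sketch: draw random directions, orthogonalize them by QR decomposition, and flip each key away from the data mean to encourage data compliance. This is a hypothetical construction for intuition only, not the key-computation algorithm developed in the paper.

```python
import numpy as np

def generate_keys(data, num_keys, seed=0):
    """Toy key generation following the two design rules:
    (1) data compliance: phi^T x < 0 for authentic samples x, and
    (2) mutual orthogonality between keys.
    Illustrative only: random directions orthonormalized via QR, then
    sign-flipped to point away from the data mean."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    Q, _ = np.linalg.qr(rng.normal(size=(d, num_keys)))  # orthonormal columns
    keys = Q.T                                           # (num_keys, d) rows
    mean = data.mean(axis=0)
    keys[keys @ mean > 0] *= -1.0   # flip keys that point toward the data mean
    return keys

rng = np.random.default_rng(1)
data = rng.normal(3.0, 0.1, size=(200, 16))  # toy "authentic" distribution
keys = generate_keys(data, num_keys=4)
# Orthogonality: the Gram matrix of the keys is the identity.
print(np.allclose(keys @ keys.T, np.eye(4), atol=1e-8))  # True
# Data compliance: fraction of authentic samples with phi^T x < 0 for each key.
print(np.mean(data @ keys.T < 0))
```

Sign-flipping alone does not guarantee compliance for every sample (a key nearly orthogonal to the data mean can leave some samples on the wrong side), which is why the paper ties compliance to the geometry of the data distribution rather than to its mean alone.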

