DREAM: DOMAIN-FREE REVERSE ENGINEERING ATTRIBUTES OF BLACK-BOX MODEL

Abstract

Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. However, these works share a crucial limitation: they assume the dataset used to train the target model is known beforehand, and leverage this dataset for the model attribute attack. In reality, it is difficult to access the training dataset of a target black-box model, so it is unclear whether the attributes of the target model can still be revealed in this case. In this paper, we investigate a new problem of Domain-free Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out-of-distribution (OOD) generalization problem. At the heart of our framework, we devise a multi-discriminator generative adversarial network (MDGAN) to learn domain invariant features. Based on these features, we can learn a domain-free model to inversely infer the attributes of a target black-box model with unknown training data. This makes our method applicable to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.

1. INTRODUCTION

With its commercialization, machine learning as a service (MLaaS) is becoming increasingly popular, and providers are paying more attention to the privacy of models and the protection of intellectual property. Generally speaking, a machine learning service deployed on a cloud platform is a black box: users can only obtain outputs by providing inputs to the model. The attributes of the model, such as its architecture, training set and training method, are concealed by the provider. However, is such a deployment safe? Once the attributes of the model are revealed, many downstream attacking tasks benefit, e.g., adversarial example generation (Moosavi-Dezfooli et al., 2016), model inversion (He et al., 2019), etc. Oh et al. (2018) conducted model reverse engineering to reveal model attributes, as shown in the left of Figure 1. They first collect a large set of white-box models trained on the same dataset as the target black-box model, e.g., the MNIST hand-written dataset (Lecun et al., 1998). Given a sequence of input queries, the outputs of the white-box models can be obtained. After that, a meta-classifier is trained to learn a mapping between model outputs and model attributes. For inference, outputs of the target black-box model are fed into the meta-classifier to predict model attributes. The promising results demonstrate the feasibility of model reverse engineering. However, a crucial limitation of (Oh et al., 2018) is the assumption that the dataset used to train the target model is known in advance and can be leveraged for meta-classifier learning. In most application cases, the training data of a target black-box model is unknown. When the domain of the training data of the target black-box model is inconsistent with that of the constructed white-box models, the meta-classifier is usually unable to generalize well to the target black-box model.
To verify this point, we train three black-box models with the same architecture on three different datasets, Photo, Cartoon and Sketch (Li et al., 2017), respectively. We use the method in (Oh et al., 2018) to train a meta-classifier on white-box models trained on the Cartoon dataset. After that, we use the trained meta-classifier to infer attributes of the three black-box models, respectively. As shown in Figure 2, when the training datasets of the black-box and white-box models are the same (i.e., Cartoon), the performance reaches about 80%; otherwise, it sharply drops to about 40%, close to random guess. This huge gap shows that it is non-trivial to investigate model reverse engineering when the training dataset of the black-box model is not available. Furthermore, if the training set of the black-box model changes, (Oh et al., 2018) needs to retrain the whole set of white-box models to obtain a promising result, which is extremely time-consuming. In light of this, we cast this problem as an out-of-distribution (OOD) generalization problem, and propose a novel framework DREAM: Domain-free Reverse Engineering the Attributes of a black-box Model. In the field of computer vision, OOD generalization learning has been widely studied in recent years (Shen et al., 2021); its main goal is to learn a model on data of one or multiple domains that generalizes well to data of another domain unseen during training.

Figure 1: Previous work (left) assumes the dataset used to train the target black-box model is given beforehand, and requires the same dataset to train white-box models. Our DREAM framework (right) relaxes this condition so that the training data of the black-box model is no longer required to be available, and proposes a domain-free method to infer attributes of a black-box model. Our idea is to cast the problem as an out-of-distribution learning problem, and to design a GAN (Goodfellow et al., 2014) based network (MDGAN) to learn domain invariant features for black-box model attribute inference.
One kind of mainstream OOD learning approach is to extract domain invariant features from data of multiple different domains, and utilize these features for downstream tasks (Li et al., 2018; Kim et al., 2021; Zhou et al., 2021b). These methods mainly focus on image or video data and have shown powerful performance. Back to our problem: the black-box models deployed on cloud platforms expose their functionality and the categories they can output. Therefore, we can collect data with the same labels but different distributions as domains to train white-box models and obtain their probability outputs. However, since the data we concentrate on is related to the outputs of machine learning models, e.g., probability values, how to design an effective OOD learning method over this type of data has not been explored. To this end, we design a multi-discriminator generative adversarial network (MDGAN) to learn domain invariant features from the outputs of white-box models trained on multi-domain data. Based on the learnt domain invariant features, we learn a domain-free reverse model which can well infer the attributes of a target black-box model trained using data of an arbitrary domain. Our contributions are summarized as follows: 1) We provide the first study on the problem of domain-free reverse engineering the attributes of black-box models, and cast it as an out-of-distribution (OOD) generalization problem; 2) We propose a generalized framework, DREAM, which can address the problem of inferring the attributes of a black-box model with an arbitrary training domain; 3) We make the first attempt to explore learning domain invariant features from probability representations, in contrast to traditional image representations; 4) We perform extensive experiments and analyze the results, demonstrating the effectiveness of our method.

2. RELATED WORK

Reverse Engineering of Model Attribute. Its goal is to reveal the attribute values of a target model, such as model structure, optimization method, hyperparameters, etc. Current research efforts focus on two aspects: hardware (Yan et al., 2020; Hua et al., 2018) and software (Oh et al., 2018; Wang & Gong, 2019). The hardware-based methods utilize information leaked from side channels (Hua et al., 2018; Yan et al., 2020) or unencrypted PCIe buses (Zhu et al., 2021) to invert the structure of deep neural networks. Software-based methods reveal model attributes via machine learning. (Wang & Gong, 2019) steals the trade-off weight between the loss function and the regularization term; they derive over-determined linear equations and solve for the hyperparameters with the least-squares method. KENNEN (Oh et al., 2018) prepares a set of white-box models, and then trains a meta-classifier to build a mapping between model outputs and their attributes. It is the work most related to ours. However, a significant difference is that Oh et al. (2018) requires the data used to train the target black-box model to be given beforehand, while our method relaxes this condition, i.e., we no longer require the training data of the target model to be available. Thus, we attempt to solve a more practical problem.

Model Functionality Extraction. It aims to train a clone model that has functionality similar to that of the target model. To achieve this goal, many works have been proposed in recent years (Orekondy et al., 2019; Truong et al., 2021; Papernot et al., 2017). (Orekondy et al., 2019) uses an alternative dataset collected from the Internet to query the target model. (Papernot et al., 2017) assumes part of the dataset is known, and presents a dataset augmentation method to construct the dataset for querying the target model.
Moreover, data-free extraction methods (Kariyappa et al., 2021; Truong et al., 2021) query a target model with data produced by a generator, and use zeroth-order gradient approximation to estimate the gradient of the target model. Different from the methods mentioned above, our goal is to infer the attributes of a black-box model, rather than to steal the model functionality.

Membership Inference. Its goal is to determine whether a sample belongs to the training set of a model (He et al., 2020; Choquette-Choo et al., 2021; Rezaei & Liu, 2021). Although inferring model attributes is a different task, the technique in Oh et al. (2018) is actually similar to those of membership inference attacks. However, as stated above, when the domain of the training data of the target black-box model is inconsistent with that of the white-box models, the method is usually unable to generalize well because of the OOD problem.

OOD Generalization. The goal of OOD generalization is to deal with the inevitable shifts from a training distribution to an unknown testing distribution (Shen et al., 2021). Existing methods mainly fall into three categories: domain generalization (Kim et al., 2021; Li et al., 2018; Zhou et al., 2021b; a; Hu et al., 2020), causal learning (Arjovsky et al., 2019; Creager et al., 2021; Krueger et al., 2021; Mahajan et al., 2021) and stable learning (Shen et al., 2020; Kuang et al., 2020; Zhang et al., 2021; Kuang et al., 2018). Domain generalization attempts to learn invariant representations among different domains. The work closest to ours is ADA (Ganin et al., 2016), which uses an adversarial strategy between a feature extractor and a discriminator to learn domain invariant features. ADA is designed for the domain adaptation task (only two domains).
However, we aim to solve a domain generalization problem that handles more than two domains; a single discriminator cannot learn domain invariant features across multiple domains. Causal learning and stable learning aim to find causal features for ground-truth labels in the data and to filter out label-unrelated features. The former keeps existing causal features invariant, while the latter focuses on the effective features strongly related to labels via reweighting. The above methods mainly focus on images or videos; how to design an effective OOD learning method for attribute inference of black-box models has not been explored so far.

3. METHOD

3.1. PROBLEM FORMULATION

As aforementioned, there is a strict constraint in (Oh et al., 2018): they assume the training dataset D of the target model is given in advance, and leverage D for learning the meta-classifier Φ. In most scenarios, especially on public machine learning platforms, it is difficult to access the training data of a target black-box model, which significantly limits the applications of (Oh et al., 2018). To mitigate this problem, we provide a new problem setting by relaxing the above constraint, i.e., we no longer require the training data D of the target black-box model to be available. Thus, our goal is to learn a domain-free reverse classifier Φ that is trained on outputs of white-box models F and predicts well for the target black-box model, even if the white-box and black-box models are built on training data of different domains.

3.2. DREAM FRAMEWORK

To perform domain-free black-box model attribute reverse engineering, we cast this problem as an out-of-distribution (OOD) generalization learning problem, and propose a novel framework, DREAM, as shown in Figure 3. Our DREAM framework consists of two parts. In the left part of Figure 3, we train a number of white-box models with training sets from different domains. The models of each domain are enumerated with various model attributes, and all of these models constitute a model set covering different domains (please refer to Sect. 4.1 for more details). Next, we prepare queries as input to these models: for each domain, we sample an equal number of images from the corresponding dataset and concatenate them as a batch of queries. These queries are sent to each model, and the outputs of the model are fed into the other module of our DREAM framework, shown in the right part of Figure 3. The core idea is to design a multi-discriminator generative adversarial network (MDGAN) to learn domain invariant features, where MDGAN consists of multiple discriminators corresponding to different domains and one generator shared across domains. The generator aims to learn domain invariant features, and each discriminator drives the learnt feature distributions of the other domains to fit that of its own domain. In this way, the generator is capable of learning domain invariant features. Based on the learnt domain invariant features, we can learn a domain-free reverse model to infer the attributes of a black-box model from an arbitrary domain.

3.3. MULTI-DOMAIN OUTPUT PREPARATION

The multi-domain output can be taken as a representation of a white-box model, and is fed into MDGAN to learn domain invariant features. Specifically, we sample an equal number of images from the dataset of each domain to obtain a query set $Q = \{q_j\}_{j=1}^{N}$, where $N$ is the number of queries. We denote the training model set from each domain as $F = [f_1, f_2, ..., f_m]$, where $f_i$ consists of $K$ models of the $i$-th domain. Then, we input each query $q_j \in Q$ into the models $f_i$ of the $i$-th domain to get an output $O_j^i \in \mathbb{R}^{K \times C}$, where $O_j^i$ represents the $K$ outputs of the $i$-th domain for one query. We obtain $O^i \in \mathbb{R}^{K \times CN}$ by concatenating the $N$ outputs. Finally, we derive the multi-domain outputs as $O = [O^1, ..., O^m] \in \mathbb{R}^{m \times K \times CN}$. The core idea of MDGAN is to learn embeddings for each domain with a parameter-sharing generator, and to make the distributions of different domains as close as possible with multiple discriminators.
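The construction of the multi-domain output tensor $O$ can be sketched numerically. In this hedged example, the white-box models are replaced by a hypothetical stub that returns a random softmax vector (the real models are trained CNNs), and small sizes stand in for the paper's $N = 100$ queries and $C = 7$ PACS classes:

```python
import numpy as np

# Toy sizes: m domains, K models per domain, N queries, C classes.
# `model_output` is a hypothetical stand-in for querying a trained white-box model.
rng = np.random.default_rng(0)
m, K, N, C = 3, 4, 5, 7

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def model_output(domain, model, query):
    """Stand-in for the C-dim probability output of model `model` in `domain`."""
    return softmax(rng.standard_normal(C))

O = np.empty((m, K, C * N))
for i in range(m):           # domain index
    for k in range(K):       # model index within domain i
        # concatenate the N per-query probability vectors into one CN-dim row
        O[i, k] = np.concatenate([model_output(i, k, j) for j in range(N)])

print(O.shape)  # (m, K, C*N)
```

Each row of $O^i$ is thus a concatenation of $N$ probability vectors, so every consecutive $C$-slice sums to one.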

3.4. MULTI-DISCRIMINATOR GAN (MDGAN)

After preparing the multi-domain outputs, we devise a GAN-based network, MDGAN, to learn domain invariant features from the outputs of white-box models trained on multi-domain data. To better illustrate, Figure 4 shows the idea behind MDGAN. Assume there are two kinds of inputs, $O^1$ and $O^2$, from two domains. When we feed them into the generator $G$, we obtain the corresponding embeddings $z_1$ and $z_2$, respectively. After that, we feed $z_1$ and $z_2$ to the discriminator $D_1$, where $D_1$ is expected to output a "real" label for $z_1$ and a "fake" label for $z_2$. By jointly training $G$ and $D_1$ with a min-max optimization, the distribution of $z_2$ is expected to move towards that of $z_1$. In the meantime, we also feed $z_1$ and $z_2$ to the discriminator $D_2$. Differently, $D_2$ is expected to output a "real" label for $z_2$ and a "fake" label for $z_1$. By jointly training $G$ and $D_2$, the distribution of $z_1$ is expected to move towards that of $z_2$. In this way, $z_1$ and $z_2$ generated by the generator $G$ become domain invariant representations. Formally, we define $G(O; \theta_g): O \rightarrow z$. The generator $G$, sharing parameters $\theta_g$ across domains, maps the multi-domain outputs $O$ into the latent feature $z$. We also define $m$ discriminators $\{D_i(z; \theta_d^i)\}_{i=1}^{m}$. Each discriminator $D_i(z): z \rightarrow [0, 1]$ outputs a scalar representing the probability that $z$ comes from the $i$-th domain rather than the others. For $D_i(z)$, we treat an embedding from the $i$-th domain, i.e., $O^i$, as true, and the others as false. We then divide the multi-domain outputs into two groups, $\{O_T^i\}$ and $\{O_F^j\}$, defined as: $\{O_T^i\} = \{O^i\}$; $\{O_F^j\} = \{O^j \mid j \neq i\}$; $\bigcup_{j \neq i} \{O_F^j\} \cup \{O_T^i\} = O$. The training goal of $D_i$ is to maximize the probability of assigning the correct label to features from both the $i$-th domain and the other domains, while the generator $G$ is trained against the discriminator to minimize $\log(1 - D_i(G(x)))$.
In other words, it is a min-max game between the $i$-th discriminator $D_i$ and the generator $G$ with a value function $V$, formulated as:

$$\min_G \max_{D_i} V(D_i, G) = \mathbb{E}_{x \sim \{O_T^i\}}[\log D_i(G(x))] + \sum_{j \neq i} \mathbb{E}_{x \sim \{O_F^j\}}[\log(1 - D_i(G(x)))]. \quad (2)$$

While optimizing this min-max adversarial loss for $G$ and $D_i$, the distributions of model outputs from the $i$-th domain and the other domains become closer. After $G$ and all $D_i$ are well trained, $G$ embeds the multi-domain model outputs into an invariant feature space, where each discriminator cannot tell which domain the outputs of the white-box models come from.
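The value function in Eq. (2) can be checked numerically. This sketch replaces $G$ and $D_i$ with hypothetical fixed stand-ins (a tanh projection and a logistic scorer) rather than trained networks, and evaluates $V(D_i, G)$ for one choice of $i$:

```python
import numpy as np

# V(D_i, G) = E_{x~O_T^i}[log D_i(G(x))] + sum_{j!=i} E_{x~O_F^j}[log(1 - D_i(G(x)))]
# G and D are hypothetical stand-ins; in MDGAN they are MLPs trained adversarially.
rng = np.random.default_rng(1)
m, K, d = 3, 8, 6                                 # domains, models per domain, feature dim
O = rng.standard_normal((m, K, d))                # toy multi-domain outputs

W_g = rng.standard_normal((d, d))
G = lambda x: np.tanh(x @ W_g)                    # generator stand-in
w_d = rng.standard_normal(d)
D = lambda z: 1.0 / (1.0 + np.exp(-(z @ w_d)))    # discriminator D_i stand-in

def value_fn(i):
    z = G(O)                                      # embed all m domains
    real = np.log(D(z[i])).mean()                 # "real" term: domain i
    fake = sum(np.log(1.0 - D(z[j])).mean() for j in range(m) if j != i)
    return real + fake

print(value_fn(0))
```

Since $D$ outputs probabilities in $(0, 1)$, both log terms are negative, so $V$ is always negative; training raises it toward zero from the discriminator's side while $G$ pushes the fake term down.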

3.5. DOMAIN-FREE REVERSE MODEL

Then, we use the domain-free reverse classifier to classify the domain invariant features produced by the generator. We denote the features $z$ produced by $G(O; \theta_g)$ as

$$z = [G(O^1); G(O^2); ...; G(O^m)] \in \mathbb{R}^{m \times K \times d'}, \quad (3)$$

where $d'$ is the number of feature dimensions. We define the domain-free reverse classifier as $\Phi(z; \theta_c)$ parameterized by $\theta_c$. We obtain the probability $p(z_i)$ for each possible model attribute value as:

$$p(z_i) = \mathrm{softmax}(\Phi(z_i)) = \frac{\exp\{\Phi(z_i)\}}{\sum_{j} \exp\{\Phi(z_j)\}}. \quad (4)$$

The target is to minimize the cross entropy between the predicted $p(z_i)$ and the ground-truth model attribute values $y$:

$$\min_{\Phi} \mathbb{E}_{z \sim G(O)} \left[ \sum_{i=1}^{C} -y_i \log(p(z_i)) \right] = \min_{\Phi} \mathbb{E}_{z \sim G(O)} \left[ -y^T \log(p(z)) \right]. \quad (5)$$

At the inference phase, given the same queries as for the white-box models, the outputs of a black-box model from an unknown domain are fed into the generator $G$, and the output of $G$ is then fed into the reverse classifier $\Phi$, achieving a domain-free prediction of black-box model attributes.
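Eqs. (4) and (5) reduce to a standard softmax cross-entropy, which this small numeric sketch illustrates. The linear classifier $\Phi$, its weights, and the feature dimension are hypothetical stand-ins:

```python
import numpy as np

# Softmax over the reverse classifier's logits (Eq. 4), then cross-entropy
# against a one-hot attribute label (Eq. 5). Phi is a hypothetical linear map.
rng = np.random.default_rng(2)
d, C = 128, 3                        # invariant-feature dim, values of one attribute
z = rng.standard_normal(d)           # one domain-invariant feature from G
W, b = rng.standard_normal((C, d)) * 0.01, np.zeros(C)

logits = W @ z + b                   # Phi(z)
p = np.exp(logits - logits.max())    # shift by max for numerical stability
p /= p.sum()                         # softmax, Eq. (4)

y = np.zeros(C); y[1] = 1.0          # ground-truth attribute value (one-hot)
ce = -(y * np.log(p)).sum()          # cross-entropy, Eq. (5); equals -log p[1]
print(ce)
```

With a one-hot label, the loss collapses to $-\log p$ of the true class, which is minimized by pushing that probability toward one.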

3.6. OVERALL MODEL AND TRAINING STRATEGY

After introducing all the components, we give the final loss function based on Eqs. 2 and 5 as:

$$\min_{G, \Phi} \max_{D_i, 1 \leq i \leq m} V(D_i, G) = \mathbb{E}_{x \sim \{O_T^i\}}[\log D_i(G(x))] + \sum_{j \neq i} \mathbb{E}_{x \sim \{O_F^j\}}[\log(1 - D_i(G(x)))] + \lambda \, \mathbb{E}_{z \sim G(O)}\left[-y^T \log(p(z))\right],$$

where $\lambda$ is a trade-off parameter. We observe that if we first train MDGAN to completion and only then optimize the domain-free reverse classifier, the generator of MDGAN converges to a trivial solution: it tends to produce identical features, resulting in a reverse model that cannot predict model attributes correctly. Thus, we design a training strategy: we first optimize all discriminators $D_i$, and then jointly optimize the generator and the domain-free reverse classifier. We repeat these steps until the algorithm converges. The proposed training strategy is presented in Algorithm 1 of Appendix A.3.
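The alternating schedule above can be sketched as plain control flow. This is only an illustrative skeleton under stated assumptions: the actual gradient steps of Algorithm 1 are replaced by hypothetical update callables, so only the ordering (all $D_i$ first, then $G$ and $\Phi$ jointly) is shown:

```python
# Skeleton of the alternating training strategy; `update_discriminators` and
# `update_generator_and_classifier` are hypothetical stand-ins for the real
# gradient steps on {D_i} and on (G, Phi) respectively.
def train_dream(update_discriminators, update_generator_and_classifier,
                n_epochs, steps_per_epoch):
    """Alternate: (1) update all D_i with G fixed; (2) jointly update G and Phi."""
    for epoch in range(n_epochs):
        for step in range(steps_per_epoch):
            update_discriminators()             # fix G, train every D_i
            update_generator_and_classifier()   # fix D_i, train G and Phi jointly

# Example run with counting stubs in place of real optimizers:
calls = {"d": 0, "g": 0}
train_dream(lambda: calls.__setitem__("d", calls["d"] + 1),
            lambda: calls.__setitem__("g", calls["g"] + 1),
            n_epochs=2, steps_per_epoch=3)
print(calls)  # {'d': 6, 'g': 6}
```

Interleaving the two phases at every step, rather than fully training MDGAN first, is what prevents the generator from collapsing to identical features.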

4. EXPERIMENTS

4.1. DATASET CONSTRUCTION

Following (Oh et al., 2018), we train a number of models constructed by enumerating all possible attribute values. The details of the attributes and their values are shown in Table 1. The number of models with all possible combinations of the attributes is 5,184. We also initialize each model with random seeds from 0 to 999, yielding 5,184,000 unique white-box models. For each domain, we randomly sample and train 10,000 white-box models out of the 5,184,000. We then sample 5,000, 1,000 and 1,000 of the 10,000 white-box models as the training, validation and testing sets, respectively. Next, we introduce the details of our datasets.

PACS-modelset. PACS is an image dataset that has been widely used for OOD learning (Li et al., 2017). In this experiment, we use it to evaluate our domain-free black-box model attribute inference framework DREAM. We utilize three domains, Photo (1,670 images), Cartoon (2,344 images) and Sketch (3,929 images), to construct our dataset; each domain contains 7 categories. For each domain we train 10,000 models, and we combine them as the PACS-modelset (30,000 models in total).

MEDU-modelset.

MEDU is a set of hand-written digit recognition datasets, with 4 domains collected from MNIST (Lecun et al., 1998), USPS (Hull, 1994), DIDA (Kusetogullari et al.) and EMNIST (Cohen et al., 2017). Each domain contains a different style of hand-written digits from 0 to 9. We train 40,000 models as the MEDU-modelset, with 10,000 models per domain.

In the experiments, we set the number of queries N to 100. We use Adam (Kingma & Ba, 2014) as the optimizer, where the learning rate α is set to $10^{-5}$ for the generator and discriminators, and the learning rate β is set to $10^{-4}$ for the reverse model. The batch size b is set to 100. The trade-off parameter λ is tuned from {0.001, 0.01, 0.1, 1, 10} based on the validation set; a parameter sensitivity analysis can be found in the Appendix. In addition, the generator and each discriminator are implemented as two-layer MLPs, with ReLU as the non-linear activation function. All experiments are conducted on 4 NVIDIA RTX 3090 GPUs with PyTorch 1.11.0.

We compare our DREAM with 6 baselines: Random choice, SVM, KENNEN (Oh et al., 2018), SelfReg (Kim et al., 2021), MixStyle (Zhou et al., 2021b) and MMD (Li et al., 2018). For a fair comparison, we select a variant of KENNEN (denoted KENNEN*) taking fixed queries as input, the same as ours. Moreover, we also take three typical OOD generalization methods, SelfReg, MixStyle and MMD, as baselines to verify the effectiveness of our proposed MDGAN for learning domain invariant features. SelfReg draws samples of similar categories across all domains closer and pushes samples of different categories farther apart; MixStyle captures style information of images at the CNN layer and performs style mixing there; MMD adopts a maximum mean discrepancy loss between two domains. To apply the OOD baselines, we first take probabilities as input to learn invariant features, and then adopt an MLP on these features to predict model attributes.
In addition, we take SVM as a basic baseline that does not consider different domain outputs. We adopt the "leave-one-domain-out" scheme to split the source and target domains: for each dataset, we in turn take one domain as the target domain and the remaining domains as source domains. We run each experiment for 10 trials and report the average accuracy on each split.
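The "leave-one-domain-out" protocol can be written as a short helper. A minimal sketch for the PACS domains (the function name is our own, not from the paper):

```python
# Each domain serves once as the unseen target, with the rest as sources.
domains = ["Photo", "Cartoon", "Sketch"]

def leave_one_domain_out(domains):
    """Yield (target_domain, source_domains) pairs."""
    for i, target in enumerate(domains):
        yield target, [d for j, d in enumerate(domains) if j != i]

splits = list(leave_one_domain_out(domains))
print(splits)
# [('Photo', ['Cartoon', 'Sketch']), ('Cartoon', ['Photo', 'Sketch']), ('Sketch', ['Photo', 'Cartoon'])]
```

For MEDU the same helper yields four splits, one per digit domain.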

4.2. EXPERIMENTAL RESULTS AND ANALYSIS

Overall Performance. Tables 2 and 3 report the overall performance of the different methods on the PACS-modelset and MEDU-modelset, respectively. The left-most column in each table indicates the target domain (the rest are source domains). The performance achieved by our proposed DREAM is better than that of all baselines in terms of the average result over model attributes. For individual attributes, our method outperforms the other methods in most cases. Our method is better than KENNEN, which illustrates that our method benefits from learning domain invariant features and a domain-free reverse model. Moreover, our method achieves better performance than the three OOD learning methods, which indicates that it is necessary to design new methods for extracting domain invariant features for attribute inference of black-box models. We also observe that DREAM cannot outperform the other baselines in some cases. The reasons might be: 1) DREAM vs. OOD learning baselines. As mentioned, the OOD learning methods aim to learn a domain invariant space across different domains. Once features of different domains are excessively pulled close, classification accuracy suffers. Thus, the trade-off between invariant space learning and classification learning is vital for reverse engineering performance, and the best trade-off for each attribute is not identical. Taking MMD on domain E in Table 3 as an example, its best trade-off hyperparameter helps predict attribute #ks better than the other attributes; similarly, the best trade-off hyperparameter of DREAM helps better predict the attributes other than #ks. 2) DREAM vs. KENNEN and SVM.
Our proposed DREAM has a stronger ability to fit complicated data, while SVM and KENNEN (only a shallow MLP) are weaker in such scenarios. For easier cases, e.g., #act, #ks and #fc in domain M of MEDU (shown in Table 3), DREAM is more likely to overfit due to its larger number of parameters, degrading its performance. However, our method generally performs better than SVM and KENNEN in most cases.

Visualization of Generated Feature Space. To further verify the effectiveness of our proposed method, we utilize t-SNE (Van der Maaten & Hinton, 2008) to visualize samples in the domain invariant feature space learnt by the generator G in MDGAN. The visualization is carried out on the PACS-modelset. We take C (cartoon) and P (photo) as source domains to train white-box models, and use S (sketch) as the unseen target domain to train the black-box model. As shown in Figure 5 a), samples from the three different domains are grouped into individual clusters at the 1st epoch, illustrating that their distributions are indeed different at the beginning. The distributions of the source domains (C and P) become closer from epoch 1 to epoch 5. Then, our method embeds features from the unseen domain (S), and the samples from the target domain also move closer to the source domains at the 10th epoch, indicating that our generator is able to map an unseen domain into the feature space of the source domains. Finally, both source and target domains are transformed into an invariant feature space. For MMD and MixStyle in Figure 5, features of domain S also become closer to the source domains at the 100th epoch; however, these feature distributions are not sufficiently tight.

Convergence Analysis

We study the convergence of our algorithm on the PACS-modelset. The curves of the meta-classifier's loss in the training phase are shown in Figure 6 . For all the three splits of domains (left to right), the loss decreases as the training proceeds and finally levels off.

5. CONCLUSION

In this paper, we studied the problem of domain-free reverse engineering the attributes of a black-box model with unknown domain data, and cast it as an OOD generalization problem. We proposed a new framework, DREAM, which can predict the attributes of a black-box model with an arbitrary training domain, and devised a new GAN-based network to learn domain invariant features in the scenario of black-box model attribute inference. Extensive experimental results demonstrated the effectiveness of our method.

A.5 SENSITIVITY ANALYSIS

We study the sensitivity of the trade-off parameter λ in our final loss function on the PACS-modelset. As shown in Figure 7, the results for each model attribute do not show evident fluctuation when changing λ, suggesting that our proposed method is not sensitive to the choice of λ over a wide range.

A.6 QUERY NUMBER AND SIZE OF TRAINING SET ANALYSIS

Query Number Analysis. We study the performance of DREAM against the number of queries on the PACS-modelset. Following (Oh et al., 2018), we use the normalized accuracy, which is linearly scaled with respect to random choice. As shown in Figure 8, as the number of queries increases, the average performance does not improve but fluctuates, which means more queries do not necessarily provide more information for our DREAM framework.

Size of Training Set Analysis. We further study the performance of our method against the size of the training set on the PACS-modelset. As shown in Figure 9, the performance slightly fluctuates from a size of 1K to 5K, and does not consistently increase with the size. We suspect this can be attributed to the difficulty of our problem for domain-free attribute inference of black-box models, and to the nature of the OOD problem, i.e., the noise level increases as the size of the training set grows. It is worth studying further.

A.8 EXPERIMENT WITH HELD-OUT CLASSES

We use a subset of the K classes for training: when training the white-box models, we leave out the "dog" and "elephant" classes for each domain. We then train domain-free meta-classifiers using model outputs without the "dog" and "elephant" classes, and test them using model outputs that contain all classes. As shown in Table 12, our method still performs well compared with the baselines.

A.9 EXPERIMENT OF APPLICATION OF REVERSE ENGINEERING

Let us consider the setting of model extraction, where the structure of the target model is unknown. We use an arbitrary random network structure to extract the target model with the method DFME [1], and compare this with using the structure inferred by our method. The experimental results show that using the structure inferred by our method obtains better extraction performance, indicating that our findings are significant.

[1] Kariyappa, S., Prakash, A., & Qureshi, M. K. (2021). MAZE: Data-free model stealing attack using zeroth-order gradient estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13814-13823).




Figure 4: An example to illustrate the idea behind MDGAN.

Figure 5: T-SNE visualization of features of different domains produced by a) DREAM, b) MMD, c) MixStyle and d) SelfReg on PACS-modelset.

Figure 7: Sensitivity analysis of parameter λ on PACS-modelset. From left to right, the results in the P split, C split and S split are shown, respectively.

Figure 8: Performance against query number on PACS-modelset. From left to right, normalized accuracies in the P split, C split and S split are shown, respectively.

Figure 9: Performance against size of training set on PACS-modelset. From left to right, normalized accuracies in the P split, C split and S split are shown, respectively.

Table 1: Attributes and the corresponding values.

Table 2: Model attribute classification accuracy (%) on PACS-modelset. Red and blue indicate the best and second best performance, respectively.

Table 3: Model attribute classification accuracy (%) on MEDU-modelset. Red and blue indicate the best and second best performance, respectively.



Distribution of attributes in domain M (MNIST) of MEDU-modelset and classification accuracy of our method on the MNIST validation set.

Distribution of attributes in domain E (EMNIST) of MEDU-modelset and classification accuracy of our method on the EMNIST validation set.

Distribution of attributes in domain D (DIDA) of MEDU-modelset and classification accuracy of our method on the DIDA validation set.

Distribution of attributes in domain U (USPS) of MEDU-modelset and classification accuracy of our method on the USPS validation set.

Model attribute classification accuracy (%) on S of PACS-modelset. Red and blue indicate the best and second best performance, respectively. DREAM* denotes a variant whose domain-free meta-classifier is trained on model outputs that exclude the "dog" and "elephant" classes, and is tested on model outputs that contain all classes.

Accuracy and normalized accuracy of data-free model extraction methods. The student model's architecture is chosen in one of three ways: the same as the victim's, ten randomly generated architectures (averaged), or the architecture predicted by DREAM. The results for "DREAM predict" show that our method supports the model extraction task in the black-box setting.

A APPENDIX

A.1 DETAILS OF CONSTRUCTED MODELSET

We construct two modelsets (PACS-modelset and MEDU-modelset) by enumerating combinations of attribute values. The architecture of each model in the modelsets follows the scheme: N convolution layers, M fully-connected layers and a linear classifier. Each convolution layer contains a k × k convolution, an optional batch normalization, an optional max-pooling and a non-linear activation function, in sequence, where k is the kernel size. Each fully-connected layer consists of a linear transformation, a non-linear activation and an optional dropout, in sequence. We set the dropout ratio to 0.1 in our experiments. When training the models, the optimizer is selected from {SGD, ADAM, RMSprop} and the batch size from {32, 64, 128}.
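The modelset construction above can be sketched as follows. This is a minimal illustration, not the paper's actual code: the attribute names and value grids below are hypothetical stand-ins, and `build_spec` only expands one attribute combination into the layer scheme described in the text (N conv blocks, M fully-connected blocks, then a linear classifier).

```python
from itertools import product

# Hypothetical attribute grids -- illustrative only; the paper's exact
# attribute names and value sets are given in its Table 1.
ATTRIBUTES = {
    "n_conv": [2, 3, 4],
    "n_fc": [2, 3, 4],
    "kernel": [3, 5],
    "batch_norm": [True, False],
    "max_pool": [True, False],
    "activation": ["relu", "elu", "tanh"],
    "dropout": [True, False],
}

def build_spec(n_conv, n_fc, kernel, batch_norm, max_pool, activation, dropout):
    """Expand one attribute combination into the layer scheme described above."""
    layers = []
    for _ in range(n_conv):                      # N convolution blocks
        layers.append(f"conv{kernel}x{kernel}")
        if batch_norm:
            layers.append("batchnorm")
        if max_pool:
            layers.append("maxpool")
        layers.append(activation)
    for _ in range(n_fc):                        # M fully-connected blocks
        layers.append("linear")
        layers.append(activation)
        if dropout:
            layers.append("dropout(p=0.1)")      # dropout ratio 0.1
    layers.append("classifier")                  # final linear classifier
    return layers

# Enumerate every combination of attribute values, as in the modelset construction.
combos = [dict(zip(ATTRIBUTES, vals)) for vals in product(*ATTRIBUTES.values())]
```

Each entry of `combos` fully specifies one white-box model to be trained and added to the modelset.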

A.2 DETAILED IMPLEMENTATION OF MDGAN AND META-CLASSIFIER

The MDGAN is composed of a generator and multiple discriminators. The generator consists of two linear layers with ReLU activation. The input dimension of the generator is determined by the query number N and the number of classes C. In the handwritten-digit experiments, the input dimension is 1000 (N = 100, C = 10); for the PACS dataset, it is 700 (N = 100, C = 7). The output dimensions of the two successive layers are 500 and 128, respectively. Each discriminator consists of three linear layers with ReLU activations and a final Sigmoid activation; the output dimensions of its layers are 512, 256 and 1, respectively. There are 9 meta-classifiers in total. Each meta-classifier is a two-layer MLP with hidden dimensions 128 and 64, and an output dimension equal to the number of values of the corresponding attribute.
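The layer dimensions above can be checked with a shape-level sketch. The forward pass below uses random placeholder weights and assumes ReLU on hidden layers only; it illustrates the component dimensions in the PACS setting (700-d input), not the training procedure or the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Random placeholder weights for a stack of linear layers."""
    return [(rng.standard_normal((i, o)) * 0.01, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x, final=None):
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)              # ReLU on hidden layers (assumed)
    if final == "sigmoid":
        x = 1.0 / (1.0 + np.exp(-x))            # final Sigmoid of a discriminator
    return x

generator = mlp([700, 500, 128])                # two linear layers, PACS: N=100, C=7
discriminator = mlp([128, 512, 256, 1])         # three linear layers + Sigmoid
meta_classifier = mlp([128, 64, 3])             # e.g., an attribute with 3 values
```

Feeding a batch of concatenated model outputs of shape (B, 700) through `generator` yields 128-d domain-invariant features, which both the discriminators and the meta-classifiers consume.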

A.3 TRAINING STRATEGY ALGORITHM

The training strategy of DREAM is shown in Algorithm 1.

A.4 EXPERIMENTS ON DIFFERENT TRAINING AND TESTING ATTRIBUTES

We study the case where the white-box models and the black-box model to be inferred have completely different attributes. As we mentioned in Section 4.1, there are 5,184 combinations of model attributes in total. We randomly sample 3,000, 1,000 and 1,000 combinations as the training, validation and testing sets, respectively, so that no two models have identical attributes. As shown in Table 4, DREAM consistently outperforms the other baselines in this setting, achieving the best average accuracy (51.47%), ahead of MMD (49.67%), KENNEN* (47.37%), MixStyle (46.33%) and SelfReg (42.09%).

A.7 STATISTICS OF MODELSET

We present the statistics of each attribute value in the PACS-modelset (Table 5 to Table 7) and the MEDU-modelset (Table 8 to Table 11). The "Ratio" row reports the proportion of models with the attribute value in the whole modelset. The next four rows report the maximal, median, mean and minimal accuracy of the models with that attribute value, respectively.

