DEEPGUISER: LEARNING TO DISGUISE NEURAL ARCHITECTURES FOR IMPEDING ADVERSARIAL TRANSFER ATTACKS

Abstract

Security is becoming increasingly critical in deep learning applications. Recent research demonstrates that neural network (NN) models are vulnerable to adversarial attacks, which can mislead them with only small input perturbations. Moreover, adversaries who know the architecture of victim models can conduct more effective attacks. Unfortunately, adversaries can usually steal this architectural knowledge by exploiting system-level hints leaked through many side channels, a threat known as the neural architecture extraction attack. Conventional countermeasures against neural architecture extraction can introduce large overhead, and different hardware platforms exhibit diverse types of side-channel leakage, so considerable expert effort is needed to develop hardware-specific countermeasures. In this paper, we propose DeepGuiser, an automatic, hardware-agnostic, and retrain-free neural architecture disguising method that disguises neural architectures to reduce the harm of neural architecture extraction attacks. In a nutshell, given a trained model, DeepGuiser outputs a deploy model that is functionally equivalent to the trained model but has a different (i.e., disguising) architecture. DeepGuiser can minimize the harm of the follow-up adversarial transfer attacks on the deploy model, even if the disguising architecture is completely stolen by the architecture extraction attack. Experiments demonstrate that DeepGuiser can effectively disguise diverse architectures and impede adversarial transferability by 13.87% ∼ 32.59%, while introducing only 10% ∼ 40% extra inference latency.

1. INTRODUCTION

Deep neural networks (NNs) have achieved great success in the field of artificial intelligence (AI) (LeCun et al., 2015). As NNs become increasingly complex, a number of NN-specific chips (Jouppi et al., 2017; Liao et al., 2021; Markidis et al., 2018) and intensive innovations (Chen et al., 2020; Qiu et al., 2016; Chen et al., 2014) have been proposed to boost the efficiency of NN computing. Despite the significant progress in hardware performance, security should also be regarded as a high-priority feature. Especially in safety-critical applications, e.g. autonomous driving and surveillance, security vulnerabilities can be exploited by adversaries and lead to uncontrollable consequences. Confidentiality is an essential guarantee for system security. The critical confidential information contained in well-trained NN models mainly includes their neural architectures and weight parameters. While weight encryption for protecting weight confidentiality has been well studied (Orlandi et al., 2007; Cai et al., 2019; Zuo et al., 2021), the protection of neural architectures is still lacking. Recent research has warned that many emerging or even off-the-shelf AI chips are vulnerable to neural architecture extraction attacks (Batina et al., 2018; Hua et al., 2018; Yan et al., 2020; Hu et al., 2020; Wei et al., 2018; Wang et al., 2022). For example, DeepSniffer (Hu et al., 2020) exploits the system-level hints (e.g. memory access activity, cache miss rate) of NN processing on GPU platforms and proposes a learning-based approach to automatically identify the layer sequences. It also quantitatively shows that neural architecture extraction can significantly boost the success rate of adversarial transfer attacks, since the adversary can construct a surrogate model with almost the same neural architecture as the victim model (Demontis et al., 2019; Hu et al., 2020).
The high risk posed by neural architecture extraction attacks necessitates the protection of neural architectures. On the one hand, from the perspective of intellectual property protection, neural architectures are usually designed either manually by experts (He et al., 2016; Simonyan & Zisserman, 2014; Sandler et al., 2018) or automatically by neural architecture search (NAS) (Cai et al., 2018; Liu et al., 2018b; Tan et al., 2019), both of which consume significant labor and resources. On the other hand, from the perspective of adversarial robustness, if the architecture of the deploy model is leaked, the adversaries can train a surrogate model with the same architecture and use it to craft much more effective adversarial examples against the deploy model by exploiting the high transferability between the surrogate and the deploy model (Hu et al., 2020). It is hard to design a universal system- or hardware-level protection scheme against neural architecture extraction attacks for different kinds of AI chips, as various design options affect the hardware characteristics. Diverse run-time side-channel information can be exploited to extract the neural architectures on different hardware platforms, e.g. power (Wei et al., 2018), cache activity (Yan et al., 2020), and memory access (Hua et al., 2018). Blocking all these side-channel leakages with hardware-specific countermeasures might incur huge system costs and expert effort. In this work, we propose DeepGuiser, an algorithm-level "architecture disguising" solution, which protects the architecture information by disguising it before deployment and thereby alleviates the security risk posed by architecture extraction and the follow-up adversarial transfer attacks. Fig. 1 (Left) illustrates the attack scenario we are concerned with, and Fig. 1 (Right) demonstrates how DeepGuiser plays its role. Since there exist diverse models to be deployed and the disguising space (introduced in Sec. 4.1) is extremely large, manually finding a good disguising architecture for every possible model is extremely costly or even impossible. Therefore, we design DeepGuiser to automatically and efficiently yield a good disguising architecture for a given trained model. We summarize our contributions as follows:
• DeepGuiser is an automatic, hardware-agnostic, and retrain-free neural architecture disguising framework. As shown in Fig. 1 (Right), given a trained model, the disguising policy in DeepGuiser takes the original architecture as input and outputs a disguising architecture. Then, with functionality-preserving weight transforms, DeepGuiser yields a "deploy model" that is functionally equivalent to the trained model but has the disguising architecture. This deploy model is the one that is deployed, and even if its architecture is stolen, the harm of the follow-up adversarial transfer attacks to the deploy model is largely reduced.
• We use reinforcement learning (RL) to train the disguising policy to output disguising architectures with low adversarial transferability to the original architecture. During training, we use a predictor to bridge the gap between the adversarial transferability evaluated with a weight-sharing super-net and that evaluated with standalone training.
• To train the predictor, we build a dataset, TransAdvBench, which collects and evaluates the adversarial transferability of over 8,000 pairs of neural architectures. TransAdvBench also helps us study the connection between architecture characteristics and adversarial transferability; we summarize some of the resulting insights in Appendix A.2.
• Experimental results show that with DeepGuiser, the adversarial transferability of the surrogate model with the disguising architecture to the deploy model decreases by 13.87% ∼ 32.59%, while introducing only 10% ∼ 40% extra latency for the deploy model on GPU.

2.1. NEURAL ARCHITECTURE EXTRACTION ATTACKS

Many studies have proposed attack methods for extracting the neural architecture of deployed models on a variety of hardware platforms (Batina et al., 2018; Hua et al., 2018; Yan et al., 2020; Hu et al., 2020; Wei et al., 2018). For example, Hua et al. (2018) reveal the network architecture by exploiting the memory access pattern observed during NN inference. DeepSniffer (Hu et al., 2020) proposes a learning-based operator recognition method using a long short-term memory (LSTM) network; its basic idea is to learn the correlation between the system hints and the layer types. Other side-channel information can also be used to recognize the operations and topology, e.g. counting GEMM calls via a cache side channel (Yan et al., 2020), observing the patterns and timing of operations (Batina et al., 2018), and the cache miss rate (Hu et al., 2020). For example, scalar computation (e.g. ReLU) has a higher cache miss rate than tensor computation (e.g. convolution) because its data reuse rate is much lower. These attack methods pose severe security risks to AI systems.
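The data-reuse argument can be made concrete with a first-order arithmetic-intensity estimate (FLOPs per byte moved). The sketch below is illustrative only: it assumes fp32 tensors, assumes each input, weight, and output byte crosses the memory interface exactly once, and uses hypothetical layer sizes; real cache miss rates also depend on tiling and the memory hierarchy.

```python
BYTES_PER_ELEM = 4  # assumption: fp32 storage


def relu_intensity(num_elems):
    """FLOPs per byte for ReLU: one max() per element, input read once, output written once."""
    flops = num_elems
    bytes_moved = 2 * num_elems * BYTES_PER_ELEM
    return flops / bytes_moved


def conv_intensity(h, w, c_in, c_out, k):
    """FLOPs per byte for a stride-1 'same' k x k convolution (2 FLOPs per multiply-accumulate)."""
    flops = 2 * h * w * c_out * c_in * k * k
    # Input feature map + weights + output feature map, each moved once.
    bytes_moved = BYTES_PER_ELEM * (h * w * c_in + k * k * c_in * c_out + h * w * c_out)
    return flops / bytes_moved
```

For a 32x32x64 feature map, `relu_intensity(32 * 32 * 64)` gives 0.125 FLOPs per byte, whereas `conv_intensity(32, 32, 64, 64, 3)` gives about 112 FLOPs per byte, so the convolution reuses each loaded byte far more often than the element-wise ReLU.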

2.2. DEFENSIVE APPROACHES

There are two main streams of methods for defending against neural architecture extraction attacks. One is to block side-channel leakage so that no adversary can obtain the corresponding information. For example, the memory access pattern and trajectory are important hints that help the adversary reconstruct the network topology. The most promising solution, the oblivious random-access machine (ORAM) protocol (Goldreich & Ostrovsky, 1996), can prevent the attacker from observing the actual access behavior. However, applying ORAM is practically infeasible owing to its unacceptable communication blowup: even the most efficient implementation, Path ORAM (Stefanov et al., 2018), still incurs an O(log N) blowup. Since data movement already accounts for a significant proportion of the time in NN computing, this would severely degrade efficiency on bandwidth-limited chips. The other stream of work obfuscates the neural architecture. However, current methods are either explicitly designed against specific attack methodologies or incur a high obfuscation cost. For example, NeurObfuscator (Li et al., 2021) is an obfuscation framework that specifically targets increasing the adversary's layer prediction error rate, without considering the ultimate criterion (e.g., the success of the attacks that follow the extraction). ObfuNAS (Zhou et al., 2022) aims to hide the accuracy performance of the architecture, but requires a search process for every victim architecture.
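To give a sense of the O(log N) blowup, the sketch below counts the blocks moved per logical access in textbook Path ORAM, assuming a bucket size of Z = 4 and a binary tree of height L = ⌈log2 N⌉, and ignoring the recursive position map. These parameters are common choices, not figures from this paper.

```python
def path_oram_blowup(num_blocks, bucket_size=4):
    """Physical blocks transferred per logical access in textbook Path ORAM.

    The server stores a binary tree of height L = ceil(log2 N); each access
    reads the L + 1 buckets on one root-to-leaf path and writes them back,
    so 2 * Z * (L + 1) blocks cross the bus. The position map is ignored.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be positive")
    height = (num_blocks - 1).bit_length()  # integer ceil(log2 N)
    return 2 * bucket_size * (height + 1)
```

For N = 2^20 blocks, `path_oram_blowup(2**20)` returns 2 × 4 × 21 = 168, i.e., every logical read or write moves 168 physical blocks, which illustrates why ORAM is costly for bandwidth-bound NN accelerators.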

2.3. NEURAL ARCHITECTURE SEARCH AND TRANSFORMATION

Neural architecture search (NAS) methods (Cai et al., 2018; Tan et al., 2019; Liu et al., 2018b) are widely studied to automatically design advanced architectures with superior performance in place of the hand-crafted design process. Essentially, NAS provides a convenient way for designers to explore a large architecture search space. Evaluation is one of the key components of NAS. To achieve fast and accurate estimation over a large number of architectures, predictors and architecture encoding methods have been intensively studied, e.g. GCN (Kipf & Welling, 2017; Guo et al., 2019), MLP (Liu et al., 2018a), and GATES (Ning et al., 2020). These schemes encode an architecture into a continuous embedding, which is then used for performance estimation. Neural architecture transformer (NAT) (Guo et al., 2019) aims to transform an architecture into a pruned one. It employs RL to train a policy that takes the original architecture as input and decides how to change each operation to obtain an architecture with fewer FLOPs and higher performance.
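As a point of reference for what an architecture encoding produces, the sketch below builds the simplest flat input that an MLP-style predictor could consume: a one-hot operation vector and a one-hot input-node vector per edge. It is not GATES or a GCN, which learn the embedding from the graph structure; the operation names and cell layout used with it are hypothetical.

```python
def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_cell(edges, op_vocab, num_nodes):
    """Flatten a cell into a fixed-length vector.

    edges: list of (op_name, input_node) pairs in a fixed order, one per edge.
    Each edge contributes a one-hot over op_vocab followed by a one-hot over
    the node it reads from.
    """
    op_index = {name: i for i, name in enumerate(op_vocab)}
    vec = []
    for op_name, input_node in edges:
        vec += one_hot(op_index[op_name], len(op_vocab))
        vec += one_hot(input_node, num_nodes)
    return vec
```

For example, `encode_cell([("sep_conv_3x3", 0), ("skip_connect", 1)], ["skip_connect", "sep_conv_3x3", "max_pool_3x3"], 6)` yields an 18-dimensional vector with four ones. Learned encoders such as GATES replace this fixed vector with a dense embedding that reflects how information flows through the cell.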

3. THREAT MODEL

The threat model considered is neural architecture extraction through hardware side-channel attacks, e.g., controlling the off-chip memory or snooping the communication bus. Through these channels, the adversary obtains hints that help infer the deployed neural architecture. In edge computing scenarios, with AI becoming ubiquitous and mobile, more and more edge devices are powered with intelligence, and an adversary can easily obtain physical access to a device by acquisition or theft. In cloud computing scenarios, an honest-but-curious cloud service provider may also seek knowledge of the NN models running on the cloud. Such a provider may fail to obtain the weight parameters owing to encryption technologies (e.g., homomorphic encryption (Orlandi et al., 2007)), but it is much easier to reveal the model architectures because rich side-channel information can be observed. We consider a strong threat model in which the attackers can exactly extract the neural architecture of models deployed on a device but cannot obtain the weight parameters, as weight encryption techniques are relatively mature and strong. By training a surrogate model with the same neural architecture as the victim deploy model, the attackers can then achieve a much higher adversarial attack success rate (Hu et al., 2020; Demontis et al., 2019). Under this scenario (illustrated in Fig. 1 (Left)), the goal of neural architecture disguising is to find a disguised architecture with the lowest possible adversarial transferability to the actual deploy model.

4.1. PROBLEM DEFINITION

Given an architecture A, the objective of neural architecture disguising is to substitute a subset of its operations (layers) or add some operations (layers) to change its topology. Representing A as a directed acyclic graph (DAG), we write A = (V, E), where V is a set of nodes that represent feature maps and E is a set of edges that represent operations. Each operation $e_{ij}$ takes the feature map $v_i$ as input and produces the feature map $v_j$. Assuming $e_{ij} = O_k$, an operation disguising changes $O_k$ to $O_m$ subject to $O_m \in I(O_k)$, where "∈" denotes the inclusion relationship and $I(O_k)$ is the valid transformation set of operation $O_k$. That is, an operation can only be transformed within its own transformation set. Disguising Space. A principle of neural architecture disguising is to maintain functionality equivalent to the original model, because functional correctness must be guaranteed. We derive the disguising space from this requirement and design a corresponding functionality-preserving weight transform for each operation disguising. For example, as shown in Fig. 2(b-c), a 1x1 convolution can be disguised as a 3x3 convolution by zero-padding its kernel, so that the 3x3 convolution computes the same function as the 1x1 convolution. Fig. 2(a) summarizes the disguising (inclusion) matrix of the different operators. The matrix applies to any hardware because these rules are independent of the specific hardware implementation. Problem Formalization. Given an architecture A, the goal is to find an architecture B disguised from A that impedes the adversarial transferability from B to A.
Denoting $A_E = (e_1, e_2, \ldots, e_n)$ as the operation set of architecture A and $B_E = (e_{1d}, e_{2d}, \ldots, e_{nd}, e_{(n+1)d}, \ldots, e_{(n+m)d})$ as the operation set of architecture B, the following constraint should be satisfied:

$$e_{id} \in \begin{cases} I(e_i), & 1 \le i \le n \\ I(\mathrm{null}), & n < i \le n + m \end{cases} \qquad (1)$$

The objective is to minimize the loss of A on adversarial examples generated based on B. An adversarial example $x'_B(x)$ is usually formulated as:

$$x'_B(x) = \arg\max_{z} \mathcal{L}\big(f_B(z), y\big), \quad \text{s.t. } \|z - x\|_p \le \epsilon,$$

where $f_B(\cdot)$ denotes the forward function of the entire model B, x denotes the clean input, y the label, z the adversarial example, $\mathcal{L}$ the loss function (typically the one used in training), and $\epsilon$ the adversarial perturbation strength. $\|\cdot\|_p$ denotes the $\ell_p$ norm, commonly $\ell_1$, $\ell_2$, or $\ell_\infty$. The optimization problem for finding architecture B can then be formalized as:

$$\arg\min_{B} \; \mathbb{E}_{(x,y)\in\chi}\, \mathcal{L}\big(f_A(x'_B(x)), y\big), \quad \text{s.t. } B \in I(A),$$

where $\chi$ represents the dataset and $I(\cdot)$ represents the disguising space of an architecture. To measure this expectation, we use the boosted adversarial accuracy of A under the transfer attack from B as the reward, denoted $R(B|A)$:

$$R(B|A) = \mathrm{AdvAcc}_{B \to A} - \mathrm{AdvAcc}_{A \to A}.$$

Unfortunately, this optimization problem is challenging to solve because of two issues: evaluation and exploration. Evaluation Problem. Evaluating $R(B|A)$ exactly requires a training-generation-test process for every candidate B: training B, generating adversarial examples with it, and testing A on them. Weight sharing (Pham et al., 2018; Ning et al., 2021) is widely used in the NAS literature to accelerate the evaluation of architecture performance, i.e., to evaluate any architecture, the model directly uses the corresponding weights from a weight-sharing super-net. In this way, the above process can be simplified to a generation-test process by training a single super-net before evaluation. However, we empirically find that the adversarial transferability evaluated with the super-net has a non-negligible gap with the ground-truth evaluation, as shown in Fig. 3 (Left).
To address this problem, we propose a transferability predictor that quickly and accurately estimates the adversarial performance, as introduced in Sec. 4.2. Exploration Problem. Exploring the large disguising space also poses a challenge. Even with strict transformation constraints, the disguising spaces of most architectures are still extremely large. For example, for the ResNet cell architecture shown in Fig. 1, [...]. An intuitive idea is to learn a policy $\pi(\cdot|A)$ that takes A as input and outputs a distribution over disguising architectures for it. We model this distribution as the joint distribution of multiple operation disguising decisions. To learn this disguising policy, we employ policy gradient with our specific reward design R, as described in Sec. 4.3. The overall framework of DeepGuiser is shown in Fig. 4(a). In the following, we introduce the two main parts of DeepGuiser.
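The size of the disguising space grows multiplicatively with the number of operations. The sketch below checks the substitution-only part of Eq. (1) and counts the candidates for a given operation list. The inclusion sets are hypothetical placeholders chosen for illustration, not the matrix in Fig. 2(a), and added operations (the I(null) case) are not modeled.

```python
import math

# Hypothetical inclusion sets I(O), for illustration only (NOT Fig. 2(a)).
# Each set lists the operations that O may be disguised as, including O itself.
INCLUSION = {
    "conv_1x1": {"conv_1x1", "conv_3x3", "conv_5x5"},
    "conv_3x3": {"conv_3x3", "conv_5x5"},
    "conv_5x5": {"conv_5x5"},
}


def is_valid_disguise(a_ops, b_ops, inclusion=INCLUSION):
    """Substitution-only form of Eq. (1): every e_id must lie in I(e_i)."""
    return len(a_ops) == len(b_ops) and all(
        b in inclusion[a] for a, b in zip(a_ops, b_ops)
    )


def disguising_space_size(a_ops, inclusion=INCLUSION):
    """Number of substitution-only disguises: the product of |I(e_i)| over all operations."""
    return math.prod(len(inclusion[op]) for op in a_ops)
```

With 16 operation slots (8 per cell, as in Sec. 5.1) and three valid options per slot, `disguising_space_size(["conv_1x1"] * 16)` already returns 3^16 = 43,046,721 substitution-only disguises for a single architecture, which is why a learned policy is used instead of enumeration.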

4.2. PREDICTOR: RESOLVING THE EVALUATION PROBLEM

To achieve a fast and accurate evaluation of the adversarial transferability between any two architectures, we design a predictor that directly predicts the adversarial accuracy given a victim architecture A and a surrogate architecture B, i.e., an estimate of the expectation $\mathbb{E}_{(x,y)}\,\mathcal{L}(f_A(x'_B(x)), y)$. Predictor Construction. The predictor consists of an architecture encoding block that maps a discrete architecture into a continuous embedding space, and a regression head that predicts the adversarial accuracy from the architecture embeddings. Specifically, we adopt the graph-based architecture encoder GATES (Ning et al., 2020) to convert an arbitrary architecture A into an embedding vector. The embeddings are then fed into an MLP-based regressor that regresses the adversarial accuracy, as shown in Fig. 4(c). Loss Function. Since the output of the predictor is a regression value indicating the adversarial accuracy of architecture A on the adversarial examples generated by B, we adopt the mean squared error (MSE) loss to train the predictor:

$$\mathcal{L}(\theta_p, A, B, t) = \big(r(g(A), g(B) \mid \theta_p) - t\big)^2,$$

where $r(\cdot)$ denotes the regressor over the two input architecture embeddings, $g(\cdot)$ denotes the GATES architecture encoder, $\theta_p$ denotes the parameters of the predictor, and t denotes the ground-truth adversarial accuracy of A under the attack from B. Fig. 3 (Right) shows the performance of the trained predictor: compared with super-net-based evaluation, its predictions are both faster and more accurate.
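A minimal sketch of the predictor's regression head and loss, assuming 128-dimensional embeddings (the GATES output length stated in Sec. 5.1) and the 256x64x1 regressor described there. Random vectors stand in for GATES embeddings, the ReLU hidden activation and He-style initialization are assumptions, and no training loop is shown; a real implementation would use an autodiff framework.

```python
import random

EMB_DIM = 128  # GATES embedding length reported in Sec. 5.1


def init_mlp(sizes, rng):
    """Random weights for an MLP with the given layer sizes, e.g. [256, 64, 1]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # He-style init (an assumption)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Affine layers with ReLU between them; the final layer stays linear."""
    for idx, (weights, bias) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def predict_adv_acc(layers, emb_a, emb_b):
    """r(g(A), g(B)): regress adversarial accuracy from concatenated embeddings."""
    return mlp_forward(layers, emb_a + emb_b)[0]


def mse_loss(pred, target):
    """(r(g(A), g(B)) - t)^2 for a single victim-surrogate pair."""
    return (pred - target) ** 2
```

The reward used in Sec. 4.3 can be read off two predictor calls, `predict_adv_acc(layers, emb_a, emb_b) - predict_adv_acc(layers, emb_a, emb_a)`, so no adversarial examples need to be generated during policy training.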

4.3. POLICY LEARNING: RESOLVING THE EXPLORATION PROBLEM

We denote the disguiser function as $f_d$. The disguiser parameters $\theta_d$ are trained with the estimates of adversarial transferability given by the predictor. Disguiser Construction. Similar to the predictor, the disguiser consists of an architecture encoding module that encodes an arbitrary architecture A into an embedding, and an MLP module that produces the policy $\pi(\cdot|A)$. The computation of the disguiser can be represented as:

$$\pi(\cdot|A;\theta_d) = \mathrm{Softmax}\big(f_d(g(A) \mid \theta_d)\big).$$

Policy Gradient. We apply policy gradient to learn the disguising policy. As the goal is to maximize the final reward $R(B|A)$, the objective function can be formulated as:

$$\max_{\pi(\cdot|A)} \; \mathbb{E}_{A\sim p(\cdot)}\Big[\mathbb{E}_{B\sim\pi(\cdot|A;\theta_d)}\big[R(B|A)\big]\Big],$$

where $p(\cdot)$ is the probability distribution from which architectures A are sampled. In addition, we add two terms to this commonly used objective. First, operation disguising always introduces extra FLOPs and latency; therefore, we weight the reward by a penalty term $c(B|A)$ that depends on the number of transformed operations. Second, we introduce a standard entropy regularization term $H(\pi(\cdot|A;\theta_d))$ to encourage exploration. In summary, the objective function can be formulated as:


$$\mathcal{L}_{policy}(\theta_d) = \mathbb{E}_{A\sim p(\cdot)}\Big[\mathbb{E}_{B\sim\pi(\cdot|A;\theta_d)}\big[R(B|A)\cdot c(B|A)\big] + \lambda H(\pi(\cdot|A;\theta_d))\Big] = \sum_{A} p(A)\Big[\sum_{B}\pi(B|A;\theta_d)\,R(B|A)\,c(B|A) + \lambda H(\pi(\cdot|A;\theta_d))\Big].$$

Specifically, in our implementation, the involved functions are formulated as follows:

$$R(B|A) = r(g(A), g(B)) - r(g(A), g(A)), \qquad c(B|A) = \frac{1}{2^{\,n_d - n} + 1},$$

where $n_d$ is the number of disguised operations in the cell architecture (i.e., $n_d = \#\mathrm{Diff}(A, B)$), and n is a hyper-parameter that controls the penalty intensity (not to be confused with the operation count n in Eq. (1)). $H(\cdot)$ denotes the entropy of the distribution. Inference. To disguise an architecture A, we directly obtain the disguised architecture $B = f_d(A;\theta_d)$. Specifically, we obtain a probability distribution for each operation in the cell architecture and select the transformation with the maximum probability.
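The sketch below illustrates the policy-gradient (REINFORCE) update and the greedy inference step on a toy problem. It assumes a factorized policy with one softmax per operation slot (matching the per-operation distributions used at inference), treats the logits themselves as the parameters instead of an MLP over GATES embeddings, adds a moving-average reward baseline (a standard variance-reduction trick not mentioned in the paper), and uses a hypothetical reward in place of R(B|A)·c(B|A). The entropy coefficient λ = 0.003 matches Sec. 5.1.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reinforce_step(logits, reward_fn, rng, baseline, lr=0.2, lam=0.003, beta=0.9):
    """One gradient-ascent step (in place) on per-operation logits.

    logits[i][k] scores disguising option k for operation slot i; the policy is
    a product of independent softmaxes, one per slot. Returns the updated
    moving-average reward baseline.
    """
    probs = [softmax(z) for z in logits]
    actions = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
    reward = reward_fn(actions)
    advantage = reward - baseline
    for i, (p, a) in enumerate(zip(probs, actions)):
        h = entropy(p)
        for k, pk in enumerate(p):
            grad_logp = (1.0 if k == a else 0.0) - pk  # d log pi(a) / d logit_k
            grad_ent = -pk * (math.log(pk) + h) if pk > 0.0 else 0.0  # dH / d logit_k
            logits[i][k] += lr * (advantage * grad_logp + lam * grad_ent)
    return beta * baseline + (1.0 - beta) * reward


def greedy_decode(logits):
    """Inference: pick the most probable option for every operation slot."""
    return [max(range(len(z)), key=z.__getitem__) for z in logits]


def toy_reward(actions):
    """Hypothetical stand-in for R(B|A) * c(B|A): fraction of slots choosing option 2."""
    return sum(a == 2 for a in actions) / len(actions)


# Toy run: 3 operation slots with 4 disguising options each.
rng = random.Random(0)
logits = [[0.0] * 4 for _ in range(3)]
baseline = 0.0
for _ in range(3000):
    baseline = reinforce_step(logits, toy_reward, rng, baseline)
print(greedy_decode(logits))
```

Because the toy reward only favors option 2, greedy decoding should settle on option 2 for every slot after training; in DeepGuiser the same update would be driven by the predictor-based reward instead.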

5.1. EXPERIMENT SETTINGS

Benchmark. We evaluate the effectiveness of DeepGuiser on a variety of neural architectures, including popular hand-crafted architectures (ResNet (He et al., 2016), VGG (Simonyan & Zisserman, 2014), and MobileNet-v2 (Sandler et al., 2018)) and randomly picked architectures. We conduct experiments on three classification datasets, CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and Tiny-ImageNet (Le & Yang, 2015), to examine whether the effect of a given architecture disguising on adversarial transferability generalizes across datasets. We use projected gradient descent (PGD) (Madry et al., 2017) with 10 steps and perturbation strength ϵ = 0.031 (8/255) to generate adversarial examples and evaluate the adversarial accuracy. We also use the C&W (Carlini & Wagner, 2017), AutoAttack (Croce & Hein, 2020), and DI-FGSM (Xie et al., 2019) adversarial example generation methods to evaluate generalization across diverse attacks. To evaluate the computational cost, we report the parameter size, FLOPs, and running latency of these architectures. "DeepGuiser-OS" denotes the disguiser trained with the one-shot evaluation given by a weight-sharing super-net. Search Space. We take the DARTS search space (Liu et al., 2018b) as an example to evaluate the effectiveness of our method. Specifically, cells are classified into normal cells and reduction cells. Each cell contains 2 input nodes and 4 intermediate nodes, and every intermediate node receives at most two incoming edges. A total of 9 operation types are involved, as listed in Fig. 2(a). For every constructed architecture, the base channel number is set to 20 and the number of cells to 8. Structure. The length of the embedding vector produced by GATES is set to 128. The regressor MLP structure is 256x64x1, since the embeddings of the two architectures are concatenated for prediction.
The disguiser MLP structure is 128x256x512x144, where 144 = (8 operations in the normal cell + 8 operations in the reduction cell) × 9 possible disguising options. Training Details. The training of DeepGuiser consists of two parts. To train the predictor, we use the constructed dataset TransAdvBench built upon the DARTS search space, which includes 8,082 pairs of neural architectures as training data and 484 pairs as test data (see Appendix A.1 for details). We train the predictor with a learning rate of 0.001, a batch size of 64, 30 epochs, and a weight decay of 0.0005. To train the disguiser, we set the learning rate to 3e-4, the number of iterations to 10^4, the entropy coefficient λ to 0.003, and the penalty controller n to 10.
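The 144-dimensional disguiser output can be read as 16 rows of 9 logits, one row per operation slot. The sketch below decodes it greedily and also evaluates the penalty c(B|A) from Sec. 4.3, read here as 1/(2^(n_d − n) + 1) with n = 10. Masking invalid disguises with the inclusion matrix before the argmax is an assumption of this sketch; the paper does not state how invalid options are excluded.

```python
NUM_SLOTS = 16   # 8 edges in the normal cell + 8 in the reduction cell (Sec. 5.1)
NUM_OPTIONS = 9  # operation types in the search space


def decode_disguiser_output(flat_logits, valid_mask=None):
    """Split the 144-d output into 16 rows of 9 logits and pick, for each slot,
    the highest-scoring option allowed by valid_mask.

    valid_mask[i][k] is True when option k is a valid disguise for slot i
    (masking is an assumption, not something the paper specifies).
    """
    if len(flat_logits) != NUM_SLOTS * NUM_OPTIONS:
        raise ValueError("expected a 144-dimensional disguiser output")
    choices = []
    for i in range(NUM_SLOTS):
        row = flat_logits[i * NUM_OPTIONS:(i + 1) * NUM_OPTIONS]
        allowed = [k for k in range(NUM_OPTIONS) if valid_mask is None or valid_mask[i][k]]
        choices.append(max(allowed, key=lambda k: row[k]))
    return choices


def penalty(n_d, n=10):
    """c(B|A) = 1 / (2 ** (n_d - n) + 1): near 1 for few disguised ops, 0.5 at n_d = n."""
    return 1.0 / (2.0 ** (n_d - n) + 1.0)
```

With n = 10, `penalty` stays close to 1 while few operations are disguised, equals 0.5 at n_d = 10, and falls below 0.001 at n_d = 20, so the multiplicative penalty steers the policy toward disguising roughly n or fewer operations.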

5.2. THE RESULTS OF IMPEDING ADVERSARIAL TRANSFERABILITY

Table 1 summarizes the key metrics for evaluating the effectiveness and efficiency of neural architecture disguising. As can be seen, DeepGuiser finds better disguising policies than the other methods for diverse architectures, defending against architecture extraction attacks. Compared with constructing a surrogate model with the same architecture as the victim model, when an adversary can only obtain the disguised (fake) architecture, the attack success rate drops significantly; e.g., ResNet20 retains 33.55% higher adversarial accuracy when attacked by a surrogate with the disguised ResNet20 architecture. We also evaluate the cost of disguising these architectures. Disguising inevitably introduces extra computation and parameters, yet the experiments show that DeepGuiser outperforms the other methods at an even lower latency cost. Moreover, we find, surprisingly, that the disguising architectures found by DeepGuiser also have significantly lower clean accuracy, which further reduces the adversaries' profit from the attack: the attacker obtains a much worse architecture while the actual architecture remains hidden.

5.3. GENERALIZATION ON DIFFERENT ATTACK METHODS

In this experiment, we evaluate how well the impeded adversarial transferability generalizes to different attack methods. Table 2 summarizes the results. As can be seen, the adversarial transferability from surrogate models to victim models is still impeded under different adversarial example generation methods, though less significantly than under the PGD attack. This suggests that it is unnecessary to build a dataset and retrain the disguiser for every type of attack.
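For reference, the PGD attack used to annotate TransAdvBench and train the disguiser can be sketched in a few lines. This is a minimal ℓ∞ PGD on a hypothetical toy logistic model rather than an NN; the paper specifies 10 steps and ϵ = 8/255, while the step size α = 2/255 is a common choice assumed here, and the random start used by Madry et al. (2017) is omitted. In the transfer setting, the gradients come from the surrogate B and the resulting example is fed to the victim A.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def bce_loss(w, b, z, y):
    """Binary cross-entropy of a toy logistic model f(z) = sigmoid(w.z + b), y in {0, 1}."""
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def input_grad(w, b, z, y):
    """d(BCE)/dz for the logistic model: (sigmoid(w.z + b) - y) * w."""
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return [(p - y) * wi for wi in w]


def pgd_linf(w, b, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L_inf PGD: ascend the loss with signed gradients, then project back
    into the eps-ball around x and the valid pixel range [0, 1]."""
    z = list(x)
    for _ in range(steps):
        g = input_grad(w, b, z, y)
        z = [zi + alpha * ((gi > 0) - (gi < 0)) for zi, gi in zip(z, g)]
        z = [min(max(zi, xi - eps), xi + eps) for zi, xi in zip(z, x)]
        z = [min(max(zi, 0.0), 1.0) for zi in z]
    return z


# Toy transfer attack: craft on surrogate B, evaluate on victim A (hypothetical weights).
w_b, b_b = [0.8, -0.5, 0.3], 0.1
w_a, b_a = [0.7, -0.6, 0.2], 0.0
x, y = [0.5, 0.4, 0.6], 1
x_adv = pgd_linf(w_b, b_b, x, y)
print(bce_loss(w_a, b_a, x, y), bce_loss(w_a, b_a, x_adv, y))
```

Here the victim's loss rises because the two toy models have similarly signed weights; the weaker the alignment between surrogate and victim, the less the perturbation transfers, which is exactly the effect DeepGuiser tries to induce through architecture disguising.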

5.4. GENERALIZATION OF THE DISGUISING POLICY TO OTHER DATASETS

We explore whether the disguising policy trained on CIFAR-10 can be applied to other datasets. We test the adversarial transferability of the disguising architectures to the original architectures on CIFAR-100 and Tiny-ImageNet. As shown in Table 3, the disguised architectures yield results consistent with those on CIFAR-10. The results suggest that the transferability between architectures is largely determined by their architectural characteristics and generalizes across datasets. That is, DeepGuiser can be trained on a proxy dataset such as CIFAR-10 and then applied to other datasets, eliminating the need to collect TransAdvBench for every new dataset.

6. CONCLUSION

In this work, we propose DeepGuiser to automatically disguise neural architectures in order to impede the adversarial transfer attacks that follow neural architecture extraction. DeepGuiser uses a predictor of the adversarial transferability between architectures to learn a disguising policy that transforms the architecture's operations. DeepGuiser converts the trained model into a functionally equivalent "deploy model" with a disguising architecture in a hardware-agnostic, post-training manner: it can be applied regardless of the hardware platform and requires no retraining or fine-tuning of the trained model. Experimental results show that DeepGuiser effectively impedes adversarial transfer attacks following architecture extraction attacks.

A APPENDIX

A.1 TRANSADVBENCH

As described in the main manuscript, TransAdvBench is built for benchmarking the adversarial transferability between neural architectures. Here we present details on the collection, annotation, statistics, and quality of the dataset. Data Source. To make the samples representative and general, we collect diverse neural architectures from the DARTS search space. Every architecture is built from a specific normal cell architecture and a specific reduction cell architecture. Each cell contains 2 input nodes and 4 intermediate nodes, and every intermediate node receives at most two incoming edges. A total of 9 operation types are involved. The base channel number is set to 20, and every architecture stacks eight cells. Data Construction. Every data point contains two architectures: the victim neural architecture (denoted as A) and the surrogate neural architecture (denoted as B) used to generate adversarial examples for the transfer attack. The annotations for every architecture pair include the clean accuracy of A, the clean accuracy of B, and the adversarial accuracy of A under the transfer attack from B. Data Collection. To collect a data point, we fully train the two neural architectures independently. We split the full CIFAR-10 training set of 50,000 images, taking 80% of the data for training and 20% for validation to ensure strict isolation between training and validation data. Every model is fully trained on the training subset and tested on the validation subset. We use projected gradient descent (PGD) with 10 steps and perturbation strength ϵ = 0.031 (8/255) to generate adversarial examples and evaluate the adversarial accuracy. Every model is trained with a batch size of 64, a cosine learning rate schedule with a maximum of 0.05 and a minimum of 0.01, and a total of 50 epochs. Statistics. Here we provide some statistics of the dataset.
Overall, TransAdvBench contains 8,082 data points for training and 484 for testing. The 8,082 training data points come from 5,473 different neural architectures, and the 484 test data points come from 605 different neural architectures. For the training set, we pick 1,117 different architectures as victim architectures, each paired with four surrogate architectures, which yields 1,117 × 4 = 4,468 data points. We then shuffle all architectures and randomly select another 3,614 victim-surrogate pairs to produce the remaining data points. As the main purpose of this work is to identify the transferability between an original architecture and its disguised architectures, we construct the test set by sampling 121 victim architectures and, for each, sampling 4 surrogate architectures based on the disguising rules, giving 121 × 4 = 484 test data points. Note that we strictly ensure that the neural architectures in the training and test sets are disjoint. To the best of our knowledge, this is the first benchmark for evaluating adversarial transferability among diverse neural architectures. Since collecting each data point is expensive (it requires a training-generation-test process), the scale of the dataset is currently limited, and we only use PGD-10 with perturbation strength 0.031 to evaluate the adversarial accuracy. We are working on building a larger dataset with better sampling methods and richer annotations.
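The annotation schema and training recipe of A.1 can be summarized as follows. This is a minimal sketch: the record and field names are hypothetical, and the schedule assumes per-epoch cosine annealing from the stated maximum (0.05) to the stated minimum (0.01) over the 50 epochs, which the text does not spell out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TransAdvRecord:
    """One TransAdvBench data point (field names are hypothetical)."""
    victim_arch: str                      # encoding of victim architecture A
    surrogate_arch: str                   # encoding of surrogate architecture B
    clean_acc_victim: float               # clean accuracy of A
    clean_acc_surrogate: float            # clean accuracy of B
    adv_acc_victim_from_surrogate: float  # accuracy of A on PGD-10 examples crafted on B


def cosine_lr(epoch, total_epochs=50, lr_max=0.05, lr_min=0.01):
    """Cosine annealing from lr_max at epoch 0 down to lr_min at epoch total_epochs."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

Under this reading, the learning rate is 0.05 in the first epoch, 0.03 halfway through, and 0.01 at the end; the 80%/20% split of the 50,000 CIFAR-10 training images gives 40,000 training and 10,000 validation images per model.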

A.2 INSIGHTS FROM TRANSADVBENCH

We provide several insights into the connection between adversarial transferability and neural architectures. Specifically, we attempt to answer two questions: What are the features of an easy-to-disguise neural architecture? What kinds of operation transformations can effectively reduce the adversarial transferability between neural architectures? Since TransAdvBench gives accurate transferability evaluations among thousands of neural architectures, we can draw relatively credible insights from analyzing it. We show the statistics on the average adversarial accuracy increase in Fig. 6 and the statistics on the share of operations in victim architectures in Fig. 7. The statistics involve 1,760 victim and surrogate architectures in TransAdvBench whose operation transformations strictly meet the disguising rules. From these statistics we draw the following insights.

Insight 1: Disguising normal cells leads to a higher average adversarial accuracy increase than disguising reduction cells. From Fig. 6 we notice that the largest average adversarial accuracy increase from transforming an operation in normal cells is 4.32%, while in reduction cells it is 3.01%. Additionally, the operation transformations with the top-5 average adversarial accuracy increases are all in normal cells. One possible reason for this phenomenon is that in the disguising space of our work, a neural architecture is constructed by repeatedly assembling the normal cell architecture and the reduction cell architecture.

Insight 3: Transforming an operation to sep_conv_3x3 yields the largest adversarial accuracy increase. The operation transformations in TransAdvBench are sampled completely at random. From the bottom row of Fig. 6 we can see that transforming an operation to sep_conv_3x3 yields a 3.86% adversarial accuracy increase in normal cells and a 2.57% increase in reduction cells, both exceeding those of the other operations.
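The averaging behind Fig. 6 is easiest to see in code. The Python sketch below assumes each victim/surrogate pair is summarized as a list of `(original_op, new_op)` transformations plus that pair's adversarial accuracy increase. A pair contributes once per transformation type it contains. Unchanged operations are skipped, which is an assumption about how the table treats identity entries. The function name and data layout are hypothetical and do not come from the TransAdvBench release.

```python
from collections import defaultdict


def average_increase(pairs):
    """Fig. 6-style statistics from victim/surrogate pairs.

    pairs: list of (transformations, increase), where `transformations` lists the
    (original_op, new_op) tuples found in one pair and `increase` is that pair's
    adversarial accuracy increase in percentage points.
    Returns (per_transformation, per_target_op) averages.
    """
    total, count = defaultdict(float), defaultdict(int)
    target_total, target_count = defaultdict(float), defaultdict(int)
    for transformations, increase in pairs:
        # A pair counts once per transformation type, however many edges use it;
        # unchanged operations (orig == new) are not treated as transformations.
        present = {(o, n) for o, n in transformations if o != n}
        for key in present:
            total[key] += increase
            count[key] += 1
        # Bottom-row statistic: any other operation transformed to `target`.
        for target in {n for _, n in present}:
            target_total[target] += increase
            target_count[target] += 1
    per_transformation = {k: total[k] / count[k] for k in total}
    per_target = {k: target_total[k] / target_count[k] for k in target_total}
    return per_transformation, per_target


# Caption example: 10 pairs contain skip_connect -> conv_3x3, increases sum to 20%.
pairs = [([("skip_connect", "conv_3x3")], inc) for inc in [1.0, 3.0] * 5]
per_transformation, per_target = average_increase(pairs)
print(per_transformation[("skip_connect", "conv_3x3")])  # 2.0
```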

A.3 TRAINING CURVES

Fig. 8 shows the curves of predictor training based on the MSE loss presented as Equation 5 in the main manuscript. To make the loss values more legible, we multiply them by a scaling factor of 100. As can be seen, the predictor converges fast, with a rapid drop in the loss value and a significant increase in the Kendall's Tau of the predictions. Moreover, the trained predictor generalizes to the test set, achieving good prediction of both adversarial accuracy and ranking quality. The experimental results show that the adversarial accuracy of an architecture A under the adversarial transfer attack from another architecture B is predictable, suggesting that neural architectures have internal characteristics that affect adversarial transferability. Therefore, capturing these characteristics is feasible, so that we can learn an automatic disguiser to discover the least transferable architecture for any victim architecture.
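The two quantities plotted in Fig. 8 can be sketched in a few lines of Python. This assumes Equation 5 is a plain mean squared error between predicted and measured adversarial accuracy; Equation 5 itself is not reproduced in this appendix. The sketch uses Kendall's tau-a, which ignores tied pairs. For reference, `scipy.stats.kendalltau` computes tau-b by default, and tau-b coincides with tau-a when there are no ties. The example values are hypothetical.

```python
from itertools import combinations


def scaled_mse(pred, target, scale=100.0):
    """Mean squared error multiplied by a constant, as done for plotting Fig. 8."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return scale * sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Pairs tied in either sequence count as neither concordant nor discordant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


predicted = [0.1, 0.2, 0.3, 0.4]  # hypothetical predicted adversarial accuracies
measured = [0.1, 0.3, 0.2, 0.4]   # hypothetical ground-truth values
print(round(scaled_mse(predicted, measured), 4))   # 0.5
print(round(kendall_tau(predicted, measured), 3))  # 0.667 (5 concordant, 1 discordant)
```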



Figure 1: (Left) 1: The adversary can snoop the system to extract the architecture of the deployed model. 2&3: The architecture information can be utilized to train a surrogate with high transferability and then craft effective adversarial examples to attack the trained model. (Right) DeepGuiser disguises the trained model to a functionally equivalent deploy model with a disguising architecture. Then, this deploy model is deployed onto the chip. Even if the adversary extracts the disguising architecture through snooping and trains the surrogate model, the adversarial examples crafted using the surrogate have low transferability to the original trained model and also the actual deployed model.

Figure 2: (a) The inclusion matrix of diverse operations in our implementation. The vertical axis represents the original type and the horizontal axis represents the transformed type. In the table, "1" means the corresponding disguising is valid and "0" means it is invalid; (b) an example of a candidate neural architecture; (c) the neural architecture disguised from (b) based on the rules in (a). For example, transforming a 1x1 convolution kernel to 3x3 only requires padding it with a ring of zeros; the two architectures are then functionally equivalent while computationally different.
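The kernel-expansion rule in Fig. 2 can be checked directly. The Python sketch below is single-channel, stride 1, with no bias or dilation. Embedding a k×k kernel in the center of a larger zero kernel, while growing the input zero padding to `kernel_size // 2`, leaves every output value unchanged. The helper names are hypothetical, and multi-channel or separable operations would need the same trick applied per kernel.

```python
def conv2d_same(image, kernel):
    """Single-channel 2-D cross-correlation, stride 1, with zero padding of
    kernel_size // 2 so the output has the input's size (odd kernels only)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:  # outside = zero padding
                        acc += kernel[di][dj] * image[r][c]
            out[i][j] = acc
    return out


def pad_kernel(kernel, target):
    """Embed a k x k kernel in the center of a target x target zero kernel."""
    k = len(kernel)
    off = (target - k) // 2
    big = [[0.0] * target for _ in range(target)]
    for i in range(k):
        for j in range(k):
            big[i + off][j + off] = kernel[i][j]
    return big


image = [[1.0, -2.0, 0.5, 3.0],
         [0.0, 4.0, -1.5, 2.0],
         [2.5, 1.0, 0.0, -0.5],
         [1.5, -3.0, 2.0, 1.0]]
k1 = [[0.75]]
# 1x1 kernel vs. the same weight padded to 3x3: identical outputs.
print(conv2d_same(image, k1) == conv2d_same(image, pad_kernel(k1, 3)))  # True
```

The same helper also covers the conv3x3-to-conv5x5 case discussed in Appendix A.5 by calling `pad_kernel(kernel3, 5)`.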

Figure 3: Adversarial transferability evaluation. Left: one-shot evaluation within super-net versus the ground-truth attack success rate; Right: evaluation with designed predictor versus the ground-truth transfer adversarial accuracy. The Spearman ranking correlation reflects the ranking quality of corresponding estimation.
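The Spearman ranking correlation used in Fig. 3 is the Pearson correlation of the two sequences' ranks, with tied values sharing their average rank. The Python sketch below is a self-contained illustration with hypothetical example values. `scipy.stats.spearmanr` provides the same statistic if SciPy is available. The sketch is undefined (division by zero) when either input is constant.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


estimated = [10, 20, 30, 40]   # hypothetical estimates (e.g., one-shot scores)
ground_truth = [1, 3, 2, 4]    # hypothetical measured values
print(spearman(estimated, ground_truth))  # 0.8
```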

Figure 4: The framework and structure of DeepGuiser. (a) Overall flow of learning DeepGuiser; (b) pros and cons of different evaluation methods; (c) the basic structure of the predictor and the disguiser.

Fig. 5 (Left) shows the statistical distribution of the clean accuracy of the sampled neural architectures. Most of the values lie in the range of 0.9 to 0.97. Fig. 5 (Right) shows the statistical distribution of the adversarial accuracy of the data points. The adversarial accuracy spreads over a wide range, but the distribution is non-uniform and long-tailed, which might bias predictor training. Nevertheless, the training data points cover sufficiently diverse architectures to support generalization and can be used to guide predictor training.

Figure 5: Statistics of TransAdvBench. Left: the distribution of clean accuracy of the sampled neural architectures. Right: the distribution of the adversarial accuracy of the 8,566 (8,082 train data plus 484 test data) pairs of neural architectures.

Figure 6: Average adversarial accuracy increase of each operation transformation. The numbers in the figure are the average adversarial accuracy increase for each operation transformation. For example, assume that in 10 victim and surrogate architecture pairs at least one skip connection is transformed to a convolution kernel of size 3x3, and the total adversarial accuracy increase of the 10 pairs is 20%. Then the skip_connect/conv_3x3 entry in the table should be 20% / 10 = 2%. Left: average adversarial accuracy increase of normal cells; Right: average adversarial accuracy increase of reduction cells. The bottom row indicates the average reward of transforming any other operation to a specific operation. For example, assume that in 10 surrogate architectures at least one sep_conv_3x3 is transformed from other operations, and the total adversarial accuracy increase of the 10 pairs is 20%; then the all operations/sep_conv_3x3 entry in the table should be 2%.

Figure 7: Average share of each operation in victim architectures. We sort the 1,760 victim and surrogate architectures by their adversarial accuracy increase from high to low. The x-axis indicates the number of architectures (counting from the highest) used to compute the average share. The y-axis indicates the average share of an operation in victim architectures. For example, (50, 0.21) on the avg_pool_3x3 line means that 21% of the operations in the victim architectures whose surrogate architectures yield the 50 highest adversarial accuracy increases are avg_pool_3x3. Left: average share in normal cells; Right: average share in reduction cells.
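The curve in Fig. 7 can be read as "sort pairs by increase, keep the top k, and pool the victims' operations." The Python sketch below assumes each record holds the victim's operation list for one cell type plus the pair's adversarial accuracy increase. Shares are pooled over all operations in the top-k victims, which equals the mean per-architecture share when every cell has the same number of operations. The function name and record layout are hypothetical.

```python
from collections import Counter


def top_k_share(pairs, k):
    """Share of each operation among the victim operations of the top-k pairs.

    pairs: list of (victim_ops, increase), where victim_ops lists the operation
    names of one cell type and increase is the pair's adversarial accuracy increase.
    """
    # Python's sort is stable, so ties keep their input order.
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    counts = Counter()
    total = 0
    for ops, _ in ranked:
        counts.update(ops)
        total += len(ops)
    return {op: c / total for op, c in counts.items()}


pairs = [
    (["avg_pool_3x3", "skip_connect"], 5.0),
    (["avg_pool_3x3", "avg_pool_3x3"], 3.0),
    (["sep_conv_3x3", "skip_connect"], 1.0),
]
print(top_k_share(pairs, 2))  # {'avg_pool_3x3': 0.75, 'skip_connect': 0.25}
```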

Figure 10: Probability of operation transformations. 1,000 different architectures are sampled to count the probability. Left: probability in normal cells; Right: probability in reduction cells. Zeros mean that the corresponding disguising is not allowed by the computational rules.
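Counting transformation probabilities from sampled architectures, as described for Fig. 10, amounts to estimating P(disguised op | original op) from edge-aligned pairs of cells. The Python sketch below assumes each sample gives the original and disguised operation lists of one cell type, aligned edge by edge. The operation labels and sample data are hypothetical, and in practice the samples would come from the trained disguiser.

```python
from collections import Counter, defaultdict


def transformation_probabilities(samples):
    """Estimate P(disguised op | original op) by counting over sampled architectures.

    samples: iterable of (original_ops, disguised_ops), two equal-length lists
    aligned edge by edge for one cell type (normal or reduction).
    """
    counts = defaultdict(Counter)
    for original, disguised in samples:
        if len(original) != len(disguised):
            raise ValueError("original and disguised cells must have the same edges")
        for orig_op, new_op in zip(original, disguised):
            counts[orig_op][new_op] += 1
    probs = {}
    for orig_op, row in counts.items():
        n = sum(row.values())
        probs[orig_op] = {new_op: c / n for new_op, c in row.items()}
    return probs


samples = [  # hypothetical disguiser samples
    (["conv_3x3", "skip_connect"], ["conv_5x5", "conv_3x3"]),
    (["conv_3x3", "skip_connect"], ["conv_5x5", "skip_connect"]),
    (["conv_3x3", "conv_1x1"], ["conv_5x5", "conv_3x3"]),
    (["conv_3x3", "conv_1x1"], ["conv_3x3", "conv_1x1"]),
]
probs = transformation_probabilities(samples)
print(probs["conv_3x3"]["conv_5x5"])  # 0.75
print(probs["skip_connect"])          # {'conv_3x3': 0.5, 'skip_connect': 0.5}
```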

Figure 13: Graph view of sampled neural architectures (III).



Although this process gives accurate transferability evaluation, all of its steps (especially training) are time-consuming. The weight-sharing mechanism

has approximately $(6^{6} \times 2 \times 2)^{2} > 10^{10}$ possible disguised architectures within the DARTS (Liu et al., 2018b) search space. Denoting by $p(\cdot|A)$ the probability distribution of sampling an architecture $B$ from the disguising space of $A$, given the original architecture $A$, we aim to find the disguising architecture distribution $p(\cdot|A)$ that maximizes $\mathbb{E}_{B\sim p(\cdot|A)}[R(B|A)]$.
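One standard way to maximize an expectation over a discrete distribution such as $\mathbb{E}_{B\sim p(\cdot|A)}[R(B|A)]$ is the REINFORCE (score-function) gradient. The Python sketch below applies it to a single categorical choice, for example the disguise of one operation. The reward table is hypothetical and stands in for R(B|A); in DeepGuiser that signal would come from the transferability evaluation, e.g., the learned predictor. This is an illustration of the objective, not a claim about DeepGuiser's exact training procedure.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(reward, n_options, steps=3000, lr=0.1, seed=0):
    """Maximise E_{b ~ softmax(theta)}[reward(b)] for one categorical choice
    with the REINFORCE (score-function) gradient and a moving-average baseline."""
    rng = random.Random(seed)
    theta = [0.0] * n_options
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        b = rng.choices(range(n_options), weights=probs)[0]
        r = reward(b)
        advantage = r - baseline  # baseline excludes the current sample
        baseline = 0.9 * baseline + 0.1 * r
        for k in range(n_options):
            # d log softmax(theta)[b] / d theta[k] = 1[k == b] - probs[k]
            grad_log_p = (1.0 if k == b else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_p
    return softmax(theta)


# Hypothetical rewards for four candidate disguises of one operation.
REWARDS = [0.0, 0.5, 1.0, 0.2]
final_probs = reinforce(lambda b: REWARDS[b], n_options=len(REWARDS))
print(max(range(len(REWARDS)), key=lambda k: final_probs[k]))
```

The moving-average baseline only reduces the variance of the gradient estimate. Because each step's baseline is computed before the current reward is folded in, it does not bias the estimate.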

Results of neural architecture disguising by different methods on CIFAR-10. Acc adv means the adversarial accuracy of the original model under black-box transfer attack from the disguised models. The Random-Arch and Random disguising results are averages over 20 randomly sampled architectures. The latency is measured on an NVIDIA GeForce RTX 3090 by averaging over 1,000 forward inferences. "↑" means higher is better and "↓" means lower is better.

Results of neural architecture disguising under different attack methods. All numbers are the adversarial accuracy of the original model under black-box transfer attack from the disguised models.

Results of neural architecture disguising by different methods on CIFAR-100 and Tiny-ImageNet. Acc adv means the adversarial accuracy of the original model under black-box transfer attack from the disguised models. "↑" means higher is better and "↓" means lower is better.

Table columns: Acc clean (%) ↓, Acc adv (%) ↑, Latency (ms) ↓, Acc clean (%) ↓, Acc adv (%) ↑.

A.4 ABLATION STUDY ON THE ARCHITECTURE ENCODING SCHEMES

We further compare the performance of different architecture encoding schemes, namely GATES and the GCN encoder used by NAT. As shown in Fig. 9, when GCN is used to generate the architecture embedding, the predictor tends to overfit the training set, with a much higher MSE loss and a much lower Kendall's Tau on the test set than GATES-based encoding. The results demonstrate the superiority of GATES as the architecture encoder: it models neural architectures better and helps the predictor obtain better ranking quality and lower prediction error.

A.5 DISGUISING PROBABILITY STATISTICS

We show the statistics on the probability of operation transformations in Fig. 10. Several insights can be drawn. First, DeepGuiser tends to disguise convolutional layers with expanded kernels, e.g., conv3x3 is disguised as conv5x5 with a high probability of 87%, and conv1x1 is disguised as conv3x3 or conv5x5 with a probability of 77%. Second, changing the network topology is expected to achieve a higher reward, as skip connections are disguised as other operations with a probability of 92% and null operations are changed with a probability of 66%. Overall, the probability distributions

