MODEL OBFUSCATION FOR SECURING DEPLOYED NEURAL NETWORKS

Abstract

More and more edge devices and mobile apps are leveraging deep learning (DL) capabilities. Deploying such models on devices, referred to as on-device models, rather than as remote cloud-hosted services has gained popularity because it avoids transmitting users' data off the device and offers low response latency. However, on-device models can be easily attacked: the corresponding apps can be unpacked, fully exposing the model to attackers. Recent studies show that adversaries can easily generate white-box-like attacks against an on-device model or even invert its training data. To protect on-device models from white-box attacks, we propose a novel technique called model obfuscation. Specifically, model obfuscation hides and obfuscates the key information of models, namely their structure, parameters, and attributes, through renaming, parameter encapsulation, neural structure obfuscation, shortcut injection, and extra layer injection. We have developed a prototype tool, ModelObfuscator, to automatically obfuscate on-device TFLite models. Our experiments show that the proposed approach can dramatically improve model security by significantly increasing the difficulty of extracting a model's inner information, without increasing the latency of DL models. Our proposed on-device model obfuscation has the potential to be a fundamental technique for on-device model deployment. Our prototype tool is publicly available at https://github.com/AnonymousAuthor000/Code2536.

1. INTRODUCTION

Numerous edge and mobile devices are leveraging deep learning (DL) capabilities. Though DL models can be deployed on a cloud platform, data transmission between mobile devices and the cloud may compromise user privacy and suffer from severe latency and throughput issues. To achieve a high level of security, users' personal data should not be sent outside the device. To achieve high throughput and short response times, especially for a large number of devices, on-device DL models are needed. The capabilities of newer mobile devices and some edge devices keep increasing, with more powerful systems on a chip (SoCs) and larger amounts of memory, making them suitable for running on-device models. Indeed, many intelligent applications have already been deployed on devices (Xu et al., 2019) and have benefited millions of users. Unfortunately, it has been shown that on-device DL models can be easily extracted. The extracted model can then be used to mount many kinds of attacks, such as adversarial attacks, membership inference attacks, and model inversion attacks (Szegedy et al., 2013; Chen et al., 2017b; Shokri et al., 2017; Fang et al., 2020). A deployed DL model can be extracted by three kinds of attacks: (1) extracting the model's weights through queries (Tramèr et al., 2016); (2) extracting the entire model from devices using software analysis (Vallée-Rai et al., 2010) or reverse engineering (Li et al., 2021b); (3) extracting the model's architecture via side-channel attacks (Li et al., 2021a). Based on our observations, existing defense methods can be categorized into two levels: (1) the algorithm level and (2) the side-channel level. For securing AI models at the algorithm level, some studies (Orekondy et al., 2019b; Kariyappa & Qureshi, 2020; Mazeika et al., 2022) propose methods to degrade the effectiveness of query-based model extraction.
Other studies (Xu et al., 2018; Szentannai et al., 2019; 2020) propose methods to train a simulating model, which has performance similar to the original model but is more resilient to extraction attacks. For securing AI models at the side-channel level, a recent work modifies CPU and memory usage patterns to resist model extraction attacks (Li et al., 2021a). Although many attacks have been proposed to extract DL models, it is hard for adversaries to precisely reconstruct DL models identical to the original ones using queries or side-channel information. These attacks cannot access the inner information of the model, which makes them black-box attacks. In contrast, since on-device models are delivered in mobile apps and hosted on mobile devices, adversaries can easily unpack the mobile apps to extract the original models for exploitation. This enables serious intellectual property leakage, and adversaries can further generate white-box attacks, which are much more effective than black-box attacks (Zhang et al., 2022). Although model extraction using software analysis may lead to severe consequences, to the best of our knowledge, the community has not yet recognized this attack, and no effective defense method has been proposed against it. In this paper, we propose a novel model protection approach based on model obfuscation, which focuses on improving AI safety by resisting model extraction via software analysis. Given a trained model and its underlying DL library (e.g., PyTorch, TensorFlow, or TFLite), an end-to-end prototype tool, ModelObfuscator, is developed to generate the obfuscated on-device model and a corresponding DL library. Specifically, ModelObfuscator first extracts the information of the target model and locates the source code in the library used by its layers. Then, it obfuscates the model's information and builds a customized DL library that is compatible with the obfuscated model.
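As a rough illustration of this two-step pipeline, the toy sketch below renames a model's ops and builds a matching lookup-based "library" that can still execute it. The list-of-tuples model representation, the toy kernels, and every function name here are our own illustrative assumptions, not ModelObfuscator's actual interface:

```python
import random

# Toy stand-in for a parsed on-device model: a list of (name, op) layers.
# A real extracted TFLite file exposes comparable structure to an attacker.
KERNELS = {"conv2d": lambda x: x * 2.0, "relu": lambda x: max(x, 0.0)}

def obfuscate(layers, seed=0):
    """Replace meaningful op names with opaque aliases (renaming); the
    alias -> real-kernel mapping stays private with the model owner."""
    rng = random.Random(seed)
    obfuscated, secret_map = [], {}
    for _name, op in layers:
        alias = f"custom_{rng.randrange(16**8):08x}"
        secret_map[alias] = op
        obfuscated.append((alias, alias))  # shipped model shows only aliases
    return obfuscated, secret_map

def build_custom_library(secret_map):
    """The customized runtime resolves aliases back to real kernels; the
    mapping is compiled into the library, never into the model file."""
    return {alias: KERNELS[op] for alias, op in secret_map.items()}

def run(layers, library, x):
    for _, op in layers:
        x = library[op](x)
    return x
```

With `layers = [("c1", "conv2d"), ("a1", "relu")]`, the obfuscated model computes the same function as the original while exposing only random identifiers to anyone who unpacks the app.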
To achieve this, we design five obfuscation methods: (1) renaming, (2) parameter encapsulation, (3) neural structure obfuscation, (4) random shortcut injection, and (5) random extra layer injection. These obfuscation methods significantly increase the difficulty of parsing the model's information and prevent adversaries from reconstructing the model. In addition, it is hard for adversaries to transfer the trained weights and structure of obfuscated models to steal intellectual property via model conversion, because the mapping between the obfuscated information and the original information is randomly generated. Experiments on 10 different models show that ModelObfuscator can defend against state-of-the-art model parsing and attack tools with negligible time overhead and 20% storage overhead. Our contributions in this work include:
• We propose a model obfuscation framework that hides the key information of deployed DL models at the software level. It can prevent adversaries from generating white-box attacks and stealing the knowledge of on-device models.
• We design five obfuscation strategies for protecting on-device models and provide an end-to-end prototype tool, ModelObfuscator, which automatically obfuscates a model and builds a compatible DL library. The tool is open source.
• We provide a taxonomy and comparison of different obfuscation methods in terms of effectiveness and overhead, to guide model owners in choosing appropriate defense strategies.
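For intuition, two of these strategies, random extra layer injection and random shortcut injection, can be sketched on a toy functional model. The representation and helper names below are hypothetical, chosen only to show that such obfuscations can enlarge the visible architecture while provably preserving the computed function:

```python
import random

def inject_extra_layer(layers, seed=0):
    """(5) Random extra layer injection: insert an identity op at a random
    position, so the visible architecture grows but outputs are unchanged."""
    rng = random.Random(seed)
    pos = rng.randrange(len(layers) + 1)
    layers.insert(pos, ("obf_extra", lambda x: x + 0.0))
    return layers

def inject_shortcut(layers, seed=0):
    """(4) Random shortcut injection: add a skip path scaled by zero, so the
    graph gains a spurious residual edge without changing the function."""
    rng = random.Random(seed)
    i = rng.randrange(len(layers))
    name, fn = layers[i]
    layers[i] = (name + "_res", lambda x, f=fn: f(x) + 0.0 * x)
    return layers

def run(layers, x):
    """Evaluate the toy model by applying its layers in order."""
    for _, fn in layers:
        x = fn(x)
    return x
```

For `layers = [("scale", lambda x: 2 * x), ("shift", lambda x: x + 3)]`, applying either injection leaves `run(layers, 5)` unchanged at 13, while the layer list an attacker can parse is longer and contains fake shortcut structure.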

2. RELATED WORK

Model Extraction Attacks and Defenses: For model extraction attacks, adversaries can effectively extract a model in a black-box setting. They can query the target model with collected samples to reconstruct it (Tramèr et al., 2016; Papernot et al., 2017; Orekondy et al., 2019a; He et al., 2021; Rakin et al., 2022), or use synthetic samples to steal the information of target models (Zhou et al., 2020; Kariyappa et al., 2021; Yuan et al., 2022; Sanyal et al., 2022). To defend against model extraction attacks, various methods (Orekondy et al., 2019b; Kariyappa & Qureshi, 2020; Mazeika et al., 2022) have been proposed to degrade the performance of such attacks. Some methods (Szentannai et al., 2019; 2020) train a simulating model, which has performance similar to the original model but reduces the effectiveness of attacks. In addition, watermarking is also a promising method to defend against model extraction (Yang et al., 2019; Fan et al., 2019; Lukas et al., 2019). Adversarial Machine Learning: Currently, adversaries can use many kinds of attacks to challenge the reliability of DL models, such as adversarial attacks, membership inference attacks, model stealing attacks, and model inversion attacks. Depending on the knowledge required by the adversary, adversarial attacks can be categorized into white-box attacks such as gradient-based attacks (Croce & Hein, 2020; Goodfellow et al., 2015; Kurakin et al., 2017; Papernot et al., 2016; Moosavi-Dezfooli et al., 2016; Madry et al., 2018; Moosavi-Dezfooli et al., 2017), and

