GINN: Fast GPU-TEE-Based Integrity for Neural Network Training

Abstract

Machine learning models based on Deep Neural Networks (DNNs) are increasingly deployed in a wide range of applications, from self-driving cars to COVID-19 treatment discovery. To supply the computational power necessary to train a DNN, cloud environments with dedicated hardware support have emerged as critical infrastructure. However, outsourcing computation raises many integrity challenges. Various approaches have been developed to address these challenges, building on trusted execution environments (TEEs). Yet, no existing approach scales to realistic integrity-preserving DNN model training for heavy workloads (deep architectures and millions of training examples) without sustaining a significant performance hit. To close the performance gap between pure TEE execution (full integrity) and pure GPU execution (no integrity), we combine random verification of selected computation steps with systematic adjustments of DNN hyperparameters (e.g., a narrow gradient clipping range), thereby limiting the attacker's ability to significantly shift the model parameters in any training step that is not selected for verification. Experimental results show that the new approach achieves a 2X to 20X performance improvement over a pure TEE-based solution while guaranteeing a very high probability of integrity (e.g., 0.999) with respect to state-of-the-art DNN backdoor attacks.
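The interplay between random verification and gradient clipping can be illustrated with a back-of-the-envelope model (a simplification for intuition, not the paper's exact analysis): if the TEE independently re-executes each training step with probability p, an attacker who must tamper with k steps to implant a backdoor evades detection only if none of those k steps is selected, while a narrow clipping range bounds how far any single unverified step can move the parameters, forcing k to be large.

```python
def detection_prob(p_check: float, k_tampered: int) -> float:
    """P(at least one of k tampered steps is re-executed in the TEE),
    assuming each step is selected independently with probability p_check."""
    return 1.0 - (1.0 - p_check) ** k_tampered

def max_drift(lr: float, clip_norm: float, k_steps: int) -> float:
    """Upper bound on parameter drift over k clipped SGD steps:
    each step moves the parameters by at most lr * clip_norm."""
    return lr * clip_norm * k_steps

# Verifying only 10% of steps catches an attack spanning 50 steps
# with high probability.
print(round(detection_prob(0.10, 50), 3))             # 0.995
# A narrow clipping range bounds what those 50 steps could change.
print(max_drift(lr=0.01, clip_norm=1.0, k_steps=50))  # 0.5
```

The function names and parameter values here are illustrative, not taken from GINN itself; they only show why a modest verification rate suffices once clipping caps the per-step damage.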

1. INTRODUCTION

Every day, Deep Learning (DL) is incorporated into new aspects of society. As a result, numerous industries increasingly rely on DL models to make decisions, in domains ranging from computer vision to natural language processing. Training these DL models requires a substantial amount of computational resources (often in a distributed fashion), a demand that traditional CPUs cannot fulfill. Hence, special hardware with massive parallel computing capabilities, such as GPUs, is often utilized (Shi et al., 2016). At the same time, the DL model building process is increasingly outsourced to the cloud. This is natural, as using cloud services (e.g., Amazon EC2, Microsoft Azure, or Google Cloud) for DL training can be more fiscally palatable for companies, enabling them to focus on the software aspect of their products. Nevertheless, such outsourcing raises numerous concerns with respect to the privacy and integrity of the learned models. In recognition of the privacy and integrity concerns around DL (and Machine Learning (ML) in general), a considerable amount of research has been dedicated to applied cryptography, in three general areas: 1) Multi-Party Computation (MPC) (e.g., Mohassel & Zhang, 2017), 2) Homomorphic Encryption (HE) (e.g., Gilad-Bachrach et al., 2016), and 3) Trusted Execution Environments (TEEs) (e.g., Hunt et al., 2018; Hynes et al., 2018). However, the majority of these investigations are limited in that: 1) they are only applicable to simple, shallow network models, 2) they are evaluated on datasets with a small number of records (such as MNIST (LeCun & Cortes, 2010) and CIFAR10 (Krizhevsky et al.)), and 3) they incur a substantial amount of overhead that is unacceptable for real-life DL training workloads. In an effort to mitigate some of these problems and securely move from CPUs to GPUs, Slalom (Tramèr & Boneh, 2019) mainly focuses on computational integrity at the test (inference) phase; depending on the application context, it can also support enhanced data privacy, but at a much greater performance cost.

To address these limitations, we introduce GINN (see Figure 1), a framework for integrity-preserving learning as a service that provides integrity guarantees for outsourced DL model training in TEEs. We assume that only the TEE running in the cloud is trusted, and all the other resources are untrusted.

