SPENCNN: ORCHESTRATING ENCODING AND SPARSITY FOR FAST HOMOMORPHICALLY ENCRYPTED NEURAL NETWORK INFERENCE

Abstract

Homomorphic Encryption (HE) is a promising technology for protecting users' data privacy in Machine Learning as a Service (MLaaS) on public clouds. However, the computational overhead of HE operations, which can be orders of magnitude slower than their plaintext counterparts, can lead to extremely high latency in neural network inference, seriously hindering its application in practice. While extensive neural network optimization techniques, such as sparsification and pruning, have been proposed for the plaintext domain, they cannot address this problem effectively. In this paper, we propose an HE-based CNN inference framework, SpENCNN, that can effectively exploit the single-instruction-multiple-data (SIMD) feature of the HE scheme to reduce CNN inference latency. In particular, we first develop an HE-group convolution technique that partitions channels among different groups based on the data size and ciphertext size, and then encodes them into the same ciphertext in an interleaved manner, so as to dramatically reduce the bottleneck operations in HE convolution. We further develop a sub-block weight pruning technique that can further reduce the number of costly HE operations in CNN convolutions. Our experimental results show that the SpENCNN-optimized CNN models can achieve overall speedups of 8.37x, 12.11x, and 19.26x for LeNet, VGG-5, and HEFNet, respectively, with negligible accuracy loss.

1. INTRODUCTION

Over the past decade, we have witnessed tremendous progress in machine learning technology and its great success in practical applications. Convolutional Neural Network (CNN) models, for example, have been widely used for many cognitive tasks such as face recognition, medical imaging, and human action recognition. Meanwhile, there is growing interest in deploying machine learning models on the cloud as a service (MLaaS). While cloud computing is well recognized as an attractive solution, especially for computation-intensive applications such as MLaaS, outsourcing sensitive data and data processing to the cloud can pose a severe threat to users' privacy. Homomorphic Encryption (HE) is a promising technology for protecting users' privacy when deploying MLaaS on the cloud. HE allows computations to be performed on encrypted inputs such that the decrypted output matches the result computed from the original inputs. Thus, a client can encrypt sensitive data locally and send the resulting ciphertexts to the cloud. All intermediate results remain encrypted, and the encrypted results returned by the cloud can be correctly decrypted using the secret key held by the client. While HE can effectively maintain the confidentiality of computation on the cloud, one major problem that has to be dealt with is the excessive computational cost of operations over encrypted data: HE operations (e.g., multiplications and additions on encrypted data) can be several (i.e., three to seven) orders of magnitude slower than the corresponding operations on plaintexts. This tremendous computational cost has been the largest bottleneck hindering the application of HE on the cloud. One of the most effective approaches (e.g., Gilad-Bachrach et al., 2016; Brutzkus et al., 2019; Dathathri et al., 2019; Kim et al., 2022) to reducing the HE computational cost is to take advantage of the single-instruction-multiple-data (SIMD) capability supported by HE schemes such as CKKS and BFV. Smart & Vercauteren (2010) initially proposed to pack multiple data elements in the plaintext

