HYPHEN: A HYBRID PACKING METHOD AND OPTIMIZATIONS FOR HOMOMORPHIC ENCRYPTION-BASED NEURAL NETWORKS

Abstract

Private Inference (PI) enables users to enjoy secure AI inference services while companies comply with regulations. Fully Homomorphic Encryption (FHE) based Convolutional Neural Network (CNN) inference is promising because users can offload the whole computation process to the server while protecting the privacy of sensitive data. Recent advances in AI research have enabled HE-friendly deep CNNs such as ResNet. However, FHE-based CNN (HCNN) inference suffers from high computational overhead. Prior HCNN approaches rely on dense packing techniques that aggregate as many channels as possible into a ciphertext to reduce element-wise operations such as multiplication and bootstrapping. However, these approaches require an excessive number of homomorphic rotations to accumulate channels and maintain a dense data organization, and these rotations take up most of the runtime. To overcome this limitation, we present HyPHEN, a deep HCNN implementation that drastically reduces the number of homomorphic rotations. HyPHEN leverages two convolution algorithms, CAConv and RAConv; alternating between the two leads to a significant reduction in rotation count. Furthermore, we propose a hybrid gap packing method for HyPHEN, which gathers sparse convolution results into a dense data organization with only a marginal increase in the number of rotations. HyPHEN explores the trade-off between the computational costs of rotations and other operations, and finds the point that minimizes execution time. With these optimizations, HyPHEN takes 3.4-4.4× less execution time than the state-of-the-art HCNN implementation and brings the runtime of ResNet inference on CIFAR-10 down to 1.44-13.37s using a GPU-accelerated HEAAN library.

1. INTRODUCTION

Private inference (PI) has recently gained the spotlight in the MLaaS domain as cloud companies must comply with privacy regulations such as GDPR Regulation (2016) and HIPAA Act (1996). PI enables inference services at the cloud server while protecting the privacy of the client and the intellectual property of the service provider. For instance, hospitals can provide private medical diagnosis of diseases, and security companies can provide private surveillance systems, all without accessing clients' sensitive data (Kumar et al., 2020; Bowditch et al., 2020). PI can be achieved using various cryptographic primitives (Gentry, 2009; Yao, 1982; Costan & Devadas, 2016). Fully Homomorphic Encryption (FHE), a set of cryptographic schemes that can directly evaluate a rich set of functions on encrypted data, is especially suited for PI. An FHE-based PI solution uniquely features 1) full offloading of the computation process to the server, 2) succinct data communication requirements, and 3) non-disclosure of any information about the model except the inference result. Such benefits have driven researchers to investigate convolutional neural network (CNN) PI implementations using FHE (Gilad-Bachrach et al., 2016; Brutzkus et al., 2019; Dathathri et al., 2020; Lee et al., 2022a; Aharoni et al., 2020). To implement CNN using FHE, activation functions must be replaced with polynomials, as FHE only supports the arithmetic operations of addition and multiplication. Given this constraint, two classes of polynomial activation functions have been proposed: (i) low-degree polynomials (Gilad-Bachrach et al., 2016; Chabanne et al., 2017) that replace the activation functions during training, and (ii) more precise high-degree approximations of ReLU (Lee et al., 2021) that replace ReLU during PI without additional training.
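The trade-off between the two classes of polynomial activations can be illustrated with a small fitting experiment. This is an illustrative sketch, not an experiment from the paper: the input range, degrees, and least-squares fitting method are all assumptions chosen for demonstration.

```python
import numpy as np

# FHE evaluates only additions and multiplications, so ReLU must be replaced
# by a polynomial. We fit polynomials of two degrees to ReLU on an assumed
# input range [-5, 5] and compare their fit quality.
xs = np.linspace(-5, 5, 1001)
relu = np.maximum(xs, 0.0)

def poly_fit_sse(degree):
    """Least-squares polynomial fit to ReLU; returns the sum of squared errors."""
    coeffs = np.polyfit(xs, relu, degree)
    return float(np.sum((np.polyval(coeffs, xs) - relu) ** 2))

sse_low = poly_fit_sse(2)    # low-degree: cheap to evaluate, imprecise
sse_high = poly_fit_sse(15)  # high-degree: precise, but deep multiplication chains
```

A higher-degree fit is never worse in least-squares error, which is why high-degree approximations are more precise, but each extra degree costs multiplicative depth (and hence levels and bootstrapping) under FHE.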
However, both approaches lack practicality; low-degree polynomials are not applicable to deep neural networks, and high-degree approximation significantly degrades the runtime of PI. Recently, Park et al. (2022) showed that deep homomorphic CNNs (HCNNs) can be trained with low-degree polynomials even for complex image datasets with their proposal, AESPA, which utilizes orthogonal polynomial bases and fuses activation functions with batch normalization (BN) to turn them into second-degree polynomials. AESPA sacrifices neither runtime nor accuracy, unlike prior approaches, so we employ it in our work. Another line of research lies in implementing an efficient convolution algorithm in FHE. Gazelle (Juvekar et al., 2018) proposed a convolution algorithm that can compute a single Conv layer on FHE. However, Gazelle's method cannot be directly applied to consecutive convolutions, as it requires adjusting the arrangement of data by re-encrypting ciphertexts after every Conv layer. Lee et al. (2022a) modified Gazelle's convolution by densely mapping data into a ciphertext before entering the next Conv layer. However, the current state of HCNN is far from practical. Using the convolution algorithm of Lee et al. (2022a) and approximated ReLU, the inference time of ResNet20 on CIFAR-10 is 1662s with a single thread (174s with 64 threads) in our CPU environment. Despite the unique advantages of FHE-based PI, this huge runtime overhead prevents FHE from being the go-to solution for PI.

We propose Hybrid Packing method and optimizations for Homomorphic Encryption-based neural Network (HyPHEN), which mitigates the huge overhead of HCNN with an optimized convolution algorithm and packing method. We observe that after AESPA is applied, rotation operations in HCNN take up the majority of the runtime (see Appendix A), and most of the rotations (92-99%) are spent to implement the sum of channels within the same ciphertext and to maintain data organization.
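To see why channel accumulation dominates the rotation count, consider the standard rotate-and-sum idiom used in packed homomorphic convolution. The sketch below models it on plain Python lists rather than ciphertexts (a toy model, not the paper's code): summing c channel blocks packed in one ciphertext costs about log2(c) rotations.

```python
def rot(v, r):
    """Stand-in for HE.Rotate: cyclic left shift of the slot vector by r."""
    r %= len(v)
    return v[r:] + v[:r]

def rotate_and_sum(slots, block, channels):
    """Accumulate `channels` partial-sum blocks of size `block` packed in one
    slot vector. Each iteration here would be one homomorphic rotation."""
    rotations = 0
    step = block
    while step < block * channels:
        slots = [x + y for x, y in zip(slots, rot(slots, step))]
        rotations += 1
        step *= 2
    return slots, rotations

# 4 channels, 2 slots per channel: partial sums [1,2], [3,4], [5,6], [7,8].
slots, nrot = rotate_and_sum([1, 2, 3, 4, 5, 6, 7, 8], block=2, channels=4)
# The first block now holds the channel-wise sum (16, 20) after
# log2(4) = 2 rotations.
```

With hundreds of channels per ciphertext, these accumulation rotations multiply across every Conv layer, which is the cost that motivates a rotation-free accumulation strategy.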
We design a novel convolution algorithm named RAConv that does not require rotations to accumulate channels. In addition, based on the observation that maintaining a single data organization necessitates a massive number of unnecessary rotations, we design RAConv to adopt a new data organization based on replication of the images. By alternating between the two data organizations, we remove the rotations previously required to adjust the data organization. HyPHEN also includes a novel Hybrid Packing (HP) method that effectively handles the gap arising from strided convolution (Section 3.2). HyPHEN achieves runtimes of 39.6s on CPU and 1.44s on GPU for ResNet20 on the CIFAR-10 dataset. The key contributions of the paper are as follows:
• We propose a replication-based convolution method, RAConv, that effectively reduces the two types of unnecessary rotations that are the major bottleneck in HCNN.
• We propose a novel hybrid packing (HP) method that can utilize the entire slots of a ciphertext with only a marginal increase in the number of rotations.
• Our experiments show that our HCNN implementation with HyPHEN improves inference latency by 3.4-4.4× over prior state-of-the-art HCNNs for ResNet on CIFAR-10.
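As a schematic of the gap problem that packing must address, consider the generic compaction idiom below. This is an illustrative sketch on plain lists, not the paper's HP algorithm: a stride-2 convolution leaves valid values only in every other slot, and two such sparse vectors can be merged into one dense vector with a single rotation and addition.

```python
def rot(v, r):
    """Stand-in for HE.Rotate: cyclic left shift by r (negative r shifts right)."""
    r %= len(v)
    return v[r:] + v[:r]

# Two sparse results from a strided convolution: valid values in even slots,
# zeros ("gaps") in odd slots.
a = [10, 0, 11, 0]
b = [20, 0, 21, 0]

# One rotation aligns b's values with a's gaps; one addition merges them,
# restoring full slot utilization: [10, 20, 11, 21].
dense = [x + y for x, y in zip(a, rot(b, -1))]
```

Compacting many sparse ciphertexts this way trades a small number of extra rotations for full slot utilization in subsequent layers, which is the trade-off HyPHEN tunes.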

2. BACKGROUND

2.1 FULLY HOMOMORPHIC ENCRYPTION

FHE is a set of public-key encryption schemes that can perform computation on encrypted data. Among several popular FHE schemes, RNS-CKKS (Cheon et al., 2018) has been broadly adopted in the PI domain as it supports fixed-point numbers and slot batching. A plaintext in RNS-CKKS is an unencrypted degree-N polynomial in a cyclotomic polynomial ring, R_Q = Z_Q[X]/(X^N + 1). A plaintext maps to a message, which is a vector of N/2 real (or complex) numbers; thus a single plaintext batches N/2 slots. CKKS encrypts a plaintext into a ciphertext in R_Q^2. Q is a ring modulus represented as a product of prime moduli obtained via the Chinese Remainder Theorem (CRT), Q_l = ∏_{i=0}^{l} q_i (1 ≤ l ≤ L), where L and l denote the initial and current level of a ciphertext. The level is an HE-specific resource that determines the number of multiplications applicable to a given ciphertext; we denote the ring modulus at a given level with a subscript, as in Q_L or Q_l. We denote the plaintext and ciphertext of a message a as ⟨a⟩ and [a], respectively. The HE operations of addition, multiplication, and rotation can be described as follows:

• HE.Eval([a], [b], f) = HE.Eval([a], ⟨b⟩, f) = [f(a, b)], where f is slot-wise addition or multiplication
• HE.Rotate([a], r) = [rot(a, r)], where rot(a, r) cyclically shifts the slots of a by r positions
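These operation semantics can be mirrored on unencrypted message vectors. The sketch below is a plain-Python model of slot semantics only; real RNS-CKKS operations additionally manage noise, rescaling, and levels, all of which this toy ignores.

```python
def he_add(a, b):
    """Mirrors HE.Eval([a], [b], Add): slot-wise addition over the message."""
    return [x + y for x, y in zip(a, b)]

def he_mult(a, b):
    """Mirrors HE.Eval([a], [b], Mult): slot-wise multiplication.
    (In real CKKS this also consumes one level of the ciphertext.)"""
    return [x * y for x, y in zip(a, b)]

def he_rotate(a, r):
    """Mirrors HE.Rotate([a], r): cyclic shift of the slot vector by r."""
    r %= len(a)
    return a[r:] + a[:r]

msg = [1.0, 2.0, 3.0, 4.0]
added = he_add(msg, msg)        # [2.0, 4.0, 6.0, 8.0]
squared = he_mult(msg, msg)     # [1.0, 4.0, 9.0, 16.0]
shifted = he_rotate(msg, 1)     # [2.0, 3.0, 4.0, 1.0]
```

Note that addition and multiplication act independently per slot, while rotation is the only operation that moves data between slots, which is why data-movement-heavy kernels like convolution are dominated by rotations.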

