THE IMPACT OF ENSEMBLE ON HOMOMORPHIC ENCRYPTED DATA CLASSIFICATION

Abstract

Homomorphic encryption (HE) is a form of encryption that permits users to perform computations on encrypted data without first decrypting it. HE can be used for privacy-preserving outsourced computation and analysis, allowing sensitive data to be encrypted and outsourced to commercial cloud environments for processing while it remains encrypted. HE enables new services by removing privacy barriers that inhibit data sharing and by increasing the security of existing services. A convolutional neural network (CNN) can be evaluated homomorphically using only addition and multiplication by replacing activation functions such as ReLU with low-degree polynomials. To match the performance of the ReLU activation function, we study the impact of applying ensemble techniques to the resulting accuracy problem. Our experimental results empirically show that, with both parallel and sequential techniques, the ensemble approach can reduce bias and variance, increasing accuracy to the level of ReLU performance. We demonstrate the effectiveness and robustness of our method using three datasets: MNIST, FMNIST, and CIFAR-10.

1. INTRODUCTION

Homomorphic encryption (HE) enables private artificial intelligence (AI) applications by allowing users to perform computations on encrypted data without decryption; the result of the computation remains in encrypted form and is revealed only when decrypted. As a result, HE is ideal for privacy-preserving outsourced storage and computation over sensitive data. In other words, HE allows data to be encrypted and outsourced to commercial cloud environments for processing, all while remaining encrypted. Deep learning on the cloud enables designing, developing, and training deep learning applications faster by leveraging distributed networks, HE, and cloud computing, allowing large datasets to be easily ingested and managed to train algorithms; it lets deep learning models scale efficiently at lower cost. HE schemes that adopt bit-wise encryption can perform arbitrary operations, but with extensive execution time; to shorten execution time, other schemes encrypt integers or complex numbers directly. An HE scheme is usually defined over a finite field, so it supports only the two field operations, addition and multiplication, which can behave entirely differently from the floating-point operations used in typical AI applications. Accordingly, functions commonly used in deep learning, such as ReLU, Sigmoid, and max-pooling, are not compatible with HE (Obla, 2020). To address this issue, a CNN can be evaluated with polynomial activation functions, since HE straightforwardly supports additions and multiplications. Due to the increased complexity of computing circuits with nested multiplications, it is desirable to restrict the computation to low-degree polynomials (Gilad-Bachrach et al., 2016). However, replacing ReLU with a low-degree polynomial, even when combined with other techniques such as batch normalization (BN) (Ioffe & Szegedy, 2015), still suffers from high bias, high variance, and low accuracy.
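The substitution described above can be sketched in a few lines of NumPy: a least-squares fit of a low-degree polynomial to ReLU over a bounded input range. The degree-2 choice and the [-4, 4] interval are illustrative assumptions, not the paper's actual settings; in practice they are tuned to the distribution of pre-activation values.

```python
import numpy as np

# Sample ReLU over a bounded input range (illustrative interval).
x = np.linspace(-4, 4, 1000)
relu = np.maximum(x, 0)

# Least-squares fit of a degree-2 polynomial to ReLU.
coeffs = np.polyfit(x, relu, deg=2)

# The surrogate uses only additions and multiplications, so it can be
# evaluated homomorphically; ReLU's max(0, x) cannot.
poly = np.polyval(coeffs, x)

max_err = np.max(np.abs(poly - relu))
print(f"degree-2 fit, max |error| on [-4, 4]: {max_err:.3f}")
```

The approximation error outside the fitted interval grows quickly, which is one reason batch normalization is often paired with polynomial activations: it keeps pre-activation values inside the fitted range.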
Intuitively, an ensemble is a machine learning approach that consists of a set of individual weak learners working sequentially or in parallel. Their outputs are combined with a decision fusion strategy to produce a single result with better performance than any single model (Huang et al., 2009). That motivates us to introduce the ensemble approach to enhance accuracy by reducing bias and variance when using an HE scheme. Ensemble learning has recently been recognized as an essential contributor to the performance of deep learning models, because sub-models do not strongly depend on each other even though they are trained jointly. Moreover, such models exhibit ensemble-like behavior in that their performance correlates smoothly with the number of valid paths, which enables them to work within the depth constraints of an HE network (Veit et al., 2016). The success of ensembles comes from reducing the bias or variance of weak learners: combining several of them creates a strong learner that achieves better performance. The techniques can be differentiated: bagging focuses on producing an ensemble model with reduced variance, whereas boosting and stacking produce strong models less biased than their components (Sagi & Rokach, 2018). Generally, ensemble techniques are considered among the best approaches to better performance, since they yield lower error and less overfitting than any individual method, leading to better performance on the test set; if each learner carries some bias or variance, combining them can reduce it (Zhou et al., 2002). In this paper, we propose an ensemble approach to improve the accuracy of HE-based privacy-preserving data inference with deep neural networks (DNN), for both sequential and parallel ensembles, when replacing the ReLU activation function with polynomials.
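To make the decision-fusion step concrete, here is a minimal soft-voting sketch in NumPy: per-model class probabilities are averaged, and the fused prediction is the argmax class. The model outputs below are hypothetical numbers for illustration only; the paper's actual fusion operates on the outputs of CNNs evaluated over encrypted data.

```python
import numpy as np

def fuse_predictions(prob_list):
    """Soft-voting fusion: average per-model class probabilities,
    then pick the argmax class for each sample."""
    stacked = np.stack(prob_list)  # shape: (n_models, n_samples, n_classes)
    return np.argmax(stacked.mean(axis=0), axis=1)

# Three hypothetical models scoring 2 samples over 3 classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]])
p3 = np.array([[0.5, 0.2, 0.3], [0.2, 0.2, 0.6]])

print(fuse_predictions([p1, p2, p3]))  # fused class index per sample
```

Averaging is HE-friendly because it needs only additions and a scalar multiplication; argmax, by contrast, is typically deferred to the client after decryption.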
We applied customized sequential ensemble techniques that accommodate different numbers of CNN models involved in multi-class prediction while using polynomials as activation functions in the hidden layers. For the parallel ensemble technique, we applied the bagging method and studied the ensemble's impact on bias and variance. Our results indicate that an ensemble can significantly reduce variance and boost accuracy. To the best of our knowledge, this is the first work to investigate the ensemble approach in the context of HE-based privacy-preserving inference with DNNs to solve the accuracy problem caused by replacing the activation function with polynomials. Most previous efforts focused on choosing a better single polynomial to increase accuracy. In contrast, our work focuses on improving a low-accuracy classification model by combining weak models while respecting the requirements for encrypted data. Figure 1 illustrates our ensemble approach.

Figure 1: An ensemble approach to increase accuracy while using polynomials in the hidden layers of the convolution network.

In summary, our contributions in this paper include:
• We have investigated the impact of the sequential ensemble technique on accuracy, and the results indicate a significant improvement, reaching the same performance as ReLU while using polynomials.
• We have studied the impact of the parallel ensemble technique, especially bagging, and our results show an improvement in variance without increasing bias.
• We have demonstrated the effectiveness and robustness of our method using three datasets: MNIST (LeCun, 1998), FMNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky et al., 2014).

The rest of the paper is organized as follows: Section 2 reviews related works and summarizes the proposed approach's advantages. Section 3 discusses the background.
Section 4 presents the details of the proposed method, and Section 5 demonstrates the effectiveness of the proposed approach with experimental results. Finally, we conclude the paper in Section 6.
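The variance-reduction claim for parallel ensembles can also be illustrated with a simple majority-vote calculation, under the (idealized) assumption that the members' errors are independent; real sub-models are only approximately independent, so the numbers below are an upper-bound intuition, not a result from the paper.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent classifiers,
    each correct with probability p, votes for the right class
    (n assumed odd, so ties cannot occur)."""
    k_min = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# A single weak learner at 70% accuracy vs. parallel ensembles of them.
for n in (1, 5, 11, 21):
    print(n, majority_vote_accuracy(0.70, n))
```

The accuracy climbs monotonically with the number of voters, which is the intuition behind why bagging several weak polynomial-activation CNNs can approach the accuracy of a single ReLU network.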

2. RELATED WORKS

Replacing the ReLU activation function with a polynomial: To study the impact of replacing the ReLU activation function with a polynomial activation function in neural networks, several works

