AUTOFHE: AUTOMATED ADAPTION OF CNNS FOR EFFICIENT EVALUATION OVER FHE

Abstract

Secure inference of deep convolutional neural networks (CNNs) was recently demonstrated under fully homomorphic encryption (FHE), specifically the full residue number system variant of the Cheon-Kim-Kim-Song scheme (RNS-CKKS). The state-of-the-art solution uses a high-order composite polynomial to approximate non-arithmetic ReLUs and refreshes zero-level ciphertexts through bootstrapping. However, this solution suffers from prohibitively high latency, due both to the number of levels consumed by the polynomials (47%) and the inference time consumed by bootstrapping operations (70%). Furthermore, it requires a hand-crafted architecture for homomorphically evaluating CNNs, placing a bootstrapping operation after every Conv-BN layer. To accelerate CNNs on FHE and automatically design a homomorphic evaluation architecture, we propose AutoFHE: Automated adaption of CNNs for evaluation over FHE. AutoFHE exploits the varying sensitivity of approximate activations across different layers in a network, jointly evolving polynomial activations (EvoReLUs) and searching for the placement of bootstrapping operations for evaluation under RNS-CKKS. The salient features of AutoFHE include: i) a multi-objective coevolutionary (MOCoEv) search algorithm to maximize validation accuracy and minimize the number of bootstrapping operations, ii) a gradient-free search algorithm, R-CCDE, to optimize EvoReLU coefficients, and iii) polynomial-aware training (PAT) to fine-tune polynomial-only CNNs for a few epochs, adapting trainable weights to EvoReLUs. We demonstrate the efficacy of AutoFHE through the evaluation of ResNets on encrypted CIFAR-10 and CIFAR-100 under RNS-CKKS. Experimental results on CIFAR-10 indicate that, in comparison to the state-of-the-art solution, AutoFHE can reduce inference time (50 images on 50 threads) by up to 3,297 seconds (43%) while preserving accuracy (92.68%).
AutoFHE also improves the accuracy of ResNet-32 on CIFAR-10 by 0.48% while accelerating inference by 382 seconds (7%).

1. INTRODUCTION

Fully homomorphic encryption (FHE) is a promising solution for secure inference of neural networks (Gilad-Bachrach et al., 2016; Brutzkus et al., 2019; Lou & Jiang, 2021; Lee et al., 2022a;b). However, homomorphically evaluating CNNs on encrypted data is challenging in two respects: 1) designing a homomorphic evaluation architecture for deep CNNs of arbitrary depth, and 2) handling non-arithmetic operations like ReLU. Recently, FHE-MP-CNN (Lee et al., 2022a) successfully implemented a homomorphic evaluation architecture for ResNets by using bootstrapping (Cheon et al., 2018a; Bossuat et al., 2021) to refresh zero-level ciphertexts under the full residue number system (RNS) variant of the Cheon-Kim-Kim-Song (RNS-CKKS) scheme (Cheon et al., 2017; 2018b). However, since FHE supports only homomorphic multiplication and addition, non-arithmetic operations must be approximated by polynomials (Gilad-Bachrach et al., 2016; Chou et al., 2018; Brutzkus et al., 2019; Lee et al., 2021a;c; 2022a). For example, FHE-MP-CNN adopts a high-precision Minimax composite polynomial (Lee et al., 2021a;c) with degrees {15, 15, 27} to approximate ReLUs (AppReLU). A more comprehensive discussion of related work is in Appendix B. FHE-MP-CNN, the state-of-the-art approach, is limited by three main design choices. First, high-precision approximations like AppReLU only consider function-level approximation and neglect the potential for end-to-end optimization of the entire network response. As such, the same high-precision AppReLU replaces all of the network's ReLU layers, which necessitates evaluating very deep circuits. Second, due to the large number of levels required by each AppReLU, ciphertexts encrypted with leveled HE schemes like CKKS quickly exhaust their levels; therefore, a bootstrapping operation is necessary for each AppReLU to refresh zero-level ciphertexts.
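To make the composite-polynomial idea concrete, the sketch below evaluates an AppReLU-style activation in plaintext via ReLU(x) = x·(1 + sign(x))/2, with sign(x) approximated by a composed polynomial. Note this is an illustrative stand-in: FHE-MP-CNN uses high-precision Minimax polynomials of degrees {15, 15, 27}, whereas here we iterate the simple degree-3 polynomial f(x) = (3x − x³)/2, whose iterates converge to sign(x) on [−1, 1], purely to show the structure and its depth cost. The `scale` parameter mimicking the scale value B is our own naming.

```python
# Plaintext sketch of a composite-polynomial ReLU approximation.
# Hypothetical low-degree stand-in for FHE-MP-CNN's Minimax AppReLU.

def sign_approx(x: float, k: int = 6) -> float:
    """Approximate sign(x) for x in [-1, 1] by composing f(x)=(3x-x^3)/2
    k times; each degree-3 factor costs ~2 multiplicative levels."""
    for _ in range(k):
        x = 1.5 * x - 0.5 * x ** 3
    return x

def relu_approx(x: float, scale: float = 1.0, k: int = 6) -> float:
    """AppReLU-style evaluation: ReLU(x) = x * (1 + sign(x)) / 2.
    `scale` mimics the scale value B: inputs are mapped into [-1, 1]
    before the polynomial is applied."""
    t = x / scale
    return x * (1.0 + sign_approx(t, k)) / 2.0

# Accurate away from zero; the approximation degrades near x = 0,
# which is why precision (degree/iterations) matters per layer.
print(relu_approx(0.5), relu_approx(-0.5))
```

A deeper composition (larger k or higher-degree factors) yields higher precision near zero but consumes more multiplicative levels, which is exactly the per-layer trade-off AutoFHE searches over.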
While these design choices are collectively very effective at maintaining the performance of plaintext networks under FHE, they require many multiplicative levels and, consequently, numerous bootstrapping operations. Third, due to the constraints imposed by the cryptographic scheme (RNS-CKKS in this case), inference of networks under FHE requires the co-design of AppReLU and the homomorphic evaluation architecture. This includes the careful design of AppReLU (the number of composite polynomials and their degrees), cryptographic parameters, the placement of bootstrapping operations, and the choice of network architectures to evaluate. We illustrate the limitations of FHE-MP-CNN's design choices through a case study (Figure 2) of ResNet-20 on CIFAR-10. We consider two plausible solutions to trade off the accuracy and computational burden of FHE-MP-CNN. (i) Same-precision AppReLU: we replace all ReLU layers with AppReLU of a given precision. We can trade off (purple line in the left panel) accuracy against depth consumption using AppReLU with different precision. However, as the middle panel shows, these solutions (purple dots) do not necessarily translate into a trade-off between accuracy and the number of bootstrapping operations, due to many wasted levels: all the trade-off solutions collapse to either 15 or 30 bootstrapping operations. (ii) Mixed-precision AppReLU: each ReLU layer in the network can be replaced by AppReLU of any precision. We randomly sample 5,000 combinations of mixed-precision layerwise AppReLUs and show (red dots) their depth consumption and number of bootstrapping operations in the left and middle panels, respectively. Observe that layerwise mixed-precision AppReLU leads to a better trade-off between accuracy and the number of bootstrapping operations.
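The "wasted levels" effect can be illustrated with a toy level-budget accounting. The sketch below greedily places a bootstrap whenever the next operation would exceed the remaining multiplicative levels; all numbers (the level budget of 16, a Conv-BN depth of 2, AppReLU depths of 14 and 6) are hypothetical and chosen only to show how layerwise depth choices change the bootstrap count.

```python
# Toy accounting of bootstrap placement under a multiplicative-level budget.
# All depth numbers are hypothetical illustrations, not measured values.

def count_bootstraps(depth_costs, level_budget):
    """Greedy placement: refresh the ciphertext (bootstrap) whenever the
    next op would exceed the remaining levels; return the bootstrap count."""
    bootstraps, remaining = 0, level_budget
    for cost in depth_costs:
        if cost > remaining:
            bootstraps += 1
            remaining = level_budget
        remaining -= cost
    return bootstraps

LEVELS = 16   # hypothetical usable levels between bootstraps
CONV = 2      # hypothetical depth of one Conv-BN block

def network(relu_depths):
    """Interleave Conv-BN blocks with layerwise activation depths."""
    ops = []
    for d in relu_depths:
        ops += [CONV, d]
    return ops

uniform = network([14] * 6)             # same high-precision AppReLU everywhere
mixed = network([14, 6, 6, 6, 6, 6])    # deep approximation only in one layer

print(count_bootstraps(uniform, LEVELS))  # uniform precision: more bootstraps
print(count_bootstraps(mixed, LEVELS))    # mixed precision: fewer bootstraps
```

Under these assumed numbers, the mixed-precision assignment needs noticeably fewer bootstraps for the same number of layers, mirroring the middle panel of Figure 2.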
However, FHE-MP-CNN neglects the layerwise sensitivity (range) of ReLU pre-activations (the right panel shows the distribution of the layerwise maximum absolute value of pre-activations) and uses an AppReLU optimized for a ReLU with a large pre-activation range. Therefore, the Pareto front of mixed-precision layerwise AppReLUs optimized by the multi-objective search algorithm NSGA-II (Deb et al., 2002) is still inferior, by a significant margin, to AutoFHE, our proposed solution. In summary, while both solutions we considered reduce the number of bootstrapping operations, unlike AutoFHE, they also lead to a severe loss in performance. In this paper, we relax the design choices of FHE-MP-CNN and accelerate the inference of CNNs over homomorphically encrypted data while maximizing performance. The main premise behind our approach is to directly optimize the end-to-end function represented by the network instead of the function represented by the activation alone. This idea allows us to exploit the varying sensitivity of activation-function approximation across different layers in a network. Therefore, theoretically, evolving layerwise polynomial approximations of ReLUs (EvoReLU) should reduce the total multiplicative depth required by the resulting polynomial-only networks, and thus the number of time-consuming bootstrapping operations and the inference time on encrypted data. To
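The two search objectives above, maximize accuracy and minimize bootstrapping operations, define a Pareto front like those plotted in Figures 1 and 2. As a minimal sketch of the selection step, the function below filters a candidate set down to its non-dominated members; the accuracy/bootstrap pairs are made-up values, and a full NSGA-II or MOCoEv search additionally performs non-dominated sorting over ranks, crowding-distance selection, and variation, which are omitted here.

```python
# Sketch: extract the Pareto front for (accuracy up, #bootstraps down).
# Candidate values are illustrative, not experimental results.

def pareto_front(candidates):
    """candidates: list of (accuracy, n_bootstraps) pairs.
    A candidate is dominated if some other candidate has accuracy >= and
    bootstraps <=, with at least one strict inequality."""
    front = []
    for acc, boots in candidates:
        dominated = any(
            (a >= acc and b <= boots) and (a > acc or b < boots)
            for a, b in candidates
        )
        if not dominated:
            front.append((acc, boots))
    return sorted(front)

candidates = [(92.7, 30), (92.6, 25), (91.9, 25), (90.1, 18), (88.0, 18)]
print(pareto_front(candidates))  # dominated points (91.9, 25), (88.0, 18) drop out
```

Each surviving point is one deployable configuration, letting a user pick an accuracy/latency operating point rather than a single fixed design.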



Figure 1: Pareto fronts of AutoFHE versus FHE-MP-CNN on encrypted CIFAR-10 under the RNS-CKKS FHE scheme.

Figure 2: Motivating AutoFHE. Left: depth consumption of AppReLUs for a ResNet-20 backbone on CIFAR-10. The purple line shows the same-precision AppReLU used in all layers, while the red circles show 5,000 randomly sampled combinations of mixed-precision layerwise AppReLUs. Middle: the number of bootstrapping operations, with the same-precision and mixed-precision AppReLU trade-offs as in the left panel; we also show a multi-objective search result using mixed-precision layerwise AppReLUs and the Pareto front of the proposed AutoFHE. Right: distributions of pre-activations (maximum absolute values) of ResNets on CIFAR-10, where the green line corresponds to B, the scale value of AppReLU in FHE-MP-CNN.

