LEARNING TO LINEARIZE DEEP NEURAL NETWORKS FOR SECURE AND EFFICIENT PRIVATE INFERENCE

Abstract

The large number of ReLU non-linearity operations in existing deep neural networks makes them ill-suited for latency-efficient private inference (PI). Existing techniques to reduce ReLU operations often involve manual effort and sacrifice significant accuracy. In this paper, we first present a novel measure of a non-linear layer's ReLU sensitivity, removing the time-consuming manual effort otherwise needed to identify it. Based on this sensitivity, we then present SENet, a three-stage training method that, for a given ReLU budget, automatically assigns per-layer ReLU counts, decides the ReLU locations within each layer's activation map, and trains a model with significantly fewer ReLUs to yield latency- and communication-efficient PI. Experimental evaluations with multiple models on various datasets show SENet's superior performance, both in reduced ReLU count and in improved classification accuracy, compared to existing alternatives. In particular, SENet can yield models that require up to ∼2× fewer ReLUs while maintaining similar accuracy. For a similar ReLU budget, SENet can yield models with ∼2.32% improved classification accuracy, evaluated on CIFAR-100.

1. INTRODUCTION

With the recent proliferation of AI-driven client-server applications, including image analysis (Litjens et al., 2017), object detection, speech recognition (Hinton et al., 2012), and voice-assistance services, the demand for machine learning inference as a service (MLaaS) has grown. Because client data in MLaaS can be sensitive, private inference (PI) protocols are employed, and existing PI frameworks typically evaluate ReLU non-linearities via garbled circuits (GCs) (Yao, 1986). However, GCs demand orders of magnitude higher latency and communication than the PI of linear operations, making latency-efficient PI an exceedingly difficult task. In contrast, standard inference latency is dominated by the linear operations (Kundu et al., 2022b) and is significantly lower than that of PI. This has motivated the unique problem of reducing the number of ReLU non-linearity operations to lower the communication and latency overhead of PI. In particular, recent literature has leveraged

*Part of the work was done when the first author was a graduate student at USC.
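To make the ReLU-reduction idea concrete, the following is a minimal NumPy sketch (not SENet's actual implementation) of a "partial ReLU" layer: a binary mask marks which positions of an activation map keep their ReLU, while the remaining positions pass through linearly and thus incur no GC cost during PI. The function name and mask layout are illustrative assumptions.

```python
import numpy as np

def masked_relu(x, mask):
    """Apply ReLU only where mask == 1; pass values through unchanged elsewhere.

    x    : activation map (any shape, e.g. (C, H, W))
    mask : binary array of the same shape marking the retained ReLU locations
    """
    return np.where(mask.astype(bool), np.maximum(x, 0.0), x)

# Toy example: a 2x2 activation map with a ReLU budget of 2 units.
x = np.array([[-1.0, 2.0],
              [-3.0, 4.0]])
mask = np.array([[1, 0],
                 [1, 0]])  # illustrative: ReLUs kept only in the first column

out = masked_relu(x, mask)
# Negative entries are clipped only where the mask retains a ReLU:
# [[0.0, 2.0],
#  [0.0, 4.0]]
```

Under this view, a training method such as SENet must decide both how many ones each layer's mask receives (the per-layer ReLU count) and where those ones are placed, subject to the global ReLU budget.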



Figure 1: Comparison of various methods in the accuracy vs. #ReLU trade-off plot. SENet outperforms the existing approaches with an accuracy improvement of up to ∼4.5% for a similar ReLU budget.

