POWERQUANT: AUTOMORPHISM SEARCH FOR NON-UNIFORM QUANTIZATION

Abstract

Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization, which consists in lowering the number of bits used to encode the weights and activations. Growing concerns for privacy and security have motivated the development of data-free techniques, at the expense of accuracy. In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. More specifically, we argue that, to be readily usable without dedicated hardware and implementation, non-uniform quantization shall not change the nature of the mathematical operations performed by the DNN. This leads us to search among the continuous automorphisms of (ℝ₊*, ×), which boil down to the power functions defined by their exponent. To find this parameter, we propose to optimize the reconstruction error of each layer: in particular, we show that this procedure is locally convex and admits a unique solution. At inference time, we show that our approach, dubbed PowerQuant, only requires simple modifications of the quantized DNN activation functions. As such, with only negligible overhead, it significantly outperforms existing methods in a variety of configurations.
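As a rough illustration of the idea summarized above (a minimal sketch, not the authors' implementation), the following code applies uniform quantization in the domain of a power function x ↦ x^a, which preserves the multiplicative structure of (ℝ₊*, ×), and searches for the exponent that minimizes a layer's weight reconstruction error by grid search; all function names, the grid, and the L2 objective are illustrative assumptions.

```python
import numpy as np

def power_quantize(w, a, n_bits=8):
    """Quantize |w| uniformly in the x**a domain, then map back (sign preserved).

    Power functions x -> x**a are the continuous automorphisms of (R_+^*, x),
    so the quantized weights keep the same multiplicative structure.
    NOTE: illustrative sketch, not the paper's exact operator.
    """
    sign = np.sign(w)
    t = np.abs(w) ** a                    # forward power transform
    scale = t.max() / (2 ** n_bits - 1)   # uniform step in the transformed domain
    q = np.round(t / scale) * scale       # plain uniform quantization
    return sign * q ** (1.0 / a)          # inverse transform back to weight space

def search_exponent(w, n_bits=8, grid=np.linspace(0.1, 1.0, 50)):
    """Pick the exponent minimizing the layer's L2 reconstruction error."""
    errors = [np.linalg.norm(w - power_quantize(w, a, n_bits)) for a in grid]
    return grid[int(np.argmin(errors))]
```

Since a = 1 recovers plain uniform quantization, the searched exponent can never do worse than the uniform baseline on the reconstruction objective; the paper's locally convex formulation allows finding it more efficiently than by grid search.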

1. INTRODUCTION

Deep neural networks (DNNs) have tremendously improved algorithmic solutions for a wide range of tasks, in particular in computer vision. However, these achievements come at a consequent price, as DNN deployment bears a high energy cost. Consequently, the generalization of their usage hinges on the development of compression strategies. Quantization is one of the most promising such techniques: it consists in reducing the number of bits needed to encode the DNN weights and/or activations, thus limiting the cost of data processing on a computing device. Existing DNN quantization techniques for computer vision tasks are numerous and can be distinguished by their constraints. One such constraint is data usage, as introduced in Nagel et al. (2019), and is based upon the importance of data privacy and security concerns. Data-free approaches such as Banner et al. (2019); Cai et al. (2020); Choukroun et al. (2019); Fang et al. (2020); Garg et al. (2021); Zhao et al. (2019); Nagel et al. (2019) exploit heuristics and weight properties in order to perform the most efficient weight quantization without having access to the training data. As compared to data-driven methods, the aforementioned techniques are more convenient to use but usually come with a higher accuracy loss at equivalent compression rates. The performance of data-driven methods offers an upper bound on what can be expected from data-free approaches, and in this work we aim at further narrowing the gap between the two. To achieve this goal, we propose to leverage a second aspect of quantization: uniformity. For simplicity reasons, most quantization techniques such as Nagel et al. (2019); Zhao et al. (2019); Cong et al. (2022) perform uniform quantization, i.e. they map floating-point values to an evenly spread, discrete space. However, non-uniform quantization can theoretically provide a closer fit to the network weight distributions, thus better preserving the network accuracy. Previous work on non-uniform quantization either focused on the search of binary codes (Banner et al., 2019;
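The uniform quantization discussed above can be sketched in a few lines: floating-point values are mapped to a grid of 2^b levels separated by a constant step, which is exactly the evenly spread discrete space that non-uniform methods relax. This is a generic illustration under a simple min/max affine scheme, not any cited method's exact procedure; the function name and rounding choice are assumptions.

```python
import numpy as np

def uniform_quantize(x, n_bits=8):
    """Map floating-point values onto an evenly spread grid of 2**n_bits levels."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** n_bits - 1)  # constant step: the grid is uniform
    codes = np.round((x - lo) / scale)     # integer codes in [0, 2**n_bits - 1]
    return codes * scale + lo              # dequantized (simulated) values

x = np.array([-1.0, -0.3, 0.0, 0.7, 1.0])
print(uniform_quantize(x, n_bits=2))       # every value snaps to one of 4 levels
```

Because the step is constant regardless of where the weights concentrate, a bell-shaped weight distribution wastes levels in its sparse tails, which is the motivation for the non-uniform fit pursued in this paper.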

