POWERQUANT: AUTOMORPHISM SEARCH FOR NON-UNIFORM QUANTIZATION

Abstract

Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization, which consists in lowering the number of bits used to encode the weights and activations. Growing concerns for privacy and security have motivated the development of data-free techniques, at the expense of accuracy. In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. More specifically, we argue that, to be readily usable without dedicated hardware and implementation, non-uniform quantization shall not change the nature of the mathematical operations performed by the DNN. This leads us to search among the continuous automorphisms of $(\mathbb{R}_+^*, \times)$, which boil down to the power functions defined by their exponent. To find this parameter, we propose to optimize the reconstruction error of each layer: in particular, we show that this optimization problem is locally convex and admits a unique solution. At inference time, we show that our approach, dubbed PowerQuant, only requires simple modifications of the quantized DNN activation functions. As such, with only negligible overhead, it significantly outperforms existing methods in a variety of configurations.

1. INTRODUCTION

Deep neural networks (DNNs) have tremendously improved algorithmic solutions for a wide range of tasks, particularly in computer vision. These achievements come at a considerable cost, however, as deploying DNNs bears a substantial energy price. Consequently, the generalization of their usage hinges on the development of compression strategies. Quantization is one of the most promising such techniques: it consists in reducing the number of bits needed to encode the DNN weights and/or activations, thus limiting the cost of data processing on a computing device.

Existing DNN quantization techniques for computer vision tasks are numerous and can be distinguished by their constraints. One such constraint is data usage: data-free methods, as introduced in Nagel et al. (2019), exploit heuristics and weight properties in order to perform the most efficient weight quantization without having access to the training data. As compared to data-driven methods, the aforementioned techniques are more convenient to use but usually come with a higher accuracy loss at equivalent compression rates. The performance of data-driven methods offers an upper bound on what can be expected from data-free approaches, and in this work we aim at further narrowing the gap between these methods.

To achieve this goal, we propose to leverage a second aspect of quantization: uniformity. Existing non-uniform approaches either learn the quantization levels (Hubara et al., 2016; Jeon et al., 2020; Wu et al., 2016; Zhang et al., 2018) or leverage logarithmic distributions (Miyashita et al., 2016; Zhou et al., 2017). However, these approaches map floating-point multiplications to other operations that are hard to leverage on current hardware (e.g. bit-shifts), as opposed to uniform quantization, which maps floating-point multiplications to integer multiplications (Gholami et al., 2021; Zhou et al., 2016). To circumvent this limitation and reach a tighter fit between the quantized and original weight distributions, we propose to search for the best possible quantization operator that preserves the nature of the mathematical operations. We show that this search boils down to the space defined by the continuous automorphisms of $(\mathbb{R}_+^*, \times)$, which is limited to power functions defined by their exponent. We optimize the value of this parameter by minimizing the error introduced by quantization. This allows us to reach superior accuracy, as illustrated in Fig 1.

To sum up, our contributions are:

• We search for the best quantization operator that does not change the nature of the mathematical operations performed by the DNN, i.e. the automorphisms of $(\mathbb{R}_+^*, \times)$. We show that this search can be narrowed down to finding the best exponent for power functions.

• We find the optimal exponent so as to more closely fit the original weight distribution compared with existing (e.g. uniform and logarithmic) baselines. To do so, we propose to optimize the quantization reconstruction error. We show that this problem is locally convex and admits a unique solution.

• In practice, we show that the proposed approach, dubbed PowerQuant, only requires simple modifications in the quantized DNN activation functions and accumulation. Furthermore, we demonstrate through extensive experimentation that our method achieves outstanding results on various and challenging benchmarks with negligible computational overhead.
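To make the mechanism concrete, the sketch below illustrates the kind of power-function quantizer described above: weight magnitudes are mapped through the power automorphism, uniformly quantized on the transformed axis, and mapped back with the inverse power, while the exponent is selected by minimizing the layer-wise quantization reconstruction error. This is a minimal NumPy sketch under simplifying assumptions (symmetric per-tensor quantization and a plain 1-D grid search over the exponent); it is not the authors' reference implementation.

```python
import numpy as np

def power_quant(w, a, n_bits=4):
    """Non-uniform quantization via the power automorphism x -> x**a.

    Weight magnitudes are mapped through the power function, uniformly
    quantized on the transformed axis, then mapped back with the inverse
    power 1/a. Symmetric per-tensor quantization is assumed here purely
    for illustration.
    """
    q_max = 2 ** (n_bits - 1) - 1        # e.g. 7 positive levels for 4 bits
    t = np.abs(w) ** a                   # transformed magnitudes
    scale = t.max() / q_max              # uniform step on the transformed axis
    q = np.round(t / scale)              # integer codes in [0, q_max]
    deq = (q * scale) ** (1.0 / a)       # back to the original axis
    return np.sign(w) * deq

def search_exponent(w, n_bits=4, grid=np.linspace(0.1, 1.0, 91)):
    """Pick the exponent minimizing the reconstruction error ||W - Q_a(W)||^2.

    A plain 1-D grid search is enough for this sketch; the paper argues the
    underlying problem is locally convex and admits a unique solution.
    """
    errors = [np.sum((w - power_quant(w, a, n_bits)) ** 2) for a in grid]
    return grid[int(np.argmin(errors))]

# Toy example: heavy-tailed weights benefit from a < 1 (finer steps near zero).
rng = np.random.default_rng(0)
w = rng.laplace(scale=0.05, size=4096).astype(np.float32)
a_star = search_exponent(w)
err_uniform = np.sum((w - power_quant(w, 1.0)) ** 2)   # a = 1 recovers uniform quantization
err_power = np.sum((w - power_quant(w, a_star)) ** 2)
print(f"a* = {a_star:.2f}  uniform err = {err_uniform:.5f}  power err = {err_power:.5f}")
```

Since the exponent a = 1 (plain uniform quantization) belongs to the search grid, the selected exponent can only lower the reconstruction error in this toy setting.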

2. RELATED WORK

2.1 QUANTIZATION

In this section, we provide background on the current state of DNN quantization. Notice that while certain approaches are geared towards memory footprint reduction (e.g. without quantizing inputs and activations) (Chen et al., 2015; Gong et al., 2014; Han et al., 2016; Zhou et al., 2017), in what follows we essentially focus on methods that aim at reducing the inference time. In particular, motivated by the growing concerns for privacy and security, data-free quantization methods (Banner et al., 2019; Cai et al., 2020; Choukroun et al., 2019; Fang et al., 2020; Garg et al., 2021; Zhao et al., 2019; Nagel et al., 2019; Cong et al., 2022) are emerging and have significantly improved over the recent years. The first breakthrough in data-free quantization (Nagel et al., 2019) was based on two mathematical ingenuities. First, they exploited the mathematical properties of piece-wise affine activation functions (such as ReLU-based DNNs) in order to balance the per-channel weight ranges across consecutive layers.
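To illustrate the property exploited by this first ingredient, the snippet below shows the positive scaling equivariance of ReLU, ReLU(s · x) = s · ReLU(x) for s > 0, which allows the weight ranges of two consecutive layers to be rebalanced per channel without changing the network's function. This is an illustrative NumPy sketch of the principle (the rebalancing rule used here, equalizing both layers to the geometric mean of their per-channel ranges, is one standard choice), not a reproduction of the cited implementation.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """Two fully connected layers with a ReLU in between."""
    h = np.maximum(w1 @ x + b1, 0.0)   # ReLU is positively homogeneous
    return w2 @ h + b2

def equalize(w1, b1, w2):
    """Rebalance per-channel weight ranges between consecutive layers.

    Because ReLU(s * x) = s * ReLU(x) for s > 0, scaling output channel i of
    layer 1 by s_i and input channel i of layer 2 by 1/s_i leaves the network
    function unchanged, while making both layers' ranges easier to quantize.
    Here s_i is chosen so that both per-channel ranges become their
    geometric mean (an illustrative choice).
    """
    r1 = np.abs(w1).max(axis=1)        # output-channel ranges of layer 1
    r2 = np.abs(w2).max(axis=0)        # input-channel ranges of layer 2
    s = np.sqrt(r2 / r1)
    return w1 * s[:, None], b1 * s, w2 / s[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=8)
w1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
w2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)

w1e, b1e, w2e = equalize(w1, b1, w2)
out_ref = forward(x, w1, b1, w2, b2)
out_eq = forward(x, w1e, b1e, w2e, b2)
print(np.allclose(out_ref, out_eq))    # True: the function is preserved
```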




Figure 1: Comparison of the proposed method to other data-free quantization schemes on DenseNet 121 pretrained on ImageNet. The proposed method (right bin, in blue) drastically improves upon existing data-free methods, especially in the challenging W4/A4 quantization.

