SIMPLE AUGMENTATION GOES A LONG WAY: ADRL FOR DNN QUANTIZATION

Abstract

Mixed precision quantization improves DNN performance by assigning different bit-widths to different layers. Searching for the optimal bit-width of each layer, however, remains a challenge. Deep Reinforcement Learning (DRL) has shown recent promise, but it suffers from instability due to function approximation errors, which causes large variances in the early training stages, slow convergence, and suboptimal policies in the mixed precision quantization problem. This paper proposes augmented DRL (ADRL) as a way to alleviate these issues. The new strategy augments the neural networks in DRL with a complementary scheme to boost the performance of learning. The paper examines the effectiveness of ADRL both analytically and empirically, showing that it can produce more accurate quantized models than state-of-the-art DRL-based quantization while improving the learning speed by 4.5-64×.

1. INTRODUCTION

By reducing the number of bits needed to represent the parameters of Deep Neural Networks (DNN), quantization (Lin et al., 2016; Park et al., 2017; Han et al., 2015; Zhou et al., 2018; Zhu et al., 2016; Hwang & Sung, 2014; Wu et al., 2016; Zhang et al., 2018; Köster et al., 2017; Ullrich et al., 2017; Hou & Kwok, 2018; Jacob et al., 2018) is an important way to reduce the size and improve the energy efficiency and speed of DNN. Mixed precision quantization selects a proper bit-width for each layer of a DNN, offering more flexibility than fixed precision quantization. A major challenge to mixed precision quantization (Micikevicius et al., 2017; Cheng et al., 2018) is the configuration search problem, that is, how to find the appropriate bit-width for each DNN layer efficiently. The search space grows exponentially as the number of layers increases, and assessing each candidate configuration requires lengthy training and evaluation of the DNN. Research efforts have sought to mitigate this issue to better tap into the power of mixed precision quantization. Prior methods mainly fall into two categories: (i) automatic methods, such as reinforcement learning (RL) (Lou et al., 2019; Gong et al., 2019; Wang et al., 2018; Yazdanbakhsh et al., 2018; Cai et al., 2020) and neural architecture search (NAS) (Wu et al., 2018; Li et al., 2020), which learn from feedback signals and automatically determine the quantization configurations; (ii) heuristic methods, which reduce the search space under the guidance of metrics such as weight loss or Hessian spectrum (Dong et al., 2019; Wu et al., 2018; Zhou et al., 2018; Park et al., 2017) of each layer. Compared to heuristic methods, automatic methods, especially Deep Reinforcement Learning (DRL), require little human effort and give the state-of-the-art performance (e.g., via actor-critic set-
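To make the scale of the configuration search concrete, the following sketch illustrates why exhaustive search is infeasible and what a per-layer bit-width assignment does. The layer count, candidate bit-widths, and the simple symmetric uniform quantizer are illustrative assumptions, not the scheme used by any particular method cited above.

```python
# Illustration (hypothetical values): the mixed-precision search space
# and a simple symmetric uniform quantizer applied per layer.

def search_space_size(num_layers, num_bitwidth_choices):
    """Each layer independently picks one bit-width, so the number of
    candidate configurations grows exponentially with network depth."""
    return num_bitwidth_choices ** num_layers

def quantize_uniform(weights, bits):
    """Quantize a list of float weights to signed integers of the given
    bit-width (symmetric, per-tensor scale), then dequantize back."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]

# A 50-layer network choosing among bit-widths {2, 4, 6, 8} already has
# 4^50 (about 1.3e30) candidate configurations, far too many to evaluate
# one by one when each evaluation requires training the DNN.
print(search_space_size(50, 4))
print(quantize_uniform([0.7, -0.3, 0.05], bits=4))
```

Automatic methods such as DRL sidestep this explosion by learning which bit-width to assign to each layer from feedback signals, rather than enumerating configurations.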

