EXPLORING THE POTENTIAL OF LOW-BIT TRAINING OF CONVOLUTIONAL NEURAL NETWORKS

Anonymous

Abstract

In this paper, we propose a low-bit training framework for convolutional neural networks. Our framework focuses on reducing the energy and time consumption of convolution kernels by quantizing all the convolution operands (activations, weights, and errors) to low bit-widths. Specifically, we propose a multi-level scaling (MLS) tensor format, in which the element-wise bit-width can be largely reduced, simplifying floating-point computations to nearly fixed-point ones. We then describe the dynamic quantization scheme and the low-bit tensor convolution arithmetic that efficiently leverage the MLS tensor format. Experiments show that our framework achieves a better trade-off between accuracy and bit-width than previous methods. When training ResNet-20 on CIFAR-10, all convolution operands can be quantized to a 1-bit mantissa and 2-bit exponent while retaining the same accuracy as full-precision training. When training ResNet-18 on ImageNet with a 4-bit mantissa and 2-bit exponent, our framework achieves an accuracy loss of less than 1%. Energy consumption analysis shows that our design can achieve over 6.8x higher energy efficiency than training with floating-point arithmetic.

1. INTRODUCTION

Convolutional neural networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks, such as image classification (Krizhevsky et al., 2012) and object detection (Redmon et al., 2016; Liu et al., 2016). However, deep CNNs are both computation- and storage-intensive. The training process can consume hundreds of ExaFLOPs of computation and tens of GBytes of storage (Simonyan & Zisserman, 2014), posing a tremendous challenge for training in resource-constrained environments. At present, the most common approach is to train on GPUs, but this consumes substantial energy. A running GPU draws about 250W, and training one CNN model on ImageNet (Deng et al., 2009) usually takes more than 10 GPU-days. This makes AI applications expensive and not environment-friendly.

Table 1: The number of operations of each type in the training process (batch size = 1). Abbreviations: "EW-Add": element-wise addition; "F": forward pass; "B": backward pass.

Op Name            Op Type   ResNet-18 (ImageNet)   ResNet-20 (CIFAR-10)
Conv (F)           Mul&Add   2.72E+10               4.05E+07
Conv (B)           Mul&Add   5.44E+10               8.11E+07
BN (F)             Mul&Add   3.01E+07               1.88E+05
BN (B)             Mul&Add   3.01E+07               1.88E+05
EW-Add (F)         Add       1.49E+07               7.37E+04
EW-Add (B)         Add       1.20E+07               7.37E+04
Weight Update (B)  Add       1.12E+07               2.68E+05

Reducing the precision of NNs has drawn great attention, since it reduces both storage and computational complexity. It has been pointed out that fixed-point multiplication and addition units consume far less power and circuit area than floating-point ones (Horowitz, 2014). Many studies (Jacob et al., 2017a; Dong et al., 2019; Banner et al., 2018b) focus on amending the training process to acquire a reduced-precision model with higher inference efficiency.
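To make the mantissa/exponent bit-width budget concrete, the following is a minimal sketch of rounding a value to a toy low-bit floating-point format with an m-bit mantissa fraction and an e-bit exponent. This is a generic illustration of low-bit rounding, not the paper's MLS tensor format or dynamic quantization scheme; the function name and the rounding/clamping choices are our own assumptions.

```python
import math

def quantize_minifloat(x, m_bits, e_bits):
    """Round x to a toy float format with an m_bits-bit mantissa fraction
    (implicit leading 1 excluded) and an e_bits-bit signed exponent.
    Hypothetical illustration only, not the paper's MLS format."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    # exponent of the leading bit of |x|
    e = math.floor(math.log2(abs(x)))
    # clamp the exponent to the representable range
    e_max = 2 ** (e_bits - 1) - 1
    e = max(-e_max, min(e_max, e))
    # spacing between representable values at this exponent
    step = 2.0 ** (e - m_bits)
    # round the magnitude to the nearest representable value
    return sign * (round(abs(x) / step) * step)

# With a 1-bit mantissa fraction, representable magnitudes near 1.0
# are 1.0 and 1.5, so 1.3 rounds to 1.5:
print(quantize_minifloat(1.3, m_bits=1, e_bits=2))   # 1.5
print(quantize_minifloat(-0.7, m_bits=1, e_bits=2))  # -0.75
```

With only a 1-bit mantissa and 2-bit exponent (the CIFAR-10 setting in the abstract), the representable values are very sparse, which is why the framework's per-tensor scaling levels matter: they keep the narrow element-wise format centered on the actual dynamic range of each tensor.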

