IMPROVING THE ACCURACY OF NEURAL NETWORKS IN ANALOG COMPUTING-IN-MEMORY SYSTEMS BY A GENERALIZED QUANTIZATION METHOD

Abstract

Crossbar-enabled analog computing-in-memory (CACIM) systems can significantly improve the computation speed and energy efficiency of deep neural networks (DNNs). However, transferring a DNN from a digital system to a CACIM system usually reduces its accuracy. The major issue is that in CACIM systems the weights of the DNN are stored and computed directly as analog quantities, whose precision is limited by device variation and programming overhead. A suitable quantization algorithm is therefore important for deploying a DNN into a CACIM system with little accuracy loss. Analog weights have a unique advantage for quantization: because there is no encoding or decoding process, the choice of quanta does not affect the computing process. A generalized quantization method that places no constraint on the range of the quanta and achieves lower quantization error is therefore effective in CACIM systems. We introduce, for the first time, a generalized quantization method into CACIM systems and demonstrate superior performance on a series of computer vision tasks, including image classification, object detection, and semantic segmentation. With the generalized quantization method, a DNN with 8-level analog weights can outperform its 32-bit counterpart. With fewer levels, the generalized quantization method incurs less accuracy loss than uniform quantization methods.
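To make the idea of unconstrained quanta concrete, below is a minimal sketch of one possible generalized (non-uniform) quantizer based on Lloyd's algorithm, which places the quanta to minimize squared quantization error rather than spacing them evenly. The function name, level count, and toy weight distribution are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def lloyd_quantize(w, n_levels=8, iters=50):
    """Non-uniform quantization: pick quanta minimizing squared error
    (Lloyd's algorithm); levels are NOT constrained to be evenly spaced."""
    # Initialize quanta at evenly spaced quantiles of the weight distribution.
    q = np.quantile(w, np.linspace(0.0, 1.0, n_levels + 2)[1:-1])
    for _ in range(iters):
        # Assign each weight to its nearest quantum.
        idx = np.abs(w[:, None] - q[None, :]).argmin(axis=1)
        # Move each quantum to the centroid of the weights assigned to it.
        for k in range(n_levels):
            if np.any(idx == k):
                q[k] = w[idx == k].mean()
    return q[idx], q

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000)   # toy weight tensor
wq, quanta = lloyd_quantize(w, n_levels=8)
```

Since analog weights need no digital encoding, the resulting irregular set of quanta can be programmed into the memory units directly, which is the property the abstract highlights.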

1. INTRODUCTION

Deep neural networks (DNNs) have been widely used in a variety of fields, such as computer vision (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016), speech recognition (Graves et al., 2013; Hinton et al., 2012; Graves & Jaitly, 2014), natural language processing (Kim, 2014; Yang et al., 2016; Lai et al., 2015), and other areas (Mnih et al., 2015; Silver et al., 2016). However, the high complexity of DNN models makes them hard to apply on edge devices (mobile phones, onboard computers, smart sensors, wearable devices, etc.), which provide only limited computing speed and power (Sze et al., 2017). Crossbar-enabled analog computing-in-memory (CACIM) systems are a promising approach to facilitating the application of DNNs on edge devices (Yang et al., 2013). They can carry out some typical operations in situ, exactly where the data are located (Ielmini & Wong, 2018), such as the multiply-accumulate (MAC) operation, the most frequently performed operation in DNNs. The cost of transferring data for these operations is thereby reduced, and both computation speed and energy efficiency can be improved significantly (Yao et al., 2020). The cornerstone of CACIM systems for DNNs is the crossbar array of computational memory units (Hu et al., 2012). As shown in Figure 1, taking the memristor device as an example, each weight W_ij of a connection in one layer of a neural network is stored as the conductance state G_ij of a memristor, and the input data are represented as voltages V_i. After applying the voltage V_i to each row, the current I_j collected at each column is exactly the MAC result according to Kirchhoff's law and Ohm's law: $I_j = \sum_i V_i G_{ij}$. Before a DNN can run in a CACIM system, an essential step is writing its weights into the memory units, which is usually called mapping. However, the mapping overhead is directly
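The crossbar MAC described above can be sketched numerically: the currents collected at the columns equal a vector-matrix product of the row voltages with the conductance matrix. The shapes and values below are arbitrary toy choices, not hardware parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # conductances G_ij: 4 rows, 3 columns
V = rng.uniform(0.0, 0.5, size=4)        # input voltages V_i on the rows

# Kirchhoff's and Ohm's laws: column current I_j = sum_i V_i * G_ij,
# i.e. the crossbar computes a vector-matrix product in one analog step.
I = V @ G

# Equivalent explicit summation, for clarity:
I_loop = np.array([sum(V[i] * G[i, j] for i in range(4)) for j in range(3)])
assert np.allclose(I, I_loop)
```

In the analog array this product is produced physically in a single read operation, which is why the data-transfer cost of the MAC largely disappears.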

