HARDWARE-RESTRICTION-AWARE TRAINING (HRAT) FOR MEMRISTOR NEURAL NETWORKS

Abstract

Memristor neural networks (MNNs), which utilize memristor crossbars for vector-matrix multiplication, offer significant advantages in scalability and energy efficiency for neuromorphic computing. MNN weights are usually trained offline and then deployed as memristor conductances through a sequence of programming voltage pulses. Although weight uncertainties caused by process variation have been addressed by variation-aware training algorithms, the efficient design and training of MNNs have not been systematically explored to date. In this work, we propose Hardware-Restriction-Aware Training (HRAT), which takes into account various non-negligible limitations and non-idealities of memristor devices, circuits, and systems. HRAT considers the MNN's realistic behavior and circuit restrictions during offline training, thereby bridging the gap between offline training and hardware deployment. HRAT uses a new batch normalization (BN) fusing strategy to align the distortion caused by hardware restrictions between offline training and hardware inference. This not only improves inference accuracy but also eliminates the need for dedicated circuitry for BN operations. Furthermore, crossbar input signals are limited in amplitude by the non-destructive threshold voltage of memristors. To avoid input signal distortion of memristor crossbars, HRAT dynamically adjusts the input signal magnitude during training using a learned scale factor. These scale factors can be incorporated into the parameters of the linear operation together with the fused BN, so no additional signal scaling circuits are required. To evaluate the proposed HRAT methodology, FC-4 and LeNet-5 networks on MNIST are first trained by HRAT and then deployed in hardware; the hardware simulations match the offline HRAT results well. We also carry out various experiments using VGG-16 on the CIFAR datasets.
The study shows that HRAT leads to high-performance MNNs without device calibration or on-chip training, thus greatly facilitating commercial MNN deployment.

1. INTRODUCTION

Memristor neural network (MNN) has emerged as an increasingly feasible option to alleviate the scalability and energy efficiency challenges in neuromorphic computing. While several small-scale MNNs have been prototyped Li et al. (2018); Yao et al. (2020); Wan et al. (2022), efficient design and training of MNNs require an in-depth understanding of various restrictions from device, circuit, and system perspectives. These hardware restrictions include weight uncertainty noise caused by memristor variability, a limited number of programming pulse cycles to tune memristor conductance (e.g., 500 in Yao et al. (2020)), weight quantization noise due to the limited number of memristor conductance states (e.g., 5- and 4-bit in Yao et al. (2020); Wan et al. (2022)), the non-destructive threshold voltage of memristors Jo et al. (2010), the limited output swing of operational amplifiers Karki (2021), and bias quantization noise from finite-resolution digital-to-analog converters (DACs). These hardware restrictions collectively reduce the accuracy of MNN inference, and ignoring them during software offline training may result in poor inference or even functional failure.

As a critical step in network training, batch normalization (BN) can accelerate training convergence Ioffe & Szegedy (2015). The scale and shift operations of BN can be merged into the preceding linear operation (e.g., a fully connected or convolutional layer) after training. In this way, the hardware complexity and cost of MNNs are alleviated, as BN does not require explicit memristor crossbars in on-chip deployment. We envision that the aforementioned hardware restrictions have a significant impact on BN fusion (also known as BN folding) in MNNs. For example, bias distortion caused by DACs and the limited output swing of operational amplifiers should be considered during BN fusion. Although several BN fusing strategies Jacob et al. (2018); Krishnamoorthi (2018); PyTorch (2022); Wan et al. (2022) have been reported for quantization-aware training, dedicated BN fusion strategies for MNN training and hardware deployment have so far received little attention. It is therefore imperative to develop hardware-restriction-aware BN fusing strategies that align the signal distortion caused by hardware restrictions before and after BN fusion in MNNs.

In this work, we propose a Hardware-Restriction-Aware Training (HRAT) method, which takes into account various non-negligible restrictions and non-idealities from device, circuit, and system perspectives. HRAT considers the realistic behavior and hardware restrictions of MNNs during offline training, thereby bridging the gap between offline training and hardware deployment. The key contributions of this work are summarized as follows:

• We model various hardware restrictions of MNNs and integrate them into training, enabling hardware-restriction-aware training (HRAT). HRAT uses a new BN fusing strategy to align the restriction-induced distortion between offline training and hardware inference. This not only improves inference accuracy but also eliminates the need for dedicated circuitry for BN operations. To avoid input signal distortion of memristor crossbars, HRAT dynamically adjusts the input signal magnitude during training using a learned scale factor. These scale factors can be incorporated into the parameters of the linear operation together with the fused BN, so no additional signal scaling circuits are required.

• We conduct various experiments on baseline networks (FC-4, LeNet-5, and VGG-16) and datasets (MNIST, CIFAR-10, and CIFAR-100) to demonstrate the performance of HRAT. To evaluate the proposed HRAT methodology, FC-4 and LeNet-5 are first trained by HRAT and then deployed in hardware. Hardware simulation results match well with the offline HRAT results, indicating that HRAT can bridge the gap between offline training and hardware deployment. To investigate the effectiveness of HRAT on large-scale networks, we conduct experiments using VGG-16 on the CIFAR datasets. Experimental results demonstrate that HRAT can lead to state-of-the-art MNNs without performing prohibitively expensive and time-consuming on-chip retraining, enabling low-cost, high-performance MNNs for large-scale commercialization of neuromorphic systems.
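The algebra behind joint BN fusion and input scaling can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's actual implementation; the function name `fold_bn_and_input_scale` and the per-layer scalar scale `s` are assumptions for illustration. The point is that absorbing BN's scale/shift and a learned input scale into the weights and bias of the preceding linear layer leaves the layer's output unchanged, while the physical crossbar inputs `s * x` can be kept below the memristor threshold voltage.

```python
import numpy as np

def fold_bn_and_input_scale(W, b, gamma, beta, mu, var, s, eps=1e-5):
    """Fold BN parameters (gamma, beta, mu, var) and a per-layer input
    scale s into the preceding linear layer's weights and bias.

    The deployed crossbar then computes W_f @ (s * x) + b_f, which equals
    BN(W @ x + b) of the offline model, so no BN or scaling circuits are
    needed at inference time."""
    inv_std = gamma / np.sqrt(var + eps)   # per-output-channel BN scale
    W_f = (inv_std[:, None] * W) / s       # absorb BN scale and 1/s
    b_f = inv_std * (b - mu) + beta        # absorb BN shift
    return W_f, b_f

# Quick numerical check against the unfused BN(linear(x)) reference.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)); b = rng.normal(size=4)
gamma = rng.uniform(0.5, 1.5, 4); beta = rng.normal(size=4)
mu = rng.normal(size=4); var = rng.uniform(0.5, 2.0, 4)
s = 0.2                                    # shrink inputs below V_th (assumed)
x = rng.normal(size=8)

ref = gamma * (W @ x + b - mu) / np.sqrt(var + 1e-5) + beta
W_f, b_f = fold_bn_and_input_scale(W, b, gamma, beta, mu, var, s)
out = W_f @ (s * x) + b_f                  # what the crossbar computes
assert np.allclose(ref, out)
```

In this sketch, hardware restrictions (conductance quantization, op-amp clipping) would be applied to `W_f`, `b_f`, and `s * x` during training, which is why the fusion must happen before, not after, the restriction modeling.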

2. RELATED WORK

Variation-aware MNN training. Prior work on MNN offline training has predominantly focused on variation-aware training. Liu et al. (2015) address memristor conductance variability (i.e., weight uncertainty) by introducing an additional term, called the "penalty of variation," into the training constraints. An upper bound for the variation penalty is then estimated and used during training. This method improves the tolerance of the trained weights to memristor variability by applying tighter training constraints. Zhu et al. (2020) propose statistical training to deal with MNN weight uncertainties: trained weights are modeled as linear functions of random variables representing memristor variability, and subsequent network computations are modified to propagate the effects of the weight uncertainties. The effectiveness of statistical training has only been validated on small MNNs. Gao et al. (2021) propose a variation-aware MNN training framework, which develops an analytical model for weight uncertainties and uses it as a constraint during training. Yang et al. (2021) propose stochastic-noise-aware training, which injects stochastic noise during training; the stochastic noise includes memristor programming noise (i.e., memristor conductance variability), thermal noise, shot noise, and random telegraph noise. Mao et al. (2022) propose defect-aware training to account for the effects of memristor conductance variability, relaxation, and failure during training. Büchel et al. (2022) address memristor conductance variability via adversarial regularization: to enhance MNN robustness to memristor programming noise, the weight space is attacked by adding Gaussian noise Murray & Edwards (1994) to the parameters during training. Wan et al. (2022) propose noise-resilient training, which injects Gaussian noise into MNN weights during the forward pass.
In summary, the aforementioned MNN training methods mainly improve noise immunity to memristor conductance (i.e., weight) uncertainties and address the weight mismatch between offline training and on-chip deployment. However, other non-negligible MNN hardware restrictions, such as the non-destructive threshold voltage of memristors, have not been incorporated into training.
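The noise-injection schemes surveyed above (e.g., Gaussian noise added to weights during the forward pass, as in Wan et al. (2022)) share a simple core, sketched below in NumPy. This is an illustrative approximation, not the cited authors' code; the function name `noisy_forward` and the relative noise level `rel_sigma` are assumptions. Each forward pass perturbs every weight with zero-mean Gaussian noise whose standard deviation is proportional to the weight's magnitude, mimicking memristor programming variability, so the network learns weights that remain accurate under conductance perturbation.

```python
import numpy as np

def noisy_forward(W, x, rel_sigma=0.05, rng=None):
    """One noise-injected forward pass: perturb each weight with Gaussian
    noise proportional to its magnitude (a common programming-noise model),
    then apply the usual matrix-vector product."""
    rng = np.random.default_rng() if rng is None else rng
    W_noisy = W + rng.normal(0.0, rel_sigma * np.abs(W))
    return W_noisy @ x

# Averaged over many draws, the noisy output is unbiased around W @ x;
# training against such draws rewards weights robust to the perturbation.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5)); x = rng.normal(size=5)
outs = np.stack([noisy_forward(W, x, 0.05, rng) for _ in range(2000)])
clean = W @ x
assert np.allclose(outs.mean(axis=0), clean, atol=0.05)
```

Note that this class of methods only models weight (conductance) uncertainty; it does not capture the input-amplitude and BN-fusion restrictions that HRAT targets.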

