PRUNING WITH OUTPUT ERROR MINIMIZATION FOR PRODUCING EFFICIENT NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

Deep Neural Networks (DNNs) are dominant in the field of machine learning. However, because DNN models have large computational complexity, implementing them on resource-limited equipment is challenging. Techniques for compressing DNN models without degrading their accuracy are therefore desired. Pruning is one such technique: it removes redundant neurons (or channels). In this paper, we present Pruning with Output Error Minimization (POEM), a method that performs not only pruning but also reconstruction to compensate for the error caused by pruning. The strength of POEM lies in its reconstruction, which minimizes the output error of the activation function, whereas previous methods minimize the error of the value before the activation function is applied. Experiments were conducted with well-known DNN models (MobileNet) and image recognition datasets (ImageNet, CUB-200-2011). The results show that POEM significantly outperforms the previous methods in maintaining the accuracy of the compressed models.
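To make the distinction concrete, the following is a minimal sketch (not the paper's algorithm; the layer sizes, pruning choice, and variable names are illustrative assumptions). Classic reconstruction refits the remaining weights by least squares on the pre-activation values; the quantity POEM targets is instead the error measured after the activation function, and the two objectives generally differ.

```python
import numpy as np

# Illustrative setup, not the paper's method: a linear layer followed by ReLU.
rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

X = rng.standard_normal((256, 16))   # layer inputs with 16 channels (hypothetical)
W = rng.standard_normal((16, 8))     # original weight matrix
keep = np.arange(8)                  # suppose pruning kept the first 8 input channels
Xp = X[:, keep]

# Pre-activation reconstruction: solve min_R ||X W - Xp R||^2 in closed form.
R, *_ = np.linalg.lstsq(Xp, X @ W, rcond=None)

# Error before the activation (what previous methods minimize) versus
# error after the activation (the quantity POEM aims to minimize).
pre_err = np.linalg.norm(X @ W - Xp @ R)
post_err = np.linalg.norm(relu(X @ W) - relu(Xp @ R))
```

Because ReLU is nonlinear, the minimizer of the pre-activation error is in general not the minimizer of the post-activation error, which is the gap POEM's reconstruction addresses.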

1. INTRODUCTION

Nowadays, Deep Neural Networks (DNNs) are dominant in the field of machine learning, and the demand for them is increasing in various applications. However, DNNs are known to be over-parameterized and to require large computational cost. This makes them computationally slow, power-consuming, and difficult to implement on resource-limited equipment. Therefore, there is a need for techniques that create efficient DNN models by compressing large models while maintaining their accuracy.

Pruning is one such technique: it removes redundant weights from trained DNN models. Pruning methods can be divided into two groups: unstructured pruning and structured pruning. The former removes individual weight parameters so as to make the weight tensor sparse. Since the shape of the weight tensor remains the same, the compressed model must be implemented with hardware and libraries that can restrict computation to the non-zero weights. The latter removes neurons (or channels) so as to make the weight tensor itself smaller. The effect of compression can therefore be obtained with general-purpose hardware and libraries. In this paper, we focus on structured pruning.

How well a pruned model maintains its accuracy depends on two factors. The first is compression ratio optimization, i.e., how many neurons are removed in each layer. The other is layer-wise optimization, i.e., which neurons are preserved in each layer. In recent years, there has been a growing awareness that the value of pruning lies in the search for an efficient sub-architecture within a large redundant architecture. This follows from research showing that a DNN model with the pruned architecture trained from scratch can achieve at least as good accuracy as the pruned and fine-tuned model (Liu et al., 2019). For this reason, the recent trend has been to focus on the compression ratio optimization problem.
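The difference between the two pruning groups can be sketched in a few lines (a toy illustration under assumed layer sizes; the magnitude and L2-norm criteria are common examples, not the paper's method). Unstructured pruning zeroes entries but keeps the tensor shape; structured pruning deletes whole neurons, so the tensor itself shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))  # hypothetical layer: 8 output neurons, 4 inputs

# Unstructured: zero out the smallest-magnitude half of the weights.
# The shape is unchanged, so speedups need sparse-aware hardware/libraries.
threshold = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

# Structured: keep the 4 output neurons with the largest L2 norm.
# The tensor becomes smaller, so dense general-purpose libraries benefit directly.
keep = np.sort(np.argsort(-np.linalg.norm(W, axis=1))[:4])
W_structured = W[keep]
```

The resulting `W_unstructured` is still 8x4 (half its entries zero), while `W_structured` is a dense 4x4 matrix, which is why structured pruning is the focus here.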
However, does this mean that layer-wise optimization is no longer important? It is reasonable to claim that combining a compression ratio optimization method with a better layer-wise optimization method should result in more effective pruning. Therefore, layer-wise optimization is still important and worth investigating. In this paper, we propose a pruning method named Pruning with Output Error Minimization (POEM) that performs layer-wise optimization. The strength of POEM lies in its reconstruction using the

