STRUCTURED PRUNING OF CNNS AT INITIALIZATION

Abstract

Pruning-at-initialization (PAI) methods can prune the individual weights of a convolutional neural network (CNN) before training, thus avoiding expensive fine-tuning or retraining of the pruned model. While PAI shows promising results in reducing model size, the pruned model still requires unstructured sparse matrix computation, making it difficult to achieve a real speedup. In this work, we show both theoretically and empirically that the accuracy of CNN models pruned by a PAI method depends on the layer-wise density (i.e., the fraction of the remaining parameters in each layer), irrespective of the granularity of pruning. We formulate PAI as a convex optimization problem based on an expectation-based proxy for model accuracy, which can produce the optimal allocation of the layer-wise densities with respect to the proxy model. Using our formulation, we further propose a structured and hardware-friendly PAI method, named PreCrop, to prune or reconfigure CNNs in the channel dimension. Our empirical results show that PreCrop achieves a higher accuracy than existing PAI methods on several popular CNN architectures, including ResNet, MobileNetV2, and EfficientNet, on both CIFAR-10 and ImageNet. Notably, PreCrop achieves an accuracy improvement of up to 2.7% over a state-of-the-art PAI algorithm when pruning MobileNetV2 on ImageNet. PreCrop also improves the accuracy of EfficientNetB0 by 0.3% on ImageNet with only 80% of the parameters and the same FLOPs.

1. INTRODUCTION

Convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in a wide range of machine learning (ML) applications. However, the massive computational and memory requirements of CNNs remain a major barrier to more widespread deployment on resource-limited edge and mobile devices. This challenge has motivated a large and active body of research on CNN compression, which attempts to simplify the original model without significantly compromising the accuracy. Weight pruning [15, 7, 17, 4, 8] has been extensively explored to reduce the computational and memory demands of CNNs. Existing methods create a sparse CNN model by iteratively removing ineffective weights/activations and training the resulting sparse model. Such an iterative pruning approach usually enjoys the least accuracy degradation, but at the cost of a more computationally expensive training procedure. Moreover, training-based pruning methods introduce additional hyperparameters, such as the learning rate for fine-tuning and the number of epochs before rewinding [20], which make the pruning process even more complicated and less reproducible.

To minimize the cost of pruning, a new line of research proposes pruning-at-initialization (PAI) [16, 27, 24], which identifies and removes unimportant weights in a CNN before training. Similar to training-based pruning, PAI assigns an importance score to each individual weight and retains only a subset of them by maximizing the sum of the importance scores of all remaining weights. The compressed model is then trained using the same hyperparameters (e.g., learning rate and the number of epochs) as the baseline model. Thus, the pruning and training of CNNs are cleanly decoupled, greatly reducing the complexity of obtaining a pruned model.
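To make the PAI recipe above concrete, the following is a minimal sketch (not the paper's implementation): it computes data-free, SynFlow-style importance scores for a toy chain of dense layers, keeps the globally top-scoring fraction of weights, and reports the resulting layer-wise densities. All function names and the simplification to dense layers are our own illustrative assumptions.

```python
import numpy as np

def synflow_like_scores(weights):
    """Data-free, SynFlow-style scores for a chain of dense layers.

    weights[l] has shape (out_l, in_l). The score of each weight is
    |w| * dR/d|w|, where R = 1^T |W_L| ... |W_1| 1 (an all-ones input
    pushed through the absolute-valued network). Illustrative
    simplification: real CNNs need autograd over conv layers.
    """
    A = [np.abs(w) for w in weights]
    fwd = [np.ones(A[0].shape[1])]       # forward pass on an all-ones input
    for a in A:
        fwd.append(a @ fwd[-1])
    bwd = [np.ones(A[-1].shape[0])]      # backward pass from an all-ones gradient
    for a in reversed(A):
        bwd.append(a.T @ bwd[-1])
    bwd = bwd[::-1]
    # dR/d|W_l|[i, j] = bwd[l+1][i] * fwd[l][j]
    return [np.outer(bwd[l + 1], fwd[l]) * A[l] for l in range(len(A))]

def global_topk_masks(scores, density):
    """Keep the top `density` fraction of weights by score, globally."""
    flat = np.concatenate([s.ravel() for s in scores])
    k = max(1, int(density * flat.size))
    thresh = np.partition(flat, -k)[-k]  # k-th largest score overall
    return [s >= thresh for s in scores]

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)),
           rng.normal(size=(32, 16)),
           rng.normal(size=(10, 32))]
scores = synflow_like_scores(weights)
masks = global_topk_masks(scores, density=0.2)
densities = [m.mean() for m in masks]    # layer-wise densities generally differ
```

Note that although the *global* density is fixed at 20%, the per-layer densities that fall out of the global threshold differ across layers; it is exactly this layer-wise allocation that the paper argues determines the accuracy of the pruned model.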
Currently, SynFlow [24] is considered the state-of-the-art PAI technique: it eliminates the need for data during pruning as required in prior art [16, 27] and achieves a higher accuracy at the same compression ratio. However, existing PAI methods mostly focus on fine-grained weight pruning, which removes individual weights from the CNN model without preserving any structure. As a result, both inference and training of the pruned model require sparse matrix computation, which is challenging to accelerate

