SUCCINCT NETWORK CHANNEL AND SPATIAL PRUNING VIA DISCRETE VARIABLE QCQP

Abstract

Reducing the heavy computational cost of large convolutional neural networks is crucial when deploying the networks to resource-constrained environments. In this context, recent works propose channel pruning via greedy channel selection to achieve practical acceleration and memory footprint reduction. We first show that this channel-wise approach ignores the inherent quadratic coupling between channels in the neighboring layers and cannot safely remove inactive weights during the pruning procedure. Furthermore, we show that these pruning methods cannot guarantee that the given resource constraints are satisfied and cause a discrepancy with the true objective. To this end, we formulate a principled optimization framework with discrete variable QCQP, which provably prevents any inactive weights and enables the exact guarantee of meeting the resource constraints in terms of FLOPs and memory. Also, we extend the pruning granularity beyond channels and jointly prune individual 2D convolution filters spatially for greater efficiency. Our experiments show competitive pruning results under the target resource constraints on the CIFAR-10 and ImageNet datasets across various network architectures.

1. INTRODUCTION

Deep neural networks are the bedrock of artificial intelligence tasks such as object detection, speech recognition, and natural language processing (Redmon & Farhadi, 2018; Chorowski et al., 2015; Devlin et al., 2019). While modern networks have hundreds of millions to billions of parameters to train, it has recently been shown that these parameters are highly redundant and can be pruned without significant loss in accuracy (Han et al., 2015; Guo et al., 2016). This discovery has led practitioners to desire training and running the models on resource-constrained mobile devices, provoking a large body of research on network pruning. Unstructured pruning, however, does not directly lead to any practical acceleration or memory footprint reduction due to poor data locality (Wen et al., 2016), and this motivated research on structured pruning to achieve practical usage under limited resource budgets. To this end, a line of research on channel pruning considers completely pruning the convolution filters along the input and output channel dimensions, where the resulting pruned model becomes a smaller dense network suited for practical acceleration and memory footprint reduction (Li et al., 2017; Luo et al., 2017; He et al., 2019; Wen et al., 2016; He et al., 2018a). However, existing channel pruning methods perform the pruning operations with a greedy approach and do not consider the inherent quadratic coupling between channels in the neighboring layers. Although these methods are easy to model and optimize, they cannot safely remove inactive weights during the pruning procedure, suffer from discrepancies with the true objective, and prohibit the strict satisfaction of the required resource constraints during the pruning process. The ability to specify hard target resource constraints in the pruning optimization process is important since this allows the user to run the pruning and optional finetuning process only once.
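To see why the coupling is quadratic, note that the FLOPs of a convolution layer scale with the product of the number of surviving input channels and surviving output channels, so the total cost is a bilinear function of the channel masks of every pair of adjacent layers. The following is a minimal sketch of this accounting; the layer sizes, the `conv_flops` helper, and the random masks are all illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Hypothetical 3-layer CNN: output channel counts and output spatial sizes.
channels = [16, 32, 64]                 # output channels of layers 0, 1, 2
spatial = [32 * 32, 16 * 16, 8 * 8]     # H * W of each layer's output
ksize = 3                               # 3x3 kernels throughout

rng = np.random.default_rng(0)
# Binary mask c[l] marks which output channels of layer l survive pruning.
masks = [rng.integers(0, 2, size=n) for n in channels]

def conv_flops(c_in, c_out, hw, k=ksize):
    """FLOPs of one conv layer under binary channel masks.

    The cost is bilinear in the masks of the two neighboring layers:
    FLOPs = k^2 * H * W * (sum c_in) * (sum c_out).
    Scoring channels one layer at a time ignores this cross-layer product.
    """
    return k * k * hw * int(c_in.sum()) * int(c_out.sum())

# Total FLOPs couples every pair of adjacent masks quadratically.
total = sum(
    conv_flops(masks[l], masks[l + 1], spatial[l + 1])
    for l in range(len(channels) - 1)
)
print(total)
```

Because removing one input channel saves an amount proportional to how many output channels of that layer remain active, a greedy per-layer score cannot account for the true marginal cost of a channel without looking at its neighboring layer's mask.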
When the pruning process ignores the target specifications, the users may need to apply multiple rounds of pruning and finetuning until the specifications are eventually met, resulting in extra computational overhead (Han et al., 2015; He et al., 2018a; Liu et al., 2017). In this paper, we formulate a principled optimization problem that prunes the network layer channels while respecting the quadratic coupling and exactly satisfying the user-specified FLOPs and memory constraints.
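The general shape of such a constrained formulation can be sketched as follows; the symbols below are our own illustrative notation (a generic discrete QCQP over channel masks), not the paper's exact definitions:

```latex
\min_{c_0, \dots, c_L}\; J(c_0, \dots, c_L)
\quad \text{s.t.} \quad
\sum_{l=1}^{L} c_{l-1}^{\top} F_l\, c_l \le B_{\mathrm{FLOPs}}, \qquad
\sum_{l=1}^{L} c_{l-1}^{\top} M_l\, c_l \le B_{\mathrm{mem}}, \qquad
c_l \in \{0,1\}^{n_l},
```

where $c_l$ is the binary keep-mask over the $n_l$ output channels of layer $l$, $J$ is a proxy for the pruned network's loss, and $F_l$, $M_l$ encode the per-(input, output)-channel FLOPs and memory costs of layer $l$. The resource constraints are quadratic in the masks, which is precisely the cross-layer coupling that greedy channel selection drops.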

