CNN COMPRESSION AND SEARCH USING SET TRANSFORMATIONS WITH WIDTH MODIFIERS ON NETWORK ARCHITECTURES

Abstract

We propose a new approach, based on discrete filter pruning, to adapt off-the-shelf models to embedded environments. Importantly, we circumvent the usually prohibitive costs of model compression. Our method, Structured Coarse Block Pruning (SCBP), prunes whole CNN kernels using width modifiers applied to a novel transformation of convlayers into superblocks. SCBP uses set representations to construct a rudimentary search that provides candidate networks. To test our approach, the original ResNet architectures serve as the baseline and also provide the 'seeds' for our candidate search. The search produces a configurable number of compressed (derived) models. These derived models are often 20% faster and 50% smaller than their unmodified counterparts. At the expense of accuracy, the size can be reduced and the inference latency lowered even further. The unique SCBP transformations yield many new model variants, each with its own trade-offs, and do not require GPU clusters or expert humans for training or design.

1. INTRODUCTION

Modern Computer Vision (CV) is dominated by the convolution operation introduced by Fukushima & Miyake (1982) and later advanced into a Convolutional Neural Network (CNN or convnet) by LeCun et al. (1989). Until recently, these convnets were limited to rudimentary CV tasks such as classifying handwritten digits LeCun et al. (1998). Present-day convnets have far surpassed other CV approaches by improving their framework to include faster activations Nair & Hinton (2010), stacked convolutional layers (convlayers) Krizhevsky et al. (2012), and better optimizers Kingma & Ba (2014). These multi-layer deep convnets require big data in the form of datasets such as ImageNet Deng et al. (2009) to enable deep learning LeCun et al. (2015) of the feature space. However effective, convnets are held back by their high resource consumption. Utilizing an effective convnet on the edge presents new challenges in latency, energy, and memory costs Chen & Ran (2019). Additionally, many tasks, such as autonomous robotics, require real-time processing and cannot be offloaded to the cloud. As such, resource-constrained platforms, such as embedded systems, lack the compute and memory to use convnets in their default constructions. Analysis of convnets reveals that they are overparameterized Denil et al. (2013) and that reducing this overparameterization can be a key mechanism in compressing convnets Hanson & Pratt (1988); LeCun et al. (1990); Han et al. (2015a). The many weights that form a network are not necessarily of the same entropy, and some can therefore be seen as scaffolding to be removed during a compression step Hassibi & Stork (1993); Han et al. (2015b); Tessier et al. (2021). In this work, our objective is to reduce the size of any given convnet using an automated approach that requires little human engineering and few compute resources. To that end, we design Structured Coarse Block Pruning (SCBP), a compression mechanism that requires no iterative retraining or fine-tuning. SCBP uses a low-cost search method, seeded with an off-the-shelf network, to generate compressed model derivatives with unique accuracy, size, and latency trade-offs.

