MODEL COMPRESSION VIA HYPER-STRUCTURE NETWORK

Abstract

In this paper, we propose a novel channel pruning method for the compression and acceleration of Convolutional Neural Networks (CNNs). Previous channel pruning methods usually ignore the relationships between channels and layers; many of them parameterize each channel independently using gates or similar concepts. To fill this gap, we propose a hyper-structure network to generate the architecture of the main network. Like an existing hypernet, our hyper-structure network can be optimized by regular backpropagation. Moreover, we use a regularization term to specify the computational budget of the compact network. FLOPs is the usual criterion of computational cost; however, using FLOPs directly in the regularization may over-penalize early layers. To address this issue, we further introduce learnable layer-wise scaling factors to balance the gradients from different terms, and they can be optimized by hyper-gradient descent. Extensive experimental results on CIFAR-10 and ImageNet show that our method is competitive with state-of-the-art methods.

1. INTRODUCTION

Convolutional Neural Networks (CNNs) have achieved great success in many machine learning and computer vision tasks (Krizhevsky et al., 2012; Redmon et al., 2016; Ren et al., 2015; Simonyan & Zisserman, 2014a; Bojarski et al., 2016). To handle real-world applications, the design of CNNs has recently become more and more complicated in terms of width, depth, etc. (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014b; He et al., 2016; Huang et al., 2017). Although these complex CNNs attain better performance on benchmark tasks, their computational and storage costs increase dramatically. As a result, a typical CNN-based application can easily exhaust an embedded or mobile device and can hardly be deployed on resource-limited platforms. To tackle these problems, many methods (Han et al., 2015b;a) have been devoted to compressing large original CNNs into compact models. Among these methods, weight pruning and structural pruning are two popular directions. Unlike weight pruning or sparsification, structural pruning, especially channel pruning, is an effective way to reduce the computational cost of a model because it does not require any post-processing steps to achieve actual acceleration and compression.

Many existing works (Liu et al., 2017; Ye et al., 2018; Huang & Wang, 2018; Kim et al., 2020; You et al., 2019) address structural pruning by applying gates or similar concepts to the channels of a layer. Although these ideas have achieved many successes in channel pruning, they share a potential problem: each gate has its own parameter, but the parameters of different gates are independent of one another. As a result, such methods can hardly learn inter-channel or inter-layer relationships. For the same reason, the slimmed models they produce may overlook the information shared between channels and layers, potentially yielding sub-optimal compression results.
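To make the limitation concrete, the following is a minimal numpy sketch of the gate-based scheme described above (function and variable names are illustrative, not from any cited method): each channel is scaled by its own independent gate parameter, and channels whose gates shrink to zero are pruned.

```python
import numpy as np

def apply_channel_gates(feat, gates):
    """Scale each channel of a (C, H, W) feature map by its own gate.

    Because every gate is an independent scalar parameter, no
    inter-channel or inter-layer structure is modeled -- the
    limitation this paper targets.
    """
    return feat * gates[:, None, None]

feat = np.ones((4, 2, 2))                  # toy feature map, 4 channels
gates = np.array([1.0, 0.0, 0.5, 0.0])     # gates near zero => prunable

out = apply_channel_gates(feat, gates)

# After training, channels with (near-)zero gates can be removed:
kept = [c for c in range(4) if gates[c] > 1e-3]
print(kept)  # [0, 2]
```

Channels 1 and 3 contribute nothing to the output and can be dropped, which is how gate-based pruning achieves actual channel removal without post-processing.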
To address these challenges, we propose a novel channel pruning method inspired by hypernet (Ha et al., 2016). A hypernet is a network that generates the weights of another network and can itself be optimized through backpropagation. We extend the hypernet to a hyper-structure network that generates an architecture vector for a CNN instead of weights. Each architecture vector corresponds to a sub-network of the main (original) network. In this way, inter-channel and inter-layer relationships can be captured by our hyper-structure network.
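The idea can be sketched as follows, assuming a toy hyper-structure network implemented as a small MLP (the shapes, embedding, and thresholding rule here are illustrative assumptions, not the paper's exact design): a single shared set of weights maps a learned embedding to gate logits for all layers at once, so channel decisions across layers are coupled through shared parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Channels per layer of the main network (toy values)
layer_widths = [16, 32, 64]
total = sum(layer_widths)

# Hypothetical hyper-structure network: one small MLP whose output
# covers the gates of *all* layers, unlike independent per-channel gates.
z = rng.normal(size=8)                 # learned input embedding
W1 = rng.normal(size=(32, 8))          # shared hypernet weights
W2 = rng.normal(size=(total, 32))

h = np.tanh(W1 @ z)
logits = W2 @ h
gates = (logits > 0).astype(float)     # hard 0/1 architecture vector

# Split the single output vector into per-layer architecture vectors;
# each one selects a sub-network of the main network.
arch, i = [], 0
for w in layer_widths:
    arch.append(gates[i:i + w])
    i += w

print([int(a.sum()) for a in arch])    # channels kept per layer
```

Because `W1` and `W2` are shared across all layers, a gradient from any one layer's channels updates the generator for every other layer, which is how inter-channel and inter-layer relationships can be captured.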

