SEKRON: A DECOMPOSITION METHOD SUPPORTING MANY FACTORIZATION STRUCTURES

Anonymous

Abstract

While convolutional neural networks (CNNs) have become the de facto standard for most image processing and computer vision applications, their deployment on edge devices remains challenging. Tensor decomposition methods provide a means of compressing CNNs to meet the wide range of device constraints by imposing certain factorization structures on their convolution tensors. However, being limited to the small set of factorization structures offered by state-of-the-art decomposition approaches can lead to sub-optimal performance. We propose SeKron, a novel tensor decomposition method that offers a wide variety of factorization structures using sequences of Kronecker products. The flexibility of SeKron enables a wide range of compression rates and also allows it to cover commonly used factorizations such as Tensor-Train (TT), Tensor-Ring (TR), Canonical Polyadic (CP) and Tucker. Crucially, we derive an efficient convolution projection algorithm shared by all SeKron structures, leading to seamless compression of CNN models. We validate our approach for model compression on both high-level and low-level computer vision tasks and find that it outperforms state-of-the-art decomposition methods.

1. INTRODUCTION

Deep learning models have introduced new state-of-the-art solutions to both high-level computer vision problems (He et al. 2016; Ren et al. 2015) and low-level image processing tasks (Wang et al. 2018b; Schuler et al. 2015; Kokkinos & Lefkimmiatis 2018) through convolutional neural networks (CNNs). These gains come at the expense of the millions of training parameters that accompany deep CNNs, making them computationally intensive. As a result, many of these models are of limited use, as they are challenging to deploy on resource-constrained edge devices. Compared with neural networks for high-level computer vision tasks (e.g., ResNet-50 (He et al. 2016)), models for low-level imaging problems such as single image super-resolution have a much higher computational complexity due to their larger feature map sizes. Moreover, they are typically infeasible to run on cloud computing servers, making their deployment on edge devices even more critical.

In recent years, an increasing trend has emerged of reducing the size of state-of-the-art CNN backbones through efficient architecture designs such as Xception (Chollet 2017), MobileNet (Howard et al. 2019), ShuffleNet (Zhang et al. 2018c), and EfficientNet (Tan & Le 2019), to name a few. On the other hand, there have been studies demonstrating significant redundancy in the parameters of large CNN models, implying that, in theory, the number of model parameters can be reduced while maintaining performance (Denil et al. 2013). These studies provide the basis for the development of many model compression techniques, such as pruning (He et al. 2020), quantization (Hubara et al. 2017), knowledge distillation (Hinton et al. 2015), and tensor decomposition (Phan et al. 2020). Tensor decomposition methods such as Tucker (Kim et al. 2016), Canonical Polyadic (CP) (Lebedev et al. 2015), Tensor-Train (TT) (Novikov et al. 2015) and Tensor-Ring (TR) (Wang et al. 2018a) rely on finding low-rank approximations of tensors under some imposed factorization structure, as illustrated in Figure 1a. In practice, some structures are more suitable than others when decomposing tensors. Choosing from a limited set of factorization structures can lead to sub-optimal compression as well as lengthy runtimes, depending on the hardware. This limitation can be alleviated by reshaping tensors prior to their compression, as shown in (Garipov et al. 2016). However, this approach requires time-consuming development of customized convolution algorithms.
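To make the idea of a low-rank approximation under an imposed factorization structure concrete, the sketch below (our own illustration, not the SeKron algorithm) approximates a matrix by a rank-R sum of Kronecker products, using the classical Van Loan–Pitsianis rearrangement followed by a truncated SVD; all function and variable names are hypothetical.

```python
import numpy as np

def nearest_kron_sum(A, shape_B, shape_C, rank=1):
    """Approximate A by sum_r B_r kron C_r with `rank` terms.

    Rearranges A so that each Kronecker term becomes a rank-1 term
    (Van Loan-Pitsianis), then truncates the SVD of the rearranged matrix.
    """
    m1, n1 = shape_B
    m2, n2 = shape_C
    assert A.shape == (m1 * m2, n1 * n2)
    # Row (i*n1 + j) of R holds the vectorized (m2 x n2) block A[i*m2:, j*n2:].
    R = A.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Bs = [np.sqrt(s[r]) * U[:, r].reshape(m1, n1) for r in range(rank)]
    Cs = [np.sqrt(s[r]) * Vt[r].reshape(m2, n2) for r in range(rank)]
    return Bs, Cs

# A matrix that is exactly one Kronecker product is recovered at rank 1,
# using (4*4 + 8*8) parameters instead of 32*32.
rng = np.random.default_rng(0)
A = np.kron(rng.standard_normal((4, 4)), rng.standard_normal((8, 8)))
(B,), (C,) = nearest_kron_sum(A, (4, 4), (8, 8), rank=1)
print(np.allclose(np.kron(B, C), A))
```

The choice of factor shapes `(m1, n1)` and `(m2, n2)` here plays the role of the imposed factorization structure: different splits of the same weight matrix yield different parameter counts and different approximation quality, which is the flexibility SeKron generalizes to sequences of Kronecker factors.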

