ELRT: EFFICIENT LOW-RANK TRAINING FOR COMPACT CONVOLUTIONAL NEURAL NETWORKS

Abstract

Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low-rank structure, has been well studied in the literature. On the other hand, low-rank training, an alternative way to train low-rank CNNs from scratch, has been little explored so far. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is always performed on the low-rank structure, bringing attractive benefits for practical applications. However, the existing low-rank training solutions still face several challenges, such as considerable accuracy drop and/or the need to update full-size models during training. In this paper, we perform a systematic investigation of low-rank CNN training. By identifying the proper low-rank format and performance-improving strategies, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.

1. INTRODUCTION

Convolutional neural networks (CNNs) have achieved widespread adoption in numerous real-world computer vision applications, such as image classification, video recognition, and object detection. However, modern CNN models are typically storage-intensive and computation-intensive, potentially hindering their efficient deployment in many resource-constrained scenarios, especially on edge and embedded computing platforms. To address this challenge, many prior efforts have been devoted to producing low-cost compact CNN models. Among them, low-rank compression is a popular model compression solution. By leveraging matrix or tensor decomposition techniques, low-rank compression aims to exploit the potential low-rankness exhibited in full-rank CNN models, enabling simultaneous reductions in both memory footprint and computational cost. To date, numerous low-rank CNN compression solutions have been reported in the literature (Phan et al. (2020); Kossaifi et al. (2020); Li et al. (2021b); Liebenwein et al. (2021)).

Low-rank Training: A Promising Alternative Towards Low-rank CNNs. From the perspective of model production, performing low-rank compression on full-rank networks is not the only approach to obtaining low-rank CNNs. In principle, we can also adopt a low-rank training strategy to directly train a low-rank model from scratch. As illustrated in Fig. 1, low-rank training starts from a low-rank initialization and keeps the desired low-rank structure throughout the entire training phase. Compared with low-rank compression, which is built on a two-stage pipeline ("pre-training-then-compressing"), single-stage low-rank training enjoys two attractive benefits: relaxed operational requirements and reduced training cost. More specifically, first, the underlying training-from-scratch scheme, by its nature, completely eliminates the need for pre-trained full-rank high-accuracy models, thereby lowering the barrier to obtaining low-rank CNNs. In other words, producing low-rank networks becomes more feasible and accessible.

Second, the overall computational cost of the entire low-rank CNN production pipeline is significantly reduced. This is because: 1) removing the pre-training phase completely saves the computations that were originally needed to pre-train the full-rank models; and 2) directly training the compact low-rank CNNs naturally consumes far fewer floating-point operations (FLOPs) than full-rank pre-training.

Existing Works and Limitations. Despite the above-analyzed benefits, low-rank training is currently little explored in the literature. Unlike the prosperity of studying low-rank compression, to
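To give a concrete sense of the FLOP savings claimed for low-rank structures, the sketch below counts multiply-accumulate operations for a single convolutional layer before and after a rank-r factorization. The layer shapes and the particular factorization scheme (a k x k convolution into r intermediate channels followed by a 1 x 1 convolution) are illustrative assumptions for this back-of-the-envelope calculation, not details taken from ELRT itself.

```python
# Back-of-the-envelope FLOP (MAC) count for one convolutional layer,
# full-rank vs. a rank-r factorization. All shapes are hypothetical
# examples, not the paper's actual configurations.

def conv_flops(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k

def lowrank_conv_flops(h, w, c_in, c_out, k, r):
    """One common factorization: k x k conv into r channels, then a 1 x 1 conv."""
    return h * w * (c_in * r * k * k + r * c_out)

# Example: a 3x3 layer with 256 input/output channels on a 56x56 feature map.
full = conv_flops(56, 56, 256, 256, 3)
low = lowrank_conv_flops(56, 56, 256, 256, 3, r=64)
print(f"full-rank: {full:,} MACs")
print(f"rank-64:   {low:,} MACs ({low / full:.1%} of full-rank)")
```

With these example shapes, the rank-64 layer needs roughly 28% of the full-rank MACs; since every training iteration runs through such layers, training directly on the low-rank structure inherits this per-layer saving.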

