OTOV2: AUTOMATIC, GENERIC, USER-FRIENDLY

Abstract

The existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage necessitates numerous engineering efforts and domain-knowledge from the end-users which prevent their wider applications onto broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance without fine-tuning. OTOv2 is automatic and pluggable into various deep learning applications, and requires almost minimal engineering efforts from the users. Methodologically, OTOv2 proposes two major improvements: (i) Autonomy: automatically exploits the dependency of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer to more reliably solve structured-sparsity problems. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVNH and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. The source code is available at https://github.com/tianyic/only_train_once.

1. INTRODUCTION

Large-scale Deep Neural Networks (DNNs) have demonstrated successful in a variety of applications (He et al., 2016) . However, how to deploy such heavy DNNs onto resource-constrained environments is facing severe challenges. Consequently, in both academy and industry, compressing full DNNs into slimmer ones with negligible performance regression becomes popular. Although this area has been explored in the past decade, it is still far away from being fully solved. Weight pruning is perhaps the most popular compression method because of its generality and ability in achieving significant reduction of FLOPs and model size by identifying and removing redundant structures (Gale et al., 2019; Han et al., 2015; Lin et al., 2019) . However, most existing pruning methods typically proceed a complicated multi-stage procedure as shown in Figure 1 , which has apparent limitations: (i) Hand-Craft and User-Hardness: requires significant engineering efforts and expertise from users to apply the methods onto their own scenarios; (ii) Expensiveness: conducts DNN training multiple times including the foremost pre-training, the intermediate training for identifying redundancy and the afterwards fine-tuning; and (iii) Low-Generality: many methods are designed for specific architectures and tasks and need additional efforts to be extended to others. To address those drawbacks, we naturally need a DNN training and pruning method to achieve the Goal. Given a general DNN, automatically train it only once to achieve both high performance and slimmer model architecture simultaneously without pre-training and fine-tuning. To realize, the following key problems need to be resolved systematically. (i) What are the removal structures (see Section 3.1 for a formal definition) of DNNs? (ii) How to identify the redundant

funding

† Partially supported by NSF grant CCF-2240708. 1

