NASOA: TOWARDS FASTER TASK-ORIENTED ONLINE FINE-TUNING

Abstract

Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks. The common practice of fine-tuning is to adopt a default hyperparameter setting with a fixed pre-trained model, neither of which is optimized for specific tasks and time constraints. Moreover, in cloud computing or GPU clusters where tasks arrive sequentially in a stream, faster online fine-tuning is a more desirable and realistic strategy for saving money, energy consumption, and CO2 emissions. In this paper, we propose a joint Neural Architecture Search and Online Adaption framework named NASOA towards faster task-oriented fine-tuning upon the request of users. Specifically, NASOA first adopts an offline NAS to identify a group of training-efficient networks that form a pre-trained model zoo. We propose a novel joint block- and macro-level search space to enable a flexible and efficient search. Then, by estimating fine-tuning performance via an adaptive model that accumulates experience from past tasks, an online schedule generator picks the most suitable model and generates a personalized training regime for each desired task in a one-shot fashion. The resulting model zoo is more training-efficient than SOTA NAS models, e.g., 6x faster than RegNetY-16GF and 1.7x faster than EfficientNet-B3. Experiments on multiple datasets also show that NASOA achieves much better fine-tuning results, i.e., improving accuracy by around 2.1% over the best performance in the RegNet series under various time constraints and tasks, while being 40x faster than the BOHB method.

1. INTRODUCTION

Fine-tuning using pre-trained models has become the de facto standard in the field of computer vision because of its impressive results on various downstream tasks such as fine-grained image classification (Nilsback & Zisserman, 2008; Welinder et al., 2010), object detection (He et al., 2019; Jiang et al., 2018; Xu et al., 2019), and segmentation (Chen et al., 2017; Liu et al., 2019). Kornblith et al. (2019) and He et al. (2019) verified that fine-tuning pre-trained networks outperforms training from scratch. It further helps to avoid over-fitting (Cui et al., 2018) and reduces training time significantly (He et al., 2019). Owing to these merits, many cloud computing and AutoML pipelines provide fine-tuning services for an online stream of incoming users with new data, different tasks, and time limits. To save users' time, money, energy consumption, and even CO2 emissions, an efficient online automated fine-tuning framework is practically useful and in great demand. Thus, in this work, we propose to explore the problem of faster online fine-tuning.

The conventional practice of fine-tuning is to adopt a set of predefined hyperparameters for training a predefined model (Li et al., 2020). This has three drawbacks in the online setting: 1) the design of the backbone model is not optimized for the upcoming fine-tuning task, and the selection of the backbone model is not data-specific; 2) a default hyperparameter setting may not be optimal across tasks, and the training settings may not meet the time constraints provided by users; 3) with incoming tasks, the regular paradigm is unsuitable for the online setting since it cannot memorize and accumulate experience from past fine-tuning tasks. We therefore propose to decouple our faster fine-tuning problem into two parts: finding efficient fine-tuning networks, and generating optimal fine-tuning schedules under specific time constraints in an online learning fashion.
Recently, Neural Architecture Search (NAS) algorithms have demonstrated promising results in discovering top-accuracy architectures, surpassing the performance of hand-crafted networks while saving human effort (Zoph et al., 2018; Liu et al., 2018a;b; Radosavovic et al., 2019; Tan et al., 2019b; Real et al., 2019a; Tan & Le, 2019; Yao et al., 2020). However, these NAS works usually focus on optimizing inference time/FLOPs, and their search spaces are not flexible enough to guarantee optimality for fast fine-tuning. In contrast, we resort to developing a NAS scheme with a novel, flexible search space for fast fine-tuning. On the other hand, hyperparameter optimization (HPO) methods such as grid search (Bergstra & Bengio, 2012), Bayesian optimization (BO) (Strubell et al., 2019a; Mendoza et al., 2016), and BOHB (Falkner et al., 2018) are used in deep learning and achieve good performance. However, these search-based methods are computationally expensive and require iterative "trial and error", which conflicts with our goal of faster adaptation.

In this work, we propose a novel Neural Architecture Search and Online Adaption framework named NASOA. First, we conduct an offline NAS to generate an efficient fine-tuning model zoo, designing a novel block-level and macro-structure search space that allows a flexible choice of networks. Once the efficient training model zoo of Pareto-optimal models is created by the offline NAS, online users can enjoy the benefit of these training-efficient networks without any additional cost. We then propose an online learning algorithm with an adaptive predictor that models the relation between hyperparameters, models, dataset meta-information, and the final fine-tuning performance. The final training schedule is generated directly by selecting the fine-tuning regime with the best predicted performance.
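The one-shot schedule generation just described can be sketched in code. This is a toy illustration under stated assumptions: the class names, the regime fields, and the predictor (a running per-configuration accuracy average) are our own simplifications, not the paper's actual adaptive model.

```python
# Hypothetical sketch of NASOA-style online schedule generation.
# The predictor below is a toy stand-in for the paper's adaptive model:
# it keeps a running mean of observed accuracy per (model, epochs) pair.
from dataclasses import dataclass, field

@dataclass
class Regime:
    model: str     # entry from the pre-trained model zoo (illustrative)
    lr: float      # learning rate
    epochs: int    # fine-tuning length
    cost: float    # estimated fine-tuning time, e.g. GPU hours

@dataclass
class OnlinePredictor:
    history: dict = field(default_factory=dict)  # (model, epochs) -> (sum, count)
    global_sum: float = 0.0
    global_n: int = 0

    def predict(self, regime, dataset_meta):
        """Predicted accuracy for a candidate regime on this dataset."""
        key = (regime.model, regime.epochs)
        if key in self.history:
            s, n = self.history[key]
            return s / n
        # Fall back to the global mean (0.5 if no experience yet).
        return self.global_sum / self.global_n if self.global_n else 0.5

    def update(self, regime, dataset_meta, accuracy):
        """Accumulate experience from a finished fine-tuning task."""
        key = (regime.model, regime.epochs)
        s, n = self.history.get(key, (0.0, 0))
        self.history[key] = (s + accuracy, n + 1)
        self.global_sum += accuracy
        self.global_n += 1

def generate_schedule(predictor, candidates, dataset_meta, time_budget):
    """One-shot selection: no trial-and-error, just return the feasible
    candidate regime with the best predicted performance."""
    feasible = [r for r in candidates if r.cost <= time_budget]
    return max(feasible, key=lambda r: predictor.predict(r, dataset_meta))
```

The key property mirrored here is that serving a new task is a single prediction-and-argmax pass, while each completed task feeds back into `update`, so the generator improves as the task stream grows.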
Benefiting from the experience accumulated via online learning, the diversity of the data and the growing body of results continuously improve our regime generator. Our method behaves in a one-shot fashion and does not incur the additional search cost of HPO, enabling it to provide various training regimes under different time constraints. Extensive experiments are conducted on multiple widely used fine-tuning datasets. The searched model zoo ET-NAS is more training-efficient than SOTA ImageNet models, e.g., 5x faster to train than RegNetY-16GF and 1.7x faster than EfficientNet-B3. Moreover, using the whole NASOA pipeline, our online algorithm achieves superior fine-tuning results in terms of both accuracy and fine-tuning speed, i.e., improving accuracy by around 2.1% over the best performance in the RegNet series across various tasks, and saving 40x computational cost compared to the BOHB method. In summary, our contributions are as follows:

• To the best of our knowledge, we make the first effort to propose a faster fine-tuning pipeline that seamlessly combines training-efficient NAS with an online adaption algorithm. NASOA can effectively generate a personalized fine-tuning schedule for each desired task via an adaptive model that accumulates experience from past tasks.

• The proposed novel joint block/macro-level search space enables a flexible and efficient search. The resulting model zoo ET-NAS is more training-efficient than very strong ImageNet SOTA models, e.g., EfficientNet and RegNet. All ET-NAS models have been released to help the community skip the computation-heavy NAS stage and directly enjoy the benefits of NASOA.

• The whole NASOA pipeline achieves much better fine-tuning results, in terms of both accuracy and fine-tuning efficiency, than the current fine-tuning best practice and HPO methods, e.g., BOHB.

2. RELATED WORK

Neural Architecture Search (NAS). The goal of NAS is to automatically optimize network architectures and free humans from handcrafted architecture engineering. Most previous works (Liu et al., 2018b; Cai et al., 2019b; Liu et al., 2018a; Tan et al., 2019a; Xie et al., 2019; Howard et al., 2019) aim at searching for CNN architectures with better inference speed and fewer FLOPs. Baker et al. (2017); Cai et al. (2018); Zhong et al. (2018) apply reinforcement learning to train an RNN controller that generates a cell architecture. Liu et al. (2018b); Xie et al. (2019); Cai et al. (2019b) search for a cell structure via weight sharing and differentiable optimization. Tan & Le (2019) use a grid search for an efficient network by altering the depth/width of the network with a fixed block structure. In contrast, our NAS focuses on creating an efficient training model zoo for fast fine-tuning. Moreover, the existing search space designs cannot meet the purpose of our search.

Generating Hyperparameters for Fine-tuning. HPO methods such as Bayesian optimization (BO) (Strubell et al., 2019a; Mendoza et al., 2016) and BOHB (Falkner et al., 2018) achieve very promising results but require substantial computational resources, which contradicts our goal of faster fine-tuning.

The efficient training model zoo (ET-NAS) has been released at: https://github.com/NAS-OA/NASOA

