DIFFAUTOML: DIFFERENTIABLE JOINT OPTIMIZATION FOR EFFICIENT END-TO-END AUTOMATED MACHINE LEARNING

Abstract

The automated machine learning (AutoML) pipeline comprises several crucial components, such as automated data augmentation (DA), neural architecture search (NAS), and hyper-parameter optimization (HPO). Although many strategies have been developed for automating each component in isolation, jointly optimizing these components remains challenging due to the greatly enlarged search space and the different input types each component requires. Running these components in sequence is a common workaround, but it often requires careful coordination by human experts and may lead to sub-optimal results. In parallel, the common NAS practice of first searching for the optimal architecture and then retraining it before deployment often suffers from a performance gap between the search and retraining stages. An end-to-end solution that integrates the two stages and returns a ready-to-use model at the end of the search is therefore desirable. In view of these issues, we propose a differentiable joint optimization solution for efficient end-to-end AutoML (DiffAutoML). Our method co-optimizes neural architectures, training hyper-parameters, and data augmentation policies in an end-to-end fashion without the need for model retraining. Experiments show that DiffAutoML achieves state-of-the-art results on ImageNet compared with end-to-end AutoML algorithms, and outperforms multi-stage AutoML algorithms with higher computational efficiency. To the best of our knowledge, we are the first to jointly optimize automated DA, NAS, and HPO in an end-to-end manner without retraining.

1. INTRODUCTION

While deep learning has achieved remarkable progress in various tasks such as computer vision and natural language processing, designing and training a satisfactory deep model for a given task usually requires tremendous human involvement (He et al., 2016; Sandler et al., 2018). To alleviate this burden, numerous AutoML algorithms have been proposed in recent years to enable training a model from data automatically, without relying on human expertise, including automated data augmentation (DA), neural architecture search (NAS), and hyper-parameter optimization (HPO) (e.g., Chen et al., 2019; Cubuk et al., 2018; Mittal et al., 2020). These AutoML components are usually developed independently. However, applying them to a specific task in separate stages not only suffers from low efficiency but also leads to sub-optimal results (Dai et al., 2020; Dong et al., 2020). Achieving full-pipeline "from data to model" automation efficiently and effectively therefore remains a challenging problem.

One main difficulty in achieving automated "from data to model" learning is how to combine different AutoML components (e.g., NAS and HPO) appropriately for a specific task. Optimizing these components jointly is an intuitive solution but usually suffers from an enormous, impractical search space. Dai et al. (2020) and Wang et al. (2020) introduced pre-trained predictors to achieve the joint optimization of NAS with HPO and with automated model compression, respectively. For a new task, however, pre-training such a predictor is usually burdensome. On the other hand, Dong et al. (2020) investigated the joint optimization of NAS and HPO via differentiable architecture and hyper-parameter search spaces. Automated DA is seldom considered in the joint optimization of AutoML components. Nevertheless, our experimental

