PERFORMANCE PREDICTION VIA UNSUPERVISED DOMAIN ADAPTATION FOR ARCHITECTURE SEARCH

Abstract

Performance predictors can directly predict the performance of a given neural architecture without training it, and have thus been widely studied to alleviate the prohibitive cost of Neural Architecture Search (NAS). However, existing performance predictors still require training a large number of architectures from scratch to obtain the performance labels for their training dataset, which remains computationally expensive. To address this issue, we develop a performance predictor based on unsupervised domain adaptation, called USPP, which avoids costly dataset construction by reusing existing fully-trained architectures. Specifically, a progressive domain-invariant feature extraction method is proposed to assist in extracting domain-invariant features, since the rich domain-specific features pose a great challenge to transferability. Furthermore, a learnable representation (denoted as the operation embedding) is designed to replace the fixed encoding of operations, transferring more knowledge about operations to the target search space. In experiments, we train the predictor on the labeled architectures in NAS-Bench-101 and predict architectures in the DARTS search space. Compared with other state-of-the-art NAS methods, the proposed USPP costs only 0.02 GPU days yet finds an architecture achieving 97.86% accuracy on CIFAR-10 and 76.50% top-1 accuracy on ImageNet.

1. INTRODUCTION

Neural Architecture Search (NAS) (Elsken et al., 2019) aims to automatically design high-performance neural architectures and has become a popular research field in machine learning. In recent years, architectures searched by NAS have outperformed manually designed architectures in many fields (Howard et al., 2019; Real et al., 2019). However, NAS generally requires massive computational resources to estimate the performance of the architectures obtained during the search process (Real et al., 2019; Zoph et al., 2018). In practice, this is unaffordable for most interested researchers. As a result, how to speed up the estimation of neural architectures has become a hot topic in the NAS community.

The performance predictor (Wen et al., 2020) is a popular acceleration technique for NAS. It can directly predict the performance of neural architectures without training them, thus greatly accelerating the NAS process. A large number of related works have been carried out because of its effectiveness in reducing the cost of NAS. For example, E2EPP (Sun et al., 2019) adopted a random forest (Breiman, 2001) as the regression model to effectively find promising architectures. ReNAS (Xu et al., 2021) used a simple LeNet-5 network (LeCun et al., 1998) as the regression model and creatively employed a ranking-based loss function to train the predictor, thus improving its prediction ability.

Although existing performance predictors have achieved great success in improving the efficiency of NAS, sufficient architectures still need to be sampled from the target search space and fully trained to obtain their performance values as labels (Wen et al., 2020). The performance predictor is trained on these labeled architectures and then used to predict the performance of unseen architectures. To ensure prediction accuracy, it is usually necessary to train at least hundreds of architectures to build the dataset, which is a huge cost.
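To make the ranking-based training idea concrete, the following is a minimal sketch (not ReNAS's actual model, which uses a LeNet-5 regressor) of a toy linear predictor trained with a pairwise hinge ranking loss on synthetic architecture encodings; all data and dimensions here are invented for illustration.

```python
import numpy as np

def pairwise_ranking_loss_grad(w, X, y, margin=0.1):
    """Hinge ranking loss over all ordered pairs: penalize whenever the
    predicted ordering of two architectures disagrees with the ordering
    of their true accuracies by more than the margin."""
    scores = X @ w
    grad = np.zeros_like(w)
    loss = 0.0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] > y[j]:  # architecture i is truly better than j
                diff = margin - (scores[i] - scores[j])
                if diff > 0:
                    loss += diff
                    grad += X[j] - X[i]
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))   # toy encodings of 50 architectures
w_true = rng.normal(size=8)
y = X @ w_true                 # toy "accuracies" for the illustration

w = np.zeros(8)
for _ in range(200):
    _, grad = pairwise_ranking_loss_grad(w, X, y)
    w -= 1e-3 * grad

# Check: the trained predictor should order most pairs correctly,
# which is what matters for picking the top-ranked architecture.
pred = X @ w
agree = np.mean([(pred[i] > pred[j]) == (y[i] > y[j])
                 for i in range(50) for j in range(50) if y[i] != y[j]])
print(f"pairwise ranking agreement: {agree:.2f}")
```

The key design choice, as in ranking-based predictors generally, is that only the relative order of predicted scores is optimized, since NAS needs the best-ranked architecture rather than exact accuracy values.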
In recent years, many benchmark datasets, such as NAS-Bench-101 (Ying et al., 2019), NAS-Bench-201 (Dong & Yang, 2020), and NAS-Bench-NLP (Klyuchnikov et al., 2020), have been released to promote research on NAS. These datasets contain a large number of architecture-performance pairs (i.e., an architecture together with its measured performance). As a result, we are motivated to utilize the rich architecture knowledge in these datasets to predict architectures in the target search space (i.e., the search space in which architectures need to be predicted). In this way, we can avoid training a large number of architectures in the target search space, thereby alleviating the expensive cost of building the dataset for performance predictors. However, the search spaces designed in the benchmark datasets are very different from real-world search spaces, so a performance predictor trained on existing labeled architectures cannot be directly applied to the target search space. In this paper, we propose an UnSupervised domain adaptation-based Performance Predictor (USPP). Different from traditional performance predictors, which require the training data and the predicted data to lie in the same search space, USPP can leverage the labeled architectures in existing benchmark datasets (e.g., NAS-Bench-101 (Ying et al., 2019)) to build a powerful performance predictor for the target search space (e.g., the DARTS search space (Liu et al., 2018b)). As a result, USPP avoids expensive data collection for the target search space. Specifically, the contributions can be summarized as follows:

• A progressive domain-invariant feature extraction method is proposed to reduce the transfer difficulty caused by the huge difference between the source and target search spaces.
The progressive method explicitly models the domain-specific features and gradually separates them from the domain-invariant features, thus assisting the alignment of the source and target search spaces.

• A learnable representation for the operations in architectures, i.e., the operation embedding, is designed to transfer more knowledge about operations to the target search space. Compared with the widely used fixed encoding, the operation embedding can effectively capture the inner meaning and structural role of each operation in the source search space and apply them in the target search space, reducing the transfer difficulty.

• USPP costs only 0.02 GPU days to search for architectures in the DARTS search space because there is no need to annotate architectures in the target search space. Furthermore, the architecture searched by USPP achieves 97.86% classification accuracy on CIFAR-10 and 76.50% top-1 accuracy on ImageNet, outperforming all compared state-of-the-art methods.
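The operation-embedding contribution above can be sketched as follows. This is a minimal illustration contrasting a fixed one-hot encoding with a learnable embedding table; the operation names and the embedding dimension are hypothetical, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical operation vocabulary shared by the source and target spaces.
OPS = ["conv3x3", "conv1x1", "maxpool3x3", "sep_conv3x3", "skip_connect"]

def one_hot(op):
    """Fixed encoding: every pair of operations is equidistant, so no
    relational knowledge about operations can be learned or transferred."""
    v = np.zeros(len(OPS))
    v[OPS.index(op)] = 1.0
    return v

class OperationEmbedding:
    """Learnable encoding: each operation maps to a trainable vector, so
    similarities learned on the source search space (e.g., conv3x3 ending
    up close to sep_conv3x3) can carry over to the target search space."""
    def __init__(self, dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.1, size=(len(OPS), dim))

    def __call__(self, op):
        return self.table[OPS.index(op)]

    def update(self, op, grad, lr=0.01):
        # Gradient step on a single row, as in standard embedding training.
        self.table[OPS.index(op)] -= lr * grad

emb = OperationEmbedding()
print(one_hot("conv3x3"))   # fixed 5-dimensional indicator vector
print(emb("conv3x3"))       # trainable 4-dimensional representation
```

In this sketch the embedding rows would be updated jointly with the predictor's loss, so operations that behave similarly in source-space architectures drift toward similar vectors.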

2. RELATED WORK

2.1 NAS AND PERFORMANCE PREDICTORS

NAS can automatically design high-performance neural architectures and consists of three components: the search space, the search strategy, and the performance estimation strategy (Elsken et al., 2019). Specifically, the search space defines the collection of candidate architectures. The search strategy corresponds to the optimization algorithm employed for the search; such algorithms can mainly be classified into evolutionary algorithms (Bäck et al., 1997), reinforcement learning (Kaelbling et al., 1996), and gradient descent (Liu et al., 2018b). The performance estimation strategy defines how architectures are evaluated to obtain their performance. During the search, a NAS algorithm uses the search strategy to search for architectures in the predefined search space and obtains the performance of the searched architectures through the performance estimation strategy. No matter which search strategy is used, many neural architectures need to be estimated. Because of the heavy cost of the traditional GPU-based estimation method, many acceleration methods have been proposed, such as the early stopping policy (Sun et al., 2018), proxy datasets (Sapra & Pimentel, 2020), and weight-sharing methods (Bender et al., 2018). However, the first two methods may lead to poor generalization and low-fidelity approximations of the performance value, and weight-sharing methods may be unreliable in predicting the relative ranking among architectures (Li et al.; Yang et al.), which conflicts with the goal of finding the architecture with the highest ranking. Performance predictors are free from these shortcomings and have received great attention in recent years. However, existing predictors are constrained by the requirement that the source search space and the target search space be the same.
The proposed USPP breaks this limitation and creatively uses the existing labeled architectures in the source search space to predict the architectures in the target search space, thus removing the reliance on potentially costly labels in the target search space.
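The paper's progressive extraction method is more elaborate than any single alignment loss, but as a minimal illustration of the kind of objective unsupervised domain adaptation uses to align two feature distributions without target labels, the sketch below computes a CORAL-style (correlation alignment) loss, a different, well-known technique chosen here only because it is compact; the feature matrices are synthetic stand-ins for source- and target-space architecture features.

```python
import numpy as np

def coral_loss(source_feats, target_feats):
    """CORAL-style alignment loss: squared Frobenius distance between the
    covariance matrices of source and target features. Minimizing it
    (jointly with a supervised loss on labeled source data) encourages
    the extractor to produce domain-invariant features."""
    def cov(x):
        xc = x - x.mean(axis=0, keepdims=True)
        return xc.T @ xc / (len(x) - 1)
    d = source_feats.shape[1]
    diff = cov(source_feats) - cov(target_feats)
    return np.sum(diff ** 2) / (4 * d * d)

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 16))              # stand-in: source-space features
tgt = rng.normal(size=(200, 16)) * 2.0 + 1.0  # stand-in: shifted target features

print(coral_loss(src, tgt))   # large value: the two domains are misaligned
print(coral_loss(src, src))   # zero: identical distributions
```

Crucially, this objective needs no performance labels from the target side, which is exactly the property that lets a predictor trained on a benchmark search space be applied to a new one.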

