IMPROVING RANDOM-SAMPLING NEURAL ARCHITECTURE SEARCH BY EVOLVING THE PROXY SEARCH SPACE
Anonymous authors
Paper under double-blind review

Abstract

Random-sampling Neural Architecture Search (RandomNAS) has recently become a prevailing NAS approach because of its search efficiency and simplicity. There are two main steps in RandomNAS: the training step, which randomly samples weight-sharing architectures from a supernet and iteratively updates their weights, and the search step, which ranks architectures by their respective validation performance. Key to both steps is the assumption of a high correlation between the estimated performance (i.e., accuracy) of weight-sharing architectures and their respective achievable accuracy (i.e., ground truth) when trained from scratch. We examine this phenomenon via NASBench-201, whose ground truth is known for its entire NAS search space. We observe that existing RandomNAS can rank a set of architectures uniformly sampled from the entire global search space (GS) in a way that correlates well with their ground-truth ranking. However, if we only focus on the top-performing architectures (such as the top 20% according to the ground truth) in the GS, this correlation drops dramatically. This raises the question of whether we can find an effective proxy search space (PS), only a small subset of the GS, that dramatically improves RandomNAS's search efficiency while keeping a good correlation for the top-performing architectures. This paper proposes a new RandomNAS-based approach called EPS (Evolving the Proxy Search Space) to address this problem. We show that, when applied to NASBench-201, EPS can achieve near-optimal NAS performance and surpass all existing state-of-the-art methods. When applied to different variants of DARTS-like search spaces for tasks such as image classification and natural language processing, EPS robustly achieves superior performance with shorter or similar search time compared to leading NAS works.

1. INTRODUCTION

Neural architecture search (NAS) has been successfully utilized to discover novel DNN architectures in complex search spaces, outperforming human-crafted designs. Early NAS works like NASNet (Zoph et al. (2018)) and AmoebaNet (Real et al. (2019)) used reinforcement learning or evolutionary algorithms to search for DNN architectures by training a substantial number of independent networks from scratch. Although the searched architectures deliver high accuracy, they come with tremendous computation and time costs. Therefore, researchers have gradually shifted their focus to one-shot NAS, which is more efficient and can deliver satisfying outputs within a few GPU-days. There are two main types of one-shot NAS. One is differentiable NAS (DNAS), such as Liu et al. (2019b); Cai et al. (2018); Xie et al. (2018); Dong & Yang (2019b); Xu et al. (2019); Chen et al. (2019a), which uses a continuous relaxation of the architecture representation and introduces architecture parameters to distinguish the architectures. The other is Random-Sampling NAS (RandomNAS), such as Li & Talwalkar (2019); Chen et al. (2019b); Zhang et al. (2020); Guo et al. (2019); Bender (2019); Yang et al. (2020). RandomNAS approaches typically have two phases: (1) Training phase: in each iteration, RandomNAS randomly samples one architecture or a set of architectures and updates their shared weights in the supernet. (2) Search phase: after supernet training, the desired architectures are selected based on their performance ranking on the validation dataset using weights inherited from the supernet, which is called the weight-sharing performance. Finally, the selected architectures are retrained from scratch to get their actual (retrained) performance for deployment. Compared to DNAS, RandomNAS usually consumes less GPU memory by partially updating the weights. Also, it generates multiple target architectures, while DNAS generally retrieves a single architecture based on the maxima of the representation distribution. There are, however, two major drawbacks preventing RandomNAS from achieving higher search efficiency. First, although RandomNAS achieves a promising ranking correlation between the weight-sharing estimation and the retrained performance over all architecture candidates, it delivers a low ranking correlation among "good" architectures (e.g., the top-20%-performing architectures in the search space) that researchers are most interested in.
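The two-phase RandomNAS procedure can be sketched with a toy, self-contained example. Here the dictionary-based `shared_weights`, the counting-based `validate` score, and the edge/operation encoding are illustrative stand-ins for a real supernet and its validation accuracy, not the implementation used in this paper:

```python
import random

# Toy search space: an architecture is a tuple of operation choices,
# one per edge (a hypothetical stand-in for a real supernet's cell).
OPS = ["none", "skip", "conv1x1", "conv3x3", "avgpool"]
NUM_EDGES = 6

def sample_architecture(rng):
    """Training phase, step 1: uniformly sample one architecture."""
    return tuple(rng.choice(OPS) for _ in range(NUM_EDGES))

def train_step(shared_weights, arch):
    """Training phase, step 2: update only the weights along the sampled
    architecture's path (weight sharing -- other paths are untouched)."""
    for edge, op in enumerate(arch):
        shared_weights[(edge, op)] = shared_weights.get((edge, op), 0) + 1

def validate(shared_weights, arch):
    """Search phase: score an architecture with its inherited weights.
    In this toy, the 'accuracy' is just how often its ops were trained."""
    return sum(shared_weights.get((edge, op), 0) for edge, op in enumerate(arch))

def random_nas(num_train_steps=1000, num_candidates=20, seed=0):
    rng = random.Random(seed)
    shared_weights = {}
    for _ in range(num_train_steps):
        train_step(shared_weights, sample_architecture(rng))
    # Rank candidates by their weight-sharing validation score.
    candidates = [sample_architecture(rng) for _ in range(num_candidates)]
    return sorted(candidates, key=lambda a: validate(shared_weights, a), reverse=True)

ranked = random_nas()
best = ranked[0]  # would be retrained from scratch for deployment
```

The key assumption being tested in this paper is precisely the last line: that the top of this weight-sharing ranking coincides with the top of the ground-truth ranking.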
Second, under the RandomNAS approach, smaller network architectures (with fewer parameters) tend to converge faster than larger ones, which significantly degrades the ranking correlation. To address these drawbacks, we first introduce a proxy search space (PS): a subset of the architectures flexibly sampled from the global search space (GS), used to study the behavior of RandomNAS. We then evaluate RandomNAS with the proposed PS on the NASBench-201 benchmark (Dong & Yang (2020)) and notice two interesting phenomena: (1) When a PS is uniformly sampled from the global search space, RandomNAS in the PS maintains a ranking correlation similar to the one in the GS, even when the PS is extremely small (e.g., 16 architectures). (2) If the PS consists of "good" architectures, PS-based search can significantly improve the ranking correlation within the PS compared to RandomNAS trained in the GS and validated in the PS. Based on these observations, a PS constructed from "good" architectures can help overcome the first drawback of RandomNAS and find more promising architectures. The remaining question is then how to find a suitable PS containing sufficiently many "better" architectures. In this paper, we cast it as a natural selection problem and solve it with an evolutionary algorithm. The architectures in the initial PS are iteratively evolved and gradually upgraded, so the average ranking of the architectures in the PS improves; meanwhile, the ranking correlation within the PS also improves. We propose a new RandomNAS approach, named Evolving the Proxy Search Space (EPS). EPS runs three stages iteratively: training the supernet by randomly sampling from the PS; validating the architectures in the PS on a subset of the validation dataset at each training interval; and evolving the PS with a tournament selection evolutionary algorithm with an aging mechanism.
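The evolve stage just described, tournament selection with an aging mechanism (as in regularized evolution), can be sketched as follows. The tuple-of-operations encoding, the `mutate` operator, and the toy `score_fn` are illustrative assumptions; in EPS the score would come from weight-sharing validation accuracy:

```python
import collections
import random

Candidate = collections.namedtuple("Candidate", ["arch", "score"])

def mutate(arch, rng):
    """Flip one randomly chosen edge to a random operation."""
    ops = ["none", "skip", "conv1x1", "conv3x3", "avgpool"]
    edge = rng.randrange(len(arch))
    return arch[:edge] + (rng.choice(ops),) + arch[edge + 1:]

def evolve_proxy_space(proxy, score_fn, steps=100, tournament_size=4, seed=0):
    """Evolve the PS: at each step, the best of a random tournament becomes
    a parent, its mutated child joins the population, and the OLDEST member
    (not the worst) is removed -- the aging mechanism."""
    rng = random.Random(seed)
    population = collections.deque(Candidate(a, score_fn(a)) for a in proxy)
    for _ in range(steps):
        parent = max(rng.sample(list(population), tournament_size),
                     key=lambda c: c.score)
        child = mutate(parent.arch, rng)
        population.append(Candidate(child, score_fn(child)))
        population.popleft()  # aging: drop the oldest candidate
    return [c.arch for c in population]

# Toy usage: a PS of 16 identical architectures; score = number of 3x3 convs.
proxy = [("skip",) * 6 for _ in range(16)]
evolved = evolve_proxy_space(proxy, lambda a: a.count("conv3x3"), steps=200)
```

Removing the oldest rather than the worst candidate keeps selection pressure mild and lets the population escape early noisy scores, which matters here because weight-sharing estimates are unreliable early in supernet training.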
In this way, EPS gradually includes more high-quality architectures in the proxy search space while improving its ranking correlation. To solve the second issue, in which smaller architectures converge faster than larger ones, we introduce a simple model-size-based regularization in the final selection stage. On NASBench-201, adding this regularization improves the ranking correlation, measured by Spearman's ρ, by 17.2%. In the experiments, we demonstrate that EPS delivers near-optimal performance on NASBench-201. We also extend EPS to the DARTS search space. Measured over 5 search runs, EPS demonstrates robust search ability compared with previous works, with an 8-hour search time and little hyper-parameter tuning effort. EPS is also evaluated on 4 DARTS sub-search-spaces (Zela et al. (2019)) using 3 datasets, on which DARTS easily fails. EPS surpasses DARTS-ADA and DARTS-ES in most cases and can often find the globally state-of-the-art architectures. EPS also performs well on a language modeling task, which consolidates its generalization ability and robustness.
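The ranking-correlation metric used throughout, Spearman's ρ, is the Pearson correlation of the ranks; without ties it reduces to the closed form below. The `size_regularized_score` function is a hypothetical illustration of a model-size-based regularization (the coefficient `alpha` and the additive form are our assumptions, not the paper's exact formula):

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation without tie correction:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the rank difference of item i between the two lists."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def size_regularized_score(ws_accuracy, num_params, alpha=0.1):
    """Hypothetical size regularization: reward parameter count to offset
    the early-convergence advantage of small models. `alpha` is an assumed
    illustrative coefficient."""
    return ws_accuracy + alpha * num_params

# Example: weight-sharing scores vs. ground-truth accuracies of 5 architectures.
ws = [0.62, 0.70, 0.55, 0.81, 0.77]
gt = [0.90, 0.92, 0.88, 0.95, 0.93]
rho = spearman_rho(ws, gt)  # 1.0 here: the two rankings agree exactly
```

A ρ near 1 over the whole GS but much lower over only the top 20% is exactly the failure mode described above: the metric can look healthy globally while being uninformative where it matters.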

2. RANDOMNAS ON NASBENCH-201

In this section, we present two major drawbacks we found in existing RandomNAS methods and investigate the ranking correlation using a proxy search space. A proxy search space (PS) is a subset of the global search space (GS); training within a PS composed of different architectures from the GS makes RandomNAS more flexible to analyze. We run the following experiments on NASBench-201, a unified and fair benchmark designed for NAS algorithm evaluation that contains the ground-truth accuracy of every architecture on three datasets. The GS of NASBench-201 contains 15,625 architectures, constructed from 4 nodes and 5 associated operation options. (Please refer to Fig. 1 in Dong & Yang (2020) for details.)
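With 4 nodes, each connected to all of its predecessors, a NASBench-201 cell has 3 + 2 + 1 = 6 edges; since each edge independently picks one of the 5 operations, the GS size is 5^6 = 15,625. A short sketch (operation names as in NASBench-201; the enumeration and the uniform PS sampling are our illustration, not the benchmark's API):

```python
from itertools import product
import random

# NASBench-201 cell: a DAG over 4 nodes where every node takes input from
# each of its predecessors, giving 3 + 2 + 1 = 6 edges; each edge carries
# one of 5 operations.
OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
NUM_NODES = 4
NUM_EDGES = NUM_NODES * (NUM_NODES - 1) // 2  # 6

def enumerate_global_search_space():
    """Yield every architecture as a tuple of 6 operation choices."""
    return product(OPS, repeat=NUM_EDGES)

gs = list(enumerate_global_search_space())
assert len(gs) == 5 ** NUM_EDGES  # 15,625 architectures

# A uniformly sampled PS, e.g. of 16 architectures as studied above.
proxy = random.sample(gs, 16)
```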




Our code is available at https://github.com/IcLr2020SuBmIsSiOn/EPS

