DARTS-: ROBUSTLY STEPPING OUT OF PERFORMANCE COLLAPSE WITHOUT INDICATORS

Abstract

Despite the fast development of differentiable architecture search (DARTS), it suffers from long-standing performance instability, which severely limits its application. Existing robustifying methods draw clues from the resulting deteriorated behavior instead of identifying the causal factor. Various indicators, such as Hessian eigenvalues, have been proposed as signals to stop searching before the performance collapses. However, these indicator-based methods tend to reject good architectures if the thresholds are set inappropriately, especially given that the search process is intrinsically noisy. In this paper, we take a more subtle and direct approach to resolving the collapse. We first demonstrate that skip connections hold a clear advantage over other candidate operations: they can easily recover from a disadvantageous state and become dominant. We conjecture that this privilege causes the degenerate performance. We therefore propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. We call this approach DARTS-. Extensive experiments on various datasets verify that it substantially improves robustness.
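The core idea above, an auxiliary skip path alongside the usual softmax-weighted mixture of candidate operations, can be sketched as follows. This is a minimal illustration, not the paper's released code; the class and argument names (e.g. `MixedEdgeWithAuxSkip`, `beta`) are our own, and we assume the auxiliary coefficient `beta` is decayed toward 0 over the course of the search.

```python
import torch
import torch.nn as nn


class MixedEdgeWithAuxSkip(nn.Module):
    """One DARTS edge plus a fixed auxiliary skip connection (sketch).

    Candidate operations compete through a softmax over architecture
    weights as in DARTS, while an auxiliary skip path, weighted by a
    scalar `beta` that is assumed to decay to 0 during search, absorbs
    the optimization advantage of the skip connection.
    """

    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                      # candidate operations
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture weights

    def forward(self, x, beta):
        weights = torch.softmax(self.alpha, dim=0)
        mixed = sum(w * op(x) for w, op in zip(weights, self.ops))
        return mixed + beta * x                            # auxiliary skip path


# A typical schedule (an assumption for illustration): decay beta
# linearly from 1 to 0 across the search epochs.
def beta_at(epoch, total_epochs):
    return 1.0 - epoch / total_epochs
```

With `beta = 0` the edge reduces exactly to a standard DARTS mixed edge, so the search ends in the original formulation and no extra operation survives discretization.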

1. INTRODUCTION

Recent studies (Zela et al., 2020; Liang et al., 2019; Chu et al., 2020b) have shown that one critical issue for differentiable architecture search (Liu et al., 2019b) is the performance collapse caused by superfluous skip connections. Accordingly, several empirical indicators for detecting the onset of collapse have been proposed. R-DARTS (Zela et al., 2020) shows that the loss landscape has sharper curvature (characterized by larger Hessian eigenvalues w.r.t. the architectural weights) when the derived architecture generalizes poorly. By regularizing toward lower Hessian eigenvalues, Zela et al. (2020) and Chen & Hsieh (2020) attempt to stabilize the search process. Meanwhile, directly constraining the number of skip connections to a fixed value (typically 2) makes the collapse less pronounced (Chen et al., 2019b; Liang et al., 2019). These indicator-based approaches have several drawbacks. First, robustness relies heavily on the quality of the indicator: an imprecise indicator either accepts poor models or mistakenly rejects good ones. Second, indicators impose strong priors by directly manipulating the inferred model, which is somewhat suspicious, akin to touching the test set. Third, they require extra computation (Zela et al., 2020) or careful tuning of hyper-parameters (Chen et al., 2019b; Liang et al., 2019). It is therefore natural to ask the following questions:

• Can we resolve the collapse without handcrafted indicators or restrictions that interfere with the search and/or discretization procedure?

• Is it possible to achieve robustness in DARTS without tuning extra hyper-parameters?

Fair DARTS (Chu et al., 2020b) argues that the collapse results from an unfair advantage in an exclusively competitive environment, from which skip connections benefit excessively and thus aggregate in abundance.
To suppress this advantage from overshooting, they convert the competition into collaboration, where each operation is independent of the others. This is, however, an indirect approach. SGAS (Li et al., 2020), instead, circumvents the problem with a greedy strategy that prevents the unfair advantage from taking effect. Nevertheless, potentially good operations might be pruned too early owing to greedy underestimation.
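The Hessian-eigenvalue indicator discussed above (R-DARTS monitors the dominant eigenvalue of the validation-loss Hessian w.r.t. the architecture weights) can be estimated without forming the Hessian explicitly. Below is a sketch of one standard way to do this, via power iteration on Hessian-vector products computed by double backpropagation; it illustrates the kind of quantity such indicators track, not R-DARTS's exact implementation, and the function name is ours.

```python
import torch


def dominant_hessian_eigenvalue(loss, alpha, iters=20):
    """Estimate the largest eigenvalue of d^2(loss)/d(alpha)^2 (sketch).

    Uses power iteration with Hessian-vector products obtained through
    double backpropagation, so the Hessian is never materialized.
    `alpha` plays the role of the architecture weights; `loss` would be
    the validation loss in an R-DARTS-style indicator.
    """
    grad = torch.autograd.grad(loss, alpha, create_graph=True)[0]
    v = torch.randn_like(alpha)
    v = v / v.norm()
    eig = torch.tensor(0.0)
    for _ in range(iters):
        # Hessian-vector product: d(grad . v)/d(alpha)
        hv = torch.autograd.grad(grad, alpha, grad_outputs=v,
                                 retain_graph=True)[0]
        eig = torch.dot(hv.flatten(), v.flatten())  # Rayleigh quotient
        v = hv / (hv.norm() + 1e-12)
    return eig.item()
```

Because each iteration costs one extra backward pass, tracking this indicator throughout the search adds non-trivial overhead, which is one of the drawbacks noted above.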



We name it so because we take an inward approach, as opposed to outward ones that design new indicators, add extra cost, and introduce new hyper-parameters.



Strong Robustness and Stabilization. We conduct thorough experiments across seven search spaces and three datasets to demonstrate the effectiveness of our method. Specifically, our approach robustly obtains state-of-the-art results on four search spaces at 3× less search cost than R-DARTS (Zela et al., 2020), which requires four independent runs to report its final performance.

Differentiable Architecture Search. Among many proposed approaches, differentiable architecture search (Liu et al., 2019b) features weight sharing and solves the search problem via gradient descent, which is efficient and easy to generalize. A short description of DARTS can be found in A.1. Since then, many subsequent works have been dedicated to accelerating the process (Dong & Yang, 2019b), reducing memory cost (Xu et al., 2020), or extending its capability with hardware awareness (Cai et al., 2019; Wu et al., 2019), finer granularity (Mei et al., 2020), and so on. However, despite these endeavors, a fundamental issue of DARTS, the collapse of its search performance, remains unsolved, which severely hinders its application.

Robustifying DARTS. As DARTS (Liu et al., 2019b) is known to be unstable as a result of performance collapse (Chu et al., 2020b), some recent works have sought to resolve it either by designing indicators of the collapse, such as Hessian eigenvalues (Zela et al., 2020), or by adding perturbations to regularize such an indicator (Chen & Hsieh, 2020). Both methods rely heavily on the indicator's accuracy, i.e., to what extent the indicator correlates with the performance collapse. Other methods such as Progressive DARTS (Chen et al., 2019b) and DARTS+ (Liang et al., 2019) employ a strong human prior, i.e., limiting the number of skip connections to a fixed value.
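The skip-count prior mentioned above can be made concrete with a short sketch. This is an illustration of the general post-hoc restriction (replace excess skip connections with each edge's next-best operation), written with hypothetical names, not the actual P-DARTS/DARTS+ code.

```python
import numpy as np


def restrict_skip_connections(alphas, op_names, max_skip=2):
    """Post-hoc skip-count prior (sketch of the P-DARTS/DARTS+-style rule).

    If the derived cell keeps more than `max_skip` skip connections, the
    excess ones (those with the lowest architecture weight) are replaced
    by each edge's next-best operation.
    alphas: (num_edges, num_ops) array of architecture weights.
    """
    skip_idx = op_names.index("skip_connect")
    choice = alphas.argmax(axis=1)
    skip_edges = [e for e in range(len(alphas)) if choice[e] == skip_idx]
    if len(skip_edges) > max_skip:
        # Drop the weakest skip connections first.
        skip_edges.sort(key=lambda e: alphas[e, skip_idx])
        for e in skip_edges[: len(skip_edges) - max_skip]:
            masked = alphas[e].copy()
            masked[skip_idx] = -np.inf
            choice[e] = masked.argmax()
    return [op_names[i] for i in choice]
```

Note that the rule acts directly on the inferred model after search, which is exactly the kind of strong human prior our method is designed to avoid.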


Our code is available at https://github.com/Meituan-AutoML/

