DARTS-: ROBUSTLY STEPPING OUT OF PERFORMANCE COLLAPSE WITHOUT INDICATORS

Abstract

Despite the fast development of differentiable architecture search (DARTS), it suffers from long-standing performance instability, which severely limits its application. Existing robustifying methods draw clues from the resulting deteriorated behavior instead of identifying the causing factor. Various indicators, such as Hessian eigenvalues, have been proposed as signals to stop searching before the performance collapses. However, these indicator-based methods easily reject good architectures if the thresholds are set inappropriately, let alone that the search itself is intrinsically noisy. In this paper, we take a more subtle and direct approach to resolve the collapse. We first demonstrate that skip connections have a clear advantage over other candidate operations: they can easily recover from a disadvantageous state and become dominant. We conjecture that this privilege is the cause of degenerated performance. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. We call this approach DARTS-. Extensive experiments on various datasets verify that it substantially improves robustness.
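The auxiliary-skip idea above can be illustrated with a minimal numpy sketch. Candidate operations still compete through a softmax over architecture weights, while an extra skip path (weighted by a coefficient beta that is decayed to zero over the search) carries the optimization benefit of skip connections outside that competition. The function names and the linear decay schedule here are illustrative assumptions, not the paper's exact implementation; real candidate ops would be neural modules rather than lambdas.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over architecture weights."""
    e = np.exp(z - z.max())
    return e / e.sum()

def mixed_op_darts_minus(x, alphas, ops, beta):
    """Sketch of a DARTS- mixed edge: candidate ops compete via a softmax
    over `alphas`, while an auxiliary skip connection weighted by `beta`
    sits outside the competition (hypothetical helper, for illustration)."""
    weights = softmax(alphas)
    out = sum(w * op(x) for w, op in zip(weights, ops))
    return out + beta * x  # auxiliary skip: not governed by the softmax

def beta_schedule(epoch, total_epochs):
    """Assumed linear decay of beta from 1 to 0, so the supernet ends the
    search as a standard DARTS supernet."""
    return 1.0 - epoch / total_epochs
```

Because beta reaches zero by the end of the search, discretization proceeds exactly as in vanilla DARTS, with no extra operation to prune and no new hyper-parameter to tune at inference time.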

1. INTRODUCTION

Recent studies (Zela et al., 2020; Liang et al., 2019; Chu et al., 2020b) have shown that one critical issue for differentiable architecture search (Liu et al., 2019b) is the performance collapse caused by superfluous skip connections. Accordingly, several empirical indicators for detecting the occurrence of collapse have been proposed. R-DARTS (Zela et al., 2020) shows that the loss landscape has sharper curvature (characterized by larger Hessian eigenvalues w.r.t. the architectural weights) when the derived architecture generalizes poorly. By regularizing toward lower Hessian eigenvalues, Zela et al. (2020) and Chen & Hsieh (2020) attempt to stabilize the search process. Meanwhile, directly constraining the number of skip connections to a fixed value (typically 2) makes the collapse less pronounced (Chen et al., 2019b; Liang et al., 2019). These indicator-based approaches have several main drawbacks. First, robustness relies heavily on the quality of the indicator: an imprecise indicator either accepts poor models or mistakenly rejects good ones. Second, indicators impose strong priors by directly manipulating the inferred model, which is somewhat suspicious, akin to touching the test set. Third, they require extra computing cost (Zela et al., 2020) or careful tuning of hyper-parameters (Chen et al., 2019b; Liang et al., 2019). It is therefore natural to ask the following questions:

• Can we resolve the collapse without handcrafted indicators and restrictions that interfere with the search and/or discretization procedure?

• Is it possible to achieve robustness in DARTS without tuning extra hyper-parameters?

* Work done as an intern at Meituan Inc. † Corresponding author.
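To make the indicator-based alternative concrete, the quantity R-DARTS monitors is the dominant Hessian eigenvalue of the validation loss w.r.t. the architecture weights. A minimal numpy sketch of how such an eigenvalue can be estimated is below, using power iteration on finite-difference Hessian-vector products. The function `grad_fn` is a hypothetical callable returning the gradient of the validation loss; real implementations would use automatic differentiation rather than finite differences.

```python
import numpy as np

def max_hessian_eigenvalue(grad_fn, alpha, iters=50, eps=1e-3):
    """Estimate the dominant eigenvalue of the Hessian of a loss w.r.t.
    architecture weights `alpha` (the kind of indicator R-DARTS tracks).
    `grad_fn(alpha)` is assumed to return the gradient of the validation
    loss; Hessian-vector products are approximated by central differences
    of that gradient, and power iteration extracts the top eigenpair."""
    v = np.random.default_rng(0).normal(size=alpha.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = (grad_fn(alpha + eps * v) - grad_fn(alpha - eps * v)) / (2 * eps)
        v = hv / np.linalg.norm(hv)
    hv = (grad_fn(alpha + eps * v) - grad_fn(alpha - eps * v)) / (2 * eps)
    return float(v @ hv)  # Rayleigh quotient at the converged direction
```

The drawbacks listed above show up directly in this sketch: the estimate is noisy (it depends on the probe direction and step size), it adds gradient evaluations on top of the search itself, and any early-stopping rule built on it still needs a hand-picked threshold.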

Our code is available at https://github.com/Meituan-AutoML/

