TOWARDS DISCOVERING NEURAL ARCHITECTURES FROM SCRATCH

Abstract

The discovery of neural architectures from scratch is the long-standing goal of Neural Architecture Search (NAS). Searching over a wide spectrum of neural architectures can facilitate the discovery of previously unconsidered but well-performing architectures. In this work, we take a large step towards discovering neural architectures from scratch by expressing architectures algebraically. This algebraic view leads to a more general method for designing search spaces, which allows us to compactly represent search spaces that are 100s of orders of magnitude larger than common spaces from the literature. Further, we propose a Bayesian Optimization strategy to efficiently search over such huge spaces, and demonstrate empirically that both our search space design and our search strategy can be superior to existing baselines. We open-source our algebraic NAS approach and provide APIs for PyTorch and TensorFlow.

1. INTRODUCTION

Neural Architecture Search (NAS), a field with over 1,000 papers in the last two years (Deng & Lindauer, 2022), is widely touted to automatically discover novel, well-performing architectural patterns. However, while state-of-the-art performance has already been demonstrated in hundreds of NAS papers (prominently, e.g., Tan & Le, 2019; 2021; Liu et al., 2019a), success in automatically finding truly novel architectural patterns has been very scarce (Ramachandran et al., 2017; Liu et al., 2020). For example, novel architectures such as transformers (Vaswani et al., 2017; Dosovitskiy et al., 2021) have been crafted manually and were not found by NAS.

There is an accumulating amount of evidence that over-engineered, restrictive search spaces (e.g., cell-based ones) are major impediments for NAS to discover truly novel architectures. Yang et al. (2020b) showed that in the DARTS search space (Liu et al., 2019b) the manually defined macro architecture is more important than the searched cells, while Xie et al. (2019) and Ru et al. (2020) achieved competitive performance with randomly wired neural architectures that do not adhere to common search space limitations. As a result, there are increasing efforts to break these impediments, and the discovery of novel neural architectures has been referred to as the holy grail of NAS.

Hierarchical search spaces are a promising step towards this holy grail. In an initial work, Liu et al. (2018) proposed a hierarchical cell, which is shared across a fixed macro architecture, imitating the compositional neural architecture design pattern widely used by human experts. However, subsequent works showed the importance of both layer diversity (Tan & Le, 2019) and macro architecture (Xie et al., 2019; Ru et al., 2020). In this work, we introduce a general formalism for the representation of hierarchical search spaces, allowing both for layer diversity and a flexible macro architecture.
The key observation is that any neural architecture can be represented algebraically; e.g., two residual blocks followed by a fully-connected layer in a linear macro topology can be represented as the algebraic term ω = Linear(Residual(conv, id, conv), Residual(conv, id, conv), fc). We build upon this observation and employ Context-Free Grammars (CFGs) to construct large spaces of such algebraic architecture terms. Although any particular search space is of course limited in its overall expressiveness, with this approach we could effectively represent any neural architecture, facilitating the discovery of truly novel ones.
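To make the idea concrete, the following is a minimal sketch (not the paper's actual API) of how a context-free grammar can generate such algebraic architecture terms: nonterminals (`ARCH`, `BLOCK`, `OP`) are expanded by randomly chosen productions until only primitive operations remain. The grammar, symbol names, and sampler are illustrative assumptions chosen to reproduce terms of the form shown above.

```python
import random

# Illustrative CFG (assumed for this sketch): each nonterminal maps to a list
# of productions; the first element of a production is the operator name (or a
# nonterminal for unit productions), the rest are argument symbols.
GRAMMAR = {
    "ARCH": [["Linear", "BLOCK", "BLOCK", "fc"]],
    "BLOCK": [["Residual", "OP", "OP", "OP"], ["OP"]],
    "OP": [["conv"], ["id"]],
}

def sample_term(symbol: str, rng: random.Random) -> str:
    """Recursively expand a symbol into an algebraic architecture term."""
    if symbol not in GRAMMAR:  # terminal: a primitive operation such as conv
        return symbol
    head, *args = rng.choice(GRAMMAR[symbol])
    if not args:  # unit production, e.g. BLOCK -> OP
        return sample_term(head, rng)
    expanded = [sample_term(arg, rng) for arg in args]
    return f"{head}({', '.join(expanded)})"

rng = random.Random(0)
# Produces terms such as Linear(Residual(conv, id, conv), conv, fc)
print(sample_term("ARCH", rng))
```

Enlarging the grammar (more nonterminals, more productions, deeper recursion) grows the induced space of terms combinatorially, which is the sense in which a compact grammar can represent a very large search space.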

