DIFFERENTIABLE GRAPH OPTIMIZATION FOR NEURAL ARCHITECTURE SEARCH

Abstract

In this paper, we propose Graph Optimized Neural Architecture Learning (GOAL), a novel gradient-based method for Neural Architecture Search (NAS) that finds better architectures with fewer evaluated samples. Popular NAS methods usually employ black-box optimization approaches such as reinforcement learning, evolutionary algorithms, or Bayesian optimization, which may be inefficient on huge combinatorial NAS search spaces. In contrast, we explicitly model the NAS search space as graphs, and then perform gradient-based optimization to learn graph structures with efficient exploitation. To this end, we learn a differentiable graph neural network as a surrogate model to rank candidate architectures, which enables us to obtain gradients w.r.t. the input architectures. To cope with the difficulty of gradient-based optimization on discrete graph structures, we propose to leverage proximal gradient descent to find potentially better architectures. Our empirical results show that GOAL outperforms mainstream black-box methods on existing NAS benchmarks in terms of search efficiency.

1. INTRODUCTION

Neural Architecture Search (NAS) methods have achieved great success and outperform hand-crafted models in many deep learning applications, such as image recognition, object detection and natural language processing (Zoph et al., 2017; Liu et al., 2019; Ghiasi et al., 2019; Chen et al., 2020). Due to the expensive cost of training and evaluating a neural architecture, the key challenge of NAS is to explore promising candidates effectively. To cope with this challenge, various methods have been proposed to perform efficient search, such as reinforcement learning (RL), evolutionary algorithms (EA), Bayesian optimization (BO) and weight-sharing strategies (WS) (Zoph & Le, 2016; Real et al., 2019; Hutter et al., 2011; Liu et al., 2019; Guo et al., 2019). While the weight-sharing strategy improves overall efficiency by reusing trained weights to reduce the total training cost, zeroth-order algorithms like RL, EA and BO employ black-box optimization, aiming to find optimal solutions with fewer samples. However, the search space of NAS grows exponentially with the number of choices. As a result, such huge combinatorial search spaces lead to insufficient exploitation by black-box learning frameworks (Luo et al., 2018). Another line of research formulates the NAS search space as graph structures, typically directed acyclic graphs (DAGs), casting the search target as choosing an optimal combination of the nodes and edges in the graph structure (Pham et al., 2018; Liu et al., 2019; Xie et al., 2019). However, existing methods tend to perform this optimization indirectly via black-box optimization. In contrast, we aim to explicitly model the search space as graphs and optimize graph structures directly. We thus propose Graph Optimized Neural Architecture Learning (GOAL), a novel NAS approach combined with graph learning for efficient exploitation, as briefly shown in Fig. 1.
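As a concrete illustration of the DAG formulation above, the following sketch encodes a small cell-based search space as an upper-triangular adjacency matrix plus a one-hot operation choice per potential edge. All names and sizes here (`NUM_NODES`, the `OPS` list) are hypothetical, chosen only to make the encoding concrete; they are not taken from the paper.

```python
import numpy as np

NUM_NODES = 4                        # hypothetical cell size
OPS = ["conv3x3", "conv1x1", "skip"]  # hypothetical candidate operations

def random_architecture(rng):
    """Sample a random cell: an upper-triangular adjacency matrix
    (edges only go from earlier to later nodes, so the graph is acyclic
    by construction) and a one-hot operation choice for every potential
    edge i -> j with i < j."""
    adj = np.triu(rng.integers(0, 2, size=(NUM_NODES, NUM_NODES)), k=1)
    ops = np.zeros((NUM_NODES, NUM_NODES, len(OPS)))
    for i in range(NUM_NODES):
        for j in range(i + 1, NUM_NODES):
            ops[i, j, rng.integers(len(OPS))] = 1.0
    return adj, ops

rng = np.random.default_rng(0)
adj, ops = random_architecture(rng)
assert np.all(np.tril(adj) == 0)           # strictly upper-triangular: acyclic
assert np.all(ops.sum(-1)[adj == 1] == 1)  # exactly one op per active edge
```

A search method that optimizes this pair `(adj, ops)` directly is optimizing the graph structure itself, rather than a black-box index into the space.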
Unlike other black-box approaches, we use a differentiable surrogate model to directly optimize the graph structures. The surrogate model takes a graph structure corresponding to a neural architecture as input, and predicts a relative ranking score as the search signal. We then apply gradient descent on the input graph structure to optimize the corresponding architecture, seeking a better predicted ranking score. As we optimize the surrogate model and the architectures iteratively, optimal architectures can typically be obtained after a few iterations. In particular, to cope with the difficulty of using gradient-based optimization on the discrete graph structure, we adapt the proximal algorithm to allow us to optimize discrete variables in a
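The search loop described above can be sketched as follows. This is a deliberately simplified toy: the surrogate is a linear scorer over a flattened one-hot operation encoding (the paper uses a graph neural network), and the proximal step is modeled as a projection of each edge's relaxed operation vector back onto the nearest one-hot vector. `NUM_EDGES`, `NUM_OPS`, and the step size are hypothetical.

```python
import numpy as np

NUM_EDGES, NUM_OPS = 6, 3  # hypothetical search-space dimensions

def surrogate_score(w, x):
    """Toy differentiable surrogate: a linear ranking score w . x,
    standing in for the trained graph neural network."""
    return float((w * x).sum())

def prox_one_hot(x):
    """Proximal step: project each edge's relaxed operation vector onto
    the nearest one-hot vector (row-wise argmax), recovering a valid
    discrete architecture after every gradient step."""
    proj = np.zeros_like(x)
    proj[np.arange(x.shape[0]), x.argmax(axis=1)] = 1.0
    return proj

rng = np.random.default_rng(1)
w = rng.normal(size=(NUM_EDGES, NUM_OPS))                 # "trained" surrogate weights
x = prox_one_hot(rng.normal(size=(NUM_EDGES, NUM_OPS)))   # random starting architecture

scores = [surrogate_score(w, x)]
for _ in range(5):
    x = prox_one_hot(x + 0.5 * w)   # gradient ascent on the score, then projection
    scores.append(surrogate_score(w, x))

# For this linear surrogate, each projected step can only keep or improve
# the predicted score, so the trajectory is non-decreasing.
assert all(b >= a for a, b in zip(scores, scores[1:]))
```

In the full method, the surrogate would be retrained on newly evaluated architectures between search iterations, alternating between fitting the ranking model and optimizing candidate graphs against it.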

